VACUUM PARALLEL option vs. max_parallel_maintenance_workers
If I read the code correctly, the VACUUM PARALLEL option is capped by
the active max_parallel_maintenance_workers setting. So if I write
VACUUM (PARALLEL 5), it will still only do 2 by default. Is that
correct? The documentation (VACUUM man page) seems to indicate a
different behavior.
I haven't been able to set up something to test or verify this either way.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Sep 19, 2020 at 1:58 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
If I read the code correctly, the VACUUM PARALLEL option is capped by
the active max_parallel_maintenance_workers setting. So if I write
VACUUM (PARALLEL 5), it will still only do 2 by default. Is that
correct?
Yeah, but there is another factor also which is the number of indexes
that support parallel vacuum operation.
The documentation (VACUUM man page) seems to indicate a
different behavior.
I think we can change the documentation for parallel option to explain
it better. How about: "Perform index vacuum and index cleanup phases
of VACUUM in parallel using integer background workers (for the
details of each vacuum phase, please refer to Table 27.37). The number
of workers is determined based on the number of indexes on the
relation that support parallel vacuum operation which is limited by
number of workers specified with PARALLEL option if any which is
further limited by max_parallel_maintenance_workers." instead of what
is currently there?
--
With Regards,
Amit Kapila.
On 2020-09-19 11:37, Amit Kapila wrote:
I think we can change the documentation for parallel option to explain
it better. How about: "Perform index vacuum and index cleanup phases
of VACUUM in parallel using integer background workers (for the
details of each vacuum phase, please refer to Table 27.37). The number
of workers is determined based on the number of indexes on the
relation that support parallel vacuum operation which is limited by
number of workers specified with PARALLEL option if any which is
further limited by max_parallel_maintenance_workers." instead of what
is currently there?
I think the implemented behavior is wrong. The VACUUM PARALLEL option
should override the max_parallel_maintenance_worker setting.
Otherwise, what's the point of the command option?
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Sep 19, 2020 at 4:28 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 2020-09-19 11:37, Amit Kapila wrote:
I think we can change the documentation for parallel option to explain
it better. How about: "Perform index vacuum and index cleanup phases
of VACUUM in parallel using integer background workers (for the
details of each vacuum phase, please refer to Table 27.37). The number
of workers is determined based on the number of indexes on the
relation that support parallel vacuum operation which is limited by
number of workers specified with PARALLEL option if any which is
further limited by max_parallel_maintenance_workers." instead of what
is currently there?I think the implemented behavior is wrong.
It is the same as what we do for other parallel operations, for
example, we limit the number of parallel workers for parallel create
index by 'max_parallel_maintenance_workers' and parallel scan
operations are limited by 'max_parallel_workers_per_gather'.
The VACUUM PARALLEL option
should override the max_parallel_maintenance_worker setting.Otherwise, what's the point of the command option?
It is for the cases where the user has a better idea of workload. We
can launch only a limited number of parallel workers
'max_parallel_workers' in the system, so sometimes users would like to
use it as per their requirement. If the user omits this option, then
we internally compute the required number of workers but again those
are limited by max_* guc's.
--
With Regards,
Amit Kapila.
On 2020-09-19 13:24, Amit Kapila wrote:
I think the implemented behavior is wrong.
It is the same as what we do for other parallel operations, for
example, we limit the number of parallel workers for parallel create
index by 'max_parallel_maintenance_workers' and parallel scan
operations are limited by 'max_parallel_workers_per_gather'.
But in those cases we don't provide user-visible options to specify a
per-command setting, so it's not the same thing, is it?
The VACUUM PARALLEL option
should override the max_parallel_maintenance_worker setting.Otherwise, what's the point of the command option?
It is for the cases where the user has a better idea of workload. We
can launch only a limited number of parallel workers
'max_parallel_workers' in the system, so sometimes users would like to
use it as per their requirement.
Right, but my point is, it doesn't actually do that correctly. I can't
just say, oh, I have a maintenance window, I'd like to run a really fast
VACUUM. The PARALLEL option is capped by the setting you'd normally use
anyway, so specifying it is useless.
The only thing it can do right now is if you want to run a manual VACUUM
less parallel than by default. But I don't see how that is often useful.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sun, Sep 20, 2020 at 7:15 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 2020-09-19 13:24, Amit Kapila wrote:
I think the implemented behavior is wrong.
It is the same as what we do for other parallel operations, for
example, we limit the number of parallel workers for parallel create
index by 'max_parallel_maintenance_workers' and parallel scan
operations are limited by 'max_parallel_workers_per_gather'.But in those cases we don't provide user-visible options to specify a
per-command setting, so it's not the same thing, is it?
Not exactly but there also we have a way for the user to set the value
(using 'parallel_workers' during Create Table or Alter Table) which
will guide the parallel scans.
The VACUUM PARALLEL option
should override the max_parallel_maintenance_worker setting.Otherwise, what's the point of the command option?
It is for the cases where the user has a better idea of workload. We
can launch only a limited number of parallel workers
'max_parallel_workers' in the system, so sometimes users would like to
use it as per their requirement.Right, but my point is, it doesn't actually do that correctly. I can't
just say, oh, I have a maintenance window, I'd like to run a really fast
VACUUM. The PARALLEL option is capped by the setting you'd normally use
anyway, so specifying it is useless.
Yeah, because by default we choose the maximum number of possible
workers for Vacuum.
The only thing it can do right now is if you want to run a manual VACUUM
less parallel than by default. But I don't see how that is often useful.
Say when indexes that support parallel scan are not very big then we
don't need the default behavior because it will use more resources
while providing not much additional benefit.
What according to you should be the behavior here and how will it be
better than current?
--
With Regards,
Amit Kapila.
On 2020-09-21 05:48, Amit Kapila wrote:
What according to you should be the behavior here and how will it be
better than current?
I think if I write VACUUM (PARALLEL 5), it should use up to 5 workers
(up to the number of indexes), even if max_parallel_maintenance_workers
is 2.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Sep 21, 2020 at 12:45 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 2020-09-21 05:48, Amit Kapila wrote:
What according to you should be the behavior here and how will it be
better than current?I think if I write VACUUM (PARALLEL 5), it should use up to 5 workers
(up to the number of indexes), even if max_parallel_maintenance_workers
is 2.
So you want it to disregard max_parallel_maintenance_workers but all
parallel operations have to regard one of the max_parallel_* option so
that it can respect max_parallel_workers beyond which the system won't
allow more parallel workers. Now, if we won't respect one of the
max_parallel_* option, it will unnecessarily try to register those
many workers even though it won't be able to start those many workers.
I think it is better to keep the limit for workers for scans and
maintenance operations separately so that the user is allowed to
perform different parallel operations in the system.
--
With Regards,
Amit Kapila.
On Mon, 21 Sep 2020 at 19:15, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 2020-09-21 05:48, Amit Kapila wrote:
What according to you should be the behavior here and how will it be
better than current?I think if I write VACUUM (PARALLEL 5), it should use up to 5 workers
(up to the number of indexes), even if max_parallel_maintenance_workers
is 2.
It would be good if we were consistent with these parallel options.
Right now max_parallel_workers_per_gather will restrict the
parallel_workers reloption. I'd say this
max_parallel_workers_per_gather is similar to
max_parallel_maintenance_workers here and the PARALLEL vacuum option
is like the parallel_workers reloption.
If we want VACUUM's parallel option to work the same way as that then
max_parallel_maintenance_workers should restrict whatever is mentioned
in VACUUM PARALLEL.
Or perhaps this is slightly different as the user is explicitly asking
for this in the command, but you could likely say the same about ALTER
TABLE <table> SET (parallel_workers = N); too.
David
On Tue, Sep 22, 2020 at 12:50 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Mon, 21 Sep 2020 at 19:15, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:On 2020-09-21 05:48, Amit Kapila wrote:
What according to you should be the behavior here and how will it be
better than current?I think if I write VACUUM (PARALLEL 5), it should use up to 5 workers
(up to the number of indexes), even if max_parallel_maintenance_workers
is 2.It would be good if we were consistent with these parallel options.
Right now max_parallel_workers_per_gather will restrict the
parallel_workers reloption. I'd say this
max_parallel_workers_per_gather is similar to
max_parallel_maintenance_workers here and the PARALLEL vacuum option
is like the parallel_workers reloption.If we want VACUUM's parallel option to work the same way as that then
max_parallel_maintenance_workers should restrict whatever is mentioned
in VACUUM PARALLEL.Or perhaps this is slightly different as the user is explicitly asking
for this in the command, but you could likely say the same about ALTER
TABLE <table> SET (parallel_workers = N); too.
This is exactly my feeling too. But how about changing documentation a
bit as proposed above [1]/messages/by-id/CAA4eK1LQWXS_4RwLo+WT7jusGnBkUvXO73xQOCsydWLYBpLBEg@mail.gmail.com to make it precise.
[1]: /messages/by-id/CAA4eK1LQWXS_4RwLo+WT7jusGnBkUvXO73xQOCsydWLYBpLBEg@mail.gmail.com
--
With Regards,
Amit Kapila.
On 2020-09-26 07:32, Amit Kapila wrote:
This is exactly my feeling too. But how about changing documentation a
bit as proposed above [1] to make it precise.[1] - /messages/by-id/CAA4eK1LQWXS_4RwLo+WT7jusGnBkUvXO73xQOCsydWLYBpLBEg@mail.gmail.com
Yes, making the documentation more precise would be good. Right now,
it's a bit confusing and unclear (using phrases like "based on").
Someone who wants to the the VACUUM PARALLEL option presumably wants
precise control, so specifying the exact rules would be desirable.
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Sep 29, 2020 at 3:13 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
On 2020-09-26 07:32, Amit Kapila wrote:
This is exactly my feeling too. But how about changing documentation a
bit as proposed above [1] to make it precise.[1] - /messages/by-id/CAA4eK1LQWXS_4RwLo+WT7jusGnBkUvXO73xQOCsydWLYBpLBEg@mail.gmail.com
Yes, making the documentation more precise would be good. Right now,
it's a bit confusing and unclear (using phrases like "based on").
Someone who wants to the the VACUUM PARALLEL option presumably wants
precise control, so specifying the exact rules would be desirable.
I have changed the docs to make this clear. Let me know what you think?
--
With Regards,
Amit Kapila.
Attachments:
0001-Change-the-docs-for-PARALLEL-option-of-Vacuum.patchapplication/octet-stream; name=0001-Change-the-docs-for-PARALLEL-option-of-Vacuum.patchDownload
From 26538dc62de8b8f77990f585e5270e5185d6521e Mon Sep 17 00:00:00 2001
From: Amit Kapila <akapila@postgresql.org>
Date: Wed, 30 Sep 2020 18:41:07 +0530
Subject: [PATCH] Change the docs for PARALLEL option of Vacuum.
The rules to choose the number of parallel workers to perform parallel
vacuum operation were not clearly specified.
Reported-by: Peter Eisentraut
Author: Amit Kapila
Reviewed-by: Peter Eisentraut
Backpatch-through: 13, where it was introduced
Discussion: https://postgr.es/m/36aa8aea-61b7-eb3c-263b-648e0cb117b7@2ndquadrant.com
---
doc/src/sgml/ref/vacuum.sgml | 32 ++++++++++++++++----------------
1 file changed, 16 insertions(+), 16 deletions(-)
diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index 26ede69..21ab57d 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -235,22 +235,22 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
Perform index vacuum and index cleanup phases of <command>VACUUM</command>
in parallel using <replaceable class="parameter">integer</replaceable>
background workers (for the details of each vacuum phase, please
- refer to <xref linkend="vacuum-phases"/>). In plain <command>VACUUM</command>
- (without <literal>FULL</literal>), if the <literal>PARALLEL</literal> option
- is omitted, then the number of workers is determined based on the number of
- indexes on the relation that support parallel vacuum operation and is further
- limited by <xref linkend="guc-max-parallel-maintenance-workers"/>. An index
- can participate in parallel vacuum if and only if the size of the index is
- more than <xref linkend="guc-min-parallel-index-scan-size"/>. Please note
- that it is not guaranteed that the number of parallel workers specified in
- <replaceable class="parameter">integer</replaceable> will be used during
- execution. It is possible for a vacuum to run with fewer workers than
- specified, or even with no workers at all. Only one worker can be used per
- index. So parallel workers are launched only when there are at least
- <literal>2</literal> indexes in the table. Workers for vacuum are launched
- before the start of each phase and exit at the end of the phase. These
- behaviors might change in a future release. This option can't be used with
- the <literal>FULL</literal> option.
+ refer to <xref linkend="vacuum-phases"/>). The number of workers used
+ to perform the operation is equal to the number of indexes on the
+ relation that support parallel vacuum which is limited by the number of
+ workers specified with <literal>PARALLEL</literal> option if any which is
+ further limited by <xref linkend="guc-max-parallel-maintenance-workers"/>.
+ An index can participate in parallel vacuum if and only if the size of the
+ index is more than <xref linkend="guc-min-parallel-index-scan-size"/>.
+ Please note that it is not guaranteed that the number of parallel workers
+ specified in <replaceable class="parameter">integer</replaceable> will be
+ used during execution. It is possible for a vacuum to run with fewer
+ workers than specified, or even with no workers at all. Only one worker
+ can be used per index. So parallel workers are launched only when there
+ are at least <literal>2</literal> indexes in the table. Workers for
+ vacuum are launched before the start of each phase and exit at the end of
+ the phase. These behaviors might change in a future release. This
+ option can't be used with the <literal>FULL</literal> option.
</para>
</listitem>
</varlistentry>
--
1.8.3.1
On Tue, Sep 22, 2020 at 3:20 AM David Rowley <dgrowleyml@gmail.com> wrote:
It would be good if we were consistent with these parallel options.
Right now max_parallel_workers_per_gather will restrict the
parallel_workers reloption. I'd say this
max_parallel_workers_per_gather is similar to
max_parallel_maintenance_workers here and the PARALLEL vacuum option
is like the parallel_workers reloption.If we want VACUUM's parallel option to work the same way as that then
max_parallel_maintenance_workers should restrict whatever is mentioned
in VACUUM PARALLEL.Or perhaps this is slightly different as the user is explicitly asking
for this in the command, but you could likely say the same about ALTER
TABLE <table> SET (parallel_workers = N); too.
There is a subtle difference between these two cases. In the case of a
query, there may be multiple table scans involved, all under the same
Gather node. So a limit on the Gather node is to some degree a
separate constraint on the overall query plan from the reloption
applied to a particular table. So there is at least some kind of an
argument that it's sensible to combine those limits somehow. I'm not
sure I believe it, though. The user probably wants exactly the number
of workers they specify, not the GUC value.
However, in the VACUUM case, there's no possibility of distinguishing
between the parallel operation as a whole and the expectations for a
particular table. It's a single operation. It doesn't have a Gather
node that might be subject to one limit and a Seq Scan that can be
subjected to some other limit. So to me the idea that the command line
option doesn't override the GUC is completely strange, especially
because OTHER parameters to VACUUM *do* have precisely the effect of
overriding GUCs or reloptions. In particular, FREEZE overrides
vacuum_freeze_min_age and vacuum_freeze_table_age, and TRUNCATE and
INDEX_CLEANUP override the vacuum_truncate and vacuum_index_cleanup
reloptions.
So I agree with Peter that the implemented behavior is wrong.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Sep 30, 2020 at 9:23 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Sep 22, 2020 at 3:20 AM David Rowley <dgrowleyml@gmail.com> wrote:
It would be good if we were consistent with these parallel options.
Right now max_parallel_workers_per_gather will restrict the
parallel_workers reloption. I'd say this
max_parallel_workers_per_gather is similar to
max_parallel_maintenance_workers here and the PARALLEL vacuum option
is like the parallel_workers reloption.If we want VACUUM's parallel option to work the same way as that then
max_parallel_maintenance_workers should restrict whatever is mentioned
in VACUUM PARALLEL.Or perhaps this is slightly different as the user is explicitly asking
for this in the command, but you could likely say the same about ALTER
TABLE <table> SET (parallel_workers = N); too.There is a subtle difference between these two cases. In the case of a
query, there may be multiple table scans involved, all under the same
Gather node. So a limit on the Gather node is to some degree a
separate constraint on the overall query plan from the reloption
applied to a particular table. So there is at least some kind of an
argument that it's sensible to combine those limits somehow. I'm not
sure I believe it, though. The user probably wants exactly the number
of workers they specify, not the GUC value.However, in the VACUUM case, there's no possibility of distinguishing
between the parallel operation as a whole and the expectations for a
particular table. It's a single operation.
But the same is true for the 'Create Index' operation as well where we
follow the same thing. We will use the number of workers as specified
in reloption (parallel_workers) which is then limited by
max_parallel_maintenance_workers.
--
With Regards,
Amit Kapila.
On Sat, 3 Oct 2020 at 20:03, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Sep 30, 2020 at 9:23 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Sep 22, 2020 at 3:20 AM David Rowley <dgrowleyml@gmail.com> wrote:
It would be good if we were consistent with these parallel options.
Right now max_parallel_workers_per_gather will restrict the
parallel_workers reloption. I'd say this
max_parallel_workers_per_gather is similar to
max_parallel_maintenance_workers here and the PARALLEL vacuum option
is like the parallel_workers reloption.If we want VACUUM's parallel option to work the same way as that then
max_parallel_maintenance_workers should restrict whatever is mentioned
in VACUUM PARALLEL.Or perhaps this is slightly different as the user is explicitly asking
for this in the command, but you could likely say the same about ALTER
TABLE <table> SET (parallel_workers = N); too.There is a subtle difference between these two cases. In the case of a
query, there may be multiple table scans involved, all under the same
Gather node. So a limit on the Gather node is to some degree a
separate constraint on the overall query plan from the reloption
applied to a particular table. So there is at least some kind of an
argument that it's sensible to combine those limits somehow. I'm not
sure I believe it, though. The user probably wants exactly the number
of workers they specify, not the GUC value.However, in the VACUUM case, there's no possibility of distinguishing
between the parallel operation as a whole and the expectations for a
particular table. It's a single operation.But the same is true for the 'Create Index' operation as well where we
follow the same thing. We will use the number of workers as specified
in reloption (parallel_workers) which is then limited by
max_parallel_maintenance_workers.
Both opinions have a valid point.
To make the behavior of parallel vacuum more consistent with other
parallel maintenance commands (i.g., only parallel INDEX CREATE for
now), as a second idea, can we make use of parallel_workers reloption
in parallel vacuum case as well? That is, when PARALLEL option without
an integer is specified or VACUUM command without PARALLEL option, the
parallel degree is the number of indexes that support parallel vacuum
and are bigger than min_parallel_index_scan_size. If the
parallel_workers reloption of the table is set we use it instead. In
both cases, the parallel degree is capped by
max_parallel_maintenance_workers. OTOH when PARALLEL option with an
integer is specified, the parallel degree is the specified integer
value and it's capped by max_parallel_workers and the number of
indexes that support parallel vacuum and are bigger than
min_parallel_index_scan_size.
That way the default behavior and the behavior of PARALLEL option
without an integer is similar to parallel CREATE INDEX. In addition to
it, VACUUM command has an additional way to control the parallel
degree beyond max_parallel_maintenance_workers limit by using the
command option.
Regards,
--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Oct 3, 2020 at 7:03 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
But the same is true for the 'Create Index' operation as well where we
follow the same thing. We will use the number of workers as specified
in reloption (parallel_workers) which is then limited by
max_parallel_maintenance_workers.
Well, that seems pretty weird to me too, but surely we want them both
to be consistent.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, Oct 3, 2020 at 9:25 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
To make the behavior of parallel vacuum more consistent with other
parallel maintenance commands (i.g., only parallel INDEX CREATE for
now), as a second idea, can we make use of parallel_workers reloption
in parallel vacuum case as well?
That seems like a terrible idea to me. I don't see why the number of
workers that some user thinks should be used to perform a scan on the
table as part of the query should be the same as the number of workers
that should be used for a maintenance operation. We get in trouble
every time we try to reuse a setting for an unrelated purpose.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, Oct 3, 2020 at 6:55 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
On Sat, 3 Oct 2020 at 20:03, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Sep 30, 2020 at 9:23 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Sep 22, 2020 at 3:20 AM David Rowley <dgrowleyml@gmail.com> wrote:
It would be good if we were consistent with these parallel options.
Right now max_parallel_workers_per_gather will restrict the
parallel_workers reloption. I'd say this
max_parallel_workers_per_gather is similar to
max_parallel_maintenance_workers here and the PARALLEL vacuum option
is like the parallel_workers reloption.If we want VACUUM's parallel option to work the same way as that then
max_parallel_maintenance_workers should restrict whatever is mentioned
in VACUUM PARALLEL.Or perhaps this is slightly different as the user is explicitly asking
for this in the command, but you could likely say the same about ALTER
TABLE <table> SET (parallel_workers = N); too.There is a subtle difference between these two cases. In the case of a
query, there may be multiple table scans involved, all under the same
Gather node. So a limit on the Gather node is to some degree a
separate constraint on the overall query plan from the reloption
applied to a particular table. So there is at least some kind of an
argument that it's sensible to combine those limits somehow. I'm not
sure I believe it, though. The user probably wants exactly the number
of workers they specify, not the GUC value.However, in the VACUUM case, there's no possibility of distinguishing
between the parallel operation as a whole and the expectations for a
particular table. It's a single operation.But the same is true for the 'Create Index' operation as well where we
follow the same thing. We will use the number of workers as specified
in reloption (parallel_workers) which is then limited by
max_parallel_maintenance_workers.Both opinions have a valid point.
To make the behavior of parallel vacuum more consistent with other
parallel maintenance commands (i.g., only parallel INDEX CREATE for
now), as a second idea, can we make use of parallel_workers reloption
in parallel vacuum case as well? That is, when PARALLEL option without
an integer is specified or VACUUM command without PARALLEL option, the
parallel degree is the number of indexes that support parallel vacuum
and are bigger than min_parallel_index_scan_size. If the
parallel_workers reloption of the table is set we use it instead. In
both cases, the parallel degree is capped by
max_parallel_maintenance_workers. OTOH when PARALLEL option with an
integer is specified, the parallel degree is the specified integer
value and it's capped by max_parallel_workers and the number of
indexes that support parallel vacuum and are bigger than
min_parallel_index_scan_size.
This seems more difficult to explain and have more variable parts. I
think one of the blogs I recently read about this work [1]https://pganalyze.com/blog/postgres13-better-performance-monitoring-usability?utm_source=PostgresWeeklyPrimary09302020 (see
section:
Parallel VACUUM & Better Support for Append-only Workloads) explains
the currently implemented behavior (related to the workers) nicely and
in simple words. Now unless I or the person who wrote that blog missed
something it appears to me that the current implemented behavior is
understood by others who might not be even directly involved in this
work which to some extent indicates that users will be able to use
currently implemented behavior without difficulty. I think we can keep
the current behavior as it is and wait to see if we see any complaints
from the users trying to use it.
--
With Regards,
Amit Kapila.
At Sat, 3 Oct 2020 22:25:14 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in
On Sat, 3 Oct 2020 at 20:03, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Sep 30, 2020 at 9:23 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Sep 22, 2020 at 3:20 AM David Rowley <dgrowleyml@gmail.com> wrote:
It would be good if we were consistent with these parallel options.
Right now max_parallel_workers_per_gather will restrict the
parallel_workers reloption. I'd say this
max_parallel_workers_per_gather is similar to
max_parallel_maintenance_workers here and the PARALLEL vacuum option
is like the parallel_workers reloption.If we want VACUUM's parallel option to work the same way as that then
max_parallel_maintenance_workers should restrict whatever is mentioned
in VACUUM PARALLEL.Or perhaps this is slightly different as the user is explicitly asking
for this in the command, but you could likely say the same about ALTER
TABLE <table> SET (parallel_workers = N); too.There is a subtle difference between these two cases. In the case of a
query, there may be multiple table scans involved, all under the same
Gather node. So a limit on the Gather node is to some degree a
separate constraint on the overall query plan from the reloption
applied to a particular table. So there is at least some kind of an
argument that it's sensible to combine those limits somehow. I'm not
sure I believe it, though. The user probably wants exactly the number
of workers they specify, not the GUC value.However, in the VACUUM case, there's no possibility of distinguishing
between the parallel operation as a whole and the expectations for a
particular table. It's a single operation.But the same is true for the 'Create Index' operation as well where we
follow the same thing. We will use the number of workers as specified
in reloption (parallel_workers) which is then limited by
max_parallel_maintenance_workers.Both opinions have a valid point.
I think the purpose of the variable is to cap the number of workers
that the system *automatically determines*. It seems reasolable to
ignore the limit as far as it is commanded by a super user. But I
don't think a non-superuser doesn't have such a pvigilege. On the
other hand I'm not sure whether it's the right thing to allow super
users to exhaust the reserved capacity and whether it's worth that
complexity.
To make the behavior of parallel vacuum more consistent with other
parallel maintenance commands (i.g., only parallel INDEX CREATE for
now), as a second idea, can we make use of parallel_workers reloption
in parallel vacuum case as well? That is, when PARALLEL option without
The varialble is thougt as the number of workers for paralle-scan of
create index. It is totally different characteristcs from that for
parallel vacuum. If we had parallel_maintenance_workers, it'd be
usable for vacuum, but I don't want to add that reloption too much..
an integer is specified or VACUUM command without PARALLEL option, the
parallel degree is the number of indexes that support parallel vacuum
and are bigger than min_parallel_index_scan_size. If the
parallel_workers reloption of the table is set we use it instead. In
both cases, the parallel degree is capped by
max_parallel_maintenance_workers. OTOH when PARALLEL option with an
integer is specified, the parallel degree is the specified integer
value and it's capped by max_parallel_workers and the number of
indexes that support parallel vacuum and are bigger than
min_parallel_index_scan_size.That way the default behavior and the behavior of PARALLEL option
without an integer is similar to parallel CREATE INDEX. In addition to
it, VACUUM command has an additional way to control the parallel
degree beyond max_parallel_maintenance_workers limit by using the
command option.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Mon, 5 Oct 2020 at 11:21, Robert Haas <robertmhaas@gmail.com> wrote:
On Sat, Oct 3, 2020 at 9:25 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:To make the behavior of parallel vacuum more consistent with other
parallel maintenance commands (i.g., only parallel INDEX CREATE for
now), as a second idea, can we make use of parallel_workers reloption
in parallel vacuum case as well?That seems like a terrible idea to me. I don't see why the number of
workers that some user thinks should be used to perform a scan on the
table as part of the query should be the same as the number of workers
that should be used for a maintenance operation.
Agreed. But the same is true for parallel REINDEX? It's also a
maintenance operation.
In any case, the thing would get more complex if lazy vacuum or vacuum
full were to support parallel operation on table scan in the future.
Regards,
--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Oct 5, 2020 at 8:11 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Oct 3, 2020 at 6:55 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:To make the behavior of parallel vacuum more consistent with other
parallel maintenance commands (i.g., only parallel INDEX CREATE for
now), as a second idea, can we make use of parallel_workers reloption
in parallel vacuum case as well? That is, when PARALLEL option without
an integer is specified or VACUUM command without PARALLEL option, the
parallel degree is the number of indexes that support parallel vacuum
and are bigger than min_parallel_index_scan_size. If the
parallel_workers reloption of the table is set we use it instead. In
both cases, the parallel degree is capped by
max_parallel_maintenance_workers. OTOH when PARALLEL option with an
integer is specified, the parallel degree is the specified integer
value and it's capped by max_parallel_workers and the number of
indexes that support parallel vacuum and are bigger than
min_parallel_index_scan_size.This seems more difficult to explain and have more variable parts. I
think one of the blogs I recently read about this work [1] (see
section:
Parallel VACUUM & Better Support for Append-only Workloads) explains
the currently implemented behavior (related to the workers) nicely and
in simple words. Now unless I or the person who wrote that blog missed
something it appears to me that the current implemented behavior is
understood by others who might not be even directly involved in this
work which to some extent indicates that users will be able to use
currently implemented behavior without difficulty. I think we can keep
the current behavior as it is and wait to see if we see any complaints
from the users trying to use it.
I am planning to commit the patch (by early next week) posted above
thread [1]/messages/by-id/CAA4eK1Km5VVmdPpdMNSA414uRFJKVw8r_A7ORpL-0pHnYfGpLw@mail.gmail.com to make the docs consistent with what we have in code. Do
let me know if you think otherwise or if you have better ideas? We can
think of improving the code as a separate patch if we think it is
important.
[1]: /messages/by-id/CAA4eK1Km5VVmdPpdMNSA414uRFJKVw8r_A7ORpL-0pHnYfGpLw@mail.gmail.com
--
With Regards,
Amit Kapila.
On Thu, Oct 15, 2020 at 9:02 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Oct 5, 2020 at 8:11 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Oct 3, 2020 at 6:55 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:To make the behavior of parallel vacuum more consistent with other
parallel maintenance commands (i.g., only parallel INDEX CREATE for
now), as a second idea, can we make use of parallel_workers reloption
in parallel vacuum case as well? That is, when PARALLEL option without
an integer is specified or VACUUM command without PARALLEL option, the
parallel degree is the number of indexes that support parallel vacuum
and are bigger than min_parallel_index_scan_size. If the
parallel_workers reloption of the table is set we use it instead. In
both cases, the parallel degree is capped by
max_parallel_maintenance_workers. OTOH when PARALLEL option with an
integer is specified, the parallel degree is the specified integer
value and it's capped by max_parallel_workers and the number of
indexes that support parallel vacuum and are bigger than
min_parallel_index_scan_size.This seems more difficult to explain and have more variable parts. I
think one of the blogs I recently read about this work [1] (see
section:
Parallel VACUUM & Better Support for Append-only Workloads) explains
the currently implemented behavior (related to the workers) nicely and
in simple words. Now unless I or the person who wrote that blog missed
something it appears to me that the current implemented behavior is
understood by others who might not be even directly involved in this
work which to some extent indicates that users will be able to use
currently implemented behavior without difficulty. I think we can keep
the current behavior as it is and wait to see if we see any complaints
from the users trying to use it.I am planning to commit the patch (by early next week) posted above
thread [1] to make the docs consistent with what we have in code.
Pushed.
--
With Regards,
Amit Kapila.