A reloption for partitioned tables - parallel_workers

Started by Seamus Abshere · almost 5 years ago · 34 messages
#1 Seamus Abshere
seamus@abshere.net

hi,

It turns out parallel_workers may be a useful reloption for certain uses of partitioned tables, at least if they're made up of fancy column store partitions (see /messages/by-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520ed55@www.fastmail.com).

Would somebody tell me what I'm doing wrong? I would love to submit a patch but I'm stuck:

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index cd3fdd259c..f1ade035ac 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3751,6 +3751,7 @@ compute_parallel_worker(RelOptInfo *rel, double heap_pages, double index_pages,
 	 * If the user has set the parallel_workers reloption, use that; otherwise
 	 * select a default number of workers.
 	 */
+	// I want to affect this
 	if (rel->rel_parallel_workers != -1)
 		parallel_workers = rel->rel_parallel_workers;
 	else

so I do this

diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index c687d3ee9e..597b209bfb 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -1961,13 +1961,15 @@ build_local_reloptions(local_relopts *relopts, Datum options, bool validate)
 bytea *
 partitioned_table_reloptions(Datum reloptions, bool validate)
 {
-	/*
-	 * There are no options for partitioned tables yet, but this is able to do
-	 * some validation.
-	 */
+	static const relopt_parse_elt tab[] = {
+		{"parallel_workers", RELOPT_TYPE_INT,
+		offsetof(StdRdOptions, parallel_workers)},
+	};
+
 	return (bytea *) build_reloptions(reloptions, validate,
 									  RELOPT_KIND_PARTITIONED,
-									  0, NULL, 0);
+									  sizeof(StdRdOptions),
+									  tab, lengthof(tab));
 }

That "works":

postgres=# alter table test_3pd_cstore_partitioned set (parallel_workers = 33);
ALTER TABLE
postgres=# select relname, relkind, reloptions from pg_class where relname = 'test_3pd_cstore_partitioned';
           relname           | relkind |      reloptions
-----------------------------+---------+-----------------------
 test_3pd_cstore_partitioned | p       | {parallel_workers=33}
(1 row)

But it seems to be ignored:

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index cd3fdd259c..c68835ce38 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3751,6 +3751,8 @@ compute_parallel_worker(RelOptInfo *rel, double heap_pages, double index_pages,
 	 * If the user has set the parallel_workers reloption, use that; otherwise
 	 * select a default number of workers.
 	 */
+	// I want to affect this, but this assertion always passes
+	Assert(rel->rel_parallel_workers == -1);
 	if (rel->rel_parallel_workers != -1)
 		parallel_workers = rel->rel_parallel_workers;
 	else

Thanks, and please forgive my code-pasting etiquette; this is my first post to pgsql-hackers and I'm not quite sure what the right format is.

Thank you,
Seamus

#2 Amit Langote
amitlangote09@gmail.com
In reply to: Seamus Abshere (#1)
Re: A reloption for partitioned tables - parallel_workers

Hi Seamus,

On Mon, Feb 15, 2021 at 5:28 PM Seamus Abshere <seamus@abshere.net> wrote:

It turns out parallel_workers may be a useful reloption for certain uses of partitioned tables, at least if they're made up of fancy column store partitions (see /messages/by-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520ed55@www.fastmail.com).

Would somebody tell me what I'm doing wrong? I would love to submit a patch but I'm stuck:

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index cd3fdd259c..f1ade035ac 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3751,6 +3751,7 @@ compute_parallel_worker(RelOptInfo *rel, double heap_pages, double index_pages,
 	 * If the user has set the parallel_workers reloption, use that; otherwise
 	 * select a default number of workers.
 	 */
+	// I want to affect this
 	if (rel->rel_parallel_workers != -1)
 		parallel_workers = rel->rel_parallel_workers;
 	else

so I do this

diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index c687d3ee9e..597b209bfb 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -1961,13 +1961,15 @@ build_local_reloptions(local_relopts *relopts, Datum options, bool validate)
 bytea *
 partitioned_table_reloptions(Datum reloptions, bool validate)
 {
-	/*
-	 * There are no options for partitioned tables yet, but this is able to do
-	 * some validation.
-	 */
+	static const relopt_parse_elt tab[] = {
+		{"parallel_workers", RELOPT_TYPE_INT,
+		offsetof(StdRdOptions, parallel_workers)},
+	};
+
 	return (bytea *) build_reloptions(reloptions, validate,
 									  RELOPT_KIND_PARTITIONED,
-									  0, NULL, 0);
+									  sizeof(StdRdOptions),
+									  tab, lengthof(tab));
 }

That "works":

postgres=# alter table test_3pd_cstore_partitioned set (parallel_workers = 33);
ALTER TABLE
postgres=# select relname, relkind, reloptions from pg_class where relname = 'test_3pd_cstore_partitioned';
           relname           | relkind |      reloptions
-----------------------------+---------+-----------------------
 test_3pd_cstore_partitioned | p       | {parallel_workers=33}
(1 row)

But it seems to be ignored:

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index cd3fdd259c..c68835ce38 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3751,6 +3751,8 @@ compute_parallel_worker(RelOptInfo *rel, double heap_pages, double index_pages,
 	 * If the user has set the parallel_workers reloption, use that; otherwise
 	 * select a default number of workers.
 	 */
+	// I want to affect this, but this assertion always passes
+	Assert(rel->rel_parallel_workers == -1);
 	if (rel->rel_parallel_workers != -1)
 		parallel_workers = rel->rel_parallel_workers;
 	else

You may see by inspecting the callers of compute_parallel_worker()
that it never gets called on a partitioned table, only its leaf
partitions. Maybe you could try calling compute_parallel_worker()
somewhere in add_paths_to_append_rel(), which has this code to figure
out parallel_workers to use for a parallel Append path for a given
partitioned table:

	/* Find the highest number of workers requested for any subpath. */
	foreach(lc, partial_subpaths)
	{
		Path	   *path = lfirst(lc);

		parallel_workers = Max(parallel_workers, path->parallel_workers);
	}
	Assert(parallel_workers > 0);

	/*
	 * If the use of parallel append is permitted, always request at least
	 * log2(# of children) workers.  We assume it can be useful to have
	 * extra workers in this case because they will be spread out across
	 * the children.  The precise formula is just a guess, but we don't
	 * want to end up with a radically different answer for a table with N
	 * partitions vs. an unpartitioned table with the same data, so the
	 * use of some kind of log-scaling here seems to make some sense.
	 */
	if (enable_parallel_append)
	{
		parallel_workers = Max(parallel_workers,
							   fls(list_length(live_childrels)));
		parallel_workers = Min(parallel_workers,
							   max_parallel_workers_per_gather);
	}
	Assert(parallel_workers > 0);

	/* Generate a partial append path. */
	appendpath = create_append_path(root, rel, NIL, partial_subpaths,
									NIL, NULL, parallel_workers,
									enable_parallel_append,
									-1);

Note that the 'rel' in this code refers to the partitioned table for
which an Append path is being considered, so compute_parallel_worker()
using that 'rel' would use the partitioned table's
rel_parallel_workers as you are trying to do.
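
As a minimal sketch (not committed code), the call could sit just above the code quoted above; passing -1 for both page counts bypasses the size-based heuristics, so the reloption alone decides the result:

	/*
	 * Sketch: let a parallel_workers reloption on the partitioned parent
	 * feed into the Append's worker count.  With heap_pages = -1 and
	 * index_pages = -1, compute_parallel_worker() returns
	 * rel->rel_parallel_workers when the reloption is set (capped by
	 * max_parallel_workers_per_gather), and 0 otherwise.
	 */
	if (IS_PARTITIONED_REL(rel))
		parallel_workers = compute_parallel_worker(rel, -1, -1,
												   max_parallel_workers_per_gather);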

--
Amit Langote
EDB: http://www.enterprisedb.com

#3 Seamus Abshere
seamus@abshere.net
In reply to: Amit Langote (#2)
1 attachment(s)
Re: A reloption for partitioned tables - parallel_workers

hi,

Here we go, my first patch... solves /messages/by-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520ed55@www.fastmail.com

Best,
Seamus

Attachments:

0001-Allow-setting-parallel_workers-on-partitioned-tables.patch (application/octet-stream)
From d36510fb3f2317b6194e78e83af1c22fcc27a7bd Mon Sep 17 00:00:00 2001
From: Seamus Abshere <seamus@abshere.net>
Date: Mon, 15 Feb 2021 10:34:32 -0500
Subject: [PATCH] Allow setting parallel_workers on partitioned tables.
To: pgsql-hackers@postgresql.org

Sometimes it's beneficial to do

ALTER TABLE my_part_tbl SET (parallel_workers = 32);

when the query planner is planning too few workers to read from your
partitions. For example, if you have a partitioned table where each
partition is a foreign table that cannot itself have this setting on it.
Shown to give a 100%+ performance increase (in my case, 700%) when used
with cstore_fdw column store partitions.
---
 src/backend/access/common/reloptions.c | 12 +++++++++---
 src/backend/optimizer/path/allpaths.c  | 16 ++++++++++++++++
 2 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index c687d3ee9e..029a73325e 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -1962,12 +1962,18 @@ bytea *
 partitioned_table_reloptions(Datum reloptions, bool validate)
 {
 	/*
-	 * There are no options for partitioned tables yet, but this is able to do
-	 * some validation.
+	 * Currently the only setting known to be useful for partitioned tables
+	 * is parallel_workers.
 	 */
+	static const relopt_parse_elt tab[] = {
+		{"parallel_workers", RELOPT_TYPE_INT,
+		offsetof(StdRdOptions, parallel_workers)},
+	};
+
 	return (bytea *) build_reloptions(reloptions, validate,
 									  RELOPT_KIND_PARTITIONED,
-									  0, NULL, 0);
+									  sizeof(StdRdOptions),
+									  tab, lengthof(tab));
 }
 
 /*
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index cd3fdd259c..4c4e17a8c6 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1467,6 +1467,14 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 		ListCell   *lc;
 		int			parallel_workers = 0;
 
+		/* First see if there is a root-level setting for parallel_workers */
+		parallel_workers = compute_parallel_worker(
+			rel,
+			-1,
+			-1,
+			max_parallel_workers_per_gather
+		);
+
 		/* Find the highest number of workers requested for any subpath. */
 		foreach(lc, partial_subpaths)
 		{
@@ -1522,6 +1530,14 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 		ListCell   *lc;
 		int			parallel_workers = 0;
 
+		/* First see if there is a root-level setting for parallel_workers */
+		parallel_workers = compute_parallel_worker(
+			rel,
+			-1,
+			-1,
+			max_parallel_workers_per_gather
+		);
+
 		/*
 		 * Find the highest number of workers requested for any partial
 		 * subpath.
-- 
2.27.0

#4 Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Amit Langote (#2)
Re: A reloption for partitioned tables - parallel_workers

On Mon, 2021-02-15 at 17:53 +0900, Amit Langote wrote:

On Mon, Feb 15, 2021 at 5:28 PM Seamus Abshere <seamus@abshere.net> wrote:

It turns out parallel_workers may be a useful reloption for certain uses of partitioned tables,
at least if they're made up of fancy column store partitions (see
/messages/by-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520ed55@www.fastmail.com).
Would somebody tell me what I'm doing wrong? I would love to submit a patch but I'm stuck:

You may see by inspecting the callers of compute_parallel_worker()
that it never gets called on a partitioned table, only its leaf
partitions. Maybe you could try calling compute_parallel_worker()
somewhere in add_paths_to_append_rel(), which has this code to figure
out parallel_workers to use for a parallel Append path for a given
partitioned table:

	/* Find the highest number of workers requested for any subpath. */
	foreach(lc, partial_subpaths)
	{
		Path	   *path = lfirst(lc);

		parallel_workers = Max(parallel_workers, path->parallel_workers);
	}
	Assert(parallel_workers > 0);

	/*
	 * If the use of parallel append is permitted, always request at least
	 * log2(# of children) workers.  We assume it can be useful to have
	 * extra workers in this case because they will be spread out across
	 * the children.  The precise formula is just a guess, but we don't
	 * want to end up with a radically different answer for a table with N
	 * partitions vs. an unpartitioned table with the same data, so the
	 * use of some kind of log-scaling here seems to make some sense.
	 */
	if (enable_parallel_append)
	{
		parallel_workers = Max(parallel_workers,
							   fls(list_length(live_childrels)));
		parallel_workers = Min(parallel_workers,
							   max_parallel_workers_per_gather);
	}
	Assert(parallel_workers > 0);

Note that the 'rel' in this code refers to the partitioned table for
which an Append path is being considered, so compute_parallel_worker()
using that 'rel' would use the partitioned table's
rel_parallel_workers as you are trying to do.

Note that there is a second chunk of code quite like that one a few
lines down from there that would also have to be modified.

I am +1 on allowing the degree of parallelism on a parallel append to be
overridden. If "parallel_workers" on the partitioned table is an option for
that, it might be a simple solution. On the other hand, perhaps it would
be less confusing to have a different storage parameter name rather than
having "parallel_workers" do double duty.

Also, since there is a design rule that storage parameters can only be used
on partitions, we would have to change that - is that a problem for anybody?

There is another related consideration that doesn't need to be addressed
by this patch, but that is somewhat related: if the executor prunes some
partitions, the degree of parallelism is unaffected, right?
So if the planner decides to use 24 workers for 25 partitions, and the
executor discards all but one of these partition scans, we would end up
with 24 workers scanning a single partition.

I am not sure how that could be improved.

Yours,
Laurenz Albe

#5 Amit Langote
amitlangote09@gmail.com
In reply to: Seamus Abshere (#3)
Re: A reloption for partitioned tables - parallel_workers

Hi Seamus,

Please see my reply below.

On Tue, Feb 16, 2021 at 1:35 AM Seamus Abshere <seamus@abshere.net> wrote:

On Mon, Feb 15, 2021, at 3:53 AM, Amit Langote wrote:

On Mon, Feb 15, 2021 at 5:28 PM Seamus Abshere <seamus@abshere.net> wrote:

It turns out parallel_workers may be a useful reloption for certain uses of partitioned tables, at least if they're made up of fancy column store partitions (see /messages/by-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520ed55@www.fastmail.com).

Would somebody tell me what I'm doing wrong? I would love to submit a patch but I'm stuck:

You may see by inspecting the callers of compute_parallel_worker()
that it never gets called on a partitioned table, only its leaf
partitions. Maybe you could try calling compute_parallel_worker()
somewhere in add_paths_to_append_rel(), which has this code to figure
out parallel_workers to use for a parallel Append path for a given
partitioned table:

	/* Find the highest number of workers requested for any subpath. */
	foreach(lc, partial_subpaths)
	{
		Path	   *path = lfirst(lc);

		parallel_workers = Max(parallel_workers, path->parallel_workers);
	}
	Assert(parallel_workers > 0);

	/*
	 * If the use of parallel append is permitted, always request at least
	 * log2(# of children) workers.  We assume it can be useful to have
	 * extra workers in this case because they will be spread out across
	 * the children.  The precise formula is just a guess, but we don't
	 * want to end up with a radically different answer for a table with N
	 * partitions vs. an unpartitioned table with the same data, so the
	 * use of some kind of log-scaling here seems to make some sense.
	 */
	if (enable_parallel_append)
	{
		parallel_workers = Max(parallel_workers,
							   fls(list_length(live_childrels)));
		parallel_workers = Min(parallel_workers,
							   max_parallel_workers_per_gather);
	}
	Assert(parallel_workers > 0);

	/* Generate a partial append path. */
	appendpath = create_append_path(root, rel, NIL, partial_subpaths,
									NIL, NULL, parallel_workers,
									enable_parallel_append,
									-1);

Note that the 'rel' in this code refers to the partitioned table for
which an Append path is being considered, so compute_parallel_worker()
using that 'rel' would use the partitioned table's
rel_parallel_workers as you are trying to do.

Here we go, my first patch... solves /messages/by-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520ed55@www.fastmail.com

Thanks for sending the patch here.

It seems you haven't made enough changes for reloptions code to
recognize parallel_workers as valid for partitioned tables, because
even with the patch applied, I get this:

create table rp (a int) partition by range (a);
create table rp1 partition of rp for values from (minvalue) to (0);
create table rp2 partition of rp for values from (0) to (maxvalue);
alter table rp set (parallel_workers = 1);
ERROR: unrecognized parameter "parallel_workers"

You need this:

diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index 029a73325e..9eb8a0c10d 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -377,7 +377,7 @@ static relopt_int intRelOpts[] =
 		{
 			"parallel_workers",
 			"Number of parallel processes that can be used per executor node for this relation.",
-			RELOPT_KIND_HEAP,
+			RELOPT_KIND_HEAP | RELOPT_KIND_PARTITIONED,
 			ShareUpdateExclusiveLock
 		},
 		-1, 0, 1024

which tells reloptions parsing code that parallel_workers is
acceptable for both heap and partitioned relations.

Other comments on the patch:

1. This function calling style, where the first argument is not placed
on the same line as the function itself, is not very common in
Postgres:

+       /* First see if there is a root-level setting for parallel_workers */
+       parallel_workers = compute_parallel_worker(
+           rel,
+           -1,
+           -1,
+           max_parallel_workers_per_gather
+       );

This makes the new code look very different from the rest of the
codebase. Better to stick to existing styles.

2. It might be a good idea to use this opportunity to add a function,
say compute_append_parallel_workers(), for the code that does what the
function name says. Then the patch will simply add the new
compute_parallel_worker() call at the top of that function; a sketch
combining this with points 3 and 4 follows after these comments.

3. I think we should consider the Append parent relation's
parallel_workers ONLY if it is a partitioned relation, because it
doesn't make a lot of sense for other types of parent relations. So
the new code should look like this:

    if (IS_PARTITIONED_REL(rel))
        parallel_workers = compute_parallel_worker(rel, -1, -1,
                                                   max_parallel_workers_per_gather);

4. Maybe it also doesn't make sense to consider the parent relation's
parallel_workers if Parallel Append is disabled
(enable_parallel_append = off). That's because with a simple
(non-parallel) Append running under Gather, all launched parallel
workers process the same partition before moving to the next one.
OTOH, one's intention of setting parallel_workers on the parent
partitioned table would most likely be to distribute workers across
partitions, which is only possible with parallel Append
(enable_parallel_append = on). So, the new code should look like
this:

    if (IS_PARTITIONED_REL(rel) && enable_parallel_append)
        parallel_workers = compute_parallel_worker(rel, -1, -1,
                                                   max_parallel_workers_per_gather);
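
To make points 2 to 4 concrete, such a helper might look as follows. This is only a sketch and its final shape is up for discussion; it moves the existing Max()/fls() logic into a function unchanged and adds the new reloption check at the top:

static int
compute_append_parallel_workers(RelOptInfo *rel, List *subpaths,
								int num_live_children,
								bool parallel_append)
{
	ListCell   *lc;
	int			parallel_workers = 0;

	/*
	 * For partitioned rels, first see if there is a root-level setting of
	 * parallel_workers, but only if a Parallel Append plan may be used.
	 */
	if (IS_PARTITIONED_REL(rel) && parallel_append)
		parallel_workers =
			compute_parallel_worker(rel, -1, -1,
									max_parallel_workers_per_gather);

	/* Find the highest number of workers requested for any subpath. */
	foreach(lc, subpaths)
	{
		Path	   *path = lfirst(lc);

		parallel_workers = Max(parallel_workers, path->parallel_workers);
	}

	/* As before, request at least log2(# of children) workers. */
	if (parallel_append)
	{
		parallel_workers = Max(parallel_workers, fls(num_live_children));
		parallel_workers = Min(parallel_workers,
							   max_parallel_workers_per_gather);
	}
	Assert(parallel_workers > 0);

	return parallel_workers;
}

add_paths_to_append_rel() would then call this from both places that currently duplicate the computation.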

BTW, please consider bottom-posting like I've done in this reply,
because that makes it easier to follow discussions involving patch
reviews that can potentially take many emails to reach conclusions.

--
Amit Langote
EDB: http://www.enterprisedb.com

#6 Amit Langote
amitlangote09@gmail.com
In reply to: Laurenz Albe (#4)
Re: A reloption for partitioned tables - parallel_workers

Hi,

On Tue, Feb 16, 2021 at 1:06 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Mon, 2021-02-15 at 17:53 +0900, Amit Langote wrote:

On Mon, Feb 15, 2021 at 5:28 PM Seamus Abshere <seamus@abshere.net> wrote:

It turns out parallel_workers may be a useful reloption for certain uses of partitioned tables,
at least if they're made up of fancy column store partitions (see
/messages/by-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520ed55@www.fastmail.com).
Would somebody tell me what I'm doing wrong? I would love to submit a patch but I'm stuck:

You may see by inspecting the callers of compute_parallel_worker()
that it never gets called on a partitioned table, only its leaf
partitions. Maybe you could try calling compute_parallel_worker()
somewhere in add_paths_to_append_rel(), which has this code to figure
out parallel_workers to use for a parallel Append path for a given
partitioned table:

	/* Find the highest number of workers requested for any subpath. */
	foreach(lc, partial_subpaths)
	{
		Path	   *path = lfirst(lc);

		parallel_workers = Max(parallel_workers, path->parallel_workers);
	}
	Assert(parallel_workers > 0);

	/*
	 * If the use of parallel append is permitted, always request at least
	 * log2(# of children) workers.  We assume it can be useful to have
	 * extra workers in this case because they will be spread out across
	 * the children.  The precise formula is just a guess, but we don't
	 * want to end up with a radically different answer for a table with N
	 * partitions vs. an unpartitioned table with the same data, so the
	 * use of some kind of log-scaling here seems to make some sense.
	 */
	if (enable_parallel_append)
	{
		parallel_workers = Max(parallel_workers,
							   fls(list_length(live_childrels)));
		parallel_workers = Min(parallel_workers,
							   max_parallel_workers_per_gather);
	}
	Assert(parallel_workers > 0);

Note that the 'rel' in this code refers to the partitioned table for
which an Append path is being considered, so compute_parallel_worker()
using that 'rel' would use the partitioned table's
rel_parallel_workers as you are trying to do.

Note that there is a second chunk of code quite like that one a few
lines down from there that would also have to be modified.

I am +1 on allowing to override the degree of parallelism on a parallel
append. If "parallel_workers" on the partitioned table is an option for
that, it might be a simple solution. On the other hand, perhaps it would
be less confusing to have a different storage parameter name rather than
having "parallel_workers" do double duty.

Also, since there is a design rule that storage parameters can only be used
on partitions, we would have to change that - is that a problem for anybody?

I am not aware of a rule that suggests that parallel_workers is always
interpreted using storage-level considerations. If that is indeed a
popular interpretation at this point, then yes, we should be open to
considering a new name for the parameter that this patch wants to add.

Maybe parallel_append_workers? Perhaps not a bad idea in this patch's
case, but considering that we may want to expand the support of
cross-partition parallelism to operations other than querying, maybe
something else?

This reminds me of something I forgot to mention in my review of the
patch -- it should also update the documentation of parallel_workers
on the CREATE TABLE page to mention that it will be interpreted a bit
differently for partitioned tables than for regular storage-bearing
relations. Workers specified for partitioned tables would be
distributed by the executor over its partitions, unlike with
storage-bearing relations, where the distribution of specified workers
is controlled by the AM using storage-level considerations.

There is another related consideration that doesn't need to be addressed
by this patch, but that is somewhat related: if the executor prunes some
partitions, the degree of parallelism is unaffected, right?

That's correct. Launched workers could be fewer than planned, but that
would not have anything to do with executor pruning.

So if the planner decides to use 24 workers for 25 partitions, and the
executor discards all but one of these partition scans, we would end up
with 24 workers scanning a single partition.

I do remember pondering this when testing my patches to improve the
performance of executing a generic plan to scan a partitioned table
where runtime pruning is possible. Here is an example:

create table hp (a int) partition by hash (a);
select 'create table hp' || i || ' partition of hp for values with
(modulus 100, remainder ' || i || ');' from generate_series(0, 99) i;
\gexec
insert into hp select generate_series(0, 1000000);
alter table hp set (parallel_workers = 16);
set plan_cache_mode to force_generic_plan;
set max_parallel_workers_per_gather to 16;
prepare q as select * from hp where a = $1;

explain (analyze, verbose) execute q (1);
                                                             QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..14426.50 rows=5675 width=4) (actual time=2.370..25.002 rows=1 loops=1)
   Output: hp.a
   Workers Planned: 16
   Workers Launched: 7
   ->  Parallel Append  (cost=0.00..12859.00 rows=400 width=4) (actual time=0.006..0.384 rows=0 loops=8)
         Worker 0: actual time=0.001..0.001 rows=0 loops=1
         Worker 1: actual time=0.001..0.001 rows=0 loops=1
         Worker 2: actual time=0.001..0.001 rows=0 loops=1
         Worker 3: actual time=0.001..0.001 rows=0 loops=1
         Worker 4: actual time=0.001..0.001 rows=0 loops=1
         Worker 5: actual time=0.001..0.001 rows=0 loops=1
         Worker 6: actual time=0.001..0.001 rows=0 loops=1
         Subplans Removed: 99
         ->  Parallel Seq Scan on public.hp40 hp_1  (cost=0.00..126.50 rows=33 width=4) (actual time=0.041..3.060 rows=1 loops=1)
               Output: hp_1.a
               Filter: (hp_1.a = $1)
               Rows Removed by Filter: 9813
 Planning Time: 5.543 ms
 Execution Time: 25.139 ms
(19 rows)

deallocate q;
set max_parallel_workers_per_gather to 0;
prepare q as select * from hp where a = $1;
explain (analyze, verbose) execute q (1);
                                                     QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.00..18754.88 rows=5675 width=4) (actual time=0.029..2.474 rows=1 loops=1)
   Subplans Removed: 99
   ->  Seq Scan on public.hp40 hp_1  (cost=0.00..184.25 rows=56 width=4) (actual time=0.028..2.471 rows=1 loops=1)
         Output: hp_1.a
         Filter: (hp_1.a = $1)
         Rows Removed by Filter: 9813
 Planning Time: 2.231 ms
 Execution Time: 2.535 ms
(8 rows)

Comparing the Execution Times above, it's clear that Gather and
workers are pure overhead in this case.

Although in cases where one expects runtime pruning to be useful, the
plan itself is very unlikely to be parallelized. For example, when the
individual partition scans are Index Scans.

deallocate q;
create index on hp (a);
alter table hp set (parallel_workers = 16);
analyze;
set max_parallel_workers_per_gather to 16;
prepare q as select * from hp where a = $1;
explain (analyze, verbose) execute q (1);
                                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.29..430.75 rows=100 width=4) (actual time=0.043..0.046 rows=1 loops=1)
   Subplans Removed: 99
   ->  Index Only Scan using hp40_a_idx on public.hp40 hp_1  (cost=0.29..4.30 rows=1 width=4) (actual time=0.042..0.044 rows=1 loops=1)
         Output: hp_1.a
         Index Cond: (hp_1.a = $1)
         Heap Fetches: 0
 Planning Time: 13.769 ms
 Execution Time: 0.115 ms
(8 rows)

I am not sure how that could be improved.

The planner currently ignores runtime pruning optimization when
assigning costs to the Append path, so fixing that would be a good
start. There are efforts underway for that, such as [1].

--
Amit Langote
EDB: http://www.enterprisedb.com

[1]: https://commitfest.postgresql.org/32/2829/

#7 Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Amit Langote (#6)
Re: A reloption for partitioned tables - parallel_workers

On Tue, 2021-02-16 at 16:29 +0900, Amit Langote wrote:

I am +1 on allowing to override the degree of parallelism on a parallel
append. If "parallel_workers" on the partitioned table is an option for
that, it might be a simple solution. On the other hand, perhaps it would
be less confusing to have a different storage parameter name rather than
having "parallel_workers" do double duty.
Also, since there is a design rule that storage parameters can only be used
on partitions, we would have to change that - is that a problem for anybody?

I am not aware of a rule that suggests that parallel_workers is always
interpreted using storage-level considerations. If that is indeed a
popular interpretation at this point, then yes, we should be open to
considering a new name for the parameter that this patch wants to add.

Well, https://www.postgresql.org/docs/current/sql-createtable.html#SQL-CREATETABLE-STORAGE-PARAMETERS
says:

"Specifying these parameters for partitioned tables is not supported,
but you may specify them for individual leaf partitions."

If we re-purpose "parallel_workers" like this, we'd have to change this.

Then for a normal table, "parallel_workers" would mean how many workers
work on a parallel table scan. For a partitioned table, it determines
how many workers work on a parallel append.

Perhaps that is similar enough that it is not confusing.

Yours,
Laurenz Albe

#8 Amit Langote
amitlangote09@gmail.com
In reply to: Laurenz Albe (#7)
Re: A reloption for partitioned tables - parallel_workers

On Tue, Feb 16, 2021 at 11:02 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Tue, 2021-02-16 at 16:29 +0900, Amit Langote wrote:

I am +1 on allowing to override the degree of parallelism on a parallel
append. If "parallel_workers" on the partitioned table is an option for
that, it might be a simple solution. On the other hand, perhaps it would
be less confusing to have a different storage parameter name rather than
having "parallel_workers" do double duty.
Also, since there is a design rule that storage parameters can only be used
on partitions, we would have to change that - is that a problem for anybody?

I am not aware of a rule that suggests that parallel_workers is always
interpreted using storage-level considerations. If that is indeed a
popular interpretation at this point, then yes, we should be open to
considering a new name for the parameter that this patch wants to add.

Well, https://www.postgresql.org/docs/current/sql-createtable.html#SQL-CREATETABLE-STORAGE-PARAMETERS
says:

"Specifying these parameters for partitioned tables is not supported,
but you may specify them for individual leaf partitions."

If we re-purpose "parallel_workers" like this, we'd have to change this.

Right, as I mentioned in my reply, the patch will need to update the
documentation.

Then for a normal table, "parallel_workers" would mean how many workers
work on a parallel table scan. For a partitioned table, it determines
how many workers work on a parallel append.

Perhaps that is similar enough that it is not confusing.

I tend to agree with that.

--
Amit Langote
EDB: http://www.enterprisedb.com

#9 Hou, Zhijie
houzj.fnst@cn.fujitsu.com
In reply to: Seamus Abshere (#3)
RE: A reloption for partitioned tables - parallel_workers

hi,

Here we go, my first patch... solves /messages/by-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520ed55@www.fastmail.com

Hi,

partitioned_table_reloptions(Datum reloptions, bool validate)
 {
+	static const relopt_parse_elt tab[] = {
+		{"parallel_workers", RELOPT_TYPE_INT,
+		offsetof(StdRdOptions, parallel_workers)},
+	};
+
 	return (bytea *) build_reloptions(reloptions, validate,
 									  RELOPT_KIND_PARTITIONED,
-									  0, NULL, 0);
+									  sizeof(StdRdOptions),
+									  tab, lengthof(tab));
 }

I noticed that you plan to store the parallel_workers in the same struct (StdRdOptions) as a normal heap relation.
It seems better to store it in a separate struct.

And as commit 1bbd608 said:
----

Splitting things has the advantage to
make the information stored in rd_options include only the necessary
information, reducing the amount of memory used for a relcache entry
with partitioned tables if new reloptions are introduced at this level.

----

What do you think?

Best regards,
Houzj

#10 Amit Langote
amitlangote09@gmail.com
In reply to: Hou, Zhijie (#9)
Re: A reloption for partitioned tables - parallel_workers

On Thu, Feb 18, 2021 at 6:06 PM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:

Here we go, my first patch... solves /messages/by-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520ed55@www.fastmail.com

Hi,

partitioned_table_reloptions(Datum reloptions, bool validate)
{
+	static const relopt_parse_elt tab[] = {
+		{"parallel_workers", RELOPT_TYPE_INT,
+		offsetof(StdRdOptions, parallel_workers)},
+	};
+
 	return (bytea *) build_reloptions(reloptions, validate,
 									  RELOPT_KIND_PARTITIONED,
-									  0, NULL, 0);
+									  sizeof(StdRdOptions),
+									  tab, lengthof(tab));
 }

I noticed that you plan to store the parallel_workers in the same struct (StdRdOptions) as a normal heap relation.
It seems better to store it in a separate struct.

And as commit 1bbd608 said:
----

Splitting things has the advantage to
make the information stored in rd_options include only the necessary
information, reducing the amount of memory used for a relcache entry
with partitioned tables if new reloptions are introduced at this level.

----

What do you think?

That may be a good idea. So instead of referring to the
parallel_workers in StdRdOptions, define a new one, say,
PartitionedTableRdOptions as follows and refer to it in
partitioned_table_reloptions():

typedef struct PartitionedTableRdOptions
{
	int32		vl_len_;		/* varlena header (do not touch directly!) */
	int			parallel_workers;	/* max number of parallel workers */
} PartitionedTableRdOptions;
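
partitioned_table_reloptions() would then parse into the new struct instead of StdRdOptions. Roughly (just the earlier change with the struct swapped in, as a sketch):

bytea *
partitioned_table_reloptions(Datum reloptions, bool validate)
{
	/* parallel_workers is currently the only partitioned-table option */
	static const relopt_parse_elt tab[] = {
		{"parallel_workers", RELOPT_TYPE_INT,
		offsetof(PartitionedTableRdOptions, parallel_workers)},
	};

	return (bytea *) build_reloptions(reloptions, validate,
									  RELOPT_KIND_PARTITIONED,
									  sizeof(PartitionedTableRdOptions),
									  tab, lengthof(tab));
}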

--
Amit Langote
EDB: http://www.enterprisedb.com

#11 Amit Langote
amitlangote09@gmail.com
In reply to: Amit Langote (#5)
1 attachment(s)
Re: A reloption for partitioned tables - parallel_workers

On Tue, Feb 16, 2021 at 3:05 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Tue, Feb 16, 2021 at 1:35 AM Seamus Abshere <seamus@abshere.net> wrote:

Here we go, my first patch... solves /messages/by-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520ed55@www.fastmail.com

Thanks for sending the patch here.

It seems you haven't made enough changes for reloptions code to
recognize parallel_workers as valid for partitioned tables, because
even with the patch applied, I get this:

create table rp (a int) partition by range (a);
create table rp1 partition of rp for values from (minvalue) to (0);
create table rp2 partition of rp for values from (0) to (maxvalue);
alter table rp set (parallel_workers = 1);
ERROR: unrecognized parameter "parallel_workers"

You need this:

diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index 029a73325e..9eb8a0c10d 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -377,7 +377,7 @@ static relopt_int intRelOpts[] =
 		{
 			"parallel_workers",
 			"Number of parallel processes that can be used per executor node for this relation.",
-			RELOPT_KIND_HEAP,
+			RELOPT_KIND_HEAP | RELOPT_KIND_PARTITIONED,
 			ShareUpdateExclusiveLock
 		},
 		-1, 0, 1024

which tells reloptions parsing code that parallel_workers is
acceptable for both heap and partitioned relations.

Other comments on the patch:

1. This function calling style, where the first argument is not placed
on the same line as the function itself, is not very common in
Postgres:

+       /* First see if there is a root-level setting for parallel_workers */
+       parallel_workers = compute_parallel_worker(
+           rel,
+           -1,
+           -1,
+           max_parallel_workers_per_gather
+       );

This makes the new code look very different from the rest of the
codebase. Better to stick to existing styles.

2. It might be a good idea to use this opportunity to add a function,
say compute_append_parallel_workers(), for the code that does what the
function name says. Then the patch will simply add the new
compute_parallel_worker() call at the top of that function.

3. I think we should consider the Append parent relation's
parallel_workers ONLY if it is a partitioned relation, because it
doesn't make a lot of sense for other types of parent relations. So
the new code should look like this:

    if (IS_PARTITIONED_REL(rel))
        parallel_workers = compute_parallel_worker(rel, -1, -1,
                                                   max_parallel_workers_per_gather);

4. Maybe it also doesn't make sense to consider the parent relation's
parallel_workers if Parallel Append is disabled
(enable_parallel_append = off). That's because with a simple
(non-parallel) Append running under Gather, all launched parallel
workers process the same partition before moving to the next one.
OTOH, one's intention of setting parallel_workers on the parent
partitioned table would most likely be to distribute workers across
partitions, which is only possible with parallel Append
(enable_parallel_append = on). So, the new code should look like
this:

    if (IS_PARTITIONED_REL(rel) && enable_parallel_append)
        parallel_workers = compute_parallel_worker(rel, -1, -1,
                                                   max_parallel_workers_per_gather);

Here is an updated version of Seamus's patch that takes into
account these and other comments received on this thread so far.
Maybe warrants adding some tests too but I haven't.

Seamus, please register this patch in the next commit-fest:
https://commitfest.postgresql.org/32/

If you haven't already, you will need to create a community account to
use that site.

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachments:

v2-0001-Allow-setting-parallel_workers-on-partitioned-tab.patch (application/octet-stream)
From da17ee93e5031156bb07a2ef940a7d3dc95b43c7 Mon Sep 17 00:00:00 2001
From: Seamus Abshere <seamus@abshere.net>
Date: Mon, 15 Feb 2021 10:34:32 -0500
Subject: [PATCH v2] Allow setting parallel_workers on partitioned tables.

Sometimes it's beneficial to do

ALTER TABLE my_part_tbl SET (parallel_workers = 32);

when the query planner is planning too few workers to read from your
partitions. For example, if you have a partitioned table where each
partition is a foreign table that cannot itself have this setting on it.
Shown to give a 100%+ performance increase (in my case, 700%) when used
with cstore_fdw column store partitions.
---
 doc/src/sgml/ref/create_table.sgml     |  14 +--
 src/backend/access/common/reloptions.c |  14 ++-
 src/backend/optimizer/path/allpaths.c  | 114 +++++++++++++------------
 src/include/utils/rel.h                |  10 +++
 4 files changed, 90 insertions(+), 62 deletions(-)

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 3b2b227683..2ed1150928 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -1337,8 +1337,9 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
     If a table parameter value is set and the
     equivalent <literal>toast.</literal> parameter is not, the TOAST table
     will use the table's parameter value.
-    Specifying these parameters for partitioned tables is not supported,
-    but you may specify them for individual leaf partitions.
+    Specifying most of these parameters for partitioned tables is not
+    supported, but you may specify them for individual leaf partitions;
+    refer to the description of individual parameters for more details.
    </para>
 
    <variablelist>
@@ -1401,9 +1402,12 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
      <para>
       This sets the number of workers that should be used to assist a parallel
       scan of this table.  If not set, the system will determine a value based
-      on the relation size.  The actual number of workers chosen by the planner
-      or by utility statements that use parallel scans may be less, for example
-      due to the setting of <xref linkend="guc-max-worker-processes"/>.
+      on the relation size.  When set on a partitioned table, the specified
+      number of workers will work on distinct partitions, so the number of
+      partitions affected by the parallel operation should be taken into
+      account.  The actual number of workers chosen by the planner or by
+      utility statements that use parallel scans may be less, for example due
+      to the setting of <xref linkend="guc-max-worker-processes"/>.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index c687d3ee9e..f8443d2361 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -377,7 +377,7 @@ static relopt_int intRelOpts[] =
 		{
 			"parallel_workers",
 			"Number of parallel processes that can be used per executor node for this relation.",
-			RELOPT_KIND_HEAP,
+			RELOPT_KIND_HEAP | RELOPT_KIND_PARTITIONED,
 			ShareUpdateExclusiveLock
 		},
 		-1, 0, 1024
@@ -1962,12 +1962,18 @@ bytea *
 partitioned_table_reloptions(Datum reloptions, bool validate)
 {
 	/*
-	 * There are no options for partitioned tables yet, but this is able to do
-	 * some validation.
+	 * Currently the only setting known to be useful for partitioned tables
+	 * is parallel_workers.
 	 */
+	static const relopt_parse_elt tab[] = {
+		{"parallel_workers", RELOPT_TYPE_INT,
+		offsetof(PartitionedTableRdOptions, parallel_workers)},
+	};
+
 	return (bytea *) build_reloptions(reloptions, validate,
 									  RELOPT_KIND_PARTITIONED,
-									  0, NULL, 0);
+									  sizeof(PartitionedTableRdOptions),
+									  tab, lengthof(tab));
 }
 
 /*
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index cd3fdd259c..a0803cf955 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1268,6 +1268,59 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	add_paths_to_append_rel(root, rel, live_childrels);
 }
 
+/*
+ * compute_append_parallel_workers
+ * 		Computes the number of workers to assign to scan the subpaths appended
+ * 		by a given Append path
+ */
+static int
+compute_append_parallel_workers(RelOptInfo *rel, List *subpaths,
+								int num_live_children,
+								bool parallel_append)
+{
+	ListCell   *lc;
+	int			parallel_workers = 0;
+
+	/*
+	 * For partitioned rels, first see if there is a root-level setting for
+	 * parallel_workers.  But only consider if a Parallel Append plan is
+	 * to be considered.
+	 */
+	if (IS_PARTITIONED_REL(rel) && parallel_append)
+		parallel_workers =
+			compute_parallel_worker(rel, -1, -1,
+									max_parallel_workers_per_gather);
+
+	/* Find the highest number of workers requested for any subpath. */
+	foreach(lc, subpaths)
+	{
+		Path	   *path = lfirst(lc);
+
+		parallel_workers = Max(parallel_workers, path->parallel_workers);
+	}
+	Assert(parallel_workers > 0 || subpaths == NIL);
+
+	/*
+	 * If the use of parallel append is permitted, always request at least
+	 * log2(# of children) workers.  We assume it can be useful to have
+	 * extra workers in this case because they will be spread out across
+	 * the children.  The precise formula is just a guess, but we don't
+	 * want to end up with a radically different answer for a table with N
+	 * partitions vs. an unpartitioned table with the same data, so the
+	 * use of some kind of log-scaling here seems to make some sense.
+	 */
+	if (parallel_append)
+	{
+		parallel_workers = Max(parallel_workers,
+							   fls(num_live_children));
+		parallel_workers = Min(parallel_workers,
+							   max_parallel_workers_per_gather);
+	}
+	Assert(parallel_workers > 0);
+
+	return parallel_workers;
+}
+
 
 /*
  * add_paths_to_append_rel
@@ -1464,35 +1517,10 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 	if (partial_subpaths_valid && partial_subpaths != NIL)
 	{
 		AppendPath *appendpath;
-		ListCell   *lc;
-		int			parallel_workers = 0;
-
-		/* Find the highest number of workers requested for any subpath. */
-		foreach(lc, partial_subpaths)
-		{
-			Path	   *path = lfirst(lc);
-
-			parallel_workers = Max(parallel_workers, path->parallel_workers);
-		}
-		Assert(parallel_workers > 0);
-
-		/*
-		 * If the use of parallel append is permitted, always request at least
-		 * log2(# of children) workers.  We assume it can be useful to have
-		 * extra workers in this case because they will be spread out across
-		 * the children.  The precise formula is just a guess, but we don't
-		 * want to end up with a radically different answer for a table with N
-		 * partitions vs. an unpartitioned table with the same data, so the
-		 * use of some kind of log-scaling here seems to make some sense.
-		 */
-		if (enable_parallel_append)
-		{
-			parallel_workers = Max(parallel_workers,
-								   fls(list_length(live_childrels)));
-			parallel_workers = Min(parallel_workers,
-								   max_parallel_workers_per_gather);
-		}
-		Assert(parallel_workers > 0);
+		int			parallel_workers =
+			compute_append_parallel_workers(rel, partial_subpaths,
+											list_length(live_childrels),
+											enable_parallel_append);
 
 		/* Generate a partial append path. */
 		appendpath = create_append_path(root, rel, NIL, partial_subpaths,
@@ -1519,30 +1547,10 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 	if (pa_subpaths_valid && pa_nonpartial_subpaths != NIL)
 	{
 		AppendPath *appendpath;
-		ListCell   *lc;
-		int			parallel_workers = 0;
-
-		/*
-		 * Find the highest number of workers requested for any partial
-		 * subpath.
-		 */
-		foreach(lc, pa_partial_subpaths)
-		{
-			Path	   *path = lfirst(lc);
-
-			parallel_workers = Max(parallel_workers, path->parallel_workers);
-		}
-
-		/*
-		 * Same formula here as above.  It's even more important in this
-		 * instance because the non-partial paths won't contribute anything to
-		 * the planned number of parallel workers.
-		 */
-		parallel_workers = Max(parallel_workers,
-							   fls(list_length(live_childrels)));
-		parallel_workers = Min(parallel_workers,
-							   max_parallel_workers_per_gather);
-		Assert(parallel_workers > 0);
+		int			parallel_workers =
+			compute_append_parallel_workers(rel, pa_partial_subpaths,
+											list_length(live_childrels),
+											true);
 
 		appendpath = create_append_path(root, rel, pa_nonpartial_subpaths,
 										pa_partial_subpaths,
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 10b63982c0..fe114e0856 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -308,6 +308,16 @@ typedef struct StdRdOptions
 	bool		vacuum_truncate;	/* enables vacuum to truncate a relation */
 } StdRdOptions;
 
+/*
+ * PartitionedTableRdOptions
+ * 		Contents of rd_options for partitioned tables
+ */
+typedef struct PartitionedTableRdOptions
+{
+	int32		vl_len_;		/* varlena header (do not touch directly!) */
+	int			parallel_workers;	/* max number of parallel workers */
+} PartitionedTableRdOptions;
+
 #define HEAP_MIN_FILLFACTOR			10
 #define HEAP_DEFAULT_FILLFACTOR		100
 
-- 
2.24.1

#12 Seamus Abshere
seamus@abshere.net
In reply to: Amit Langote (#11)
Re: A reloption for partitioned tables - parallel_workers

hi Amit,

Thanks so much for doing this. I had created

https://commitfest.postgresql.org/32/2987/

and it looks like it now shows your patch as the one to use. Let me know if I should do anything else.

Best,
Seamus

#13 Amit Langote
amitlangote09@gmail.com
In reply to: Seamus Abshere (#12)
Re: A reloption for partitioned tables - parallel_workers

On Fri, Feb 19, 2021 at 11:54 PM Seamus Abshere <seamus@abshere.net> wrote:

On Fri, Feb 19, 2021, at 2:30 AM, Amit Langote wrote:

Here is an updated version of Seamus's patch that takes into
account these and other comments received on this thread so far.
It may warrant adding some tests too, but I haven't done that.

Seamus, please register this patch in the next commit-fest:
https://commitfest.postgresql.org/32/

If you haven't already, you will need to create a community account to
use that site.

hi Amit,

Thanks so much for doing this. I had created

https://commitfest.postgresql.org/32/2987/

Ah, sorry, I had not checked. I updated the entry to add you as the author.

and it looks like it now shows your patch as the one to use. Let me know if I should do anything else.

You could take a look at the latest patch and see if you find any
problems with my or others' suggestions that I implemented in the v2
patch. Also, please add regression tests for the new feature in
src/test/regress/sql/select_parallel.sql.
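
Something along these lines might be a starting point (an untested
sketch; the table names here are made up for illustration):

-- parallel_workers can now be set and reset on a partitioned table
create table ptest (a int) partition by range (a);
create table ptest1 partition of ptest for values from (0) to (100);
create table ptest2 partition of ptest for values from (100) to (200);
insert into ptest select generate_series(0, 199);
analyze ptest;
alter table ptest set (parallel_workers = 4);
-- the plan should now show a Parallel Append using the requested workers
explain (costs off) select count(*) from ptest;
alter table ptest reset (parallel_workers);
drop table ptest;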

--
Amit Langote
EDB: http://www.enterprisedb.com

#14Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Amit Langote (#11)
Re: A reloption for partitioned tables - parallel_workers

On Fri, 2021-02-19 at 16:30 +0900, Amit Langote wrote:

On Tue, Feb 16, 2021 at 1:35 AM Seamus Abshere <seamus@abshere.net> wrote:

Here we go, my first patch... solves /messages/by-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520ed55@www.fastmail.com

Here is an updated version of Seamus's patch that takes into
account these and other comments received on this thread so far.
It may warrant adding some tests too, but I haven't done that.

Yes, there should be regression tests.

I gave the patch a spin, and it allows raising the number of workers for
a parallel append, as advertised.

--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -1337,8 +1337,9 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
     If a table parameter value is set and the
     equivalent <literal>toast.</literal> parameter is not, the TOAST table
     will use the table's parameter value.
-    Specifying these parameters for partitioned tables is not supported,
-    but you may specify them for individual leaf partitions.
+    Specifying most of these parameters for partitioned tables is not
+    supported, but you may specify them for individual leaf partitions;
+    refer to the description of individual parameters for more details.
    </para>

This doesn't make me happy. Since the options themselves do not say if they
are supported on partitioned tables or not, the reader is left in the dark.

Perhaps:

These options, with the exception of <literal>parallel_workers</literal>,
are not supported on partitioned tables, but you may specify them for individual
leaf partitions.

@@ -1401,9 +1402,12 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
      <para>
       This sets the number of workers that should be used to assist a parallel
       scan of this table.  If not set, the system will determine a value based
-      on the relation size.  The actual number of workers chosen by the planner
-      or by utility statements that use parallel scans may be less, for example
-      due to the setting of <xref linkend="guc-max-worker-processes"/>.
+      on the relation size.  When set on a partitioned table, the specified
+      number of workers will work on distinct partitions, so the number of
+      partitions affected by the parallel operation should be taken into
+      account.  The actual number of workers chosen by the planner or by
+      utility statements that use parallel scans may be less, for example due
+      to the setting of <xref linkend="guc-max-worker-processes"/>.
      </para>
     </listitem>
    </varlistentry>

The reader is left to believe that the default number of workers depends on the
size of the partitioned table, which is not entirely true.

Perhaps:

If not set, the system will determine a value based on the relation size and
the number of scanned partitions.

--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1268,6 +1268,59 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
    add_paths_to_append_rel(root, rel, live_childrels);
 }
+/*
+ * compute_append_parallel_workers
+ *         Computes the number of workers to assign to scan the subpaths appended
+ *         by a given Append path
+ */
+static int
+compute_append_parallel_workers(RelOptInfo *rel, List *subpaths,
+                               int num_live_children,
+                               bool parallel_append)

The new function should have a prototype.

+{
+   ListCell   *lc;
+   int         parallel_workers = 0;
+
+   /*
+    * For partitioned rels, first see if there is a root-level setting for
+    * parallel_workers.  But only consider if a Parallel Append plan is
+    * to be considered.
+    */
+   if (IS_PARTITIONED_REL(rel) && parallel_append)
+       parallel_workers =
+           compute_parallel_worker(rel, -1, -1,
+                                   max_parallel_workers_per_gather);
+
+   /* Find the highest number of workers requested for any subpath. */
+   foreach(lc, subpaths)
+   {
+       Path       *path = lfirst(lc);
+
+       parallel_workers = Max(parallel_workers, path->parallel_workers);
+   }
+   Assert(parallel_workers > 0 || subpaths == NIL);
+
+   /*
+    * If the use of parallel append is permitted, always request at least
+    * log2(# of children) workers.  We assume it can be useful to have
+    * extra workers in this case because they will be spread out across
+    * the children.  The precise formula is just a guess, but we don't
+    * want to end up with a radically different answer for a table with N
+    * partitions vs. an unpartitioned table with the same data, so the
+    * use of some kind of log-scaling here seems to make some sense.
+    */
+   if (parallel_append)
+   {
+       parallel_workers = Max(parallel_workers,
+                              fls(num_live_children));
+       parallel_workers = Min(parallel_workers,
+                              max_parallel_workers_per_gather);
+   }
+   Assert(parallel_workers > 0);
+
+   return parallel_workers;
+}

That means that it is not possible to *lower* the number of parallel workers
with this reloption, which seems to me a valid use case.

I think that if the option is set, it should override the number of workers
inherited from the partitions, and it should override the log2 default.
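
For example, with the override semantics I have in mind, this should
pin the Append to a single worker (a sketch against the "rp" table
from upthread; the patch as posted doesn't behave that way yet):

ALTER TABLE rp SET (parallel_workers = 1);
EXPLAIN (COSTS OFF) SELECT count(*) FROM rp;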

Yours,
Laurenz Albe

#15houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
In reply to: Amit Langote (#11)
RE: A reloption for partitioned tables - parallel_workers

4. Maybe it also doesn't make sense to consider the parent relation's
parallel_workers if Parallel Append is disabled
(enable_parallel_append = off). That's because with a simple
(non-parallel) Append running under Gather, all launched parallel
workers process the same partition before moving to the next one.
OTOH, one's intention of setting parallel_workers on the parent
partitioned table would most likely be to distribute workers across
partitions, which is only possible with parallel Append
(enable_parallel_append = on). So, the new code should look like
this:

if (IS_PARTITIONED_REL(rel) && enable_parallel_append)
parallel_workers = compute_parallel_worker(rel, -1, -1,
max_parallel_workers_per_gather);

Here is an updated version of Seamus's patch that takes into account these
and other comments received on this thread so far.
It may warrant adding some tests too, but I haven't done that.

Seamus, please register this patch in the next commit-fest:
https://commitfest.postgresql.org/32/

If you haven't already, you will need to create a community account to use that
site.

It seems the patch does not include the code that gets the parallel_workers from the new struct "PartitionedTableRdOptions".
Did I miss something?

Best regards,
houzj

#16Amit Langote
amitlangote09@gmail.com
In reply to: houzj.fnst@fujitsu.com (#15)
Re: A reloption for partitioned tables - parallel_workers

Hi,

On Tue, Feb 23, 2021 at 3:12 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

Here is an updated version of Seamus's patch that takes into account these
and other comments received on this thread so far.
It may warrant adding some tests too, but I haven't done that.

Seamus, please register this patch in the next commit-fest:
https://commitfest.postgresql.org/32/

If you haven't already, you will need to create a community account to use that
site.

It seems the patch does not include the code that gets the parallel_workers from the new struct "PartitionedTableRdOptions".
Did I miss something?

Aren't the following hunks in the v2 patch what you meant?

diff --git a/src/backend/access/common/reloptions.c
b/src/backend/access/common/reloptions.c
index c687d3ee9e..f8443d2361 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -377,7 +377,7 @@ static relopt_int intRelOpts[] =
  {
  "parallel_workers",
  "Number of parallel processes that can be used per executor node for
this relation.",
- RELOPT_KIND_HEAP,
+ RELOPT_KIND_HEAP | RELOPT_KIND_PARTITIONED,
  ShareUpdateExclusiveLock
  },
  -1, 0, 1024
@@ -1962,12 +1962,18 @@ bytea *
 partitioned_table_reloptions(Datum reloptions, bool validate)
 {
  /*
- * There are no options for partitioned tables yet, but this is able to do
- * some validation.
+ * Currently the only setting known to be useful for partitioned tables
+ * is parallel_workers.
  */
+ static const relopt_parse_elt tab[] = {
+ {"parallel_workers", RELOPT_TYPE_INT,
+ offsetof(PartitionedTableRdOptions, parallel_workers)},
+ };
+
  return (bytea *) build_reloptions(reloptions, validate,
    RELOPT_KIND_PARTITIONED,
-   0, NULL, 0);
+   sizeof(PartitionedTableRdOptions),
+   tab, lengthof(tab));
 }

/*

diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 10b63982c0..fe114e0856 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -308,6 +308,16 @@ typedef struct StdRdOptions
  bool vacuum_truncate; /* enables vacuum to truncate a relation */
 } StdRdOptions;
+/*
+ * PartitionedTableRdOptions
+ * Contents of rd_options for partitioned tables
+ */
+typedef struct PartitionedTableRdOptions
+{
+ int32 vl_len_; /* varlena header (do not touch directly!) */
+ int parallel_workers; /* max number of parallel workers */
+} PartitionedTableRdOptions;
+
 #define HEAP_MIN_FILLFACTOR 10
 #define HEAP_DEFAULT_FILLFACTOR 100

--
Amit Langote
EDB: http://www.enterprisedb.com

#17houzj.fnst@fujitsu.com
houzj.fnst@fujitsu.com
In reply to: Amit Langote (#16)
RE: A reloption for partitioned tables - parallel_workers

It seems the patch does not include the code that gets the
parallel_workers from the new struct "PartitionedTableRdOptions".
Did I miss something?

Aren't the following hunks in the v2 patch what you meant?

diff --git a/src/backend/access/common/reloptions.c
b/src/backend/access/common/reloptions.c
index c687d3ee9e..f8443d2361 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -377,7 +377,7 @@ static relopt_int intRelOpts[] =
 	{
 		"parallel_workers",
 		"Number of parallel processes that can be used per executor node for this relation.",
-		RELOPT_KIND_HEAP,
+		RELOPT_KIND_HEAP | RELOPT_KIND_PARTITIONED,
 		ShareUpdateExclusiveLock
 	},
 	-1, 0, 1024
@@ -1962,12 +1962,18 @@ bytea *
 partitioned_table_reloptions(Datum reloptions, bool validate)
 {
 	/*
-	 * There are no options for partitioned tables yet, but this is able to do
-	 * some validation.
+	 * Currently the only setting known to be useful for partitioned tables
+	 * is parallel_workers.
 	 */
+	static const relopt_parse_elt tab[] = {
+		{"parallel_workers", RELOPT_TYPE_INT,
+		offsetof(PartitionedTableRdOptions, parallel_workers)},
+	};
+
 	return (bytea *) build_reloptions(reloptions, validate,
 									  RELOPT_KIND_PARTITIONED,
-									  0, NULL, 0);
+									  sizeof(PartitionedTableRdOptions),
+									  tab, lengthof(tab));
 }

 /*

diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 10b63982c0..fe114e0856 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -308,6 +308,16 @@ typedef struct StdRdOptions
 	bool		vacuum_truncate;	/* enables vacuum to truncate a relation */
 } StdRdOptions;

+/*
+ * PartitionedTableRdOptions
+ * 		Contents of rd_options for partitioned tables
+ */
+typedef struct PartitionedTableRdOptions
+{
+	int32		vl_len_;		/* varlena header (do not touch directly!) */
+	int			parallel_workers;	/* max number of parallel workers */
+} PartitionedTableRdOptions;
+
 #define HEAP_MIN_FILLFACTOR			10
 #define HEAP_DEFAULT_FILLFACTOR		100

Hi,

I am not sure.
IMO, for a normal table, we use the following macro to get the parallel_workers:
----------------------
/*
* RelationGetParallelWorkers
* Returns the relation's parallel_workers reloption setting.
* Note multiple eval of argument!
*/
#define RelationGetParallelWorkers(relation, defaultpw) \
((relation)->rd_options ? \
((StdRdOptions *) (relation)->rd_options)->parallel_workers : (defaultpw))
----------------------

Since we add new struct " PartitionedTableRdOptions ", It seems we need to get parallel_workers in different way.
Do we need similar macro to get partitioned table's parallel_workers ?

Like:
#define PartitionedTableGetParallelWorkers(relation, defaultpw) \
xxx
((PartitionedTableRdOptions *) (relation)->rd_options)->parallel_workers : (defaultpw))

Best regards,
houzj

#18Amit Langote
amitlangote09@gmail.com
In reply to: houzj.fnst@fujitsu.com (#17)
Re: A reloption for partitioned tables - parallel_workers

On Tue, Feb 23, 2021 at 7:24 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:

It seems the patch does not include the code that gets the
parallel_workers from the new struct "PartitionedTableRdOptions".
Did I miss something?

Aren't the following hunks in the v2 patch what you meant?

diff --git a/src/backend/access/common/reloptions.c
b/src/backend/access/common/reloptions.c
index c687d3ee9e..f8443d2361 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -377,7 +377,7 @@ static relopt_int intRelOpts[] =
 	{
 		"parallel_workers",
 		"Number of parallel processes that can be used per executor node for this relation.",
-		RELOPT_KIND_HEAP,
+		RELOPT_KIND_HEAP | RELOPT_KIND_PARTITIONED,
 		ShareUpdateExclusiveLock
 	},
 	-1, 0, 1024
@@ -1962,12 +1962,18 @@ bytea *
 partitioned_table_reloptions(Datum reloptions, bool validate)
 {
 	/*
-	 * There are no options for partitioned tables yet, but this is able to do
-	 * some validation.
+	 * Currently the only setting known to be useful for partitioned tables
+	 * is parallel_workers.
 	 */
+	static const relopt_parse_elt tab[] = {
+		{"parallel_workers", RELOPT_TYPE_INT,
+		offsetof(PartitionedTableRdOptions, parallel_workers)},
+	};
+
 	return (bytea *) build_reloptions(reloptions, validate,
 									  RELOPT_KIND_PARTITIONED,
-									  0, NULL, 0);
+									  sizeof(PartitionedTableRdOptions),
+									  tab, lengthof(tab));
 }

 /*

diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 10b63982c0..fe114e0856 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -308,6 +308,16 @@ typedef struct StdRdOptions
 	bool		vacuum_truncate;	/* enables vacuum to truncate a relation */
 } StdRdOptions;

+/*
+ * PartitionedTableRdOptions
+ * 		Contents of rd_options for partitioned tables
+ */
+typedef struct PartitionedTableRdOptions
+{
+	int32		vl_len_;		/* varlena header (do not touch directly!) */
+	int			parallel_workers;	/* max number of parallel workers */
+} PartitionedTableRdOptions;
+
 #define HEAP_MIN_FILLFACTOR			10
 #define HEAP_DEFAULT_FILLFACTOR		100

Hi,

I am not sure.
IMO, for a normal table, we use the following macro to get the parallel_workers:
----------------------
/*
* RelationGetParallelWorkers
* Returns the relation's parallel_workers reloption setting.
* Note multiple eval of argument!
*/
#define RelationGetParallelWorkers(relation, defaultpw) \
((relation)->rd_options ? \
((StdRdOptions *) (relation)->rd_options)->parallel_workers : (defaultpw))
----------------------

Since we add new struct " PartitionedTableRdOptions ", It seems we need to get parallel_workers in different way.
Do we need similar macro to get partitioned table's parallel_workers ?

Oh, you're right. The parallel_workers setting of a relation is only
accessed through this macro, even for partitioned tables, and I can
see that it is actually wrong to access a partitioned table's
parallel_workers through this macro as-is. I hadn't tested that,
though, so thanks for pointing it out.

Like:
#define PartitionedTableGetParallelWorkers(relation, defaultpw) \
xxx
((PartitionedTableRdOptions *) (relation)->rd_options)->parallel_workers : (defaultpw))

I'm thinking it would be better to just modify the existing macro to
check relkind to decide which struct pointer type to cast the value in
rd_options to.

I will post an updated patch later.

--
Amit Langote
EDB: http://www.enterprisedb.com

#19Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Amit Langote (#18)
1 attachment(s)
Re: A reloption for partitioned tables - parallel_workers

On Mon, 2021-03-01 at 17:39 +0900, Amit Langote wrote:

I am not sure.
IMO, for a normal table, we use the following macro to get the parallel_workers:
----------------------
/*
* RelationGetParallelWorkers
* Returns the relation's parallel_workers reloption setting.
* Note multiple eval of argument!
*/
#define RelationGetParallelWorkers(relation, defaultpw) \
((relation)->rd_options ? \
((StdRdOptions *) (relation)->rd_options)->parallel_workers : (defaultpw))
----------------------
Since we add new struct " PartitionedTableRdOptions ", It seems we need to get parallel_workers in different way.
Do we need similar macro to get partitioned table's parallel_workers ?

Oh, you're right. The parallel_workers setting of a relation is only
accessed through this macro, even for partitioned tables, and I can
see that it is actually wrong to access a partitioned table's
parallel_workers through this macro as-is. Although I hadn't tested
that, so thanks for pointing that out.

Like:
#define PartitionedTableGetParallelWorkers(relation, defaultpw) \
xxx
((PartitionedTableRdOptions *) (relation)->rd_options)->parallel_workers : (defaultpw))

I'm thinking it would be better to just modify the existing macro to
check relkind to decide which struct pointer type to cast the value in
rd_options to.

Here is an updated patch with this fix.

I added regression tests and adapted the documentation a bit.

I also added support for lowering the number of parallel workers.

Yours,
Laurenz Albe

Attachments:

* v3-0001-Allow-setting-parallel_workers-on-partitioned-tab.patch
From d48874a61ac698dc284b83b7e3b147a71158d09d Mon Sep 17 00:00:00 2001
From: Laurenz Albe <laurenz.albe@cybertec.at>
Date: Mon, 1 Mar 2021 16:06:49 +0100
Subject: [PATCH] Allow setting parallel_workers on partitioned tables.

Sometimes it's beneficial to do

ALTER TABLE my_part_tbl SET (parallel_workers = 32);

when the query planner is planning too few workers to read from your
partitions. For example, if you have a partitioned table where each
partition is a foreign table that cannot itself have this setting on it.
Shown to give a 100%+ performance increase (in my case, 700%) when used
with cstore_fdw column store partitions.

This is also useful to lower the number of parallel workers, for
example when the executor is expected to prune most partitions.

Authors: Seamus Abshere, Amit Langote, Laurenz Albe
Discussion: https://postgr.es/m/95b1dd96-8634-4545-b1de-e2ac779beb44%40www.fastmail.com
---
 doc/src/sgml/ref/create_table.sgml            |  15 +-
 src/backend/access/common/reloptions.c        |  14 +-
 src/backend/optimizer/path/allpaths.c         | 155 ++++++++++--------
 src/include/utils/rel.h                       |  15 +-
 .../regress/expected/partition_aggregate.out  |  51 +++++-
 src/test/regress/sql/partition_aggregate.sql  |  17 +-
 6 files changed, 188 insertions(+), 79 deletions(-)

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 3b2b227683..c8c14850c1 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -1337,8 +1337,9 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
     If a table parameter value is set and the
     equivalent <literal>toast.</literal> parameter is not, the TOAST table
     will use the table's parameter value.
-    Specifying these parameters for partitioned tables is not supported,
-    but you may specify them for individual leaf partitions.
+    These parameters, with the exception of <literal>parallel_workers</literal>,
+    are not supported on partitioned tables, but you may specify them for
+    individual leaf partitions.
    </para>
 
    <variablelist>
@@ -1401,9 +1402,13 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
      <para>
       This sets the number of workers that should be used to assist a parallel
       scan of this table.  If not set, the system will determine a value based
-      on the relation size.  The actual number of workers chosen by the planner
-      or by utility statements that use parallel scans may be less, for example
-      due to the setting of <xref linkend="guc-max-worker-processes"/>.
+      on the relation size and the number of scanned partitions.
+      When set on a partitioned table, the specified number of workers will
+      work on distinct partitions, so the number of partitions affected by the
+      parallel operation should be taken into account.
+      The actual number of workers chosen by the planner or by utility
+      statements that use parallel scans may be less, for example due
+      to the setting of <xref linkend="guc-max-worker-processes"/>.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index c687d3ee9e..f8443d2361 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -377,7 +377,7 @@ static relopt_int intRelOpts[] =
 		{
 			"parallel_workers",
 			"Number of parallel processes that can be used per executor node for this relation.",
-			RELOPT_KIND_HEAP,
+			RELOPT_KIND_HEAP | RELOPT_KIND_PARTITIONED,
 			ShareUpdateExclusiveLock
 		},
 		-1, 0, 1024
@@ -1962,12 +1962,18 @@ bytea *
 partitioned_table_reloptions(Datum reloptions, bool validate)
 {
 	/*
-	 * There are no options for partitioned tables yet, but this is able to do
-	 * some validation.
+	 * Currently the only setting known to be useful for partitioned tables
+	 * is parallel_workers.
 	 */
+	static const relopt_parse_elt tab[] = {
+		{"parallel_workers", RELOPT_TYPE_INT,
+		offsetof(PartitionedTableRdOptions, parallel_workers)},
+	};
+
 	return (bytea *) build_reloptions(reloptions, validate,
 									  RELOPT_KIND_PARTITIONED,
-									  0, NULL, 0);
+									  sizeof(PartitionedTableRdOptions),
+									  tab, lengthof(tab));
 }
 
 /*
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index d73ac562eb..e22cb6f33e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -97,6 +97,9 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 								Index rti, RangeTblEntry *rte);
 static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 									Index rti, RangeTblEntry *rte);
+static int compute_append_parallel_workers(RelOptInfo *rel, List *subpaths,
+										   int num_live_children,
+										   bool parallel_append);
 static void generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
 										 List *live_childrels,
 										 List *all_child_pathkeys);
@@ -1268,6 +1271,65 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	add_paths_to_append_rel(root, rel, live_childrels);
 }
 
+/*
+ * compute_append_parallel_workers
+ * 		Computes the number of workers to assign to scan the subpaths appended
+ * 		by a given Append path
+ */
+static int
+compute_append_parallel_workers(RelOptInfo *rel, List *subpaths,
+								int num_live_children,
+								bool parallel_append)
+{
+	ListCell   *lc;
+	int			parallel_workers = 0;
+
+	/*
+	 * For partitioned rels, first see if there is a root-level setting for
+	 * parallel_workers.  But only consider if a Parallel Append plan is
+	 * to be considered.
+	 */
+	if (IS_PARTITIONED_REL(rel) &&
+		parallel_append &&
+		rel->rel_parallel_workers != -1)
+	{
+		parallel_workers = Min(rel->rel_parallel_workers,
+							   max_parallel_workers_per_gather);
+
+		/* an explicit setting overrides heuristics */
+		return parallel_workers;
+	}
+
+	/* Find the highest number of workers requested for any subpath. */
+	foreach(lc, subpaths)
+	{
+		Path	   *path = lfirst(lc);
+
+		parallel_workers = Max(parallel_workers, path->parallel_workers);
+	}
+	Assert(parallel_workers > 0 || subpaths == NIL);
+
+	/*
+	 * If the use of parallel append is permitted, always request at least
+	 * log2(# of children) workers.  We assume it can be useful to have
+	 * extra workers in this case because they will be spread out across
+	 * the children.  The precise formula is just a guess, but we don't
+	 * want to end up with a radically different answer for a table with N
+	 * partitions vs. an unpartitioned table with the same data, so the
+	 * use of some kind of log-scaling here seems to make some sense.
+	 */
+	if (parallel_append)
+	{
+		parallel_workers = Max(parallel_workers,
+							   fls(num_live_children));
+		parallel_workers = Min(parallel_workers,
+							   max_parallel_workers_per_gather);
+	}
+	Assert(parallel_workers > 0);
+
+	return parallel_workers;
+}
+
 
 /*
  * add_paths_to_append_rel
@@ -1464,50 +1526,28 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 	if (partial_subpaths_valid && partial_subpaths != NIL)
 	{
 		AppendPath *appendpath;
-		ListCell   *lc;
-		int			parallel_workers = 0;
+		int			parallel_workers =
+			compute_append_parallel_workers(rel, partial_subpaths,
+											list_length(live_childrels),
+											enable_parallel_append);
 
-		/* Find the highest number of workers requested for any subpath. */
-		foreach(lc, partial_subpaths)
+		if (parallel_workers > 0)
 		{
-			Path	   *path = lfirst(lc);
+			/* Generate a partial append path. */
+			appendpath = create_append_path(root, rel, NIL, partial_subpaths,
+											NIL, NULL, parallel_workers,
+											enable_parallel_append,
+											-1);
 
-			parallel_workers = Max(parallel_workers, path->parallel_workers);
-		}
-		Assert(parallel_workers > 0);
+			/*
+			 * Make sure any subsequent partial paths use the same row count
+			 * estimate.
+			 */
+			partial_rows = appendpath->path.rows;
 
-		/*
-		 * If the use of parallel append is permitted, always request at least
-		 * log2(# of children) workers.  We assume it can be useful to have
-		 * extra workers in this case because they will be spread out across
-		 * the children.  The precise formula is just a guess, but we don't
-		 * want to end up with a radically different answer for a table with N
-		 * partitions vs. an unpartitioned table with the same data, so the
-		 * use of some kind of log-scaling here seems to make some sense.
-		 */
-		if (enable_parallel_append)
-		{
-			parallel_workers = Max(parallel_workers,
-								   fls(list_length(live_childrels)));
-			parallel_workers = Min(parallel_workers,
-								   max_parallel_workers_per_gather);
+			/* Add the path */
+			add_partial_path(rel, (Path *) appendpath);
 		}
-		Assert(parallel_workers > 0);
-
-		/* Generate a partial append path. */
-		appendpath = create_append_path(root, rel, NIL, partial_subpaths,
-										NIL, NULL, parallel_workers,
-										enable_parallel_append,
-										-1);
-
-		/*
-		 * Make sure any subsequent partial paths use the same row count
-		 * estimate.
-		 */
-		partial_rows = appendpath->path.rows;
-
-		/* Add the path. */
-		add_partial_path(rel, (Path *) appendpath);
 	}
 
 	/*
@@ -1519,36 +1559,19 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 	if (pa_subpaths_valid && pa_nonpartial_subpaths != NIL)
 	{
 		AppendPath *appendpath;
-		ListCell   *lc;
-		int			parallel_workers = 0;
+		int			parallel_workers =
+			compute_append_parallel_workers(rel, pa_partial_subpaths,
+											list_length(live_childrels),
+											true);
 
-		/*
-		 * Find the highest number of workers requested for any partial
-		 * subpath.
-		 */
-		foreach(lc, pa_partial_subpaths)
+		if (parallel_workers > 0)
 		{
-			Path	   *path = lfirst(lc);
-
-			parallel_workers = Max(parallel_workers, path->parallel_workers);
+			appendpath = create_append_path(root, rel, pa_nonpartial_subpaths,
+											pa_partial_subpaths,
+											NIL, NULL, parallel_workers, true,
+											partial_rows);
+			add_partial_path(rel, (Path *) appendpath);
 		}
-
-		/*
-		 * Same formula here as above.  It's even more important in this
-		 * instance because the non-partial paths won't contribute anything to
-		 * the planned number of parallel workers.
-		 */
-		parallel_workers = Max(parallel_workers,
-							   fls(list_length(live_childrels)));
-		parallel_workers = Min(parallel_workers,
-							   max_parallel_workers_per_gather);
-		Assert(parallel_workers > 0);
-
-		appendpath = create_append_path(root, rel, pa_nonpartial_subpaths,
-										pa_partial_subpaths,
-										NIL, NULL, parallel_workers, true,
-										partial_rows);
-		add_partial_path(rel, (Path *) appendpath);
 	}
 
 	/*
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 10b63982c0..efa7b714cb 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -308,6 +308,16 @@ typedef struct StdRdOptions
 	bool		vacuum_truncate;	/* enables vacuum to truncate a relation */
 } StdRdOptions;
 
+/*
+ * PartitionedTableRdOptions
+ * 		Contents of rd_options for partitioned tables
+ */
+typedef struct PartitionedTableRdOptions
+{
+	int32		vl_len_;		/* varlena header (do not touch directly!) */
+	int			parallel_workers;	/* max number of parallel workers */
+} PartitionedTableRdOptions;
+
 #define HEAP_MIN_FILLFACTOR			10
 #define HEAP_DEFAULT_FILLFACTOR		100
 
@@ -359,7 +369,10 @@ typedef struct StdRdOptions
  */
 #define RelationGetParallelWorkers(relation, defaultpw) \
 	((relation)->rd_options ? \
-	 ((StdRdOptions *) (relation)->rd_options)->parallel_workers : (defaultpw))
+	 ((relation)->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ? \
+	  ((PartitionedTableRdOptions *) (relation)->rd_options)->parallel_workers : \
+	  ((StdRdOptions *) (relation)->rd_options)->parallel_workers \
+	 ) : (defaultpw))
 
 /* ViewOptions->check_option values */
 typedef enum ViewOptCheckOption
diff --git a/src/test/regress/expected/partition_aggregate.out b/src/test/regress/expected/partition_aggregate.out
index dfa4b036b5..611dfc87ff 100644
--- a/src/test/regress/expected/partition_aggregate.out
+++ b/src/test/regress/expected/partition_aggregate.out
@@ -1166,9 +1166,58 @@ SELECT a, sum(b), count(*) FROM pagg_tab_ml GROUP BY a, b, c HAVING avg(b) > 7 O
  29 | 4500 |   500
 (12 rows)
 
--- Parallelism within partitionwise aggregates
+-- Override "parallel_workers" for a partitioned table
 SET min_parallel_table_scan_size TO '8kB';
 SET parallel_setup_cost TO 0;
+SET max_parallel_workers_per_gather TO 8;
+EXPLAIN (COSTS OFF)
+SELECT a FROM pagg_tab_ml WHERE b = 42;
+                            QUERY PLAN                            
+------------------------------------------------------------------
+ Gather
+   Workers Planned: 4
+   ->  Parallel Append
+         ->  Parallel Seq Scan on pagg_tab_ml_p1 pagg_tab_ml_1
+               Filter: (b = 42)
+         ->  Parallel Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_2
+               Filter: (b = 42)
+         ->  Parallel Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_3
+               Filter: (b = 42)
+(9 rows)
+
+ALTER TABLE pagg_tab_ml SET (parallel_workers = 6);
+EXPLAIN (COSTS OFF)
+SELECT a FROM pagg_tab_ml WHERE b = 42;
+                            QUERY PLAN                            
+------------------------------------------------------------------
+ Gather
+   Workers Planned: 6
+   ->  Parallel Append
+         ->  Parallel Seq Scan on pagg_tab_ml_p1 pagg_tab_ml_1
+               Filter: (b = 42)
+         ->  Parallel Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_2
+               Filter: (b = 42)
+         ->  Parallel Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_3
+               Filter: (b = 42)
+(9 rows)
+
+ALTER TABLE pagg_tab_ml SET (parallel_workers = 0);
+EXPLAIN (COSTS OFF)
+SELECT a FROM pagg_tab_ml WHERE b = 42;
+                    QUERY PLAN                     
+---------------------------------------------------
+ Append
+   ->  Seq Scan on pagg_tab_ml_p1 pagg_tab_ml_1
+         Filter: (b = 42)
+   ->  Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_2
+         Filter: (b = 42)
+   ->  Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_3
+         Filter: (b = 42)
+(7 rows)
+
+ALTER TABLE pagg_tab_ml RESET (parallel_workers);
+-- Parallelism within partitionwise aggregates
+SET max_parallel_workers_per_gather TO 2;
 -- Full aggregation at level 1 as GROUP BY clause matches with PARTITION KEY
 -- for level 1 only. For subpartitions, GROUP BY clause does not match with
 -- PARTITION KEY, thus we will have a partial aggregation for them.
diff --git a/src/test/regress/sql/partition_aggregate.sql b/src/test/regress/sql/partition_aggregate.sql
index c17294b15b..c058e2e181 100644
--- a/src/test/regress/sql/partition_aggregate.sql
+++ b/src/test/regress/sql/partition_aggregate.sql
@@ -253,10 +253,23 @@ EXPLAIN (COSTS OFF)
 SELECT a, sum(b), count(*) FROM pagg_tab_ml GROUP BY a, b, c HAVING avg(b) > 7 ORDER BY 1, 2, 3;
 SELECT a, sum(b), count(*) FROM pagg_tab_ml GROUP BY a, b, c HAVING avg(b) > 7 ORDER BY 1, 2, 3;
 
--- Parallelism within partitionwise aggregates
-
+-- Override "parallel_workers" for a partitioned table
 SET min_parallel_table_scan_size TO '8kB';
 SET parallel_setup_cost TO 0;
+SET max_parallel_workers_per_gather TO 8;
+EXPLAIN (COSTS OFF)
+SELECT a FROM pagg_tab_ml WHERE b = 42;
+ALTER TABLE pagg_tab_ml SET (parallel_workers = 6);
+EXPLAIN (COSTS OFF)
+SELECT a FROM pagg_tab_ml WHERE b = 42;
+ALTER TABLE pagg_tab_ml SET (parallel_workers = 0);
+EXPLAIN (COSTS OFF)
+SELECT a FROM pagg_tab_ml WHERE b = 42;
+ALTER TABLE pagg_tab_ml RESET (parallel_workers);
+
+-- Parallelism within partitionwise aggregates
+
+SET max_parallel_workers_per_gather TO 2;
 
 -- Full aggregation at level 1 as GROUP BY clause matches with PARTITION KEY
 -- for level 1 only. For subpartitions, GROUP BY clause does not match with
-- 
2.26.2

#20Amit Langote
amitlangote09@gmail.com
In reply to: Laurenz Albe (#19)
Re: A reloption for partitioned tables - parallel_workers

On Tue, Mar 2, 2021 at 12:10 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

Here is an updated patch with this fix.

Thanks for updating the patch. I was about to post an updated version
myself but you beat me to it.

I added regression tests and adapted the documentation a bit.

I also added support for lowering the number of parallel workers.

+ALTER TABLE pagg_tab_ml SET (parallel_workers = 0);
+EXPLAIN (COSTS OFF)
+SELECT a FROM pagg_tab_ml WHERE b = 42;
+                    QUERY PLAN
+---------------------------------------------------
+ Append
+   ->  Seq Scan on pagg_tab_ml_p1 pagg_tab_ml_1
+         Filter: (b = 42)
+   ->  Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_2
+         Filter: (b = 42)
+   ->  Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_3
+         Filter: (b = 42)
+(7 rows)

I got the same result with my implementation, but I am wondering if
setting parallel_workers=0 on the parent table should really disable
a regular (non-parallel-aware) Append running under Gather, even if
it rightly disables Parallel Append (parallel-aware). So in this test
case, there should have been a Gather atop Append, with individual
partitions scanned using Parallel Seq Scan where applicable.

--
Amit Langote
EDB: http://www.enterprisedb.com

#21Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Amit Langote (#20)
Re: A reloption for partitioned tables - parallel_workers

On Tue, 2021-03-02 at 11:23 +0900, Amit Langote wrote:

+ALTER TABLE pagg_tab_ml SET (parallel_workers = 0);
+EXPLAIN (COSTS OFF)
+SELECT a FROM pagg_tab_ml WHERE b = 42;
+                    QUERY PLAN
+---------------------------------------------------
+ Append
+   ->  Seq Scan on pagg_tab_ml_p1 pagg_tab_ml_1
+         Filter: (b = 42)
+   ->  Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_2
+         Filter: (b = 42)
+   ->  Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_3
+         Filter: (b = 42)
+(7 rows)

I got the same result with my implementation, but I am wondering if
setting parallel_workers=0 on the parent table should really disable
a regular (non-parallel-aware) Append running under Gather, even if
it rightly disables Parallel Append (parallel-aware). So in this test
case, there should have been a Gather atop Append, with individual
partitions scanned using Parallel Seq Scan where applicable.

I am not sure, but I tend to think that if you specify no
parallel workers, you want no parallel workers.

But I noticed the following:

SET enable_partitionwise_aggregate = on;

EXPLAIN (COSTS OFF)
SELECT count(*) FROM pagg_tab_ml;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Finalize Aggregate
   ->  Gather
         Workers Planned: 4
         ->  Parallel Append
               ->  Partial Aggregate
                     ->  Parallel Seq Scan on pagg_tab_ml_p1 pagg_tab_ml
               ->  Partial Aggregate
                     ->  Parallel Seq Scan on pagg_tab_ml_p3_s1 pagg_tab_ml_3
               ->  Partial Aggregate
                     ->  Parallel Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_1
               ->  Partial Aggregate
                     ->  Parallel Seq Scan on pagg_tab_ml_p3_s2 pagg_tab_ml_4
               ->  Partial Aggregate
                     ->  Parallel Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_2
(14 rows)

The default number of parallel workers is taken, because the append is
on an upper relation, not the partitioned table itself.

One would wish that "parallel_workers" somehow percolated up, but I
have no idea how that should work.

Yours,
Laurenz Albe

#22Amit Langote
amitlangote09@gmail.com
In reply to: Laurenz Albe (#21)
1 attachment(s)
Re: A reloption for partitioned tables - parallel_workers

On Tue, Mar 2, 2021 at 5:47 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Tue, 2021-03-02 at 11:23 +0900, Amit Langote wrote:

+ALTER TABLE pagg_tab_ml SET (parallel_workers = 0);
+EXPLAIN (COSTS OFF)
+SELECT a FROM pagg_tab_ml WHERE b = 42;
+                    QUERY PLAN
+---------------------------------------------------
+ Append
+   ->  Seq Scan on pagg_tab_ml_p1 pagg_tab_ml_1
+         Filter: (b = 42)
+   ->  Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_2
+         Filter: (b = 42)
+   ->  Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_3
+         Filter: (b = 42)
+(7 rows)

I got the same result with my implementation, but I am wondering if
setting parallel_workers=0 on the parent table should really disable
a regular (non-parallel-aware) Append running under Gather, even if
it rightly disables Parallel Append (parallel-aware). So in this test
case, there should have been a Gather atop Append, with individual
partitions scanned using Parallel Seq Scan where applicable.

I am not sure, but I tend to think that if you specify no
parallel workers, you want no parallel workers.

I am thinking that one would set parallel_workers on a parent
partitioned table to control only how many workers a Parallel Append
can spread across partitions or use parallel_workers=0 to disable this
form of partition parallelism. However, one may still want the
individual partitions to be scanned in parallel, where workers only
spread across the partition's blocks. IMO, we should try to keep
those two forms of parallelism separately configurable.
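
To illustrate with the regression test table (a sketch of the intended
usage, not something the posted patch fully implements yet;
pagg_tab_ml_p1 is one of the leaf partitions):

-- don't spread workers across partitions (no Parallel Append)...
alter table pagg_tab_ml set (parallel_workers = 0);
-- ...but still allow workers to spread across this partition's blocks
alter table pagg_tab_ml_p1 set (parallel_workers = 2);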

But I noticed the following:

SET enable_partitionwise_aggregate = on;

EXPLAIN (COSTS OFF)
SELECT count(*) FROM pagg_tab_ml;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Finalize Aggregate
   ->  Gather
         Workers Planned: 4
         ->  Parallel Append
               ->  Partial Aggregate
                     ->  Parallel Seq Scan on pagg_tab_ml_p1 pagg_tab_ml
               ->  Partial Aggregate
                     ->  Parallel Seq Scan on pagg_tab_ml_p3_s1 pagg_tab_ml_3
               ->  Partial Aggregate
                     ->  Parallel Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_1
               ->  Partial Aggregate
                     ->  Parallel Seq Scan on pagg_tab_ml_p3_s2 pagg_tab_ml_4
               ->  Partial Aggregate
                     ->  Parallel Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_2
(14 rows)

The default number of parallel workers is taken, because the append is
on an upper relation, not the partitioned table itself.

One would wish that "parallel_workers" somehow percolated up,

I would have liked that too.

but I
have no idea how that should work.

It appears that we don't set the fields of an upper relation such that
IS_PARTITIONED_REL() would return true for it, like we do for base and
join relations. In compute_append_parallel_workers(), we're requiring
it to be true to even look at the relation's rel_parallel_workers. We
can set those properties in *some* grouping rels, for example, when
the aggregation is grouped on the input relation's partition key.
That would make it possible for the Append on such grouping relations
to refer to their input partitioned relation's rel_parallel_workers.
For example, with the attached PoC patch:

SET parallel_setup_cost TO 0;
SET max_parallel_workers_per_gather TO 8;
SET enable_partitionwise_aggregate = on;

alter table pagg_tab_ml set (parallel_workers=5);

EXPLAIN (COSTS OFF) SELECT a, count(*) FROM pagg_tab_ml GROUP BY 1;
                              QUERY PLAN
---------------------------------------------------------------------
 Gather
   Workers Planned: 5
   ->  Parallel Append
         ->  HashAggregate
               Group Key: pagg_tab_ml_5.a
               ->  Append
                     ->  Seq Scan on pagg_tab_ml_p3_s1 pagg_tab_ml_5
                     ->  Seq Scan on pagg_tab_ml_p3_s2 pagg_tab_ml_6
         ->  HashAggregate
               Group Key: pagg_tab_ml.a
               ->  Seq Scan on pagg_tab_ml_p1 pagg_tab_ml
         ->  HashAggregate
               Group Key: pagg_tab_ml_2.a
               ->  Append
                     ->  Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_2
                     ->  Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_3
(16 rows)

alter table pagg_tab_ml set (parallel_workers=0);

EXPLAIN (COSTS OFF) SELECT a, count(*) FROM pagg_tab_ml GROUP BY 1;
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Append
   ->  Finalize GroupAggregate
         Group Key: pagg_tab_ml.a
         ->  Gather Merge
               Workers Planned: 1
               ->  Sort
                     Sort Key: pagg_tab_ml.a
                     ->  Partial HashAggregate
                           Group Key: pagg_tab_ml.a
                           ->  Parallel Seq Scan on pagg_tab_ml_p1 pagg_tab_ml
   ->  Finalize GroupAggregate
         Group Key: pagg_tab_ml_2.a
         ->  Gather Merge
               Workers Planned: 2
               ->  Sort
                     Sort Key: pagg_tab_ml_2.a
                     ->  Parallel Append
                           ->  Partial HashAggregate
                                 Group Key: pagg_tab_ml_2.a
                                 ->  Parallel Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_2
                           ->  Partial HashAggregate
                                 Group Key: pagg_tab_ml_3.a
                                 ->  Parallel Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_3
   ->  Finalize GroupAggregate
         Group Key: pagg_tab_ml_5.a
         ->  Gather Merge
               Workers Planned: 2
               ->  Sort
                     Sort Key: pagg_tab_ml_5.a
                     ->  Parallel Append
                           ->  Partial HashAggregate
                                 Group Key: pagg_tab_ml_5.a
                                 ->  Parallel Seq Scan on pagg_tab_ml_p3_s1 pagg_tab_ml_5
                           ->  Partial HashAggregate
                                 Group Key: pagg_tab_ml_6.a
                                 ->  Parallel Seq Scan on pagg_tab_ml_p3_s2 pagg_tab_ml_6
(36 rows)

alter table pagg_tab_ml set (parallel_workers=9);

EXPLAIN (COSTS OFF) SELECT a, count(*) FROM pagg_tab_ml GROUP BY 1;
                              QUERY PLAN
---------------------------------------------------------------------
 Gather
   Workers Planned: 8
   ->  Parallel Append
         ->  HashAggregate
               Group Key: pagg_tab_ml_5.a
               ->  Append
                     ->  Seq Scan on pagg_tab_ml_p3_s1 pagg_tab_ml_5
                     ->  Seq Scan on pagg_tab_ml_p3_s2 pagg_tab_ml_6
         ->  HashAggregate
               Group Key: pagg_tab_ml.a
               ->  Seq Scan on pagg_tab_ml_p1 pagg_tab_ml
         ->  HashAggregate
               Group Key: pagg_tab_ml_2.a
               ->  Append
                     ->  Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_2
                     ->  Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_3
(16 rows)

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachments:

* grouping-append-parallel_workers.patch
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 545b56bcaf..ebb99a71d2 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3938,6 +3938,9 @@ make_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 	grouped_rel->useridiscurrent = input_rel->useridiscurrent;
 	grouped_rel->fdwroutine = input_rel->fdwroutine;
 
+	/* Copy parallel_workers. */
+	grouped_rel->rel_parallel_workers = input_rel->rel_parallel_workers;
+
 	return grouped_rel;
 }
 
@@ -7605,6 +7608,20 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
 	Assert(patype != PARTITIONWISE_AGGREGATE_PARTIAL ||
 		   partially_grouped_rel != NULL);
 
+	/* Like input rel, make fully grouped rel appear partitioned too. */
+	Assert(IS_PARTITIONED_REL(input_rel));
+	if (patype == PARTITIONWISE_AGGREGATE_FULL)
+	{
+		grouped_rel->part_scheme = input_rel->part_scheme;
+		grouped_rel->partexprs = input_rel->partexprs;
+		grouped_rel->nullable_partexprs = input_rel->nullable_partexprs;
+		grouped_rel->boundinfo = input_rel->boundinfo;
+		grouped_rel->nparts = nparts;
+		Assert(grouped_rel->part_rels == NULL);
+		grouped_rel->part_rels =
+			(RelOptInfo **) palloc0(sizeof(RelOptInfo *) * nparts);
+	}
+
 	/* Add paths for partitionwise aggregation/grouping. */
 	for (cnt_parts = 0; cnt_parts < nparts; cnt_parts++)
 	{
@@ -7680,6 +7697,8 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
 			set_cheapest(child_grouped_rel);
 			grouped_live_children = lappend(grouped_live_children,
 											child_grouped_rel);
+			Assert(grouped_rel->part_rels[cnt_parts] == NULL);
+			grouped_rel->part_rels[cnt_parts] = child_grouped_rel;
 		}
 
 		pfree(appinfos);
#23Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Amit Langote (#22)
Re: A reloption for partitioned tables - parallel_workers

On Wed, 2021-03-03 at 17:58 +0900, Amit Langote wrote:

On Tue, Mar 2, 2021 at 5:47 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Tue, 2021-03-02 at 11:23 +0900, Amit Langote wrote:

I got the same result with my implementation, but I am wondering if
setting parallel_workers=0 on the parent table shouldn't really
disable a regular (non-parallel-aware) Append running under Gather
even if it does Parallel Append (parallel-aware)? So in this test
case, there should have been a Gather atop Append, with individual
partitions scanned using Parallel Seq Scan where applicable.

I am not sure, but I tend to think that if you specify no
parallel workers, you want no parallel workers.

I am thinking that one would set parallel_workers on a parent
partitioned table to control only how many workers a Parallel Append
can spread across partitions or use parallel_workers=0 to disable this
form of partition parallelism. However, one may still want the
individual partitions to be scanned in parallel, where workers only
spread across the partition's blocks. IMO, we should try to keep
those two forms of parallelism separately configurable.

I see your point.

I thought that PostgreSQL might consider such a plan anyway, but
I am not deep enough into the partitioning code to know.

Thinking about this further, wouldn't that mean that we get into a
conflict if someone sets "parallel_workers" on both a partition and
the partitioned table? Which setting should win?
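
For example (using the regression test tables; it is not obvious
which of these two settings the planner ought to honor):

ALTER TABLE pagg_tab_ml SET (parallel_workers = 2);
ALTER TABLE pagg_tab_ml_p1 SET (parallel_workers = 8);
EXPLAIN (COSTS OFF) SELECT count(*) FROM pagg_tab_ml;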

SET enable_partitionwise_aggregate = on;

EXPLAIN (COSTS OFF)
SELECT count(*) FROM pagg_tab_ml;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Finalize Aggregate
   ->  Gather
         Workers Planned: 4
         ->  Parallel Append
               ->  Partial Aggregate
                     ->  Parallel Seq Scan on pagg_tab_ml_p1 pagg_tab_ml
               ->  Partial Aggregate
                     ->  Parallel Seq Scan on pagg_tab_ml_p3_s1 pagg_tab_ml_3
               ->  Partial Aggregate
                     ->  Parallel Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_1
               ->  Partial Aggregate
                     ->  Parallel Seq Scan on pagg_tab_ml_p3_s2 pagg_tab_ml_4
               ->  Partial Aggregate
                     ->  Parallel Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_2
(14 rows)

The default number of parallel workers is taken, because the append is
on an upper relation, not the partitioned table itself.

One would wish that "parallel_workers" somehow percolated up,

It appears that we don't set the fields of an upper relation such that
IS_PARTITIONED_REL() would return true for it, like we do for base and
join relations. In compute_append_parallel_workers(), we're requiring
it to be true to even look at the relation's rel_parallel_workers. We
can set those properties in *some* grouping rels, for example, when
the aggregation is grouped on the input relation's partition key.
That would make it possible for the Append on such grouping relations
to refer to their input partitioned relation's rel_parallel_workers.
For example, with the attached PoC patch:

SET parallel_setup_cost TO 0;
SET max_parallel_workers_per_gather TO 8;
SET enable_partitionwise_aggregate = on;

alter table pagg_tab_ml set (parallel_workers=5);

EXPLAIN (COSTS OFF) SELECT a, count(*) FROM pagg_tab_ml GROUP BY 1;
                              QUERY PLAN
---------------------------------------------------------------------
 Gather
   Workers Planned: 5
   ->  Parallel Append
         ->  HashAggregate
               Group Key: pagg_tab_ml_5.a
               ->  Append
                     ->  Seq Scan on pagg_tab_ml_p3_s1 pagg_tab_ml_5
                     ->  Seq Scan on pagg_tab_ml_p3_s2 pagg_tab_ml_6
         ->  HashAggregate
               Group Key: pagg_tab_ml.a
               ->  Seq Scan on pagg_tab_ml_p1 pagg_tab_ml
         ->  HashAggregate
               Group Key: pagg_tab_ml_2.a
               ->  Append
                     ->  Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_2
                     ->  Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_3
(16 rows)

alter table pagg_tab_ml set (parallel_workers=0);

EXPLAIN (COSTS OFF) SELECT a, count(*) FROM pagg_tab_ml GROUP BY 1;
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Append
   ->  Finalize GroupAggregate
         Group Key: pagg_tab_ml.a
         ->  Gather Merge
               Workers Planned: 1
               ->  Sort
                     Sort Key: pagg_tab_ml.a
                     ->  Partial HashAggregate
                           Group Key: pagg_tab_ml.a
                           ->  Parallel Seq Scan on pagg_tab_ml_p1 pagg_tab_ml
   ->  Finalize GroupAggregate
         Group Key: pagg_tab_ml_2.a
         ->  Gather Merge
               Workers Planned: 2
               ->  Sort
                     Sort Key: pagg_tab_ml_2.a
                     ->  Parallel Append
                           ->  Partial HashAggregate
                                 Group Key: pagg_tab_ml_2.a
                                 ->  Parallel Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_2
                           ->  Partial HashAggregate
                                 Group Key: pagg_tab_ml_3.a
                                 ->  Parallel Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_3
   ->  Finalize GroupAggregate
         Group Key: pagg_tab_ml_5.a
         ->  Gather Merge
               Workers Planned: 2
               ->  Sort
                     Sort Key: pagg_tab_ml_5.a
                     ->  Parallel Append
                           ->  Partial HashAggregate
                                 Group Key: pagg_tab_ml_5.a
                                 ->  Parallel Seq Scan on pagg_tab_ml_p3_s1 pagg_tab_ml_5
                           ->  Partial HashAggregate
                                 Group Key: pagg_tab_ml_6.a
                                 ->  Parallel Seq Scan on pagg_tab_ml_p3_s2 pagg_tab_ml_6
(36 rows)

alter table pagg_tab_ml set (parallel_workers=9);

EXPLAIN (COSTS OFF) SELECT a, count(*) FROM pagg_tab_ml GROUP BY 1;
                              QUERY PLAN
---------------------------------------------------------------------
 Gather
   Workers Planned: 8
   ->  Parallel Append
         ->  HashAggregate
               Group Key: pagg_tab_ml_5.a
               ->  Append
                     ->  Seq Scan on pagg_tab_ml_p3_s1 pagg_tab_ml_5
                     ->  Seq Scan on pagg_tab_ml_p3_s2 pagg_tab_ml_6
         ->  HashAggregate
               Group Key: pagg_tab_ml.a
               ->  Seq Scan on pagg_tab_ml_p1 pagg_tab_ml
         ->  HashAggregate
               Group Key: pagg_tab_ml_2.a
               ->  Append
                     ->  Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_2
                     ->  Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_3
(16 rows)

That looks good!

One could imagine similar behavior for partitionwise joins, but
it might be difficult to decide which side should determine the
number of parallel workers.
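
For example, with two tables partitioned the same way (a hypothetical
sketch; the table names are made up):

SET enable_partitionwise_join = on;
CREATE TABLE pwj1 (a int) PARTITION BY HASH (a);
CREATE TABLE pwj1_0 PARTITION OF pwj1 FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE pwj1_1 PARTITION OF pwj1 FOR VALUES WITH (MODULUS 2, REMAINDER 1);
CREATE TABLE pwj2 (a int) PARTITION BY HASH (a);
CREATE TABLE pwj2_0 PARTITION OF pwj2 FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE pwj2_1 PARTITION OF pwj2 FOR VALUES WITH (MODULUS 2, REMAINDER 1);
ALTER TABLE pwj1 SET (parallel_workers = 4);
ALTER TABLE pwj2 SET (parallel_workers = 8);
-- which side's setting should drive an Append over the joined pairs?
EXPLAIN (COSTS OFF) SELECT * FROM pwj1 JOIN pwj2 USING (a);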

I think that with this addition, this patch would make a useful improvement.

Yours,
Laurenz Albe

#24Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Amit Langote (#22)
1 attachment(s)
Re: A reloption for partitioned tables - parallel_workers

On Wed, 2021-03-03 at 17:58 +0900, Amit Langote wrote:

For example, with the attached PoC patch:

I have incorporated your POC patch and added a regression test.

I didn't test it thoroughly though.

Yours,
Laurenz Albe

Attachments:

* v4-0001-Allow-setting-parallel_workers-on-partitioned-tables.patch
From 34f7da98b34bc1dbf7daca9e2ca6055c80d77f43 Mon Sep 17 00:00:00 2001
From: Laurenz Albe <laurenz.albe@cybertec.at>
Date: Fri, 5 Mar 2021 14:43:34 +0100
Subject: [PATCH] Allow setting parallel_workers on partitioned tables.

Sometimes it's beneficial to do

ALTER TABLE my_part_tbl SET (parallel_workers = 32);

when the query planner is planning too few workers to read from your
partitions. For example, if you have a partitioned table where each
partition is a foreign table that cannot itself have this setting on it.
Shown to give a 100%+ performance increase when used with cstore_fdw
column store partitions.

This is also useful to lower the number of parallel workers, for
example when the executor is expected to prune most partitions.

The setting also affects parallel appends for partitionwise aggregates
on a partitioned table, but not partitionwise joins.

Authors: Seamus Abshere, Amit Langote, Laurenz Albe
Discussion: https://postgr.es/m/95b1dd96-8634-4545-b1de-e2ac779beb44%40www.fastmail.com
---
 doc/src/sgml/ref/create_table.sgml            |  15 +-
 src/backend/access/common/reloptions.c        |  14 +-
 src/backend/optimizer/path/allpaths.c         | 155 ++++++++++--------
 src/backend/optimizer/plan/planner.c          |  19 +++
 src/include/utils/rel.h                       |  15 +-
 .../regress/expected/partition_aggregate.out  |  73 ++++++++-
 src/test/regress/sql/partition_aggregate.sql  |  19 ++-
 7 files changed, 231 insertions(+), 79 deletions(-)

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 3b2b227683..c8c14850c1 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -1337,8 +1337,9 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
     If a table parameter value is set and the
     equivalent <literal>toast.</literal> parameter is not, the TOAST table
     will use the table's parameter value.
-    Specifying these parameters for partitioned tables is not supported,
-    but you may specify them for individual leaf partitions.
+    These parameters, with the exception of <literal>parallel_workers</literal>,
+    are not supported on partitioned tables, but you may specify them for
+    individual leaf partitions.
    </para>
 
    <variablelist>
@@ -1401,9 +1402,13 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
      <para>
       This sets the number of workers that should be used to assist a parallel
       scan of this table.  If not set, the system will determine a value based
-      on the relation size.  The actual number of workers chosen by the planner
-      or by utility statements that use parallel scans may be less, for example
-      due to the setting of <xref linkend="guc-max-worker-processes"/>.
+      on the relation size and the number of scanned partitions.
+      When set on a partitioned table, the specified number of workers will
+      work on distinct partitions, so the number of partitions affected by the
+      parallel operation should be taken into account.
+      The actual number of workers chosen by the planner or by utility
+      statements that use parallel scans may be less, for example due
+      to the setting of <xref linkend="guc-max-worker-processes"/>.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index c687d3ee9e..f8443d2361 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -377,7 +377,7 @@ static relopt_int intRelOpts[] =
 		{
 			"parallel_workers",
 			"Number of parallel processes that can be used per executor node for this relation.",
-			RELOPT_KIND_HEAP,
+			RELOPT_KIND_HEAP | RELOPT_KIND_PARTITIONED,
 			ShareUpdateExclusiveLock
 		},
 		-1, 0, 1024
@@ -1962,12 +1962,18 @@ bytea *
 partitioned_table_reloptions(Datum reloptions, bool validate)
 {
 	/*
-	 * There are no options for partitioned tables yet, but this is able to do
-	 * some validation.
+	 * Currently the only setting known to be useful for partitioned tables
+	 * is parallel_workers.
 	 */
+	static const relopt_parse_elt tab[] = {
+		{"parallel_workers", RELOPT_TYPE_INT,
+		offsetof(PartitionedTableRdOptions, parallel_workers)},
+	};
+
 	return (bytea *) build_reloptions(reloptions, validate,
 									  RELOPT_KIND_PARTITIONED,
-									  0, NULL, 0);
+									  sizeof(PartitionedTableRdOptions),
+									  tab, lengthof(tab));
 }
 
 /*
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index d73ac562eb..e22cb6f33e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -97,6 +97,9 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 								Index rti, RangeTblEntry *rte);
 static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 									Index rti, RangeTblEntry *rte);
+static int compute_append_parallel_workers(RelOptInfo *rel, List *subpaths,
+										   int num_live_children,
+										   bool parallel_append);
 static void generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
 										 List *live_childrels,
 										 List *all_child_pathkeys);
@@ -1268,6 +1271,65 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	add_paths_to_append_rel(root, rel, live_childrels);
 }
 
+/*
+ * compute_append_parallel_workers
+ * 		Computes the number of workers to assign to scan the subpaths appended
+ * 		by a given Append path
+ */
+static int
+compute_append_parallel_workers(RelOptInfo *rel, List *subpaths,
+								int num_live_children,
+								bool parallel_append)
+{
+	ListCell   *lc;
+	int			parallel_workers = 0;
+
+	/*
+	 * For partitioned rels, first see if there is a root-level setting for
+	 * parallel_workers.  But honor it only when a Parallel Append plan
+	 * is being considered.
+	 */
+	if (IS_PARTITIONED_REL(rel) &&
+		parallel_append &&
+		rel->rel_parallel_workers != -1)
+	{
+		parallel_workers = Min(rel->rel_parallel_workers,
+							   max_parallel_workers_per_gather);
+
+		/* an explicit setting overrides heuristics */
+		return parallel_workers;
+	}
+
+	/* Find the highest number of workers requested for any subpath. */
+	foreach(lc, subpaths)
+	{
+		Path	   *path = lfirst(lc);
+
+		parallel_workers = Max(parallel_workers, path->parallel_workers);
+	}
+	Assert(parallel_workers > 0 || subpaths == NIL);
+
+	/*
+	 * If the use of parallel append is permitted, always request at least
+	 * log2(# of children) workers.  We assume it can be useful to have
+	 * extra workers in this case because they will be spread out across
+	 * the children.  The precise formula is just a guess, but we don't
+	 * want to end up with a radically different answer for a table with N
+	 * partitions vs. an unpartitioned table with the same data, so the
+	 * use of some kind of log-scaling here seems to make some sense.
+	 */
+	if (parallel_append)
+	{
+		parallel_workers = Max(parallel_workers,
+							   fls(num_live_children));
+		parallel_workers = Min(parallel_workers,
+							   max_parallel_workers_per_gather);
+	}
+	Assert(parallel_workers > 0);
+
+	return parallel_workers;
+}
+
 
 /*
  * add_paths_to_append_rel
@@ -1464,50 +1526,28 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 	if (partial_subpaths_valid && partial_subpaths != NIL)
 	{
 		AppendPath *appendpath;
-		ListCell   *lc;
-		int			parallel_workers = 0;
+		int			parallel_workers =
+			compute_append_parallel_workers(rel, partial_subpaths,
+											list_length(live_childrels),
+											enable_parallel_append);
 
-		/* Find the highest number of workers requested for any subpath. */
-		foreach(lc, partial_subpaths)
+		if (parallel_workers > 0)
 		{
-			Path	   *path = lfirst(lc);
+			/* Generate a partial append path. */
+			appendpath = create_append_path(root, rel, NIL, partial_subpaths,
+											NIL, NULL, parallel_workers,
+											enable_parallel_append,
+											-1);
 
-			parallel_workers = Max(parallel_workers, path->parallel_workers);
-		}
-		Assert(parallel_workers > 0);
+			/*
+			 * Make sure any subsequent partial paths use the same row count
+			 * estimate.
+			 */
+			partial_rows = appendpath->path.rows;
 
-		/*
-		 * If the use of parallel append is permitted, always request at least
-		 * log2(# of children) workers.  We assume it can be useful to have
-		 * extra workers in this case because they will be spread out across
-		 * the children.  The precise formula is just a guess, but we don't
-		 * want to end up with a radically different answer for a table with N
-		 * partitions vs. an unpartitioned table with the same data, so the
-		 * use of some kind of log-scaling here seems to make some sense.
-		 */
-		if (enable_parallel_append)
-		{
-			parallel_workers = Max(parallel_workers,
-								   fls(list_length(live_childrels)));
-			parallel_workers = Min(parallel_workers,
-								   max_parallel_workers_per_gather);
+			/* Add the path */
+			add_partial_path(rel, (Path *) appendpath);
 		}
-		Assert(parallel_workers > 0);
-
-		/* Generate a partial append path. */
-		appendpath = create_append_path(root, rel, NIL, partial_subpaths,
-										NIL, NULL, parallel_workers,
-										enable_parallel_append,
-										-1);
-
-		/*
-		 * Make sure any subsequent partial paths use the same row count
-		 * estimate.
-		 */
-		partial_rows = appendpath->path.rows;
-
-		/* Add the path. */
-		add_partial_path(rel, (Path *) appendpath);
 	}
 
 	/*
@@ -1519,36 +1559,19 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 	if (pa_subpaths_valid && pa_nonpartial_subpaths != NIL)
 	{
 		AppendPath *appendpath;
-		ListCell   *lc;
-		int			parallel_workers = 0;
+		int			parallel_workers =
+			compute_append_parallel_workers(rel, pa_partial_subpaths,
+											list_length(live_childrels),
+											true);
 
-		/*
-		 * Find the highest number of workers requested for any partial
-		 * subpath.
-		 */
-		foreach(lc, pa_partial_subpaths)
+		if (parallel_workers > 0)
 		{
-			Path	   *path = lfirst(lc);
-
-			parallel_workers = Max(parallel_workers, path->parallel_workers);
+			appendpath = create_append_path(root, rel, pa_nonpartial_subpaths,
+											pa_partial_subpaths,
+											NIL, NULL, parallel_workers, true,
+											partial_rows);
+			add_partial_path(rel, (Path *) appendpath);
 		}
-
-		/*
-		 * Same formula here as above.  It's even more important in this
-		 * instance because the non-partial paths won't contribute anything to
-		 * the planned number of parallel workers.
-		 */
-		parallel_workers = Max(parallel_workers,
-							   fls(list_length(live_childrels)));
-		parallel_workers = Min(parallel_workers,
-							   max_parallel_workers_per_gather);
-		Assert(parallel_workers > 0);
-
-		appendpath = create_append_path(root, rel, pa_nonpartial_subpaths,
-										pa_partial_subpaths,
-										NIL, NULL, parallel_workers, true,
-										partial_rows);
-		add_partial_path(rel, (Path *) appendpath);
 	}
 
 	/*
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 545b56bcaf..ebb99a71d2 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3938,6 +3938,9 @@ make_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 	grouped_rel->useridiscurrent = input_rel->useridiscurrent;
 	grouped_rel->fdwroutine = input_rel->fdwroutine;
 
+	/* Copy parallel_workers. */
+	grouped_rel->rel_parallel_workers = input_rel->rel_parallel_workers;
+
 	return grouped_rel;
 }
 
@@ -7605,6 +7608,20 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
 	Assert(patype != PARTITIONWISE_AGGREGATE_PARTIAL ||
 		   partially_grouped_rel != NULL);
 
+	/* Like input rel, make fully grouped rel appear partitioned too. */
+	Assert(IS_PARTITIONED_REL(input_rel));
+	if (patype == PARTITIONWISE_AGGREGATE_FULL)
+	{
+		grouped_rel->part_scheme = input_rel->part_scheme;
+		grouped_rel->partexprs = input_rel->partexprs;
+		grouped_rel->nullable_partexprs = input_rel->nullable_partexprs;
+		grouped_rel->boundinfo = input_rel->boundinfo;
+		grouped_rel->nparts = nparts;
+		Assert(grouped_rel->part_rels == NULL);
+		grouped_rel->part_rels =
+			(RelOptInfo **) palloc0(sizeof(RelOptInfo *) * nparts);
+	}
+
 	/* Add paths for partitionwise aggregation/grouping. */
 	for (cnt_parts = 0; cnt_parts < nparts; cnt_parts++)
 	{
@@ -7680,6 +7697,8 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
 			set_cheapest(child_grouped_rel);
 			grouped_live_children = lappend(grouped_live_children,
 											child_grouped_rel);
+			Assert(grouped_rel->part_rels[cnt_parts] == NULL);
+			grouped_rel->part_rels[cnt_parts] = child_grouped_rel;
 		}
 
 		pfree(appinfos);
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 10b63982c0..efa7b714cb 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -308,6 +308,16 @@ typedef struct StdRdOptions
 	bool		vacuum_truncate;	/* enables vacuum to truncate a relation */
 } StdRdOptions;
 
+/*
+ * PartitionedTableRdOptions
+ * 		Contents of rd_options for partitioned tables
+ */
+typedef struct PartitionedTableRdOptions
+{
+	int32		vl_len_;		/* varlena header (do not touch directly!) */
+	int			parallel_workers;	/* max number of parallel workers */
+} PartitionedTableRdOptions;
+
 #define HEAP_MIN_FILLFACTOR			10
 #define HEAP_DEFAULT_FILLFACTOR		100
 
@@ -359,7 +369,10 @@ typedef struct StdRdOptions
  */
 #define RelationGetParallelWorkers(relation, defaultpw) \
 	((relation)->rd_options ? \
-	 ((StdRdOptions *) (relation)->rd_options)->parallel_workers : (defaultpw))
+	 ((relation)->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ? \
+	  ((PartitionedTableRdOptions *) (relation)->rd_options)->parallel_workers : \
+	  ((StdRdOptions *) (relation)->rd_options)->parallel_workers \
+	 ) : (defaultpw))
 
 /* ViewOptions->check_option values */
 typedef enum ViewOptCheckOption
diff --git a/src/test/regress/expected/partition_aggregate.out b/src/test/regress/expected/partition_aggregate.out
index dfa4b036b5..2dbdf39957 100644
--- a/src/test/regress/expected/partition_aggregate.out
+++ b/src/test/regress/expected/partition_aggregate.out
@@ -1166,9 +1166,80 @@ SELECT a, sum(b), count(*) FROM pagg_tab_ml GROUP BY a, b, c HAVING avg(b) > 7 O
  29 | 4500 |   500
 (12 rows)
 
--- Parallelism within partitionwise aggregates
+-- Override "parallel_workers" for a partitioned table
 SET min_parallel_table_scan_size TO '8kB';
 SET parallel_setup_cost TO 0;
+SET max_parallel_workers_per_gather TO 8;
+EXPLAIN (COSTS OFF)
+SELECT a FROM pagg_tab_ml WHERE b = 42;
+                            QUERY PLAN                            
+------------------------------------------------------------------
+ Gather
+   Workers Planned: 4
+   ->  Parallel Append
+         ->  Parallel Seq Scan on pagg_tab_ml_p1 pagg_tab_ml_1
+               Filter: (b = 42)
+         ->  Parallel Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_2
+               Filter: (b = 42)
+         ->  Parallel Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_3
+               Filter: (b = 42)
+(9 rows)
+
+ALTER TABLE pagg_tab_ml SET (parallel_workers = 6);
+EXPLAIN (COSTS OFF)
+SELECT a FROM pagg_tab_ml WHERE b = 42;
+                            QUERY PLAN                            
+------------------------------------------------------------------
+ Gather
+   Workers Planned: 6
+   ->  Parallel Append
+         ->  Parallel Seq Scan on pagg_tab_ml_p1 pagg_tab_ml_1
+               Filter: (b = 42)
+         ->  Parallel Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_2
+               Filter: (b = 42)
+         ->  Parallel Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_3
+               Filter: (b = 42)
+(9 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT a, count(*) FROM pagg_tab_ml GROUP BY a;
+                             QUERY PLAN                              
+---------------------------------------------------------------------
+ Gather
+   Workers Planned: 6
+   ->  Parallel Append
+         ->  HashAggregate
+               Group Key: pagg_tab_ml_5.a
+               ->  Append
+                     ->  Seq Scan on pagg_tab_ml_p3_s1 pagg_tab_ml_5
+                     ->  Seq Scan on pagg_tab_ml_p3_s2 pagg_tab_ml_6
+         ->  HashAggregate
+               Group Key: pagg_tab_ml.a
+               ->  Seq Scan on pagg_tab_ml_p1 pagg_tab_ml
+         ->  HashAggregate
+               Group Key: pagg_tab_ml_2.a
+               ->  Append
+                     ->  Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_2
+                     ->  Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_3
+(16 rows)
+
+ALTER TABLE pagg_tab_ml SET (parallel_workers = 0);
+EXPLAIN (COSTS OFF)
+SELECT a FROM pagg_tab_ml WHERE b = 42;
+                    QUERY PLAN                     
+---------------------------------------------------
+ Append
+   ->  Seq Scan on pagg_tab_ml_p1 pagg_tab_ml_1
+         Filter: (b = 42)
+   ->  Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_2
+         Filter: (b = 42)
+   ->  Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_3
+         Filter: (b = 42)
+(7 rows)
+
+ALTER TABLE pagg_tab_ml RESET (parallel_workers);
+-- Parallelism within partitionwise aggregates
+SET max_parallel_workers_per_gather TO 2;
 -- Full aggregation at level 1 as GROUP BY clause matches with PARTITION KEY
 -- for level 1 only. For subpartitions, GROUP BY clause does not match with
 -- PARTITION KEY, thus we will have a partial aggregation for them.
diff --git a/src/test/regress/sql/partition_aggregate.sql b/src/test/regress/sql/partition_aggregate.sql
index c17294b15b..1269a86792 100644
--- a/src/test/regress/sql/partition_aggregate.sql
+++ b/src/test/regress/sql/partition_aggregate.sql
@@ -253,10 +253,25 @@ EXPLAIN (COSTS OFF)
 SELECT a, sum(b), count(*) FROM pagg_tab_ml GROUP BY a, b, c HAVING avg(b) > 7 ORDER BY 1, 2, 3;
 SELECT a, sum(b), count(*) FROM pagg_tab_ml GROUP BY a, b, c HAVING avg(b) > 7 ORDER BY 1, 2, 3;
 
--- Parallelism within partitionwise aggregates
-
+-- Override "parallel_workers" for a partitioned table
 SET min_parallel_table_scan_size TO '8kB';
 SET parallel_setup_cost TO 0;
+SET max_parallel_workers_per_gather TO 8;
+EXPLAIN (COSTS OFF)
+SELECT a FROM pagg_tab_ml WHERE b = 42;
+ALTER TABLE pagg_tab_ml SET (parallel_workers = 6);
+EXPLAIN (COSTS OFF)
+SELECT a FROM pagg_tab_ml WHERE b = 42;
+EXPLAIN (COSTS OFF)
+SELECT a, count(*) FROM pagg_tab_ml GROUP BY a;
+ALTER TABLE pagg_tab_ml SET (parallel_workers = 0);
+EXPLAIN (COSTS OFF)
+SELECT a FROM pagg_tab_ml WHERE b = 42;
+ALTER TABLE pagg_tab_ml RESET (parallel_workers);
+
+-- Parallelism within partitionwise aggregates
+
+SET max_parallel_workers_per_gather TO 2;
 
 -- Full aggregation at level 1 as GROUP BY clause matches with PARTITION KEY
 -- for level 1 only. For subpartitions, GROUP BY clause does not match with
-- 
2.26.2

#25Amit Langote
amitlangote09@gmail.com
In reply to: Laurenz Albe (#24)
Re: A reloption for partitioned tables - parallel_workers

On Fri, Mar 5, 2021 at 10:47 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Wed, 2021-03-03 at 17:58 +0900, Amit Langote wrote:

For example, with the attached PoC patch:

I have incorporated your POC patch and added a regression test.

I didn't test it thoroughly though.

Thanks. Although, I wonder if we should rather consider it a
standalone patch to fix a partition planning code deficiency.

--
Amit Langote
EDB: http://www.enterprisedb.com

#26Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Amit Langote (#25)
Re: A reloption for partitioned tables - parallel_workers

On Fri, 2021-03-05 at 22:55 +0900, Amit Langote wrote:

On Fri, Mar 5, 2021 at 10:47 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Wed, 2021-03-03 at 17:58 +0900, Amit Langote wrote:

For example, with the attached PoC patch:

I have incorporated your POC patch and added a regression test.

I didn't test it thoroughly though.

Thanks. Although, I wonder if we should rather consider it a
standalone patch to fix a partition planning code deficiency.

Oh - I didn't realize that your patch was independent.

Yours,
Laurenz Albe

#27Amit Langote
amitlangote09@gmail.com
In reply to: Laurenz Albe (#26)
2 attachment(s)
Re: A reloption for partitioned tables - parallel_workers

On Fri, Mar 5, 2021 at 11:06 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Fri, 2021-03-05 at 22:55 +0900, Amit Langote wrote:

On Fri, Mar 5, 2021 at 10:47 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Wed, 2021-03-03 at 17:58 +0900, Amit Langote wrote:

For example, with the attached PoC patch:

I have incorporated your POC patch and added a regression test.

I didn't test it thoroughly though.

Thanks. Although, I wonder if we should rather consider it a
standalone patch to fix a partition planning code deficiency.

Oh - I didn't realize that your patch was independent.

Attached a new version rebased over c8f78b616, with the grouping
relation partitioning enhancements as a separate patch 0001. Sorry
about the delay.

I'd also like to change compute_append_parallel_workers(), as mentioned
upthread, such that disabling Parallel Append by setting
parallel_workers=0 on a parent partitioned table does not also prevent
the partitions themselves from being scanned in parallel underneath a
plain Append.  I didn't get time today to work on that, though.
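
For concreteness, a minimal sketch of the difference, reusing
pagg_tab_ml from the regression test (the "refined" plan shape below is
only a reading of the suggestion, not output from an actual build):

ALTER TABLE pagg_tab_ml SET (parallel_workers = 0);
EXPLAIN (COSTS OFF) SELECT a FROM pagg_tab_ml WHERE b = 42;
-- as posted, this yields a fully serial plan:
--   Append -> Seq Scan on each partition
-- the suggested change would disable only Parallel Append, roughly:
--   Gather -> Append -> Parallel Seq Scan on each partition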

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachments:

v5-0001-Mark-fully-grouped-relations-partitioned-if-input.patch
From 140866a7fbd18044301fc86580ff0602bec681c4 Mon Sep 17 00:00:00 2001
From: amitlan <amitlangote09@gmail.com>
Date: Thu, 18 Mar 2021 21:34:21 +0900
Subject: [PATCH v5 1/2] Mark fully grouped relations partitioned if input
 relation is

When a grouped aggregate is pushed to individual partitions of a
partitioned base/join relation (partitionwise aggregation enabled),
the parent grouping upper rel follows the same partitioning scheme as
the input parent base/join rel in principle, although that isn't
reflected in the former's RelOptInfo.

This commit fixes that situation.  The main benefit is that anything
that relies on using IS_PARTITIONED_REL() macro to apply certain
operations to partitioned relations can now also recognize grouping
rels.
---
 src/backend/optimizer/plan/planner.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 424d25cbd5..2317231be5 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -7620,6 +7620,23 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
 	Assert(patype != PARTITIONWISE_AGGREGATE_PARTIAL ||
 		   partially_grouped_rel != NULL);
 
+	/*
+	 * Make fully grouped rels appear partitioned like the input rel,
+	 * with the same partitioning properties as the latter.
+	 */
+	Assert(IS_PARTITIONED_REL(input_rel));
+	if (patype == PARTITIONWISE_AGGREGATE_FULL)
+	{
+		grouped_rel->part_scheme = input_rel->part_scheme;
+		grouped_rel->partexprs = input_rel->partexprs;
+		grouped_rel->nullable_partexprs = input_rel->nullable_partexprs;
+		grouped_rel->boundinfo = input_rel->boundinfo;
+		grouped_rel->nparts = nparts;
+		Assert(grouped_rel->part_rels == NULL);
+		grouped_rel->part_rels =
+			(RelOptInfo **) palloc0(sizeof(RelOptInfo *) * nparts);
+	}
+
 	/* Add paths for partitionwise aggregation/grouping. */
 	for (cnt_parts = 0; cnt_parts < nparts; cnt_parts++)
 	{
@@ -7695,6 +7712,8 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
 			set_cheapest(child_grouped_rel);
 			grouped_live_children = lappend(grouped_live_children,
 											child_grouped_rel);
+			Assert(grouped_rel->part_rels[cnt_parts] == NULL);
+			grouped_rel->part_rels[cnt_parts] = child_grouped_rel;
 		}
 
 		pfree(appinfos);
-- 
2.24.1

v5-0002-Allow-setting-parallel_workers-on-partitioned-tab.patch
From af6e6ddff64829c1b79f01f57fcd241cd25e10f1 Mon Sep 17 00:00:00 2001
From: Laurenz Albe <laurenz.albe@cybertec.at>
Date: Mon, 1 Mar 2021 16:06:49 +0100
Subject: [PATCH v5 2/2] Allow setting parallel_workers on partitioned tables.

Sometimes it's beneficial to do

ALTER TABLE my_part_tbl SET (parallel_workers = 32);

when the query planner plans too few workers to read from your
partitions, for example if you have a partitioned table where each
partition is a foreign table that cannot itself have this setting.
Shown to give a 100%+ performance increase (in my case, 700%) when used
with cstore_fdw column store partitions.

This is also useful to lower the number of parallel workers, for
example when the executor is expected to prune most partitions.

Authors: Seamus Abshere, Amit Langote, Laurenz Albe
Discussion: https://postgr.es/m/95b1dd96-8634-4545-b1de-e2ac779beb44%40www.fastmail.com
---
 doc/src/sgml/ref/create_table.sgml            |  13 +-
 src/backend/access/common/reloptions.c        |   6 +-
 src/backend/optimizer/path/allpaths.c         | 155 ++++++++++--------
 src/backend/optimizer/plan/planner.c          |   3 +
 src/include/utils/rel.h                       |  22 ++-
 .../regress/expected/partition_aggregate.out  |  51 +++++-
 src/test/regress/sql/partition_aggregate.sql  |  17 +-
 7 files changed, 183 insertions(+), 84 deletions(-)

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 1fe4fb6e36..af0bb62625 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -1339,7 +1339,8 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
     equivalent <literal>toast.</literal> parameter is not, the TOAST table
     will use the table's parameter value.
     These parameters, with the exception of
-    <literal>parallel_insert_enabled</literal>, are not supported on partitioned
+    <literal>parallel_insert_enabled</literal> and
+    <literal>parallel_workers</literal>, are not supported on partitioned
     tables, but may be specified for individual leaf partitions.
    </para>
 
@@ -1403,9 +1404,13 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
      <para>
       This sets the number of workers that should be used to assist a parallel
       scan of this table.  If not set, the system will determine a value based
-      on the relation size.  The actual number of workers chosen by the planner
-      or by utility statements that use parallel scans may be less, for example
-      due to the setting of <xref linkend="guc-max-worker-processes"/>.
+      on the relation size and the number of scanned partitions.
+      When set on a partitioned table, the specified number of workers will
+      work on distinct partitions, so the number of partitions affected by the
+      parallel operation should be taken into account.
+      The actual number of workers chosen by the planner or by utility
+      statements that use parallel scans may be less, for example due
+      to the setting of <xref linkend="guc-max-worker-processes"/>.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index 5a0ae99750..e778800d56 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -386,7 +386,7 @@ static relopt_int intRelOpts[] =
 		{
 			"parallel_workers",
 			"Number of parallel processes that can be used per executor node for this relation.",
-			RELOPT_KIND_HEAP,
+			RELOPT_KIND_HEAP | RELOPT_KIND_PARTITIONED,
 			ShareUpdateExclusiveLock
 		},
 		-1, 0, 1024
@@ -1974,7 +1974,9 @@ partitioned_table_reloptions(Datum reloptions, bool validate)
 {
 	static const relopt_parse_elt tab[] = {
 		{"parallel_insert_enabled", RELOPT_TYPE_BOOL,
-		offsetof(PartitionedTableRdOptions, parallel_insert_enabled)}
+		offsetof(PartitionedTableRdOptions, parallel_insert_enabled)},
+		{"parallel_workers", RELOPT_TYPE_INT,
+		offsetof(PartitionedTableRdOptions, parallel_workers)},
 	};
 
 	return (bytea *) build_reloptions(reloptions, validate,
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index d73ac562eb..e22cb6f33e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -97,6 +97,9 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 								Index rti, RangeTblEntry *rte);
 static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 									Index rti, RangeTblEntry *rte);
+static int compute_append_parallel_workers(RelOptInfo *rel, List *subpaths,
+										   int num_live_children,
+										   bool parallel_append);
 static void generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
 										 List *live_childrels,
 										 List *all_child_pathkeys);
@@ -1268,6 +1271,65 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	add_paths_to_append_rel(root, rel, live_childrels);
 }
 
+/*
+ * compute_append_parallel_workers
+ * 		Computes the number of workers to assign to scan the subpaths appended
+ * 		by a given Append path
+ */
+static int
+compute_append_parallel_workers(RelOptInfo *rel, List *subpaths,
+								int num_live_children,
+								bool parallel_append)
+{
+	ListCell   *lc;
+	int			parallel_workers = 0;
+
+	/*
+	 * For partitioned rels, first see if there is a root-level setting for
+	 * parallel_workers.  But honor it only when a Parallel Append plan
+	 * is being considered.
+	 */
+	if (IS_PARTITIONED_REL(rel) &&
+		parallel_append &&
+		rel->rel_parallel_workers != -1)
+	{
+		parallel_workers = Min(rel->rel_parallel_workers,
+							   max_parallel_workers_per_gather);
+
+		/* an explicit setting overrides heuristics */
+		return parallel_workers;
+	}
+
+	/* Find the highest number of workers requested for any subpath. */
+	foreach(lc, subpaths)
+	{
+		Path	   *path = lfirst(lc);
+
+		parallel_workers = Max(parallel_workers, path->parallel_workers);
+	}
+	Assert(parallel_workers > 0 || subpaths == NIL);
+
+	/*
+	 * If the use of parallel append is permitted, always request at least
+	 * log2(# of children) workers.  We assume it can be useful to have
+	 * extra workers in this case because they will be spread out across
+	 * the children.  The precise formula is just a guess, but we don't
+	 * want to end up with a radically different answer for a table with N
+	 * partitions vs. an unpartitioned table with the same data, so the
+	 * use of some kind of log-scaling here seems to make some sense.
+	 */
+	if (parallel_append)
+	{
+		parallel_workers = Max(parallel_workers,
+							   fls(num_live_children));
+		parallel_workers = Min(parallel_workers,
+							   max_parallel_workers_per_gather);
+	}
+	Assert(parallel_workers > 0);
+
+	return parallel_workers;
+}
+
 
 /*
  * add_paths_to_append_rel
@@ -1464,50 +1526,28 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 	if (partial_subpaths_valid && partial_subpaths != NIL)
 	{
 		AppendPath *appendpath;
-		ListCell   *lc;
-		int			parallel_workers = 0;
+		int			parallel_workers =
+			compute_append_parallel_workers(rel, partial_subpaths,
+											list_length(live_childrels),
+											enable_parallel_append);
 
-		/* Find the highest number of workers requested for any subpath. */
-		foreach(lc, partial_subpaths)
+		if (parallel_workers > 0)
 		{
-			Path	   *path = lfirst(lc);
+			/* Generate a partial append path. */
+			appendpath = create_append_path(root, rel, NIL, partial_subpaths,
+											NIL, NULL, parallel_workers,
+											enable_parallel_append,
+											-1);
 
-			parallel_workers = Max(parallel_workers, path->parallel_workers);
-		}
-		Assert(parallel_workers > 0);
+			/*
+			 * Make sure any subsequent partial paths use the same row count
+			 * estimate.
+			 */
+			partial_rows = appendpath->path.rows;
 
-		/*
-		 * If the use of parallel append is permitted, always request at least
-		 * log2(# of children) workers.  We assume it can be useful to have
-		 * extra workers in this case because they will be spread out across
-		 * the children.  The precise formula is just a guess, but we don't
-		 * want to end up with a radically different answer for a table with N
-		 * partitions vs. an unpartitioned table with the same data, so the
-		 * use of some kind of log-scaling here seems to make some sense.
-		 */
-		if (enable_parallel_append)
-		{
-			parallel_workers = Max(parallel_workers,
-								   fls(list_length(live_childrels)));
-			parallel_workers = Min(parallel_workers,
-								   max_parallel_workers_per_gather);
+			/* Add the path */
+			add_partial_path(rel, (Path *) appendpath);
 		}
-		Assert(parallel_workers > 0);
-
-		/* Generate a partial append path. */
-		appendpath = create_append_path(root, rel, NIL, partial_subpaths,
-										NIL, NULL, parallel_workers,
-										enable_parallel_append,
-										-1);
-
-		/*
-		 * Make sure any subsequent partial paths use the same row count
-		 * estimate.
-		 */
-		partial_rows = appendpath->path.rows;
-
-		/* Add the path. */
-		add_partial_path(rel, (Path *) appendpath);
 	}
 
 	/*
@@ -1519,36 +1559,19 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
 	if (pa_subpaths_valid && pa_nonpartial_subpaths != NIL)
 	{
 		AppendPath *appendpath;
-		ListCell   *lc;
-		int			parallel_workers = 0;
+		int			parallel_workers =
+			compute_append_parallel_workers(rel, pa_partial_subpaths,
+											list_length(live_childrels),
+											true);
 
-		/*
-		 * Find the highest number of workers requested for any partial
-		 * subpath.
-		 */
-		foreach(lc, pa_partial_subpaths)
+		if (parallel_workers > 0)
 		{
-			Path	   *path = lfirst(lc);
-
-			parallel_workers = Max(parallel_workers, path->parallel_workers);
+			appendpath = create_append_path(root, rel, pa_nonpartial_subpaths,
+											pa_partial_subpaths,
+											NIL, NULL, parallel_workers, true,
+											partial_rows);
+			add_partial_path(rel, (Path *) appendpath);
 		}
-
-		/*
-		 * Same formula here as above.  It's even more important in this
-		 * instance because the non-partial paths won't contribute anything to
-		 * the planned number of parallel workers.
-		 */
-		parallel_workers = Max(parallel_workers,
-							   fls(list_length(live_childrels)));
-		parallel_workers = Min(parallel_workers,
-							   max_parallel_workers_per_gather);
-		Assert(parallel_workers > 0);
-
-		appendpath = create_append_path(root, rel, pa_nonpartial_subpaths,
-										pa_partial_subpaths,
-										NIL, NULL, parallel_workers, true,
-										partial_rows);
-		add_partial_path(rel, (Path *) appendpath);
 	}
 
 	/*
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 2317231be5..7e268fdaf6 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3953,6 +3953,9 @@ make_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 	grouped_rel->useridiscurrent = input_rel->useridiscurrent;
 	grouped_rel->fdwroutine = input_rel->fdwroutine;
 
+	/* Copy parallel_workers. */
+	grouped_rel->rel_parallel_workers = input_rel->rel_parallel_workers;
+
 	return grouped_rel;
 }
 
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 5375a37dd1..0026420b7c 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -354,15 +354,6 @@ typedef struct StdRdOptions
 	  (relation)->rd_rel->relkind == RELKIND_MATVIEW) ? \
 	 ((StdRdOptions *) (relation)->rd_options)->user_catalog_table : false)
 
-/*
- * RelationGetParallelWorkers
- *		Returns the relation's parallel_workers reloption setting.
- *		Note multiple eval of argument!
- */
-#define RelationGetParallelWorkers(relation, defaultpw) \
-	((relation)->rd_options ? \
-	 ((StdRdOptions *) (relation)->rd_options)->parallel_workers : (defaultpw))
-
 /* ViewOptions->check_option values */
 typedef enum ViewOptCheckOption
 {
@@ -434,6 +425,7 @@ typedef struct PartitionedTableRdOptions
 	int32		vl_len_;		/* varlena header (do not touch directly!) */
 	bool		parallel_insert_enabled;	/* enables planner's use of
 											 * parallel insert */
+	int			parallel_workers;	/* max number of parallel workers */
 } PartitionedTableRdOptions;
 
 /*
@@ -448,6 +440,18 @@ typedef struct PartitionedTableRdOptions
 	 ((StdRdOptions *) (relation)->rd_options)->parallel_insert_enabled) :		\
 	 (defaultpd))
 
+/*
+ * RelationGetParallelWorkers
+ *		Returns the relation's parallel_workers reloption setting.
+ *		Note multiple eval of argument!
+ */
+#define RelationGetParallelWorkers(relation, defaultpw) \
+	((relation)->rd_options ? \
+	 ((relation)->rd_rel->relkind == RELKIND_PARTITIONED_TABLE ? \
+	  ((PartitionedTableRdOptions *) (relation)->rd_options)->parallel_workers : \
+	  ((StdRdOptions *) (relation)->rd_options)->parallel_workers \
+	 ) : (defaultpw))
+
 /*
  * RelationIsValid
  *		True iff relation descriptor is valid.
diff --git a/src/test/regress/expected/partition_aggregate.out b/src/test/regress/expected/partition_aggregate.out
index dfa4b036b5..611dfc87ff 100644
--- a/src/test/regress/expected/partition_aggregate.out
+++ b/src/test/regress/expected/partition_aggregate.out
@@ -1166,9 +1166,58 @@ SELECT a, sum(b), count(*) FROM pagg_tab_ml GROUP BY a, b, c HAVING avg(b) > 7 O
  29 | 4500 |   500
 (12 rows)
 
--- Parallelism within partitionwise aggregates
+-- Override "parallel_workers" for a partitioned table
 SET min_parallel_table_scan_size TO '8kB';
 SET parallel_setup_cost TO 0;
+SET max_parallel_workers_per_gather TO 8;
+EXPLAIN (COSTS OFF)
+SELECT a FROM pagg_tab_ml WHERE b = 42;
+                            QUERY PLAN                            
+------------------------------------------------------------------
+ Gather
+   Workers Planned: 4
+   ->  Parallel Append
+         ->  Parallel Seq Scan on pagg_tab_ml_p1 pagg_tab_ml_1
+               Filter: (b = 42)
+         ->  Parallel Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_2
+               Filter: (b = 42)
+         ->  Parallel Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_3
+               Filter: (b = 42)
+(9 rows)
+
+ALTER TABLE pagg_tab_ml SET (parallel_workers = 6);
+EXPLAIN (COSTS OFF)
+SELECT a FROM pagg_tab_ml WHERE b = 42;
+                            QUERY PLAN                            
+------------------------------------------------------------------
+ Gather
+   Workers Planned: 6
+   ->  Parallel Append
+         ->  Parallel Seq Scan on pagg_tab_ml_p1 pagg_tab_ml_1
+               Filter: (b = 42)
+         ->  Parallel Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_2
+               Filter: (b = 42)
+         ->  Parallel Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_3
+               Filter: (b = 42)
+(9 rows)
+
+ALTER TABLE pagg_tab_ml SET (parallel_workers = 0);
+EXPLAIN (COSTS OFF)
+SELECT a FROM pagg_tab_ml WHERE b = 42;
+                    QUERY PLAN                     
+---------------------------------------------------
+ Append
+   ->  Seq Scan on pagg_tab_ml_p1 pagg_tab_ml_1
+         Filter: (b = 42)
+   ->  Seq Scan on pagg_tab_ml_p2_s1 pagg_tab_ml_2
+         Filter: (b = 42)
+   ->  Seq Scan on pagg_tab_ml_p2_s2 pagg_tab_ml_3
+         Filter: (b = 42)
+(7 rows)
+
+ALTER TABLE pagg_tab_ml RESET (parallel_workers);
+-- Parallelism within partitionwise aggregates
+SET max_parallel_workers_per_gather TO 2;
 -- Full aggregation at level 1 as GROUP BY clause matches with PARTITION KEY
 -- for level 1 only. For subpartitions, GROUP BY clause does not match with
 -- PARTITION KEY, thus we will have a partial aggregation for them.
diff --git a/src/test/regress/sql/partition_aggregate.sql b/src/test/regress/sql/partition_aggregate.sql
index c17294b15b..c058e2e181 100644
--- a/src/test/regress/sql/partition_aggregate.sql
+++ b/src/test/regress/sql/partition_aggregate.sql
@@ -253,10 +253,23 @@ EXPLAIN (COSTS OFF)
 SELECT a, sum(b), count(*) FROM pagg_tab_ml GROUP BY a, b, c HAVING avg(b) > 7 ORDER BY 1, 2, 3;
 SELECT a, sum(b), count(*) FROM pagg_tab_ml GROUP BY a, b, c HAVING avg(b) > 7 ORDER BY 1, 2, 3;
 
--- Parallelism within partitionwise aggregates
-
+-- Override "parallel_workers" for a partitioned table
 SET min_parallel_table_scan_size TO '8kB';
 SET parallel_setup_cost TO 0;
+SET max_parallel_workers_per_gather TO 8;
+EXPLAIN (COSTS OFF)
+SELECT a FROM pagg_tab_ml WHERE b = 42;
+ALTER TABLE pagg_tab_ml SET (parallel_workers = 6);
+EXPLAIN (COSTS OFF)
+SELECT a FROM pagg_tab_ml WHERE b = 42;
+ALTER TABLE pagg_tab_ml SET (parallel_workers = 0);
+EXPLAIN (COSTS OFF)
+SELECT a FROM pagg_tab_ml WHERE b = 42;
+ALTER TABLE pagg_tab_ml RESET (parallel_workers);
+
+-- Parallelism within partitionwise aggregates
+
+SET max_parallel_workers_per_gather TO 2;
 
 -- Full aggregation at level 1 as GROUP BY clause matches with PARTITION KEY
 -- for level 1 only. For subpartitions, GROUP BY clause does not match with
-- 
2.24.1

#28David Rowley
dgrowleyml@gmail.com
In reply to: Amit Langote (#27)
Re: A reloption for partitioned tables - parallel_workers

On Fri, 19 Mar 2021 at 02:07, Amit Langote <amitlangote09@gmail.com> wrote:

Attached a new version rebased over c8f78b616, with the grouping
relation partitioning enhancements as a separate patch 0001. Sorry
about the delay.

I had a quick look at this and wondered if the partitioned table's
parallel workers shouldn't be limited to the sum of the parallel
workers of the Append's subpaths?

It seems a bit weird to me that the following case requests 4 workers:

# create table lp (a int) partition by list(a);
# create table lp1 partition of lp for values in(1);
# insert into lp select 1 from generate_series(1,10000000) x;
# alter table lp1 set (parallel_workers = 2);
# alter table lp set (parallel_workers = 4);
# set max_parallel_workers_per_gather = 8;
# explain select count(*) from lp;
                                         QUERY PLAN
-------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=97331.63..97331.64 rows=1 width=8)
   ->  Gather  (cost=97331.21..97331.62 rows=4 width=8)
         Workers Planned: 4
         ->  Partial Aggregate  (cost=96331.21..96331.22 rows=1 width=8)
               ->  Parallel Seq Scan on lp1 lp  (cost=0.00..85914.57 rows=4166657 width=0)
(5 rows)

I can see a good argument that there should only be 2 workers here.

If someone sets the partitioned table's parallel_workers high so that
they get a large number of workers when no partitions are pruned
during planning, do they really want the same number of workers in
queries where a large number of partitions are pruned?
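
For example, extending the table above with a second partition
(hypothetical, untested):

# create table lp2 partition of lp for values in(2);
# explain select count(*) from lp where a = 1;
-- plan-time pruning leaves only lp1, yet with the posted patch the
-- root-level reloption would still request 4 workers for it.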

This problem gets a bit more complex in generic plans where the
planner can't prune anything but run-time pruning prunes many
partitions. I'm not so sure what to do about that, but the problem
does exist today to a lesser extent with the current method of
determining the append parallel workers.
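
A sketch of that generic-plan case, using the same table (hypothetical,
untested):

# prepare q(int) as select count(*) from lp where a = $1;
# set plan_cache_mode = force_generic_plan;
# explain execute q(1);
-- the planner cannot prune anything here, so it sizes the worker count
-- for all partitions; run-time pruning may then leave most workers
-- with nothing to scan.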

David

#29Laurenz Albe
laurenz.albe@cybertec.at
In reply to: David Rowley (#28)
Re: A reloption for partitioned tables - parallel_workers

On Wed, 2021-03-24 at 14:14 +1300, David Rowley wrote:

On Fri, 19 Mar 2021 at 02:07, Amit Langote <amitlangote09@gmail.com> wrote:

Attached a new version rebased over c8f78b616, with the grouping
relation partitioning enhancements as a separate patch 0001. Sorry
about the delay.

I had a quick look at this and wondered if the partitioned table's
parallel workers shouldn't be limited to the sum of the parallel
workers of the Append's subpaths?

It seems a bit weird to me that the following case requests 4 workers:

# create table lp (a int) partition by list(a);
# create table lp1 partition of lp for values in(1);
# insert into lp select 1 from generate_series(1,10000000) x;
# alter table lp1 set (parallel_workers = 2);
# alter table lp set (parallel_workers = 4);
# set max_parallel_workers_per_gather = 8;
# explain select count(*) from lp;
                                         QUERY PLAN
-------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=97331.63..97331.64 rows=1 width=8)
   ->  Gather  (cost=97331.21..97331.62 rows=4 width=8)
         Workers Planned: 4
         ->  Partial Aggregate  (cost=96331.21..96331.22 rows=1 width=8)
               ->  Parallel Seq Scan on lp1 lp  (cost=0.00..85914.57 rows=4166657 width=0)
(5 rows)

I can see a good argument that there should only be 2 workers here.

Good point, I agree.

If someone sets the partitioned table's parallel_workers high so that
they get a large number of workers when no partitions are pruned
during planning, do they really want the same number of workers in
queries where a large number of partitions are pruned?

This problem gets a bit more complex in generic plans where the
planner can't prune anything but run-time pruning prunes many
partitions. I'm not so sure what to do about that, but the problem
does exist today to a lesser extent with the current method of
determining the append parallel workers.

Also a good point. That would require changing the actual number of
parallel workers at execution time, but that is tricky.
If we go with your suggestion above, we'd have to disambiguate whether
the number of workers was set because a partition is large enough
to warrant a parallel scan (then it shouldn't be reduced if the executor
prunes partitions) or because of the number of partitions
(then it should be reduced).

Currently, we don't reduce parallelism if the executor prunes
partitions, so this could be seen as an independent problem.

I don't know if Seamus is still working on that; if not, we might
mark it as "returned with feedback".

Perhaps Amit's patch 0001 should go in independently.

I'll mark the patch as "waiting for author".

Yours,
Laurenz Albe

#30Amit Langote
amitlangote09@gmail.com
In reply to: Laurenz Albe (#29)
Re: A reloption for partitioned tables - parallel_workers

On Fri, Apr 2, 2021 at 11:36 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Wed, 2021-03-24 at 14:14 +1300, David Rowley wrote:

On Fri, 19 Mar 2021 at 02:07, Amit Langote <amitlangote09@gmail.com> wrote:

Attached a new version rebased over c8f78b616, with the grouping
relation partitioning enhancements as a separate patch 0001. Sorry
about the delay.

I had a quick look at this and wondered if the partitioned table's
parallel workers shouldn't be limited to the sum of the parallel
workers of the Append's subpaths?

It seems a bit weird to me that the following case requests 4 workers:

# create table lp (a int) partition by list(a);
# create table lp1 partition of lp for values in(1);
# insert into lp select 1 from generate_series(1,10000000) x;
# alter table lp1 set (parallel_workers = 2);
# alter table lp set (parallel_workers = 4);
# set max_parallel_workers_per_gather = 8;
# explain select count(*) from lp;
                                         QUERY PLAN
-------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=97331.63..97331.64 rows=1 width=8)
   ->  Gather  (cost=97331.21..97331.62 rows=4 width=8)
         Workers Planned: 4
         ->  Partial Aggregate  (cost=96331.21..96331.22 rows=1 width=8)
               ->  Parallel Seq Scan on lp1 lp  (cost=0.00..85914.57 rows=4166657 width=0)
(5 rows)

I can see a good argument that there should only be 2 workers here.

Good point, I agree.

If someone sets the partitioned table's parallel_workers high so that
they get a large number of workers when no partitions are pruned
during planning, do they really want the same number of workers in
queries where a large number of partitions are pruned?

This problem gets a bit more complex in generic plans where the
planner can't prune anything but run-time pruning prunes many
partitions. I'm not so sure what to do about that, but the problem
does exist today to a lesser extent with the current method of
determining the append parallel workers.

Also a good point. That would require changing the actual number of
parallel workers at execution time, but that is tricky.
If we go with your suggestion above, we'd have to disambiguate if
the number of workers is set because a partition is large enough
to warrant a parallel scan (then it shouldn't be reduced if the executor
prunes partitions) or if it is because of the number of partitions
(then it should be reduced).

Maybe we really want a parallel_append_workers reloption for partitioned
tables, instead of piggybacking on parallel_workers?
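
Something like this, purely hypothetical syntax that exists in no
posted patch:

ALTER TABLE my_part_tbl SET (parallel_append_workers = 8);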

I don't know if Seamus is still working on that; if not, we might
mark it as "returned with feedback".

I have to agree given the time left.

Perhaps Amit's patch 0001 should go in independently.

Perhaps, but maybe we should wait until something really needs that.

--
Amit Langote
EDB: http://www.enterprisedb.com

#31Daniel Gustafsson
daniel@yesql.se
In reply to: Amit Langote (#30)
Re: A reloption for partitioned tables - parallel_workers

On 6 Apr 2021, at 09:46, Amit Langote <amitlangote09@gmail.com> wrote:
On Fri, Apr 2, 2021 at 11:36 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

I don't know if Seamus is still working on that; if not, we might
mark it as "returned with feedback".

I have to agree given the time left.

This thread has stalled and the patch no longer applies. I propose that we
mark this Returned with Feedback; is that OK with you, Amit?

--
Daniel Gustafsson https://vmware.com/

#32Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Daniel Gustafsson (#31)
Re: A reloption for partitioned tables - parallel_workers

On Fri, 2021-09-03 at 18:24 +0200, Daniel Gustafsson wrote:

On 6 Apr 2021, at 09:46, Amit Langote <amitlangote09@gmail.com> wrote:
On Fri, Apr 2, 2021 at 11:36 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

I don't know if Seamus is still working on that; if not, we might
mark it as "returned with feedback".

I have to agree given the time left.

This thread has stalled and the patch no longer applies.  I propose that we
mark this Returned with Feedback; is that OK with you, Amit?

+1. That requires more thought.

Yours,
Laurenz Albe

#33Amit Langote
amitlangote09@gmail.com
In reply to: Laurenz Albe (#32)
Re: A reloption for partitioned tables - parallel_workers

On Sat, Sep 4, 2021 at 3:10 Laurenz Albe <laurenz.albe@cybertec.at> wrote:

On Fri, 2021-09-03 at 18:24 +0200, Daniel Gustafsson wrote:

On 6 Apr 2021, at 09:46, Amit Langote <amitlangote09@gmail.com> wrote:
On Fri, Apr 2, 2021 at 11:36 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

I don't know if Seamus is still working on that; if not, we might
mark it as "returned with feedback".

I have to agree given the time left.

This thread has stalled and the patch no longer applies.  I propose that we
mark this Returned with Feedback; is that OK with you, Amit?

+1. That requires more thought.

Yes, I think so too.
--
Amit Langote
EDB: http://www.enterprisedb.com

#34Daniel Gustafsson
daniel@yesql.se
In reply to: Amit Langote (#33)
Re: A reloption for partitioned tables - parallel_workers

On 4 Sep 2021, at 01:17, Amit Langote <amitlangote09@gmail.com> wrote:
On Sat, Sep 4, 2021 at 3:10 Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Fri, 2021-09-03 at 18:24 +0200, Daniel Gustafsson wrote:

This thread has stalled and the patch no longer applies. I propose that we
mark this Returned with Feedback, is that Ok with you Amit?

+1. That requires more thought.

Yes, I think so too.

Done that way, it can always be resubmitted in a later CF.

--
Daniel Gustafsson https://vmware.com/