Parallel seq. plan is not coming against inheritance or partition table

Started by Ashutosh Sharma · almost 9 years ago · 15 messages

#1 Ashutosh Sharma
ashu.coek88@gmail.com
1 attachment(s)

Hi All,

From following git commit onwards, parallel seq scan is never getting
selected for inheritance or partitioned tables.

<git-commit>
commit 51ee6f3160d2e1515ed6197594bda67eb99dc2cc
Author: Robert Haas <rhaas@postgresql.org>
Date: Wed Feb 15 13:37:24 2017 -0500

Replace min_parallel_relation_size with two new GUCs.
</git-commit>

Steps to reproduce:
==============
create table t1 (a integer);

create table t1_1 (check (a >=1 and a < 1000000)) inherits (t1);
create table t1_2 (check (a >= 1000000 and a < 2000000)) inherits (t1);

insert into t1_1 select generate_series(1, 900000);
insert into t1_2 select generate_series(1000000, 1900000);

analyze t1;
analyze t1_1;
analyze t1_2;

explain analyze select * from t1 where a < 50000 OR a > 1950000;

EXPLAIN ANALYZE output:
====================
1) Prior to "Replace min_parallel_relation_size with two new GUCs" commit,

postgres=# explain analyze select * from t1 where a < 50000 OR a > 1950000;
                                                           QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..25094.71 rows=48787 width=4) (actual time=0.431..184.264 rows=49999 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Append  (cost=0.00..19216.01 rows=20328 width=4) (actual time=0.025..167.465 rows=16666 loops=3)
         ->  Parallel Seq Scan on t1  (cost=0.00..0.00 rows=1 width=4) (actual time=0.001..0.001 rows=0 loops=3)
               Filter: ((a < 50000) OR (a > 1950000))
         ->  Parallel Seq Scan on t1_1  (cost=0.00..9608.00 rows=20252 width=4) (actual time=0.023..76.644 rows=16666 loops=3)
               Filter: ((a < 50000) OR (a > 1950000))
               Rows Removed by Filter: 283334
         ->  Parallel Seq Scan on t1_2  (cost=0.00..9608.01 rows=75 width=4) (actual time=89.505..89.505 rows=0 loops=3)
               Filter: ((a < 50000) OR (a > 1950000))
               Rows Removed by Filter: 300000
 Planning time: 0.343 ms
 Execution time: 188.624 ms
(14 rows)

2) From "Replace min_parallel_relation_size with two new GUCs" commit onwards,

postgres=# explain analyze select * from t1 where a < 50000 OR a > 1950000;
                                                     QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.00..34966.01 rows=50546 width=4) (actual time=0.021..375.747 rows=49999 loops=1)
   ->  Seq Scan on t1  (cost=0.00..0.00 rows=1 width=4) (actual time=0.004..0.004 rows=0 loops=1)
         Filter: ((a < 50000) OR (a > 1950000))
   ->  Seq Scan on t1_1  (cost=0.00..17483.00 rows=50365 width=4) (actual time=0.016..198.393 rows=49999 loops=1)
         Filter: ((a < 50000) OR (a > 1950000))
         Rows Removed by Filter: 850001
   ->  Seq Scan on t1_2  (cost=0.00..17483.01 rows=180 width=4) (actual time=173.310..173.310 rows=0 loops=1)
         Filter: ((a < 50000) OR (a > 1950000))
         Rows Removed by Filter: 900001
 Planning time: 0.812 ms
 Execution time: 377.831 ms
(11 rows)

RCA:
====
From "Replace min_parallel_relation_size with two new GUCs" commit
onwards, we are not assigning parallel workers for the child rel with
zero heap pages. This means we won't be able to create a partial
append path as this requires all the child rels within an Append Node
to have a partial path. Please check the following code snippet from
set_append_rel_pathlist().

        /* Same idea, but for a partial plan. */
        if (childrel->partial_pathlist != NIL)
            partial_subpaths = accumulate_append_subpath(partial_subpaths,
                                      linitial(childrel->partial_pathlist));
        else
            partial_subpaths_valid = false;

        .........
        .........

        /*
         * Consider an append of partial unordered, unparameterized partial paths.
         */
        if (partial_subpaths_valid)
        {
            ...........
            ...........

            /* Generate a partial append path. */
            appendpath = create_append_path(rel, partial_subpaths, NULL,
                                            parallel_workers);
            add_partial_path(rel, (Path *) appendpath);
        }

In short, no Gather path will be generated if a baserel with an Append
node contains any child rel without a partial path. This means that
just because one child rel with zero heap pages didn't get parallel
workers, the other child rels that were good candidates for a Parallel
Seq Scan had to fall back to a plain seq scan, which can be costlier.
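The gating logic above can be boiled down to a tiny model (an illustrative C sketch of the quoted code, not the actual planner data structures): the partial Append survives only if every child contributed a partial path, so a single empty child disqualifies the whole thing.

```c
#include <assert.h>

/*
 * Toy model of the set_append_rel_pathlist() gating quoted above: the
 * partial Append is considered only if EVERY child rel contributed a
 * partial path.  Illustrative sketch only, not planner code.
 */
static int
partial_append_valid(const int *child_has_partial_path, int nchildren)
{
	int		i;

	for (i = 0; i < nchildren; i++)
	{
		if (!child_has_partial_path[i])
			return 0;			/* partial_subpaths_valid = false */
	}
	return 1;					/* partial Append path can be generated */
}
```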

Fix:
====
Attached is the patch that fixes this issue.

--
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com

Attachments:

assign_par_workers_for_empty_childRel_v1.patch (invalid/octet-stream)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 633b5c1..d65eb1b 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -2934,6 +2934,13 @@ compute_parallel_worker(RelOptInfo *rel, BlockNumber heap_pages,
 
 			parallel_workers = heap_parallel_workers;
 		}
+		else
+			/*
+			 * In case of inheritance child table with zero heap pages
+			 * we assign at least 1 parallel worker so as to generate
+			 * partial path.
+			 */
+			parallel_workers = 1;
 
 		if (index_pages > 0)
 		{
#2 Amit Kapila
amit.kapila16@gmail.com
In reply to: Ashutosh Sharma (#1)
Re: Parallel seq. plan is not coming against inheritance or partition table

On Sat, Mar 4, 2017 at 9:52 PM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

Hi All,

From following git commit onwards, parallel seq scan is never getting
selected for inheritance or partitioned tables.

<git-commit>
commit 51ee6f3160d2e1515ed6197594bda67eb99dc2cc
Author: Robert Haas <rhaas@postgresql.org>
Date: Wed Feb 15 13:37:24 2017 -0500

Replace min_parallel_relation_size with two new GUCs.
</git-commit>

Steps to reproduce:
==============
create table t1 (a integer);

create table t1_1 (check (a >=1 and a < 1000000)) inherits (t1);
create table t1_2 (check (a >= 1000000 and a < 2000000)) inherits (t1);

insert into t1_1 select generate_series(1, 900000);
insert into t1_2 select generate_series(1000000, 1900000);

analyze t1;
analyze t1_1;
analyze t1_2;

explain analyze select * from t1 where a < 50000 OR a > 1950000;

EXPLAIN ANALYZE output:
====================
1) Prior to "Replace min_parallel_relation_size with two new GUCs" commit,

postgres=# explain analyze select * from t1 where a < 50000 OR a > 1950000;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..25094.71 rows=48787 width=4) (actual
time=0.431..184.264 rows=49999 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Append (cost=0.00..19216.01 rows=20328 width=4) (actual
time=0.025..167.465 rows=16666 loops=3)
-> Parallel Seq Scan on t1 (cost=0.00..0.00 rows=1 width=4)
(actual time=0.001..0.001 rows=0 loops=3)
Filter: ((a < 50000) OR (a > 1950000))
-> Parallel Seq Scan on t1_1 (cost=0.00..9608.00 rows=20252
width=4) (actual time=0.023..76.644 rows=16666 loops=3)
Filter: ((a < 50000) OR (a > 1950000))
Rows Removed by Filter: 283334
-> Parallel Seq Scan on t1_2 (cost=0.00..9608.01 rows=75
width=4) (actual time=89.505..89.505 rows=0 loops=3)
Filter: ((a < 50000) OR (a > 1950000))
Rows Removed by Filter: 300000
Planning time: 0.343 ms
Execution time: 188.624 ms
(14 rows)

2) From "Replace min_parallel_relation_size with two new GUCs" commit onwards,

postgres=# explain analyze select * from t1 where a < 50000 OR a > 1950000;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Append (cost=0.00..34966.01 rows=50546 width=4) (actual
time=0.021..375.747 rows=49999 loops=1)
-> Seq Scan on t1 (cost=0.00..0.00 rows=1 width=4) (actual
time=0.004..0.004 rows=0 loops=1)
Filter: ((a < 50000) OR (a > 1950000))
-> Seq Scan on t1_1 (cost=0.00..17483.00 rows=50365 width=4)
(actual time=0.016..198.393 rows=49999 loops=1)
Filter: ((a < 50000) OR (a > 1950000))
Rows Removed by Filter: 850001
-> Seq Scan on t1_2 (cost=0.00..17483.01 rows=180 width=4)
(actual time=173.310..173.310 rows=0 loops=1)
Filter: ((a < 50000) OR (a > 1950000))
Rows Removed by Filter: 900001
Planning time: 0.812 ms
Execution time: 377.831 ms
(11 rows)

RCA:
====
From "Replace min_parallel_relation_size with two new GUCs" commit
onwards, we are not assigning parallel workers for the child rel with
zero heap pages. This means we won't be able to create a partial
append path as this requires all the child rels within an Append Node
to have a partial path.

Right, but OTOH, if we assign parallel workers by default, then it is
quite possible that it would result in much worse plans. Consider a
case where partition hierarchy has 1000 partitions and only one of
them is big enough to allow parallel workers. Now in this case, with
your proposed fix it will try to scan all the partitions in parallel
workers which I think can easily result in bad performance. I think
the right way to make such plans parallel is by using Parallel Append
node (https://commitfest.postgresql.org/13/987/). Alternatively, if
you want to force parallelism in cases like the one you have shown in
example, you can use Alter Table .. Set (parallel_workers = 1).

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3 Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Amit Kapila (#2)
Re: Parallel seq. plan is not coming against inheritance or partition table

Right, but OTOH, if we assign parallel workers by default, then it is
quite possible that it would result in much worse plans. Consider a
case where partition hierarchy has 1000 partitions and only one of
them is big enough to allow parallel workers. Now in this case, with
your proposed fix it will try to scan all the partitions in parallel
workers which I think can easily result in bad performance.

Right. But there can also be a case where 999 partitions are large and
eligible for a Parallel Seq Scan (PSS). In such a case as well, PSS won't be selected.

I think

the right way to make such plans parallel is by using Parallel Append
node (https://commitfest.postgresql.org/13/987/). Alternatively, if
you want to force parallelism in cases like the one you have shown in
example, you can use Alter Table .. Set (parallel_workers = 1).

Okay, I was not aware of Parallel Append. Thanks.

With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com


#4 Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#2)
1 attachment(s)
Re: Parallel seq. plan is not coming against inheritance or partition table

On Sun, Mar 5, 2017 at 9:41 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

RCA:
====
From "Replace min_parallel_relation_size with two new GUCs" commit
onwards, we are not assigning parallel workers for the child rel with
zero heap pages. This means we won't be able to create a partial
append path as this requires all the child rels within an Append Node
to have a partial path.

Right, but OTOH, if we assign parallel workers by default, then it is
quite possible that it would result in much worse plans. Consider a
case where partition hierarchy has 1000 partitions and only one of
them is big enough to allow parallel workers. Now in this case, with
your proposed fix it will try to scan all the partitions in parallel
workers which I think can easily result in bad performance. I think
the right way to make such plans parallel is by using Parallel Append
node (https://commitfest.postgresql.org/13/987/). Alternatively, if
you want to force parallelism in cases like the one you have shown in
example, you can use Alter Table .. Set (parallel_workers = 1).

Well, I think this is a bug in
51ee6f3160d2e1515ed6197594bda67eb99dc2cc. The commit message doesn't
say anything about changing any behavior, just about renaming GUCs,
and I don't remember any discussion about changing the behavior
either, and the comment still implies that we have the old behavior
when we really don't, and parallel append isn't committed yet.

I think the problem here is that compute_parallel_worker() thinks it
can use 0 as a sentinel value that means "ignore this", but it can't
really, because a heap can actually contain 0 pages. Here's a patch
which does the following:

- changes the sentinel value to be -1
- adjusts the test so that a value of -1 is ignored when determining
whether the relation is too small for parallelism
- updates a comment that is out-of-date now that this is no longer
part of the seqscan logic specifically
- moves some variables to narrower scopes
- changes the function parameter types to be doubles, because
otherwise the call in cost_index() has an overflow hazard
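The sentinel problem is easy to see in isolation. A minimal sketch (the BlockNumber typedef here mirrors PostgreSQL's uint32 definition): 0 is a real heap size, while -1 only works as "ignore this" if the parameter type can actually hold it.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t BlockNumber;	/* mirrors PostgreSQL's definition */

/*
 * Why the parameter type change matters: an unsigned BlockNumber has no
 * spare value to mean "ignore" -- 0 is a legitimate empty-heap size, and
 * -1 silently wraps -- whereas a double carries -1 through intact.
 */
static BlockNumber
pages_as_blocknumber(int pages)
{
	return (BlockNumber) pages;	/* -1 becomes 0xFFFFFFFF */
}

static double
pages_as_double(double pages)
{
	return pages;				/* -1 survives and can be tested with >= 0 */
}
```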

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

compute-parallel-worker-fix.patch (application/octet-stream)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 633b5c1..f238643 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -679,7 +679,7 @@ create_plain_partial_paths(PlannerInfo *root, RelOptInfo *rel)
 {
 	int			parallel_workers;
 
-	parallel_workers = compute_parallel_worker(rel, rel->pages, 0);
+	parallel_workers = compute_parallel_worker(rel, rel->pages, -1);
 
 	/* If any limit was set to zero, the user doesn't want a parallel scan. */
 	if (parallel_workers <= 0)
@@ -2880,16 +2880,16 @@ remove_unused_subquery_outputs(Query *subquery, RelOptInfo *rel)
  * be scanned and the size of the index to be scanned, then choose a minimum
  * of those.
  *
- * "heap_pages" is the number of pages from the table that we expect to scan.
- * "index_pages" is the number of pages from the index that we expect to scan.
+ * "heap_pages" is the number of pages from the table that we expect to scan, or
+ * -1 if we don't expect to scan any.
+ *
+ * "index_pages" is the number of pages from the index that we expect to scan, or
+ * -1 if we don't expect to scan any.
  */
 int
-compute_parallel_worker(RelOptInfo *rel, BlockNumber heap_pages,
-						BlockNumber index_pages)
+compute_parallel_worker(RelOptInfo *rel, double heap_pages, double index_pages)
 {
 	int			parallel_workers = 0;
-	int			heap_parallel_workers = 1;
-	int			index_parallel_workers = 1;
 
 	/*
 	 * If the user has set the parallel_workers reloption, use that; otherwise
@@ -2899,23 +2899,23 @@ compute_parallel_worker(RelOptInfo *rel, BlockNumber heap_pages,
 		parallel_workers = rel->rel_parallel_workers;
 	else
 	{
-		int			heap_parallel_threshold;
-		int			index_parallel_threshold;
-
 		/*
-		 * If this relation is too small to be worth a parallel scan, just
-		 * return without doing anything ... unless it's an inheritance child.
+		 * If too few pages are being justified to make a parallel scan worthwhile,
+		 * just return zero ... unless it's an inheritance child.
 		 * In that case, we want to generate a parallel path here anyway.  It
 		 * might not be worthwhile just for this relation, but when combined
 		 * with all of its inheritance siblings it may well pay off.
 		 */
-		if (heap_pages < (BlockNumber) min_parallel_table_scan_size &&
-			index_pages < (BlockNumber) min_parallel_index_scan_size &&
-			rel->reloptkind == RELOPT_BASEREL)
+		if (rel->reloptkind == RELOPT_BASEREL &&
+			((heap_pages >= 0 && heap_pages < min_parallel_table_scan_size) ||
+			(index_pages >= 0 && index_pages < min_parallel_index_scan_size)))
 			return 0;
 
-		if (heap_pages > 0)
+		if (heap_pages >= 0)
 		{
+			int		heap_parallel_threshold;
+			int		heap_parallel_workers = 1;
+
 			/*
 			 * Select the number of workers based on the log of the size of
 			 * the relation.  This probably needs to be a good deal more
@@ -2935,8 +2935,11 @@ compute_parallel_worker(RelOptInfo *rel, BlockNumber heap_pages,
 			parallel_workers = heap_parallel_workers;
 		}
 
-		if (index_pages > 0)
+		if (index_pages >= 0)
 		{
+			int		index_parallel_workers = 1;
+			int		index_parallel_threshold;
+
 			/* same calculation as for heap_pages above */
 			index_parallel_threshold = Max(min_parallel_index_scan_size, 1);
 			while (index_pages >= (BlockNumber) (index_parallel_threshold * 3))
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index c138f57..3be7382 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -601,8 +601,7 @@ cost_index(IndexPath *path, PlannerInfo *root, double loop_count,
 		 * order.
 		 */
 		path->path.parallel_workers = compute_parallel_worker(baserel,
-											   (BlockNumber) rand_heap_pages,
-												  (BlockNumber) index_pages);
+											   rand_heap_pages, index_pages);
 
 		/*
 		 * Fall out if workers can't be assigned for parallel scan, because in
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index ebda308..f26c6ae 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -54,8 +54,8 @@ extern RelOptInfo *standard_join_search(PlannerInfo *root, int levels_needed,
 					 List *initial_rels);
 
 extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel);
-extern int compute_parallel_worker(RelOptInfo *rel, BlockNumber heap_pages,
-						BlockNumber index_pages);
+extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
+						double index_pages);
 
 #ifdef OPTIMIZER_DEBUG
 extern void debug_print_rel(PlannerInfo *root, RelOptInfo *rel);
#5 Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#4)
Re: Parallel seq. plan is not coming against inheritance or partition table

On Tue, Mar 7, 2017 at 3:24 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Sun, Mar 5, 2017 at 9:41 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

RCA:
====
From "Replace min_parallel_relation_size with two new GUCs" commit
onwards, we are not assigning parallel workers for the child rel with
zero heap pages. This means we won't be able to create a partial
append path as this requires all the child rels within an Append Node
to have a partial path.

Right, but OTOH, if we assign parallel workers by default, then it is
quite possible that it would result in much worse plans. Consider a
case where partition hierarchy has 1000 partitions and only one of
them is big enough to allow parallel workers. Now in this case, with
your proposed fix it will try to scan all the partitions in parallel
workers which I think can easily result in bad performance. I think
the right way to make such plans parallel is by using Parallel Append
node (https://commitfest.postgresql.org/13/987/). Alternatively, if
you want to force parallelism in cases like the one you have shown in
example, you can use Alter Table .. Set (parallel_workers = 1).

Well, I think this is a bug in
51ee6f3160d2e1515ed6197594bda67eb99dc2cc. The commit message doesn't
say anything about changing any behavior, just about renaming GUCs,
and I don't remember any discussion about changing the behavior
either, and the comment still implies that we have the old behavior
when we really don't, and parallel append isn't committed yet.

I also think that commit didn't intend to change the behavior,
however, the point is how sensible is it to keep such behavior after
Parallel Append. I am not sure if this is the right time to consider
it or shall we wait till Parallel Append is committed.

I think the problem here is that compute_parallel_worker() thinks it
can use 0 as a sentinel value that means "ignore this", but it can't
really, because a heap can actually contain 0 pages. Here's a patch
which does the following:

- if (heap_pages < (BlockNumber) min_parallel_table_scan_size &&
- index_pages < (BlockNumber) min_parallel_index_scan_size &&
- rel->reloptkind == RELOPT_BASEREL)
+ if (rel->reloptkind == RELOPT_BASEREL &&
+ ((heap_pages >= 0 && heap_pages < min_parallel_table_scan_size) ||
+ (index_pages >= 0 && index_pages < min_parallel_index_scan_size)))
  return 0;

The purpose of considering both heap and index pages for
min_parallel_* is that even if one of them is bigger than the
threshold, we should try parallelism. This could be helpful for
parallel index scans, where we consider parallel workers based on both
heap and index pages. Is there a reason for you to change that
condition, as it is not even related to the problem being discussed?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


#6 Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Amit Kapila (#5)
Re: Parallel seq. plan is not coming against inheritance or partition table

I also think that commit didn't intend to change the behavior,
however, the point is how sensible is it to keep such behavior after
Parallel Append. I am not sure if this is the right time to consider
it or shall we wait till Parallel Append is committed.

I think the problem here is that compute_parallel_worker() thinks it
can use 0 as a sentinel value that means "ignore this", but it can't
really, because a heap can actually contain 0 pages. Here's a patch
which does the following:

- if (heap_pages < (BlockNumber) min_parallel_table_scan_size &&
- index_pages < (BlockNumber) min_parallel_index_scan_size &&
- rel->reloptkind == RELOPT_BASEREL)
+ if (rel->reloptkind == RELOPT_BASEREL &&
+ ((heap_pages >= 0 && heap_pages < min_parallel_table_scan_size) ||
+ (index_pages >= 0 && index_pages < min_parallel_index_scan_size)))
return 0;

The purpose of considering both heap and index pages () for
min_parallel_* is that even if one of them is bigger than threshold
then we should try parallelism.

Yes, but this is only true for normal tables, not for partitioned or
inheritance tables. I think for a partition table, even if only one
heap page exists, we go for parallelism.

This could be helpful for parallel

index scans where we consider parallel workers based on both heap and
index pages. Is there a reason for you to change that condition as
that is not even related to the problem being discussed?

I think he has changed it to allow parallelism for inheritance or partition
tables. For normal tables, it won't be affected unless the below if-condition
is satisfied.

if (heap_pages < (BlockNumber) min_parallel_table_scan_size &&
    index_pages < (BlockNumber) min_parallel_index_scan_size &&
    rel->reloptkind == RELOPT_BASEREL)
    return 0;

--
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com

#7 Amit Kapila
amit.kapila16@gmail.com
In reply to: Ashutosh Sharma (#6)
Re: Parallel seq. plan is not coming against inheritance or partition table

On Tue, Mar 7, 2017 at 10:22 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

I also think that commit didn't intend to change the behavior,
however, the point is how sensible is it to keep such behavior after
Parallel Append. I am not sure if this is the right time to consider
it or shall we wait till Parallel Append is committed.

I think the problem here is that compute_parallel_worker() thinks it
can use 0 as a sentinel value that means "ignore this", but it can't
really, because a heap can actually contain 0 pages. Here's a patch
which does the following:

- if (heap_pages < (BlockNumber) min_parallel_table_scan_size &&
- index_pages < (BlockNumber) min_parallel_index_scan_size &&
- rel->reloptkind == RELOPT_BASEREL)
+ if (rel->reloptkind == RELOPT_BASEREL &&
+ ((heap_pages >= 0 && heap_pages < min_parallel_table_scan_size) ||
+ (index_pages >= 0 && index_pages < min_parallel_index_scan_size)))
return 0;

The purpose of considering both heap and index pages () for
min_parallel_* is that even if one of them is bigger than threshold
then we should try parallelism.

Yes, But, this is only true for normal tables not for partitioned or
inheritance tables. I think for partition table, even if one heap page
exist, we go for parallelism.

This could be helpful for parallel

index scans where we consider parallel workers based on both heap and
index pages. Is there a reason for you to change that condition as
that is not even related to the problem being discussed?

I think he has changed to allow parallelism for inheritance or partition
tables. For normal tables, it won't be touched until the below if-condition
is satisfied.

if (heap_pages < (BlockNumber) min_parallel_table_scan_size &&
index_pages < (BlockNumber) min_parallel_index_scan_size &&
rel->reloptkind == RELOPT_BASEREL)
return 0;

AFAICS, the patch has changed the if-condition you are quoting.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


#8 Ashutosh Sharma
ashu.coek88@gmail.com
In reply to: Amit Kapila (#7)
Re: Parallel seq. plan is not coming against inheritance or partition table

On Tue, Mar 7, 2017 at 11:18 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Mar 7, 2017 at 10:22 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote:

I also think that commit didn't intend to change the behavior,
however, the point is how sensible is it to keep such behavior after
Parallel Append. I am not sure if this is the right time to consider
it or shall we wait till Parallel Append is committed.

I think the problem here is that compute_parallel_worker() thinks it
can use 0 as a sentinel value that means "ignore this", but it can't
really, because a heap can actually contain 0 pages. Here's a patch
which does the following:

- if (heap_pages < (BlockNumber) min_parallel_table_scan_size &&
- index_pages < (BlockNumber) min_parallel_index_scan_size &&
- rel->reloptkind == RELOPT_BASEREL)
+ if (rel->reloptkind == RELOPT_BASEREL &&
+ ((heap_pages >= 0 && heap_pages < min_parallel_table_scan_size) ||
+ (index_pages >= 0 && index_pages < min_parallel_index_scan_size)))
return 0;

The purpose of considering both heap and index pages () for
min_parallel_* is that even if one of them is bigger than threshold
then we should try parallelism.

Yes, But, this is only true for normal tables not for partitioned or
inheritance tables. I think for partition table, even if one heap page
exist, we go for parallelism.

This could be helpful for parallel

index scans where we consider parallel workers based on both heap and
index pages. Is there a reason for you to change that condition as
that is not even related to the problem being discussed?

I think he has changed to allow parallelism for inheritance or partition
tables. For normal tables, it won't be touched until the below if-condition
is satisfied.

if (heap_pages < (BlockNumber) min_parallel_table_scan_size &&
index_pages < (BlockNumber) min_parallel_index_scan_size &&
rel->reloptkind == RELOPT_BASEREL)
return 0;

AFAICS, the patch has changed the if-condition you are quoting.

Yes, it has slightly changed the if-condition, or I would say it has
corrected it. The reason being, with the new if-condition in the
patch, we first check whether reloptkind is of BASEREL type before
checking whether there are enough heap or index pages to make
parallelism worthwhile. If it is a partition table,
'rel->reloptkind == RELOPT_BASEREL' will fail and hence the further
conditions won't be checked.

I think the most important thing the patch fixes is passing
index_pages as '-1' to compute_parallel_worker(), since
'min_parallel_index_scan_size' can be set to zero by the user.

--
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com


#9 Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#5)
Re: Parallel seq. plan is not coming against inheritance or partition table

On Mon, Mar 6, 2017 at 10:28 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

I also think that commit didn't intend to change the behavior,
however, the point is how sensible is it to keep such behavior after
Parallel Append. I am not sure if this is the right time to consider
it or shall we wait till Parallel Append is committed.

I think we should wait until that's committed. I'm not convinced we
ever want to change the behavior, but if we do, let's think about it
then, not now.

- if (heap_pages < (BlockNumber) min_parallel_table_scan_size &&
- index_pages < (BlockNumber) min_parallel_index_scan_size &&
- rel->reloptkind == RELOPT_BASEREL)
+ if (rel->reloptkind == RELOPT_BASEREL &&
+ ((heap_pages >= 0 && heap_pages < min_parallel_table_scan_size) ||
+ (index_pages >= 0 && index_pages < min_parallel_index_scan_size)))
return 0;

The purpose of considering both heap and index pages () for
min_parallel_* is that even if one of them is bigger than threshold
then we should try parallelism. This could be helpful for parallel
index scans where we consider parallel workers based on both heap and
index pages. Is there a reason for you to change that condition as
that is not even related to the problem being discussed?

Really? We want to do parallelism if EITHER condition is met? I
thought the goal was to do parallelism if BOTH conditions were met.
Otherwise, you might go parallel if you have a large number of heap
pages but only 1 or 2 index pages. Preventing that case from using
parallelism was the whole motivation for switching to a two-GUC system
in the first place.
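The difference between the two readings can be pinned down with a side-by-side model (simplified and hypothetical: the RELOPT_BASEREL check is dropped and the GUCs are passed as arguments):

```c
#include <assert.h>

/* Pre-patch early-exit test: bail out only when BOTH sizes are small. */
static int
too_small_old(double heap_pages, double index_pages,
			  double min_table, double min_index)
{
	return heap_pages < min_table && index_pages < min_index;
}

/* Patched test: bail out when EITHER known (>= 0) size is small. */
static int
too_small_new(double heap_pages, double index_pages,
			  double min_table, double min_index)
{
	return (heap_pages >= 0 && heap_pages < min_table) ||
		   (index_pages >= 0 && index_pages < min_index);
}
```

With a large heap but a 2-page index, the old form does not bail out, so the scan can still go parallel; the patched form does bail out, which is the point about the motivation for the two-GUC system.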

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#10 Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#9)
Re: Parallel seq. plan is not coming against inheritance or partition table

On Tue, Mar 7, 2017 at 10:12 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Mar 6, 2017 at 10:28 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

I also think that commit didn't intend to change the behavior,
however, the point is how sensible is it to keep such behavior after
Parallel Append. I am not sure if this is the right time to consider
it or shall we wait till Parallel Append is committed.

I think we should wait until that's committed. I'm not convinced we
ever want to change the behavior, but if we do, let's think about it
then, not now.

- if (heap_pages < (BlockNumber) min_parallel_table_scan_size &&
- index_pages < (BlockNumber) min_parallel_index_scan_size &&
- rel->reloptkind == RELOPT_BASEREL)
+ if (rel->reloptkind == RELOPT_BASEREL &&
+ ((heap_pages >= 0 && heap_pages < min_parallel_table_scan_size) ||
+ (index_pages >= 0 && index_pages < min_parallel_index_scan_size)))
return 0;

The purpose of considering both heap and index pages () for
min_parallel_* is that even if one of them is bigger than threshold
then we should try parallelism. This could be helpful for parallel
index scans where we consider parallel workers based on both heap and
index pages. Is there a reason for you to change that condition as
that is not even related to the problem being discussed?

Really? We want to do parallelism if EITHER condition is met? I
thought the goal was to do parallelism if BOTH conditions were met.
Otherwise, you might go parallel if you have a large number of heap
pages but only 1 or 2 index pages.

If we have fewer index pages and more heap pages, then we limit the
parallelism based on index pages. I think it can give us a benefit in
such cases as well (especially when we have to discard rows based on
heap rows). Now, consider it from another viewpoint: what if there are
enough index pages (> min_parallel_index_scan_size) but not sufficient
heap pages? I think in such cases parallel index (only) scans will be
beneficial, especially because in parallel index-only scans
heap_pages could be very small or possibly zero.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#10)
Re: Parallel seq. plan is not coming against inheritance or partition table

On Tue, Mar 7, 2017 at 9:24 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

If we have fewer index pages and more heap pages, then we limit the
parallelism based on index pages.

Kinda. In most cases, we figure out the degree of parallelism based
on heap pages and then we figure out the degree of parallelism based
on index pages and we return the smaller of those numbers. However,
as an exception, if one of those numbers would be zero and the other
would be non-zero, then we return 1 instead of 0. That's randomly
inconsistent, and I think it's a bug which my proposed patch would
fix. I don't understand how you can justify returning the smaller of
the two values in every other case, but in that case return something
else.

I think it can give us benefit in
such cases as well (especially when we have to discard rows based on
heap rows). Now, consider it from another viewpoint: what if there are
enough index pages (> min_parallel_index_scan_size) but not sufficient
heap pages? I think in such cases parallel index (only) scans will be
beneficial, especially because in parallel index only scans
heap_pages could be very small or possibly zero.

That's a separate problem. I think we ought to consider having an
index-only scan pass -1 for the number of heap pages, so that the
degree of parallelism in that case is limited only by the number of
index pages. I was going to bring up that issue after I got this
committed. :-)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#12Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#11)
Re: Parallel seq. plan is not coming against inheritance or partition table

On Wed, Mar 8, 2017 at 8:28 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Mar 7, 2017 at 9:24 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

I think it can give us benefit in
such cases as well (especially when we have to discard rows based on
heap rows). Now, consider it from another viewpoint: what if there are
enough index pages (> min_parallel_index_scan_size) but not sufficient
heap pages? I think in such cases parallel index (only) scans will be
beneficial, especially because in parallel index only scans
heap_pages could be very small or possibly zero.

That's a separate problem. I think we ought to consider having an
index-only scan pass -1 for the number of heap pages, so that the
degree of parallelism in that case is limited only by the number of
index pages.

Sure, that sounds sensible, but even after that, I am not sure if for
plain index scans it is a good idea to not choose parallelism if the
number of heap pages is less than min_parallel_table_scan_size even
though the number of index pages is greater than
min_parallel_index_scan_size. I think we can try out some tests
(maybe TPC-H queries where parallel index scan gets picked up) to see
what is right here. Do you think it will be bad if we just commit your
patch without this change and then consider changing it separately?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


#13Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#12)
Re: Parallel seq. plan is not coming against inheritance or partition table

On Wed, Mar 8, 2017 at 7:04 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Mar 8, 2017 at 8:28 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Mar 7, 2017 at 9:24 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

I think it can give us benefit in
such cases as well (especially when we have to discard rows based on
heap rows). Now, consider it from another viewpoint: what if there are
enough index pages (> min_parallel_index_scan_size) but not sufficient
heap pages? I think in such cases parallel index (only) scans will be
beneficial, especially because in parallel index only scans
heap_pages could be very small or possibly zero.

That's a separate problem. I think we ought to consider having an
index-only scan pass -1 for the number of heap pages, so that the
degree of parallelism in that case is limited only by the number of
index pages.

Sure, that sounds sensible, but even after that, I am not sure if for
plain index scans it is a good idea to not choose parallelism if the
number of heap pages is less than min_parallel_table_scan_size even
though the number of index pages is greater than
min_parallel_index_scan_size. I think we can try out some tests
(maybe TPC-H queries where parallel index scan gets picked up) to see
what is right here. Do you think it will be bad if we just commit your
patch without this change and then consider changing it separately?

No, I think that would be fine. I think that we need to commit
something like what I proposed because the earlier commit broke
something that used to work. That's got to get fixed. Now if we
subsequently find out that what we've implemented can be improved in
some way, we can consider those changes then.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#14Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#13)
1 attachment(s)
Re: Parallel seq. plan is not coming against inheritance or partition table

On Wed, Mar 8, 2017 at 6:58 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Mar 8, 2017 at 7:04 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Mar 8, 2017 at 8:28 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Mar 7, 2017 at 9:24 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

I think it can give us benefit in
such cases as well (especially when we have to discard rows based on
heap rows). Now, consider it from another viewpoint: what if there are
enough index pages (> min_parallel_index_scan_size) but not sufficient
heap pages? I think in such cases parallel index (only) scans will be
beneficial, especially because in parallel index only scans
heap_pages could be very small or possibly zero.

That's a separate problem. I think we ought to consider having an
index-only scan pass -1 for the number of heap pages, so that the
degree of parallelism in that case is limited only by the number of
index pages.

Sure, that sounds sensible, but even after that, I am not sure if for
plain index scans it is a good idea to not choose parallelism if the
number of heap pages is less than min_parallel_table_scan_size even
though the number of index pages is greater than
min_parallel_index_scan_size. I think we can try out some tests
(maybe TPC-H queries where parallel index scan gets picked up) to see
what is right here. Do you think it will be bad if we just commit your
patch without this change and then consider changing it separately?

No, I think that would be fine. I think that we need to commit
something like what I proposed because the earlier commit broke
something that used to work. That's got to get fixed.

Agreed, so I have rebased your patch and passed heap_pages as -1 for
index-only scans as discussed. Also, Rafia has tested with the attached
patch that parallel index and parallel index only scans are picked for
TPC-H queries. I have also reviewed and tested your changes with
respect to the problem reported and found that it works as expected. So,
I think we can go ahead with the attached patch unless you see some
problem with the changes I have made.

The only remaining open item about parallel index scans is a decision
about a storage parameter which is being discussed on the parallel index
scan thread [1]; if you and/or others can share feedback, then we can
proceed on that aspect as well.

[1]: /messages/by-id/44cd4d75-41f2-8e63-e204-1ecb09127fbf@archidevsys.co.nz

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

compute-parallel-worker-fix.1.patchapplication/octet-stream; name=compute-parallel-worker-fix.1.patchDownload
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index b263359..d847a4d 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -692,7 +692,7 @@ create_plain_partial_paths(PlannerInfo *root, RelOptInfo *rel)
 {
 	int			parallel_workers;
 
-	parallel_workers = compute_parallel_worker(rel, rel->pages, 0);
+	parallel_workers = compute_parallel_worker(rel, rel->pages, -1);
 
 	/* If any limit was set to zero, the user doesn't want a parallel scan. */
 	if (parallel_workers <= 0)
@@ -2938,7 +2938,7 @@ create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
 	pages_fetched = compute_bitmap_pages(root, rel, bitmapqual, 1.0,
 										 NULL, NULL);
 
-	parallel_workers = compute_parallel_worker(rel, pages_fetched, 0);
+	parallel_workers = compute_parallel_worker(rel, pages_fetched, -1);
 
 	if (parallel_workers <= 0)
 		return;
@@ -2953,16 +2953,16 @@ create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
  * be scanned and the size of the index to be scanned, then choose a minimum
  * of those.
  *
- * "heap_pages" is the number of pages from the table that we expect to scan.
- * "index_pages" is the number of pages from the index that we expect to scan.
+ * "heap_pages" is the number of pages from the table that we expect to scan, or
+ * -1 if we don't expect to scan any.
+ *
+ * "index_pages" is the number of pages from the index that we expect to scan, or
+ * -1 if we don't expect to scan any.
  */
 int
-compute_parallel_worker(RelOptInfo *rel, BlockNumber heap_pages,
-						BlockNumber index_pages)
+compute_parallel_worker(RelOptInfo *rel, double heap_pages, double index_pages)
 {
 	int			parallel_workers = 0;
-	int			heap_parallel_workers = 1;
-	int			index_parallel_workers = 1;
 
 	/*
 	 * If the user has set the parallel_workers reloption, use that; otherwise
@@ -2972,23 +2972,23 @@ compute_parallel_worker(RelOptInfo *rel, BlockNumber heap_pages,
 		parallel_workers = rel->rel_parallel_workers;
 	else
 	{
-		int			heap_parallel_threshold;
-		int			index_parallel_threshold;
-
 		/*
-		 * If this relation is too small to be worth a parallel scan, just
-		 * return without doing anything ... unless it's an inheritance child.
+		 * If there are too few pages to make a parallel scan worthwhile,
+		 * just return zero ... unless it's an inheritance child.
 		 * In that case, we want to generate a parallel path here anyway.  It
 		 * might not be worthwhile just for this relation, but when combined
 		 * with all of its inheritance siblings it may well pay off.
 		 */
-		if (heap_pages < (BlockNumber) min_parallel_table_scan_size &&
-			index_pages < (BlockNumber) min_parallel_index_scan_size &&
-			rel->reloptkind == RELOPT_BASEREL)
+		if (rel->reloptkind == RELOPT_BASEREL &&
+			((heap_pages >= 0 && heap_pages < min_parallel_table_scan_size) ||
+			(index_pages >= 0 && index_pages < min_parallel_index_scan_size)))
 			return 0;
 
-		if (heap_pages > 0)
+		if (heap_pages >= 0)
 		{
+			int		heap_parallel_threshold;
+			int		heap_parallel_workers = 1;
+
 			/*
 			 * Select the number of workers based on the log of the size of
 			 * the relation.  This probably needs to be a good deal more
@@ -3008,8 +3008,11 @@ compute_parallel_worker(RelOptInfo *rel, BlockNumber heap_pages,
 			parallel_workers = heap_parallel_workers;
 		}
 
-		if (index_pages > 0)
+		if (index_pages >= 0)
 		{
+			int		index_parallel_workers = 1;
+			int		index_parallel_threshold;
+
 			/* same calculation as for heap_pages above */
 			index_parallel_threshold = Max(min_parallel_index_scan_size, 1);
 			while (index_pages >= (BlockNumber) (index_parallel_threshold * 3))
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index e78f3a8..d154ab0 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -663,14 +663,20 @@ cost_index(IndexPath *path, PlannerInfo *root, double loop_count,
 	if (partial_path)
 	{
 		/*
+		 * For index only scans compute workers based on number of index pages
+		 * fetched.
+		 */
+		if (indexonly)
+			rand_heap_pages = -1;
+
+		/*
 		 * Estimate the number of parallel workers required to scan index. Use
 		 * the number of heap pages computed considering heap fetches won't be
 		 * sequential as for parallel scans the pages are accessed in random
 		 * order.
 		 */
 		path->path.parallel_workers = compute_parallel_worker(baserel,
-											   (BlockNumber) rand_heap_pages,
-												  (BlockNumber) index_pages);
+											   rand_heap_pages, index_pages);
 
 		/*
 		 * Fall out if workers can't be assigned for parallel scan, because in
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 247fd11..25fe78c 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -54,8 +54,8 @@ extern RelOptInfo *standard_join_search(PlannerInfo *root, int levels_needed,
 					 List *initial_rels);
 
 extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel);
-extern int compute_parallel_worker(RelOptInfo *rel, BlockNumber heap_pages,
-						BlockNumber index_pages);
+extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
+						double index_pages);
 extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
 										Path *bitmapqual);
 
#15Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#14)
Re: Parallel seq. plan is not coming against inheritance or partition table

On Mon, Mar 13, 2017 at 10:18 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Agreed, so I have rebased your patch and passed heap_pages as -1 for
index_only scans as discussed. Also, Rafia has tested with attached
patch that parallel index and parallel index only scans are picked for
TPC-H queries. I have also reviewed and tested your changes with
respect to problem reported and found that it works as expected. So,
I think we can go ahead with attached patch unless you see some
problem with the changes I have made.

OK, committed with a little more wordsmithing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
