why not parallel seq scan for slow functions

dilipbalaut@gmail.com

over 8 years ago

In reply to: Jeff Janes (#1)

Re: why not parallel seq scan for slow functions

On Tue, Jul 11, 2017 at 9:02 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

If I have a slow function which is evaluated in a simple seq scan, I do not
get parallel execution, even though it would be massively useful. Unless
force_parallel_mode=ON, then I get a dummy parallel plan with one worker.

explain select aid,slow(abalance) from pgbench_accounts;

After analysing this, I see multiple reasons why this plan is not getting selected:

1. The query is selecting all the tuples, and the benefit we get from
parallelism comes from dividing cpu_tuple_cost (0.01) among the workers,
but each tuple sent from a worker to the Gather is charged
parallel_tuple_cost (0.1). (That transfer cost would be very small in the
case of an aggregate.) Maybe you can try selecting with some condition,
like below:
postgres=# explain select slow(abalance) from pgbench_accounts where abalance > 1;
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Gather  (cost=0.00..46602.33 rows=1 width=4)
   Workers Planned: 2
   ->  Parallel Seq Scan on pgbench_accounts  (cost=0.00..46602.33 rows=1 width=4)
         Filter: (abalance > 1)

2. The second problem I am seeing (maybe a code problem) is that the
"slow" function is very costly (10000000), and in
apply_projection_to_path we account for this cost. But I have
noticed that for the Gather node we also add this cost for all the
rows, whereas if the lower node is already doing the projection,
the Gather node just needs to send out the tuples instead of actually
applying the projection.

In the function below, we always multiply target->cost.per_tuple by
path->rows, but in the case of Gather it should be multiplied by
subpath->rows:

apply_projection_to_path()
....

path->startup_cost += target->cost.startup - oldcost.startup;
path->total_cost += target->cost.startup - oldcost.startup +
(target->cost.per_tuple - oldcost.per_tuple) * path->rows;

So because of this high projection cost, the seqpath and the parallel path
both have fuzzily the same cost, but the seqpath wins because it's
parallel safe.
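A minimal sketch of the per-tuple arithmetic in item 2, in plain C rather than planner code. `ToyCost` and `projection_cost_delta()` are hypothetical names; the only assumption about PostgreSQL is that a function's declared COST is scaled by cpu_operator_cost (default 0.0025), so COST 10000000 works out to roughly 25000 per call. The row count passed in is the crux: path->rows for the Gather versus subpath->rows for the per-worker partial path.

```c
/* Hypothetical stand-in for a target's cost (startup + per-tuple). */
typedef struct ToyCost
{
    double startup;
    double per_tuple;
} ToyCost;

/*
 * Increase in total cost when a path's target changes from 'oldcost' to
 * 'newcost' and the new target is evaluated 'rows' times. This follows the
 * shape of the total_cost update quoted above; which row count the caller
 * passes in is exactly the point under discussion.
 */
static double projection_cost_delta(ToyCost newcost, ToyCost oldcost,
                                    double rows)
{
    return (newcost.startup - oldcost.startup)
         + (newcost.per_tuple - oldcost.per_tuple) * rows;
}
```

With an assumed 1,000,000 rows at the Gather and a partial subpath estimated at about 416,667 rows (an assumed parallel divisor of 2.4), a per-tuple cost of 25000 yields about 2.5e10 when charged on path->rows and about 1.04e10 when charged on subpath->rows.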

CREATE OR REPLACE FUNCTION slow(integer)
RETURNS integer
LANGUAGE plperl
IMMUTABLE PARALLEL SAFE STRICT COST 10000000
AS $function$
my $thing=$_[0];
foreach (1..1_000_000) {
$thing = sqrt($thing);
$thing *= $thing;
};
return ($thing+0);
$function$;

The partial path is getting added to the list of paths, it is just not
getting chosen, even if parallel_*_cost are set to zero. Why not?

If I do an aggregate, then it does use parallel workers:

explain select sum(slow(abalance)) from pgbench_accounts;

It doesn't use as many as I would like, because there is a limit based on
the logarithm of the table size (I'm using -s 10 and get 3 parallel
processes), but at least I know how to start looking into that.
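The logarithmic limit can be sketched as below. This is a simplified, hypothetical model of compute_parallel_worker() in src/backend/optimizer/path/allpaths.c (PostgreSQL 10 era); it ignores index pages, inheritance children and overflow guards. The parameters stand in for min_parallel_table_scan_size (default 8MB, i.e. 1024 8kB blocks), the parallel_workers reloption and max_parallel_workers_per_gather (default 2).

```c
/*
 * Simplified sketch of how many workers a parallel heap scan is planned
 * with. 'reloption_workers' == -1 means the reloption is not set.
 */
static int toy_parallel_workers(long heap_pages, long threshold_pages,
                                int reloption_workers, int max_workers)
{
    int workers;

    if (reloption_workers != -1)
        workers = reloption_workers;    /* explicit setting wins */
    else
    {
        long threshold = threshold_pages > 1 ? threshold_pages : 1;

        if (heap_pages < threshold)
            return 0;                   /* too small for a parallel scan */

        /* One more worker each time the table triples in size. */
        workers = 1;
        while (heap_pages >= threshold * 3)
        {
            workers++;
            threshold *= 3;
        }
    }

    /* The per-Gather cap applies in both cases. */
    return workers < max_workers ? workers : max_workers;
}
```

Assuming pgbench_accounts at -s 10 is roughly 16,400 pages (an estimate, not a measurement), toy_parallel_workers(16394, 1024, -1, 2) returns 2: the table qualifies for 3 workers but the cap applies, and together with the leader that would be consistent with the 3 processes reported above.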

Also, how do you debug stuff like this? Are there some gdb tricks to make
this easier to introspect into the plans?

Cheers,

Jeff

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

jeff.janes@gmail.com

over 8 years ago

In reply to: Dilip Kumar (#2)

1 attachment(s)

Re: why not parallel seq scan for slow functions

On Mon, Jul 10, 2017 at 9:51 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:

In the function below, we always multiply target->cost.per_tuple by
path->rows, but in the case of Gather it should be multiplied by
subpath->rows:

apply_projection_to_path()
....

path->startup_cost += target->cost.startup - oldcost.startup;
path->total_cost += target->cost.startup - oldcost.startup +
(target->cost.per_tuple - oldcost.per_tuple) * path->rows;

So because of this high projection cost, the seqpath and the parallel path
both have fuzzily the same cost, but the seqpath wins because it's
parallel safe.

I think you are correct. However, unless parallel_tuple_cost is set very
low, apply_projection_to_path never gets called with the Gather path as an
argument. It gets ruled out at some earlier stage, presumably because it
assumes the projection step cannot make it win if it is already behind by
enough.

So the attached patch improves things, but doesn't go far enough.

Cheers,

Jeff

Attachments:

subpath_projection_cost.patch (application/octet-stream)

diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
new file mode 100644
index f2d6385..e7cd6df
*** a/src/backend/optimizer/util/pathnode.c
--- b/src/backend/optimizer/util/pathnode.c
*************** apply_projection_to_path(PlannerInfo *ro
*** 2417,2422 ****
--- 2417,2424 ----
  								   gpath->subpath->parent,
  								   gpath->subpath,
  								   target);
+ 		path->total_cost -= (target->cost.per_tuple - oldcost.per_tuple) * path->rows;
+ 		path->total_cost += (target->cost.per_tuple - oldcost.per_tuple) * gpath->subpath->rows;
  	}
  	else if (path->parallel_safe &&
  			 !is_parallel_safe(root, (Node *) target->exprs))

amit.kapila16@gmail.com

over 8 years ago

In reply to: Jeff Janes (#3)

1 attachment(s)

Re: why not parallel seq scan for slow functions

On Wed, Jul 12, 2017 at 1:50 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Mon, Jul 10, 2017 at 9:51 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:

So because of this high projection cost, the seqpath and the parallel path
both have fuzzily the same cost, but the seqpath wins because it's
parallel safe.

I think you are correct. However, unless parallel_tuple_cost is set very
low, apply_projection_to_path never gets called with the Gather path as an
argument. It gets ruled out at some earlier stage, presumably because it
assumes the projection step cannot make it win if it is already behind by
enough.

I think that is genuine because tuple communication cost is very high.
If your table is reasonably large, you might want to try increasing
the number of parallel workers (Alter Table ... Set (parallel_workers =
..))
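The tuple-communication trade-off can be illustrated with a toy CPU-cost model. It is not costsize.c: disk costs, the leader's contribution and filter evaluation are left out, and `ToyParams`, `toy_serial_cost()` and `toy_gather_cost()` are hypothetical. The defaults noted in the comments are cpu_tuple_cost = 0.01, parallel_tuple_cost = 0.1 and parallel_setup_cost = 1000.

```c
/* Hypothetical bundle of cost settings used by this toy model. */
typedef struct ToyParams
{
    double cpu_tuple_cost;       /* default 0.01 */
    double parallel_tuple_cost;  /* default 0.1 */
    double parallel_setup_cost;  /* default 1000 */
} ToyParams;

/* Serial scan: pay cpu_tuple_cost for every scanned tuple. */
static double toy_serial_cost(const ToyParams *p, double scanned)
{
    return scanned * p->cpu_tuple_cost;
}

/*
 * Gather over a parallel scan: the per-tuple scan work is split across
 * 'divisor' processes, but setup is paid once and every tuple handed up
 * to the Gather pays parallel_tuple_cost.
 */
static double toy_gather_cost(const ToyParams *p, double scanned,
                              double returned, double divisor)
{
    return p->parallel_setup_cost
         + scanned * p->cpu_tuple_cost / divisor
         + returned * p->parallel_tuple_cost;
}
```

With these defaults, 1,000,000 scanned rows and an assumed divisor of 2.4, returning every row gives roughly 105,167 for the Gather against 10,000 serially, while returning a single row drops the Gather to roughly 5,167. That is why a selective filter, or an aggregate, lets the parallel plan win.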

So the attached patch improves things, but doesn't go far enough.

It seems to me that we need to adjust the cost based on whether the node
below is projection capable. See attached.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

subpath_projection_cost.2.patch (application/octet-stream)

diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index f2d6385..d8992eb 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2407,16 +2407,27 @@ apply_projection_to_path(PlannerInfo *root,
 		 * projection-capable, so as to avoid modifying the subpath in place.
 		 * It seems unlikely at present that there could be any other
 		 * references to the subpath, but better safe than sorry.
-		 *
-		 * Note that we don't change the GatherPath's cost estimates; it might
-		 * be appropriate to do so, to reflect the fact that the bulk of the
-		 * target evaluation will happen in workers.
 		 */
 		gpath->subpath = (Path *)
 			create_projection_path(root,
 								   gpath->subpath->parent,
 								   gpath->subpath,
 								   target);
+
+		/*
+		 * Adjust the cost of GatherPath to reflect the fact that the bulk of
+		 * the target evaluation will happen in workers.
+		 */
+		if (((ProjectionPath *) gpath->subpath)->dummypp)
+		{
+			path->total_cost -= (target->cost.per_tuple - oldcost.per_tuple) * path->rows;
+			path->total_cost += (target->cost.per_tuple - oldcost.per_tuple) * gpath->subpath->rows;
+		}
+		else
+		{
+			path->total_cost -= (target->cost.per_tuple - oldcost.per_tuple) * path->rows;
+			path->total_cost += (cpu_tuple_cost + target->cost.per_tuple) * gpath->subpath->rows;
+		}
 	}
 	else if (path->parallel_safe &&
 			 !is_parallel_safe(root, (Node *) target->exprs))
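The two branches of this patch amount to the following arithmetic, shown as a sketch with plain doubles instead of planner structs (all names here are hypothetical). `dummy_projection` plays the role of ProjectionPath->dummypp: when it is true the subpath can project by itself, so only the target-evaluation delta is charged per worker row; otherwise a separate projection step also pays cpu_tuple_cost per row, mirroring the else branch above.

```c
#include <stdbool.h>

/*
 * Re-cost a Gather after its target has been pushed into the subpath:
 * remove the charge made on the Gather's own rows, then charge the
 * subpath's (per-worker) rows instead.
 */
static double toy_adjusted_gather_total(double gather_total,
                                        double gather_rows,
                                        double worker_rows,
                                        double target_per_tuple,
                                        double old_per_tuple,
                                        double cpu_tuple_cost,
                                        bool dummy_projection)
{
    /* Undo the charge apply_projection_to_path() made at Gather level. */
    double total = gather_total
                 - (target_per_tuple - old_per_tuple) * gather_rows;

    if (dummy_projection)
        total += (target_per_tuple - old_per_tuple) * worker_rows;
    else
        total += (cpu_tuple_cost + target_per_tuple) * worker_rows;

    return total;
}
```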

dilipbalaut@gmail.com

over 8 years ago

In reply to: Amit Kapila (#4)

Re: why not parallel seq scan for slow functions

On Wed, Jul 12, 2017 at 10:55 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jul 12, 2017 at 1:50 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Mon, Jul 10, 2017 at 9:51 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:

So because of this high projection cost, the seqpath and the parallel path
both have fuzzily the same cost, but the seqpath wins because it's
parallel safe.

I think you are correct. However, unless parallel_tuple_cost is set very
low, apply_projection_to_path never gets called with the Gather path as an
argument. It gets ruled out at some earlier stage, presumably because it
assumes the projection step cannot make it win if it is already behind by
enough.

I think that is genuine because tuple communication cost is very high.
If your table is reasonably large, you might want to try increasing
the number of parallel workers (Alter Table ... Set (parallel_workers =
..))

So the attached patch improves things, but doesn't go far enough.

It seems to me that we need to adjust the cost based on whether the node
below is projection capable. See attached.

Patch looks good to me.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


jeff.janes@gmail.com

over 8 years ago

In reply to: Amit Kapila (#4)

Re: why not parallel seq scan for slow functions

On Tue, Jul 11, 2017 at 10:25 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jul 12, 2017 at 1:50 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Mon, Jul 10, 2017 at 9:51 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:

So because of this high projection cost, the seqpath and the parallel path
both have fuzzily the same cost, but the seqpath wins because it's
parallel safe.

I think you are correct. However, unless parallel_tuple_cost is set very
low, apply_projection_to_path never gets called with the Gather path as an
argument. It gets ruled out at some earlier stage, presumably because it
assumes the projection step cannot make it win if it is already behind by
enough.

I think that is genuine because tuple communication cost is very high.

Sorry, I don't know which you think is genuine, the early pruning or my
complaint about the early pruning.

I agree that the communication cost is high, which is why I don't want to
have to set parallel_tuple_cost very low. For example, to get the benefit
of your patch, I have to set parallel_tuple_cost to 0.0049 or less (in my
real-world case, not the dummy test case I posted, although the numbers are
around the same for that one too). But with a setting that low, all kinds
of other things also start using parallel plans, even if they don't benefit
from them and are harmed.

I realize we need to do some aggressive pruning to avoid an exponential
explosion in planning time, but in this case it has some rather unfortunate
consequences. I wanted to explore it, but I can't figure out where this
particular pruning is taking place.

By the time we get to planner.c line 1787, current_rel->pathlist already
does not contain the parallel plan if parallel_tuple_cost >= 0.0050, so the
pruning is happening earlier than that.

If your table is reasonably large, you might want to try increasing
the number of parallel workers (Alter Table ... Set (parallel_workers =
..))

Setting parallel_workers to 8 changes the threshold for the parallel plan to
even be considered from parallel_tuple_cost <= 0.0049 to <= 0.0076. So it
is going in the correct direction, but not by enough to matter.
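A toy break-even calculation suggests why a larger worker count raises the threshold: the divisible share of the serial cost shrinks as the divisor grows, leaving more room to pay for tuple transfer. All inputs are hypothetical, the model keeps only one divisible cost component, a fixed setup cost and per-row transfer, and it is not meant to reproduce the 0.0049 and 0.0076 figures above.

```c
/*
 * Largest parallel_tuple_cost at which a toy Gather plan is still no more
 * expensive than the serial plan.
 *   serial_cost    - total cost of the serial plan
 *   divisible_cost - the part of serial_cost that workers can split
 *   divisor        - effective number of processes sharing that work
 *   returned_rows  - tuples the Gather must pass up
 * A negative result means the Gather cannot win at any setting.
 */
static double toy_breakeven_parallel_tuple_cost(double serial_cost,
                                                double divisible_cost,
                                                double divisor,
                                                double parallel_setup_cost,
                                                double returned_rows)
{
    double gather_without_transfer = parallel_setup_cost
                                   + (serial_cost - divisible_cost)
                                   + divisible_cost / divisor;

    return (serial_cost - gather_without_transfer) / returned_rows;
}
```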

Cheers,

Jeff

amit.kapila16@gmail.com

over 8 years ago

In reply to: Jeff Janes (#6)

Re: why not parallel seq scan for slow functions

On Wed, Jul 12, 2017 at 11:20 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Tue, Jul 11, 2017 at 10:25 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jul 12, 2017 at 1:50 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Mon, Jul 10, 2017 at 9:51 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:

So because of this high projection cost, the seqpath and the parallel path
both have fuzzily the same cost, but the seqpath wins because it's
parallel safe.

I think you are correct. However, unless parallel_tuple_cost is set very
low, apply_projection_to_path never gets called with the Gather path as an
argument. It gets ruled out at some earlier stage, presumably because it
assumes the projection step cannot make it win if it is already behind by
enough.

I think that is genuine because tuple communication cost is very high.

Sorry, I don't know which you think is genuine, the early pruning or my
complaint about the early pruning.

Early pruning. See, currently, we don't have a way to maintain both
parallel and non-parallel paths until a later stage and then decide which
one is better. If we want to maintain both parallel and non-parallel
paths, it can increase planning cost substantially in the case of
joins. Still, it could be beneficial in many cases, so it is a
worthwhile direction to pursue.
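The pruning step can be pictured with a much-simplified version of the add_path() comparison. This is a sketch under stated assumptions, not pathnode.c: the two paths are assumed to have identical pathkeys, parameterization and row counts; costs are compared with a 1% fuzz factor in the spirit of STD_FUZZ_FACTOR; and when the costs are fuzzily equal the parallel-safe path wins, which is how a plain Seq Scan can beat a Gather (which is not itself parallel safe) of nearly the same cost. The final tie-break on exact total cost is a simplification.

```c
#include <stdbool.h>

/* Hypothetical path carrying only the fields this sketch needs. */
typedef struct ToyPath
{
    double startup_cost;
    double total_cost;
    bool   parallel_safe;
} ToyPath;

#define TOY_FUZZ_FACTOR 1.01

typedef enum { KEEP_BOTH, KEEP_OLD, KEEP_NEW } Verdict;

static Verdict toy_add_path(const ToyPath *oldp, const ToyPath *newp)
{
    bool new_total = newp->total_cost * TOY_FUZZ_FACTOR < oldp->total_cost;
    bool old_total = oldp->total_cost * TOY_FUZZ_FACTOR < newp->total_cost;
    bool new_start = newp->startup_cost * TOY_FUZZ_FACTOR < oldp->startup_cost;
    bool old_start = oldp->startup_cost * TOY_FUZZ_FACTOR < newp->startup_cost;

    /* Each path wins on a different cost: keep both. */
    if ((new_total && old_start) || (old_total && new_start))
        return KEEP_BOTH;
    if (new_total || new_start)
        return KEEP_NEW;
    if (old_total || old_start)
        return KEEP_OLD;

    /* Fuzzily equal costs: parallel safety breaks the tie. */
    if (newp->parallel_safe != oldp->parallel_safe)
        return newp->parallel_safe ? KEEP_NEW : KEEP_OLD;
    return newp->total_cost < oldp->total_cost ? KEEP_NEW : KEEP_OLD;
}
```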

I agree that the communication cost is high, which is why I don't want to
have to set parallel_tuple_cost very low. For example, to get the benefit
of your patch, I have to set parallel_tuple_cost to 0.0049 or less (in my
real-world case, not the dummy test case I posted, although the numbers are
around the same for that one too). But with a setting that low, all kinds
of other things also start using parallel plans, even if they don't benefit
from them and are harmed.

I realize we need to do some aggressive pruning to avoid an exponential
explosion in planning time, but in this case it has some rather unfortunate
consequences. I wanted to explore it, but I can't figure out where this
particular pruning is taking place.

By the time we get to planner.c line 1787, current_rel->pathlist already
does not contain the parallel plan if parallel_tuple_cost >= 0.0050, so the
pruning is happening earlier than that.

Check generate_gather_paths.

If your table is reasonably large, you might want to try increasing
the number of parallel workers (Alter Table ... Set (parallel_workers =
..))

Setting parallel_workers to 8 changes the threshold for the parallel plan to
even be considered from parallel_tuple_cost <= 0.0049 to <= 0.0076. So it
is going in the correct direction, but not by enough to matter.

You might want to play with cpu_tuple_cost and/or seq_page_cost.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


amit.kapila16@gmail.com

over 8 years ago

In reply to: Amit Kapila (#7)

Re: why not parallel seq scan for slow functions

On Thu, Jul 13, 2017 at 7:38 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jul 12, 2017 at 11:20 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

Setting parallel_workers to 8 changes the threshold for the parallel plan to
even be considered from parallel_tuple_cost <= 0.0049 to <= 0.0076. So it
is going in the correct direction, but not by enough to matter.

You might want to play with cpu_tuple_cost and/or seq_page_cost.

I don't know whether the patch will completely solve your problem, but
this seems to be the right thing to do. Do you think we should add
this to the next CF?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


jeff.janes@gmail.com

over 8 years ago

In reply to: Amit Kapila (#8)

Re: why not parallel seq scan for slow functions

On Sat, Jul 22, 2017 at 8:53 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Jul 13, 2017 at 7:38 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jul 12, 2017 at 11:20 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

Setting parallel_workers to 8 changes the threshold for the parallel plan to
even be considered from parallel_tuple_cost <= 0.0049 to <= 0.0076. So it
is going in the correct direction, but not by enough to matter.

You might want to play with cpu_tuple_cost and/or seq_page_cost.

I don't know whether the patch will completely solve your problem, but
this seems to be the right thing to do. Do you think we should add
this to the next CF?

It doesn't solve the problem for me, but I agree it is an improvement we
should commit.

Cheers,

Jeff

#10

amit.kapila16@gmail.com

over 8 years ago

In reply to: Jeff Janes (#9)

Re: why not parallel seq scan for slow functions

On Mon, Jul 24, 2017 at 9:21 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Sat, Jul 22, 2017 at 8:53 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Jul 13, 2017 at 7:38 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jul 12, 2017 at 11:20 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

Setting parallel_workers to 8 changes the threshold for the parallel plan to
even be considered from parallel_tuple_cost <= 0.0049 to <= 0.0076. So it
is going in the correct direction, but not by enough to matter.

You might want to play with cpu_tuple_cost and/or seq_page_cost.

I don't know whether the patch will completely solve your problem, but
this seems to be the right thing to do. Do you think we should add
this to the next CF?

It doesn't solve the problem for me, but I agree it is an improvement we
should commit.

Okay, added the patch for next CF.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


#11

jeff.janes@gmail.com

over 8 years ago

In reply to: Amit Kapila (#7)

Re: why not parallel seq scan for slow functions

On Wed, Jul 12, 2017 at 7:08 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jul 12, 2017 at 11:20 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Tue, Jul 11, 2017 at 10:25 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jul 12, 2017 at 1:50 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Mon, Jul 10, 2017 at 9:51 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:

So because of this high projection cost, the seqpath and the parallel path
both have fuzzily the same cost, but the seqpath wins because it's
parallel safe.

I think you are correct. However, unless parallel_tuple_cost is set very
low, apply_projection_to_path never gets called with the Gather path as an
argument. It gets ruled out at some earlier stage, presumably because it
assumes the projection step cannot make it win if it is already behind by
enough.

I think that is genuine because tuple communication cost is very high.

Sorry, I don't know which you think is genuine, the early pruning or my
complaint about the early pruning.

Early pruning. See, currently, we don't have a way to maintain both
parallel and non-parallel paths until a later stage and then decide which
one is better. If we want to maintain both parallel and non-parallel
paths, it can increase planning cost substantially in the case of
joins. Still, it could be beneficial in many cases, so it is a
worthwhile direction to pursue.

If I understand it correctly, we do have a way; it just can lead to an
exponential explosion problem, so we are afraid to use it, correct? If I
just lobotomize the path domination code (make the test at pathnode.c line
466 always false)

if (JJ_all_paths==0 && costcmp != COSTS_DIFFERENT)

Then it keeps the parallel plan and later chooses to use it (after applying
your other patch in this thread) as the overall best plan. It doesn't even
slow down "make installcheck-parallel" by very much, which I guess just
means the regression tests don't have a lot of complex joins.

But what is an acceptable solution? Is there a heuristic for when
retaining a parallel path could be helpful, the same way there is for
fast-start paths? It seems like the best thing would be to include the
evaluation costs in the first place at this step.

Why is the path-cost domination code run before the cost of the function
evaluation is included? Is that because the information needed to compute
it is not available at that point, or because it would be too slow to
include it at that point? Or just because no one thought it important to do?

Cheers,

Jeff

#12

amit.kapila16@gmail.com

over 8 years ago

In reply to: Jeff Janes (#11)

Re: why not parallel seq scan for slow functions

On Wed, Aug 2, 2017 at 11:12 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Wed, Jul 12, 2017 at 7:08 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jul 12, 2017 at 11:20 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Tue, Jul 11, 2017 at 10:25 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Jul 12, 2017 at 1:50 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Mon, Jul 10, 2017 at 9:51 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:

So because of this high projection cost, the seqpath and the parallel path
both have fuzzily the same cost, but the seqpath wins because it's
parallel safe.

I think you are correct. However, unless parallel_tuple_cost is set very
low, apply_projection_to_path never gets called with the Gather path as an
argument. It gets ruled out at some earlier stage, presumably because it
assumes the projection step cannot make it win if it is already behind by
enough.

I think that is genuine because tuple communication cost is very high.

Sorry, I don't know which you think is genuine, the early pruning or my
complaint about the early pruning.

Early pruning. See, currently, we don't have a way to maintain both
parallel and non-parallel paths until a later stage and then decide which
one is better. If we want to maintain both parallel and non-parallel
paths, it can increase planning cost substantially in the case of
joins. Still, it could be beneficial in many cases, so it is a
worthwhile direction to pursue.

If I understand it correctly, we do have a way; it just can lead to an
exponential explosion problem, so we are afraid to use it, correct? If I
just lobotomize the path domination code (make the test at pathnode.c line
466 always false)

if (JJ_all_paths==0 && costcmp != COSTS_DIFFERENT)

Then it keeps the parallel plan and later chooses to use it (after applying
your other patch in this thread) as the overall best plan. It doesn't even
slow down "make installcheck-parallel" by very much, which I guess just
means the regression tests don't have a lot of complex joins.

But what is an acceptable solution? Is there a heuristic for when retaining
a parallel path could be helpful, the same way there is for fast-start
paths? It seems like the best thing would be to include the evaluation
costs in the first place at this step.

Why is the path-cost domination code run before the cost of the function
evaluation is included?

Because the function evaluation is part of the target list, and we create
the path target after the base paths have been created (see the call to
create_pathtarget @ planner.c:1696).

Is that because the information needed to compute
it is not available at that point,

Right.

I see two ways to include the cost of the target list for parallel
paths before rejecting them: (a) Don't reject parallel paths
(Gather/GatherMerge) during add_path. This has the danger of path
explosion. (b) In the case of parallel paths, somehow try to identify
that the path has a costly target list (maybe just check if the target
list has anything other than Vars) and use that as a heuristic to decide
whether a parallel path can be retained.

I think the preference will be to do something on the lines of
approach (b), but I am not sure whether we can easily do that.
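Approach (b) could look roughly like the check below. It is only a sketch: `ToyExprKind` is a hypothetical stand-in for real expression node tags, and it assumes the eventual target list is already known when add_path() runs, which, as noted above, is not the case today. A refinement might compare the target list's per-tuple cost against a threshold instead of merely testing for non-Var entries.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified expression tags. */
typedef enum ToyExprKind
{
    TOY_VAR,
    TOY_CONST,
    TOY_FUNCEXPR,
    TOY_OPEXPR
} ToyExprKind;

/*
 * Heuristic (b): retain a Gather path if the target list contains anything
 * other than plain Vars, since only then might evaluating it in workers
 * change which path is cheapest.
 */
static bool toy_target_may_favor_parallel(const ToyExprKind *exprs, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        if (exprs[i] != TOY_VAR)
            return true;
    }
    return false;
}
```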

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


#13

robertmhaas@gmail.com

over 8 years ago

In reply to: Amit Kapila (#12)

Re: why not parallel seq scan for slow functions

On Tue, Aug 8, 2017 at 3:50 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Right.

I see two ways to include the cost of the target list for parallel
paths before rejecting them: (a) Don't reject parallel paths
(Gather/GatherMerge) during add_path. This has the danger of path
explosion. (b) In the case of parallel paths, somehow try to identify
that the path has a costly target list (maybe just check if the target
list has anything other than Vars) and use that as a heuristic to decide
whether a parallel path can be retained.

I think the right approach to this problem is to get the cost of the
GatherPath correct when it's initially created. The proposed patch
changes the cost after-the-fact, but that (1) doesn't prevent a
promising path from being rejected before we reach this point and (2)
is probably unsafe, because it might confuse code that reaches the
modified-in-place path through some other pointer (e.g. code which
expects the RelOptInfo's paths to still be sorted by cost). Perhaps
the way to do that is to skip generate_gather_paths() for the toplevel
scan/join node and do something similar later, after we know what
target list we want.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#14

amit.kapila16@gmail.com

over 8 years ago

In reply to: Robert Haas (#13)

Re: why not parallel seq scan for slow functions

On Thu, Aug 10, 2017 at 1:07 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Aug 8, 2017 at 3:50 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Right.

I see two ways to include the cost of the target list for parallel
paths before rejecting them: (a) Don't reject parallel paths
(Gather/GatherMerge) during add_path. This has the danger of path
explosion. (b) In the case of parallel paths, somehow try to identify
that the path has a costly target list (maybe just check if the target
list has anything other than Vars) and use that as a heuristic to decide
whether a parallel path can be retained.

I think the right approach to this problem is to get the cost of the
GatherPath correct when it's initially created. The proposed patch
changes the cost after-the-fact, but that (1) doesn't prevent a
promising path from being rejected before we reach this point and (2)
is probably unsafe, because it might confuse code that reaches the
modified-in-place path through some other pointer (e.g. code which
expects the RelOptInfo's paths to still be sorted by cost). Perhaps
the way to do that is to skip generate_gather_paths() for the toplevel
scan/join node and do something similar later, after we know what
target list we want.

I think skipping the generation of gather paths for the scan node or the
top-level join node generated via standard_join_search seems
straightforward, but skipping it for paths generated via geqo seems
tricky (see the use of generate_gather_paths in merge_clump). Assuming we
find some way to skip it for the top-level scan/join node, I don't think
that will be sufficient: we have a special way to push the target list
below the Gather node in apply_projection_to_path, and we would need to
move that part into generate_gather_paths as well.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


#15

robertmhaas@gmail.com

over 8 years ago

In reply to: Amit Kapila (#14)

Re: why not parallel seq scan for slow functions

On Sat, Aug 12, 2017 at 9:18 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

I think skipping the generation of gather paths for the scan node or the
top-level join node generated via standard_join_search seems
straightforward, but skipping it for paths generated via geqo seems
tricky (see the use of generate_gather_paths in merge_clump). Assuming we
find some way to skip it for the top-level scan/join node, I don't think
that will be sufficient: we have a special way to push the target list
below the Gather node in apply_projection_to_path, and we would need to
move that part into generate_gather_paths as well.

I don't think that can work, because at that point we don't know what
target list the upper node wants to impose.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#16

amit.kapila16@gmail.com

over 8 years ago

In reply to: Robert Haas (#15)

Re: why not parallel seq scan for slow functions

On Tue, Aug 15, 2017 at 7:15 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Sat, Aug 12, 2017 at 9:18 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

I think skipping the generation of gather paths for the scan node or the
top-level join node generated via standard_join_search seems
straightforward, but skipping it for paths generated via geqo seems
tricky (see the use of generate_gather_paths in merge_clump). Assuming we
find some way to skip it for the top-level scan/join node, I don't think
that will be sufficient: we have a special way to push the target list
below the Gather node in apply_projection_to_path, and we would need to
move that part into generate_gather_paths as well.

I don't think that can work, because at that point we don't know what
target list the upper node wants to impose.

I am suggesting calling generate_gather_paths just before we try to
apply projection on paths in grouping_planner (file:planner.c;
line:1787; commit:004a9702). Won't the target list for upper nodes be
available at that point?
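To illustrate why building the Gather after the target list is known could help, here is a toy comparison in which the target's per-row cost is charged to the per-worker rows before the Gather is costed. Every parameter is a hypothetical input rather than a planner field, and treating the whole scan cost as divisible overstates the parallel benefit somewhat.

```c
/* Hypothetical pair of plan costs produced by the toy comparison. */
typedef struct ToyPlanCosts
{
    double serial;   /* serial scan + target evaluated on every row */
    double gather;   /* Gather over partial scan, target pushed to workers */
} ToyPlanCosts;

static ToyPlanCosts toy_late_gather(double scan_cost, double rows,
                                    double target_per_tuple, double divisor,
                                    double parallel_setup_cost,
                                    double parallel_tuple_cost)
{
    ToyPlanCosts c;
    double worker_rows = rows / divisor;

    c.serial = scan_cost + target_per_tuple * rows;
    c.gather = parallel_setup_cost
             + scan_cost / divisor
             + target_per_tuple * worker_rows   /* evaluated in workers */
             + parallel_tuple_cost * rows;      /* every row is transferred */
    return c;
}
```

With a scan cost of 10,000, 1,000,000 rows, a per-tuple target cost of 25000, an assumed divisor of 2.4 and default setup and transfer costs, the Gather comes out at roughly 1.04e10 against 2.5e10 for the serial plan; with a free target list the serial plan wins again.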

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


#17

robertmhaas@gmail.com

over 8 years ago

In reply to: Amit Kapila (#16)

Re: why not parallel seq scan for slow functions

On Wed, Aug 16, 2017 at 7:23 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Aug 15, 2017 at 7:15 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Sat, Aug 12, 2017 at 9:18 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

I think skipping the generation of gather paths for the scan node or the
top-level join node generated via standard_join_search seems
straightforward, but skipping it for paths generated via geqo seems
tricky (see the use of generate_gather_paths in merge_clump). Assuming we
find some way to skip it for the top-level scan/join node, I don't think
that will be sufficient: we have a special way to push the target list
below the Gather node in apply_projection_to_path, and we would need to
move that part into generate_gather_paths as well.

I don't think that can work, because at that point we don't know what
target list the upper node wants to impose.

I am suggesting calling generate_gather_paths just before we try to
apply projection on paths in grouping_planner (file:planner.c;
line:1787; commit:004a9702). Won't the target list for upper nodes be
available at that point?

Oh, yes. Apparently I misunderstood your proposal.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#18

dilipbalaut@gmail.com

over 8 years ago

In reply to: Amit Kapila (#14)

Re: why not parallel seq scan for slow functions

On Sat, Aug 12, 2017 at 6:48 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Aug 10, 2017 at 1:07 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Aug 8, 2017 at 3:50 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Right.

I think skipping the generation of gather paths for the scan node or the
top-level join node generated via standard_join_search seems
straightforward, but skipping it for paths generated via geqo seems
tricky (see the use of generate_gather_paths in merge_clump).

We can either pass "num_gene" to merge_clump or store num_gene in the
root, and then check it inside merge_clump. Do you see any more
complexity?

if (joinrel)
{
	/* Create GatherPaths for any useful partial paths for rel */
	if (old_clump->size + new_clump->size < num_gene)
		generate_gather_paths(root, joinrel);
}

Assuming we find some way to skip it for the top-level scan/join node, I
don't think that will be sufficient: we have a special way to push the
target list below the Gather node in apply_projection_to_path, and we
would need to move that part into generate_gather_paths as well.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#19

dilipbalaut@gmail.com

over 8 years ago

In reply to: Dilip Kumar (#18)

Re: why not parallel seq scan for slow functions

On Thu, Aug 17, 2017 at 2:09 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:

We can either pass "num_gene" to merge_clump or store num_gene in the
root, and then check it inside merge_clump. Do you see any more
complexity?

After giving it some more thought, I see one more problem, but I'm not
sure whether we can solve it easily. If we skip generating the gather
path at the top-level node, then our cost comparison while adding the
element to the pool will not be correct, as we are skipping some of the
paths (the gather paths). It's quite possible that path1 is cheaper than
path2 without a Gather on top, but with a Gather, path2 is cheaper.
There is no easy way to handle that, because without the target list we
cannot calculate the cost of the Gather (which is the actual problem).

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#20

amit.kapila16@gmail.com

over 8 years ago

In reply to: Dilip Kumar (#19)

Re: why not parallel seq scan for slow functions

On Thu, Aug 17, 2017 at 2:45 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Thu, Aug 17, 2017 at 2:09 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:

We can either pass "num_gene" to merge_clump or store num_gene in the
root, and then check it inside merge_clump. Do you see any more
complexity?

I think something like that should work.

After giving it some more thought, I see one more problem, but I'm not
sure whether we can solve it easily. If we skip generating the gather
path at the top-level node, then our cost comparison while adding the
element to the pool will not be correct, as we are skipping some of the
paths (the gather paths). It's quite possible that path1 is cheaper than
path2 without a Gather on top, but with a Gather, path2 is cheaper.

I think that should not matter, because the costing of Gather is mainly
based on the number of rows, and that should be the same for both path1
and path2 in this case.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#21

dilipbalaut@gmail.com

over 8 years ago

In reply to: Amit Kapila (#20)

Re: why not parallel seq scan for slow functions

On Fri, 18 Aug 2017 at 4:48 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Aug 17, 2017 at 2:45 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:

On Thu, Aug 17, 2017 at 2:09 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:

We can either pass "num_gene" to merge_clump or store num_gene in the
root, and then check it inside merge_clump. Do you see any more
complexity?

I think something like that should work.

After giving it some more thought, I see one more problem, but I'm not
sure whether we can solve it easily. If we skip generating the gather
path at the top-level node, then our cost comparison while adding the
element to the pool will not be correct, as we are skipping some of the
paths (the gather paths). It's quite possible that path1 is cheaper than
path2 without a Gather on top, but with a Gather, path2 is cheaper.

I think that should not matter, because the costing of Gather is mainly
based on the number of rows, and that should be the same for both path1
and path2 in this case.

Yeah, I think you are right.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#22

amit.kapila16@gmail.com

over 8 years ago

In reply to: Robert Haas (#17)

1 attachment(s)

Re: why not parallel seq scan for slow functions

On Wed, Aug 16, 2017 at 5:04 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Aug 16, 2017 at 7:23 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Aug 15, 2017 at 7:15 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Sat, Aug 12, 2017 at 9:18 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

I think skipping the generation of gather paths for the scan node or the
top-level join node generated via standard_join_search seems
straightforward, but skipping it for paths generated via geqo seems
tricky (see the use of generate_gather_paths in merge_clump). Assuming
we find some way to skip it for the top-level scan/join node, I don't
think that will be sufficient: we have a special way to push the target
list below the Gather node in apply_projection_to_path, and we would
need to move that part into generate_gather_paths as well.

I don't think that can work, because at that point we don't know what
target list the upper node wants to impose.

I am suggesting that we call generate_gather_paths just before we try to
apply the projection to paths in grouping_planner (file:planner.c;
line:1787; commit:004a9702). Won't the target list for upper nodes be
available at that point?

Oh, yes. Apparently I misunderstood your proposal.

Thanks for acknowledging the idea. I have written a patch that
implements it. At this stage, it is merely meant to move the discussion
forward. A few things I am not entirely happy about in this patch are:

(a) To skip generating the gather path for the top-level scan node, I
have used the number of relations that have a RelOptInfo, basically
simple_rel_array_size. Is there any problem with that, or do you see a
better way?
(b) I have changed the costing of the gather path for the path target in
generate_gather_paths, and I am not sure that is the best way. Another
possibility would have been to change the code in
apply_projection_to_path as done in the previous patch and just call
it from generate_gather_paths. I have not done that because of your
comment upthread ("is probably unsafe, because it might confuse
code that reaches the modified-in-place path through some other
pointer (e.g. code which expects the RelOptInfo's paths to still be
sorted by cost)."). It is not clear to me what exactly bothers you
about changing the costing directly in apply_projection_to_path.

Thoughts?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

parallel_paths_include_tlist_cost_v1.patch (application/octet-stream)

diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index b5cab0c..ee871d6 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -40,7 +40,7 @@ typedef struct
 } Clump;
 
 static List *merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump,
-			bool force);
+			int num_gene, bool force);
 static bool desirable_join(PlannerInfo *root,
 			   RelOptInfo *outer_rel, RelOptInfo *inner_rel);
 
@@ -196,7 +196,7 @@ gimme_tree(PlannerInfo *root, Gene *tour, int num_gene)
 		cur_clump->size = 1;
 
 		/* Merge it into the clumps list, using only desirable joins */
-		clumps = merge_clump(root, clumps, cur_clump, false);
+		clumps = merge_clump(root, clumps, cur_clump, num_gene, false);
 	}
 
 	if (list_length(clumps) > 1)
@@ -210,7 +210,7 @@ gimme_tree(PlannerInfo *root, Gene *tour, int num_gene)
 		{
 			Clump	   *clump = (Clump *) lfirst(lc);
 
-			fclumps = merge_clump(root, fclumps, clump, true);
+			fclumps = merge_clump(root, fclumps, clump, num_gene, true);
 		}
 		clumps = fclumps;
 	}
@@ -235,7 +235,8 @@ gimme_tree(PlannerInfo *root, Gene *tour, int num_gene)
  * "desirable" joins.
  */
 static List *
-merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
+merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
+			bool force)
 {
 	ListCell   *prev;
 	ListCell   *lc;
@@ -265,7 +266,8 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
 			if (joinrel)
 			{
 				/* Create GatherPaths for any useful partial paths for rel */
-				generate_gather_paths(root, joinrel);
+				if (old_clump->size + new_clump->size < num_gene)
+					generate_gather_paths(root, joinrel, NULL);
 
 				/* Find and save the cheapest paths for this joinrel */
 				set_cheapest(joinrel);
@@ -283,7 +285,7 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
 				 * others.  When no further merge is possible, we'll reinsert
 				 * it into the list.
 				 */
-				return merge_clump(root, clumps, old_clump, force);
+				return merge_clump(root, clumps, old_clump, num_gene, force);
 			}
 		}
 		prev = lc;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 2d7e1d8..20c7b21 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -479,14 +479,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	}
 
 	/*
-	 * If this is a baserel, consider gathering any partial paths we may have
-	 * created for it.  (If we tried to gather inheritance children, we could
-	 * end up with a very large number of gather nodes, each trying to grab
-	 * its own pool of workers, so don't do this for otherrels.  Instead,
-	 * we'll consider gathering partial paths for the parent appendrel.)
+	 * If this is a baserel and not the only rel, consider gathering any
+	 * partial paths we may have created for it.  (If we tried to gather
+	 * inheritance children, we could end up with a very large number of
+	 * gather nodes, each trying to grab its own pool of workers, so don't
+	 * do this for otherrels.  Instead, we'll consider gathering partial paths
+	 * for the parent appendrel.)
 	 */
-	if (rel->reloptkind == RELOPT_BASEREL)
-		generate_gather_paths(root, rel);
+	if (rel->reloptkind == RELOPT_BASEREL && root->simple_rel_array_size > 2)
+		generate_gather_paths(root, rel, NULL);
 
 	/*
 	 * Allow a plugin to editorialize on the set of Paths for this base
@@ -2192,14 +2193,21 @@ set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
  *		Gather Merge on top of a partial path.
  *
  * This must not be called until after we're done creating all partial paths
- * for the specified relation.  (Otherwise, add_partial_path might delete a
- * path that some GatherPath or GatherMergePath has a reference to.)
+ * for the specified relation (Otherwise, add_partial_path might delete a
+ * path that some GatherPath or GatherMergePath has a reference to.) and path
+ * target for top level scan/join node is available.
+ *
+ * For GatherPath, we try to push the new target down to its input as well. We
+ * need to do this here instead of doing it in apply_projection_to_path as it
+ * gives us chance to account for the fact that target evaluation can be
+ * performed by workers when it is safe to do so.
  */
 void
-generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)
+generate_gather_paths(PlannerInfo *root, RelOptInfo *rel, PathTarget *target)
 {
 	Path	   *cheapest_partial_path;
 	Path	   *simple_gather_path;
+	QualCost	oldcost;
 	ListCell   *lc;
 
 	/* If there are no partial paths, there's nothing to do here. */
@@ -2215,6 +2223,53 @@ generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)
 	simple_gather_path = (Path *)
 		create_gather_path(root, rel, cheapest_partial_path, rel->reltarget,
 						   NULL, NULL);
+
+	/*
+	 * We override the final target list into the gather path and update its
+	 * cost estimates accordingly.
+	 */
+	if (target && simple_gather_path->pathtarget != target)
+	{
+		oldcost = simple_gather_path->pathtarget->cost;
+		simple_gather_path->pathtarget = target;
+
+		if (is_parallel_safe(root, (Node *) target->exprs))
+		{
+			GatherPath *gpath = (GatherPath *) simple_gather_path;
+
+			simple_gather_path->startup_cost += target->cost.startup - oldcost.startup;
+
+			/*
+			 * We always use create_projection_path here, even if the subpath is
+			 * projection-capable, so as to avoid modifying the subpath in place.
+			 * It seems unlikely at present that there could be any other
+			 * references to the subpath, but better safe than sorry.
+			 */
+			gpath->subpath = (Path *)
+				create_projection_path(root,
+									   gpath->subpath->parent,
+									   gpath->subpath,
+									   target);
+
+			/*
+			 * Adjust the cost of GatherPath to reflect the fact that the bulk of
+			 * the target evaluation will happen in workers.
+			 */
+			if (((ProjectionPath *) gpath->subpath)->dummypp)
+				simple_gather_path->total_cost += target->cost.startup - oldcost.startup +
+							(target->cost.per_tuple - oldcost.per_tuple) * gpath->subpath->rows;
+			else
+				simple_gather_path->total_cost += target->cost.startup - oldcost.startup +
+							(cpu_tuple_cost + target->cost.per_tuple) * gpath->subpath->rows;
+		}
+		else
+		{
+			simple_gather_path->startup_cost += target->cost.startup - oldcost.startup;
+			simple_gather_path->total_cost += target->cost.startup - oldcost.startup +
+				(target->cost.per_tuple - oldcost.per_tuple) * simple_gather_path->rows;
+		}
+	}
+
 	add_path(rel, simple_gather_path);
 
 	/*
@@ -2224,14 +2279,18 @@ generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)
 	foreach(lc, rel->partial_pathlist)
 	{
 		Path	   *subpath = (Path *) lfirst(lc);
-		GatherMergePath *path;
+		Path	*path;
 
 		if (subpath->pathkeys == NIL)
 			continue;
 
-		path = create_gather_merge_path(root, rel, subpath, rel->reltarget,
+		path = (Path *) create_gather_merge_path(root, rel, subpath, rel->reltarget,
 										subpath->pathkeys, NULL, NULL);
-		add_path(rel, &path->path);
+		/* Add projection step if needed */
+		if (target && path->pathtarget != target)
+			path = apply_projection_to_path(root, rel, path, target);
+
+		add_path(rel, path);
 	}
 }
 
@@ -2397,7 +2456,8 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
 			rel = (RelOptInfo *) lfirst(lc);
 
 			/* Create GatherPaths for any useful partial paths for rel */
-			generate_gather_paths(root, rel);
+			if (lev < levels_needed)
+				generate_gather_paths(root, rel, NULL);
 
 			/* Find and save the cheapest paths for this rel */
 			set_cheapest(rel);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index fdef00a..1b891c1 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -1812,6 +1812,18 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 		}
 
 		/*
+		 * Consider ways to implement parallel paths.  We always skip
+		 * generating parallel path for top level scan/join nodes till the
+		 * pathtarget is computed.  This is to ensure that we can account
+		 * for the fact that most of the target evaluation work will be
+		 * performed in workers.
+		 */
+		generate_gather_paths(root, current_rel, scanjoin_target);
+
+		/* Set or update cheapest_total_path and related fields */
+		set_cheapest(current_rel);
+
+		/*
 		 * Upper planning steps which make use of the top scan/join rel's
 		 * partial pathlist will expect partial paths for that rel to produce
 		 * the same output as complete paths ... and we just changed the
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 26567cb..0bc7b09 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2354,10 +2354,6 @@ create_projection_path(PlannerInfo *root,
  * knows that the given path isn't referenced elsewhere and so can be modified
  * in-place.
  *
- * If the input path is a GatherPath, we try to push the new target down to
- * its input as well; this is a yet more invasive modification of the input
- * path, which create_projection_path() can't do.
- *
  * Note that we mustn't change the source path's parent link; so when it is
  * add_path'd to "rel" things will be a bit inconsistent.  So far that has
  * not caused any trouble.
@@ -2392,35 +2388,8 @@ apply_projection_to_path(PlannerInfo *root,
 	path->total_cost += target->cost.startup - oldcost.startup +
 		(target->cost.per_tuple - oldcost.per_tuple) * path->rows;
 
-	/*
-	 * If the path happens to be a Gather path, we'd like to arrange for the
-	 * subpath to return the required target list so that workers can help
-	 * project.  But if there is something that is not parallel-safe in the
-	 * target expressions, then we can't.
-	 */
-	if (IsA(path, GatherPath) &&
-		is_parallel_safe(root, (Node *) target->exprs))
-	{
-		GatherPath *gpath = (GatherPath *) path;
-
-		/*
-		 * We always use create_projection_path here, even if the subpath is
-		 * projection-capable, so as to avoid modifying the subpath in place.
-		 * It seems unlikely at present that there could be any other
-		 * references to the subpath, but better safe than sorry.
-		 *
-		 * Note that we don't change the GatherPath's cost estimates; it might
-		 * be appropriate to do so, to reflect the fact that the bulk of the
-		 * target evaluation will happen in workers.
-		 */
-		gpath->subpath = (Path *)
-			create_projection_path(root,
-								   gpath->subpath->parent,
-								   gpath->subpath,
-								   target);
-	}
-	else if (path->parallel_safe &&
-			 !is_parallel_safe(root, (Node *) target->exprs))
+	if (path->parallel_safe &&
+		!is_parallel_safe(root, (Node *) target->exprs))
 	{
 		/*
 		 * We're inserting a parallel-restricted target list into a path
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 4e06b2e..a4ba769 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -53,7 +53,8 @@ extern void set_dummy_rel_pathlist(RelOptInfo *rel);
 extern RelOptInfo *standard_join_search(PlannerInfo *root, int levels_needed,
 					 List *initial_rels);
 
-extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel);
+extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
+					PathTarget *target);
 extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
 						double index_pages);
 extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,

#23

Simon Riggs

simon@2ndquadrant.com

over 8 years ago

In reply to: Amit Kapila (#22)

Re: why not parallel seq scan for slow functions

On 21 August 2017 at 10:08, Amit Kapila <amit.kapila16@gmail.com> wrote:

Thoughts?

This seems like a very basic problem for parallel queries.

The problem seems to be that we are calculating the cost of the plan
rather than the speed of the plan.

Clearly, a parallel task has a higher overall cost but a lower time to
complete if resources are available.

We have the choice of 1) adding a new optimizable quantity, or of 2)
treating cost = speed, so we actually reduce the cost of a parallel
plan rather than increasing it so it is more likely to be picked.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#24

amit.kapila16@gmail.com

over 8 years ago

In reply to: Simon Riggs (#23)

Re: why not parallel seq scan for slow functions

On Mon, Aug 21, 2017 at 3:15 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

On 21 August 2017 at 10:08, Amit Kapila <amit.kapila16@gmail.com> wrote:

Thoughts?

This seems like a very basic problem for parallel queries.

The problem seems to be that we are calculating the cost of the plan
rather than the speed of the plan.

Clearly, a parallel task has a higher overall cost but a lower time to
complete if resources are available.

We have the choice of 1) adding a new optimizable quantity,

I think this has the potential to make costing decisions difficult.
That is, if we include any such new parameter, then we need to consider
it along with cost, as we can't completely ignore the cost.

or of 2)
treating cost = speed, so we actually reduce the cost of a parallel
plan rather than increasing it so it is more likely to be picked.

Yeah, this is what we currently follow for costing parallel plans, and
this patch also tries to follow the same approach.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#25

Simon Riggs

simon@2ndquadrant.com

over 8 years ago

In reply to: Amit Kapila (#24)

Re: why not parallel seq scan for slow functions

On 21 August 2017 at 11:42, Amit Kapila <amit.kapila16@gmail.com> wrote:

or of 2)
treating cost = speed, so we actually reduce the cost of a parallel
plan rather than increasing it so it is more likely to be picked.

Yeah, this is what we currently follow for costing parallel plans, and
this patch also tries to follow the same approach.

OK, I understand this better now, thanks.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#26

robertmhaas@gmail.com

over 8 years ago

In reply to: Amit Kapila (#22)

Re: why not parallel seq scan for slow functions

On Mon, Aug 21, 2017 at 5:08 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Thanks for acknowledging the idea. I have written a patch that
implements it. At this stage, it is merely meant to move the discussion
forward. A few things I am not entirely happy about in this patch are:

(a) To skip generating the gather path for the top-level scan node, I
have used the number of relations that have a RelOptInfo, basically
simple_rel_array_size. Is there any problem with that, or do you see a
better way?

I'm not sure.

(b) I have changed the costing of the gather path for the path target in
generate_gather_paths, and I am not sure that is the best way. Another
possibility would have been to change the code in
apply_projection_to_path as done in the previous patch and just call
it from generate_gather_paths. I have not done that because of your
comment upthread ("is probably unsafe, because it might confuse
code that reaches the modified-in-place path through some other
pointer (e.g. code which expects the RelOptInfo's paths to still be
sorted by cost)."). It is not clear to me what exactly bothers you
about changing the costing directly in apply_projection_to_path.

The point I was trying to make is that if you retroactively change the
cost of a path after you've already done add_path(), it's too late to
change your mind about whether to keep the path. At least according
to my current understanding, that's the root of this problem in the
first place. On top of that, add_path() and other routines that deal
with RelOptInfo path lists expect surviving paths to be ordered by
descending cost; if you frob the cost, they might not be properly
ordered any more.

I don't really have time right now to give this patch the attention
which it deserves; I can possibly come back to it at some future
point, or maybe somebody else will have time to give it a look.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#27

tgl@sss.pgh.pa.us

over 8 years ago

In reply to: Robert Haas (#26)

Re: why not parallel seq scan for slow functions

Robert Haas <robertmhaas@gmail.com> writes:

The point I was trying to make is that if you retroactively change the
cost of a path after you've already done add_path(), it's too late to
change your mind about whether to keep the path. At least according
to my current understanding, that's the root of this problem in the
first place. On top of that, add_path() and other routines that deal
with RelOptInfo path lists expect surviving paths to be ordered by
descending cost; if you frob the cost, they might not be properly
ordered any more.

Hadn't been paying attention to this thread, but I happened to notice
Robert's comment here, and I strongly agree: it is *not* cool to be
changing a path's cost (or, really, anything else about it) after
it's already been given to add_path. add_path has already made
irreversible choices on the basis of what it was given.

regards, tom lane

#28

amit.kapila16@gmail.com

over 8 years ago

In reply to: Robert Haas (#26)

1 attachment(s)

Re: why not parallel seq scan for slow functions

On Fri, Aug 25, 2017 at 10:08 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Aug 21, 2017 at 5:08 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

(b) I have changed the costing of the gather path for the path target in
generate_gather_paths, and I am not sure that is the best way. Another
possibility would have been to change the code in
apply_projection_to_path as done in the previous patch and just call
it from generate_gather_paths. I have not done that because of your
comment upthread ("is probably unsafe, because it might confuse
code that reaches the modified-in-place path through some other
pointer (e.g. code which expects the RelOptInfo's paths to still be
sorted by cost)."). It is not clear to me what exactly bothers you
about changing the costing directly in apply_projection_to_path.

The point I was trying to make is that if you retroactively change the
cost of a path after you've already done add_path(), it's too late to
change your mind about whether to keep the path. At least according
to my current understanding, that's the root of this problem in the
first place. On top of that, add_path() and other routines that deal
with RelOptInfo path lists expect surviving paths to be ordered by
descending cost; if you frob the cost, they might not be properly
ordered any more.

Okay, now I understand your point, but I think we already change the
cost of paths in apply_projection_to_path, which is done after add_path
for top-level scan/join paths. I think this matters a lot in the case of
Gather, because the cost of computing the target list can be divided
among workers. I have changed the patch so that parallel paths for the
top-level scan/join node are generated after the pathtarget is ready. I
have kept the costing of path targets local to
apply_projection_to_path(), as that keeps the patch simple.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

parallel_paths_include_tlist_cost_v2.patch (application/octet-stream)

diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index b5cab0c..ee871d6 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -40,7 +40,7 @@ typedef struct
 } Clump;
 
 static List *merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump,
-			bool force);
+			int num_gene, bool force);
 static bool desirable_join(PlannerInfo *root,
 			   RelOptInfo *outer_rel, RelOptInfo *inner_rel);
 
@@ -196,7 +196,7 @@ gimme_tree(PlannerInfo *root, Gene *tour, int num_gene)
 		cur_clump->size = 1;
 
 		/* Merge it into the clumps list, using only desirable joins */
-		clumps = merge_clump(root, clumps, cur_clump, false);
+		clumps = merge_clump(root, clumps, cur_clump, num_gene, false);
 	}
 
 	if (list_length(clumps) > 1)
@@ -210,7 +210,7 @@ gimme_tree(PlannerInfo *root, Gene *tour, int num_gene)
 		{
 			Clump	   *clump = (Clump *) lfirst(lc);
 
-			fclumps = merge_clump(root, fclumps, clump, true);
+			fclumps = merge_clump(root, fclumps, clump, num_gene, true);
 		}
 		clumps = fclumps;
 	}
@@ -235,7 +235,8 @@ gimme_tree(PlannerInfo *root, Gene *tour, int num_gene)
  * "desirable" joins.
  */
 static List *
-merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
+merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
+			bool force)
 {
 	ListCell   *prev;
 	ListCell   *lc;
@@ -265,7 +266,8 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
 			if (joinrel)
 			{
 				/* Create GatherPaths for any useful partial paths for rel */
-				generate_gather_paths(root, joinrel);
+				if (old_clump->size + new_clump->size < num_gene)
+					generate_gather_paths(root, joinrel, NULL);
 
 				/* Find and save the cheapest paths for this joinrel */
 				set_cheapest(joinrel);
@@ -283,7 +285,7 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
 				 * others.  When no further merge is possible, we'll reinsert
 				 * it into the list.
 				 */
-				return merge_clump(root, clumps, old_clump, force);
+				return merge_clump(root, clumps, old_clump, num_gene, force);
 			}
 		}
 		prev = lc;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 2d7e1d8..f3e4892 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -479,14 +479,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	}
 
 	/*
-	 * If this is a baserel, consider gathering any partial paths we may have
-	 * created for it.  (If we tried to gather inheritance children, we could
-	 * end up with a very large number of gather nodes, each trying to grab
-	 * its own pool of workers, so don't do this for otherrels.  Instead,
-	 * we'll consider gathering partial paths for the parent appendrel.)
+	 * If this is a baserel and not the only rel, consider gathering any
+	 * partial paths we may have created for it.  (If we tried to gather
+	 * inheritance children, we could end up with a very large number of
+	 * gather nodes, each trying to grab its own pool of workers, so don't do
+	 * this for otherrels.  Instead, we'll consider gathering partial paths
+	 * for the parent appendrel.)
 	 */
-	if (rel->reloptkind == RELOPT_BASEREL)
-		generate_gather_paths(root, rel);
+	if (rel->reloptkind == RELOPT_BASEREL && root->simple_rel_array_size > 2)
+		generate_gather_paths(root, rel, NULL);
 
 	/*
 	 * Allow a plugin to editorialize on the set of Paths for this base
@@ -2192,11 +2193,12 @@ set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
  *		Gather Merge on top of a partial path.
  *
  * This must not be called until after we're done creating all partial paths
- * for the specified relation.  (Otherwise, add_partial_path might delete a
- * path that some GatherPath or GatherMergePath has a reference to.)
+ * for the specified relation (Otherwise, add_partial_path might delete a
+ * path that some GatherPath or GatherMergePath has a reference to.) and path
+ * target for top level scan/join node is available.
  */
 void
-generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)
+generate_gather_paths(PlannerInfo *root, RelOptInfo *rel, PathTarget *target)
 {
 	Path	   *cheapest_partial_path;
 	Path	   *simple_gather_path;
@@ -2215,6 +2217,11 @@ generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)
 	simple_gather_path = (Path *)
 		create_gather_path(root, rel, cheapest_partial_path, rel->reltarget,
 						   NULL, NULL);
+
+	/* Add projection step if needed */
+	if (target && simple_gather_path->pathtarget != target)
+		simple_gather_path = apply_projection_to_path(root, rel, simple_gather_path, target);
+
 	add_path(rel, simple_gather_path);
 
 	/*
@@ -2224,14 +2231,18 @@ generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)
 	foreach(lc, rel->partial_pathlist)
 	{
 		Path	   *subpath = (Path *) lfirst(lc);
-		GatherMergePath *path;
+		Path	   *path;
 
 		if (subpath->pathkeys == NIL)
 			continue;
 
-		path = create_gather_merge_path(root, rel, subpath, rel->reltarget,
-										subpath->pathkeys, NULL, NULL);
-		add_path(rel, &path->path);
+		path = (Path *) create_gather_merge_path(root, rel, subpath, rel->reltarget,
+												 subpath->pathkeys, NULL, NULL);
+		/* Add projection step if needed */
+		if (target && path->pathtarget != target)
+			path = apply_projection_to_path(root, rel, path, target);
+
+		add_path(rel, path);
 	}
 }
 
@@ -2397,7 +2408,8 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
 			rel = (RelOptInfo *) lfirst(lc);
 
 			/* Create GatherPaths for any useful partial paths for rel */
-			generate_gather_paths(root, rel);
+			if (lev < levels_needed)
+				generate_gather_paths(root, rel, NULL);
 
 			/* Find and save the cheapest paths for this rel */
 			set_cheapest(rel);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 9662302..93737da 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -1818,6 +1818,18 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 		}
 
 		/*
+		 * Consider ways to implement parallel paths.  We always skip
+		 * generating parallel path for top level scan/join nodes till the
+		 * pathtarget is computed.  This is to ensure that we can account for
+		 * the fact that most of the target evaluation work will be performed
+		 * in workers.
+		 */
+		generate_gather_paths(root, current_rel, scanjoin_target);
+
+		/* Set or update cheapest_total_path and related fields */
+		set_cheapest(current_rel);
+
+		/*
 		 * Upper planning steps which make use of the top scan/join rel's
 		 * partial pathlist will expect partial paths for that rel to produce
 		 * the same output as complete paths ... and we just changed the
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 26567cb..41f98b3 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2408,16 +2408,27 @@ apply_projection_to_path(PlannerInfo *root,
 		 * projection-capable, so as to avoid modifying the subpath in place.
 		 * It seems unlikely at present that there could be any other
 		 * references to the subpath, but better safe than sorry.
-		 *
-		 * Note that we don't change the GatherPath's cost estimates; it might
-		 * be appropriate to do so, to reflect the fact that the bulk of the
-		 * target evaluation will happen in workers.
 		 */
 		gpath->subpath = (Path *)
 			create_projection_path(root,
 								   gpath->subpath->parent,
 								   gpath->subpath,
 								   target);
+
+		/*
+		 * Adjust the cost of GatherPath to reflect the fact that the bulk of
+		 * the target evaluation will happen in workers.
+		 */
+		if (((ProjectionPath *) gpath->subpath)->dummypp)
+		{
+			path->total_cost -= (target->cost.per_tuple - oldcost.per_tuple) * path->rows;
+			path->total_cost += (target->cost.per_tuple - oldcost.per_tuple) * gpath->subpath->rows;
+		}
+		else
+		{
+			path->total_cost -= (target->cost.per_tuple - oldcost.per_tuple) * path->rows;
+			path->total_cost += (cpu_tuple_cost + target->cost.per_tuple) * gpath->subpath->rows;
+		}
 	}
 	else if (path->parallel_safe &&
 			 !is_parallel_safe(root, (Node *) target->exprs))
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 4e06b2e..61694a0 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -53,7 +53,8 @@ extern void set_dummy_rel_pathlist(RelOptInfo *rel);
 extern RelOptInfo *standard_join_search(PlannerInfo *root, int levels_needed,
 					 List *initial_rels);
 
-extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel);
+extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
+					  PathTarget *target);
 extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
 						double index_pages);
 extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 2ae600f..a3f83c8 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -101,6 +101,23 @@ explain (costs off)
          ->  Parallel Index Only Scan using tenk1_unique1 on tenk1
 (5 rows)
 
+-- test that parallel plan gets selected when target list contains costly
+-- function
+create or replace function costly_func(var1 integer) returns integer
+as $$
+begin
+        return var1 + 10;
+end;
+$$ language plpgsql PARALLEL SAFE Cost 100000;
+explain select ten, costly_func(ten) from tenk1;
+                                 QUERY PLAN                                 
+----------------------------------------------------------------------------
+ Gather  (cost=0.00..623882.94 rows=9976 width=8)
+   Workers Planned: 4
+   ->  Parallel Seq Scan on tenk1  (cost=0.00..623882.94 rows=2494 width=8)
+(3 rows)
+
+drop function costly_func(var1 integer);
 -- test parallel plans for queries containing un-correlated subplans.
 alter table tenk2 set (parallel_workers = 0);
 explain (costs off)
diff --git a/src/test/regress/sql/select_parallel.sql b/src/test/regress/sql/select_parallel.sql
index 89fe80a..c60e902 100644
--- a/src/test/regress/sql/select_parallel.sql
+++ b/src/test/regress/sql/select_parallel.sql
@@ -39,6 +39,17 @@ explain (costs off)
 	select  sum(parallel_restricted(unique1)) from tenk1
 	group by(parallel_restricted(unique1));
 
+-- test that parallel plan gets selected when target list contains costly
+-- function
+create or replace function costly_func(var1 integer) returns integer
+as $$
+begin
+        return var1 + 10;
+end;
+$$ language plpgsql PARALLEL SAFE Cost 100000;
+explain select ten, costly_func(ten) from tenk1;
+drop function costly_func(var1 integer);
+
 -- test parallel plans for queries containing un-correlated subplans.
 alter table tenk2 set (parallel_workers = 0);
 explain (costs off)
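
The GatherPath re-costing in the `apply_projection_to_path` hunk of the patch above can be restated as plain arithmetic. The sketch below is a hypothetical standalone helper, not PostgreSQL code. `toy_adjust_gather_cost` and its parameter names are invented, and 0.01 is PostgreSQL's default `cpu_tuple_cost`. It assumes that the generic part of `apply_projection_to_path` has already charged `(target->cost.per_tuple - oldcost.per_tuple) * path->rows` to the Gather. The helper backs that charge out and re-charges the target-list cost on the subpath's row count. When a separate projection node is needed, it also adds `cpu_tuple_cost` per row.

```c
#include <assert.h>
#include <stdbool.h>

/* PostgreSQL's default cpu_tuple_cost, used here only as a constant. */
#define TOY_CPU_TUPLE_COST 0.01

/*
 * Hypothetical restatement of the patch's GatherPath re-costing.
 *
 * total_cost        Gather cost after the generic projection charge
 * delta_per_tuple   target->cost.per_tuple - oldcost.per_tuple
 * target_per_tuple  target->cost.per_tuple
 * gather_rows       path->rows (rows the Gather returns)
 * subpath_rows      gpath->subpath->rows (rows the subpath produces)
 * dummypp           true if the pushed-down projection needs no Result node
 */
static double
toy_adjust_gather_cost(double total_cost, double delta_per_tuple,
                       double target_per_tuple, double gather_rows,
                       double subpath_rows, bool dummypp)
{
    /* Back out the charge that was computed on the Gather's output rows. */
    total_cost -= delta_per_tuple * gather_rows;

    if (dummypp)
        /* Projection folded into the subpath: re-charge only the delta. */
        total_cost += delta_per_tuple * subpath_rows;
    else
        /* Separate projection node: full tlist cost plus per-row overhead. */
        total_cost += (TOY_CPU_TUPLE_COST + target_per_tuple) * subpath_rows;

    return total_cost;
}
```

For instance, with a Gather returning 100 rows over a subpath producing 25 rows, a charge of 10 per tuple drops from 1000 to 250 in the `dummypp` case.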

#29

robertmhaas@gmail.com

over 8 years ago

In reply to: Amit Kapila (#28)

Re: why not parallel seq scan for slow functions

On Tue, Sep 5, 2017 at 4:34 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Okay, now I understand your point, but I think we already change the
cost of paths in apply_projection_to_path which is done after add_path
for top level scan/join paths.

Yeah. I think that's a nasty hack, and I think it's Tom's fault. :-)

It's used in various places with comments like this:

/*
* The path might not return exactly what we want, so fix that. (We
* assume that this won't change any conclusions about which was the
* cheapest path.)
*/

And in another place:

* In principle we should re-run set_cheapest() here to identify the
* cheapest path, but it seems unlikely that adding the same tlist
* eval costs to all the paths would change that, so we don't bother.

I think these assumptions were a little shaky even before parallel
query came along, but they're now outright false, because we're not
adding the *same* tlist eval costs to all paths any more. The
parallel paths are getting smaller costs. That probably doesn't
matter much if the expressions in question are things like a + b, but
when, as in Jeff's example, it's slow(a), then it matters a lot.
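Robert's point can be seen with two made-up paths. In this hypothetical sketch, `ToyPath`, the helper names, and all numbers are invented for illustration. The target-list charge is proportional to the rows each process evaluates, and the Gather plan is assumed to evaluate it on a quarter of the rows per process. A cheap expression costed at 0.0025 (PostgreSQL's default `cpu_operator_cost`) leaves the serial plan cheaper. The example also assumes `COST 10000000` for `slow`, which works out to 25000 per call at that default, and at that cost the ranking flips. If both paths evaluated the target list on the same number of rows, the same per-tuple charge would shift them equally and their order could not change.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy path: not PostgreSQL's Path struct. */
typedef struct ToyPath
{
    double total_cost;  /* cost before target-list evaluation */
    double tlist_rows;  /* rows on which one process evaluates the tlist */
} ToyPath;

/*
 * Illustrative numbers only: the serial plan evaluates the tlist on every
 * row; the Gather plan starts out more expensive but evaluates the tlist
 * on a quarter of the rows per process.
 */
static const ToyPath toy_serial_plan = {3000.0, 100000.0};
static const ToyPath toy_gather_plan = {12000.0, 25000.0};

/* Total cost once tlist evaluation is charged per evaluated row. */
static double
toy_cost_with_tlist(const ToyPath *p, double tlist_cost_per_tuple)
{
    return p->total_cost + tlist_cost_per_tuple * p->tlist_rows;
}

/* True if a is strictly cheaper than b after charging the tlist. */
static bool
toy_cheaper_after_tlist(const ToyPath *a, const ToyPath *b,
                        double tlist_cost_per_tuple)
{
    return toy_cost_with_tlist(a, tlist_cost_per_tuple) <
           toy_cost_with_tlist(b, tlist_cost_per_tuple);
}
```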

I'd feel a lot happier if Tom were to decide how this ought to be
fixed, because - in spite of some modifications by various parallel
query code - this is basically all his design and mostly his code.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#30

tgl@sss.pgh.pa.us

over 8 years ago

In reply to: Robert Haas (#29)

Re: why not parallel seq scan for slow functions

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Sep 5, 2017 at 4:34 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Okay, now I understand your point, but I think we already change the
cost of paths in apply_projection_to_path which is done after add_path
for top level scan/join paths.

Yeah. I think that's a nasty hack, and I think it's Tom's fault. :-)

Yeah, and it's also documented:

* This has the same net effect as create_projection_path(), except that if
* a separate Result plan node isn't needed, we just replace the given path's
* pathtarget with the desired one. This must be used only when the caller
* knows that the given path isn't referenced elsewhere and so can be modified
* in-place.

If somebody's applying apply_projection_to_path to a path that's already
been add_path'd, that's a violation of the documented restriction.

It might be that we should just get rid of apply_projection_to_path and
use create_projection_path, which is less mistake-prone at the cost of
manufacturing another level of Path node. Now that that has the dummypp
flag, it really shouldn't make any difference in terms of the accuracy of
the cost estimates.
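The difference Tom describes comes down to aliasing. In the hypothetical sketch below, `ToyPath` and both helpers are invented and greatly simplified relative to PostgreSQL's `apply_projection_to_path()` and `create_projection_path()`. The in-place helper mutates a node that other pointers may still reference. The wrapper helper leaves its input untouched and returns a new node on top of it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a planner Path. */
typedef struct ToyPath
{
    double          total_cost;
    int             target_id;  /* identifies the output target list */
    struct ToyPath *subpath;    /* set only on projection wrappers */
} ToyPath;

/*
 * In-place style: overwrite the target and cost of an existing node.
 * Anything else pointing at this node (for example, an entry in a
 * pathlist that was sorted by cost) observes the change.
 */
static ToyPath *
toy_apply_projection(ToyPath *path, int target_id, double extra_cost)
{
    path->target_id = target_id;
    path->total_cost += extra_cost;
    return path;
}

/*
 * Wrapper style: allocate a new node above the input and leave the input
 * unchanged.  Returns NULL if allocation fails.
 */
static ToyPath *
toy_create_projection(ToyPath *subpath, int target_id, double extra_cost)
{
    ToyPath *proj = malloc(sizeof(ToyPath));

    if (proj == NULL)
        return NULL;
    proj->total_cost = subpath->total_cost + extra_cost;
    proj->target_id = target_id;
    proj->subpath = subpath;
    return proj;
}
```

The price of the wrapper style is one extra node per projected path. As Tom notes, the `dummypp` flag is what keeps that extra level from distorting PostgreSQL's cost estimates.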

I'd feel a lot happier if Tom were to decide how this ought to be
fixed, because - in spite of some modifications by various parallel
query code - this is basically all his design and mostly his code.

I can take a look, but not right away.

regards, tom lane

#31

robertmhaas@gmail.com

over 8 years ago

In reply to: Tom Lane (#30)

Re: why not parallel seq scan for slow functions

On Wed, Sep 6, 2017 at 1:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Sep 5, 2017 at 4:34 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Okay, now I understand your point, but I think we already change the
cost of paths in apply_projection_to_path which is done after add_path
for top level scan/join paths.

Yeah. I think that's a nasty hack, and I think it's Tom's fault. :-)

Yeah, and it's also documented:

* This has the same net effect as create_projection_path(), except that if
* a separate Result plan node isn't needed, we just replace the given path's
* pathtarget with the desired one. This must be used only when the caller
* knows that the given path isn't referenced elsewhere and so can be modified
* in-place.

If somebody's applying apply_projection_to_path to a path that's already
been add_path'd, that's a violation of the documented restriction.

/me is confused. Isn't that exactly what grouping_planner() is doing,
and has done ever since your original pathification commit
(3fc6e2d7f5b652b417fa6937c34de2438d60fa9f)? It's iterating over
current_rel->pathlist, so surely everything in there has been
add_path()'d.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#32

tgl@sss.pgh.pa.us

over 8 years ago

In reply to: Robert Haas (#31)

Re: why not parallel seq scan for slow functions

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Sep 6, 2017 at 1:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

If somebody's applying apply_projection_to_path to a path that's already
been add_path'd, that's a violation of the documented restriction.

/me is confused. Isn't that exactly what grouping_planner() is doing,
and has done ever since your original pathification commit
(3fc6e2d7f5b652b417fa6937c34de2438d60fa9f)? It's iterating over
current_rel->pathlist, so surely everything in there has been
add_path()'d.

I think the assumption there is that we no longer care about validity of
the input Relation, since we won't be looking at it any more (and
certainly not adding more paths to it). If there's some reason why
that's not true, then maybe grouping_planner has a bug there.

regards, tom lane

#33

robertmhaas@gmail.com

over 8 years ago

In reply to: Tom Lane (#32)

Re: why not parallel seq scan for slow functions

On Wed, Sep 6, 2017 at 3:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Sep 6, 2017 at 1:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

If somebody's applying apply_projection_to_path to a path that's already
been add_path'd, that's a violation of the documented restriction.

/me is confused. Isn't that exactly what grouping_planner() is doing,
and has done ever since your original pathification commit
(3fc6e2d7f5b652b417fa6937c34de2438d60fa9f)? It's iterating over
current_rel->pathlist, so surely everything in there has been
add_path()'d.

I think the assumption there is that we no longer care about validity of
the input Relation, since we won't be looking at it any more (and
certainly not adding more paths to it). If there's some reason why
that's not true, then maybe grouping_planner has a bug there.

Right, that's sorta what I assumed. But I think that thinking is
flawed in the face of parallel query, because of the fact that
apply_projection_to_path() pushes down target list projection below
Gather when possible. In particular, as Jeff and Amit point out, it
may well be that (a) before apply_projection_to_path(), the cheapest
plan is non-parallel and (b) after apply_projection_to_path(), the
cheapest plan would be a Gather plan, except that it's too late
because we've already thrown that path out.

What we ought to do, I think, is avoid generating gather paths until
after we've applied the target list (and the associated costing
changes) to both the regular path list and the partial path list.
Then the cost comparison is apples-to-apples. The use of
apply_projection_to_path() on every path in the pathlist would be fine
if it were adjusting all the costs by a uniform amount, but it isn't.
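The ordering Robert proposes can be sketched as follows: charge the target list to the partial path first, build the Gather on top, and only then compare against the serial path with the same target list applied. This is a hypothetical sketch in which all names and numbers are invented. The Gather formula is a simplification that uses PostgreSQL's default `parallel_setup_cost` (1000) and `parallel_tuple_cost` (0.1) and ignores details such as leader participation.

```c
#include <assert.h>
#include <stdbool.h>

/* Defaults of PostgreSQL's parallel_setup_cost and parallel_tuple_cost. */
#define TOY_PARALLEL_SETUP_COST 1000.0
#define TOY_PARALLEL_TUPLE_COST 0.1

/* Hypothetical toy path; for a partial path, rows are per worker. */
typedef struct ToyPath
{
    double total_cost;
    double rows;
} ToyPath;

/* Charge per-tuple target-list evaluation to a path. */
static ToyPath
toy_project(ToyPath p, double tlist_cost_per_tuple)
{
    p.total_cost += tlist_cost_per_tuple * p.rows;
    return p;
}

/* Simplified Gather: setup overhead plus a transfer cost per tuple. */
static ToyPath
toy_gather(ToyPath partial, int nworkers)
{
    ToyPath g;

    g.rows = partial.rows * nworkers;
    g.total_cost = partial.total_cost + TOY_PARALLEL_SETUP_COST +
        TOY_PARALLEL_TUPLE_COST * g.rows;
    return g;
}

/*
 * Proposed order: project the partial path, then gather, then compare
 * with the projected serial path.  Returns true if the Gather plan wins.
 */
static bool
toy_parallel_wins(ToyPath serial, ToyPath partial, int nworkers,
                  double tlist_cost_per_tuple)
{
    ToyPath serial_proj = toy_project(serial, tlist_cost_per_tuple);
    ToyPath gather = toy_gather(toy_project(partial, tlist_cost_per_tuple),
                                nworkers);

    return gather.total_cost < serial_proj.total_cost;
}
```

Because both candidates carry the target-list cost before any comparison, the pruning decision is made on comparable totals.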

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#34

tgl@sss.pgh.pa.us

over 8 years ago

In reply to: Robert Haas (#33)

Re: why not parallel seq scan for slow functions

Robert Haas <robertmhaas@gmail.com> writes:

In particular, as Jeff and Amit point out, it
may well be that (a) before apply_projection_to_path(), the cheapest
plan is non-parallel and (b) after apply_projection_to_path(), the
cheapest plan would be a Gather plan, except that it's too late
because we've already thrown that path out.

I'm not entirely following. I thought that add_path was set up to treat
"can be parallelized" as an independent dimension of merit, so that
parallel paths would always survive.

What we ought to do, I think, is avoid generating gather paths until
after we've applied the target list (and the associated costing
changes) to both the regular path list and the partial path list.

Might be a tad messy to rearrange things that way.

regards, tom lane

#35

robertmhaas@gmail.com

over 8 years ago

In reply to: Tom Lane (#34)

Re: why not parallel seq scan for slow functions

On Wed, Sep 6, 2017 at 3:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

In particular, as Jeff and Amit point out, it
may well be that (a) before apply_projection_to_path(), the cheapest
plan is non-parallel and (b) after apply_projection_to_path(), the
cheapest plan would be a Gather plan, except that it's too late
because we've already thrown that path out.

I'm not entirely following. I thought that add_path was set up to treat
"can be parallelized" as an independent dimension of merit, so that
parallel paths would always survive.

It treats parallel-safety as an independent dimension of merit; a
parallel-safe plan is more meritorious than one of equal cost which is
not. We need that because, for example, forming a partial
path for a join means joining a partial path to a parallel-safe path.
But that doesn't help us here; that's to make sure we can build the
necessary stuff *below* the Gather. IOW, if we threw away
parallel-safe paths because there was a cheaper parallel-restricted
path, we might be unable to build a partial path for the join *at
all*.
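The rule Robert describes can be reduced to a two-dimensional dominance test. This hypothetical sketch is far simpler than the real `add_path()`, which also compares startup cost, pathkeys, parameterization, and row counts, and which compares costs fuzzily. It keeps only total cost and parallel safety, to show why a cheaper parallel-restricted path does not evict a parallel-safe one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy path with just the two dimensions discussed. */
typedef struct ToyPath
{
    double total_cost;
    bool   parallel_safe;
} ToyPath;

/*
 * True if a makes b redundant: a is no more expensive, no less
 * parallel-safe, and strictly better in at least one of the two.
 */
static bool
toy_dominates(const ToyPath *a, const ToyPath *b)
{
    bool no_worse = a->total_cost <= b->total_cost &&
        (a->parallel_safe || !b->parallel_safe);
    bool strictly_better = a->total_cost < b->total_cost ||
        (a->parallel_safe && !b->parallel_safe);

    return no_worse && strictly_better;
}
```

A parallel-safe path costing 100 and a parallel-restricted path costing 50 both survive under this test. A parallel-restricted path costing 100, however, is dominated by the parallel-safe one at the same cost.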

Here, the Gather path is not parallel-safe, but rather
parallel-restricted: it's OK for it to exist in a plan that uses
parallelism (duh), but it can't be nested under another Gather (also
duh, kinda). So before accounting for the differing projection cost,
the Gather path is doubly inferior: it is more expensive AND not
parallel-safe, whereas the competing non-parallel plan is both cheaper
AND parallel-safe. After applying the expensive target list, the
parallel-safe plan gets a lot more expensive, but the Gather path gets
more expensive to a lesser degree because the projection step ends up
below the Gather and thus happens in parallel, so now the Gather plan,
still a loser on parallel-safety, is a winner on total cost and thus
ought to have been retained and, in fact, ought to have won. Instead,
we threw it out too early.
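The sequence described in the paragraph above can be replayed with made-up numbers. This hypothetical sketch invents all names and costs and uses the same simplified dominance test on total cost and parallel safety. Before the target list is charged, the Gather path is dominated. Once the expensive function is costed on the rows each process actually evaluates, it becomes far cheaper.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy path: not PostgreSQL's Path struct. */
typedef struct ToyPath
{
    double total_cost;
    double tlist_rows;     /* rows on which one process evaluates the tlist */
    bool   parallel_safe;  /* a Gather path is parallel-restricted */
} ToyPath;

/* Simplified dominance: no worse on cost and safety, better on one. */
static bool
toy_dominates(const ToyPath *a, const ToyPath *b)
{
    bool no_worse = a->total_cost <= b->total_cost &&
        (a->parallel_safe || !b->parallel_safe);
    bool strictly_better = a->total_cost < b->total_cost ||
        (a->parallel_safe && !b->parallel_safe);

    return no_worse && strictly_better;
}

/* Charge per-tuple target-list evaluation to a path. */
static ToyPath
toy_add_tlist(ToyPath p, double tlist_cost_per_tuple)
{
    p.total_cost += tlist_cost_per_tuple * p.tlist_rows;
    return p;
}

/* Illustrative numbers only. */
static const ToyPath toy_serial = {3000.0, 100000.0, true};
static const ToyPath toy_gather = {12000.0, 25000.0, false};
```

In the planner, the first comparison happens inside `add_path()`, so the Gather path is already gone by the time the target-list cost would favor it.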

What we ought to do, I think, is avoid generating gather paths until
after we've applied the target list (and the associated costing
changes) to both the regular path list and the partial path list.

Might be a tad messy to rearrange things that way.

Why do you think I wanted you to do it? :-)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#36

tgl@sss.pgh.pa.us

over 8 years ago

In reply to: Robert Haas (#35)

Re: why not parallel seq scan for slow functions

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Sep 6, 2017 at 3:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm not entirely following. I thought that add_path was set up to treat
"can be parallelized" as an independent dimension of merit, so that
parallel paths would always survive.

Here, the Gather path is not parallel-safe, but rather
parallel-restricted:

Ah, right, the problem is with the Gather not its sub-paths.

Might be a tad messy to rearrange things that way.

Why do you think I wanted you to do it? :-)

I'll think about it.

regards, tom lane

#37

Amit Khandekar

amitdkhan.pg@gmail.com

over 8 years ago

In reply to: Amit Kapila (#28)

Re: why not parallel seq scan for slow functions

On 5 September 2017 at 14:04, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Aug 25, 2017 at 10:08 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Aug 21, 2017 at 5:08 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

(b) I have changed the costing of gather path for path target in
generate_gather_paths which I am not sure is the best way. Another
possibility could have been that I change the code in
apply_projection_to_path as done in the previous patch and just call
it from generate_gather_paths. I have not done that because of your
comment above thread ("is probably unsafe, because it might confuse
code that reaches the modified-in-place path through some other
pointer (e.g. code which expects the RelOptInfo's paths to still be
sorted by cost)."). It is not clear to me what exactly is bothering
you if we directly change costing in apply_projection_to_path.

The point I was trying to make is that if you retroactively change the
cost of a path after you've already done add_path(), it's too late to
change your mind about whether to keep the path. At least according
to my current understanding, that's the root of this problem in the
first place. On top of that, add_path() and other routines that deal
with RelOptInfo path lists expect surviving paths to be ordered by
descending cost; if you frob the cost, they might not be properly
ordered any more.

Okay, now I understand your point, but I think we already change the
cost of paths in apply_projection_to_path which is done after add_path
for top level scan/join paths. I think this matters a lot in case of
Gather because the cost of computing target list can be divided among
workers. I have changed the patch such that parallel paths for top
level scan/join node will be generated after pathtarget is ready. I
had kept the costing of path targets local to
apply_projection_to_path() as that makes the patch simple.

I started with a quick review ... a couple of comments below:

- * If this is a baserel, consider gathering any partial paths we may have
- * created for it.  (If we tried to gather inheritance children, we could
+ * If this is a baserel and not the only rel, consider gathering any
+ * partial paths we may have created for it.  (If we tried to gather

  /* Create GatherPaths for any useful partial paths for rel */
-  generate_gather_paths(root, rel);
+  if (lev < levels_needed)
+     generate_gather_paths(root, rel, NULL);

I think at the above two places, and maybe in other places as well, it's
better to mention the reason why we should generate the gather path
only if it's not the only rel.

----------

-       if (rel->reloptkind == RELOPT_BASEREL)
-               generate_gather_paths(root, rel);
+       if (rel->reloptkind == RELOPT_BASEREL &&
+               root->simple_rel_array_size > 2)
+               generate_gather_paths(root, rel, NULL);

Above, in case it's a partitioned table, root->simple_rel_array_size
includes the child rels. So even if it's a simple select without a
join rel, simple_rel_array_size would be > 2, and so gather path would
be generated here for the root table, and again in grouping_planner().

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

#38

amit.kapila16@gmail.com

over 8 years ago

In reply to: Amit Khandekar (#37)

Re: why not parallel seq scan for slow functions

On Tue, Sep 12, 2017 at 5:47 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:

On 5 September 2017 at 14:04, Amit Kapila <amit.kapila16@gmail.com> wrote:

I started with a quick review ... a couple of comments below:
- * If this is a baserel, consider gathering any partial paths we may have
- * created for it.  (If we tried to gather inheritance children, we could
+ * If this is a baserel and not the only rel, consider gathering any
+ * partial paths we may have created for it.  (If we tried to gather
/* Create GatherPaths for any useful partial paths for rel */
-  generate_gather_paths(root, rel);
+  if (lev < levels_needed)
+     generate_gather_paths(root, rel, NULL);
I think at the above two places, and maybe in other places as well, it's
better to mention the reason why we should generate the gather path
only if it's not the only rel.

I think the comment you are looking for is present where we are calling
generate_gather_paths in grouping_planner. Instead of adding same or
similar comment at multiple places, how about if we just say something
like "See in grouping_planner where we generate gather paths" at all
other places?

----------
-       if (rel->reloptkind == RELOPT_BASEREL)
-               generate_gather_paths(root, rel);
+       if (rel->reloptkind == RELOPT_BASEREL &&
+               root->simple_rel_array_size > 2)
+               generate_gather_paths(root, rel, NULL);
Above, in case it's a partitioned table, root->simple_rel_array_size
includes the child rels. So even if it's a simple select without a
join rel, simple_rel_array_size would be > 2, and so gather path would
be generated here for the root table, and again in grouping_planner().

Yeah, that could be a problem. I think we should ensure that there is
no append rel list by checking root->append_rel_list. Can you think
of a better way to handle it?
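The baserel-level check can be modeled in isolation. In this hypothetical sketch, `ToyPlannerInfo` stands in for the two `PlannerInfo` fields involved, and a boolean replaces the test of `root->append_rel_list` against `NIL`. The sketch assumes that `simple_rel_array_size` is one more than the number of range-table entries. Under that assumption, a single-table query gives 2, while inheritance or partition children push the value higher, as Amit Khandekar points out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the PlannerInfo fields the check reads. */
typedef struct ToyPlannerInfo
{
    int  simple_rel_array_size;  /* range-table entries + 1 (slot 0 unused) */
    bool has_append_rels;        /* models root->append_rel_list != NIL */
} ToyPlannerInfo;

/*
 * Gather at the baserel level only when the baserel cannot also be the
 * top-level scan/join rel; otherwise gathering is left to the point where
 * the final target list is known.
 */
static bool
toy_gather_at_baserel(const ToyPlannerInfo *root, bool is_baserel)
{
    return is_baserel &&
        root->simple_rel_array_size > 2 &&
        !root->has_append_rels;
}
```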

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#39

From https://travis-ci.org/postgresql-cfbot/postgresql/builds/277376953 .

amit.kapila16@gmail.com

over 8 years ago

In reply to: Amit Kapila (#38)

1 attachment(s)

Re: why not parallel seq scan for slow functions

On Wed, Sep 13, 2017 at 9:39 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Sep 12, 2017 at 5:47 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
On 5 September 2017 at 14:04, Amit Kapila <amit.kapila16@gmail.com> wrote:

I started with a quick review ... a couple of comments below:
- * If this is a baserel, consider gathering any partial paths we may have
- * created for it.  (If we tried to gather inheritance children, we could
+ * If this is a baserel and not the only rel, consider gathering any
+ * partial paths we may have created for it.  (If we tried to gather
/* Create GatherPaths for any useful partial paths for rel */
-  generate_gather_paths(root, rel);
+  if (lev < levels_needed)
+     generate_gather_paths(root, rel, NULL);
I think at the above two places, and maybe in other places as well, it's
better to mention the reason why we should generate the gather path
only if it's not the only rel.
I think the comment you are looking for is present where we are calling
generate_gather_paths in grouping_planner. Instead of adding same or
similar comment at multiple places, how about if we just say something
like "See in grouping_planner where we generate gather paths" at all
other places?
----------
-       if (rel->reloptkind == RELOPT_BASEREL)
-               generate_gather_paths(root, rel);
+       if (rel->reloptkind == RELOPT_BASEREL &&
+               root->simple_rel_array_size > 2)
+               generate_gather_paths(root, rel, NULL);
Above, in case it's a partitioned table, root->simple_rel_array_size
includes the child rels. So even if it's a simple select without a
join rel, simple_rel_array_size would be > 2, and so gather path would
be generated here for the root table, and again in grouping_planner().
Yeah, that could be a problem. I think we should ensure that there is
no append rel list by checking root->append_rel_list. Can you think
of a better way to handle it?

The attached patch fixes both the review comments as discussed above.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

parallel_paths_include_tlist_cost_v3.patch (application/octet-stream)

diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index b5cab0c..d0e07bd 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -40,7 +40,7 @@ typedef struct
 } Clump;
 
 static List *merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump,
-			bool force);
+			int num_gene, bool force);
 static bool desirable_join(PlannerInfo *root,
 			   RelOptInfo *outer_rel, RelOptInfo *inner_rel);
 
@@ -196,7 +196,7 @@ gimme_tree(PlannerInfo *root, Gene *tour, int num_gene)
 		cur_clump->size = 1;
 
 		/* Merge it into the clumps list, using only desirable joins */
-		clumps = merge_clump(root, clumps, cur_clump, false);
+		clumps = merge_clump(root, clumps, cur_clump, num_gene, false);
 	}
 
 	if (list_length(clumps) > 1)
@@ -210,7 +210,7 @@ gimme_tree(PlannerInfo *root, Gene *tour, int num_gene)
 		{
 			Clump	   *clump = (Clump *) lfirst(lc);
 
-			fclumps = merge_clump(root, fclumps, clump, true);
+			fclumps = merge_clump(root, fclumps, clump, num_gene, true);
 		}
 		clumps = fclumps;
 	}
@@ -235,7 +235,8 @@ gimme_tree(PlannerInfo *root, Gene *tour, int num_gene)
  * "desirable" joins.
  */
 static List *
-merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
+merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
+			bool force)
 {
 	ListCell   *prev;
 	ListCell   *lc;
@@ -264,8 +265,14 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
 			/* Keep searching if join order is not valid */
 			if (joinrel)
 			{
-				/* Create GatherPaths for any useful partial paths for rel */
-				generate_gather_paths(root, joinrel);
+				/*
+				 * Create GatherPaths for any useful partial paths for rel
+				 * other than top-level rel.  The gather path for top-level
+				 * rel is generated once path target is available.  See
+				 * grouping_planner.
+				 */
+				if (old_clump->size + new_clump->size < num_gene)
+					generate_gather_paths(root, joinrel, NULL);
 
 				/* Find and save the cheapest paths for this joinrel */
 				set_cheapest(joinrel);
@@ -283,7 +290,7 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
 				 * others.  When no further merge is possible, we'll reinsert
 				 * it into the list.
 				 */
-				return merge_clump(root, clumps, old_clump, force);
+				return merge_clump(root, clumps, old_clump, num_gene, force);
 			}
 		}
 		prev = lc;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 2d7e1d8..cbc094b 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -479,14 +479,18 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	}
 
 	/*
-	 * If this is a baserel, consider gathering any partial paths we may have
-	 * created for it.  (If we tried to gather inheritance children, we could
-	 * end up with a very large number of gather nodes, each trying to grab
-	 * its own pool of workers, so don't do this for otherrels.  Instead,
-	 * we'll consider gathering partial paths for the parent appendrel.)
+	 * If this is a baserel and not the top-level rel, consider gathering any
+	 * partial paths we may have created for it.  (If we tried to gather
+	 * inheritance children, we could end up with a very large number of
+	 * gather nodes, each trying to grab its own pool of workers, so don't do
+	 * this for otherrels.  Instead, we'll consider gathering partial paths
+	 * for the parent appendrel.).  The gather path for top-level rel is
+	 * generated once path target is available.  See grouping_planner.
 	 */
-	if (rel->reloptkind == RELOPT_BASEREL)
-		generate_gather_paths(root, rel);
+	if (rel->reloptkind == RELOPT_BASEREL &&
+		root->simple_rel_array_size > 2 &&
+		!root->append_rel_list)
+		generate_gather_paths(root, rel, NULL);
 
 	/*
 	 * Allow a plugin to editorialize on the set of Paths for this base
@@ -2192,11 +2196,12 @@ set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
  *		Gather Merge on top of a partial path.
  *
  * This must not be called until after we're done creating all partial paths
- * for the specified relation.  (Otherwise, add_partial_path might delete a
- * path that some GatherPath or GatherMergePath has a reference to.)
+ * for the specified relation (Otherwise, add_partial_path might delete a
+ * path that some GatherPath or GatherMergePath has a reference to.) and path
+ * target for top level scan/join node is available.
  */
 void
-generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)
+generate_gather_paths(PlannerInfo *root, RelOptInfo *rel, PathTarget *target)
 {
 	Path	   *cheapest_partial_path;
 	Path	   *simple_gather_path;
@@ -2215,6 +2220,11 @@ generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)
 	simple_gather_path = (Path *)
 		create_gather_path(root, rel, cheapest_partial_path, rel->reltarget,
 						   NULL, NULL);
+
+	/* Add projection step if needed */
+	if (target && simple_gather_path->pathtarget != target)
+		simple_gather_path = apply_projection_to_path(root, rel, simple_gather_path, target);
+
 	add_path(rel, simple_gather_path);
 
 	/*
@@ -2224,14 +2234,18 @@ generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)
 	foreach(lc, rel->partial_pathlist)
 	{
 		Path	   *subpath = (Path *) lfirst(lc);
-		GatherMergePath *path;
+		Path	   *path;
 
 		if (subpath->pathkeys == NIL)
 			continue;
 
-		path = create_gather_merge_path(root, rel, subpath, rel->reltarget,
-										subpath->pathkeys, NULL, NULL);
-		add_path(rel, &path->path);
+		path = (Path *) create_gather_merge_path(root, rel, subpath, rel->reltarget,
+												 subpath->pathkeys, NULL, NULL);
+		/* Add projection step if needed */
+		if (target && path->pathtarget != target)
+			path = apply_projection_to_path(root, rel, path, target);
+
+		add_path(rel, path);
 	}
 }
 
@@ -2396,8 +2410,13 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
 		{
 			rel = (RelOptInfo *) lfirst(lc);
 
-			/* Create GatherPaths for any useful partial paths for rel */
-			generate_gather_paths(root, rel);
+			/*
+			 * Create GatherPaths for any useful partial paths for rel other
+			 * than top-level rel.  The gather path for top-level rel is
+			 * generated once path target is available.  See grouping_planner.
+			 */
+			if (lev < levels_needed)
+				generate_gather_paths(root, rel, NULL);
 
 			/* Find and save the cheapest paths for this rel */
 			set_cheapest(rel);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 6b79b3a..d60cae9 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -1818,6 +1818,18 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 		}
 
 		/*
+		 * Consider ways to implement parallel paths.  We always skip
+		 * generating parallel path for top level scan/join nodes till the
+		 * pathtarget is computed.  This is to ensure that we can account for
+		 * the fact that most of the target evaluation work will be performed
+		 * in workers.
+		 */
+		generate_gather_paths(root, current_rel, scanjoin_target);
+
+		/* Set or update cheapest_total_path and related fields */
+		set_cheapest(current_rel);
+
+		/*
 		 * Upper planning steps which make use of the top scan/join rel's
 		 * partial pathlist will expect partial paths for that rel to produce
 		 * the same output as complete paths ... and we just changed the
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 26567cb..41f98b3 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2408,16 +2408,27 @@ apply_projection_to_path(PlannerInfo *root,
 		 * projection-capable, so as to avoid modifying the subpath in place.
 		 * It seems unlikely at present that there could be any other
 		 * references to the subpath, but better safe than sorry.
-		 *
-		 * Note that we don't change the GatherPath's cost estimates; it might
-		 * be appropriate to do so, to reflect the fact that the bulk of the
-		 * target evaluation will happen in workers.
 		 */
 		gpath->subpath = (Path *)
 			create_projection_path(root,
 								   gpath->subpath->parent,
 								   gpath->subpath,
 								   target);
+
+		/*
+		 * Adjust the cost of GatherPath to reflect the fact that the bulk of
+		 * the target evaluation will happen in workers.
+		 */
+		if (((ProjectionPath *) gpath->subpath)->dummypp)
+		{
+			path->total_cost -= (target->cost.per_tuple - oldcost.per_tuple) * path->rows;
+			path->total_cost += (target->cost.per_tuple - oldcost.per_tuple) * gpath->subpath->rows;
+		}
+		else
+		{
+			path->total_cost -= (target->cost.per_tuple - oldcost.per_tuple) * path->rows;
+			path->total_cost += (cpu_tuple_cost + target->cost.per_tuple) * gpath->subpath->rows;
+		}
 	}
 	else if (path->parallel_safe &&
 			 !is_parallel_safe(root, (Node *) target->exprs))
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 4e06b2e..61694a0 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -53,7 +53,8 @@ extern void set_dummy_rel_pathlist(RelOptInfo *rel);
 extern RelOptInfo *standard_join_search(PlannerInfo *root, int levels_needed,
 					 List *initial_rels);
 
-extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel);
+extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
+					  PathTarget *target);
 extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
 						double index_pages);
 extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 2ae600f..a3f83c8 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -101,6 +101,23 @@ explain (costs off)
          ->  Parallel Index Only Scan using tenk1_unique1 on tenk1
 (5 rows)
 
+-- test that parallel plan gets selected when target list contains costly
+-- function
+create or replace function costly_func(var1 integer) returns integer
+as $$
+begin
+        return var1 + 10;
+end;
+$$ language plpgsql PARALLEL SAFE Cost 100000;
+explain select ten, costly_func(ten) from tenk1;
+                                 QUERY PLAN                                 
+----------------------------------------------------------------------------
+ Gather  (cost=0.00..623882.94 rows=9976 width=8)
+   Workers Planned: 4
+   ->  Parallel Seq Scan on tenk1  (cost=0.00..623882.94 rows=2494 width=8)
+(3 rows)
+
+drop function costly_func(var1 integer);
 -- test parallel plans for queries containing un-correlated subplans.
 alter table tenk2 set (parallel_workers = 0);
 explain (costs off)
diff --git a/src/test/regress/sql/select_parallel.sql b/src/test/regress/sql/select_parallel.sql
index 89fe80a..c60e902 100644
--- a/src/test/regress/sql/select_parallel.sql
+++ b/src/test/regress/sql/select_parallel.sql
@@ -39,6 +39,17 @@ explain (costs off)
 	select  sum(parallel_restricted(unique1)) from tenk1
 	group by(parallel_restricted(unique1));
 
+-- test that parallel plan gets selected when target list contains costly
+-- function
+create or replace function costly_func(var1 integer) returns integer
+as $$
+begin
+        return var1 + 10;
+end;
+$$ language plpgsql PARALLEL SAFE Cost 100000;
+explain select ten, costly_func(ten) from tenk1;
+drop function costly_func(var1 integer);
+
 -- test parallel plans for queries containing un-correlated subplans.
 alter table tenk2 set (parallel_workers = 0);
 explain (costs off)
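The following is a minimal numeric sketch of the cost adjustment the patch adds to `apply_projection_to_path()`. It is not planner code. `gather_total_after_projection()` is a hypothetical helper that works on plain doubles and models total cost only, and it assumes the default `cpu_tuple_cost` of 0.01. The first step reproduces the generic charge the function already applies, which is the per-tuple cost delta times the Gather's row count. The patch then backs that charge out and charges the evaluation against the per-worker row count instead. When a separate Result node is needed (`dummypp` false), it also adds `cpu_tuple_cost` per row.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed default value of the cpu_tuple_cost setting. */
#define CPU_TUPLE_COST 0.01

/*
 * Hypothetical model of the GatherPath branch of apply_projection_to_path()
 * with the patch applied.  Only total cost is modelled.
 *
 *   gather_total   GatherPath total cost before the projection is applied
 *   gather_rows    rows returned by the Gather (path->rows)
 *   sub_rows       per-worker rows of the partial subpath (gpath->subpath->rows)
 *   old_per_tuple  per-tuple cost of the old target (oldcost.per_tuple)
 *   new_per_tuple  per-tuple cost of the new target (target->cost.per_tuple)
 *   dummy_pp       true when no separate Result node is needed (dummypp)
 */
static double
gather_total_after_projection(double gather_total, double gather_rows,
                              double sub_rows, double old_per_tuple,
                              double new_per_tuple, bool dummy_pp)
{
    double delta = new_per_tuple - old_per_tuple;
    double total;

    assert(gather_rows >= 0.0 && sub_rows >= 0.0);

    /* Generic charge: the target's extra cost on every row the Gather returns. */
    total = gather_total + delta * gather_rows;

    /* Patch: back that charge out again ... */
    total -= delta * gather_rows;

    /* ... and charge the evaluation against the rows each worker produces. */
    if (dummy_pp)
        total += delta * sub_rows;
    else
        total += (CPU_TUPLE_COST + new_per_tuple) * sub_rows;

    return total;
}
```

As an illustration with made-up inputs, take 10000 gathered rows, 2500 rows per worker, and a target costing 250 per tuple. The evaluation charge on top of the Gather then drops from 2,500,000 to 625,000, or 625,025 when a Result node is needed.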

#40

Thomas Munro

thomas.munro@enterprisedb.com

over 8 years ago

In reply to: Amit Kapila (#39)

Re: why not parallel seq scan for slow functions

On Thu, Sep 14, 2017 at 3:19 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

The attached patch fixes both the review comments as discussed above.

This cost stuff looks unstable:

test select_parallel ... FAILED

!  Gather  (cost=0.00..623882.94 rows=9976 width=8)
     Workers Planned: 4
!    ->  Parallel Seq Scan on tenk1  (cost=0.00..623882.94 rows=2494 width=8)
  (3 rows)

  drop function costly_func(var1 integer);
--- 112,120 ----
  explain select ten, costly_func(ten) from tenk1;
                                   QUERY PLAN
  ----------------------------------------------------------------------------
!  Gather  (cost=0.00..625383.00 rows=10000 width=8)
     Workers Planned: 4
!    ->  Parallel Seq Scan on tenk1  (cost=0.00..625383.00 rows=2500 width=8)
  (3 rows)

drop function costly_func(var1 integer);
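The two plans differ only in the row estimate for `tenk1`: 9976 rows in the expected output versus 10000 in the actual one. That estimate feeds straight into the function's cost. The sketch below shows the arithmetic under two stated assumptions. First, function `COST` is measured in units of `cpu_operator_cost`, whose default is 0.0025, so `COST 100000` is 250 per call. Second, rows are split using a parallel divisor in which the leader adds `1.0 - 0.3 * workers` only while that value is positive, which gives exactly 4 for 4 workers. `parallel_divisor()` and `func_cost_per_worker()` are hypothetical names, not the planner's own functions.

```c
#include <assert.h>

/* Assumed default value of the cpu_operator_cost setting. */
#define CPU_OPERATOR_COST 0.0025

/*
 * Number of "shares" the estimated rows are split into.  The leader is
 * assumed to participate and to contribute 1.0 - 0.3 * workers of a
 * worker's share, but only while that amount is positive.
 */
static double
parallel_divisor(int workers)
{
    double divisor = workers;
    double leader_contribution = 1.0 - 0.3 * workers;

    assert(workers > 0);
    if (leader_contribution > 0.0)
        divisor += leader_contribution;
    return divisor;
}

/*
 * Cost of evaluating a function declared with COST func_cost once for
 * every row a single worker is expected to produce.
 */
static double
func_cost_per_worker(double total_rows, int workers, double func_cost)
{
    double rows_per_worker = total_rows / parallel_divisor(workers);

    return rows_per_worker * func_cost * CPU_OPERATOR_COST;
}
```

With these assumptions the rows per worker come out as 2494 and 2500, matching the two plans. The function charge differs by 1500, which accounts for all but 0.06 of the 1500.06 gap between the reported totals. Any change in the sampled row count therefore shows up in the costs.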

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#41

jeff.janes@gmail.com

over 8 years ago

In reply to: Thomas Munro (#40)

1 attachment(s)

Re: why not parallel seq scan for slow functions

On Tue, Sep 19, 2017 at 1:17 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:

On Thu, Sep 14, 2017 at 3:19 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

The attached patch fixes both the review comments as discussed above.

This cost stuff looks unstable:

test select_parallel ... FAILED

!  Gather  (cost=0.00..623882.94 rows=9976 width=8)
     Workers Planned: 4
!    ->  Parallel Seq Scan on tenk1  (cost=0.00..623882.94 rows=2494 width=8)
  (3 rows)

  drop function costly_func(var1 integer);
--- 112,120 ----
  explain select ten, costly_func(ten) from tenk1;
                                   QUERY PLAN
  ----------------------------------------------------------------------------
!  Gather  (cost=0.00..625383.00 rows=10000 width=8)
     Workers Planned: 4
!    ->  Parallel Seq Scan on tenk1  (cost=0.00..625383.00 rows=2500 width=8)
  (3 rows)

That should be fixed by turning costs off in the EXPLAIN, as is the tradition.

See attached.

Cheers,

Jeff

Attachments:

parallel_paths_include_tlist_cost_v4.patch (application/octet-stream)

diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
new file mode 100644
index b5cab0c..d0e07bd
*** a/src/backend/optimizer/geqo/geqo_eval.c
--- b/src/backend/optimizer/geqo/geqo_eval.c
*************** typedef struct
*** 40,46 ****
  } Clump;
  
  static List *merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump,
! 			bool force);
  static bool desirable_join(PlannerInfo *root,
  			   RelOptInfo *outer_rel, RelOptInfo *inner_rel);
  
--- 40,46 ----
  } Clump;
  
  static List *merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump,
! 			int num_gene, bool force);
  static bool desirable_join(PlannerInfo *root,
  			   RelOptInfo *outer_rel, RelOptInfo *inner_rel);
  
*************** gimme_tree(PlannerInfo *root, Gene *tour
*** 196,202 ****
  		cur_clump->size = 1;
  
  		/* Merge it into the clumps list, using only desirable joins */
! 		clumps = merge_clump(root, clumps, cur_clump, false);
  	}
  
  	if (list_length(clumps) > 1)
--- 196,202 ----
  		cur_clump->size = 1;
  
  		/* Merge it into the clumps list, using only desirable joins */
! 		clumps = merge_clump(root, clumps, cur_clump, num_gene, false);
  	}
  
  	if (list_length(clumps) > 1)
*************** gimme_tree(PlannerInfo *root, Gene *tour
*** 210,216 ****
  		{
  			Clump	   *clump = (Clump *) lfirst(lc);
  
! 			fclumps = merge_clump(root, fclumps, clump, true);
  		}
  		clumps = fclumps;
  	}
--- 210,216 ----
  		{
  			Clump	   *clump = (Clump *) lfirst(lc);
  
! 			fclumps = merge_clump(root, fclumps, clump, num_gene, true);
  		}
  		clumps = fclumps;
  	}
*************** gimme_tree(PlannerInfo *root, Gene *tour
*** 235,241 ****
   * "desirable" joins.
   */
  static List *
! merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
  {
  	ListCell   *prev;
  	ListCell   *lc;
--- 235,242 ----
   * "desirable" joins.
   */
  static List *
! merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
! 			bool force)
  {
  	ListCell   *prev;
  	ListCell   *lc;
*************** merge_clump(PlannerInfo *root, List *clu
*** 264,271 ****
  			/* Keep searching if join order is not valid */
  			if (joinrel)
  			{
! 				/* Create GatherPaths for any useful partial paths for rel */
! 				generate_gather_paths(root, joinrel);
  
  				/* Find and save the cheapest paths for this joinrel */
  				set_cheapest(joinrel);
--- 265,278 ----
  			/* Keep searching if join order is not valid */
  			if (joinrel)
  			{
! 				/*
! 				 * Create GatherPaths for any useful partial paths for rel
! 				 * other than top-level rel.  The gather path for top-level
! 				 * rel is generated once path target is available.  See
! 				 * grouping_planner.
! 				 */
! 				if (old_clump->size + new_clump->size < num_gene)
! 					generate_gather_paths(root, joinrel, NULL);
  
  				/* Find and save the cheapest paths for this joinrel */
  				set_cheapest(joinrel);
*************** merge_clump(PlannerInfo *root, List *clu
*** 283,289 ****
  				 * others.  When no further merge is possible, we'll reinsert
  				 * it into the list.
  				 */
! 				return merge_clump(root, clumps, old_clump, force);
  			}
  		}
  		prev = lc;
--- 290,296 ----
  				 * others.  When no further merge is possible, we'll reinsert
  				 * it into the list.
  				 */
! 				return merge_clump(root, clumps, old_clump, num_gene, force);
  			}
  		}
  		prev = lc;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
new file mode 100644
index 5b746a9..74512da
*** a/src/backend/optimizer/path/allpaths.c
--- b/src/backend/optimizer/path/allpaths.c
*************** set_rel_pathlist(PlannerInfo *root, RelO
*** 480,493 ****
  	}
  
  	/*
! 	 * If this is a baserel, consider gathering any partial paths we may have
! 	 * created for it.  (If we tried to gather inheritance children, we could
! 	 * end up with a very large number of gather nodes, each trying to grab
! 	 * its own pool of workers, so don't do this for otherrels.  Instead,
! 	 * we'll consider gathering partial paths for the parent appendrel.)
  	 */
! 	if (rel->reloptkind == RELOPT_BASEREL)
! 		generate_gather_paths(root, rel);
  
  	/*
  	 * Allow a plugin to editorialize on the set of Paths for this base
--- 480,497 ----
  	}
  
  	/*
! 	 * If this is a baserel and not the top-level rel, consider gathering any
! 	 * partial paths we may have created for it.  (If we tried to gather
! 	 * inheritance children, we could end up with a very large number of
! 	 * gather nodes, each trying to grab its own pool of workers, so don't do
! 	 * this for otherrels.  Instead, we'll consider gathering partial paths
! 	 * for the parent appendrel.).  The gather path for top-level rel is
! 	 * generated once path target is available.  See grouping_planner.
  	 */
! 	if (rel->reloptkind == RELOPT_BASEREL &&
! 		root->simple_rel_array_size > 2 &&
! 		!root->append_rel_list)
! 		generate_gather_paths(root, rel, NULL);
  
  	/*
  	 * Allow a plugin to editorialize on the set of Paths for this base
*************** set_worktable_pathlist(PlannerInfo *root
*** 2228,2238 ****
   *		Gather Merge on top of a partial path.
   *
   * This must not be called until after we're done creating all partial paths
!  * for the specified relation.  (Otherwise, add_partial_path might delete a
!  * path that some GatherPath or GatherMergePath has a reference to.)
   */
  void
! generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)
  {
  	Path	   *cheapest_partial_path;
  	Path	   *simple_gather_path;
--- 2232,2243 ----
   *		Gather Merge on top of a partial path.
   *
   * This must not be called until after we're done creating all partial paths
!  * for the specified relation (Otherwise, add_partial_path might delete a
!  * path that some GatherPath or GatherMergePath has a reference to.) and path
!  * target for top level scan/join node is available.
   */
  void
! generate_gather_paths(PlannerInfo *root, RelOptInfo *rel, PathTarget *target)
  {
  	Path	   *cheapest_partial_path;
  	Path	   *simple_gather_path;
*************** generate_gather_paths(PlannerInfo *root,
*** 2251,2256 ****
--- 2256,2266 ----
  	simple_gather_path = (Path *)
  		create_gather_path(root, rel, cheapest_partial_path, rel->reltarget,
  						   NULL, NULL);
+ 
+ 	/* Add projection step if needed */
+ 	if (target && simple_gather_path->pathtarget != target)
+ 		simple_gather_path = apply_projection_to_path(root, rel, simple_gather_path, target);
+ 
  	add_path(rel, simple_gather_path);
  
  	/*
*************** generate_gather_paths(PlannerInfo *root,
*** 2260,2273 ****
  	foreach(lc, rel->partial_pathlist)
  	{
  		Path	   *subpath = (Path *) lfirst(lc);
! 		GatherMergePath *path;
  
  		if (subpath->pathkeys == NIL)
  			continue;
  
! 		path = create_gather_merge_path(root, rel, subpath, rel->reltarget,
! 										subpath->pathkeys, NULL, NULL);
! 		add_path(rel, &path->path);
  	}
  }
  
--- 2270,2287 ----
  	foreach(lc, rel->partial_pathlist)
  	{
  		Path	   *subpath = (Path *) lfirst(lc);
! 		Path	   *path;
  
  		if (subpath->pathkeys == NIL)
  			continue;
  
! 		path = (Path *) create_gather_merge_path(root, rel, subpath, rel->reltarget,
! 												 subpath->pathkeys, NULL, NULL);
! 		/* Add projection step if needed */
! 		if (target && path->pathtarget != target)
! 			path = apply_projection_to_path(root, rel, path, target);
! 
! 		add_path(rel, path);
  	}
  }
  
*************** standard_join_search(PlannerInfo *root, 
*** 2432,2439 ****
  		{
  			rel = (RelOptInfo *) lfirst(lc);
  
! 			/* Create GatherPaths for any useful partial paths for rel */
! 			generate_gather_paths(root, rel);
  
  			/* Find and save the cheapest paths for this rel */
  			set_cheapest(rel);
--- 2446,2458 ----
  		{
  			rel = (RelOptInfo *) lfirst(lc);
  
! 			/*
! 			 * Create GatherPaths for any useful partial paths for rel other
! 			 * than top-level rel.  The gather path for top-level rel is
! 			 * generated once path target is available.  See grouping_planner.
! 			 */
! 			if (lev < levels_needed)
! 				generate_gather_paths(root, rel, NULL);
  
  			/* Find and save the cheapest paths for this rel */
  			set_cheapest(rel);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
new file mode 100644
index 7f146d6..659f27d
*** a/src/backend/optimizer/plan/planner.c
--- b/src/backend/optimizer/plan/planner.c
*************** grouping_planner(PlannerInfo *root, bool
*** 1860,1865 ****
--- 1860,1877 ----
  		}
  
  		/*
+ 		 * Consider ways to implement parallel paths.  We always skip
+ 		 * generating parallel path for top level scan/join nodes till the
+ 		 * pathtarget is computed.  This is to ensure that we can account for
+ 		 * the fact that most of the target evaluation work will be performed
+ 		 * in workers.
+ 		 */
+ 		generate_gather_paths(root, current_rel, scanjoin_target);
+ 
+ 		/* Set or update cheapest_total_path and related fields */
+ 		set_cheapest(current_rel);
+ 
+ 		/*
  		 * Upper planning steps which make use of the top scan/join rel's
  		 * partial pathlist will expect partial paths for that rel to produce
  		 * the same output as complete paths ... and we just changed the
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
new file mode 100644
index 26567cb..41f98b3
*** a/src/backend/optimizer/util/pathnode.c
--- b/src/backend/optimizer/util/pathnode.c
*************** apply_projection_to_path(PlannerInfo *ro
*** 2408,2423 ****
  		 * projection-capable, so as to avoid modifying the subpath in place.
  		 * It seems unlikely at present that there could be any other
  		 * references to the subpath, but better safe than sorry.
- 		 *
- 		 * Note that we don't change the GatherPath's cost estimates; it might
- 		 * be appropriate to do so, to reflect the fact that the bulk of the
- 		 * target evaluation will happen in workers.
  		 */
  		gpath->subpath = (Path *)
  			create_projection_path(root,
  								   gpath->subpath->parent,
  								   gpath->subpath,
  								   target);
  	}
  	else if (path->parallel_safe &&
  			 !is_parallel_safe(root, (Node *) target->exprs))
--- 2408,2434 ----
  		 * projection-capable, so as to avoid modifying the subpath in place.
  		 * It seems unlikely at present that there could be any other
  		 * references to the subpath, but better safe than sorry.
  		 */
  		gpath->subpath = (Path *)
  			create_projection_path(root,
  								   gpath->subpath->parent,
  								   gpath->subpath,
  								   target);
+ 
+ 		/*
+ 		 * Adjust the cost of GatherPath to reflect the fact that the bulk of
+ 		 * the target evaluation will happen in workers.
+ 		 */
+ 		if (((ProjectionPath *) gpath->subpath)->dummypp)
+ 		{
+ 			path->total_cost -= (target->cost.per_tuple - oldcost.per_tuple) * path->rows;
+ 			path->total_cost += (target->cost.per_tuple - oldcost.per_tuple) * gpath->subpath->rows;
+ 		}
+ 		else
+ 		{
+ 			path->total_cost -= (target->cost.per_tuple - oldcost.per_tuple) * path->rows;
+ 			path->total_cost += (cpu_tuple_cost + target->cost.per_tuple) * gpath->subpath->rows;
+ 		}
  	}
  	else if (path->parallel_safe &&
  			 !is_parallel_safe(root, (Node *) target->exprs))
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
new file mode 100644
index 4e06b2e..61694a0
*** a/src/include/optimizer/paths.h
--- b/src/include/optimizer/paths.h
*************** extern void set_dummy_rel_pathlist(RelOp
*** 53,59 ****
  extern RelOptInfo *standard_join_search(PlannerInfo *root, int levels_needed,
  					 List *initial_rels);
  
! extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel);
  extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
  						double index_pages);
  extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
--- 53,60 ----
  extern RelOptInfo *standard_join_search(PlannerInfo *root, int levels_needed,
  					 List *initial_rels);
  
! extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
! 					  PathTarget *target);
  extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
  						double index_pages);
  extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
new file mode 100644
index 2ae600f..232a455
*** a/src/test/regress/expected/select_parallel.out
--- b/src/test/regress/expected/select_parallel.out
*************** explain (costs off)
*** 101,106 ****
--- 101,123 ----
           ->  Parallel Index Only Scan using tenk1_unique1 on tenk1
  (5 rows)
  
+ -- test that parallel plan gets selected when target list contains costly
+ -- function
+ create or replace function costly_func(var1 integer) returns integer
+ as $$
+ begin
+         return var1 + 10;
+ end;
+ $$ language plpgsql PARALLEL SAFE Cost 100000;
+ explain (costs off) select ten, costly_func(ten) from tenk1;
+             QUERY PLAN            
+ ----------------------------------
+  Gather
+    Workers Planned: 4
+    ->  Parallel Seq Scan on tenk1
+ (3 rows)
+ 
+ drop function costly_func(var1 integer);
  -- test parallel plans for queries containing un-correlated subplans.
  alter table tenk2 set (parallel_workers = 0);
  explain (costs off)
diff --git a/src/test/regress/sql/select_parallel.sql b/src/test/regress/sql/select_parallel.sql
new file mode 100644
index 89fe80a..2b8072d
*** a/src/test/regress/sql/select_parallel.sql
--- b/src/test/regress/sql/select_parallel.sql
*************** explain (costs off)
*** 39,44 ****
--- 39,55 ----
  	select  sum(parallel_restricted(unique1)) from tenk1
  	group by(parallel_restricted(unique1));
  
+ -- test that parallel plan gets selected when target list contains costly
+ -- function
+ create or replace function costly_func(var1 integer) returns integer
+ as $$
+ begin
+         return var1 + 10;
+ end;
+ $$ language plpgsql PARALLEL SAFE Cost 100000;
+ explain (costs off) select ten, costly_func(ten) from tenk1;
+ drop function costly_func(var1 integer);
+ 
  -- test parallel plans for queries containing un-correlated subplans.
  alter table tenk2 set (parallel_workers = 0);
  explain (costs off)
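The patch defers `generate_gather_paths()` for whichever relation will become the final scan/join rel, and each call site recognises that rel in its own way. The sketch below restates the three conditions as standalone predicates over plain integers. The function names are hypothetical. The reading of `simple_rel_array_size > 2` assumes the array is sized `list_length(rtable) + 1` with slot 0 unused, so the test means the range table has at least two entries.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * set_rel_pathlist(): apart from the RELOPT_BASEREL check, gather a
 * baserel's partial paths right away only if the query has at least two
 * range-table entries and no append relations.  simple_rel_array_size is
 * assumed to be list_length(rtable) + 1.
 */
static bool
baserel_gathers_early(int simple_rel_array_size, bool has_append_rels)
{
    return simple_rel_array_size > 2 && !has_append_rels;
}

/*
 * standard_join_search(): levels run from 2 to levels_needed, and the rel
 * built at the last level is the final scan/join rel.
 */
static bool
joinrel_gathers_early(int lev, int levels_needed)
{
    assert(lev >= 2 && lev <= levels_needed);
    return lev < levels_needed;
}

/*
 * merge_clump() in GEQO: joining two clumps yields a rel covering
 * old_size + new_size base rels.  It is the final rel once that sum
 * reaches num_gene, the number of rels in the tour.
 */
static bool
geqo_joinrel_gathers_early(int old_size, int new_size, int num_gene)
{
    assert(old_size >= 1 && new_size >= 1);
    return old_size + new_size < num_gene;
}
```

Under these conditions, a single-table query (`simple_rel_array_size == 2`) leaves the Gather to `grouping_planner()`, which knows the final target. The same applies to the rel built at the last join level in both the standard and the GEQO search.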

#42

amit.kapila16@gmail.com

over 8 years ago

In reply to: Jeff Janes (#41)

Re: why not parallel seq scan for slow functions

On Wed, Sep 20, 2017 at 3:05 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Tue, Sep 19, 2017 at 1:17 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:

On Thu, Sep 14, 2017 at 3:19 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

The attached patch fixes both the review comments as discussed above.

That should be fixed by turning costs off in the EXPLAIN, as is the tradition.

Right. BTW, did you get a chance to run the original test (for which
you have reported the problem) with this patch?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#43

jeff.janes@gmail.com

over 8 years ago

In reply to: Amit Kapila (#42)

Re: why not parallel seq scan for slow functions

On Tue, Sep 19, 2017 at 9:15 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Sep 20, 2017 at 3:05 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Tue, Sep 19, 2017 at 1:17 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:

On Thu, Sep 14, 2017 at 3:19 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

The attached patch fixes both the review comments as discussed above.

That should be fixed by turning costs off in the EXPLAIN, as is the tradition.

Right. BTW, did you get a chance to run the original test (for which
you have reported the problem) with this patch?

Yes, this patch makes it use a parallel scan, with great improvement. No
more having to \copy the data out, then run GNU split, then run my Perl or
Python program with GNU parallel on each file. Instead I just have to put
a PL/Perl wrapper around the function.

(next up, how to put a "create temp table alsdkfjaslfdj as" in front of it
and keep it running in parallel)

Thanks,

Jeff

#44

amit.kapila16@gmail.com

about 8 years ago

In reply to: Jeff Janes (#43)

1 attachment(s)

Re: why not parallel seq scan for slow functions

On Thu, Sep 21, 2017 at 2:35 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Tue, Sep 19, 2017 at 9:15 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Sep 20, 2017 at 3:05 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Tue, Sep 19, 2017 at 1:17 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:

On Thu, Sep 14, 2017 at 3:19 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

The attached patch fixes both the review comments as discussed above.

That should be fixed by turning costs off in the EXPLAIN, as is the tradition.

Right. BTW, did you get a chance to run the original test (for which
you have reported the problem) with this patch?

Yes, this patch makes it use a parallel scan, with great improvement.

Thanks for the confirmation. Find the rebased patch attached.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

parallel_paths_include_tlist_cost_v5.patch (application/octet-stream)

diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index 3cf268c..5ab53eb 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -40,7 +40,7 @@ typedef struct
 } Clump;
 
 static List *merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump,
-			bool force);
+			int num_gene, bool force);
 static bool desirable_join(PlannerInfo *root,
 			   RelOptInfo *outer_rel, RelOptInfo *inner_rel);
 
@@ -196,7 +196,7 @@ gimme_tree(PlannerInfo *root, Gene *tour, int num_gene)
 		cur_clump->size = 1;
 
 		/* Merge it into the clumps list, using only desirable joins */
-		clumps = merge_clump(root, clumps, cur_clump, false);
+		clumps = merge_clump(root, clumps, cur_clump, num_gene, false);
 	}
 
 	if (list_length(clumps) > 1)
@@ -210,7 +210,7 @@ gimme_tree(PlannerInfo *root, Gene *tour, int num_gene)
 		{
 			Clump	   *clump = (Clump *) lfirst(lc);
 
-			fclumps = merge_clump(root, fclumps, clump, true);
+			fclumps = merge_clump(root, fclumps, clump, num_gene, true);
 		}
 		clumps = fclumps;
 	}
@@ -235,7 +235,8 @@ gimme_tree(PlannerInfo *root, Gene *tour, int num_gene)
  * "desirable" joins.
  */
 static List *
-merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
+merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
+			bool force)
 {
 	ListCell   *prev;
 	ListCell   *lc;
@@ -267,8 +268,14 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
 				/* Create paths for partition-wise joins. */
 				generate_partition_wise_join_paths(root, joinrel);
 
-				/* Create GatherPaths for any useful partial paths for rel */
-				generate_gather_paths(root, joinrel);
+				/*
+				 * Create GatherPaths for any useful partial paths for rel
+				 * other than top-level rel.  The gather path for top-level
+				 * rel is generated once path target is available.  See
+				 * grouping_planner.
+				 */
+				if (old_clump->size + new_clump->size < num_gene)
+					generate_gather_paths(root, joinrel, NULL);
 
 				/* Find and save the cheapest paths for this joinrel */
 				set_cheapest(joinrel);
@@ -286,7 +293,7 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
 				 * others.  When no further merge is possible, we'll reinsert
 				 * it into the list.
 				 */
-				return merge_clump(root, clumps, old_clump, force);
+				return merge_clump(root, clumps, old_clump, num_gene, force);
 			}
 		}
 		prev = lc;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index a6efb4e..a225319 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -480,14 +480,18 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	}
 
 	/*
-	 * If this is a baserel, consider gathering any partial paths we may have
-	 * created for it.  (If we tried to gather inheritance children, we could
-	 * end up with a very large number of gather nodes, each trying to grab
-	 * its own pool of workers, so don't do this for otherrels.  Instead,
-	 * we'll consider gathering partial paths for the parent appendrel.)
+	 * If this is a baserel and not the top-level rel, consider gathering any
+	 * partial paths we may have created for it.  (If we tried to gather
+	 * inheritance children, we could end up with a very large number of
+	 * gather nodes, each trying to grab its own pool of workers, so don't do
+	 * this for otherrels.  Instead, we'll consider gathering partial paths
+	 * for the parent appendrel.).  The gather path for top-level rel is
+	 * generated once path target is available.  See grouping_planner.
 	 */
-	if (rel->reloptkind == RELOPT_BASEREL)
-		generate_gather_paths(root, rel);
+	if (rel->reloptkind == RELOPT_BASEREL &&
+		root->simple_rel_array_size > 2 &&
+		!root->append_rel_list)
+		generate_gather_paths(root, rel, NULL);
 
 	/*
 	 * Allow a plugin to editorialize on the set of Paths for this base
@@ -2288,11 +2292,12 @@ set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
  *		Gather Merge on top of a partial path.
  *
  * This must not be called until after we're done creating all partial paths
- * for the specified relation.  (Otherwise, add_partial_path might delete a
- * path that some GatherPath or GatherMergePath has a reference to.)
+ * for the specified relation (Otherwise, add_partial_path might delete a
+ * path that some GatherPath or GatherMergePath has a reference to.) and path
+ * target for top level scan/join node is available.
  */
 void
-generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)
+generate_gather_paths(PlannerInfo *root, RelOptInfo *rel, PathTarget *target)
 {
 	Path	   *cheapest_partial_path;
 	Path	   *simple_gather_path;
@@ -2311,6 +2316,11 @@ generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)
 	simple_gather_path = (Path *)
 		create_gather_path(root, rel, cheapest_partial_path, rel->reltarget,
 						   NULL, NULL);
+
+	/* Add projection step if needed */
+	if (target && simple_gather_path->pathtarget != target)
+		simple_gather_path = apply_projection_to_path(root, rel, simple_gather_path, target);
+
 	add_path(rel, simple_gather_path);
 
 	/*
@@ -2320,14 +2330,18 @@ generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)
 	foreach(lc, rel->partial_pathlist)
 	{
 		Path	   *subpath = (Path *) lfirst(lc);
-		GatherMergePath *path;
+		Path	   *path;
 
 		if (subpath->pathkeys == NIL)
 			continue;
 
-		path = create_gather_merge_path(root, rel, subpath, rel->reltarget,
-										subpath->pathkeys, NULL, NULL);
-		add_path(rel, &path->path);
+		path = (Path *) create_gather_merge_path(root, rel, subpath, rel->reltarget,
+												 subpath->pathkeys, NULL, NULL);
+		/* Add projection step if needed */
+		if (target && path->pathtarget != target)
+			path = apply_projection_to_path(root, rel, path, target);
+
+		add_path(rel, path);
 	}
 }
 
@@ -2498,8 +2512,13 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
 			/* Create paths for partition-wise joins. */
 			generate_partition_wise_join_paths(root, rel);
 
-			/* Create GatherPaths for any useful partial paths for rel */
-			generate_gather_paths(root, rel);
+			/*
+			 * Create GatherPaths for any useful partial paths for rel other
+			 * than top-level rel.  The gather path for top-level rel is
+			 * generated once path target is available.  See grouping_planner.
+			 */
+			if (lev < levels_needed)
+				generate_gather_paths(root, rel, NULL);
 
 			/* Find and save the cheapest paths for this rel */
 			set_cheapest(rel);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d58635c..1b08e16 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -1892,6 +1892,18 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 		}
 
 		/*
+		 * Consider ways to implement parallel paths.  We always skip
+		 * generating parallel path for top level scan/join nodes till the
+		 * pathtarget is computed.  This is to ensure that we can account for
+		 * the fact that most of the target evaluation work will be performed
+		 * in workers.
+		 */
+		generate_gather_paths(root, current_rel, scanjoin_target);
+
+		/* Set or update cheapest_total_path and related fields */
+		set_cheapest(current_rel);
+
+		/*
 		 * Upper planning steps which make use of the top scan/join rel's
 		 * partial pathlist will expect partial paths for that rel to produce
 		 * the same output as complete paths ... and we just changed the
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 36ec025..17320f5 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2422,16 +2422,27 @@ apply_projection_to_path(PlannerInfo *root,
 		 * projection-capable, so as to avoid modifying the subpath in place.
 		 * It seems unlikely at present that there could be any other
 		 * references to the subpath, but better safe than sorry.
-		 *
-		 * Note that we don't change the GatherPath's cost estimates; it might
-		 * be appropriate to do so, to reflect the fact that the bulk of the
-		 * target evaluation will happen in workers.
 		 */
 		gpath->subpath = (Path *)
 			create_projection_path(root,
 								   gpath->subpath->parent,
 								   gpath->subpath,
 								   target);
+
+		/*
+		 * Adjust the cost of GatherPath to reflect the fact that the bulk of
+		 * the target evaluation will happen in workers.
+		 */
+		if (((ProjectionPath *) gpath->subpath)->dummypp)
+		{
+			path->total_cost -= (target->cost.per_tuple - oldcost.per_tuple) * path->rows;
+			path->total_cost += (target->cost.per_tuple - oldcost.per_tuple) * gpath->subpath->rows;
+		}
+		else
+		{
+			path->total_cost -= (target->cost.per_tuple - oldcost.per_tuple) * path->rows;
+			path->total_cost += (cpu_tuple_cost + target->cost.per_tuple) * gpath->subpath->rows;
+		}
 	}
 	else if (path->parallel_safe &&
 			 !is_parallel_safe(root, (Node *) target->exprs))
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index ea886b6..8f7f6fe 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -53,7 +53,8 @@ extern void set_dummy_rel_pathlist(RelOptInfo *rel);
 extern RelOptInfo *standard_join_search(PlannerInfo *root, int levels_needed,
 					 List *initial_rels);
 
-extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel);
+extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
+					  PathTarget *target);
 extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
 						double index_pages);
 extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index ac9ad06..79d5502 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -121,6 +121,23 @@ execute tenk1_count(1);
 (1 row)
 
 deallocate tenk1_count;
+-- test that parallel plan gets selected when target list contains costly
+-- function
+create or replace function costly_func(var1 integer) returns integer
+as $$
+begin
+        return var1 + 10;
+end;
+$$ language plpgsql PARALLEL SAFE Cost 100000;
+explain (costs off) select ten, costly_func(ten) from tenk1;
+            QUERY PLAN            
+----------------------------------
+ Gather
+   Workers Planned: 4
+   ->  Parallel Seq Scan on tenk1
+(3 rows)
+
+drop function costly_func(var1 integer);
 -- test parallel plans for queries containing un-correlated subplans.
 alter table tenk2 set (parallel_workers = 0);
 explain (costs off)
diff --git a/src/test/regress/sql/select_parallel.sql b/src/test/regress/sql/select_parallel.sql
index 495f033..3544b43 100644
--- a/src/test/regress/sql/select_parallel.sql
+++ b/src/test/regress/sql/select_parallel.sql
@@ -45,6 +45,17 @@ explain (costs off) execute tenk1_count(1);
 execute tenk1_count(1);
 deallocate tenk1_count;
 
+-- test that parallel plan gets selected when target list contains costly
+-- function
+create or replace function costly_func(var1 integer) returns integer
+as $$
+begin
+        return var1 + 10;
+end;
+$$ language plpgsql PARALLEL SAFE Cost 100000;
+explain (costs off) select ten, costly_func(ten) from tenk1;
+drop function costly_func(var1 integer);
+
 -- test parallel plans for queries containing un-correlated subplans.
 alter table tenk2 set (parallel_workers = 0);
 explain (costs off)

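Editor's sketch of the standard_join_search() hunk in the patch above. Join rels are built level by level from 2 up to levels_needed, and only the last level holds the top-level scan/join rel, so `lev < levels_needed` defers Gather generation for exactly that rel. The helper names below are hypothetical; only the comparison mirrors the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: does the patch call generate_gather_paths(root, rel, NULL) at this level? */
bool
sketch_gather_at_level(int lev, int levels_needed)
{
    return lev < levels_needed;
}

/* Count the join levels that still gather early, mimicking the lev = 2 .. levels_needed loop. */
int
sketch_levels_gathering_early(int levels_needed)
{
    int count = 0;

    for (int lev = 2; lev <= levels_needed; lev++)
        if (sketch_gather_at_level(lev, levels_needed))
            count++;
    return count;
}
```

For a four-way join, levels 2 and 3 gather early and level 4 (the top-level rel) waits for grouping_planner.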
#45

robertmhaas@gmail.com

about 8 years ago

In reply to: Amit Kapila (#44)

Re: why not parallel seq scan for slow functions

On Sun, Nov 5, 2017 at 12:57 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Thanks for the confirmation. Find rebased patch attached.

This looks like it's on the right track to me. I hope Tom will look
into it, but if he doesn't I may try to get it committed myself.

-    if (rel->reloptkind == RELOPT_BASEREL)
-        generate_gather_paths(root, rel);
+    if (rel->reloptkind == RELOPT_BASEREL &&
+        root->simple_rel_array_size > 2 &&
+        !root->append_rel_list)

This test doesn't look correct to me. Actually, it doesn't look
anywhere close to correct to me. So, one of us is very confused...
not sure whether it's you or me.

     simple_gather_path = (Path *)
         create_gather_path(root, rel, cheapest_partial_path, rel->reltarget,
                            NULL, NULL);
+
+    /* Add projection step if needed */
+    if (target && simple_gather_path->pathtarget != target)
+        simple_gather_path = apply_projection_to_path(root, rel, simple_gather_path, target);

Instead of using apply_projection_to_path, why not pass the correct
reltarget to create_gather_path?

+        /* Set or update cheapest_total_path and related fields */
+        set_cheapest(current_rel);

I wonder if it's really OK to call set_cheapest() a second time for
the same relation...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#46

tgl@sss.pgh.pa.us

about 8 years ago

In reply to: Robert Haas (#45)

Re: why not parallel seq scan for slow functions

Robert Haas <robertmhaas@gmail.com> writes:

This looks like it's on the right track to me. I hope Tom will look
into it, but if he doesn't I may try to get it committed myself.

I do plan to take a look at it during this CF.

+        /* Set or update cheapest_total_path and related fields */
+        set_cheapest(current_rel);

I wonder if it's really OK to call set_cheapest() a second time for
the same relation...

It's safe enough, we do it in some places already when converting
a relation to dummy. But having to do that in a normal code path
suggests that something's not right about the design ...

regards, tom lane

#47

[1]: /messages/by-id/CAA4eK1JUvL9WS9z=5hjSuSMNCo3TdBxFa0pA=E__E=p6iUffUQ@mail.gmail.com

amit.kapila16@gmail.com

about 8 years ago

In reply to: Robert Haas (#45)

Re: why not parallel seq scan for slow functions

On Mon, Nov 6, 2017 at 3:51 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Sun, Nov 5, 2017 at 12:57 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Thanks for the confirmation. Find rebased patch attached.

This looks like it's on the right track to me. I hope Tom will look
into it, but if he doesn't I may try to get it committed myself.

-    if (rel->reloptkind == RELOPT_BASEREL)
-        generate_gather_paths(root, rel);
+    if (rel->reloptkind == RELOPT_BASEREL &&
+        root->simple_rel_array_size > 2 &&
+        !root->append_rel_list)

This test doesn't look correct to me. Actually, it doesn't look
anywhere close to correct to me. So, one of us is very confused...
not sure whether it's you or me.

It is quite possible that I haven't got it right, but it shouldn't be
completely bogus, as it passes the regression tests and some manual
verification. Can you explain what your concern about this test is?

     simple_gather_path = (Path *)
         create_gather_path(root, rel, cheapest_partial_path, rel->reltarget,
                            NULL, NULL);
+
+    /* Add projection step if needed */
+    if (target && simple_gather_path->pathtarget != target)
+        simple_gather_path = apply_projection_to_path(root, rel, simple_gather_path, target);

Instead of using apply_projection_to_path, why not pass the correct
reltarget to create_gather_path?

We need to push it down to the Gather's subpath, as is done in
apply_projection_to_path, and then cost it accordingly. I think if we
don't use apply_projection_to_path, we might end up duplicating much of
its code in generate_gather_paths. In fact, I tried something similar
to what you are suggesting in the first version of the patch [1], and
it didn't turn out to be clean. Also, I think we already do something
similar in create_ordered_paths.

+        /* Set or update cheapest_total_path and related fields */
+        set_cheapest(current_rel);

I wonder if it's really OK to call set_cheapest() a second time for
the same relation...

I think, if we want, we can avoid it by checking whether we have
generated any gather path for the relation (basically, by checking
whether it has a partial path list). Another idea could be to treat the
generation of gather/gathermerge paths for the top-level scan/join
relation as a separate step and generate a new kind of upper rel for
it, which will be mostly dummy but will have the gather/gathermerge paths.
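Editor's sketch of the first idea above: skip the second set_cheapest() when the top-level rel has no partial paths, since generate_gather_paths() returns early in that case and adds nothing. The struct and function names are hypothetical stand-ins, not planner code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for RelOptInfo, reduced to what the check looks at. */
typedef struct SketchRel
{
    size_t n_partial_paths;     /* length of rel->partial_pathlist */
    int    set_cheapest_calls;  /* how often cheapest paths were recomputed */
} SketchRel;

void
sketch_set_cheapest(SketchRel *rel)
{
    rel->set_cheapest_calls++;
}

/* Only redo Gather generation and set_cheapest when there is something to gather. */
void
sketch_finish_top_rel(SketchRel *rel)
{
    if (rel->n_partial_paths == 0)
        return;                 /* no new paths, so earlier cheapest paths stay valid */
    /* generate_gather_paths(root, rel, scanjoin_target) would run here */
    sketch_set_cheapest(rel);
}
```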

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#48

robertmhaas@gmail.com

about 8 years ago

In reply to: Amit Kapila (#47)

Re: why not parallel seq scan for slow functions

On Mon, Nov 6, 2017 at 11:20 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Nov 6, 2017 at 3:51 AM, Robert Haas <robertmhaas@gmail.com> wrote:
This looks like it's on the right track to me. I hope Tom will look
into it, but if he doesn't I may try to get it committed myself.

-    if (rel->reloptkind == RELOPT_BASEREL)
-        generate_gather_paths(root, rel);
+    if (rel->reloptkind == RELOPT_BASEREL &&
+        root->simple_rel_array_size > 2 &&
+        !root->append_rel_list)

This test doesn't look correct to me. Actually, it doesn't look
anywhere close to correct to me. So, one of us is very confused...
not sure whether it's you or me.

It is quite possible that I haven't got it right, but it shouldn't be
completely bogus, as it passes the regression tests and some manual
verification. Can you explain what your concern about this test is?

Well, I suppose that test will fire for a baserel when the total
number of baserels is at least 3 and there's no inheritance involved.
But if there are 2 baserels, we're still not the topmost scan/join
target. Also, even if inheritance is used, we might still be the
topmost scan/join target.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#49

amit.kapila16@gmail.com

about 8 years ago

In reply to: Robert Haas (#48)

Re: why not parallel seq scan for slow functions

On Mon, Nov 6, 2017 at 7:05 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Nov 6, 2017 at 11:20 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Nov 6, 2017 at 3:51 AM, Robert Haas <robertmhaas@gmail.com> wrote:
This looks like it's on the right track to me. I hope Tom will look
into it, but if he doesn't I may try to get it committed myself.

-    if (rel->reloptkind == RELOPT_BASEREL)
-        generate_gather_paths(root, rel);
+    if (rel->reloptkind == RELOPT_BASEREL &&
+        root->simple_rel_array_size > 2 &&
+        !root->append_rel_list)

This test doesn't look correct to me. Actually, it doesn't look
anywhere close to correct to me. So, one of us is very confused...
not sure whether it's you or me.

It is quite possible that I haven't got it right, but it shouldn't be
completely bogus, as it passes the regression tests and some manual
verification. Can you explain what your concern about this test is?

Well, I suppose that test will fire for a baserel when the total
number of baserels is at least 3 and there's no inheritance involved.
But if there are 2 baserels, we're still not the topmost scan/join
target.

No, if there are 2 baserels, then simple_rel_array_size will be 3.
The simple_rel_array_size is always the "number of relations" plus
"one". See setup_simple_rel_arrays.

Also, even if inheritance is used, we might still be the
topmost scan/join target.

Sure, but in that case, it won't generate the gather path here (due to
this part of the check, "!root->append_rel_list"). I am not sure whether I
have understood the second part of your question, so if my answer
appears inadequate, can you provide more details on what you are
concerned about?
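Editor's sketch of the off-by-one discussed above. It assumes, as in setup_simple_rel_arrays(), that the array is sized as list_length(rtable) + 1 because range-table indexes start at 1 and slot 0 is never used; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Slot 0 is unused because range-table indexes start at 1. */
int
sketch_simple_rel_array_size(int rtable_length)
{
    return rtable_length + 1;
}

/* The v5 test "simple_rel_array_size > 2" thus means "at least two range-table entries". */
bool
sketch_has_multiple_rtes(int simple_rel_array_size)
{
    return simple_rel_array_size > 2;
}
```

So a single-table query gives a size of 2 and fails the test, while a plain two-table join gives 3 and passes it.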

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#50

robertmhaas@gmail.com

about 8 years ago

In reply to: Amit Kapila (#49)

Re: why not parallel seq scan for slow functions

On Mon, Nov 6, 2017 at 9:57 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Well, I suppose that test will fire for a baserel when the total
number of baserels is at least 3 and there's no inheritance involved.
But if there are 2 baserels, we're still not the topmost scan/join
target.

No, if there are 2 baserels, then simple_rel_array_size will be 3.
The simple_rel_array_size is always the "number of relations" plus
"one". See setup_simple_rel_arrays.

Oh, wow. That's surprising.

Also, even if inheritance is used, we might still be the
topmost scan/join target.

Sure, but in that case, it won't generate the gather path here (due to
this part of the check, "!root->append_rel_list"). I am not sure whether I
have understood the second part of your question, so if my answer
appears inadequate, can you provide more details on what you are
concerned about?

I don't know why the question of whether root->append_rel_list is empty is
the relevant thing here for deciding whether to generate gather paths
at this point.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#51

amit.kapila16@gmail.com

about 8 years ago

In reply to: Robert Haas (#50)

Re: why not parallel seq scan for slow functions

On Wed, Nov 8, 2017 at 2:51 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Nov 6, 2017 at 9:57 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Also, even if inheritance is used, we might still be the
topmost scan/join target.

Sure, but in that case, it won't generate the gather path here (due to
this part of the check, "!root->append_rel_list"). I am not sure whether I
have understood the second part of your question, so if my answer
appears inadequate, can you provide more details on what you are
concerned about?

I don't know why the question of whether root->append_rel_list is empty is
the relevant thing here for deciding whether to generate gather paths
at this point.

This is required to prevent generating a gather path for the top rel in
the case of inheritance (Append node) at this place (we want to generate
it later, when the scan/join target is available). For such a case, the
reloptkind will be RELOPT_BASEREL and simple_rel_array_size will be
greater than two, as it includes the child rels as well. So, the check on
root->append_rel_list will prevent generating a gather path for such a
rel. Now, for all the child rels of the Append, the reloptkind will be
RELOPT_OTHER_MEMBER_REL, so it won't generate a gather path here because
of the first part of the check (rel->reloptkind == RELOPT_BASEREL).
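Editor's sketch of the case analysis above, written as the v5 predicate with its inputs flattened to plain values. The enum is a hypothetical two-value stand-in for the planner's RelOptKind, not the real type.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of RelOptKind used by the check. */
typedef enum
{
    SKETCH_BASEREL,             /* plays the role of RELOPT_BASEREL */
    SKETCH_OTHER_MEMBER_REL     /* plays the role of RELOPT_OTHER_MEMBER_REL */
} SketchRelKind;

/* v5 test in set_rel_pathlist: gather early only for a baserel in a non-inheritance join. */
bool
sketch_v5_gather_early(SketchRelKind kind, int simple_rel_array_size,
                       bool append_rel_list_nonempty)
{
    return kind == SKETCH_BASEREL &&
        simple_rel_array_size > 2 &&
        !append_rel_list_nonempty;
}
```

An inheritance parent is rejected by the append_rel_list part even though its array size exceeds 2, and its children are rejected by the reloptkind part; this is the behaviour Robert questions next, since a plain multi-table join passes while any inheritance query is deferred.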

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#52

robertmhaas@gmail.com

about 8 years ago

In reply to: Amit Kapila (#51)

Re: why not parallel seq scan for slow functions

On Tue, Nov 7, 2017 at 9:41 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

This is required to prohibit generating gather path for top rel in
case of inheritance (Append node) at this place (we want to generate
it later when scan/join target is available).

OK, but why don't we want to generate it later when there isn't
inheritance involved?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#53

amit.kapila16@gmail.com

about 8 years ago

In reply to: Robert Haas (#52)

Re: why not parallel seq scan for slow functions

On Wed, Nov 8, 2017 at 4:34 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Nov 7, 2017 at 9:41 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

This is required to prohibit generating gather path for top rel in
case of inheritance (Append node) at this place (we want to generate
it later when scan/join target is available).

OK, but why don't we want to generate it later when there isn't
inheritance involved?

We do want to generate it later when there isn't inheritance involved,
but only if there is a single rel involved (simple_rel_array_size
<= 2). The rule is something like this: we will generate the gather
paths at this stage only if there are more than two rels involved and
there isn't inheritance involved.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#54

robertmhaas@gmail.com

about 8 years ago

In reply to: Amit Kapila (#53)

Re: why not parallel seq scan for slow functions

On Wed, Nov 8, 2017 at 7:26 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

We do want to generate it later when there isn't inheritance involved,
but only if there is a single rel involved (simple_rel_array_size
<= 2). The rule is something like this: we will generate the gather
paths at this stage only if there are more than two rels involved and
there isn't inheritance involved.

Why is that the correct rule?

Sorry if I'm being dense here. I would have thought we'd want to skip
it for the topmost scan/join rel regardless of the presence or absence
of inheritance.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#55

amit.kapila16@gmail.com

about 8 years ago

In reply to: Robert Haas (#54)

1 attachment(s)

Re: why not parallel seq scan for slow functions

On Wed, Nov 8, 2017 at 6:48 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Nov 8, 2017 at 7:26 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

We do want to generate it later when there isn't inheritance involved,
but only if there is a single rel involved (simple_rel_array_size
<= 2). The rule is something like this: we will generate the gather
paths at this stage only if there are more than two rels involved and
there isn't inheritance involved.

Why is that the correct rule?

Sorry if I'm being dense here. I would have thought we'd want to skip
it for the topmost scan/join rel regardless of the presence or absence
of inheritance.

I think I understood your concern after some off-list discussion: it
is primarily that the inheritance-related check can skip the
generation of gather paths when it shouldn't. So what might fit
better here is a straight check on the number of base rels, such that
we allow generating gather paths in set_rel_pathlist only if there are
multiple baserels involved. I have used all_baserels, which I think
will work better for this purpose.
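Editor's sketch of what the v6 check `bms_membership(root->all_baserels) != BMS_SINGLETON` computes. A single 64-bit word stands in for PostgreSQL's Bitmapset, and the enum and function names are hypothetical mirrors of BMS_EMPTY_SET / BMS_SINGLETON / BMS_MULTIPLE, not the bitmapset.c implementation. Per the patch comment, inheritance children are not members of all_baserels, so an inheritance tree counts as a single rel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum
{
    SKETCH_BMS_EMPTY_SET,
    SKETCH_BMS_SINGLETON,
    SKETCH_BMS_MULTIPLE
} SketchBmsMembership;

/* Bit i set means relid i is a member. */
SketchBmsMembership
sketch_bms_membership(uint64_t relids)
{
    if (relids == 0)
        return SKETCH_BMS_EMPTY_SET;
    /* x & (x - 1) clears the lowest set bit; zero means exactly one bit was set. */
    if ((relids & (relids - 1)) == 0)
        return SKETCH_BMS_SINGLETON;
    return SKETCH_BMS_MULTIPLE;
}

/* v6 test: gather early only for a baserel when the query joins several baserels. */
bool
sketch_v6_gather_early(bool is_baserel, uint64_t all_baserels)
{
    return is_baserel &&
        sketch_bms_membership(all_baserels) != SKETCH_BMS_SINGLETON;
}
```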

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

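Editor's sketch of the geqo_eval.c hunk in the attached v6 patch, which applies the same top-level test inside GEQO's merge_clump() using the new num_gene argument. The helper name is hypothetical. It assumes, as in geqo_eval.c, that old_clump->size is only increased after this block, so the joinrel being built covers old_size + new_size base rels, and the top-level rel is the one covering all num_gene rels.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of "old_clump->size + new_clump->size < num_gene". */
bool
sketch_geqo_gather_early(int old_size, int new_size, int num_gene)
{
    return old_size + new_size < num_gene;
}
```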
Attachments:

parallel_paths_include_tlist_cost_v6.patch (application/octet-stream)

diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index 3cf268c..5ab53eb 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -40,7 +40,7 @@ typedef struct
 } Clump;
 
 static List *merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump,
-			bool force);
+			int num_gene, bool force);
 static bool desirable_join(PlannerInfo *root,
 			   RelOptInfo *outer_rel, RelOptInfo *inner_rel);
 
@@ -196,7 +196,7 @@ gimme_tree(PlannerInfo *root, Gene *tour, int num_gene)
 		cur_clump->size = 1;
 
 		/* Merge it into the clumps list, using only desirable joins */
-		clumps = merge_clump(root, clumps, cur_clump, false);
+		clumps = merge_clump(root, clumps, cur_clump, num_gene, false);
 	}
 
 	if (list_length(clumps) > 1)
@@ -210,7 +210,7 @@ gimme_tree(PlannerInfo *root, Gene *tour, int num_gene)
 		{
 			Clump	   *clump = (Clump *) lfirst(lc);
 
-			fclumps = merge_clump(root, fclumps, clump, true);
+			fclumps = merge_clump(root, fclumps, clump, num_gene, true);
 		}
 		clumps = fclumps;
 	}
@@ -235,7 +235,8 @@ gimme_tree(PlannerInfo *root, Gene *tour, int num_gene)
  * "desirable" joins.
  */
 static List *
-merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
+merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
+			bool force)
 {
 	ListCell   *prev;
 	ListCell   *lc;
@@ -267,8 +268,14 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
 				/* Create paths for partition-wise joins. */
 				generate_partition_wise_join_paths(root, joinrel);
 
-				/* Create GatherPaths for any useful partial paths for rel */
-				generate_gather_paths(root, joinrel);
+				/*
+				 * Create GatherPaths for any useful partial paths for rel
+				 * other than top-level rel.  The gather path for top-level
+				 * rel is generated once path target is available.  See
+				 * grouping_planner.
+				 */
+				if (old_clump->size + new_clump->size < num_gene)
+					generate_gather_paths(root, joinrel, NULL);
 
 				/* Find and save the cheapest paths for this joinrel */
 				set_cheapest(joinrel);
@@ -286,7 +293,7 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
 				 * others.  When no further merge is possible, we'll reinsert
 				 * it into the list.
 				 */
-				return merge_clump(root, clumps, old_clump, force);
+				return merge_clump(root, clumps, old_clump, num_gene, force);
 			}
 		}
 		prev = lc;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 906d08a..f6336fd 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -480,14 +480,19 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	}
 
 	/*
-	 * If this is a baserel, consider gathering any partial paths we may have
-	 * created for it.  (If we tried to gather inheritance children, we could
-	 * end up with a very large number of gather nodes, each trying to grab
-	 * its own pool of workers, so don't do this for otherrels.  Instead,
-	 * we'll consider gathering partial paths for the parent appendrel.)
+	 * If this is a baserel and not the top-level rel, consider gathering any
+	 * partial paths we may have created for it.  (If we tried to gather
+	 * inheritance children, we could end up with a very large number of
+	 * gather nodes, each trying to grab its own pool of workers, so don't do
+	 * this for otherrels.  Instead, we'll consider gathering partial paths
+	 * for the parent appendrel.).  We can check for joins by counting the
+	 * membership of all_baserels (note that this correctly counts inheritance
+	 * trees as single rels).  The gather path for top-level rel is generated
+	 * once path target is available.  See grouping_planner.
 	 */
-	if (rel->reloptkind == RELOPT_BASEREL)
-		generate_gather_paths(root, rel);
+	if (rel->reloptkind == RELOPT_BASEREL &&
+		bms_membership(root->all_baserels) != BMS_SINGLETON)
+		generate_gather_paths(root, rel, NULL);
 
 	/*
 	 * Allow a plugin to editorialize on the set of Paths for this base
@@ -2288,11 +2293,12 @@ set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
  *		Gather Merge on top of a partial path.
  *
  * This must not be called until after we're done creating all partial paths
- * for the specified relation.  (Otherwise, add_partial_path might delete a
- * path that some GatherPath or GatherMergePath has a reference to.)
+ * for the specified relation (Otherwise, add_partial_path might delete a
+ * path that some GatherPath or GatherMergePath has a reference to.) and path
+ * target for top level scan/join node is available.
  */
 void
-generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)
+generate_gather_paths(PlannerInfo *root, RelOptInfo *rel, PathTarget *target)
 {
 	Path	   *cheapest_partial_path;
 	Path	   *simple_gather_path;
@@ -2311,6 +2317,11 @@ generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)
 	simple_gather_path = (Path *)
 		create_gather_path(root, rel, cheapest_partial_path, rel->reltarget,
 						   NULL, NULL);
+
+	/* Add projection step if needed */
+	if (target && simple_gather_path->pathtarget != target)
+		simple_gather_path = apply_projection_to_path(root, rel, simple_gather_path, target);
+
 	add_path(rel, simple_gather_path);
 
 	/*
@@ -2320,14 +2331,18 @@ generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)
 	foreach(lc, rel->partial_pathlist)
 	{
 		Path	   *subpath = (Path *) lfirst(lc);
-		GatherMergePath *path;
+		Path	   *path;
 
 		if (subpath->pathkeys == NIL)
 			continue;
 
-		path = create_gather_merge_path(root, rel, subpath, rel->reltarget,
-										subpath->pathkeys, NULL, NULL);
-		add_path(rel, &path->path);
+		path = (Path *) create_gather_merge_path(root, rel, subpath, rel->reltarget,
+												 subpath->pathkeys, NULL, NULL);
+		/* Add projection step if needed */
+		if (target && path->pathtarget != target)
+			path = apply_projection_to_path(root, rel, path, target);
+
+		add_path(rel, path);
 	}
 }
 
@@ -2498,8 +2513,13 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
 			/* Create paths for partition-wise joins. */
 			generate_partition_wise_join_paths(root, rel);
 
-			/* Create GatherPaths for any useful partial paths for rel */
-			generate_gather_paths(root, rel);
+			/*
+			 * Create GatherPaths for any useful partial paths for rel other
+			 * than top-level rel.  The gather path for top-level rel is
+			 * generated once path target is available.  See grouping_planner.
+			 */
+			if (lev < levels_needed)
+				generate_gather_paths(root, rel, NULL);
 
 			/* Find and save the cheapest paths for this rel */
 			set_cheapest(rel);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 9b7a8fd..aaf5a97 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -1892,6 +1892,18 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 		}
 
 		/*
+		 * Consider ways to implement parallel paths.  We always skip
+		 * generating parallel path for top level scan/join nodes till the
+		 * pathtarget is computed.  This is to ensure that we can account for
+		 * the fact that most of the target evaluation work will be performed
+		 * in workers.
+		 */
+		generate_gather_paths(root, current_rel, scanjoin_target);
+
+		/* Set or update cheapest_total_path and related fields */
+		set_cheapest(current_rel);
+
+		/*
 		 * Upper planning steps which make use of the top scan/join rel's
 		 * partial pathlist will expect partial paths for that rel to produce
 		 * the same output as complete paths ... and we just changed the
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 36ec025..17320f5 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2422,16 +2422,27 @@ apply_projection_to_path(PlannerInfo *root,
 		 * projection-capable, so as to avoid modifying the subpath in place.
 		 * It seems unlikely at present that there could be any other
 		 * references to the subpath, but better safe than sorry.
-		 *
-		 * Note that we don't change the GatherPath's cost estimates; it might
-		 * be appropriate to do so, to reflect the fact that the bulk of the
-		 * target evaluation will happen in workers.
 		 */
 		gpath->subpath = (Path *)
 			create_projection_path(root,
 								   gpath->subpath->parent,
 								   gpath->subpath,
 								   target);
+
+		/*
+		 * Adjust the cost of GatherPath to reflect the fact that the bulk of
+		 * the target evaluation will happen in workers.
+		 */
+		if (((ProjectionPath *) gpath->subpath)->dummypp)
+		{
+			path->total_cost -= (target->cost.per_tuple - oldcost.per_tuple) * path->rows;
+			path->total_cost += (target->cost.per_tuple - oldcost.per_tuple) * gpath->subpath->rows;
+		}
+		else
+		{
+			path->total_cost -= (target->cost.per_tuple - oldcost.per_tuple) * path->rows;
+			path->total_cost += (cpu_tuple_cost + target->cost.per_tuple) * gpath->subpath->rows;
+		}
 	}
 	else if (path->parallel_safe &&
 			 !is_parallel_safe(root, (Node *) target->exprs))
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index ea886b6..8f7f6fe 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -53,7 +53,8 @@ extern void set_dummy_rel_pathlist(RelOptInfo *rel);
 extern RelOptInfo *standard_join_search(PlannerInfo *root, int levels_needed,
 					 List *initial_rels);
 
-extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel);
+extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
+					  PathTarget *target);
 extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
 						double index_pages);
 extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index ac9ad06..79d5502 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -121,6 +121,23 @@ execute tenk1_count(1);
 (1 row)
 
 deallocate tenk1_count;
+-- test that parallel plan gets selected when target list contains costly
+-- function
+create or replace function costly_func(var1 integer) returns integer
+as $$
+begin
+        return var1 + 10;
+end;
+$$ language plpgsql PARALLEL SAFE Cost 100000;
+explain (costs off) select ten, costly_func(ten) from tenk1;
+            QUERY PLAN            
+----------------------------------
+ Gather
+   Workers Planned: 4
+   ->  Parallel Seq Scan on tenk1
+(3 rows)
+
+drop function costly_func(var1 integer);
 -- test parallel plans for queries containing un-correlated subplans.
 alter table tenk2 set (parallel_workers = 0);
 explain (costs off)
diff --git a/src/test/regress/sql/select_parallel.sql b/src/test/regress/sql/select_parallel.sql
index 495f033..3544b43 100644
--- a/src/test/regress/sql/select_parallel.sql
+++ b/src/test/regress/sql/select_parallel.sql
@@ -45,6 +45,17 @@ explain (costs off) execute tenk1_count(1);
 execute tenk1_count(1);
 deallocate tenk1_count;
 
+-- test that parallel plan gets selected when target list contains costly
+-- function
+create or replace function costly_func(var1 integer) returns integer
+as $$
+begin
+        return var1 + 10;
+end;
+$$ language plpgsql PARALLEL SAFE Cost 100000;
+explain (costs off) select ten, costly_func(ten) from tenk1;
+drop function costly_func(var1 integer);
+
 -- test parallel plans for queries containing un-correlated subplans.
 alter table tenk2 set (parallel_workers = 0);
 explain (costs off)

#56

robertmhaas@gmail.com

about 8 years ago

In reply to: Amit Kapila (#55)

Re: why not parallel seq scan for slow functions

On Thu, Nov 9, 2017 at 3:47 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

I think I understood your concern after some offlist discussion and it
is primarily due to the inheritance related check which can skip the
generation of gather paths when it shouldn't. So what might fit
better here is a straight check on the number of base rels, so that
we generate a gather path in set_rel_pathlist only if there are
multiple baserels involved. I have used all_baserels which I think
will work better for this purpose.

Yes, that looks a lot more likely to be correct.

Let's see what Tom thinks.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#57

Michael Paquier

michael.paquier@gmail.com

about 8 years ago

In reply to: Robert Haas (#56)

Re: [HACKERS] why not parallel seq scan for slow functions

On Fri, Nov 10, 2017 at 4:42 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Nov 9, 2017 at 3:47 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

I think I understood your concern after some offlist discussion and it
is primarily due to the inheritance related check which can skip the
generation of gather paths when it shouldn't. So what might fit
better here is a straight check on the number of base rels, so that
we generate a gather path in set_rel_pathlist only if there are
multiple baserels involved. I have used all_baserels which I think
will work better for this purpose.

Yes, that looks a lot more likely to be correct.

Let's see what Tom thinks.

Moved to next CF for extra reviews.
--
Michael

#58

Marina Polyakova

m.polyakova@postgrespro.ru

about 8 years ago

In reply to: Michael Paquier (#57)

Re: [HACKERS] why not parallel seq scan for slow functions

Hello everyone in this thread!

On 29-11-2017 8:01, Michael Paquier wrote:

Moved to next CF for extra reviews.

Amit, I would like to ask some questions about your patch (and can you
please rebase it on top of master?):

1)
+ path->total_cost -= (target->cost.per_tuple - oldcost.per_tuple) * path->rows;

Here we undo the changes that we make in this function earlier:

path->total_cost += target->cost.startup - oldcost.startup +
(target->cost.per_tuple - oldcost.per_tuple) * path->rows;

Perhaps we should avoid making these "reversible" changes in this case in
the first place?

2)
  		gpath->subpath = (Path *)
  			create_projection_path(root,
  								   gpath->subpath->parent,
  								   gpath->subpath,
  								   target);
...
+		if (((ProjectionPath *) gpath->subpath)->dummypp)
+		{
...
+			path->total_cost += (target->cost.per_tuple - oldcost.per_tuple) * gpath->subpath->rows;
+		}
+		else
+		{
...
+			path->total_cost += (cpu_tuple_cost + target->cost.per_tuple) * gpath->subpath->rows;
+		}

As I understand it, here in the if-else block we change the run costs of
gpath in the same way as they were changed for its subpath in the
function create_projection_path earlier. But for the startup costs we
always subtract the cost of the old target:

path->startup_cost += target->cost.startup - oldcost.startup;
path->total_cost += target->cost.startup - oldcost.startup +
(target->cost.per_tuple - oldcost.per_tuple) * path->rows;

Should we change the startup costs of gpath in this way if
((ProjectionPath *) gpath->subpath)->dummypp is false?

3)
  	simple_gather_path = (Path *)
  		create_gather_path(root, rel, cheapest_partial_path, rel->reltarget,
  						   NULL, NULL);
+	/* Add projection step if needed */
+	if (target && simple_gather_path->pathtarget != target)
+		simple_gather_path = apply_projection_to_path(root, rel, simple_gather_path, target);
...
+		path = (Path *) create_gather_merge_path(root, rel, subpath, rel->reltarget,
+												 subpath->pathkeys, NULL, NULL);
+		/* Add projection step if needed */
+		if (target && path->pathtarget != target)
+			path = apply_projection_to_path(root, rel, path, target);
...
@@ -2422,16 +2422,27 @@ apply_projection_to_path(PlannerInfo *root,
...
  		gpath->subpath = (Path *)
  			create_projection_path(root,
  								   gpath->subpath->parent,
  								   gpath->subpath,
  								   target);

The target is changing so we change it for the gather(merge) node and
for its subpath. Don't we have to do this work (replace the subpath by
calling the function create_projection_path if the target is different)
in the functions create_gather(_merge)_path too? I suppose that the
target of the subpath affects its costs => the costs of the
gather(merge) node in the functions cost_gather(_merge) (=> the costs of
the gather(merge) node in the function apply_projection_to_path).

4)
+		 * Consider ways to implement parallel paths.  We always skip
+		 * generating parallel path for top level scan/join nodes till the
+		 * pathtarget is computed.  This is to ensure that we can account for
+		 * the fact that most of the target evaluation work will be performed
+		 * in workers.
+		 */
+		generate_gather_paths(root, current_rel, scanjoin_target);
+
+		/* Set or update cheapest_total_path and related fields */
+		set_cheapest(current_rel);

As I understand it (if I understand correctly, thanks to the comments of
Robert Haas and Tom Lane :), after that we cannot use the function
apply_projection_to_path for paths in the current_rel->pathlist without
risking that the cheapest path will change. And we have several calls to
the function adjust_paths_for_srfs (which uses apply_projection_to_path
for paths in the current_rel->pathlist) in grouping_planner after
generating the gather paths:

/* Now fix things up if scan/join target contains SRFs */
if (parse->hasTargetSRFs)
	adjust_paths_for_srfs(root, current_rel,
						  scanjoin_targets,
						  scanjoin_targets_contain_srfs);
...
/* Fix things up if grouping_target contains SRFs */
if (parse->hasTargetSRFs)
	adjust_paths_for_srfs(root, current_rel,
						  grouping_targets,
						  grouping_targets_contain_srfs);
...
/* Fix things up if sort_input_target contains SRFs */
if (parse->hasTargetSRFs)
	adjust_paths_for_srfs(root, current_rel,
						  sort_input_targets,
						  sort_input_targets_contain_srfs);
...
/* Fix things up if final_target contains SRFs */
if (parse->hasTargetSRFs)
	adjust_paths_for_srfs(root, current_rel,
						  final_targets,
						  final_targets_contain_srfs);

Maybe we should add the appropriate call to the function set_cheapest()
after this if parse->hasTargetSRFs is true?
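Marina's concern in point 4 can be pictured with a toy model: set_cheapest() caches a pointer to the cheapest path, and a later in-place cost change does not refresh that cache. The sketch below is hypothetical (ToyRel, ToyPath and toy_apply_projection are not PostgreSQL structs or functions) and it assumes the later projection step raises different paths' costs by different amounts, which is exactly what Amit's reply questions.

```c
#include <assert.h>

/* Hypothetical miniature of a rel's complete-path list plus the cached
 * cheapest pointer that set_cheapest() maintains; not PostgreSQL structs. */
typedef struct
{
    double total_cost;
} ToyPath;

typedef struct
{
    ToyPath  paths[4];
    int      npaths;
    ToyPath *cheapest;
} ToyRel;

/* Toy set_cheapest(): cache a pointer to the lowest-cost path. */
static void toy_set_cheapest(ToyRel *rel)
{
    assert(rel->npaths > 0);
    rel->cheapest = &rel->paths[0];
    for (int i = 1; i < rel->npaths; i++)
        if (rel->paths[i].total_cost < rel->cheapest->total_cost)
            rel->cheapest = &rel->paths[i];
}

/* Toy projection step: raises one path's cost in place, the way
 * apply_projection_to_path() modifies a path, without touching
 * rel->cheapest. */
static void toy_apply_projection(ToyPath *path, double extra_cost)
{
    path->total_cost += extra_cost;
}
```

If the projection adds 5 to a serial path costing 100 and 20 to a Gather path costing 90, the cached pointer still names the Gather path (now 110) even though the serial path (now 105) is cheaper, until set_cheapest() is called again.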

5)
+	 * If this is a baserel and not the top-level rel, consider gathering any
+	 * partial paths we may have created for it.  (If we tried to gather
+	 * inheritance children, we could end up with a very large number of
+	 * gather nodes, each trying to grab its own pool of workers, so don't do
+	 * this for otherrels.  Instead, we'll consider gathering partial paths
+	 * for the parent appendrel.).  We can check for joins by counting the
+	 * membership of all_baserels (note that this correctly counts inheritance
+	 * trees as single rels).  The gather path for top-level rel is generated
+	 * once path target is available.  See grouping_planner.
  	 */
-	if (rel->reloptkind == RELOPT_BASEREL)
-		generate_gather_paths(root, rel);
+	if (rel->reloptkind == RELOPT_BASEREL &&
+		bms_membership(root->all_baserels) != BMS_SINGLETON)
+		generate_gather_paths(root, rel, NULL);

Do I understand correctly that here there's an assumption about
root->query_level (which we don't need to check)?

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#59

amit.kapila16@gmail.com

about 8 years ago

In reply to: Marina Polyakova (#58)

1 attachment(s)

Re: [HACKERS] why not parallel seq scan for slow functions

On Fri, Dec 29, 2017 at 7:56 PM, Marina Polyakova
<m.polyakova@postgrespro.ru> wrote:

Hello everyone in this thread!

On 29-11-2017 8:01, Michael Paquier wrote:

Moved to next CF for extra reviews.

Amit, I would like to ask some questions about your patch (and can you
please rebase it on top of master?):

Thanks for looking into the patch.

1)
+ path->total_cost -= (target->cost.per_tuple - oldcost.per_tuple) * path->rows;

Here we undo the changes that we make in this function earlier:

path->total_cost += target->cost.startup - oldcost.startup +
(target->cost.per_tuple - oldcost.per_tuple) * path->rows;

Perhaps we should avoid making these "reversible" changes in this case in
the first place?

We can do it that way as well. However, when the patch was written we
didn't have GatherMerge handling in that function, so it appeared to me
that changing the code the way you are suggesting would complicate it;
now it looks saner to do it that way. I have changed the code
accordingly.
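The cost arithmetic behind points 1 and 2 can be sketched in C. This is a hypothetical model, not the patch itself: ToyQualCost and the toy_* functions are illustrative names, and toy_parallel_divisor() is assumed to mirror get_parallel_divisor() in costsize.c with leader participation on. It applies the startup delta unconditionally, as Amit notes, and then charges the per-tuple delta exactly once: on the Gather's own rows (pre-patch) or on the subpath's per-worker rows (v7 style). For scale, a function declared COST 100000 costs 250 per call at the default cpu_operator_cost of 0.0025.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the planner's QualCost (startup + per-row). */
typedef struct
{
    double startup;
    double per_tuple;
} ToyQualCost;

/* Assumed to mirror get_parallel_divisor() in costsize.c with leader
 * participation on: each worker counts as 1, and the leader adds
 * 1.0 - 0.3 * workers while that is still positive. */
static double toy_parallel_divisor(int workers)
{
    assert(workers > 0);
    double divisor = workers;
    double leader_contribution = 1.0 - 0.3 * workers;

    if (leader_contribution > 0)
        divisor += leader_contribution;
    return divisor;
}

/* Cost added to a Gather's total_cost when newc replaces oldc.
 *   in_workers == false: pre-patch rule, per-tuple delta charged on the
 *                        Gather's own row count.
 *   in_workers == true:  v7-style rule, charged once on the subpath's
 *                        per-worker rows; needs_result models a non-dummy
 *                        ProjectionPath (Result node) and adds cpu_tuple_cost. */
static double toy_gather_tlist_delta(ToyQualCost oldc, ToyQualCost newc,
                                     double gather_rows, double subpath_rows,
                                     bool in_workers, bool needs_result,
                                     double cpu_tuple_cost)
{
    /* startup delta is applied before any Gather-specific handling */
    double delta = newc.startup - oldc.startup;

    if (!in_workers)
        return delta + (newc.per_tuple - oldc.per_tuple) * gather_rows;
    if (needs_result)
        return delta + (cpu_tuple_cost + newc.per_tuple) * subpath_rows;
    return delta + (newc.per_tuple - oldc.per_tuple) * subpath_rows;
}
```

With hypothetical numbers (100000 rows, two workers, per_tuple = 250), the v7-style charge comes out smaller than the pre-patch one by the divisor, about 2.4x, because the subpath's rows are the per-worker estimate.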

2)
  		gpath->subpath = (Path *)
  			create_projection_path(root,
  								   gpath->subpath->parent,
  								   gpath->subpath,
  								   target);
...
+               if (((ProjectionPath *) gpath->subpath)->dummypp)
+               {
...
+                       path->total_cost += (target->cost.per_tuple - oldcost.per_tuple) * gpath->subpath->rows;
+               }
+               else
+               {
...
+                       path->total_cost += (cpu_tuple_cost + target->cost.per_tuple) * gpath->subpath->rows;
+               }
As I understand it, here in the if-else block we change the run costs of
gpath in the same way as they were changed for its subpath in the function
create_projection_path earlier. But for the startup costs we always subtract
the cost of the old target:

path->startup_cost += target->cost.startup - oldcost.startup;
path->total_cost += target->cost.startup - oldcost.startup +
(target->cost.per_tuple - oldcost.per_tuple) * path->rows;

Should we change the startup costs of gpath in this way if ((ProjectionPath
*) gpath->subpath)->dummypp is false?

The startup costs will be computed in the way you mentioned for both
gpath and gmpath as that code is executed before we do any handling
for Gather or GatherMerge path.

3)
  	simple_gather_path = (Path *)
  		create_gather_path(root, rel, cheapest_partial_path, rel->reltarget,
  						   NULL, NULL);
+       /* Add projection step if needed */
+       if (target && simple_gather_path->pathtarget != target)
+               simple_gather_path = apply_projection_to_path(root, rel, simple_gather_path, target);
...
+               path = (Path *) create_gather_merge_path(root, rel, subpath, rel->reltarget,
+                                                        subpath->pathkeys, NULL, NULL);
+               /* Add projection step if needed */
+               if (target && path->pathtarget != target)
+                       path = apply_projection_to_path(root, rel, path, target);
...
@@ -2422,16 +2422,27 @@ apply_projection_to_path(PlannerInfo *root,
...
  		gpath->subpath = (Path *)
  			create_projection_path(root,
  								   gpath->subpath->parent,
  								   gpath->subpath,
  								   target);

The target is changing so we change it for the gather(merge) node and for
its subpath. Don't we have to do this work (replace the subpath by calling
the function create_projection_path if the target is different) in the
functions create_gather(_merge)_path too? I suppose that the target of the
subpath affects its costs => the costs of the gather(merge) node in the
functions cost_gather(_merge) (=> the costs of the gather(merge) node in the
function apply_projection_to_path).

The cost impact of a different target will be taken care of when we
call apply_projection_to_path. I am not sure I understand your
question completely, but do you see anything in cost_gather(_merge)
that looks suspicious, i.e., that the target list costing could
affect? Generally, we compute the target list cost at a later stage
of planning, once the path target is formed.

4)
+                * Consider ways to implement parallel paths.  We always skip
+                * generating parallel path for top level scan/join nodes till the
+                * pathtarget is computed.  This is to ensure that we can account for
+                * the fact that most of the target evaluation work will be performed
+                * in workers.
+                */
+               generate_gather_paths(root, current_rel, scanjoin_target);
+
+               /* Set or update cheapest_total_path and related fields */
+               set_cheapest(current_rel);
As I understand it (if I understand correctly, thanks to the comments of Robert
Haas and Tom Lane :), after that we cannot use the function apply_projection_to_path for
paths in the current_rel->pathlist without risking that the cheapest path
will change. And we have several calls to the function adjust_paths_for_srfs
(which uses apply_projection_to_path for paths in the current_rel->pathlist)
in grouping_planner after generating the gather paths:

I think in general that is true even without this patch and without
having parallel paths. The main thing we are trying to cover with
this patch is that we try to generate parallel paths for top-level rel
after path target is computed so that the costing can take into
account the fact that bulk of target list evaluation can be done in
workers. I think adjust_paths_for_srfs can impact costing for
parallel paths if the ProjectSet nodes are allowed to be pushed to
workers, but I don't see that happening (see
create_set_projection_path). If you have an example test that shows
the problem you mentioned, it would be easier to see whether it can
impact the selection of parallel paths.

5)
+        * If this is a baserel and not the top-level rel, consider gathering any
+        * partial paths we may have created for it.  (If we tried to gather
+        * inheritance children, we could end up with a very large number of
+        * gather nodes, each trying to grab its own pool of workers, so don't do
+        * this for otherrels.  Instead, we'll consider gathering partial paths
+        * for the parent appendrel.).  We can check for joins by counting the
+        * membership of all_baserels (note that this correctly counts inheritance
+        * trees as single rels).  The gather path for top-level rel is generated
+        * once path target is available.  See grouping_planner.
+        */
-       if (rel->reloptkind == RELOPT_BASEREL)
-               generate_gather_paths(root, rel);
+       if (rel->reloptkind == RELOPT_BASEREL &&
+               bms_membership(root->all_baserels) != BMS_SINGLETON)
+               generate_gather_paths(root, rel, NULL);

Do I understand correctly that here there's an assumption about
root->query_level (which we don't need to check)?

I don't think we need to check query_level, but can you please be more
explicit about the point you have in mind? Remember that we never
generate a gather path atop another gather path; if that is the
assumption you have in mind, then we don't need any extra check for
it.
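The v7 patch uses three different tests to decide that a rel is not the top-level scan/join rel, so its Gather paths can be built early. These are the singleton check on all_baserels in set_rel_pathlist, `lev < levels_needed` in standard_join_search, and the clump-size check in GEQO's merge_clump. The toy C predicates below restate them. The names are hypothetical, and toy_membership() only imitates bms_membership() on a 64-bit mask rather than on PostgreSQL's Bitmapset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same three outcomes as bms_membership(); toy enum, not the real one. */
typedef enum
{
    TOY_EMPTY_SET,
    TOY_SINGLETON,
    TOY_MULTIPLE
} ToyMembership;

/* Toy bms_membership() on a 64-bit mask: x & (x - 1) clears the lowest
 * set bit, so the result is zero exactly when one bit is set. */
static ToyMembership toy_membership(uint64_t set)
{
    if (set == 0)
        return TOY_EMPTY_SET;
    return (set & (set - 1)) == 0 ? TOY_SINGLETON : TOY_MULTIPLE;
}

/* set_rel_pathlist(): a baserel is the top-level rel only when it is the
 * sole member of all_baserels, so gather early only otherwise. */
static bool toy_baserel_may_gather(uint64_t all_baserels)
{
    return toy_membership(all_baserels) != TOY_SINGLETON;
}

/* standard_join_search(): join rels built at the last level are top-level. */
static bool toy_joinrel_may_gather(int lev, int levels_needed)
{
    return lev < levels_needed;
}

/* merge_clump() in GEQO: a merged clump covering all num_gene rels is
 * the top-level rel. */
static bool toy_clump_may_gather(int old_size, int new_size, int num_gene)
{
    assert(old_size + new_size <= num_gene);
    return old_size + new_size < num_gene;
}
```

In each case the top-level rel is skipped, so grouping_planner can build its Gather paths once the scan/join target is known.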

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

parallel_paths_include_tlist_cost_v7.patch (application/octet-stream)

diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index 3cf268c..5ab53eb 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -40,7 +40,7 @@ typedef struct
 } Clump;
 
 static List *merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump,
-			bool force);
+			int num_gene, bool force);
 static bool desirable_join(PlannerInfo *root,
 			   RelOptInfo *outer_rel, RelOptInfo *inner_rel);
 
@@ -196,7 +196,7 @@ gimme_tree(PlannerInfo *root, Gene *tour, int num_gene)
 		cur_clump->size = 1;
 
 		/* Merge it into the clumps list, using only desirable joins */
-		clumps = merge_clump(root, clumps, cur_clump, false);
+		clumps = merge_clump(root, clumps, cur_clump, num_gene, false);
 	}
 
 	if (list_length(clumps) > 1)
@@ -210,7 +210,7 @@ gimme_tree(PlannerInfo *root, Gene *tour, int num_gene)
 		{
 			Clump	   *clump = (Clump *) lfirst(lc);
 
-			fclumps = merge_clump(root, fclumps, clump, true);
+			fclumps = merge_clump(root, fclumps, clump, num_gene, true);
 		}
 		clumps = fclumps;
 	}
@@ -235,7 +235,8 @@ gimme_tree(PlannerInfo *root, Gene *tour, int num_gene)
  * "desirable" joins.
  */
 static List *
-merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
+merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
+			bool force)
 {
 	ListCell   *prev;
 	ListCell   *lc;
@@ -267,8 +268,14 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
 				/* Create paths for partition-wise joins. */
 				generate_partition_wise_join_paths(root, joinrel);
 
-				/* Create GatherPaths for any useful partial paths for rel */
-				generate_gather_paths(root, joinrel);
+				/*
+				 * Create GatherPaths for any useful partial paths for rel
+				 * other than top-level rel.  The gather path for top-level
+				 * rel is generated once path target is available.  See
+				 * grouping_planner.
+				 */
+				if (old_clump->size + new_clump->size < num_gene)
+					generate_gather_paths(root, joinrel, NULL);
 
 				/* Find and save the cheapest paths for this joinrel */
 				set_cheapest(joinrel);
@@ -286,7 +293,7 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
 				 * others.  When no further merge is possible, we'll reinsert
 				 * it into the list.
 				 */
-				return merge_clump(root, clumps, old_clump, force);
+				return merge_clump(root, clumps, old_clump, num_gene, force);
 			}
 		}
 		prev = lc;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 0e8463e..8357f55 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -481,14 +481,19 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	}
 
 	/*
-	 * If this is a baserel, consider gathering any partial paths we may have
-	 * created for it.  (If we tried to gather inheritance children, we could
-	 * end up with a very large number of gather nodes, each trying to grab
-	 * its own pool of workers, so don't do this for otherrels.  Instead,
-	 * we'll consider gathering partial paths for the parent appendrel.)
+	 * If this is a baserel and not the top-level rel, consider gathering any
+	 * partial paths we may have created for it.  (If we tried to gather
+	 * inheritance children, we could end up with a very large number of
+	 * gather nodes, each trying to grab its own pool of workers, so don't do
+	 * this for otherrels.  Instead, we'll consider gathering partial paths
+	 * for the parent appendrel.).  We can check for joins by counting the
+	 * membership of all_baserels (note that this correctly counts inheritance
+	 * trees as single rels).  The gather path for top-level rel is generated
+	 * once path target is available.  See grouping_planner.
 	 */
-	if (rel->reloptkind == RELOPT_BASEREL)
-		generate_gather_paths(root, rel);
+	if (rel->reloptkind == RELOPT_BASEREL &&
+		bms_membership(root->all_baserels) != BMS_SINGLETON)
+		generate_gather_paths(root, rel, NULL);
 
 	/*
 	 * Allow a plugin to editorialize on the set of Paths for this base
@@ -2440,11 +2445,12 @@ set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
  *		Gather Merge on top of a partial path.
  *
  * This must not be called until after we're done creating all partial paths
- * for the specified relation.  (Otherwise, add_partial_path might delete a
- * path that some GatherPath or GatherMergePath has a reference to.)
+ * for the specified relation (Otherwise, add_partial_path might delete a
+ * path that some GatherPath or GatherMergePath has a reference to.) and path
+ * target for top level scan/join node is available.
  */
 void
-generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)
+generate_gather_paths(PlannerInfo *root, RelOptInfo *rel, PathTarget *target)
 {
 	Path	   *cheapest_partial_path;
 	Path	   *simple_gather_path;
@@ -2463,6 +2469,11 @@ generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)
 	simple_gather_path = (Path *)
 		create_gather_path(root, rel, cheapest_partial_path, rel->reltarget,
 						   NULL, NULL);
+
+	/* Add projection step if needed */
+	if (target && simple_gather_path->pathtarget != target)
+		simple_gather_path = apply_projection_to_path(root, rel, simple_gather_path, target);
+
 	add_path(rel, simple_gather_path);
 
 	/*
@@ -2472,14 +2483,18 @@ generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)
 	foreach(lc, rel->partial_pathlist)
 	{
 		Path	   *subpath = (Path *) lfirst(lc);
-		GatherMergePath *path;
+		Path	   *path;
 
 		if (subpath->pathkeys == NIL)
 			continue;
 
-		path = create_gather_merge_path(root, rel, subpath, rel->reltarget,
-										subpath->pathkeys, NULL, NULL);
-		add_path(rel, &path->path);
+		path = (Path *) create_gather_merge_path(root, rel, subpath, rel->reltarget,
+												 subpath->pathkeys, NULL, NULL);
+		/* Add projection step if needed */
+		if (target && path->pathtarget != target)
+			path = apply_projection_to_path(root, rel, path, target);
+
+		add_path(rel, path);
 	}
 }
 
@@ -2650,8 +2665,13 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
 			/* Create paths for partition-wise joins. */
 			generate_partition_wise_join_paths(root, rel);
 
-			/* Create GatherPaths for any useful partial paths for rel */
-			generate_gather_paths(root, rel);
+			/*
+			 * Create GatherPaths for any useful partial paths for rel other
+			 * than top-level rel.  The gather path for top-level rel is
+			 * generated once path target is available.  See grouping_planner.
+			 */
+			if (lev < levels_needed)
+				generate_gather_paths(root, rel, NULL);
 
 			/* Find and save the cheapest paths for this rel */
 			set_cheapest(rel);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 382791f..4ec1673 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -1895,6 +1895,18 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 		}
 
 		/*
+		 * Consider ways to implement parallel paths.  We always skip
+		 * generating parallel path for top level scan/join nodes till the
+		 * pathtarget is computed.  This is to ensure that we can account for
+		 * the fact that most of the target evaluation work will be performed
+		 * in workers.
+		 */
+		generate_gather_paths(root, current_rel, scanjoin_target);
+
+		/* Set or update cheapest_total_path and related fields */
+		set_cheapest(current_rel);
+
+		/*
 		 * Upper planning steps which make use of the top scan/join rel's
 		 * partial pathlist will expect partial paths for that rel to produce
 		 * the same output as complete paths ... and we just changed the
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 2aee156..7bf420b 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2448,6 +2448,8 @@ apply_projection_to_path(PlannerInfo *root,
 						 PathTarget *target)
 {
 	QualCost	oldcost;
+	double		nrows;
+	bool		resultPath = false;
 
 	/*
 	 * If given path can't project, we might need a Result node, so make a
@@ -2458,14 +2460,16 @@ apply_projection_to_path(PlannerInfo *root,
 
 	/*
 	 * We can just jam the desired tlist into the existing path, being sure to
-	 * update its cost estimates appropriately.
+	 * update its cost estimates appropriately.  Also, ensure that the cost
+	 * estimates reflects the fact that the target list evaluation will happen
+	 * in workers if path is a Gather or GatherMerge path.
 	 */
 	oldcost = path->pathtarget->cost;
 	path->pathtarget = target;
 
+	nrows = path->rows;
 	path->startup_cost += target->cost.startup - oldcost.startup;
-	path->total_cost += target->cost.startup - oldcost.startup +
-		(target->cost.per_tuple - oldcost.per_tuple) * path->rows;
+	path->total_cost += target->cost.startup - oldcost.startup;
 
 	/*
 	 * If the path happens to be a Gather or GatherMerge path, we'd like to
@@ -2481,10 +2485,6 @@ apply_projection_to_path(PlannerInfo *root,
 		 * projection-capable, so as to avoid modifying the subpath in place.
 		 * It seems unlikely at present that there could be any other
 		 * references to the subpath, but better safe than sorry.
-		 *
-		 * Note that we don't change the parallel path's cost estimates; it
-		 * might be appropriate to do so, to reflect the fact that the bulk of
-		 * the target evaluation will happen in workers.
 		 */
 		if (IsA(path, GatherPath))
 		{
@@ -2495,6 +2495,10 @@ apply_projection_to_path(PlannerInfo *root,
 									   gpath->subpath->parent,
 									   gpath->subpath,
 									   target);
+
+			nrows = gpath->subpath->rows;
+			if (!((ProjectionPath *) gpath->subpath)->dummypp)
+				resultPath = true;
 		}
 		else
 		{
@@ -2505,6 +2509,10 @@ apply_projection_to_path(PlannerInfo *root,
 									   gmpath->subpath->parent,
 									   gmpath->subpath,
 									   target);
+
+			nrows = gmpath->subpath->rows;
+			if (!((ProjectionPath *) gmpath->subpath)->dummypp)
+				resultPath = true;
 		}
 	}
 	else if (path->parallel_safe &&
@@ -2518,6 +2526,20 @@ apply_projection_to_path(PlannerInfo *root,
 		path->parallel_safe = false;
 	}
 
+	/*
+	 * Update the cost estimates based on whether Result node is required. See
+	 * create_projection_path.
+	 */
+	if (resultPath)
+	{
+		Assert (IsA(path, GatherPath) || IsA(path, GatherMergePath));
+		path->total_cost += (cpu_tuple_cost + target->cost.per_tuple) * nrows;
+	}
+	else
+	{
+		path->total_cost += (target->cost.per_tuple - oldcost.per_tuple) * nrows;
+	}
+
 	return path;
 }
 
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index ea886b6..8f7f6fe 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -53,7 +53,8 @@ extern void set_dummy_rel_pathlist(RelOptInfo *rel);
 extern RelOptInfo *standard_join_search(PlannerInfo *root, int levels_needed,
 					 List *initial_rels);
 
-extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel);
+extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
+					  PathTarget *target);
 extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
 						double index_pages);
 extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 7824ca5..e95a2c4 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -251,6 +251,23 @@ execute tenk1_count(1);
 (1 row)
 
 deallocate tenk1_count;
+-- test that parallel plan gets selected when target list contains costly
+-- function
+create or replace function costly_func(var1 integer) returns integer
+as $$
+begin
+        return var1 + 10;
+end;
+$$ language plpgsql PARALLEL SAFE Cost 100000;
+explain (costs off) select ten, costly_func(ten) from tenk1;
+            QUERY PLAN            
+----------------------------------
+ Gather
+   Workers Planned: 4
+   ->  Parallel Seq Scan on tenk1
+(3 rows)
+
+drop function costly_func(var1 integer);
 -- test parallel plans for queries containing un-correlated subplans.
 alter table tenk2 set (parallel_workers = 0);
 explain (costs off)
diff --git a/src/test/regress/sql/select_parallel.sql b/src/test/regress/sql/select_parallel.sql
index b12ba0b..0f95b63 100644
--- a/src/test/regress/sql/select_parallel.sql
+++ b/src/test/regress/sql/select_parallel.sql
@@ -91,6 +91,17 @@ explain (costs off) execute tenk1_count(1);
 execute tenk1_count(1);
 deallocate tenk1_count;
 
+-- test that parallel plan gets selected when target list contains costly
+-- function
+create or replace function costly_func(var1 integer) returns integer
+as $$
+begin
+        return var1 + 10;
+end;
+$$ language plpgsql PARALLEL SAFE Cost 100000;
+explain (costs off) select ten, costly_func(ten) from tenk1;
+drop function costly_func(var1 integer);
+
 -- test parallel plans for queries containing un-correlated subplans.
 alter table tenk2 set (parallel_workers = 0);
 explain (costs off)

#60

robertmhaas@gmail.com

almost 8 years ago

In reply to: Amit Kapila (#59)

Re: [HACKERS] why not parallel seq scan for slow functions

On Tue, Jan 2, 2018 at 6:38 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

[ new patch ]

I think that grouping_planner() could benefit from a slightly more
extensive rearrangement. With your patch applied, the order of
operations is:

1. compute the scan/join target
2. apply the scan/join target to all paths in current_rel's pathlist
3. generate gather paths, possibly adding more stuff to current_rel's pathlist
4. rerun set_cheapest
5. apply the scan/join target, if parallel safe, to all paths in the
current rel's partial_pathlist, for the benefit of upper planning
steps
6. clear the partial pathlist if the target list is not parallel safe

I at first thought this was outright broken because step #3 imposes
the scan/join target without testing it for parallel-safety, but then
I realized that generate_gather_paths will apply that target list by
using apply_projection_to_path, which makes an is_parallel_safe test
of its own. But it doesn't seem good for step 3 to test the
parallel-safety of the target list separately for each path and then
have grouping_planner do it one more time for the benefit of upper
planning steps. Instead, I suggest that we try to get rid of the
logic in apply_projection_to_path that knows about Gather and Gather
Merge specifically. I think we can do that if grouping_planner does
this:

1. compute the scan/join target
2. apply the scan/join target, if parallel safe, to all paths in the
current rel's partial_pathlist
3. generate gather paths
4. clear the partial pathlist if the target list is not parallel safe
5. apply the scan/join target to all paths in current_rel's pathlist
6. rerun set_cheapest

That seems like a considerably more logical order of operations. It
avoids not only the expense of testing the scanjoin_target for
parallel-safety multiple times, but the ugliness of having
apply_projection_to_path know about Gather and Gather Merge as a
special case.
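
A toy model of the proposed order for the topmost scan/join rel follows. MockPath, MockRel, plan_top_scanjoin and all cost numbers are hypothetical. The model assumes the cheapest partial path sits first in the partial list. Unlike the literal step 5, it also skips paths that already emit the target, such as the new Gather path. The point is that a Gather built in step 3 inherits the worker-side target-list cost, so set_cheapest in step 6 can see the saving.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PATHS 8

/* Hypothetical stand-ins for Path and RelOptInfo; not planner code. */
typedef struct MockPath
{
    double  cost;
    bool    produces_target;    /* already emits the scan/join tlist? */
    bool    is_gather;
} MockPath;

typedef struct MockRel
{
    MockPath pathlist[MAX_PATHS];
    int      npaths;
    MockPath partial[MAX_PATHS];    /* partial[0] assumed cheapest */
    int      npartial;
} MockRel;

/*
 * Steps 2-6 of the proposed order.
 * tlist_full: cost of evaluating the tlist for all rows in one process.
 * tlist_partial: per-process share of that cost when workers evaluate it.
 * Returns the index of the cheapest complete path.
 */
int plan_top_scanjoin(MockRel *rel, bool target_safe, double tlist_full,
                      double tlist_partial, double gather_overhead)
{
    /* 2. Apply the target to partial paths, if parallel-safe. */
    if (target_safe)
    {
        for (int i = 0; i < rel->npartial; i++)
        {
            rel->partial[i].cost += tlist_partial;
            rel->partial[i].produces_target = true;
        }
    }

    /* 3. Gather the cheapest partial path into the complete pathlist. */
    if (rel->npartial > 0)
    {
        assert(rel->npaths < MAX_PATHS);
        MockPath g = rel->partial[0];

        g.cost += gather_overhead;
        if (!g.produces_target)
        {
            g.cost += tlist_full;       /* leader projects above Gather */
            g.produces_target = true;
        }
        g.is_gather = true;
        rel->pathlist[rel->npaths++] = g;
    }

    /* 4. Partial paths lacking the target are useless upstream. */
    if (!target_safe)
        rel->npartial = 0;

    /* 5. Apply the target to complete paths that don't emit it yet. */
    for (int i = 0; i < rel->npaths; i++)
    {
        if (!rel->pathlist[i].produces_target)
        {
            rel->pathlist[i].cost += tlist_full;
            rel->pathlist[i].produces_target = true;
        }
    }

    /* 6. A trivial stand-in for set_cheapest(). */
    assert(rel->npaths > 0);
    int best = 0;
    for (int i = 1; i < rel->npaths; i++)
        if (rel->pathlist[i].cost < rel->pathlist[best].cost)
            best = i;
    return best;
}
```

Consider a serial path of cost 100, a partial path of cost 25, tlist_full = 1000, tlist_partial = 250 and a Gather overhead of 50. With a parallel-safe target, the Gather path costs 325 against 1100 for the serial path. With an unsafe target, the Gather path costs 1075 and the partial list is cleared.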

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#61

amit.kapila16@gmail.com

almost 8 years ago

In reply to: Robert Haas (#60)

Re: [HACKERS] why not parallel seq scan for slow functions

On Sat, Jan 27, 2018 at 2:50 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Jan 2, 2018 at 6:38 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

[ new patch ]

I think that grouping_planner() could benefit from a slightly more
extensive rearrangement. With your patch applied, the order of
operations is:

1. compute the scan/join target
2. apply the scan/join target to all paths in current_rel's pathlist
3. generate gather paths, possibly adding more stuff to current_rel's pathlist
4. rerun set_cheapest
5. apply the scan/join target, if parallel safe, to all paths in the
current rel's partial_pathlist, for the benefit of upper planning
steps
6. clear the partial pathlist if the target list is not parallel safe

I at first thought this was outright broken because step #3 imposes
the scan/join target without testing it for parallel-safety, but then
I realized that generate_gather_paths will apply that target list by
using apply_projection_to_path, which makes an is_parallel_safe test
of its own. But it doesn't seem good for step 3 to test the
parallel-safety of the target list separately for each path and then
have grouping_planner do it one more time for the benefit of upper
planning steps. Instead, I suggest that we try to get rid of the
logic in apply_projection_to_path that knows about Gather and Gather
Merge specifically. I think we can do that if grouping_planner does
this:

1. compute the scan/join target
2. apply the scan/join target, if parallel safe, to all paths in the
current rel's partial_pathlist
3. generate gather paths
4. clear the partial pathlist if the target list is not parallel safe
5. apply the scan/join target to all paths in current_rel's pathlist
6. rerun set_cheapest

That seems like a considerably more logical order of operations. It
avoids not only the expense of testing the scanjoin_target for
parallel-safety multiple times, but the ugliness of having
apply_projection_to_path know about Gather and Gather Merge as a
special case.

If we want to get rid of Gather (Merge) checks in
apply_projection_to_path(), then we need some way to add a projection
path to the subpath of the Gather node even if that subpath is capable of
projection, as we do now. I think changing the order of applying the
scan/join target won't address that unless we decide to do it for
every partial path. Another way could be to handle that in
generate_gather_paths, but I think that won't be the ideal place to add
projection.

If we want, we can compute the parallel-safety of the scan/join target
once in grouping_planner and then pass it to apply_projection_to_path
to address your main concern.
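
The alternative of testing parallel safety once in grouping_planner and passing the result down can be illustrated with a toy model. All names below are hypothetical and are not PostgreSQL APIs. The counter only shows that the safety walk runs once per target list rather than once per path, and that a Gather path with a safe target gets its projection pushed below it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a path; not PostgreSQL's Path struct. */
typedef struct MockPath
{
    bool is_gather;
    bool projects_here;      /* tlist evaluated at this node */
    bool subpath_projects;   /* Gather only: tlist evaluated in workers */
} MockPath;

/* Counts how often the (normally expensive) safety test runs. */
static int safety_checks = 0;

/* Stand-in for is_parallel_safe() walking the target expressions. */
static bool target_is_parallel_safe(bool has_restricted_expr)
{
    safety_checks++;
    return !has_restricted_expr;
}

/* Hypothetical apply_projection_to_path() variant taking the flag. */
static void apply_projection_sketch(MockPath *path, bool target_safe)
{
    if (path->is_gather && target_safe)
        path->subpath_projects = true;   /* push projection below Gather */
    else
        path->projects_here = true;
}

/* Test parallel safety once, then reuse the answer for every path. */
void project_all(MockPath *paths, int n, bool has_restricted_expr)
{
    assert(n >= 0);

    bool safe = target_is_parallel_safe(has_restricted_expr);

    for (int i = 0; i < n; i++)
        apply_projection_sketch(&paths[i], safe);
}
```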

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#62

robertmhaas@gmail.com

almost 8 years ago

In reply to: Amit Kapila (#61)

1 attachment(s)

Re: [HACKERS] why not parallel seq scan for slow functions

On Sun, Jan 28, 2018 at 10:13 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

If we want to get rid of Gather (Merge) checks in
apply_projection_to_path(), then we need some way to add a projection
path to the subpath of the Gather node even if that subpath is capable of
projection, as we do now. I think changing the order of applying the
scan/join target won't address that unless we decide to do it for
every partial path. Another way could be to handle that in
generate_gather_paths, but I think that won't be the ideal place to add
projection.

If we want, we can compute the parallel-safety of the scan/join target
once in grouping_planner and then pass it to apply_projection_to_path
to address your main concern.

I spent some time today hacking on this; see attached. It needs more
work, but you can see what I have in mind. It's not quite the same as
what I outlined before because that turned out to not quite work, but
it does remove most of the logic from apply_projection_to_path().
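
The attached patch postpones Gather generation for the topmost scan/join rel in three places: standard_join_search (`lev < levels_needed`), GEQO's merge_clump (`old_clump->size + new_clump->size < num_gene`) and set_rel_pathlist (`bms_membership(root->all_baserels) != BMS_SINGLETON`). Those predicates can be sketched as follows. The function names are hypothetical, and a 64-bit word stands in for a Bitmapset purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* standard_join_search: skip Gather at the final join level. */
bool gather_now_at_level(int lev, int levels_needed)
{
    assert(lev >= 1 && lev <= levels_needed);
    return lev < levels_needed;
}

/* GEQO merge_clump: skip Gather when the merged clump covers every rel. */
bool gather_now_for_clump(int old_size, int new_size, int num_gene)
{
    assert(old_size + new_size <= num_gene);
    return old_size + new_size < num_gene;
}

/*
 * set_rel_pathlist: skip Gather when the query has exactly one baserel.
 * A 64-bit word stands in for the Bitmapset root->all_baserels.
 */
bool gather_now_for_baserel(uint64_t all_baserels)
{
    bool singleton = all_baserels != 0 &&
        (all_baserels & (all_baserels - 1)) == 0;

    return !singleton;
}
```

In every case, the rel that is skipped is the one grouping_planner later gathers with the final scan/join target.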

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

parallel-paths-tlist-cost-rmh.patch (application/octet-stream)

diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index 9053cfd0b9..8666555ca5 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -40,7 +40,7 @@ typedef struct
 } Clump;
 
 static List *merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump,
-			bool force);
+			int num_gene, bool force);
 static bool desirable_join(PlannerInfo *root,
 			   RelOptInfo *outer_rel, RelOptInfo *inner_rel);
 
@@ -196,7 +196,7 @@ gimme_tree(PlannerInfo *root, Gene *tour, int num_gene)
 		cur_clump->size = 1;
 
 		/* Merge it into the clumps list, using only desirable joins */
-		clumps = merge_clump(root, clumps, cur_clump, false);
+		clumps = merge_clump(root, clumps, cur_clump, num_gene, false);
 	}
 
 	if (list_length(clumps) > 1)
@@ -210,7 +210,7 @@ gimme_tree(PlannerInfo *root, Gene *tour, int num_gene)
 		{
 			Clump	   *clump = (Clump *) lfirst(lc);
 
-			fclumps = merge_clump(root, fclumps, clump, true);
+			fclumps = merge_clump(root, fclumps, clump, num_gene, true);
 		}
 		clumps = fclumps;
 	}
@@ -235,7 +235,8 @@ gimme_tree(PlannerInfo *root, Gene *tour, int num_gene)
  * "desirable" joins.
  */
 static List *
-merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
+merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
+			bool force)
 {
 	ListCell   *prev;
 	ListCell   *lc;
@@ -267,8 +268,14 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
 				/* Create paths for partition-wise joins. */
 				generate_partition_wise_join_paths(root, joinrel);
 
-				/* Create GatherPaths for any useful partial paths for rel */
-				generate_gather_paths(root, joinrel);
+				/*
+				 * Create GatherPaths for any useful partial paths for rel
+				 * other than top-level rel.  The gather path for top-level
+				 * rel is generated once path target is available.  See
+				 * grouping_planner.
+				 */
+				if (old_clump->size + new_clump->size < num_gene)
+					generate_gather_paths(root, joinrel, NULL);
 
 				/* Find and save the cheapest paths for this joinrel */
 				set_cheapest(joinrel);
@@ -286,7 +293,7 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
 				 * others.  When no further merge is possible, we'll reinsert
 				 * it into the list.
 				 */
-				return merge_clump(root, clumps, old_clump, force);
+				return merge_clump(root, clumps, old_clump, num_gene, force);
 			}
 		}
 		prev = lc;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index fd1a58336b..140c4f121d 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -481,14 +481,19 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	}
 
 	/*
-	 * If this is a baserel, consider gathering any partial paths we may have
-	 * created for it.  (If we tried to gather inheritance children, we could
-	 * end up with a very large number of gather nodes, each trying to grab
-	 * its own pool of workers, so don't do this for otherrels.  Instead,
-	 * we'll consider gathering partial paths for the parent appendrel.)
+	 * If this is a baserel and not the top-level rel, consider gathering any
+	 * partial paths we may have created for it.  (If we tried to gather
+	 * inheritance children, we could end up with a very large number of
+	 * gather nodes, each trying to grab its own pool of workers, so don't do
+	 * this for otherrels.  Instead, we'll consider gathering partial paths
+	 * for the parent appendrel.)  We can check for joins by counting the
+	 * membership of all_baserels (note that this correctly counts inheritance
+	 * trees as single rels).  The gather path for top-level rel is generated
+	 * once path target is available.  See grouping_planner.
 	 */
-	if (rel->reloptkind == RELOPT_BASEREL)
-		generate_gather_paths(root, rel);
+	if (rel->reloptkind == RELOPT_BASEREL &&
+		bms_membership(root->all_baserels) != BMS_SINGLETON)
+		generate_gather_paths(root, rel, NULL);
 
 	/*
 	 * Allow a plugin to editorialize on the set of Paths for this base
@@ -2441,11 +2446,12 @@ set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
  *		Gather Merge on top of a partial path.
  *
  * This must not be called until after we're done creating all partial paths
- * for the specified relation.  (Otherwise, add_partial_path might delete a
- * path that some GatherPath or GatherMergePath has a reference to.)
+ * for the specified relation (otherwise, add_partial_path might delete a
+ * path that some GatherPath or GatherMergePath has a reference to), and the
+ * path target for the top-level scan/join node is available.
  */
 void
-generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)
+generate_gather_paths(PlannerInfo *root, RelOptInfo *rel, PathTarget *target)
 {
 	Path	   *cheapest_partial_path;
 	Path	   *simple_gather_path;
@@ -2455,6 +2461,9 @@ generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)
 	if (rel->partial_pathlist == NIL)
 		return;
 
+	if (target == NULL)
+		target = rel->reltarget;
+
 	/*
 	 * The output of Gather is always unsorted, so there's only one partial
 	 * path of interest: the cheapest one.  That will be the one at the front
@@ -2462,7 +2471,7 @@ generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)
 	 */
 	cheapest_partial_path = linitial(rel->partial_pathlist);
 	simple_gather_path = (Path *)
-		create_gather_path(root, rel, cheapest_partial_path, rel->reltarget,
+		create_gather_path(root, rel, cheapest_partial_path, target,
 						   NULL, NULL);
 	add_path(rel, simple_gather_path);
 
@@ -2478,7 +2487,7 @@ generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)
 		if (subpath->pathkeys == NIL)
 			continue;
 
-		path = create_gather_merge_path(root, rel, subpath, rel->reltarget,
+		path = create_gather_merge_path(root, rel, subpath, target,
 										subpath->pathkeys, NULL, NULL);
 		add_path(rel, &path->path);
 	}
@@ -2651,8 +2660,13 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
 			/* Create paths for partition-wise joins. */
 			generate_partition_wise_join_paths(root, rel);
 
-			/* Create GatherPaths for any useful partial paths for rel */
-			generate_gather_paths(root, rel);
+			/*
+			 * Create GatherPaths for any useful partial paths for rel other
+			 * than top-level rel.  The gather path for top-level rel is
+			 * generated once path target is available.  See grouping_planner.
+			 */
+			if (lev < levels_needed)
+				generate_gather_paths(root, rel, NULL);
 
 			/* Find and save the cheapest paths for this rel */
 			set_cheapest(rel);
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 8679b14b29..3c4d6a44be 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -374,6 +374,14 @@ cost_gather(GatherPath *path, PlannerInfo *root,
 	startup_cost += parallel_setup_cost;
 	run_cost += parallel_tuple_cost * path->path.rows;
 
+	/* add tlist eval costs only if projecting */
+	if (path->path.pathtarget != path->subpath->pathtarget)
+	{
+		/* tlist eval costs are paid per output row, not per tuple scanned */
+		startup_cost += path->path.pathtarget->cost.startup;
+		run_cost += path->path.pathtarget->cost.per_tuple * path->path.rows;
+	}
+
 	path->path.startup_cost = startup_cost;
 	path->path.total_cost = (startup_cost + run_cost);
 }
@@ -441,6 +449,14 @@ cost_gather_merge(GatherMergePath *path, PlannerInfo *root,
 	startup_cost += parallel_setup_cost;
 	run_cost += parallel_tuple_cost * path->path.rows * 1.05;
 
+	/* add tlist eval costs only if projecting */
+	if (path->path.pathtarget != path->subpath->pathtarget)
+	{
+		/* tlist eval costs are paid per output row, not per tuple scanned */
+		startup_cost += path->path.pathtarget->cost.startup;
+		run_cost += path->path.pathtarget->cost.per_tuple * path->path.rows;
+	}
+
 	path->path.startup_cost = startup_cost + input_startup_cost;
 	path->path.total_cost = (startup_cost + run_cost + input_total_cost);
 }
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 2a4e22b6c8..d66364e718 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -1918,46 +1918,78 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 		}
 
 		/*
-		 * Upper planning steps which make use of the top scan/join rel's
-		 * partial pathlist will expect partial paths for that rel to produce
-		 * the same output as complete paths ... and we just changed the
-		 * output for the complete paths, so we'll need to do the same thing
-		 * for partial paths.  But only parallel-safe expressions can be
-		 * computed by partial paths.
+		 * When possible, we want target list evaluation to happen in parallel
+		 * worker processes rather than in the leader.  To facilitate this,
+		 * scan/join planning avoids generating Gather or Gather Merge paths
+		 * for the topmost scan/join relation.  That lets us do it here,
+		 * possibly after adjusting the target lists of the partial paths.
+		 *
+		 * In the past, we used to generate Gather or Gather Merge paths first
+		 * and then modify the target lists of their subpaths after the fact,
+		 * but that wasn't good because at that point it's too late for the
+		 * associated cost savings to affect which plans get chosen.  A plan
+		 * that involves using parallel query for the entire scan/join tree
+		 * may gain a significant advantage as compared with a serial plan if
+		 * target list evaluation is expensive.
 		 */
-		if (current_rel->partial_pathlist &&
-			is_parallel_safe(root, (Node *) scanjoin_target->exprs))
+		if (current_rel->partial_pathlist != NIL)
 		{
-			/* Apply the scan/join target to each partial path */
-			foreach(lc, current_rel->partial_pathlist)
+			bool		scanjoin_target_parallel_safe = false;
+
+			/*
+			 * If scanjoin_target is parallel-safe, apply it to all partial
+			 * paths, just like we already did for non-partial paths.
+			 */
+			if (is_parallel_safe(root, (Node *) scanjoin_target->exprs))
 			{
-				Path	   *subpath = (Path *) lfirst(lc);
-				Path	   *newpath;
+				/* Remember that the target list is parallel safe. */
+				scanjoin_target_parallel_safe = true;
 
-				/* Shouldn't have any parameterized paths anymore */
-				Assert(subpath->param_info == NULL);
+				/* Apply the scan/join target to each partial path */
+				foreach(lc, current_rel->partial_pathlist)
+				{
+					Path	   *subpath = (Path *) lfirst(lc);
+					Path	   *newpath;
 
-				/*
-				 * Don't use apply_projection_to_path() here, because there
-				 * could be other pointers to these paths, and therefore we
-				 * mustn't modify them in place.
-				 */
-				newpath = (Path *) create_projection_path(root,
-														  current_rel,
-														  subpath,
-														  scanjoin_target);
-				lfirst(lc) = newpath;
+					/* Shouldn't have any parameterized paths anymore */
+					Assert(subpath->param_info == NULL);
+
+					/*
+					 * Don't use apply_projection_to_path() here, because
+					 * there could be other pointers to these paths, and
+					 * therefore we mustn't modify them in place.
+					 */
+					newpath = (Path *) create_projection_path(root,
+															  current_rel,
+															  subpath,
+															  scanjoin_target);
+					lfirst(lc) = newpath;
+				}
 			}
-		}
-		else
-		{
+
+			/*
+			 * Try building Gather or Gather Merge paths.  We can do this even
+			 * if scanjoin_target isn't parallel-safe; for such queries,
+			 * Gather or Gather Merge will perform projection.  However, we
+			 * must be sure that the paths we generate produce
+			 * scanjoin_target, because the paths already in
+			 * current_rel->pathlist have already been adjusted to do so.
+			 */
+			generate_gather_paths(root, current_rel, scanjoin_target);
+
 			/*
-			 * In the unfortunate event that scanjoin_target is not
-			 * parallel-safe, we can't apply it to the partial paths; in that
-			 * case, we'll need to forget about the partial paths, which
-			 * aren't valid input for upper planning steps.
+			 * If scanjoin_target isn't parallel-safe, the partial paths for
+			 * this relation haven't been adjusted to generate it, which means
+			 * they can't safely be used for upper planning steps.
 			 */
-			current_rel->partial_pathlist = NIL;
+			if (!scanjoin_target_parallel_safe)
+				current_rel->partial_pathlist = NIL;
+
+			/*
+			 * Since generate_gather_paths has likely added new paths to
+			 * current_rel, the cheapest path might have changed.
+			 */
+			set_cheapest(current_rel);
 		}
 
 		/* Now fix things up if scan/join target contains SRFs */
@@ -4700,8 +4732,21 @@ create_ordered_paths(PlannerInfo *root,
 											 ordered_rel,
 											 cheapest_partial_path,
 											 root->sort_pathkeys,
+
 											 limit_tuples);
 
+			/*
+			 * If projection is required, and it's safe to do it before
+			 * Gather Merge, then do so.
+			 */
+			if (path->pathtarget != target &&
+				is_parallel_safe(root, (Node *) target->exprs))
+				path = (Path *)
+					create_projection_path(root,
+										   ordered_rel,
+										   path,
+										   target);
+
 			total_groups = cheapest_partial_path->rows *
 				cheapest_partial_path->parallel_workers;
 			path = (Path *)
@@ -4711,7 +4756,10 @@ create_ordered_paths(PlannerInfo *root,
 										 root->sort_pathkeys, NULL,
 										 &total_groups);
 
-			/* Add projection step if needed */
+			/*
+			 * If projection is required and we didn't do it before Gather
+			 * Merge, do it now.
+			 */
 			if (path->pathtarget != target)
 				path = apply_projection_to_path(root, ordered_rel,
 												path, target);
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index fe3b4582d4..26fe5ca6a8 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2435,10 +2435,6 @@ create_projection_path(PlannerInfo *root,
  * knows that the given path isn't referenced elsewhere and so can be modified
  * in-place.
  *
- * If the input path is a GatherPath or GatherMergePath, we try to push the
- * new target down to its input as well; this is a yet more invasive
- * modification of the input path, which create_projection_path() can't do.
- *
  * Note that we mustn't change the source path's parent link; so when it is
  * add_path'd to "rel" things will be a bit inconsistent.  So far that has
  * not caused any trouble.
@@ -2473,57 +2469,6 @@ apply_projection_to_path(PlannerInfo *root,
 	path->total_cost += target->cost.startup - oldcost.startup +
 		(target->cost.per_tuple - oldcost.per_tuple) * path->rows;
 
-	/*
-	 * If the path happens to be a Gather or GatherMerge path, we'd like to
-	 * arrange for the subpath to return the required target list so that
-	 * workers can help project.  But if there is something that is not
-	 * parallel-safe in the target expressions, then we can't.
-	 */
-	if ((IsA(path, GatherPath) ||IsA(path, GatherMergePath)) &&
-		is_parallel_safe(root, (Node *) target->exprs))
-	{
-		/*
-		 * We always use create_projection_path here, even if the subpath is
-		 * projection-capable, so as to avoid modifying the subpath in place.
-		 * It seems unlikely at present that there could be any other
-		 * references to the subpath, but better safe than sorry.
-		 *
-		 * Note that we don't change the parallel path's cost estimates; it
-		 * might be appropriate to do so, to reflect the fact that the bulk of
-		 * the target evaluation will happen in workers.
-		 */
-		if (IsA(path, GatherPath))
-		{
-			GatherPath *gpath = (GatherPath *) path;
-
-			gpath->subpath = (Path *)
-				create_projection_path(root,
-									   gpath->subpath->parent,
-									   gpath->subpath,
-									   target);
-		}
-		else
-		{
-			GatherMergePath *gmpath = (GatherMergePath *) path;
-
-			gmpath->subpath = (Path *)
-				create_projection_path(root,
-									   gmpath->subpath->parent,
-									   gmpath->subpath,
-									   target);
-		}
-	}
-	else if (path->parallel_safe &&
-			 !is_parallel_safe(root, (Node *) target->exprs))
-	{
-		/*
-		 * We're inserting a parallel-restricted target list into a path
-		 * currently marked parallel-safe, so we have to mark it as no longer
-		 * safe.
-		 */
-		path->parallel_safe = false;
-	}
-
 	return path;
 }
 
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 0072b7aa0d..4afbcbe13e 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -53,7 +53,8 @@ extern void set_dummy_rel_pathlist(RelOptInfo *rel);
 extern RelOptInfo *standard_join_search(PlannerInfo *root, int levels_needed,
 					 List *initial_rels);
 
-extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel);
+extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
+					  PathTarget *target);
 extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
 						double index_pages);
 extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 452494fbfa..e5a91bd200 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -251,6 +251,23 @@ execute tenk1_count(1);
 (1 row)
 
 deallocate tenk1_count;
+-- test that parallel plan gets selected when target list contains costly
+-- function
+create or replace function costly_func(var1 integer) returns integer
+as $$
+begin
+        return var1 + 10;
+end;
+$$ language plpgsql PARALLEL SAFE Cost 100000;
+explain (costs off) select ten, costly_func(ten) from tenk1;
+            QUERY PLAN            
+----------------------------------
+ Gather
+   Workers Planned: 4
+   ->  Parallel Seq Scan on tenk1
+(3 rows)
+
+drop function costly_func(var1 integer);
 -- test parallel plans for queries containing un-correlated subplans.
 alter table tenk2 set (parallel_workers = 0);
 explain (costs off)
diff --git a/src/test/regress/sql/select_parallel.sql b/src/test/regress/sql/select_parallel.sql
index b12ba0b74a..0f95b63c23 100644
--- a/src/test/regress/sql/select_parallel.sql
+++ b/src/test/regress/sql/select_parallel.sql
@@ -91,6 +91,17 @@ explain (costs off) execute tenk1_count(1);
 execute tenk1_count(1);
 deallocate tenk1_count;
 
+-- test that parallel plan gets selected when target list contains costly
+-- function
+create or replace function costly_func(var1 integer) returns integer
+as $$
+begin
+        return var1 + 10;
+end;
+$$ language plpgsql PARALLEL SAFE Cost 100000;
+explain (costs off) select ten, costly_func(ten) from tenk1;
+drop function costly_func(var1 integer);
+
 -- test parallel plans for queries containing un-correlated subplans.
 alter table tenk2 set (parallel_workers = 0);
 explain (costs off)
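
The cost_gather and cost_gather_merge hunks in this patch charge target-list evaluation at the Gather node only when it projects. Whether it projects is detected by comparing the Gather's pathtarget pointer with its subpath's. A simplified sketch of that rule follows. MockTarget, MockGather and cost_gather_sketch are hypothetical, and the base formula is assumed to mirror cost_gather: subpath costs, plus parallel_setup_cost at startup, plus parallel_tuple_cost per row.

```c
#include <assert.h>

/* Hypothetical stand-ins; not PostgreSQL's PathTarget or GatherPath. */
typedef struct MockTarget
{
    double startup;     /* one-time tlist eval cost */
    double per_tuple;   /* tlist eval cost per row */
} MockTarget;

typedef struct MockGather
{
    const MockTarget *target;       /* what Gather emits */
    const MockTarget *sub_target;   /* what the partial subpath emits */
    double rows;                    /* rows leaving Gather */
    double sub_startup;
    double sub_total;
    double startup_cost;            /* outputs */
    double total_cost;
} MockGather;

void cost_gather_sketch(MockGather *g, double setup_cost, double tuple_cost)
{
    assert(g->rows >= 0.0);

    double startup = g->sub_startup + setup_cost;
    double run = (g->sub_total - g->sub_startup) + tuple_cost * g->rows;

    /* Different target objects means Gather itself projects. */
    if (g->target != g->sub_target)
    {
        startup += g->target->startup;
        run += g->target->per_tuple * g->rows;
    }

    g->startup_cost = startup;
    g->total_cost = startup + run;
}
```

Because the test is pointer identity, a Gather whose subpath already carries the same PathTarget is not charged for the target list a second time.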

#63

amit.kapila16@gmail.com

almost 8 years ago

In reply to: Robert Haas (#62)

Re: [HACKERS] why not parallel seq scan for slow functions

On Tue, Jan 30, 2018 at 3:30 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Sun, Jan 28, 2018 at 10:13 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

If we want to get rid of Gather (Merge) checks in
apply_projection_to_path(), then we need some way to add a projection
path to the subpath of the Gather node even if that subpath is capable of
projection, as we do now. I think changing the order of applying the
scan/join target won't address that unless we decide to do it for
every partial path. Another way could be to handle that in
generate_gather_paths, but I think that won't be the ideal place to add
projection.

If we want, we can compute the parallel-safety of the scan/join target
once in grouping_planner and then pass it to apply_projection_to_path
to address your main concern.

I spent some time today hacking on this; see attached. It needs more
work, but you can see what I have in mind.

I can see what you have in mind, but I think we still need to change
the parallel safety flag of the path if *_target is not parallel safe
either inside apply_projection_to_path or maybe outside it, where it is
called. Basically, I am talking about the code below:

@@ -2473,57 +2469,6 @@ apply_projection_to_path(PlannerInfo *root,
{
..
- else if (path->parallel_safe &&
- !is_parallel_safe(root, (Node *) target->exprs))
- {
- /*
- * We're inserting a parallel-restricted target list into a path
- * currently marked parallel-safe, so we have to mark it as no longer
- * safe.
- */
- path->parallel_safe = false;
- }
-
..
}

I can take care of dealing with this unless you think otherwise.
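
The branch being discussed enforces a one-way rule: once a parallel-restricted target list is inserted into a path, that path is no longer parallel-safe. If the flag stayed set, an upper planning step could later place the path below a Gather. A minimal sketch of the rule follows, with a hypothetical MockPath struct and function name.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in; not PostgreSQL's Path struct. */
typedef struct MockPath
{
    double total_cost;
    bool   parallel_safe;
} MockPath;

/*
 * Mirrors the removed else-if branch: adding a parallel-restricted tlist
 * clears parallel_safe, and nothing here ever sets it back to true.
 */
void apply_target_sketch(MockPath *path, double tlist_cost_delta,
                         bool target_safe)
{
    assert(path != 0);

    path->total_cost += tlist_cost_delta;
    if (path->parallel_safe && !target_safe)
        path->parallel_safe = false;
}
```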

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#64

robertmhaas@gmail.com

almost 8 years ago

In reply to: Amit Kapila (#63)

1 attachment(s)

Re: [HACKERS] why not parallel seq scan for slow functions

On Wed, Jan 31, 2018 at 11:48 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

I can see what you have in mind, but I think we still need to change
the parallel safety flag of the path if *_target is not parallel safe
either inside apply_projection_to_path or maybe outside it, where it is
called.

Hmm. You have a point.

That's not the only problem, though. With these changes, we need to
check every place where apply_projection_to_path is used to see
whether we're losing the ability to push down the target list below
Gather (Merge) in some situations. If we are, then we need to see if
we can supply the correct target list to the code that generates the
partial path, before Gather (Merge) is added.

There are the following call sites:

* grouping_planner.
* create_ordered_paths (x2).
* adjust_path_for_srfs.
* build_minmax_path.
* recurse_set_operations (x2).

The cases in recurse_set_operations() don't matter, because the path
whose target list we're adjusting is known not to be a Gather path.
In the first call, it's definitely a Subquery Scan, and in the second
case it'll be a path implementing some set operation.
grouping_planner() is the core thing the patch is trying to fix, and
as far as I can tell the logic changes there are adequate. Also, the
second case in create_ordered_paths() is fixed: the patch changes
things so that it inserts a projection path before Gather Merge if
it's safe to do so.

The other cases aren't so clear. In the case of the first call within
create_ordered_paths, there's no loss in the !is_sorted case because
apply_projection_to_path will be getting called on a Sort path. But
when is_sorted is true, the current code can push a target list into a
Gather or Gather Merge that was created with some other target list,
and with the patch it can't. I'm not quite sure what sort of query
would trigger that problem, but it seems like there's something to
worry about there. Similarly I can't see any obvious reason why this
isn't a problem for adjust_path_for_srfs and build_minmax_path as
well, although I haven't tried to construct queries that hit those
cases, either.

Now, we could just give up on this approach and leave that code in
apply_projection_to_path, but what's bothering me is that, presumably,
any place where that code is actually getting used has the same
problem that we're trying to fix in grouping_planner: the tlist eval
costs are not being factored into the decision as to which path we
should choose, which might make a good parallel path lose to an
inferior non-parallel path. It would be best to fix that throughout
the code base rather than only fixing the more common paths -- if we
can do so with a reasonable amount of work.

Here's a new version which (a) restores the code that you pointed out,
(b) always passes the target to generate_gather_paths() instead of
treating NULL as a special case, and (c) reworks some of the comments
you added.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

parallel-paths-tlist-cost-rmh-v2.patch (application/octet-stream)

diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index 9053cfd0b9..8243e83ef1 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -40,7 +40,7 @@ typedef struct
 } Clump;
 
 static List *merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump,
-			bool force);
+			int num_gene, bool force);
 static bool desirable_join(PlannerInfo *root,
 			   RelOptInfo *outer_rel, RelOptInfo *inner_rel);
 
@@ -196,7 +196,7 @@ gimme_tree(PlannerInfo *root, Gene *tour, int num_gene)
 		cur_clump->size = 1;
 
 		/* Merge it into the clumps list, using only desirable joins */
-		clumps = merge_clump(root, clumps, cur_clump, false);
+		clumps = merge_clump(root, clumps, cur_clump, num_gene, false);
 	}
 
 	if (list_length(clumps) > 1)
@@ -210,7 +210,7 @@ gimme_tree(PlannerInfo *root, Gene *tour, int num_gene)
 		{
 			Clump	   *clump = (Clump *) lfirst(lc);
 
-			fclumps = merge_clump(root, fclumps, clump, true);
+			fclumps = merge_clump(root, fclumps, clump, num_gene, true);
 		}
 		clumps = fclumps;
 	}
@@ -235,7 +235,8 @@ gimme_tree(PlannerInfo *root, Gene *tour, int num_gene)
  * "desirable" joins.
  */
 static List *
-merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
+merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
+			bool force)
 {
 	ListCell   *prev;
 	ListCell   *lc;
@@ -267,8 +268,14 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
 				/* Create paths for partition-wise joins. */
 				generate_partition_wise_join_paths(root, joinrel);
 
-				/* Create GatherPaths for any useful partial paths for rel */
-				generate_gather_paths(root, joinrel);
+				/*
+				 * Except for the topmost scan/join rel, consider gathering
+				 * partial paths.  We'll do the same for the topmost scan/join
+				 * rel once we know the final targetlist (see
+				 * grouping_planner).
+				 */
+				if (old_clump->size + new_clump->size < num_gene)
+					generate_gather_paths(root, joinrel, joinrel->reltarget);
 
 				/* Find and save the cheapest paths for this joinrel */
 				set_cheapest(joinrel);
@@ -286,7 +293,7 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
 				 * others.  When no further merge is possible, we'll reinsert
 				 * it into the list.
 				 */
-				return merge_clump(root, clumps, old_clump, force);
+				return merge_clump(root, clumps, old_clump, num_gene, force);
 			}
 		}
 		prev = lc;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index fd1a58336b..48a387945a 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -481,14 +481,21 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	}
 
 	/*
-	 * If this is a baserel, consider gathering any partial paths we may have
-	 * created for it.  (If we tried to gather inheritance children, we could
+	 * If this is a baserel, we should normally consider gathering any partial
+	 * paths we may have created for it.
+	 *
+	 * However, if this is an inheritance child, skip it.  Otherwise, we could
 	 * end up with a very large number of gather nodes, each trying to grab
-	 * its own pool of workers, so don't do this for otherrels.  Instead,
-	 * we'll consider gathering partial paths for the parent appendrel.)
+	 * its own pool of workers.  Instead, we'll consider gathering partial
+	 * paths for the parent appendrel.
+	 *
+	 * Also, if this is the topmost scan/join rel (that is, the only baserel),
+	 * we postpone this until the final scan/join targetlist is available (see
+	 * grouping_planner).
 	 */
-	if (rel->reloptkind == RELOPT_BASEREL)
-		generate_gather_paths(root, rel);
+	if (rel->reloptkind == RELOPT_BASEREL &&
+		bms_membership(root->all_baserels) != BMS_SINGLETON)
+		generate_gather_paths(root, rel, rel->reltarget);
 
 	/*
 	 * Allow a plugin to editorialize on the set of Paths for this base
@@ -2443,9 +2450,13 @@ set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
  * This must not be called until after we're done creating all partial paths
  * for the specified relation.  (Otherwise, add_partial_path might delete a
  * path that some GatherPath or GatherMergePath has a reference to.)
+ *
+ * It should also not be called until we know what target list we want to
+ * generate; if the Gather's target list is different from that of its subplan,
+ * the projection will have to be done by the leader rather than the workers.
  */
 void
-generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)
+generate_gather_paths(PlannerInfo *root, RelOptInfo *rel, PathTarget *target)
 {
 	Path	   *cheapest_partial_path;
 	Path	   *simple_gather_path;
@@ -2462,7 +2473,7 @@ generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)
 	 */
 	cheapest_partial_path = linitial(rel->partial_pathlist);
 	simple_gather_path = (Path *)
-		create_gather_path(root, rel, cheapest_partial_path, rel->reltarget,
+		create_gather_path(root, rel, cheapest_partial_path, target,
 						   NULL, NULL);
 	add_path(rel, simple_gather_path);
 
@@ -2478,7 +2489,7 @@ generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)
 		if (subpath->pathkeys == NIL)
 			continue;
 
-		path = create_gather_merge_path(root, rel, subpath, rel->reltarget,
+		path = create_gather_merge_path(root, rel, subpath, target,
 										subpath->pathkeys, NULL, NULL);
 		add_path(rel, &path->path);
 	}
@@ -2651,8 +2662,13 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
 			/* Create paths for partition-wise joins. */
 			generate_partition_wise_join_paths(root, rel);
 
-			/* Create GatherPaths for any useful partial paths for rel */
-			generate_gather_paths(root, rel);
+			/*
+			 * Except for the topmost scan/join rel, consider gathering
+			 * partial paths.  We'll do the same for the topmost scan/join rel
+			 * once we know the final targetlist (see grouping_planner).
+			 */
+			if (lev < levels_needed)
+				generate_gather_paths(root, rel, rel->reltarget);
 
 			/* Find and save the cheapest paths for this rel */
 			set_cheapest(rel);
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 8679b14b29..3c4d6a44be 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -374,6 +374,14 @@ cost_gather(GatherPath *path, PlannerInfo *root,
 	startup_cost += parallel_setup_cost;
 	run_cost += parallel_tuple_cost * path->path.rows;
 
+	/* add tlist eval costs only if projecting */
+	if (path->path.pathtarget != path->subpath->pathtarget)
+	{
+		/* tlist eval costs are paid per output row, not per tuple scanned */
+		startup_cost += path->path.pathtarget->cost.startup;
+		run_cost += path->path.pathtarget->cost.per_tuple * path->path.rows;
+	}
+
 	path->path.startup_cost = startup_cost;
 	path->path.total_cost = (startup_cost + run_cost);
 }
@@ -441,6 +449,14 @@ cost_gather_merge(GatherMergePath *path, PlannerInfo *root,
 	startup_cost += parallel_setup_cost;
 	run_cost += parallel_tuple_cost * path->path.rows * 1.05;
 
+	/* add tlist eval costs only if projecting */
+	if (path->path.pathtarget != path->subpath->pathtarget)
+	{
+		/* tlist eval costs are paid per output row, not per tuple scanned */
+		startup_cost += path->path.pathtarget->cost.startup;
+		run_cost += path->path.pathtarget->cost.per_tuple * path->path.rows;
+	}
+
 	path->path.startup_cost = startup_cost + input_startup_cost;
 	path->path.total_cost = (startup_cost + run_cost + input_total_cost);
 }
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 2a4e22b6c8..d66364e718 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -1918,46 +1918,78 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 		}
 
 		/*
-		 * Upper planning steps which make use of the top scan/join rel's
-		 * partial pathlist will expect partial paths for that rel to produce
-		 * the same output as complete paths ... and we just changed the
-		 * output for the complete paths, so we'll need to do the same thing
-		 * for partial paths.  But only parallel-safe expressions can be
-		 * computed by partial paths.
+		 * When possible, we want target list evaluation to happen in parallel
+		 * worker processes rather than in the leader.  To facilitate this,
+		 * scan/join planning avoids generating Gather or Gather Merge paths
+		 * for the topmost scan/join relation.  That lets us do it here,
+		 * possibly after adjusting the target lists of the partial paths.
+		 *
+		 * In the past, we used to generate Gather or Gather Merge paths first
+		 * and then modify the target lists of their subpaths after the fact,
+		 * but that wasn't good because at that point it's too late for the
+		 * associated cost savings to affect which plans get chosen.  A plan
+		 * that involves using parallel query for the entire scan/join tree
+		 * may gain a significant advantage as compared with a serial plan if
+		 * target list evaluation is expensive.
 		 */
-		if (current_rel->partial_pathlist &&
-			is_parallel_safe(root, (Node *) scanjoin_target->exprs))
+		if (current_rel->partial_pathlist != NIL)
 		{
-			/* Apply the scan/join target to each partial path */
-			foreach(lc, current_rel->partial_pathlist)
+			bool		scanjoin_target_parallel_safe = false;
+
+			/*
+			 * If scanjoin_target is parallel-safe, apply it to all partial
+			 * paths, just like we already did for non-partial paths.
+			 */
+			if (is_parallel_safe(root, (Node *) scanjoin_target->exprs))
 			{
-				Path	   *subpath = (Path *) lfirst(lc);
-				Path	   *newpath;
+				/* Remember that the target list is parallel safe. */
+				scanjoin_target_parallel_safe = true;
 
-				/* Shouldn't have any parameterized paths anymore */
-				Assert(subpath->param_info == NULL);
+				/* Apply the scan/join target to each partial path */
+				foreach(lc, current_rel->partial_pathlist)
+				{
+					Path	   *subpath = (Path *) lfirst(lc);
+					Path	   *newpath;
 
-				/*
-				 * Don't use apply_projection_to_path() here, because there
-				 * could be other pointers to these paths, and therefore we
-				 * mustn't modify them in place.
-				 */
-				newpath = (Path *) create_projection_path(root,
-														  current_rel,
-														  subpath,
-														  scanjoin_target);
-				lfirst(lc) = newpath;
+					/* Shouldn't have any parameterized paths anymore */
+					Assert(subpath->param_info == NULL);
+
+					/*
+					 * Don't use apply_projection_to_path() here, because
+					 * there could be other pointers to these paths, and
+					 * therefore we mustn't modify them in place.
+					 */
+					newpath = (Path *) create_projection_path(root,
+															  current_rel,
+															  subpath,
+															  scanjoin_target);
+					lfirst(lc) = newpath;
+				}
 			}
-		}
-		else
-		{
+
+			/*
+			 * Try building Gather or Gather Merge paths.  We can do this even
+			 * if scanjoin_target isn't parallel-safe; for such queries,
+			 * Gather or Gather Merge will perform projection.  However, we
+			 * must be sure that the paths we generate produce
+			 * scanjoin_target, because the paths already in
+			 * current_rel->pathlist have already been adjusted to do so.
+			 */
+			generate_gather_paths(root, current_rel, scanjoin_target);
+
 			/*
-			 * In the unfortunate event that scanjoin_target is not
-			 * parallel-safe, we can't apply it to the partial paths; in that
-			 * case, we'll need to forget about the partial paths, which
-			 * aren't valid input for upper planning steps.
+			 * If scanjoin_target isn't parallel-safe, the partial paths for
+			 * this relation haven't been adjusted to generate it, which means
+			 * they can't safely be used for upper planning steps.
 			 */
-			current_rel->partial_pathlist = NIL;
+			if (!scanjoin_target_parallel_safe)
+				current_rel->partial_pathlist = NIL;
+
+			/*
+			 * Since generate_gather_paths has likely added new paths to
+			 * current_rel, the cheapest path might have changed.
+			 */
+			set_cheapest(current_rel);
 		}
 
 		/* Now fix things up if scan/join target contains SRFs */
@@ -4700,8 +4732,21 @@ create_ordered_paths(PlannerInfo *root,
 											 ordered_rel,
 											 cheapest_partial_path,
 											 root->sort_pathkeys,
+
 											 limit_tuples);
 
+			/*
+			 * If projection is required, and it's safe to do it before
+			 * Gather Merge, then do so.
+			 */
+			if (path->pathtarget != target &&
+				is_parallel_safe(root, (Node *) target->exprs))
+				path = (Path *)
+					create_projection_path(root,
+										   ordered_rel,
+										   path,
+										   target);
+
 			total_groups = cheapest_partial_path->rows *
 				cheapest_partial_path->parallel_workers;
 			path = (Path *)
@@ -4711,7 +4756,10 @@ create_ordered_paths(PlannerInfo *root,
 										 root->sort_pathkeys, NULL,
 										 &total_groups);
 
-			/* Add projection step if needed */
+			/*
+			 * If projection is required and we didn't do it before Gather
+			 * Merge, do it now.
+			 */
 			if (path->pathtarget != target)
 				path = apply_projection_to_path(root, ordered_rel,
 												path, target);
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index fe3b4582d4..b2a25b86f1 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2435,10 +2435,6 @@ create_projection_path(PlannerInfo *root,
  * knows that the given path isn't referenced elsewhere and so can be modified
  * in-place.
  *
- * If the input path is a GatherPath or GatherMergePath, we try to push the
- * new target down to its input as well; this is a yet more invasive
- * modification of the input path, which create_projection_path() can't do.
- *
  * Note that we mustn't change the source path's parent link; so when it is
  * add_path'd to "rel" things will be a bit inconsistent.  So far that has
  * not caused any trouble.
@@ -2473,48 +2469,8 @@ apply_projection_to_path(PlannerInfo *root,
 	path->total_cost += target->cost.startup - oldcost.startup +
 		(target->cost.per_tuple - oldcost.per_tuple) * path->rows;
 
-	/*
-	 * If the path happens to be a Gather or GatherMerge path, we'd like to
-	 * arrange for the subpath to return the required target list so that
-	 * workers can help project.  But if there is something that is not
-	 * parallel-safe in the target expressions, then we can't.
-	 */
-	if ((IsA(path, GatherPath) ||IsA(path, GatherMergePath)) &&
-		is_parallel_safe(root, (Node *) target->exprs))
-	{
-		/*
-		 * We always use create_projection_path here, even if the subpath is
-		 * projection-capable, so as to avoid modifying the subpath in place.
-		 * It seems unlikely at present that there could be any other
-		 * references to the subpath, but better safe than sorry.
-		 *
-		 * Note that we don't change the parallel path's cost estimates; it
-		 * might be appropriate to do so, to reflect the fact that the bulk of
-		 * the target evaluation will happen in workers.
-		 */
-		if (IsA(path, GatherPath))
-		{
-			GatherPath *gpath = (GatherPath *) path;
-
-			gpath->subpath = (Path *)
-				create_projection_path(root,
-									   gpath->subpath->parent,
-									   gpath->subpath,
-									   target);
-		}
-		else
-		{
-			GatherMergePath *gmpath = (GatherMergePath *) path;
-
-			gmpath->subpath = (Path *)
-				create_projection_path(root,
-									   gmpath->subpath->parent,
-									   gmpath->subpath,
-									   target);
-		}
-	}
-	else if (path->parallel_safe &&
-			 !is_parallel_safe(root, (Node *) target->exprs))
+	if (path->parallel_safe &&
+		!is_parallel_safe(root, (Node *) target->exprs))
 	{
 		/*
 		 * We're inserting a parallel-restricted target list into a path
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 0072b7aa0d..4afbcbe13e 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -53,7 +53,8 @@ extern void set_dummy_rel_pathlist(RelOptInfo *rel);
 extern RelOptInfo *standard_join_search(PlannerInfo *root, int levels_needed,
 					 List *initial_rels);
 
-extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel);
+extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
+					  PathTarget *target);
 extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
 						double index_pages);
 extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 452494fbfa..e5a91bd200 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -251,6 +251,23 @@ execute tenk1_count(1);
 (1 row)
 
 deallocate tenk1_count;
+-- test that parallel plan gets selected when target list contains costly
+-- function
+create or replace function costly_func(var1 integer) returns integer
+as $$
+begin
+        return var1 + 10;
+end;
+$$ language plpgsql PARALLEL SAFE Cost 100000;
+explain (costs off) select ten, costly_func(ten) from tenk1;
+            QUERY PLAN            
+----------------------------------
+ Gather
+   Workers Planned: 4
+   ->  Parallel Seq Scan on tenk1
+(3 rows)
+
+drop function costly_func(var1 integer);
 -- test parallel plans for queries containing un-correlated subplans.
 alter table tenk2 set (parallel_workers = 0);
 explain (costs off)
diff --git a/src/test/regress/sql/select_parallel.sql b/src/test/regress/sql/select_parallel.sql
index b12ba0b74a..0f95b63c23 100644
--- a/src/test/regress/sql/select_parallel.sql
+++ b/src/test/regress/sql/select_parallel.sql
@@ -91,6 +91,17 @@ explain (costs off) execute tenk1_count(1);
 execute tenk1_count(1);
 deallocate tenk1_count;
 
+-- test that parallel plan gets selected when target list contains costly
+-- function
+create or replace function costly_func(var1 integer) returns integer
+as $$
+begin
+        return var1 + 10;
+end;
+$$ language plpgsql PARALLEL SAFE Cost 100000;
+explain (costs off) select ten, costly_func(ten) from tenk1;
+drop function costly_func(var1 integer);
+
 -- test parallel plans for queries containing un-correlated subplans.
 alter table tenk2 set (parallel_workers = 0);
 explain (costs off)

#65

amit.kapila16@gmail.com

almost 8 years ago

In reply to: Robert Haas (#64)

1 attachment(s)

Re: [HACKERS] why not parallel seq scan for slow functions

On Fri, Feb 2, 2018 at 12:15 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Jan 31, 2018 at 11:48 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

The other cases aren't so clear. In the case of the first call within
create_ordered_paths, there's no loss in the !is_sorted case because
apply_projection_to_path will be getting called on a Sort path. But
when is_sorted is true, the current code can push a target list into a
Gather or Gather Merge that was created with some other target list,
and with the patch it can't. I'm not quite sure what sort of query
would trigger that problem, but it seems like there's something to
worry about there.

I think query plans that involve Gather Merge -> Parallel Index
Scan can be impacted. For example, consider the case below:

With parallel-paths-tlist-cost-rmh-v2.patch
postgres=# set cpu_operator_cost=0;
SET
postgres=# set parallel_setup_cost=0; set parallel_tuple_cost=0; set min_parallel_table_scan_size=0; set max_parallel_workers_per_gather=4;
SET
SET
SET
SET
postgres=# explain (costs off, verbose) select simple_func(aid) from pgbench_accounts where aid > 1000 order by aid;
                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Gather Merge
   Output: simple_func(aid), aid
   Workers Planned: 4
   ->  Parallel Index Only Scan using pgbench_accounts_pkey on public.pgbench_accounts
         Output: aid
         Index Cond: (pgbench_accounts.aid > 1000)
(6 rows)

HEAD and parallel_paths_include_tlist_cost_v7
postgres=# explain (costs off, verbose) select simple_func(aid) from pgbench_accounts where aid > 1000 order by aid;
                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Gather Merge
   Output: (simple_func(aid)), aid
   Workers Planned: 4
   ->  Parallel Index Only Scan using pgbench_accounts_pkey on public.pgbench_accounts
         Output: simple_func(aid), aid
         Index Cond: (pgbench_accounts.aid > 1000)
(6 rows)

For the above test, I initialized pgbench with a scale factor of 100.

This shows that with parallel-paths-tlist-cost-rmh-v2.patch, we
lose the ability to push the target list below Gather (Merge) in
some cases. One lame idea would be to detect at each call site
whether the path is a Gather (Merge) and push the target list there,
but doing that at every call site doesn't sound like a good
alternative compared to the other approach, where we push the target
list in apply_projection_to_path.

Similarly, I can't see any obvious reason why this
isn't a problem for adjust_paths_for_srfs and build_minmax_path as
well, although I haven't tried to construct queries that hit those
cases, either.

Neither have I, but I can give it a try if we expect something
different from the results of the above example.

Now, we could just give up on this approach and leave that code in
apply_projection_to_path, but what's bothering me is that, presumably,
any place where that code is actually getting used has the same
problem that we're trying to fix in grouping_planner: the tlist eval
costs are not being factored into the decision as to which path we
should choose, which might make a good parallel path lose to an
inferior non-parallel path.

Your concern is valid, but doesn't the same problem exist in the
other approach as well? There, too, we can call
adjust_paths_for_srfs after generating the Gather path, which means
we might lose some opportunity to reduce the relative cost of
parallel paths when the tlist contains SRFs. A similar problem can
also happen in create_ordered_paths for cases like the example
above.

It would be best to fix that throughout
the code base rather than only fixing the more common paths -- if we
can do so with a reasonable amount of work.

Agreed. I think one way to achieve that is, instead of discarding
parallel paths based on cost, to retain them until a later phase of
planning, somewhat like what we do for ordered paths. In that case,
the changes made in parallel_paths_include_tlist_cost_v7 will work.
I think keeping the parallel paths as alternative paths until a later
stage of planning would help other cases as well (we discussed this
while parallelizing subplans too); however, that is a bigger project
on its own. I don't think it would be too bad to go with what we have
in parallel_paths_include_tlist_cost_v7, with the intent that when we
do the other project in the future, the other cases will be handled
automatically.

I have updated the patch parallel_paths_include_tlist_cost_v7 by
changing some of the comments based on
parallel-paths-tlist-cost-rmh-v2.patch.

I have one additional comment on parallel-paths-tlist-cost-rmh-v2.patch
@@ -374,6 +374,14 @@ cost_gather(GatherPath *path, PlannerInfo *root,
 	startup_cost += parallel_setup_cost;
 	run_cost += parallel_tuple_cost * path->path.rows;
 
+	/* add tlist eval costs only if projecting */
+	if (path->path.pathtarget != path->subpath->pathtarget)
+	{
+		/* tlist eval costs are paid per output row, not per tuple scanned */
+		startup_cost += path->path.pathtarget->cost.startup;
+		run_cost += path->path.pathtarget->cost.per_tuple * path->path.rows;
+	}

I think that when the tlist eval costs are added, we should subtract
the cost previously added for the subpath.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

parallel_paths_include_tlist_cost_v8.patch (application/octet-stream)

diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index 9053cfd..b5f849d 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -40,7 +40,7 @@ typedef struct
 } Clump;
 
 static List *merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump,
-			bool force);
+			int num_gene, bool force);
 static bool desirable_join(PlannerInfo *root,
 			   RelOptInfo *outer_rel, RelOptInfo *inner_rel);
 
@@ -196,7 +196,7 @@ gimme_tree(PlannerInfo *root, Gene *tour, int num_gene)
 		cur_clump->size = 1;
 
 		/* Merge it into the clumps list, using only desirable joins */
-		clumps = merge_clump(root, clumps, cur_clump, false);
+		clumps = merge_clump(root, clumps, cur_clump, num_gene, false);
 	}
 
 	if (list_length(clumps) > 1)
@@ -210,7 +210,7 @@ gimme_tree(PlannerInfo *root, Gene *tour, int num_gene)
 		{
 			Clump	   *clump = (Clump *) lfirst(lc);
 
-			fclumps = merge_clump(root, fclumps, clump, true);
+			fclumps = merge_clump(root, fclumps, clump, num_gene, true);
 		}
 		clumps = fclumps;
 	}
@@ -235,7 +235,8 @@ gimme_tree(PlannerInfo *root, Gene *tour, int num_gene)
  * "desirable" joins.
  */
 static List *
-merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
+merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
+			bool force)
 {
 	ListCell   *prev;
 	ListCell   *lc;
@@ -267,8 +268,14 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
 				/* Create paths for partition-wise joins. */
 				generate_partition_wise_join_paths(root, joinrel);
 
-				/* Create GatherPaths for any useful partial paths for rel */
-				generate_gather_paths(root, joinrel);
+				/*
+				 * Except for the topmost scan/join rel, consider gathering
+				 * partial paths.  We'll do the same for the topmost scan/join
+				 * rel once we know the final targetlist (see
+				 * grouping_planner).
+				 */
+				if (old_clump->size + new_clump->size < num_gene)
+					generate_gather_paths(root, joinrel, NULL);
 
 				/* Find and save the cheapest paths for this joinrel */
 				set_cheapest(joinrel);
@@ -286,7 +293,7 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
 				 * others.  When no further merge is possible, we'll reinsert
 				 * it into the list.
 				 */
-				return merge_clump(root, clumps, old_clump, force);
+				return merge_clump(root, clumps, old_clump, num_gene, force);
 			}
 		}
 		prev = lc;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 6e842f9..5206da7 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -481,14 +481,21 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	}
 
 	/*
-	 * If this is a baserel, consider gathering any partial paths we may have
-	 * created for it.  (If we tried to gather inheritance children, we could
+	 * If this is a baserel, we should normally consider gathering any partial
+	 * paths we may have created for it.
+	 *
+	 * However, if this is an inheritance child, skip it.  Otherwise, we could
 	 * end up with a very large number of gather nodes, each trying to grab
-	 * its own pool of workers, so don't do this for otherrels.  Instead,
-	 * we'll consider gathering partial paths for the parent appendrel.)
+	 * its own pool of workers. Instead, we'll consider gathering partial
+	 * paths for the parent appendrel.
+	 *
+	 * Also, if this is the topmost scan/join rel (that is, the only baserel),
+	 * we postpone this until the final scan/join targetlist is available (see
+	 * grouping_planner).
 	 */
-	if (rel->reloptkind == RELOPT_BASEREL)
-		generate_gather_paths(root, rel);
+	if (rel->reloptkind == RELOPT_BASEREL &&
+		bms_membership(root->all_baserels) != BMS_SINGLETON)
+		generate_gather_paths(root, rel, NULL);
 
 	/*
 	 * Allow a plugin to editorialize on the set of Paths for this base
@@ -2442,11 +2449,14 @@ set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
  *		Gather Merge on top of a partial path.
  *
  * This must not be called until after we're done creating all partial paths
- * for the specified relation.  (Otherwise, add_partial_path might delete a
+ * for the specified relation. (Otherwise, add_partial_path might delete a
  * path that some GatherPath or GatherMergePath has a reference to.)
+ *
+ * It should also not be called until we know what target list we want to
+ * generate.
  */
 void
-generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)
+generate_gather_paths(PlannerInfo *root, RelOptInfo *rel, PathTarget *target)
 {
 	Path	   *cheapest_partial_path;
 	Path	   *simple_gather_path;
@@ -2465,6 +2475,14 @@ generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)
 	simple_gather_path = (Path *)
 		create_gather_path(root, rel, cheapest_partial_path, rel->reltarget,
 						   NULL, NULL);
+
+	/* Add projection step if needed */
+	if (target && simple_gather_path->pathtarget != target)
+		simple_gather_path = apply_projection_to_path(root,
+													  rel,
+													  simple_gather_path,
+													  target);
+
 	add_path(rel, simple_gather_path);
 
 	/*
@@ -2474,14 +2492,20 @@ generate_gather_paths(PlannerInfo *root, RelOptInfo *rel)
 	foreach(lc, rel->partial_pathlist)
 	{
 		Path	   *subpath = (Path *) lfirst(lc);
-		GatherMergePath *path;
+		Path	   *path;
 
 		if (subpath->pathkeys == NIL)
 			continue;
 
-		path = create_gather_merge_path(root, rel, subpath, rel->reltarget,
-										subpath->pathkeys, NULL, NULL);
-		add_path(rel, &path->path);
+		path = (Path *) create_gather_merge_path(root, rel, subpath,
+												 rel->reltarget,
+												 subpath->pathkeys, NULL,
+												 NULL);
+		/* Add projection step if needed */
+		if (target && path->pathtarget != target)
+			path = apply_projection_to_path(root, rel, path, target);
+
+		add_path(rel, path);
 	}
 }
 
@@ -2652,8 +2676,13 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
 			/* Create paths for partition-wise joins. */
 			generate_partition_wise_join_paths(root, rel);
 
-			/* Create GatherPaths for any useful partial paths for rel */
-			generate_gather_paths(root, rel);
+			/*
+			 * Except for the topmost scan/join rel, consider gathering
+			 * partial paths.  We'll do the same for the topmost scan/join rel
+			 * once we know the final targetlist (see grouping_planner).
+			 */
+			if (lev < levels_needed)
+				generate_gather_paths(root, rel, NULL);
 
 			/* Find and save the cheapest paths for this rel */
 			set_cheapest(rel);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 3e8cd14..e581c32 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -1918,6 +1918,28 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 		}
 
 		/*
+		 * When possible, we want target list evaluation to happen in parallel
+		 * worker processes rather than in the leader.  To facilitate this,
+		 * scan/join planning avoids generating Gather or Gather Merge paths
+		 * for the topmost scan/join relation.  That lets us do it here.
+		 *
+		 * In the past, we used to generate Gather or Gather Merge paths first
+		 * and then modify the target lists of their subpaths after the fact,
+		 * but that wasn't good because at that point it's too late for the
+		 * associated cost savings to affect which plans get chosen.  A plan
+		 * that involves using parallel query for the entire scan/join tree
+		 * may gain a significant advantage as compared with a serial plan if
+		 * target list evaluation is expensive.
+		 */
+		generate_gather_paths(root, current_rel, scanjoin_target);
+
+		/*
+		 * Since generate_gather_paths has likely added new paths to
+		 * current_rel, the cheapest path might have changed.
+		 */
+		set_cheapest(current_rel);
+
+		/*
 		 * Upper planning steps which make use of the top scan/join rel's
 		 * partial pathlist will expect partial paths for that rel to produce
 		 * the same output as complete paths ... and we just changed the
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index fe3b458..d2b845c 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2454,6 +2454,8 @@ apply_projection_to_path(PlannerInfo *root,
 						 PathTarget *target)
 {
 	QualCost	oldcost;
+	double		nrows;
+	bool		resultPath = false;
 
 	/*
 	 * If given path can't project, we might need a Result node, so make a
@@ -2464,14 +2466,16 @@ apply_projection_to_path(PlannerInfo *root,
 
 	/*
 	 * We can just jam the desired tlist into the existing path, being sure to
-	 * update its cost estimates appropriately.
+	 * update its cost estimates appropriately.  Also, ensure that the cost
+	 * estimates reflect the fact that the target list evaluation will happen
+	 * in workers if path is a Gather or GatherMerge path.
 	 */
 	oldcost = path->pathtarget->cost;
 	path->pathtarget = target;
 
+	nrows = path->rows;
 	path->startup_cost += target->cost.startup - oldcost.startup;
-	path->total_cost += target->cost.startup - oldcost.startup +
-		(target->cost.per_tuple - oldcost.per_tuple) * path->rows;
+	path->total_cost += target->cost.startup - oldcost.startup;
 
 	/*
 	 * If the path happens to be a Gather or GatherMerge path, we'd like to
@@ -2487,10 +2491,6 @@ apply_projection_to_path(PlannerInfo *root,
 		 * projection-capable, so as to avoid modifying the subpath in place.
 		 * It seems unlikely at present that there could be any other
 		 * references to the subpath, but better safe than sorry.
-		 *
-		 * Note that we don't change the parallel path's cost estimates; it
-		 * might be appropriate to do so, to reflect the fact that the bulk of
-		 * the target evaluation will happen in workers.
 		 */
 		if (IsA(path, GatherPath))
 		{
@@ -2501,6 +2501,10 @@ apply_projection_to_path(PlannerInfo *root,
 									   gpath->subpath->parent,
 									   gpath->subpath,
 									   target);
+
+			nrows = gpath->subpath->rows;
+			if (!((ProjectionPath *) gpath->subpath)->dummypp)
+				resultPath = true;
 		}
 		else
 		{
@@ -2511,6 +2515,10 @@ apply_projection_to_path(PlannerInfo *root,
 									   gmpath->subpath->parent,
 									   gmpath->subpath,
 									   target);
+
+			nrows = gmpath->subpath->rows;
+			if (!((ProjectionPath *) gmpath->subpath)->dummypp)
+				resultPath = true;
 		}
 	}
 	else if (path->parallel_safe &&
@@ -2524,6 +2532,20 @@ apply_projection_to_path(PlannerInfo *root,
 		path->parallel_safe = false;
 	}
 
+	/*
+	 * Update the cost estimates based on whether Result node is required. See
+	 * create_projection_path.
+	 */
+	if (resultPath)
+	{
+		Assert (IsA(path, GatherPath) || IsA(path, GatherMergePath));
+		path->total_cost += (cpu_tuple_cost + target->cost.per_tuple) * nrows;
+	}
+	else
+	{
+		path->total_cost += (target->cost.per_tuple - oldcost.per_tuple) * nrows;
+	}
+
 	return path;
 }
 
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 4708443..fff947e 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -53,7 +53,8 @@ extern void set_dummy_rel_pathlist(RelOptInfo *rel);
 extern RelOptInfo *standard_join_search(PlannerInfo *root, int levels_needed,
 					 List *initial_rels);
 
-extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel);
+extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
+					  PathTarget *target);
 extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
 						double index_pages, int max_workers);
 extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 452494f..e5a91bd 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -251,6 +251,23 @@ execute tenk1_count(1);
 (1 row)
 
 deallocate tenk1_count;
+-- test that parallel plan gets selected when target list contains costly
+-- function
+create or replace function costly_func(var1 integer) returns integer
+as $$
+begin
+        return var1 + 10;
+end;
+$$ language plpgsql PARALLEL SAFE Cost 100000;
+explain (costs off) select ten, costly_func(ten) from tenk1;
+            QUERY PLAN            
+----------------------------------
+ Gather
+   Workers Planned: 4
+   ->  Parallel Seq Scan on tenk1
+(3 rows)
+
+drop function costly_func(var1 integer);
 -- test parallel plans for queries containing un-correlated subplans.
 alter table tenk2 set (parallel_workers = 0);
 explain (costs off)
diff --git a/src/test/regress/sql/select_parallel.sql b/src/test/regress/sql/select_parallel.sql
index b12ba0b..0f95b63 100644
--- a/src/test/regress/sql/select_parallel.sql
+++ b/src/test/regress/sql/select_parallel.sql
@@ -91,6 +91,17 @@ explain (costs off) execute tenk1_count(1);
 execute tenk1_count(1);
 deallocate tenk1_count;
 
+-- test that parallel plan gets selected when target list contains costly
+-- function
+create or replace function costly_func(var1 integer) returns integer
+as $$
+begin
+        return var1 + 10;
+end;
+$$ language plpgsql PARALLEL SAFE Cost 100000;
+explain (costs off) select ten, costly_func(ten) from tenk1;
+drop function costly_func(var1 integer);
+
 -- test parallel plans for queries containing un-correlated subplans.
 alter table tenk2 set (parallel_workers = 0);
 explain (costs off)

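The cost rule that the patch adds at the end of apply_projection_to_path() can be illustrated with a minimal standalone sketch. projection_cost_delta() is a hypothetical helper, not a planner function. It mirrors only the per-tuple part of the patched logic and assumes the default cpu_tuple_cost of 0.01. The key point is that when the projection is pushed below Gather, nrows is the subpath's per-worker row estimate rather than the Gather's own row count.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed default value of the cpu_tuple_cost GUC. */
static const double cpu_tuple_cost = 0.01;

/*
 * Hypothetical helper mirroring the tail of the patched
 * apply_projection_to_path(): returns the amount added to the path's
 * total_cost for the per-tuple part of the new target list.
 *
 * nrows          rows on which the target list is evaluated: the path's
 *                own rows normally, the subpath's rows under Gather
 * new_per_tuple  target->cost.per_tuple
 * old_per_tuple  oldcost.per_tuple
 * needs_result   true when a non-dummy ProjectionPath had to be added
 *                under the Gather (the patch's resultPath flag)
 */
static double
projection_cost_delta(double nrows, double new_per_tuple,
                      double old_per_tuple, bool needs_result)
{
    assert(nrows >= 0.0);

    if (needs_result)
        /* A separate projection node pays cpu_tuple_cost plus the full tlist cost. */
        return (cpu_tuple_cost + new_per_tuple) * nrows;

    /* Otherwise only the change in per-tuple tlist cost is charged. */
    return (new_per_tuple - old_per_tuple) * nrows;
}
```

For example, with 1000 Gather rows but an estimated 400 rows per worker, a target costing 2.0 per tuple adds 800 instead of 2000.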
#66

ashutosh.bapat@enterprisedb.com

almost 8 years ago

In reply to: Amit Kapila (#65)

Re: [HACKERS] why not parallel seq scan for slow functions

I happened to look at the patch for something else. But here are some
comments. If any of those have been already discussed, please feel
free to ignore. I have gone through the thread cursorily, so I might
have missed something.

In grouping_planner() we call query_planner() first, which builds the
join tree and creates paths, and then calculate the target for the scan/join
rel, which is applied to the topmost scan/join rel. I am wondering
whether we can reverse this order: calculate the target list of the
scan/join relation first and pass it to standard_join_search() (or the hook)
through query_planner(). standard_join_search() would then set it as the
reltarget of the topmost relation, and every path created for that relation
would have that target, applying projection if needed. This way we avoid
calling generate_gather_paths() in two places. Right now
generate_gather_paths() seems to be the only thing benefiting from
this, but what about FDWs and custom paths, whose costs may change when
the targetlist changes? For custom paths I am thinking of GPU optimization
paths. Also, this might address Tom's worry: "But having to do that in
a normal code path suggests that something's not right about the design ..."

Here are some comments on the patch.

+                /*
+                 * Except for the topmost scan/join rel, consider gathering
+                 * partial paths.  We'll do the same for the topmost scan/join
This function only works on join relations. Mentioning scan rel is confusing.
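How the join search can tell that it is building the topmost join rel is sketched below in standalone form. Both helpers are hypothetical. The GEQO one mirrors the condition that the v9 patch later in this thread adds to merge_clump(), where old_clump->size + new_clump->size < num_gene means "not topmost". The level-based one assumes standard_join_search()'s usual level-by-level loop, in which the last level equals levels_needed.

```c
#include <assert.h>
#include <stdbool.h>

/* Level-by-level join search: the last level builds the topmost join rel. */
static bool
is_topmost_level(int lev, int levels_needed)
{
    assert(lev >= 2 && lev <= levels_needed);
    return lev == levels_needed;
}

/* GEQO: a merged clump covering every gene (relation) is the topmost rel. */
static bool
is_topmost_clump(int old_clump_size, int new_clump_size, int num_gene)
{
    assert(old_clump_size + new_clump_size <= num_gene);
    return old_clump_size + new_clump_size == num_gene;
}
```

Under this scheme, Gather paths would be generated during the join search only when these helpers return false.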

index 6e842f9..5206da7 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -481,14 +481,21 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
     }

+     *
+     * Also, if this is the topmost scan/join rel (that is, the only baserel),
+     * we postpone this until the final scan/join targetlist is available (see

Mentioning join rel here is confusing since we deal with base relations here.

+ bms_membership(root->all_baserels) != BMS_SINGLETON)

set_tablesample_rel_pathlist() also uses this method to decide whether
there are any joins in the query. Maybe macro-ize this and use that macro in
both places?
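Here is one way the suggested macro could look, as a standalone sketch. Relids are modelled as a hypothetical 64-bit mask rather than a Bitmapset, and QUERY_HAS_SINGLE_BASEREL and postpone_gather_for_rel() are invented names. In the planner itself, such a macro would presumably just wrap bms_membership(root->all_baserels) == BMS_SINGLETON.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a Relids set: one bit per base relation. */
typedef uint64_t SketchRelids;

/*
 * True when exactly one base relation is present, i.e. the query has no
 * joins.  Analogue of bms_membership(relids) == BMS_SINGLETON.
 * Note: the argument is evaluated more than once.
 */
#define QUERY_HAS_SINGLE_BASEREL(relids) \
    ((relids) != 0 && ((relids) & ((relids) - 1)) == 0)

/* Example caller: only the single-baserel case postpones gathering. */
static bool
postpone_gather_for_rel(SketchRelids all_baserels)
{
    assert(all_baserels != 0);  /* the sketch assumes a non-empty set */
    return QUERY_HAS_SINGLE_BASEREL(all_baserels);
}
```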

- * for the specified relation.  (Otherwise, add_partial_path might delete a
+ * for the specified relation. (Otherwise, add_partial_path might delete a

Unrelated change?

+    /* Add projection step if needed */
+    if (target && simple_gather_path->pathtarget != target)

If the target was copied somewhere, this test will fail. Perhaps we want to
check the contents of the PathTarget structure? Right now copy_pathtarget() is
called from only a few places, and all of them modify the copied target, so
this isn't a problem. But we can't guarantee that in the future. The same
comment applies to gather_merge path creation.
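The following minimal standalone sketch illustrates the concern. SimpleTarget is a hypothetical, simplified stand-in for PathTarget that uses integer ids instead of Expr nodes. Inside the planner, a content comparison would presumably use equal() on the exprs lists, but that is an assumption here. An unmodified copy fails the pointer test but passes the content test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_EXPRS 8

/* Hypothetical, simplified stand-in for PostgreSQL's PathTarget. */
typedef struct SimpleTarget
{
    int    nexprs;
    int    exprs[SKETCH_MAX_EXPRS]; /* expression ids instead of Expr nodes */
    double per_tuple_cost;
    int    width;
} SimpleTarget;

/* The check used in the patch: identity of the two pointers. */
static bool
same_target_pointer(const SimpleTarget *a, const SimpleTarget *b)
{
    return a == b;
}

/* The alternative raised in review: compare the expressions themselves. */
static bool
same_target_contents(const SimpleTarget *a, const SimpleTarget *b)
{
    assert(a->nexprs <= SKETCH_MAX_EXPRS && b->nexprs <= SKETCH_MAX_EXPRS);
    if (a->nexprs != b->nexprs)
        return false;
    return memcmp(a->exprs, b->exprs,
                  sizeof(a->exprs[0]) * (size_t) a->nexprs) == 0;
}
```

The pointer test is cheap, but it relies on targets never being copied without modification.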

+        simple_gather_path = apply_projection_to_path(root,
+                                                      rel,
+                                                      simple_gather_path,
+                                                      target);
+

Why don't we incorporate these changes into create_gather_path() by passing it
the projection target instead of rel->reltarget? The same comment applies to
gather_merge path creation.

+            /*
+             * Except for the topmost scan/join rel, consider gathering
+             * partial paths.  We'll do the same for the topmost scan/join rel

Mentioning scan rel is confusing here.

deallocate tenk1_count;
+explain (costs off) select ten, costly_func(ten) from tenk1;

Verbose output would show that the parallel seq scan's targetlist contains
costly_func(). Isn't that what we want to test here?

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#67

amit.kapila16@gmail.com

almost 8 years ago

In reply to: Ashutosh Bapat (#66)

Re: [HACKERS] why not parallel seq scan for slow functions

On Thu, Feb 15, 2018 at 4:18 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

I happened to look at the patch for something else. But here are some
comments. If any of those have been already discussed, please feel
free to ignore. I have gone through the thread cursorily, so I might
have missed something.

In grouping_planner() we call query_planner() first, which builds the
join tree and creates paths, and then calculate the target for the scan/join
rel, which is applied to the topmost scan/join rel. I am wondering
whether we can reverse this order: calculate the target list of the
scan/join relation first and pass it to standard_join_search() (or the hook)
through query_planner().

I think the reason for doing it in that order is that we can't compute
the target's width until after query_planner(). See the note below in
the code:

/*
* Convert the query's result tlist into PathTarget format.
*
* Note: it's desirable to not do this till after query_planner(),
* because the target width estimates can use per-Var width numbers
* that were obtained within query_planner().
*/
final_target = create_pathtarget(root, tlist);

Now, I think we could juggle the code so that the width is computed
later but the rest is done earlier. However, I think that would be a
somewhat major change and still won't address all kinds of cases (like
ordered paths) unless we get all kinds of targets pushed down the call
stack. Also, I am not sure whether that is the only reason or whether
there are other assumptions about this calling order as well.

Here are some comments on the patch.

Thanks for looking into the patch. We are still evaluating the
right approach for this patch, so let's wait for Robert's reply.
Once we agree on the approach, I will address your comments.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#68

ashutosh.bapat@enterprisedb.com

almost 8 years ago

In reply to: Amit Kapila (#67)

Re: [HACKERS] why not parallel seq scan for slow functions

On Thu, Feb 15, 2018 at 7:47 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Thu, Feb 15, 2018 at 4:18 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

I happened to look at the patch for something else. But here are some
comments. If any of those have been already discussed, please feel
free to ignore. I have gone through the thread cursorily, so I might
have missed something.

In grouping_planner() we call query_planner() first, which builds the
join tree and creates paths, and then calculate the target for the scan/join
rel, which is applied to the topmost scan/join rel. I am wondering
whether we can reverse this order: calculate the target list of the
scan/join relation first and pass it to standard_join_search() (or the hook)
through query_planner().

I think the reason for doing it in that order is that we can't compute
the target's width until after query_planner(). See the note below in
the code:

/*
* Convert the query's result tlist into PathTarget format.
*
* Note: it's desirable to not do this till after query_planner(),
* because the target width estimates can use per-Var width numbers
* that were obtained within query_planner().
*/
final_target = create_pathtarget(root, tlist);

Now, I think we could juggle the code so that the width is computed
later but the rest is done earlier.

/* Convenience macro to get a PathTarget with valid cost/width fields */
#define create_pathtarget(root, tlist) \
set_pathtarget_cost_width(root, make_pathtarget_from_tlist(tlist))
create_pathtarget() already works that way; we would just need to split it.

Create the PathTarget without widths, then apply width estimates once we
know the widths of the Vars, somewhere here in query_planner():
/*
* We should now have size estimates for every actual table involved in
* the query, and we also know which if any have been deleted from the
* query by join removal; so we can compute total_table_pages.
*
* Note that appendrels are not double-counted here, even though we don't
* bother to distinguish RelOptInfos for appendrel parents, because the
* parents will still have size zero.
*
The next step is building the join tree. Set the pathtarget before that.
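The split suggested above can be sketched in isolation. TwoPhaseTarget and both functions are hypothetical. The idea is only that the target's columns and cost can be filled in before the join search, while the width is set later, once per-Var width estimates exist.

```c
#include <assert.h>

#define SKETCH_MAX_COLS 8

/* Hypothetical, simplified target: each output column reads one Var. */
typedef struct TwoPhaseTarget
{
    int    ncols;
    int    varno[SKETCH_MAX_COLS]; /* index of the Var each column reads */
    double per_tuple_cost;         /* known before the join search */
    int    width;                  /* -1 until per-Var widths are known */
} TwoPhaseTarget;

/* Phase 1: record the columns and their cost, leaving the width unset. */
static void
make_target_without_width(TwoPhaseTarget *t, int ncols, const int *varno,
                          double per_tuple_cost)
{
    assert(ncols >= 0 && ncols <= SKETCH_MAX_COLS);
    t->ncols = ncols;
    for (int i = 0; i < ncols; i++)
        t->varno[i] = varno[i];
    t->per_tuple_cost = per_tuple_cost;
    t->width = -1;
}

/* Phase 2: once per-Var width estimates exist, sum them into the target. */
static void
set_target_width(TwoPhaseTarget *t, const int *var_width)
{
    int total = 0;

    for (int i = 0; i < t->ncols; i++)
        total += var_width[t->varno[i]];
    t->width = total;
}
```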

However, I think
that would be a somewhat major change

I agree.

and still won't address all kinds of
cases (like ordered paths) unless we get all kinds of
targets pushed down the call stack.

I didn't understand that.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#69

amit.kapila16@gmail.com

almost 8 years ago

In reply to: Ashutosh Bapat (#68)

Re: [HACKERS] why not parallel seq scan for slow functions

On Fri, Feb 16, 2018 at 9:29 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

On Thu, Feb 15, 2018 at 7:47 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

...and still won't address all kinds of
cases (like ordered paths) unless we get all kinds of
targets pushed down the call stack.

I didn't understand that.

The places where we use a target different from the one pushed down via
query_planner() can cause a similar problem (for example, see the usage
of adjust_paths_for_srfs()), because the cost of that target wouldn't be
taken into account in Gather's costing.
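A rough back-of-the-envelope model shows why this matters. The formulas below are a deliberately simplified, hypothetical model, not the planner's actual costing. Only parallel_setup_cost (1000) and parallel_tuple_cost (0.1) are the GUC defaults. The divisor of 2.4 is assumed to approximate the planner's parallel divisor for two workers plus a partially participating leader, and a per-tuple target cost of 25 stands in for an expensive function. If the target is charged at the Gather for every row, the parallel plan loses. If it is charged in the workers, the parallel plan wins.

```c
#include <assert.h>
#include <stdbool.h>

/* Default values of the corresponding GUCs. */
static const double parallel_setup_cost = 1000.0;
static const double parallel_tuple_cost = 0.1;

/* Simplified serial plan: scan, then evaluate the target list on every row. */
static double
serial_plan_cost(double scan_cost, double rows, double tlist_per_tuple)
{
    return scan_cost + rows * tlist_per_tuple;
}

/*
 * Simplified Gather plan.  The scan is always divided among participants;
 * the target list is divided too only when it is evaluated in the workers.
 */
static double
gather_plan_cost(double scan_cost, double rows, double tlist_per_tuple,
                 double divisor, bool tlist_in_workers)
{
    double cost = parallel_setup_cost + rows * parallel_tuple_cost;

    assert(divisor > 0.0);
    if (tlist_in_workers)
        cost += (scan_cost + rows * tlist_per_tuple) / divisor;
    else
        cost += scan_cost / divisor + rows * tlist_per_tuple;
    return cost;
}
```

With a cheap target (per-tuple cost 0), the parallel plan still loses in this model, because parallel_tuple_cost dominates. This matches the first point raised at the start of the thread.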

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#70

ashutosh.bapat@enterprisedb.com

almost 8 years ago

In reply to: Amit Kapila (#69)

Re: [HACKERS] why not parallel seq scan for slow functions

On Sat, Feb 17, 2018 at 8:17 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

The places where we use a target different from the one pushed down via
query_planner() can cause a similar problem (for example, see the usage
of adjust_paths_for_srfs()), because the cost of that target wouldn't be
taken into account in Gather's costing.

Right. Currently, apply_projection_to_path() and adjust_paths_for_srfs()
do not consider the type of path whose cost is being adjusted for the
new targetlist. That works for most paths, but not for custom, FDW or
parallel paths. The idea I am proposing is to compute the final
targetlist before query_planner() so that it's available when we create
paths for the topmost scan/join relation. That way, any path-creation
logic, not just parallel paths, can take advantage of this list to
compute costs.

--
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#71

ashutosh.bapat@enterprisedb.com

almost 8 years ago

In reply to: Ashutosh Bapat (#70)

Re: [HACKERS] why not parallel seq scan for slow functions

On Mon, Feb 19, 2018 at 9:35 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

Right. Currently, apply_projection_to_path() and adjust_paths_for_srfs()
do not consider the type of path whose cost is being adjusted for the
new targetlist. That works for most paths, but not for custom, FDW or
parallel paths. The idea I am proposing is to compute the final
targetlist before query_planner() so that it's available when we create
paths for the topmost scan/join relation. That way, any path-creation
logic, not just parallel paths, can take advantage of this list to
compute costs.

In fact, we should do this not just for scan/join relations but for all
the upper relations as well. Upper relations also create parallel
paths, whose costs need to be adjusted after their targetlists are
updated by adjust_paths_for_srfs(). Similar adjustments are needed for
FDW and custom paths that cost targetlists differently.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#72

amit.kapila16@gmail.com

almost 8 years ago

In reply to: Ashutosh Bapat (#71)

Re: [HACKERS] why not parallel seq scan for slow functions

On Mon, Feb 19, 2018 at 9:56 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

Right. Currently, apply_projection_to_path() and adjust_paths_for_srfs()
do not consider the type of path whose cost is being adjusted for the
new targetlist. That works for most paths, but not for custom, FDW or
parallel paths. The idea I am proposing is to compute the final
targetlist before query_planner() so that it's available when we create
paths for the topmost scan/join relation. That way, any path-creation
logic, not just parallel paths, can take advantage of this list to
compute costs.

In fact, we should do this not just for scan/join relations but for all
the upper relations as well. Upper relations also create parallel
paths, whose costs need to be adjusted after their targetlists are
updated by adjust_paths_for_srfs(). Similar adjustments are needed for
FDW and custom paths that cost targetlists differently.

I think any such change in the planner can be quite tricky and can lead to
a lot of work. I am not denying that it is possible to think
along the lines you are suggesting, but OTOH, I don't see it as a
realistic approach for this patch, where we can deal with the majority
of cases with a much smaller patch. In future, if you or someone else
has a patch along those lines for some other purpose (assuming
it is feasible to do so, which I am not completely sure of), then we can
adjust things for parallel paths as well.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#73

amit.kapila16@gmail.com

almost 8 years ago

In reply to: Ashutosh Bapat (#66)

1 attachment(s)

Re: [HACKERS] why not parallel seq scan for slow functions

On Thu, Feb 15, 2018 at 4:18 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

After recent commits, the patch no longer applies cleanly, so I have
rebased it and, along the way, addressed the comments you raised.

Here are some comments on the patch.

+                /*
+                 * Except for the topmost scan/join rel, consider gathering
+                 * partial paths.  We'll do the same for the topmost scan/join
This function only works on join relations. Mentioning scan rel is confusing.

Okay, removed the 'scan' word from the comment.

index 6e842f9..5206da7 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -481,14 +481,21 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
}

+     *
+     * Also, if this is the topmost scan/join rel (that is, the only baserel),
+     * we postpone this until the final scan/join targetlist is available (see

Mentioning join rel here is confusing since we deal with base relations here.

Okay, removed the 'join' word from the comment.

+ bms_membership(root->all_baserels) != BMS_SINGLETON)

set_tablesample_rel_pathlist() also uses this method to decide whether
there are any joins in the query. Maybe macro-ize this and use that macro in
both places?

Maybe, but I am not sure it improves readability. I am open to
changing it if somebody else also feels it is better to macro-ize this
usage.

- * for the specified relation.  (Otherwise, add_partial_path might delete a
+ * for the specified relation. (Otherwise, add_partial_path might delete a

Unrelated change?

oops, removed.

+    /* Add projection step if needed */
+    if (target && simple_gather_path->pathtarget != target)
If the target was copied somewhere, this test will fail. Perhaps we want to
check the contents of the PathTarget structure? Right now copy_pathtarget() is
called from only a few places, and all of them modify the copied target, so
this isn't a problem. But we can't guarantee that in the future. The same
comment applies to gather_merge path creation.

I am not sure there is any use in copying the path target unless
you want to modify it, but in any case, checks similar to the one in this
patch are used in multiple places in the code; see
create_ordered_paths(). So we would need to change all those places first
if we sense any such danger.

+        simple_gather_path = apply_projection_to_path(root,
+                                                      rel,
+                                                      simple_gather_path,
+                                                      target);
+
Why don't we incorporate these changes into create_gather_path() by passing it
the projection target instead of rel->reltarget? The same comment applies to
gather_merge path creation.

Nothing important; it is just for the sake of code consistency, since some
other places in the code use it this way (see create_ordered_paths()). Also,
I am not sure there is any advantage to doing it inside
create_gather_path().

+            /*
+             * Except for the topmost scan/join rel, consider gathering
+             * partial paths.  We'll do the same for the topmost scan/join rel

Mentioning scan rel is confusing here.

Okay, changed.

deallocate tenk1_count;
+explain (costs off) select ten, costly_func(ten) from tenk1;

Verbose output would show that the parallel seq scan's targetlist contains
costly_func(). Isn't that what we want to test here?

Not exactly; we just want to test whether the parallel plan is
selected when a costly function is used in the target list.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

parallel_paths_include_tlist_cost_v9.patch (application/octet-stream)

diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index 0be2a73e05b..e9cd8e60f57 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -40,7 +40,7 @@ typedef struct
 } Clump;
 
 static List *merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump,
-			bool force);
+			int num_gene, bool force);
 static bool desirable_join(PlannerInfo *root,
 			   RelOptInfo *outer_rel, RelOptInfo *inner_rel);
 
@@ -196,7 +196,7 @@ gimme_tree(PlannerInfo *root, Gene *tour, int num_gene)
 		cur_clump->size = 1;
 
 		/* Merge it into the clumps list, using only desirable joins */
-		clumps = merge_clump(root, clumps, cur_clump, false);
+		clumps = merge_clump(root, clumps, cur_clump, num_gene, false);
 	}
 
 	if (list_length(clumps) > 1)
@@ -210,7 +210,7 @@ gimme_tree(PlannerInfo *root, Gene *tour, int num_gene)
 		{
 			Clump	   *clump = (Clump *) lfirst(lc);
 
-			fclumps = merge_clump(root, fclumps, clump, true);
+			fclumps = merge_clump(root, fclumps, clump, num_gene, true);
 		}
 		clumps = fclumps;
 	}
@@ -235,7 +235,8 @@ gimme_tree(PlannerInfo *root, Gene *tour, int num_gene)
  * "desirable" joins.
  */
 static List *
-merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
+merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
+			bool force)
 {
 	ListCell   *prev;
 	ListCell   *lc;
@@ -267,8 +268,13 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
 				/* Create paths for partitionwise joins. */
 				generate_partitionwise_join_paths(root, joinrel);
 
-				/* Create GatherPaths for any useful partial paths for rel */
-				generate_gather_paths(root, joinrel, false);
+				/*
+				 * Except for the topmost join rel, consider gathering partial
+				 * paths.  We'll do the same for the topmost join rel once we
+				 * know the final targetlist (see grouping_planner).
+				 */
+				if (old_clump->size + new_clump->size < num_gene)
+					generate_gather_paths(root, joinrel, NULL, false);
 
 				/* Find and save the cheapest paths for this joinrel */
 				set_cheapest(joinrel);
@@ -286,7 +292,7 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
 				 * others.  When no further merge is possible, we'll reinsert
 				 * it into the list.
 				 */
-				return merge_clump(root, clumps, old_clump, force);
+				return merge_clump(root, clumps, old_clump, num_gene, force);
 			}
 		}
 		prev = lc;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 1c792a00eb2..bf0469a4649 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -481,14 +481,21 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	}
 
 	/*
-	 * If this is a baserel, consider gathering any partial paths we may have
-	 * created for it.  (If we tried to gather inheritance children, we could
+	 * If this is a baserel, we should normally consider gathering any partial
+	 * paths we may have created for it.
+	 *
+	 * However, if this is an inheritance child, skip it.  Otherwise, we could
 	 * end up with a very large number of gather nodes, each trying to grab
-	 * its own pool of workers, so don't do this for otherrels.  Instead,
-	 * we'll consider gathering partial paths for the parent appendrel.)
+	 * its own pool of workers. Instead, we'll consider gathering partial
+	 * paths for the parent appendrel.
+	 *
+	 * Also, if this is the topmost scan rel (that is, the only baserel), we
+	 * postpone this until the final scan targetlist is available (see
+	 * grouping_planner).
 	 */
-	if (rel->reloptkind == RELOPT_BASEREL)
-		generate_gather_paths(root, rel, false);
+	if (rel->reloptkind == RELOPT_BASEREL &&
+		bms_membership(root->all_baserels) != BMS_SINGLETON)
+		generate_gather_paths(root, rel, NULL, false);
 
 	/*
 	 * Allow a plugin to editorialize on the set of Paths for this base
@@ -2445,6 +2452,9 @@ set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
  * for the specified relation.  (Otherwise, add_partial_path might delete a
  * path that some GatherPath or GatherMergePath has a reference to.)
  *
+ * It should also not be called until we know what target list we want to
+ * generate.
+ *
  * If we're generating paths for a scan or join relation, override_rows will
  * be false, and we'll just use the relation's size estimate.  When we're
  * being called for a partially-grouped path, though, we need to override
@@ -2453,7 +2463,8 @@ set_worktable_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
  * we must do something.)
  */
 void
-generate_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_rows)
+generate_gather_paths(PlannerInfo *root, RelOptInfo *rel, PathTarget *target,
+					  bool override_rows)
 {
 	Path	   *cheapest_partial_path;
 	Path	   *simple_gather_path;
@@ -2480,6 +2491,14 @@ generate_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_rows)
 	simple_gather_path = (Path *)
 		create_gather_path(root, rel, cheapest_partial_path, rel->reltarget,
 						   NULL, rowsp);
+
+	/* Add projection step if needed */
+	if (target && simple_gather_path->pathtarget != target)
+		simple_gather_path = apply_projection_to_path(root,
+													  rel,
+													  simple_gather_path,
+													  target);
+
 	add_path(rel, simple_gather_path);
 
 	/*
@@ -2489,15 +2508,20 @@ generate_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_rows)
 	foreach(lc, rel->partial_pathlist)
 	{
 		Path	   *subpath = (Path *) lfirst(lc);
-		GatherMergePath *path;
+		Path	   *path;
 
 		if (subpath->pathkeys == NIL)
 			continue;
 
 		rows = subpath->rows * subpath->parallel_workers;
-		path = create_gather_merge_path(root, rel, subpath, rel->reltarget,
-										subpath->pathkeys, NULL, rowsp);
-		add_path(rel, &path->path);
+		path = (Path *) create_gather_merge_path(root, rel, subpath,
+							rel->reltarget, subpath->pathkeys, NULL, rowsp);
+
+		/* Add projection step if needed */
+		if (target && path->pathtarget != target)
+			path = apply_projection_to_path(root, rel, path, target);
+
+		add_path(rel, path);
 	}
 }
 
@@ -2668,8 +2692,13 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
 			/* Create paths for partitionwise joins. */
 			generate_partitionwise_join_paths(root, rel);
 
-			/* Create GatherPaths for any useful partial paths for rel */
-			generate_gather_paths(root, rel, false);
+			/*
+			 * Except for the topmost join rel, consider gathering partial
+			 * paths.  We'll do the same for the topmost join rel once we know
+			 * the final targetlist (see grouping_planner).
+			 */
+			if (lev < levels_needed)
+				generate_gather_paths(root, rel, NULL, false);
 
 			/* Find and save the cheapest paths for this rel */
 			set_cheapest(rel);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 14b7becf3e8..01a62e2038f 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -1948,6 +1948,28 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 			}
 		}
 
+		/*
+		 * When possible, we want target list evaluation to happen in parallel
+		 * worker processes rather than in the leader.  To facilitate this,
+		 * scan/join planning avoids generating Gather or Gather Merge paths
+		 * for the topmost scan/join relation.  That lets us do it here.
+		 *
+		 * In the past, we used to generate Gather or Gather Merge paths first
+		 * and then modify the target lists of their subpaths after the fact,
+		 * but that wasn't good because at that point it's too late for the
+		 * associated cost savings to affect which plans get chosen.  A plan
+		 * that involves using parallel query for the entire scan/join tree
+		 * may gain a significant advantage as compared with a serial plan if
+		 * target list evaluation is expensive.
+		 */
+		generate_gather_paths(root, current_rel, scanjoin_target, false);
+
+		/*
+		 * Since generate_gather_paths has likely added new paths to
+		 * current_rel, the cheapest path might have changed.
+		 */
+		set_cheapest(current_rel);
+
 		/*
 		 * Upper planning steps which make use of the top scan/join rel's
 		 * partial pathlist will expect partial paths for that rel to produce
@@ -6370,7 +6392,7 @@ add_paths_to_partial_grouping_rel(PlannerInfo *root,
 	 * Try adding Gather or Gather Merge to partial paths to produce
 	 * non-partial paths.
 	 */
-	generate_gather_paths(root, partially_grouped_rel, true);
+	generate_gather_paths(root, partially_grouped_rel, NULL, true);
 
 	/* Get cheapest partial path from partially_grouped_rel */
 	cheapest_partial_path = linitial(partially_grouped_rel->partial_pathlist);
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index fe3b4582d42..d2b845cc854 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2454,6 +2454,8 @@ apply_projection_to_path(PlannerInfo *root,
 						 PathTarget *target)
 {
 	QualCost	oldcost;
+	double		nrows;
+	bool		resultPath = false;
 
 	/*
 	 * If given path can't project, we might need a Result node, so make a
@@ -2464,14 +2466,16 @@ apply_projection_to_path(PlannerInfo *root,
 
 	/*
 	 * We can just jam the desired tlist into the existing path, being sure to
-	 * update its cost estimates appropriately.
+	 * update its cost estimates appropriately.  Also, ensure that the cost
+	 * estimates reflect the fact that the target list evaluation will happen
+	 * in workers if path is a Gather or GatherMerge path.
 	 */
 	oldcost = path->pathtarget->cost;
 	path->pathtarget = target;
 
+	nrows = path->rows;
 	path->startup_cost += target->cost.startup - oldcost.startup;
-	path->total_cost += target->cost.startup - oldcost.startup +
-		(target->cost.per_tuple - oldcost.per_tuple) * path->rows;
+	path->total_cost += target->cost.startup - oldcost.startup;
 
 	/*
 	 * If the path happens to be a Gather or GatherMerge path, we'd like to
@@ -2487,10 +2491,6 @@ apply_projection_to_path(PlannerInfo *root,
 		 * projection-capable, so as to avoid modifying the subpath in place.
 		 * It seems unlikely at present that there could be any other
 		 * references to the subpath, but better safe than sorry.
-		 *
-		 * Note that we don't change the parallel path's cost estimates; it
-		 * might be appropriate to do so, to reflect the fact that the bulk of
-		 * the target evaluation will happen in workers.
 		 */
 		if (IsA(path, GatherPath))
 		{
@@ -2501,6 +2501,10 @@ apply_projection_to_path(PlannerInfo *root,
 									   gpath->subpath->parent,
 									   gpath->subpath,
 									   target);
+
+			nrows = gpath->subpath->rows;
+			if (!((ProjectionPath *) gpath->subpath)->dummypp)
+				resultPath = true;
 		}
 		else
 		{
@@ -2511,6 +2515,10 @@ apply_projection_to_path(PlannerInfo *root,
 									   gmpath->subpath->parent,
 									   gmpath->subpath,
 									   target);
+
+			nrows = gmpath->subpath->rows;
+			if (!((ProjectionPath *) gmpath->subpath)->dummypp)
+				resultPath = true;
 		}
 	}
 	else if (path->parallel_safe &&
@@ -2524,6 +2532,20 @@ apply_projection_to_path(PlannerInfo *root,
 		path->parallel_safe = false;
 	}
 
+	/*
+	 * Update the cost estimates based on whether Result node is required. See
+	 * create_projection_path.
+	 */
+	if (resultPath)
+	{
+		Assert (IsA(path, GatherPath) || IsA(path, GatherMergePath));
+		path->total_cost += (cpu_tuple_cost + target->cost.per_tuple) * nrows;
+	}
+	else
+	{
+		path->total_cost += (target->cost.per_tuple - oldcost.per_tuple) * nrows;
+	}
+
 	return path;
 }
 
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 94f9bb2b574..b21e98b4ae2 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -54,7 +54,7 @@ extern RelOptInfo *standard_join_search(PlannerInfo *root, int levels_needed,
 					 List *initial_rels);
 
 extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
-					  bool override_rows);
+						PathTarget *target, bool override_rows);
 extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
 						double index_pages, int max_workers);
 extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
diff --git a/src/test/regress/expected/select_parallel.out b/src/test/regress/expected/select_parallel.out
index 0a782616385..0aff0c94dd1 100644
--- a/src/test/regress/expected/select_parallel.out
+++ b/src/test/regress/expected/select_parallel.out
@@ -262,6 +262,23 @@ execute tenk1_count(1);
 (1 row)
 
 deallocate tenk1_count;
+-- test that parallel plan gets selected when target list contains costly
+-- function
+create or replace function costly_func(var1 integer) returns integer
+as $$
+begin
+        return var1 + 10;
+end;
+$$ language plpgsql PARALLEL SAFE Cost 100000;
+explain (costs off) select ten, costly_func(ten) from tenk1;
+            QUERY PLAN            
+----------------------------------
+ Gather
+   Workers Planned: 4
+   ->  Parallel Seq Scan on tenk1
+(3 rows)
+
+drop function costly_func(var1 integer);
 -- test parallel plans for queries containing un-correlated subplans.
 alter table tenk2 set (parallel_workers = 0);
 explain (costs off)
diff --git a/src/test/regress/sql/select_parallel.sql b/src/test/regress/sql/select_parallel.sql
index fa03aae0c03..eef9ace6c9f 100644
--- a/src/test/regress/sql/select_parallel.sql
+++ b/src/test/regress/sql/select_parallel.sql
@@ -97,6 +97,17 @@ explain (costs off) execute tenk1_count(1);
 execute tenk1_count(1);
 deallocate tenk1_count;
 
+-- test that parallel plan gets selected when target list contains costly
+-- function
+create or replace function costly_func(var1 integer) returns integer
+as $$
+begin
+        return var1 + 10;
+end;
+$$ language plpgsql PARALLEL SAFE Cost 100000;
+explain (costs off) select ten, costly_func(ten) from tenk1;
+drop function costly_func(var1 integer);
+
 -- test parallel plans for queries containing un-correlated subplans.
 alter table tenk2 set (parallel_workers = 0);
 explain (costs off)

#74

ashutosh.bapat@enterprisedb.com

almost 8 years ago

In reply to: Amit Kapila (#73)

Re: [HACKERS] why not parallel seq scan for slow functions

On Sun, Mar 11, 2018 at 5:49 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

+    /* Add projection step if needed */
+    if (target && simple_gather_path->pathtarget != target)
If the target was copied someplace, this test will fail. Probably we want to
check the contents of the PathTarget structure? Right now copy_pathtarget() is not
called from many places, and all those places modify the copied target. So this
isn't a problem. But we can't guarantee that in the future. Similar comment for
gather_merge path creation.
I am not sure there is any use in copying the path target unless
you want to modify it, but anyway we use a check similar to the one
used in the patch in multiple places in the code. See
create_ordered_paths. So, we would need to change all those places first if
we sense any such danger.

Even if the test fails and we add a projection path here, while
creating the plan we avoid adding a Result node when the projection
target and the underlying plan's target look the same
(create_projection_plan()), so this works. An advantage of this
simple check (although it miscosts the projection) is that we don't do
an expensive target equality check for every path. The expensive check
happens only on the chosen path.
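The trade-off discussed here is between a cheap pointer comparison at path-creation time and a contents comparison. It can be sketched with a hypothetical ToyTarget type standing in for PathTarget and a hypothetical copy_target() standing in for copy_pathtarget(). A copied target fails the pointer test even though its contents match. As noted above, that only leads to a redundant projection path whose Result node is dropped again when the plan is created, at the price of slightly miscosting the path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for PathTarget: expression ids plus a cost. */
typedef struct ToyTarget
{
    int    nexprs;
    int    exprs[8];     /* expression identifiers, in output order */
    double per_tuple;    /* evaluation cost per row */
} ToyTarget;

/* Cheap check, like "path->pathtarget != target": the same object? */
static bool
same_target_pointer(const ToyTarget *a, const ToyTarget *b)
{
    return a == b;
}

/* Expensive check: do both targets compute the same expressions? */
static bool
same_target_contents(const ToyTarget *a, const ToyTarget *b)
{
    if (a->nexprs != b->nexprs)
        return false;
    for (int i = 0; i < a->nexprs; i++)
        if (a->exprs[i] != b->exprs[i])
            return false;
    return true;
}

/* Stand-in for copy_pathtarget(): a fresh object with identical contents. */
static ToyTarget *
copy_target(const ToyTarget *src)
{
    ToyTarget *dst = malloc(sizeof(ToyTarget));

    if (dst != NULL)
        memcpy(dst, src, sizeof(ToyTarget));
    return dst;
}
```

The pointer check is O(1) per path. The contents check walks every expression, which is why doing it only once, on the chosen plan, is attractive.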

deallocate tenk1_count;
+explain (costs off) select ten, costly_func(ten) from tenk1;

verbose output will show that the parallel seq scan's targetlist has
costly_func() in it. Isn't that what we want to test here?

Not exactly, we want to just test whether the parallel plan is
selected when the costly function is used in the target list.

A parallel plan may be selected whether or not the costly function exists in
the targetlist, if the underlying scan is optimal with a parallel scan.
AFAIU, this patch is about pushing down the costly functions into the
parallel scan's targetlist. I think that can be verified only by
looking at the targetlist of the parallel seq scan plan.

The solution here addresses only the parallel scan requirement. In the future,
if we implement a solution which also addresses the requirements of FDWs
and custom plans (i.e. the ability of FDWs and custom plans to handle
targetlists), the changes made here will need to be reverted. That would be a
painful exercise.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

#75

robertmhaas@gmail.com

almost 8 years ago

In reply to: Amit Kapila (#65)

5 attachment(s)

Re: [HACKERS] why not parallel seq scan for slow functions

On Wed, Feb 14, 2018 at 5:37 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Your concern is valid, but doesn't the same problem exist in the other
approach as well? In that approach, too, we can call
adjust_paths_for_srfs after generating the gather path, which means that we
might lose some opportunity to reduce the relative cost of parallel
paths due to tlists having SRFs. Also, a similar problem can happen
in create_ordered_paths for the cases described in the example
above.

You're right. I think cleaning all of this up for v11 is too much to
consider, but please tell me your opinion of the attached patch set.
Here, instead of ripping the problematic logic out of
apply_projection_to_path, what I've done is just remove several of
the calls to apply_projection_to_path. I think that the work of
removing the other calls to that function could be postponed to a future
release, but we'd still get some benefit now, and this shows the
direction I have in mind. I'm going to explain what the patches do
one by one, but out of order, because I backed into the need for the
earlier patches as a result of troubleshooting the later ones in the
series. Note that the patches need to be applied in order even though
I'm explaining them out of order.

0003 introduces a new upper relation to represent the result of
applying the scan/join target to the topmost scan/join relation. I'll
explain further down why this seems to be needed. Since each path
must belong to only one relation, this involves using
create_projection_path() for the non-partial pathlist as we already do
for the partial pathlist, rather than apply_projection_to_path().
This is probably marginally more expensive but I'm hoping it doesn't
matter. (However, I have not tested.) Each non-partial path in the
topmost scan/join rel produces a corresponding path in the new upper
rel. The same is done for partial paths if the scan/join target is
parallel-safe; otherwise we can't.
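The difference between the two ways of attaching a target can be sketched with a toy path type (hypothetical, not PostgreSQL's Path). The apply_projection_to_path() style rewrites a path in place, so anything else holding a pointer to it sees its cost and target change after the fact. The create_projection_path() style builds a new path owned by the new upper rel and leaves its input untouched, which is what lets each path belong to exactly one relation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy path with only the fields needed for the illustration. */
typedef struct ToyPath
{
    int             owner_rel;   /* relation this path belongs to */
    int             target_id;   /* which target list it emits */
    double          total_cost;
    struct ToyPath *subpath;     /* non-NULL for a projection wrapper */
} ToyPath;

/* apply_projection_to_path() style: mutate the existing path in place. */
static ToyPath *
project_in_place(ToyPath *p, int target_id, double extra_cost)
{
    p->target_id = target_id;
    p->total_cost += extra_cost;
    return p;
}

/* create_projection_path() style: a new path in the new rel; input untouched. */
static ToyPath *
project_wrapper(ToyPath *p, int new_rel, int target_id, double extra_cost)
{
    ToyPath *w = malloc(sizeof(ToyPath));

    if (w == NULL)
        return NULL;
    w->owner_rel = new_rel;
    w->target_id = target_id;
    w->total_cost = p->total_cost + extra_cost;
    w->subpath = p;
    return w;
}
```

The wrapper costs one extra allocation per path, which matches the "marginally more expensive" remark above. In exchange, no already-costed path is ever edited behind the back of another path that references it.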

0004 causes the main part of the planner to skip calling
generate_gather_paths() from the topmost scan/join rel. This logic is
mostly stolen from your patch. If the scan/join target is NOT
parallel-safe, we perform generate_gather_paths() on the topmost
scan/join rel. If the scan/join target IS parallel-safe, we do it on
the post-projection rel introduced by 0003 instead. This is the patch
that actually fixes Jeff's original complaint.
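Here is a back-of-the-envelope version of why this fixes the complaint. It uses PostgreSQL's default cpu_tuple_cost (0.01), parallel_tuple_cost (0.1) and parallel_setup_cost (1000), and it is only a sketch under stated assumptions: page I/O and qual costs are ignored, the parallel divisor is simply an input, and the function names are hypothetical. With a cheap target list the serial plan wins, because parallel_tuple_cost is paid on every row sent to the leader. With a 250-per-call function (COST 100000 times the default cpu_operator_cost of 0.0025), projecting in the workers beats both the serial plan and gathering first.

```c
#include <assert.h>

/* PostgreSQL's default planner cost constants. */
#define CPU_TUPLE_COST       0.01
#define PARALLEL_TUPLE_COST  0.1
#define PARALLEL_SETUP_COST  1000.0

/* Serial scan: every row is fetched and projected by one process. */
static double
serial_cost(double rows, double fn_cost)
{
    return rows * (CPU_TUPLE_COST + fn_cost);
}

/* Gather first, then charge the function for every gathered row. */
static double
gather_then_project_cost(double rows, double divisor, double fn_cost)
{
    return PARALLEL_SETUP_COST
        + (rows / divisor) * CPU_TUPLE_COST
        + rows * PARALLEL_TUPLE_COST
        + rows * fn_cost;
}

/* Project in each participant, then gather the already-projected rows. */
static double
project_then_gather_cost(double rows, double divisor, double fn_cost)
{
    return PARALLEL_SETUP_COST
        + (rows / divisor) * (CPU_TUPLE_COST + fn_cost)
        + rows * PARALLEL_TUPLE_COST;
}
```

For 10000 rows, a divisor of 4 and fn_cost = 250, this gives roughly 2,500,100 (serial), 2,502,025 (gather, then project) and 627,025 (project, then gather). The middle figure is the reason the Gather plan used to lose even though the work would actually have been done in the workers.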

0005 goes a bit further and removes a bunch of logic from
create_ordered_paths(). The removed logic tries to satisfy the
query's ordering requirement via cheapest_partial_path + Sort + Gather
Merge. Instead, it adds an additional path to the new upper rel added
in 0003. This again avoids a call to apply_projection_to_path(), which
could cause projection to be pushed down after costing has already
been fixed. Therefore, it gains the same advantages as 0004 for
queries that would use this sort of plan. Currently, this loses the
ability to set limit_tuples for the Sort path; that doesn't look too
hard to fix, but I haven't done it. If we decide to proceed with this
overall approach, I'll see about getting it sorted out.
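Before adding the extra projection + Sort path, 0005 checks pathkeys_contained_in(root->query_pathkeys, cheapest_partial_path->pathkeys). That function reports whether the second list is at least as well sorted as the first, i.e. whether the required keys are a prefix of the path's keys. Here is a toy version with integer key ids. The names are hypothetical, and the real function compares canonical PathKey pointers held in Lists.

```c
#include <assert.h>
#include <stdbool.h>

/* True if 'required' is a prefix of 'have' (the path is already sorted). */
static bool
toy_pathkeys_contained_in(const int *required, int nrequired,
                          const int *have, int nhave)
{
    if (nrequired > nhave)
        return false;
    for (int i = 0; i < nrequired; i++)
        if (required[i] != have[i])
            return false;
    return true;
}

/* Should an explicit Sort be added on top of the cheapest partial path? */
static bool
needs_explicit_sort(const int *query_keys, int nquery,
                    const int *path_keys, int npath)
{
    return !toy_pathkeys_contained_in(query_keys, nquery, path_keys, npath);
}
```

An empty requirement is trivially satisfied, so no Sort is added for queries without ORDER BY. A path sorted on (a, b, c) already satisfies ORDER BY a, b.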

Unfortunately, when I initially tried this approach, a number of
things broke due to the fact that create_projection_path() was not
exactly equivalent to apply_projection_to_path(). This initially
surprised me, because create_projection_plan() contains logic to
eliminate the Result node that is very similar to the logic in
apply_projection_to_path(). If apply_projection_to_path() sees that the
subordinate node is projection-capable, it applies the revised target
list to the child; if not, it adds a Result node.
create_projection_plan() does the same thing. However,
create_projection_plan() can lose the physical tlist optimization for
the subordinate node; it forces an exact projection even if the parent
node doesn't require this. 0001 fixes this, so that we get the same
plans with create_projection_path() that we would with
apply_projection_to_path(). I think this patch is a good idea
regardless of what we decide to do about the rest of this, because it
can potentially avoid losing the physical-tlist optimization in any
place where create_projection_path() is used.
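The shared rule described above is simple: push the new target list into a projection-capable child, or else add a Result node. It reduces to something like the following hypothetical sketch. The sketch deliberately does not model the physical-tlist detail that 0001 addresses.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of attaching a new target list to a child node. */
typedef enum
{
    TOY_TLIST_PUSHED_INTO_CHILD,  /* the child evaluates the new tlist itself */
    TOY_RESULT_NODE_ADDED         /* a separate Result node does the projection */
} ToyProjectionChoice;

/*
 * Only a node that can project may absorb the new target list;
 * otherwise a Result node must be stacked on top of it.
 */
static ToyProjectionChoice
toy_choose_projection(bool child_is_projection_capable)
{
    return child_is_projection_capable ? TOY_TLIST_PUSHED_INTO_CHILD
                                       : TOY_RESULT_NODE_ADDED;
}
```

For example, a Seq Scan can project, so it takes the new target list directly, while a Sort cannot, so a Result node goes above it.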

It turns out that 0001 doesn't manage to preserve the physical-tlist
optimization when the projection path is attached to an upper
relation. 0002 fixes this.

If we decide to go forward with this approach, it may make sense to
merge some of these when actually committing, but I found it useful to
separate them for development and testing purposes, and for clarity
about what was getting changed at each stage. Please let me know your
thoughts.

Thanks,

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0005-Remove-explicit-path-construction-logic-in-create_or.patch (application/octet-stream)

From 11be7e7664e1d3d75d7aa12b3f5775f93ea7e462 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Tue, 13 Mar 2018 13:42:56 -0400
Subject: [PATCH 5/5] Remove explicit path construction logic in
 create_ordered_paths.

Instead of having create_ordered_paths build a path for an explicit
Sort and Gather Merge of the cheapest partial path, add an
explicitly-sorted path to tlist_rel.  If this path looks advantageous
from a cost point of view, the call to generate_gather_paths() for
tlist_rel will take care of building a Gather Merge path for it.

This improves the accuracy of costing for such paths because it
avoids using apply_projection_to_path, which has the disadvantage
of modifying a path in place after the cost has already been
determined.

Patch by me.
---
 src/backend/optimizer/plan/planner.c | 74 +++++++++++-------------------------
 1 file changed, 22 insertions(+), 52 deletions(-)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 9d33306783..f85011beef 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -1958,8 +1958,10 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 		 * If we can produce partial paths for the tlist rel, for possible use
 		 * by upper planning stages, do so.
 		 */
-		if (tlist_rel->consider_parallel)
+		if (tlist_rel->consider_parallel && current_rel->partial_pathlist)
 		{
+			Path	   *cheapest_partial_path;
+
 			/* Apply the scan/join target to each partial path */
 			foreach(lc, current_rel->partial_pathlist)
 			{
@@ -1975,6 +1977,25 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 														  scanjoin_target);
 				add_partial_path(tlist_rel, newpath);
 			}
+
+			/* Also try explicitly sorting the cheapest path. */
+			cheapest_partial_path = linitial(current_rel->partial_pathlist);
+			if (!pathkeys_contained_in(root->query_pathkeys,
+									   cheapest_partial_path->pathkeys))
+			{
+				Path	   *path;
+
+				path = (Path *) create_projection_path(root,
+													   tlist_rel,
+													   cheapest_partial_path,
+													   scanjoin_target);
+				path = (Path *) create_sort_path(root,
+												 tlist_rel,
+												 path,
+												 root->query_pathkeys,
+												 -1);
+				add_partial_path(tlist_rel, path);
+			}
 		}
 
 		/* Now fix things up if scan/join target contains SRFs */
@@ -4706,57 +4727,6 @@ create_ordered_paths(PlannerInfo *root,
 		}
 	}
 
-	/*
-	 * generate_gather_paths() will have already generated a simple Gather
-	 * path for the best parallel path, if any, and the loop above will have
-	 * considered sorting it.  Similarly, generate_gather_paths() will also
-	 * have generated order-preserving Gather Merge plans which can be used
-	 * without sorting if they happen to match the sort_pathkeys, and the loop
-	 * above will have handled those as well.  However, there's one more
-	 * possibility: it may make sense to sort the cheapest partial path
-	 * according to the required output order and then use Gather Merge.
-	 */
-	if (ordered_rel->consider_parallel && root->sort_pathkeys != NIL &&
-		input_rel->partial_pathlist != NIL)
-	{
-		Path	   *cheapest_partial_path;
-
-		cheapest_partial_path = linitial(input_rel->partial_pathlist);
-
-		/*
-		 * If cheapest partial path doesn't need a sort, this is redundant
-		 * with what's already been tried.
-		 */
-		if (!pathkeys_contained_in(root->sort_pathkeys,
-								   cheapest_partial_path->pathkeys))
-		{
-			Path	   *path;
-			double		total_groups;
-
-			path = (Path *) create_sort_path(root,
-											 ordered_rel,
-											 cheapest_partial_path,
-											 root->sort_pathkeys,
-											 limit_tuples);
-
-			total_groups = cheapest_partial_path->rows *
-				cheapest_partial_path->parallel_workers;
-			path = (Path *)
-				create_gather_merge_path(root, ordered_rel,
-										 path,
-										 path->pathtarget,
-										 root->sort_pathkeys, NULL,
-										 &total_groups);
-
-			/* Add projection step if needed */
-			if (path->pathtarget != target)
-				path = apply_projection_to_path(root, ordered_rel,
-												path, target);
-
-			add_path(ordered_rel, path);
-		}
-	}
-
 	/*
 	 * If there is an FDW that's responsible for all baserels of the query,
 	 * let it consider adding ForeignPaths.
-- 
2.14.3 (Apple Git-98)

0004-Postpone-generate_gather_paths-for-topmost-scan-join.patch (application/octet-stream)

From b2e1d7bcdc3bd6ef67cf2426a42c3fb96559ec97 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Mon, 12 Mar 2018 16:45:15 -0400
Subject: [PATCH 4/5] Postpone generate_gather_paths for topmost scan/join rel.

Don't call generate_gather_paths for the topmost scan/join relation
when it is initially populated with paths.  If the scan/join target
is parallel-safe, we actually skip this for the topmost scan/join rel
altogether and instead do it for the tlist_rel, so that the
projection is done in the worker and costs are computed accordingly.

Amit Kapila and Robert Haas
---
 src/backend/optimizer/geqo/geqo_eval.c | 21 ++++++++++++++-------
 src/backend/optimizer/path/allpaths.c  | 26 +++++++++++++++++++-------
 src/backend/optimizer/plan/planner.c   | 16 ++++++++++++++++
 3 files changed, 49 insertions(+), 14 deletions(-)

diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index 0be2a73e05..3ef7d7d8aa 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -40,7 +40,7 @@ typedef struct
 } Clump;
 
 static List *merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump,
-			bool force);
+			int num_gene, bool force);
 static bool desirable_join(PlannerInfo *root,
 			   RelOptInfo *outer_rel, RelOptInfo *inner_rel);
 
@@ -196,7 +196,7 @@ gimme_tree(PlannerInfo *root, Gene *tour, int num_gene)
 		cur_clump->size = 1;
 
 		/* Merge it into the clumps list, using only desirable joins */
-		clumps = merge_clump(root, clumps, cur_clump, false);
+		clumps = merge_clump(root, clumps, cur_clump, num_gene, false);
 	}
 
 	if (list_length(clumps) > 1)
@@ -210,7 +210,7 @@ gimme_tree(PlannerInfo *root, Gene *tour, int num_gene)
 		{
 			Clump	   *clump = (Clump *) lfirst(lc);
 
-			fclumps = merge_clump(root, fclumps, clump, true);
+			fclumps = merge_clump(root, fclumps, clump, num_gene, true);
 		}
 		clumps = fclumps;
 	}
@@ -235,7 +235,8 @@ gimme_tree(PlannerInfo *root, Gene *tour, int num_gene)
  * "desirable" joins.
  */
 static List *
-merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
+merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
+			bool force)
 {
 	ListCell   *prev;
 	ListCell   *lc;
@@ -267,8 +268,14 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
 				/* Create paths for partitionwise joins. */
 				generate_partitionwise_join_paths(root, joinrel);
 
-				/* Create GatherPaths for any useful partial paths for rel */
-				generate_gather_paths(root, joinrel, false);
+				/*
+				 * Except for the topmost scan/join rel, consider gathering
+				 * partial paths.  We'll do the same for the topmost scan/join
+				 * rel once we know the final targetlist (see
+				 * grouping_planner).
+				 */
+				if (old_clump->size + new_clump->size < num_gene)
+					generate_gather_paths(root, joinrel, false);
 
 				/* Find and save the cheapest paths for this joinrel */
 				set_cheapest(joinrel);
@@ -286,7 +293,7 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
 				 * others.  When no further merge is possible, we'll reinsert
 				 * it into the list.
 				 */
-				return merge_clump(root, clumps, old_clump, force);
+				return merge_clump(root, clumps, old_clump, num_gene, force);
 			}
 		}
 		prev = lc;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 1c792a00eb..9a4d995d25 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -481,13 +481,20 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	}
 
 	/*
-	 * If this is a baserel, consider gathering any partial paths we may have
-	 * created for it.  (If we tried to gather inheritance children, we could
+	 * If this is a baserel, we should normally consider gathering any partial
+	 * paths we may have created for it.
+	 *
+	 * However, if this is an inheritance child, skip it.  Otherwise, we could
 	 * end up with a very large number of gather nodes, each trying to grab
-	 * its own pool of workers, so don't do this for otherrels.  Instead,
-	 * we'll consider gathering partial paths for the parent appendrel.)
+	 * its own pool of workers.  Instead, we'll consider gathering partial
+	 * paths for the parent appendrel.
+	 *
+	 * Also, if this is the topmost scan/join rel (that is, the only baserel),
+	 * we postpone this until the final scan/join targetlist is available (see
+	 * grouping_planner).
 	 */
-	if (rel->reloptkind == RELOPT_BASEREL)
+	if (rel->reloptkind == RELOPT_BASEREL &&
+		bms_membership(root->all_baserels) != BMS_SINGLETON)
 		generate_gather_paths(root, rel, false);
 
 	/*
@@ -2668,8 +2675,13 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
 			/* Create paths for partitionwise joins. */
 			generate_partitionwise_join_paths(root, rel);
 
-			/* Create GatherPaths for any useful partial paths for rel */
-			generate_gather_paths(root, rel, false);
+			/*
+			 * Except for the topmost scan/join rel, consider gathering
+			 * partial paths.  We'll do the same for the topmost scan/join rel
+			 * once we know the final targetlist (see grouping_planner).
+			 */
+			if (lev < levels_needed)
+				generate_gather_paths(root, rel, false);
 
 			/* Find and save the cheapest paths for this rel */
 			set_cheapest(rel);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 7532990548..9d33306783 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -1884,6 +1884,8 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 		tlist_rel = fetch_upper_rel(root, UPPERREL_TLIST, NULL);
 		tlist_rel->consider_parallel = current_rel->consider_parallel &&
 			scanjoin_target_parallel_safe;
+		tlist_rel->reltarget = scanjoin_target;
+		tlist_rel->rows = current_rel->rows;
 
 		/*
 		 * If there are any SRFs in the targetlist, we must separate each of
@@ -1927,6 +1929,16 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 			scanjoin_targets = scanjoin_targets_contain_srfs = NIL;
 		}
 
+		/*
+		 * If the final scan/join target is not parallel-safe, we must
+		 * generate Gather paths now, since no partial paths will be generated
+		 * for tlist_rel.  Otherwise, the paths generated from tlist_rel will
+		 * be superior to these in that projection will be done by each
+		 * participant rather than only in the leader, so skip this for now.
+		 */
+		if (!scanjoin_target_parallel_safe)
+			generate_gather_paths(root, current_rel, false);
+
 		/*
 		 * Apply SRF-free scan/join target to all Paths for the scanjoin rel
 		 * to produce paths for the tlist rel.
@@ -1971,6 +1983,10 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 								  scanjoin_targets,
 								  scanjoin_targets_contain_srfs);
 
+		/* Generate Gather and Gather Merge paths, if appropriate. */
+		if (tlist_rel->consider_parallel)
+			generate_gather_paths(root, tlist_rel, false);
+
 		/* Now consider the tlist_rel to be the current upper relation. */
 		set_cheapest(tlist_rel);
 		current_rel = tlist_rel;
-- 
2.14.3 (Apple Git-98)

0003-Add-new-upper-rel-to-represent-projecting-toplevel-s.patch

From 370897446063682e7a51b5585a3604bd27997ecf Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Mon, 12 Mar 2018 13:16:30 -0400
Subject: [PATCH 3/5] Add new upper rel to represent projecting toplevel
 scan/join rel.

UPPERREL_TLIST represents the result of applying the scan/join target
to the final scan/join relation.  This requires us to use
create_projection_path() rather than apply_projection_to_path()
when projecting non-partial paths from the scan/join relation, but
it also enables us to avoid needing to modify those paths in place.

It also avoids the need to clear the topmost scan/join rel's
partial_pathlist when the scan/join target is not parallel-safe, which
is a sort of hack; see commit 3bf05e096b9f8375e640c5d7996aa57efd7f240c
for an example of a previous fix that eliminated a similar hack.

Patch by me.
---
 src/backend/optimizer/plan/planner.c | 69 +++++++++++++-----------------------
 src/include/nodes/relation.h         |  1 +
 2 files changed, 25 insertions(+), 45 deletions(-)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 24e6c46396..7532990548 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -1694,6 +1694,7 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 		List	   *scanjoin_targets_contain_srfs;
 		bool		scanjoin_target_parallel_safe;
 		bool		have_grouping;
+		RelOptInfo *tlist_rel;
 		AggClauseCosts agg_costs;
 		WindowFuncLists *wflists = NULL;
 		List	   *activeWindows = NIL;
@@ -1876,6 +1877,14 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 			scanjoin_target_parallel_safe = grouping_target_parallel_safe;
 		}
 
+		/*
+		 * Create a new upper relation to represent the result of scan/join
+		 * projection.
+		 */
+		tlist_rel = fetch_upper_rel(root, UPPERREL_TLIST, NULL);
+		tlist_rel->consider_parallel = current_rel->consider_parallel &&
+			scanjoin_target_parallel_safe;
+
 		/*
 		 * If there are any SRFs in the targetlist, we must separate each of
 		 * these PathTargets into SRF-computing and SRF-free targets.  Replace
@@ -1919,15 +1928,8 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 		}
 
 		/*
-		 * Forcibly apply SRF-free scan/join target to all the Paths for the
-		 * scan/join rel.
-		 *
-		 * In principle we should re-run set_cheapest() here to identify the
-		 * cheapest path, but it seems unlikely that adding the same tlist
-		 * eval costs to all the paths would change that, so we don't bother.
-		 * Instead, just assume that the cheapest-startup and cheapest-total
-		 * paths remain so.  (There should be no parameterized paths anymore,
-		 * so we needn't worry about updating cheapest_parameterized_paths.)
+		 * Apply SRF-free scan/join target to all Paths for the scanjoin rel
+		 * to produce paths for the tlist rel.
 		 */
 		foreach(lc, current_rel->pathlist)
 		{
@@ -1935,28 +1937,16 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 			Path	   *path;
 
 			Assert(subpath->param_info == NULL);
-			path = apply_projection_to_path(root, current_rel,
-											subpath, scanjoin_target);
-			/* If we had to add a Result, path is different from subpath */
-			if (path != subpath)
-			{
-				lfirst(lc) = path;
-				if (subpath == current_rel->cheapest_startup_path)
-					current_rel->cheapest_startup_path = path;
-				if (subpath == current_rel->cheapest_total_path)
-					current_rel->cheapest_total_path = path;
-			}
+			path = (Path *) create_projection_path(root, tlist_rel,
+												   subpath, scanjoin_target);
+			add_path(tlist_rel, path);
 		}
 
 		/*
-		 * Upper planning steps which make use of the top scan/join rel's
-		 * partial pathlist will expect partial paths for that rel to produce
-		 * the same output as complete paths ... and we just changed the
-		 * output for the complete paths, so we'll need to do the same thing
-		 * for partial paths.  But only parallel-safe expressions can be
-		 * computed by partial paths.
+		 * If we can produce partial paths for the tlist rel, for possible use
+		 * by upper planning stages, do so.
 		 */
-		if (current_rel->partial_pathlist && scanjoin_target_parallel_safe)
+		if (tlist_rel->consider_parallel)
 		{
 			/* Apply the scan/join target to each partial path */
 			foreach(lc, current_rel->partial_pathlist)
@@ -1967,35 +1957,24 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 				/* Shouldn't have any parameterized paths anymore */
 				Assert(subpath->param_info == NULL);
 
-				/*
-				 * Don't use apply_projection_to_path() here, because there
-				 * could be other pointers to these paths, and therefore we
-				 * mustn't modify them in place.
-				 */
 				newpath = (Path *) create_projection_path(root,
-														  current_rel,
+														  tlist_rel,
 														  subpath,
 														  scanjoin_target);
-				lfirst(lc) = newpath;
+				add_partial_path(tlist_rel, newpath);
 			}
 		}
-		else
-		{
-			/*
-			 * In the unfortunate event that scanjoin_target is not
-			 * parallel-safe, we can't apply it to the partial paths; in that
-			 * case, we'll need to forget about the partial paths, which
-			 * aren't valid input for upper planning steps.
-			 */
-			current_rel->partial_pathlist = NIL;
-		}
 
 		/* Now fix things up if scan/join target contains SRFs */
 		if (parse->hasTargetSRFs)
-			adjust_paths_for_srfs(root, current_rel,
+			adjust_paths_for_srfs(root, tlist_rel,
 								  scanjoin_targets,
 								  scanjoin_targets_contain_srfs);
 
+		/* Now consider the tlist_rel to be the current upper relation. */
+		set_cheapest(tlist_rel);
+		current_rel = tlist_rel;
+
 		/*
 		 * Save the various upper-rel PathTargets we just computed into
 		 * root->upper_targets[].  The core code doesn't use this, but it
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index d576aa7350..8d873bdee1 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -71,6 +71,7 @@ typedef struct AggClauseCosts
 typedef enum UpperRelationKind
 {
 	UPPERREL_SETOP,				/* result of UNION/INTERSECT/EXCEPT, if any */
+	UPPERREL_TLIST,				/* result of projecting final scan/join rel */
 	UPPERREL_PARTIAL_GROUP_AGG, /* result of partial grouping/aggregation, if
 								 * any */
 	UPPERREL_GROUP_AGG,			/* result of grouping/aggregation, if any */
-- 
2.14.3 (Apple Git-98)

0002-Adjust-use_physical_tlist-to-work-on-upper-rels.patch

From 7633676bd346452a5106dec9f6df0c2feefad385 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Mon, 12 Mar 2018 13:52:01 -0400
Subject: [PATCH 2/5] Adjust use_physical_tlist to work on upper rels.

Instead of testing for the inheritance case by checking specifically
for RELOPT_BASEREL, use IS_OTHER_REL().  This requires a small
adjustment later in the function: upper rels won't have attr_needed
set, so just skip that test if the information is not present.

Patch by me.
---
 src/backend/optimizer/plan/createplan.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index a2bf3b77fe..73a06f32d3 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -805,7 +805,7 @@ use_physical_tlist(PlannerInfo *root, Path *path, int flags)
 	 * doesn't project; this test may be unnecessary now that
 	 * create_append_plan instructs its children to return an exact tlist).
 	 */
-	if (rel->reloptkind != RELOPT_BASEREL)
+	if (IS_OTHER_REL(rel))
 		return false;
 
 	/*
@@ -831,10 +831,13 @@ use_physical_tlist(PlannerInfo *root, Path *path, int flags)
 	 * (This could possibly be fixed but would take some fragile assumptions
 	 * in setrefs.c, I think.)
 	 */
-	for (i = rel->min_attr; i <= 0; i++)
+	if (rel->attr_needed)
 	{
-		if (!bms_is_empty(rel->attr_needed[i - rel->min_attr]))
-			return false;
+		for (i = rel->min_attr; i <= 0; i++)
+		{
+			if (!bms_is_empty(rel->attr_needed[i - rel->min_attr]))
+				return false;
+		}
 	}
 
 	/*
-- 
2.14.3 (Apple Git-98)

0001-Teach-create_projection_plan-to-omit-projection-wher.patch

From 5f2adfdb4bab07c2e94a99b047dfed64e22f9416 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Mon, 12 Mar 2018 12:36:57 -0400
Subject: [PATCH 1/5] Teach create_projection_plan to omit projection where
 possible.

We sometimes insert a ProjectionPath into a plan tree when it isn't
actually needed.  The existing code already provides for the case
where the ProjectionPath's subpath can perform the projection itself
instead of needing a Result node to do it, but previously it didn't
consider the possibility that the parent node might not actually
require the projection.  This optimization also allows the "physical
tlist" optimization to be preserved in some cases where it would not
otherwise happen.

Patch by me.
---
 src/backend/optimizer/plan/createplan.c | 26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 9ae1bf31d5..a2bf3b77fe 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -87,7 +87,9 @@ static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path
 static Plan *create_unique_plan(PlannerInfo *root, UniquePath *best_path,
 				   int flags);
 static Gather *create_gather_plan(PlannerInfo *root, GatherPath *best_path);
-static Plan *create_projection_plan(PlannerInfo *root, ProjectionPath *best_path);
+static Plan *create_projection_plan(PlannerInfo *root,
+					   ProjectionPath *best_path,
+					   int flags);
 static Plan *inject_projection_plan(Plan *subplan, List *tlist, bool parallel_safe);
 static Sort *create_sort_plan(PlannerInfo *root, SortPath *best_path, int flags);
 static Group *create_group_plan(PlannerInfo *root, GroupPath *best_path);
@@ -400,7 +402,8 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
 			if (IsA(best_path, ProjectionPath))
 			{
 				plan = create_projection_plan(root,
-											  (ProjectionPath *) best_path);
+											  (ProjectionPath *) best_path,
+											  flags);
 			}
 			else if (IsA(best_path, MinMaxAggPath))
 			{
@@ -1567,7 +1570,7 @@ create_gather_merge_plan(PlannerInfo *root, GatherMergePath *best_path)
  *	  but sometimes we can just let the subplan do the work.
  */
 static Plan *
-create_projection_plan(PlannerInfo *root, ProjectionPath *best_path)
+create_projection_plan(PlannerInfo *root, ProjectionPath *best_path, int flags)
 {
 	Plan	   *plan;
 	Plan	   *subplan;
@@ -1576,7 +1579,22 @@ create_projection_plan(PlannerInfo *root, ProjectionPath *best_path)
 	/* Since we intend to project, we don't need to constrain child tlist */
 	subplan = create_plan_recurse(root, best_path->subpath, 0);
 
-	tlist = build_path_tlist(root, &best_path->path);
+	/*
+	 * If our caller doesn't really care what tlist we return, then we might
+	 * not really need to project.  If use_physical_tlist returns false, then
+	 * we're obliged to project.  If it returns true, we can skip actually
+	 * projecting but must still correctly label the input path's tlist with
+	 * the sortgroupref information if the caller has so requested.
+	 */
+	if (!use_physical_tlist(root, &best_path->path, flags))
+		tlist = build_path_tlist(root, &best_path->path);
+	else if ((flags & CP_LABEL_TLIST) != 0)
+	{
+		tlist = copyObject(subplan->targetlist);
+		apply_pathtarget_labeling_to_tlist(tlist, best_path->path.pathtarget);
+	}
+	else
+		return subplan;
 
 	/*
 	 * We might not really need a Result node here, either because the subplan
-- 
2.14.3 (Apple Git-98)

#76

amit.kapila16@gmail.com

almost 8 years ago

In reply to: Robert Haas (#75)

Re: [HACKERS] why not parallel seq scan for slow functions

On Wed, Mar 14, 2018 at 12:01 AM, Robert Haas <robertmhaas@gmail.com> wrote:

0003 introduces a new upper relation to represent the result of
applying the scan/join target to the topmost scan/join relation. I'll
explain further down why this seems to be needed. Since each path
must belong to only one relation, this involves using
create_projection_path() for the non-partial pathlist as we already do
for the partial pathlist, rather than apply_projection_to_path().
This is probably marginally more expensive but I'm hoping it doesn't
matter. (However, I have not tested.)

I think in the patch series this is the questionable patch wherein it
will always add an additional projection path (whether or not it is
required) to all Paths (partial and non-partial) for the scanjoin rel
and then later remove it (if not required) in create_projection_plan.
As you are saying, I also think it might not matter much in the grand
scheme of things and if required we can test it as well, however, I
think it is better if some other people can also share their opinion
on this matter.

Tom, do you have anything to say?

Each non-partial path in the
topmost scan/join rel produces a corresponding path in the new upper
rel. The same is done for partial paths if the scan/join target is
parallel-safe; otherwise we can't.

0004 causes the main part of the planner to skip calling
generate_gather_paths() from the topmost scan/join rel. This logic is
mostly stolen from your patch. If the scan/join target is NOT
parallel-safe, we perform generate_gather_paths() on the topmost
scan/join rel. If the scan/join target IS parallel-safe, we do it on
the post-projection rel introduced by 0003 instead. This is the patch
that actually fixes Jeff's original complaint.

Looks good, I feel you can include the test from my patch as well.
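To make the 0004 rule concrete, here is a toy C sketch of where Gather paths end up and why gathering above the projection helps a query like Jeff's. The enum, the function names, and the cost formula are hypothetical simplifications, not planner code; the real costing divides work by a parallel divisor that accounts for leader participation and also charges parallel_tuple_cost, both of which are ignored here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for where Gather paths get generated (sketch only). */
typedef enum GatherSite
{
    GATHER_NOWHERE,             /* no parallel plan considered */
    GATHER_BELOW_PROJECTION,    /* on the topmost scan/join rel */
    GATHER_ABOVE_PROJECTION     /* on the new tlist (projection) rel */
} GatherSite;

/*
 * Mirrors the shape of the 0004 logic in grouping_planner(): an unsafe
 * target forces gathering before projection (which does nothing if the
 * rel has no partial paths); otherwise gather on the tlist rel, whose
 * consider_parallel also requires the scan/join rel's flag.
 */
static GatherSite
choose_gather_site(bool rel_consider_parallel, bool target_parallel_safe)
{
    if (!target_parallel_safe)
        return GATHER_BELOW_PROJECTION;
    if (rel_consider_parallel)
        return GATHER_ABOVE_PROJECTION;
    return GATHER_NOWHERE;
}

/*
 * Toy projection cost: a slow function evaluated once per row, with the
 * rows split evenly across 'participants' when projection runs below the
 * Gather.  participants == 1 models projecting only in the leader.
 */
static double
toy_projection_cost(double rows, double per_tuple_cost, int participants)
{
    assert(participants >= 1);
    return rows * per_tuple_cost / participants;
}
```

With a very expensive per-tuple cost, the gap between toy_projection_cost(rows, cost, 1) and the same call with several participants is what the old plan shape could never exploit, since projection was costed only above the Gather.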

0005 goes a bit further and removes a bunch of logic from
create_ordered_paths(). The removed logic tries to satisfy the
query's ordering requirement via cheapest_partial_path + Sort + Gather
Merge. Instead, it adds an additional path to the new upper rel added
in 0003. This again avoids a call to apply_projection_to_path() which
could cause projection to be pushed down after costing has already
been fixed. Therefore, it gains the same advantages as 0004 for
queries that would use this sort of plan.

After this patch, part of the sort-related work will be done in
create_ordered_paths and the rest in grouping_planner, which looks a
bit odd; otherwise, I don't see any problem with it.

Currently, this loses the
ability to set limit_tuples for the Sort path; that doesn't look too
hard to fix but I haven't done it. If we decide to proceed with this
overall approach I'll see about getting it sorted out.

Okay, that makes sense.

Unfortunately, when I initially tried this approach, a number of
things broke due to the fact that create_projection_path() was not
exactly equivalent to apply_projection_to_path(). This initially
surprised me, because create_projection_plan() contains logic to
eliminate the Result node that is very similar to the logic in
apply_projection_to_path(). If apply_projection_to_path() sees that the
subordinate node is projection-capable, it applies the revised target
list to the child; if not, it adds a Result node.
create_projection_plan() does the same thing. However,
create_projection_plan() can lose the physical tlist optimization for
the subordinate node; it forces an exact projection even if the parent
node doesn't require this. 0001 fixes this, so that we get the same
plans with create_projection_path() that we would with
apply_projection_to_path(). I think this patch is a good idea
regardless of what we decide to do about the rest of this, because it
can potentially avoid losing the physical-tlist optimization in any
place where create_projection_path() is used.

It turns out that 0001 doesn't manage to preserve the physical-tlist
optimization when the projection path is attached to an upper
relation. 0002 fixes this.

I have done some basic verification of 0001 and 0002, will do further
review/tests, if I don't see any objection from anyone else about the
overall approach.
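The three-way branch that 0001 adds to create_projection_plan() can be summarized with a small C sketch. The enum and function names are hypothetical, and the boolean argument stands in for the result of use_physical_tlist(), whose real answer depends on the path and relation as well as on the flags; only the CP_LABEL_TLIST value is taken from createplan.c.

```c
#include <assert.h>
#include <stdbool.h>

#define CP_LABEL_TLIST 0x0004   /* value as defined in createplan.c */

/* Hypothetical names for the three outcomes of the 0001 logic. */
typedef enum ProjectionChoice
{
    PROJ_BUILD_TLIST,           /* build the path's tlist and project */
    PROJ_RELABEL_CHILD_TLIST,   /* copy child tlist, add sortgroupref labels */
    PROJ_RETURN_SUBPLAN         /* no projection at all */
} ProjectionChoice;

/*
 * 'physical_tlist_ok' stands in for use_physical_tlist(root, path, flags).
 * If it is false we are obliged to project; if it is true we can skip the
 * projection, but must still label the tlist when the caller asks for it.
 */
static ProjectionChoice
choose_projection(bool physical_tlist_ok, int flags)
{
    if (!physical_tlist_ok)
        return PROJ_BUILD_TLIST;
    if ((flags & CP_LABEL_TLIST) != 0)
        return PROJ_RELABEL_CHILD_TLIST;
    return PROJ_RETURN_SUBPLAN;
}
```

The PROJ_RETURN_SUBPLAN case is what lets create_projection_path() yield the same plans as apply_projection_to_path() when the parent does not need an exact tlist.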

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#77

robertmhaas@gmail.com

almost 8 years ago

In reply to: Amit Kapila (#76)

Re: [HACKERS] why not parallel seq scan for slow functions

On Fri, Mar 16, 2018 at 6:06 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

I think in the patch series this is the questionable patch wherein it
will always add an additional projection path (whether or not it is
required) to all Paths (partial and non-partial) for the scanjoin rel
and then later remove it (if not required) in create_projection_plan.
As you are saying, I also think it might not matter much in the grand
scheme of things and if required we can test it as well, however, I
think it is better if some other people can also share their opinion
on this matter.

Tom, do you have anything to say?

I forgot to include part of the explanation in my previous email. The
reason it has to work this way is that, of course, you can't include
the same path in the path list of relation B as you put into the path
list of relation A; if you do, then you will be in trouble if a later
addition to the path list of relation B kicks that path out, because
it will get pfree'd, leaving a garbage pointer in the list for A,
which you may subsequently reference.

At one point, I tried to solve the problem by sorting the cheapest
partial path from the scan/join rel and putting it back into the same
pathlist, but that also fails: in some cases, the new path dominates
all of the existing paths because it is better-sorted and only very
slightly more expensive. So, when you call add_partial_path() for the
sort path, all of the previous paths -- including the one the sort
path is pointing to as its subpath! -- get pfree'd.

So without another relation, and the projection paths, I couldn't get
it to where I didn't have to modify paths after creating them.
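The two failure modes described above, and the way a separate relation plus a fresh projection node avoids them, can be reproduced with a toy C model. Everything here (ToyPath, ToyRel, toy_add_path(), the 1.01 fuzz factor, and the "sorted" flag standing in for pathkeys) is a hypothetical simplification of add_path(), not PostgreSQL code; released paths are only flagged rather than freed, so the demonstration stays well defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; the 'freed' flag stands in for pfree(). */
typedef struct ToyPath
{
    double      total_cost;
    bool        sorted;         /* crude stand-in for useful pathkeys */
    struct ToyPath *subpath;    /* input of a Sort/Projection, else NULL */
    bool        freed;
} ToyPath;

#define TOY_MAX_PATHS 8
#define TOY_FUZZ 1.01           /* assumed "about as cheap" fuzz factor */

typedef struct ToyRel
{
    ToyPath    *paths[TOY_MAX_PATHS];
    int         npaths;
} ToyRel;

/* a dominates b: roughly as cheap, and sorted at least as usefully */
static bool
toy_dominates(const ToyPath *a, const ToyPath *b)
{
    return a->total_cost <= b->total_cost * TOY_FUZZ &&
        (a->sorted || !b->sorted);
}

/* Simplified add_path(): always keeps the new path, releases dominated ones. */
static void
toy_add_path(ToyRel *rel, ToyPath *new_path)
{
    int         keep = 0;

    for (int i = 0; i < rel->npaths; i++)
    {
        if (toy_dominates(new_path, rel->paths[i]))
            rel->paths[i]->freed = true;    /* real code would pfree() it */
        else
            rel->paths[keep++] = rel->paths[i];
    }
    assert(keep < TOY_MAX_PATHS);
    rel->paths[keep++] = new_path;
    rel->npaths = keep;
}

/* Hazard 1: one path shared by two rels; pruning in B strands A's pointer. */
static bool
shared_path_is_stranded(void)
{
    ToyRel      a = {{NULL}, 0}, b = {{NULL}, 0};
    ToyPath     shared = {100.0, false, NULL, false};
    ToyPath     cheaper = {50.0, false, NULL, false};

    toy_add_path(&a, &shared);
    toy_add_path(&b, &shared);      /* the mistake: same path in two lists */
    toy_add_path(&b, &cheaper);     /* B releases it; A still points at it */
    return a.paths[0] == &shared && shared.freed;
}

/* Hazard 2: a Sort added to its own input's list can evict that input. */
static bool
sort_evicts_its_own_input(void)
{
    ToyRel      rel = {{NULL}, 0};
    ToyPath     scan = {100.0, false, NULL, false};
    ToyPath     sort = {100.5, true, &scan, false}; /* a bit dearer, sorted */

    toy_add_path(&rel, &scan);
    toy_add_path(&rel, &sort);
    return scan.freed && sort.subpath == &scan;
}

/* Fix: a fresh projection node in a separate rel wraps the original path. */
static bool
projection_in_new_rel_is_safe(void)
{
    ToyRel      scanjoin = {{NULL}, 0}, tlist = {{NULL}, 0};
    ToyPath     scan = {100.0, false, NULL, false};
    ToyPath     proj = {100.0, false, &scan, false};    /* cf. create_projection_path() */
    ToyPath     cheaper = {50.0, false, NULL, false};

    toy_add_path(&scanjoin, &scan);
    toy_add_path(&tlist, &proj);
    toy_add_path(&tlist, &cheaper); /* evicts only the wrapper */
    return proj.freed && !scan.freed && scanjoin.paths[0] == &scan;
}
```

In the last case only the wrapper belongs to the new relation's list, so pruning there can never release a path that the scan/join relation still references.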

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#78

amit.kapila16@gmail.com

almost 8 years ago

In reply to: Amit Kapila (#76)

1 attachment(s)

Re: [HACKERS] why not parallel seq scan for slow functions

On Fri, Mar 16, 2018 at 3:36 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Wed, Mar 14, 2018 at 12:01 AM, Robert Haas <robertmhaas@gmail.com> wrote:

0003 introduces a new upper relation to represent the result of
applying the scan/join target to the topmost scan/join relation. I'll
explain further down why this seems to be needed. Since each path
must belong to only one relation, this involves using
create_projection_path() for the non-partial pathlist as we already do
for the partial pathlist, rather than apply_projection_to_path().
This is probably marginally more expensive but I'm hoping it doesn't
matter. (However, I have not tested.)

I think in the patch series this is the questionable patch wherein it
will always add an additional projection path (whether or not it is
required) to all Paths (partial and non-partial) for the scanjoin rel
and then later remove it (if not required) in create_projection_plan.
As you are saying, I also think it might not matter much in the grand
scheme of things and if required we can test it as well,

I have done some tests to see the impact of this patch on planning
time. I took some simple statements and tried to compute the time
they took in planning.

Test-1
----------
DO $$
DECLARE count integer;
BEGIN
For count In 1..1000000 Loop
Execute 'explain Select ten from tenk1';
END LOOP;
END;
$$;

In the above block, I am explaining the simple statement which will
have just one path, so there will be one additional path projection
and removal cycle for this statement. I executed the above block in
psql with \timing on; the average time over ten runs is 21292.388 ms
on HEAD, 22405.2466 ms with patches 0001.* ~ 0003.*, and 22537.1362 ms
with patches 0001.* ~ 0005.*. These results indicate an increase of
approximately 5~6% in planning time.

Test-2
----------
DO $$
DECLARE count integer;
BEGIN
For count In 1..1000000 Loop
Execute 'explain select hundred,ten from tenk1 order by hundred';
END LOOP;
END;
$$;

In the above block, I am explaining a statement which will have two
paths, so there will be two additional path projections and one
removal cycle for one of the selected paths for this statement. The
average time over ten runs is 32869.8343 ms on HEAD, 34068.0608 ms
with patches 0001.* ~ 0003.*, and 34097.4899 ms with patches
0001.* ~ 0005.*. These results indicate an increase of approximately
3~4% in planning time. Ideally, this test should have shown more
impact, since we are adding an additional projection path for each
of the two paths, but I think that because the overall planning time
is higher, the impact of the additional work is less visible.

I have done these tests on a CentOS VM, so there is some variation
in the results. Please find attached the detailed results of all the
tests. I have not changed any configuration for these tests. Before
reaching any conclusion, I think it would be better if someone else
repeated these tests to see whether they make a similar observation.
The reason for testing the first three patches (0001.* ~ 0003.*)
separately is to see the impact of the changes without any of the
parallelism-related changes.
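As a quick check on the percentages quoted above, the relative increase can be computed directly from the reported ten-run averages. This is only arithmetic on the numbers in this message; the helper and array names are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Relative increase of 'patched_ms' over 'baseline_ms', in percent. */
static double
pct_increase(double baseline_ms, double patched_ms)
{
    assert(baseline_ms > 0.0);
    return (patched_ms - baseline_ms) / baseline_ms * 100.0;
}

/* Ten-run averages reported above: HEAD, 0001-0003, 0001-0005 (ms). */
static const double test1_ms[3] = {21292.388, 22405.2466, 22537.1362};
static const double test2_ms[3] = {32869.8343, 34068.0608, 34097.4899};

static void
print_overheads(void)
{
    printf("Test-1: 0001-0003 %+.2f%%, 0001-0005 %+.2f%%\n",
           pct_increase(test1_ms[0], test1_ms[1]),
           pct_increase(test1_ms[0], test1_ms[2]));
    printf("Test-2: 0001-0003 %+.2f%%, 0001-0005 %+.2f%%\n",
           pct_increase(test2_ms[0], test2_ms[1]),
           pct_increase(test2_ms[0], test2_ms[2]));
}
```

Calling print_overheads() gives roughly +5.2% and +5.8% for Test-1 and roughly +3.6% and +3.7% for Test-2, consistent with the 5~6% and 3~4% estimates.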

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#79

robertmhaas@gmail.com

almost 8 years ago

In reply to: Amit Kapila (#78)

Re: [HACKERS] why not parallel seq scan for slow functions

On Sat, Mar 17, 2018 at 1:16 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Test-1
----------
DO $$
DECLARE count integer;
BEGIN
For count In 1..1000000 Loop
Execute 'explain Select ten from tenk1';
END LOOP;
END;
$$;

In the above block, I am explaining the simple statement which will
have just one path, so there will be one additional path projection
and removal cycle for this statement. I executed the above block in
psql with \timing on; the average time over ten runs is 21292.388 ms
on HEAD, 22405.2466 ms with patches 0001.* ~ 0003.*, and 22537.1362 ms
with patches 0001.* ~ 0005.*. These results indicate an increase of
approximately 5~6% in planning time.

Ugh. I'm able to reproduce this, more or less -- with master, this
test took 42089.484 ms, 41935.849 ms, 42519.336 ms on my laptop, but
with 0001-0003 applied, 43925.959 ms, 43619.004 ms, 43648.426 ms.
However, I have a feeling there's more going on here, because the
following patch on top of 0001-0003 made the time go back down to
42353.548 ms, 41797.757 ms, 41891.194 ms.

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index bf0b3e75ea..0542b3e40b 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -1947,12 +1947,19 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
         {
             Path       *subpath = (Path *) lfirst(lc);
             Path       *path;
+            Path       *path2;

             Assert(subpath->param_info == NULL);
-            path = (Path *) create_projection_path(root, tlist_rel,
+            path2 = (Path *) create_projection_path(root, tlist_rel,
                                                    subpath, scanjoin_target);
-            add_path(tlist_rel, path);
+            path = (Path *) apply_projection_to_path(root, tlist_rel,
+                                                   subpath, scanjoin_target);
+            if (path == path2)
+                elog(ERROR, "oops");
+            lfirst(lc) = path;
         }
+        tlist_rel->pathlist = current_rel->pathlist;
+        current_rel->pathlist = NIL;

         /*
          * If we can produce partial paths for the tlist rel, for possible use

It seems pretty obvious that creating an extra projection path that is
just thrown away can't "really" be making this faster, so there's
evidently some other effect here involving how the code is laid out,
or CPU cache effects, or, uh, something.
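Averaging the three runs per configuration makes the anomaly easier to see. This sketch is plain arithmetic on the timings quoted above; the function and array names are hypothetical.

```c
#include <assert.h>

/* Arithmetic mean of n timings in milliseconds. */
static double
mean_ms(const double *runs, int n)
{
    double      sum = 0.0;

    assert(n > 0);
    for (int i = 0; i < n; i++)
        sum += runs[i];
    return sum / n;
}

/* The three runs per configuration quoted above (ms). */
static const double master_ms[3] = {42089.484, 41935.849, 42519.336};
static const double p0003_ms[3] = {43925.959, 43619.004, 43648.426};
static const double hack_ms[3] = {42353.548, 41797.757, 41891.194};

/* Percent change of a configuration's mean relative to master's mean. */
static double
pct_vs_master(const double *runs)
{
    double      base = mean_ms(master_ms, 3);

    return (mean_ms(runs, 3) - base) / base * 100.0;
}
```

With these numbers, 0001-0003 comes out about 3.7% slower than master, while the variant that builds and then discards an extra projection path comes out about 0.4% faster than master.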

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#80

amit.kapila16@gmail.com

almost 8 years ago

In reply to: Robert Haas (#79)

Re: [HACKERS] why not parallel seq scan for slow functions

On Tue, Mar 20, 2018 at 1:23 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Sat, Mar 17, 2018 at 1:16 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Test-1
----------
DO $$
DECLARE count integer;
BEGIN
For count In 1..1000000 Loop
Execute 'explain Select ten from tenk1';
END LOOP;
END;
$$;

In the above block, I am explaining the simple statement which will
have just one path, so there will be one additional path projection
and removal cycle for this statement. I executed the above block in
psql with \timing on; the average time over ten runs is 21292.388 ms
on HEAD, 22405.2466 ms with patches 0001.* ~ 0003.*, and 22537.1362 ms
with patches 0001.* ~ 0005.*. These results indicate an increase of
approximately 5~6% in planning time.

Ugh. I'm able to reproduce this, more or less -- with master, this
test took 42089.484 ms, 41935.849 ms, 42519.336 ms on my laptop, but
with 0001-0003 applied, 43925.959 ms, 43619.004 ms, 43648.426 ms.
However, I have a feeling there's more going on here, because the
following patch on top of 0001-0003 made the time go back down to
42353.548 ms, 41797.757 ms, 41891.194 ms.

It seems pretty obvious that creating an extra projection path that is
just thrown away can't "really" be making this faster, so there's
evidently some other effect here involving how the code is laid out,
or CPU cache effects, or, uh, something.

Yeah, sometimes that kind of stuff changes performance characteristics,
but I think what is going on here is that create_projection_plan is
causing the lower node to build physical tlist which takes some
additional time. I have tried below change on top of the patch series
and it brings back the performance for me.

@@ -1580,7 +1580,7 @@ create_projection_plan(PlannerInfo *root, ProjectionPath *best_path, int flags)
        List       *tlist;

        /* Since we intend to project, we don't need to constrain child tlist */
-       subplan = create_plan_recurse(root, best_path->subpath, 0);
+       subplan = create_plan_recurse(root, best_path->subpath, flags);

Another point I have noticed in
0001-Teach-create_projection_plan-to-omit-projection-wher patch:

-create_projection_plan(PlannerInfo *root, ProjectionPath *best_path)
+create_projection_plan(PlannerInfo *root, ProjectionPath *best_path, int flags)
 {
 ..
+	else if ((flags & CP_LABEL_TLIST) != 0)
+	{
+		tlist = copyObject(subplan->targetlist);
+		apply_pathtarget_labeling_to_tlist(tlist, best_path->path.pathtarget);
+	}
+	else
+		return subplan;
 ..
 }

Before returning subplan, don't we need to copy the cost estimates
from best_path, as is done a few lines later in the same function?
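A sketch of the relabeling being asked about: when projection is skipped and the child plan is returned directly, that plan still carries the child path's estimates, so it would be relabeled with the ProjectionPath's estimates, loosely following the cost labeling that create_projection_plan() already performs later when it drops the Result node. The structs and function names are hypothetical, trimmed-down stand-ins for Path and Plan.

```c
#include <assert.h>

/* Hypothetical, trimmed-down stand-ins for Path and Plan cost fields. */
typedef struct ToyPath
{
    double      startup_cost;
    double      total_cost;
    double      rows;
    int         width;          /* stands in for pathtarget->width */
} ToyPath;

typedef struct ToyPlan
{
    double      startup_cost;
    double      total_cost;
    double      plan_rows;
    int         plan_width;
} ToyPlan;

/* Label a plan with the estimates of the path it was built for. */
static void
toy_copy_path_costs(ToyPlan *dest, const ToyPath *src)
{
    dest->startup_cost = src->startup_cost;
    dest->total_cost = src->total_cost;
    dest->plan_rows = src->rows;
    dest->plan_width = src->width;
}

/*
 * When projection is skipped, the child plan is returned as-is.  Its costs
 * came from the child path, so relabel it with the ProjectionPath's costs,
 * which include evaluating the target list.
 */
static ToyPlan *
toy_return_subplan(ToyPlan *subplan, const ToyPath *projection_path)
{
    toy_copy_path_costs(subplan, projection_path);
    return subplan;
}
```

Without the relabeling, EXPLAIN would show the child's cheaper estimates for a node whose chosen path was costed with the projection included.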

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#81

robertmhaas@gmail.com

almost 8 years ago

In reply to: Amit Kapila (#80)

7 attachment(s)

Re: [HACKERS] why not parallel seq scan for slow functions

On Fri, Mar 23, 2018 at 12:12 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Yeah, sometimes that kind of stuff changes performance characteristics,
but I think what is going on here is that create_projection_plan is
causing the lower node to build physical tlist which takes some
additional time. I have tried below change on top of the patch series
and it brings back the performance for me.

I tried another approach inspired by this, which is to altogether skip
building the child scan tlist if it will just be replaced. See 0006.
In testing here, that seems to be a bit better than your proposal, but
I wonder what your results will be.
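The create_scan_plan() change in 0006 uses an exact equality test on flags rather than a bit test, as its comment explains. The sketch below contrasts the two; the flag values and their comments are copied from the patch, while both helper functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as defined in the 0006 version of createplan.c. */
#define CP_EXACT_TLIST      0x0001  /* Plan must return specified tlist */
#define CP_SMALL_TLIST      0x0002  /* Prefer narrower tlists */
#define CP_LABEL_TLIST      0x0004  /* tlist must contain sortgrouprefs */
#define CP_IGNORE_TLIST     0x0008  /* caller will replace tlist */

/*
 * The test 0006 adds to create_scan_plan(): skip building a tlist only
 * when CP_IGNORE_TLIST is the one and only flag the caller passed.
 */
static bool
can_skip_scan_tlist(int flags)
{
    return flags == CP_IGNORE_TLIST;
}

/*
 * A looser bit test, shown only for contrast: it would also fire when
 * another requirement such as CP_LABEL_TLIST accompanies CP_IGNORE_TLIST.
 */
static bool
ignore_bit_is_set(int flags)
{
    return (flags & CP_IGNORE_TLIST) != 0;
}
```

The equality form is the conservative choice: any additional requirement bit falls through to the normal tlist-building code.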

Before returning subplan, don't we need to copy the cost estimates
from best_path, as is done a few lines later in the same function?

The new 0006 takes care of this, too. Really, the new 0006 should be
merged into 0001, but I kept it separate for now.

So, rebased:

0001 - More or less as before.

0002 - More or less as before.

0003 - Rewritten in the wake of partitionwise aggregate, as the
tlist_rel must be partitioned in order for partitionwise aggregate to
work. Quite pleasingly, this eliminates a bunch of Result nodes from
the partitionwise join test results. Overall, I find this quite a bit
cleaner than the present code (leaving performance aside for the
moment). In the process of writing this I realized that the
partitionwise aggregate code doesn't look like it handles SRFs
properly, so if this doesn't get committed we'll have to fix that
problem some other way.

0004 - A little different than before as a result of the extensive
changes in 0003.

0005 - Also different, and revealing another defect in partitionwise
aggregate, as noted in the commit message.

0006 - Introduce CP_IGNORE_TLIST; optimization of 0001.

0007 - Use NULL relids set for the toplevel tlist upper rel. This
seems to be slightly faster than the other way. This is an
optimization of 0003.

It looks in my testing like this still underperforms master on your
test case. Do you get the same result? Any other ideas?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0007-Use-NULL-relids-set.patch

From e05b2daf4bd912d01ce2643dd4eec4874df8d34a Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 23 Mar 2018 23:01:54 -0400
Subject: [PATCH 7/7] Use NULL relids set.

---
 src/backend/optimizer/plan/planner.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 90eddf73fd..12375e2cf2 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -6772,9 +6772,13 @@ create_tlist_paths(PlannerInfo *root,
 	 * Create a new upper relation to represent the result of scan/join
 	 * projection.
 	 */
-	tlist_rel = fetch_upper_rel(root, UPPERREL_TLIST, input_rel->relids);
-	if (is_other_rel)
+	if (!is_other_rel)
+		tlist_rel = fetch_upper_rel(root, UPPERREL_TLIST, NULL);
+	else
+	{
+		tlist_rel = fetch_upper_rel(root, UPPERREL_TLIST, input_rel->relids);
 		tlist_rel->reloptkind = RELOPT_OTHER_UPPER_REL;
+	}
 	tlist_rel->rows = input_rel->rows;
 	tlist_rel->reltarget = llast_node(PathTarget, scanjoin_targets);
 	tlist_rel->consider_parallel = input_rel->consider_parallel &&
-- 
2.14.3 (Apple Git-98)

0006-CP_IGNORE_TLIST.patch

From ea9145c4b9faddb9fdc473d7d4d7f7ca1505def2 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 23 Mar 2018 21:32:36 -0400
Subject: [PATCH 6/7] CP_IGNORE_TLIST.

---
 src/backend/optimizer/plan/createplan.c | 98 +++++++++++++++++++++++----------
 1 file changed, 68 insertions(+), 30 deletions(-)

diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index b5107988d6..1fe936ee09 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -62,10 +62,14 @@
  * any sortgrouprefs specified in its pathtarget, with appropriate
  * ressortgroupref labels.  This is passed down by parent nodes such as Sort
  * and Group, which need these values to be available in their inputs.
+ *
+ * CP_IGNORE_TLIST specifies that the caller plans to replace the targetlist,
+ * and therefore it doesn't matter a bit what target list gets generated.
  */
 #define CP_EXACT_TLIST		0x0001	/* Plan must return specified tlist */
 #define CP_SMALL_TLIST		0x0002	/* Prefer narrower tlists */
 #define CP_LABEL_TLIST		0x0004	/* tlist must contain sortgrouprefs */
+#define CP_IGNORE_TLIST		0x0008	/* caller will replace tlist */
 
 
 static Plan *create_plan_recurse(PlannerInfo *root, Path *best_path,
@@ -566,8 +570,16 @@ create_scan_plan(PlannerInfo *root, Path *best_path, int flags)
 	 * only those Vars actually needed by the query), we prefer to generate a
 	 * tlist containing all Vars in order.  This will allow the executor to
 	 * optimize away projection of the table tuples, if possible.
+	 *
+	 * But if the caller is going to ignore our tlist anyway, then don't
+	 * bother generating one at all.  We use an exact equality test here,
+	 * so that this only applies when CP_IGNORE_TLIST is the only flag set.
 	 */
-	if (use_physical_tlist(root, best_path, flags))
+	if (flags == CP_IGNORE_TLIST)
+	{
+		tlist = NULL;
+	}
+	else if (use_physical_tlist(root, best_path, flags))
 	{
 		if (best_path->pathtype == T_IndexOnlyScan)
 		{
@@ -1578,44 +1590,70 @@ create_projection_plan(PlannerInfo *root, ProjectionPath *best_path, int flags)
 	Plan	   *plan;
 	Plan	   *subplan;
 	List	   *tlist;
-
-	/* Since we intend to project, we don't need to constrain child tlist */
-	subplan = create_plan_recurse(root, best_path->subpath, 0);
+	bool		needs_result_node = false;
 
 	/*
-	 * If our caller doesn't really care what tlist we return, then we might
-	 * not really need to project.  If use_physical_tlist returns false, then
-	 * we're obliged to project.  If it returns true, we can skip actually
-	 * projecting but must still correctly label the input path's tlist with
-	 * the sortgroupref information if the caller has so requested.
+	 * Convert our subpath to a Plan and determine whether we need a Result
+	 * node.
+	 *
+	 * In most cases where we don't need to project, create_projection_path
+	 * will have set dummypp, but not always.  First, some createplan.c
+	 * routines change the tlists of their nodes.  (An example is that
+	 * create_merge_append_plan might add resjunk sort columns to a
+	 * MergeAppend.)  Second, create_projection_path has no way of knowing
+	 * what path node will be placed on top of the projection path and
+	 * therefore can't predict whether it will require an exact tlist.
+	 * For both of these reasons, we have to recheck here.
 	 */
-	if (!use_physical_tlist(root, &best_path->path, flags))
-		tlist = build_path_tlist(root, &best_path->path);
-	else if ((flags & CP_LABEL_TLIST) != 0)
+	if (use_physical_tlist(root, &best_path->path, flags))
 	{
-		tlist = copyObject(subplan->targetlist);
-		apply_pathtarget_labeling_to_tlist(tlist, best_path->path.pathtarget);
+		/*
+		 * Our caller doesn't really care what tlist we return, so we don't
+		 * actually need to project.  However, we may still need to ensure
+		 * proper sortgroupref labels, if the caller cares about those.
+		 */
+		subplan = create_plan_recurse(root, best_path->subpath, 0);
+		if ((flags & CP_LABEL_TLIST) == 0)
+			tlist = subplan->targetlist;
+		else
+		{
+			tlist = copyObject(subplan->targetlist);
+			apply_pathtarget_labeling_to_tlist(tlist,
+											   best_path->path.pathtarget);
+		}
+	}
+	else if (is_projection_capable_path(best_path->subpath))
+	{
+		/*
+		 * Our caller requires that we return the exact tlist, but no separate
+		 * result node is needed because the subpath is projection-capable.
+		 * Tell create_plan_recurse that we're going to ignore the tlist it
+		 * produces.
+		 */
+		subplan = create_plan_recurse(root, best_path->subpath,
+									  CP_IGNORE_TLIST);
+		tlist = build_path_tlist(root, &best_path->path);
 	}
 	else
-		return subplan;
+	{
+		/*
+		 * It looks like we need a result node, unless by good fortune the
+		 * requested tlist is exactly the one the child wants to produce.
+		 */
+		subplan = create_plan_recurse(root, best_path->subpath, 0);
+		tlist = build_path_tlist(root, &best_path->path);
+		needs_result_node = !tlist_same_exprs(tlist, subplan->targetlist);
+	}
 
 	/*
-	 * We might not really need a Result node here, either because the subplan
-	 * can project or because it's returning the right list of expressions
-	 * anyway.  Usually create_projection_path will have detected that and set
-	 * dummypp if we don't need a Result; but its decision can't be final,
-	 * because some createplan.c routines change the tlists of their nodes.
-	 * (An example is that create_merge_append_plan might add resjunk sort
-	 * columns to a MergeAppend.)  So we have to recheck here.  If we do
-	 * arrive at a different answer than create_projection_path did, we'll
-	 * have made slightly wrong cost estimates; but label the plan with the
-	 * cost estimates we actually used, not "corrected" ones.  (XXX this could
-	 * be cleaned up if we moved more of the sortcolumn setup logic into Path
-	 * creation, but that would add expense to creating Paths we might end up
-	 * not using.)
+	 * If we make a different decision about whether to include a Result node
+	 * than create_projection_path did, we'll have made slightly wrong cost
+	 * estimates; but label the plan with the cost estimates we actually used,
+	 * not "corrected" ones.  (XXX this could be cleaned up if we moved more of
+	 * the sortcolumn setup logic into Path creation, but that would add
+	 * expense to creating Paths we might end up not using.)
 	 */
-	if (is_projection_capable_path(best_path->subpath) ||
-		tlist_same_exprs(tlist, subplan->targetlist))
+	if (!needs_result_node)
 	{
 		/* Don't need a separate Result, just assign tlist to subplan */
 		plan = subplan;
-- 
2.14.3 (Apple Git-98)
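
Patch 0006 turns create_projection_plan into a three-way decision and adds CP_IGNORE_TLIST, which create_scan_plan honors only when it is the sole flag set. The sketch below restates that control flow as plain C so the cases can be checked in isolation. It is not planner code: `ProjectionDecision`, `decide_projection` and `scan_can_skip_tlist` are hypothetical, and the two booleans stand in for the results of use_physical_tlist() and is_projection_capable_path(). The CP_* values are copied from the patch; sortgroupref labeling (CP_LABEL_TLIST) in the first branch is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as defined in createplan.c by the patch. */
#define CP_EXACT_TLIST		0x0001
#define CP_SMALL_TLIST		0x0002
#define CP_LABEL_TLIST		0x0004
#define CP_IGNORE_TLIST		0x0008

/* Hypothetical summary of what create_projection_plan decides. */
typedef struct ProjectionDecision
{
	int			child_flags;		/* flags passed to create_plan_recurse */
	bool		build_tlist;		/* call build_path_tlist() for our tlist? */
	bool		may_need_result;	/* Result node unless tlists already match */
} ProjectionDecision;

static ProjectionDecision
decide_projection(bool physical_tlist_ok, bool subpath_can_project)
{
	ProjectionDecision d = {0, false, false};

	if (physical_tlist_ok)
	{
		/* Caller doesn't care about the tlist: reuse the child's as-is. */
		d.child_flags = 0;
	}
	else if (subpath_can_project)
	{
		/* The child will project our tlist, so its own tlist is discarded. */
		d.child_flags = CP_IGNORE_TLIST;
		d.build_tlist = true;
	}
	else
	{
		/* Need our tlist and the child cannot project: maybe add a Result. */
		d.child_flags = 0;
		d.build_tlist = true;
		d.may_need_result = true;
	}
	return d;
}

/* Scan-level shortcut: exact equality, so any extra flag disables it. */
static bool
scan_can_skip_tlist(int flags)
{
	return flags == CP_IGNORE_TLIST;
}
```

Because the scan-level test is an equality rather than a bitwise check, a caller that also asks for labels or an exact tlist still gets one generated.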

0005-Remove-explicit-path-construction-logic-in-create_or.patch (application/octet-stream)

From f40a9fd834f23b916cf24b6582c8931186386146 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Tue, 13 Mar 2018 13:42:56 -0400
Subject: [PATCH 5/7] Remove explicit path construction logic in
 create_ordered_paths.

Instead of having create_ordered_paths build a path for an explicit
Sort and Gather Merge of the cheapest partial path, add an
explicitly-sorted path to tlist_rel.  If this path looks advantageous
from a cost point of view, the call to generate_gather_paths() for
tlist_rel will take care of building a Gather Merge path for it.

This improves the accuracy of costing for such paths because it
avoids using apply_projection_to_path, which has the disadvantage
of modifying a path in place after the cost has already been
determined.

Along the way, this fixes a mistake in gather_grouping_paths
introduced by commit e2f1eb0ee30d144628ab523432320f174a2c8966:
gather_grouping_paths should try to sort by the group_pathkeys
when operating on a partially grouped rel, but not when operating
on a fully grouped rel.  In that case, it needs to sort by whatever
the next ordering requirement will be.

Patch by me, reviewed by Amit Kapila.
---
 src/backend/optimizer/plan/planner.c | 104 +++++++++++++++--------------------
 1 file changed, 44 insertions(+), 60 deletions(-)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index c48e98643a..90eddf73fd 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -217,7 +217,8 @@ static RelOptInfo *create_partial_grouping_paths(PlannerInfo *root,
 							  grouping_sets_data *gd,
 							  GroupPathExtraData *extra,
 							  bool force_rel_creation);
-static void gather_grouping_paths(PlannerInfo *root, RelOptInfo *rel);
+static void gather_grouping_paths(PlannerInfo *root, RelOptInfo *rel,
+					  bool partial);
 static bool can_partial_agg(PlannerInfo *root,
 				const AggClauseCosts *agg_costs);
 static RelOptInfo *create_tlist_paths(PlannerInfo *root,
@@ -3999,7 +4000,7 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 	/* Gather any partially grouped partial paths. */
 	if (partially_grouped_rel && partially_grouped_rel->partial_pathlist)
 	{
-		gather_grouping_paths(root, partially_grouped_rel);
+		gather_grouping_paths(root, partially_grouped_rel, true);
 		set_cheapest(partially_grouped_rel);
 	}
 
@@ -4856,57 +4857,6 @@ create_ordered_paths(PlannerInfo *root,
 		}
 	}
 
-	/*
-	 * generate_gather_paths() will have already generated a simple Gather
-	 * path for the best parallel path, if any, and the loop above will have
-	 * considered sorting it.  Similarly, generate_gather_paths() will also
-	 * have generated order-preserving Gather Merge plans which can be used
-	 * without sorting if they happen to match the sort_pathkeys, and the loop
-	 * above will have handled those as well.  However, there's one more
-	 * possibility: it may make sense to sort the cheapest partial path
-	 * according to the required output order and then use Gather Merge.
-	 */
-	if (ordered_rel->consider_parallel && root->sort_pathkeys != NIL &&
-		input_rel->partial_pathlist != NIL)
-	{
-		Path	   *cheapest_partial_path;
-
-		cheapest_partial_path = linitial(input_rel->partial_pathlist);
-
-		/*
-		 * If cheapest partial path doesn't need a sort, this is redundant
-		 * with what's already been tried.
-		 */
-		if (!pathkeys_contained_in(root->sort_pathkeys,
-								   cheapest_partial_path->pathkeys))
-		{
-			Path	   *path;
-			double		total_groups;
-
-			path = (Path *) create_sort_path(root,
-											 ordered_rel,
-											 cheapest_partial_path,
-											 root->sort_pathkeys,
-											 limit_tuples);
-
-			total_groups = cheapest_partial_path->rows *
-				cheapest_partial_path->parallel_workers;
-			path = (Path *)
-				create_gather_merge_path(root, ordered_rel,
-										 path,
-										 path->pathtarget,
-										 root->sort_pathkeys, NULL,
-										 &total_groups);
-
-			/* Add projection step if needed */
-			if (path->pathtarget != target)
-				path = apply_projection_to_path(root, ordered_rel,
-												path, target);
-
-			add_path(ordered_rel, path);
-		}
-	}
-
 	/*
 	 * If there is an FDW that's responsible for all baserels of the query,
 	 * let it consider adding ForeignPaths.
@@ -6398,7 +6348,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
 	 * non-partial paths from each child.
 	 */
 	if (grouped_rel->partial_pathlist != NIL)
-		gather_grouping_paths(root, grouped_rel);
+		gather_grouping_paths(root, grouped_rel, false);
 }
 
 /*
@@ -6720,17 +6670,29 @@ create_partial_grouping_paths(PlannerInfo *root,
  * generate_gather_paths().
  */
 static void
-gather_grouping_paths(PlannerInfo *root, RelOptInfo *rel)
+gather_grouping_paths(PlannerInfo *root, RelOptInfo *rel, bool partial)
 {
 	Path	   *cheapest_partial_path;
+	List	   *pathkeys;
 
 	/* Try Gather for unordered paths and Gather Merge for ordered ones. */
 	generate_gather_paths(root, rel, true);
 
+	if (partial)
+		pathkeys = root->group_pathkeys;
+	else if (root->window_pathkeys)
+		pathkeys = root->window_pathkeys;
+	else if (list_length(root->distinct_pathkeys) >
+			 list_length(root->sort_pathkeys))
+		pathkeys = root->distinct_pathkeys;
+	else if (root->sort_pathkeys)
+		pathkeys = root->sort_pathkeys;
+	else
+		pathkeys = NIL;
+
 	/* Try cheapest partial path + explicit Sort + Gather Merge. */
 	cheapest_partial_path = linitial(rel->partial_pathlist);
-	if (!pathkeys_contained_in(root->group_pathkeys,
-							   cheapest_partial_path->pathkeys))
+	if (!pathkeys_contained_in(pathkeys, cheapest_partial_path->pathkeys))
 	{
 		Path	   *path;
 		double		total_groups;
@@ -6738,14 +6700,14 @@ gather_grouping_paths(PlannerInfo *root, RelOptInfo *rel)
 		total_groups =
 			cheapest_partial_path->rows * cheapest_partial_path->parallel_workers;
 		path = (Path *) create_sort_path(root, rel, cheapest_partial_path,
-										 root->group_pathkeys,
+										 pathkeys,
 										 -1.0);
 		path = (Path *)
 			create_gather_merge_path(root,
 									 rel,
 									 path,
 									 rel->reltarget,
-									 root->group_pathkeys,
+									 pathkeys,
 									 NULL,
 									 &total_groups);
 
@@ -6934,8 +6896,10 @@ create_tlist_paths(PlannerInfo *root,
 		 * paths for the tlist_rel; these may be useful to upper planning
 		 * stages.
 		 */
-		if (tlist_rel->consider_parallel)
+		if (tlist_rel->consider_parallel && input_rel->partial_pathlist != NIL)
 		{
+			Path	   *cheapest_partial_path;
+
 			/* Apply the scan/join target to each partial path */
 			foreach(lc, input_rel->partial_pathlist)
 			{
@@ -6951,6 +6915,26 @@ create_tlist_paths(PlannerInfo *root,
 														  scanjoin_target);
 				add_partial_path(tlist_rel, newpath);
 			}
+
+			/* Also try explicitly sorting the cheapest path. */
+			cheapest_partial_path = linitial(input_rel->partial_pathlist);
+			if (!pathkeys_contained_in(root->query_pathkeys,
+									   cheapest_partial_path->pathkeys)
+				&& !is_other_rel)
+			{
+				Path	   *path;
+
+				path = (Path *) create_projection_path(root,
+													   tlist_rel,
+													   cheapest_partial_path,
+													   scanjoin_target);
+				path = (Path *) create_sort_path(root,
+												 tlist_rel,
+												 path,
+												 root->query_pathkeys,
+												 -1);
+				add_partial_path(tlist_rel, path);
+			}
 		}
 
 		/* Now fix things up if scan/join target contains SRFs */
-- 
2.14.3 (Apple Git-98)
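
After patch 0005, gather_grouping_paths no longer always sorts by group_pathkeys; for a fully grouped rel it picks the ordering the next planning step will want. A sketch of just that priority order follows, assuming each pathkey list can be summarized by its length (0 meaning NIL). `QueryPathkeys`, `PathkeySource` and `choose_gather_pathkeys` are hypothetical names; the real function works on the List * fields of PlannerInfo.

```c
#include <assert.h>
#include <stdbool.h>

/* Which of root's pathkey lists would be requested. */
typedef enum
{
	PK_NONE,
	PK_GROUP,
	PK_WINDOW,
	PK_DISTINCT,
	PK_SORT
} PathkeySource;

/* Hypothetical summary of the planner's pathkey lists: lengths, 0 = NIL. */
typedef struct QueryPathkeys
{
	int			group_len;
	int			window_len;
	int			distinct_len;
	int			sort_len;
} QueryPathkeys;

static PathkeySource
choose_gather_pathkeys(const QueryPathkeys *q, bool partial)
{
	/* Partially grouped rel: the Finalize step needs group order. */
	if (partial)
		return PK_GROUP;
	/* Fully grouped rel: sort for whatever comes next. */
	if (q->window_len > 0)
		return PK_WINDOW;
	if (q->distinct_len > q->sort_len)
		return PK_DISTINCT;
	if (q->sort_len > 0)
		return PK_SORT;
	return PK_NONE;
}
```

When `partial` is true the group pathkeys are chosen even if they are NIL, as in the patch; an empty list is trivially contained in any path's pathkeys, so no explicit Sort is added in that case.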

0004-Postpone-generate_gather_paths-for-topmost-scan-join.patch (application/octet-stream)

From edd69ad781c2ec8ef96b4b2b1e46a6be354cdfd4 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Mon, 12 Mar 2018 16:45:15 -0400
Subject: [PATCH 4/7] Postpone generate_gather_paths for topmost scan/join rel.

Don't call generate_gather_paths for the topmost scan/join relation
when it is initially populated with paths.  If the scan/join target
is parallel-safe, we actually skip this for the topmost scan/join rel
altogether and instead do it for the tlist_rel, so that the
projection is done in the worker and costs are computed accordingly.

Amit Kapila and Robert Haas
---
 src/backend/optimizer/geqo/geqo_eval.c | 21 ++++++++++++++-------
 src/backend/optimizer/path/allpaths.c  | 26 +++++++++++++++++++-------
 src/backend/optimizer/plan/planner.c   | 24 +++++++++++++++++++++++-
 3 files changed, 56 insertions(+), 15 deletions(-)

diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index 0be2a73e05..3ef7d7d8aa 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -40,7 +40,7 @@ typedef struct
 } Clump;
 
 static List *merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump,
-			bool force);
+			int num_gene, bool force);
 static bool desirable_join(PlannerInfo *root,
 			   RelOptInfo *outer_rel, RelOptInfo *inner_rel);
 
@@ -196,7 +196,7 @@ gimme_tree(PlannerInfo *root, Gene *tour, int num_gene)
 		cur_clump->size = 1;
 
 		/* Merge it into the clumps list, using only desirable joins */
-		clumps = merge_clump(root, clumps, cur_clump, false);
+		clumps = merge_clump(root, clumps, cur_clump, num_gene, false);
 	}
 
 	if (list_length(clumps) > 1)
@@ -210,7 +210,7 @@ gimme_tree(PlannerInfo *root, Gene *tour, int num_gene)
 		{
 			Clump	   *clump = (Clump *) lfirst(lc);
 
-			fclumps = merge_clump(root, fclumps, clump, true);
+			fclumps = merge_clump(root, fclumps, clump, num_gene, true);
 		}
 		clumps = fclumps;
 	}
@@ -235,7 +235,8 @@ gimme_tree(PlannerInfo *root, Gene *tour, int num_gene)
  * "desirable" joins.
  */
 static List *
-merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
+merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
+			bool force)
 {
 	ListCell   *prev;
 	ListCell   *lc;
@@ -267,8 +268,14 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
 				/* Create paths for partitionwise joins. */
 				generate_partitionwise_join_paths(root, joinrel);
 
-				/* Create GatherPaths for any useful partial paths for rel */
-				generate_gather_paths(root, joinrel, false);
+				/*
+				 * Except for the topmost scan/join rel, consider gathering
+				 * partial paths.  We'll do the same for the topmost scan/join
+				 * rel once we know the final targetlist (see
+				 * grouping_planner).
+				 */
+				if (old_clump->size + new_clump->size < num_gene)
+					generate_gather_paths(root, joinrel, false);
 
 				/* Find and save the cheapest paths for this joinrel */
 				set_cheapest(joinrel);
@@ -286,7 +293,7 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
 				 * others.  When no further merge is possible, we'll reinsert
 				 * it into the list.
 				 */
-				return merge_clump(root, clumps, old_clump, force);
+				return merge_clump(root, clumps, old_clump, num_gene, force);
 			}
 		}
 		prev = lc;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 43f4e75748..c4e4db15a6 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -479,13 +479,20 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	}
 
 	/*
-	 * If this is a baserel, consider gathering any partial paths we may have
-	 * created for it.  (If we tried to gather inheritance children, we could
+	 * If this is a baserel, we should normally consider gathering any partial
+	 * paths we may have created for it.
+	 *
+	 * However, if this is an inheritance child, skip it.  Otherwise, we could
 	 * end up with a very large number of gather nodes, each trying to grab
-	 * its own pool of workers, so don't do this for otherrels.  Instead,
-	 * we'll consider gathering partial paths for the parent appendrel.)
+	 * its own pool of workers.  Instead, we'll consider gathering partial
+	 * paths for the parent appendrel.
+	 *
+	 * Also, if this is the topmost scan/join rel (that is, the only baserel),
+	 * we postpone this until the final scan/join targetlist is available (see
+	 * grouping_planner).
 	 */
-	if (rel->reloptkind == RELOPT_BASEREL)
+	if (rel->reloptkind == RELOPT_BASEREL &&
+		bms_membership(root->all_baserels) != BMS_SINGLETON)
 		generate_gather_paths(root, rel, false);
 
 	/*
@@ -2699,8 +2706,13 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
 			/* Create paths for partitionwise joins. */
 			generate_partitionwise_join_paths(root, rel);
 
-			/* Create GatherPaths for any useful partial paths for rel */
-			generate_gather_paths(root, rel, false);
+			/*
+			 * Except for the topmost scan/join rel, consider gathering
+			 * partial paths.  We'll do the same for the topmost scan/join rel
+			 * once we know the final targetlist (see grouping_planner).
+			 */
+			if (lev < levels_needed)
+				generate_gather_paths(root, rel, false);
 
 			/* Find and save the cheapest paths for this rel */
 			set_cheapest(rel);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 1e1b363402..c48e98643a 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -1970,6 +1970,17 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 			scanjoin_targets_contain_srfs = NIL;
 		}
 
+		/*
+		 * If the final scan/join target is not parallel-safe, we must
+		 * generate Gather paths now, since no partial paths will be generated
+		 * for the tlist rel.  Otherwise, the Gather or Gather Merge paths
+		 * generated for the tlist rel will be superior to these in that
+		 * projection will be done by each participant rather than only in
+		 * the leader, so skip generating them here.
+		 */
+		if (!scanjoin_target_parallel_safe)
+			generate_gather_paths(root, current_rel, false);
+
 		/*
 		 * Apply SRF-free scan/join target to all Paths for the scanjoin rel
 		 * to produce paths for the tlist rel.
@@ -6791,6 +6802,7 @@ create_tlist_paths(PlannerInfo *root,
 {
 	ListCell   *lc;
 	RelOptInfo *tlist_rel;
+	bool		is_other_rel = IS_OTHER_REL(input_rel);
 
 	check_stack_depth();
 
@@ -6799,7 +6811,7 @@ create_tlist_paths(PlannerInfo *root,
 	 * projection.
 	 */
 	tlist_rel = fetch_upper_rel(root, UPPERREL_TLIST, input_rel->relids);
-	if (IS_OTHER_REL(input_rel))
+	if (is_other_rel)
 		tlist_rel->reloptkind = RELOPT_OTHER_UPPER_REL;
 	tlist_rel->rows = input_rel->rows;
 	tlist_rel->reltarget = llast_node(PathTarget, scanjoin_targets);
@@ -6948,6 +6960,16 @@ create_tlist_paths(PlannerInfo *root,
 								  scanjoin_targets_contain_srfs);
 	}
 
+	/*
+	 * Consider generating Gather or Gather Merge paths.  We must only do this
+	 * if the relation is parallel safe, and we don't do it for child rels to
+	 * avoid creating multiple Gather nodes within the same plan. Also, we
+	 * must do it before calling set_cheapest, since one of the generated
+	 * paths may turn out to be the cheapest one.
+	 */
+	if (tlist_rel->consider_parallel && !is_other_rel)
+		generate_gather_paths(root, tlist_rel, false);
+
 	/* Determine cheapest paths, for the benefit of future planning steps. */
 	set_cheapest(tlist_rel);
 
-- 
2.14.3 (Apple Git-98)
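
Patch 0004 changes where Gather paths are generated for the topmost scan/join rel. The conditions it adds can be summarized by the small predicates below. They are a sketch, not planner code: every function and enum name is hypothetical, and the consider_parallel and inheritance-child checks are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* set_rel_pathlist: a plain baserel gathers early unless it is the only one. */
static bool
gather_for_baserel(bool is_plain_baserel, int num_baserels)
{
	return is_plain_baserel && num_baserels != 1;
}

/* standard_join_search: gather at every join level except the last. */
static bool
gather_at_join_level(int lev, int levels_needed)
{
	return lev < levels_needed;
}

/* GEQO merge_clump: a clump covering every gene is the topmost rel. */
static bool
gather_in_geqo_merge(int old_size, int new_size, int num_gene)
{
	return old_size + new_size < num_gene;
}

/* Deferred choice for the topmost rel, made once the target is known. */
typedef enum
{
	GATHER_BELOW_TLIST,		/* grouping_planner gathers the scan/join rel */
	GATHER_ABOVE_TLIST		/* create_tlist_paths gathers the tlist rel */
} TopGatherPlacement;

static TopGatherPlacement
place_top_gather(bool scanjoin_target_parallel_safe)
{
	return scanjoin_target_parallel_safe ? GATHER_ABOVE_TLIST
		: GATHER_BELOW_TLIST;
}
```

Deferring the topmost decision until the scan/join target is known is what lets a parallel-safe target list be evaluated in the workers rather than only in the leader.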

0003-Add-new-upper-rel-to-represent-projecting-toplevel-s.patch (application/octet-stream)

From d077b7fab8c771f52fe24496b2d943742f189d20 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Mon, 12 Mar 2018 13:16:30 -0400
Subject: [PATCH 3/7] Add new upper rel to represent projecting toplevel
 scan/join rel.

UPPERREL_TLIST represents the result of applying the scan/join target
to the final scan/join relation.  This requires us to use
create_projection_path() rather than apply_projection_to_path()
when projecting non-partial paths from the scan/join relation, but
it also enables us to avoid needing to modify those paths in place.

It also avoids the need to clear the topmost scan/join rel's
partial_pathlist when the scan/join target is not parallel-safe, which
is a sort of hack; see commit 3bf05e096b9f8375e640c5d7996aa57efd7f240c
for an example of a previous fix that eliminated a similar hack.

This also cleans up what appears to be incorrect SRF handling
introduced in commit e2f1eb0ee30d144628ab523432320f174a2c8966: the old
code had no knowledge of SRFs for child scan/join rels.

Patch by me, reviewed by Amit Kapila.
---
 src/backend/optimizer/plan/planner.c         | 262 +++++----
 src/include/nodes/relation.h                 |   1 +
 src/test/regress/expected/partition_join.out | 772 +++++++++++++--------------
 3 files changed, 542 insertions(+), 493 deletions(-)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 50f858e420..1e1b363402 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -220,11 +220,11 @@ static RelOptInfo *create_partial_grouping_paths(PlannerInfo *root,
 static void gather_grouping_paths(PlannerInfo *root, RelOptInfo *rel);
 static bool can_partial_agg(PlannerInfo *root,
 				const AggClauseCosts *agg_costs);
-static void apply_scanjoin_target_to_paths(PlannerInfo *root,
-							   RelOptInfo *rel,
-							   PathTarget *scanjoin_target,
-							   bool scanjoin_target_parallel_safe,
-							   bool modify_in_place);
+static RelOptInfo *create_tlist_paths(PlannerInfo *root,
+				   RelOptInfo *input_rel,
+				   List *scanjoin_targets,
+				   List *scanjoin_targets_contain_srfs,
+				   bool scanjoin_target_parallel_safe);
 static void create_partitionwise_grouping_paths(PlannerInfo *root,
 									RelOptInfo *input_rel,
 									RelOptInfo *grouped_rel,
@@ -1962,25 +1962,21 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 		}
 		else
 		{
-			/* initialize lists, just to keep compiler quiet */
+			/* initialize lists; for most of these, dummy values are OK */
 			final_targets = final_targets_contain_srfs = NIL;
 			sort_input_targets = sort_input_targets_contain_srfs = NIL;
 			grouping_targets = grouping_targets_contain_srfs = NIL;
-			scanjoin_targets = scanjoin_targets_contain_srfs = NIL;
+			scanjoin_targets = list_make1(scanjoin_target);
+			scanjoin_targets_contain_srfs = NIL;
 		}
 
 		/*
-		 * Forcibly apply SRF-free scan/join target to all the Paths for the
-		 * scan/join rel.
+		 * Apply SRF-free scan/join target to all Paths for the scanjoin rel
+		 * to produce paths for the tlist rel.
 		 */
-		apply_scanjoin_target_to_paths(root, current_rel, scanjoin_target,
-									   scanjoin_target_parallel_safe, true);
-
-		/* Now fix things up if scan/join target contains SRFs */
-		if (parse->hasTargetSRFs)
-			adjust_paths_for_srfs(root, current_rel,
-								  scanjoin_targets,
-								  scanjoin_targets_contain_srfs);
+		current_rel = create_tlist_paths(root, current_rel, scanjoin_targets,
+										 scanjoin_targets_contain_srfs,
+										 scanjoin_target_parallel_safe);
 
 		/*
 		 * Save the various upper-rel PathTargets we just computed into
@@ -1992,6 +1988,7 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 		root->upper_targets[UPPERREL_FINAL] = final_target;
 		root->upper_targets[UPPERREL_WINDOW] = sort_input_target;
 		root->upper_targets[UPPERREL_GROUP_AGG] = grouping_target;
+		root->upper_targets[UPPERREL_TLIST] = scanjoin_target;
 
 		/*
 		 * If we have grouping and/or aggregation, consider ways to implement
@@ -6780,97 +6777,181 @@ can_partial_agg(PlannerInfo *root, const AggClauseCosts *agg_costs)
 }
 
 /*
- * apply_scanjoin_target_to_paths
+ * create_tlist_paths
  *
- * Applies scan/join target to all the Paths for the scan/join rel.
+ * Build up a new relation representing the result of applying the final
+ * scan/join targetlist to the paths returned for the topmost scan/join rel.
  */
-static void
-apply_scanjoin_target_to_paths(PlannerInfo *root,
-							   RelOptInfo *rel,
-							   PathTarget *scanjoin_target,
-							   bool scanjoin_target_parallel_safe,
-							   bool modify_in_place)
+static RelOptInfo *
+create_tlist_paths(PlannerInfo *root,
+				   RelOptInfo *input_rel,
+				   List *scanjoin_targets,
+				   List *scanjoin_targets_contain_srfs,
+				   bool scanjoin_target_parallel_safe)
 {
 	ListCell   *lc;
+	RelOptInfo *tlist_rel;
+
+	check_stack_depth();
 
 	/*
-	 * In principle we should re-run set_cheapest() here to identify the
-	 * cheapest path, but it seems unlikely that adding the same tlist eval
-	 * costs to all the paths would change that, so we don't bother. Instead,
-	 * just assume that the cheapest-startup and cheapest-total paths remain
-	 * so.  (There should be no parameterized paths anymore, so we needn't
-	 * worry about updating cheapest_parameterized_paths.)
+	 * Create a new upper relation to represent the result of scan/join
+	 * projection.
 	 */
-	foreach(lc, rel->pathlist)
+	tlist_rel = fetch_upper_rel(root, UPPERREL_TLIST, input_rel->relids);
+	if (IS_OTHER_REL(input_rel))
+		tlist_rel->reloptkind = RELOPT_OTHER_UPPER_REL;
+	tlist_rel->rows = input_rel->rows;
+	tlist_rel->reltarget = llast_node(PathTarget, scanjoin_targets);
+	tlist_rel->consider_parallel = input_rel->consider_parallel &&
+		scanjoin_target_parallel_safe;
+
+	/*
+	 * If the input rel belongs to a single FDW, so does the tlist rel.
+	 */
+	tlist_rel->serverid = input_rel->serverid;
+	tlist_rel->userid = input_rel->userid;
+	tlist_rel->useridiscurrent = input_rel->useridiscurrent;
+	tlist_rel->fdwroutine = input_rel->fdwroutine;
+
+	/* If the input rel is dummy, so is this. */
+	if (IS_DUMMY_REL(input_rel))
 	{
-		Path	   *subpath = (Path *) lfirst(lc);
-		Path	   *newpath;
+		mark_dummy_rel(tlist_rel);
 
-		Assert(subpath->param_info == NULL);
+		return tlist_rel;
+	}
+
+	/*
+	 * If the input rel is partitioned, so is the tlist rel.  And, in fact, we
+	 * want to do the projection steps on a per-partition basis in this case.
+	 * Since Append is not projection-capable, that might save a separate
+	 * Result node, and it also is important for partitionwise aggregate.
+	 */
+	if (IS_PARTITIONED_REL(input_rel))
+	{
+		int			nparts = input_rel->nparts;
+		int			partition_idx;
+		List	   *live_children = NIL;
 
 		/*
-		 * Don't use apply_projection_to_path() when modify_in_place is false,
-		 * because there could be other pointers to these paths, and therefore
-		 * we mustn't modify them in place.
+		 * Copy partitioning properties from underlying relation -- except for
+		 * part_rels.
 		 */
-		if (modify_in_place)
-			newpath = apply_projection_to_path(root, rel, subpath,
-											   scanjoin_target);
-		else
-			newpath = (Path *) create_projection_path(root, rel, subpath,
-													  scanjoin_target);
+		tlist_rel->part_scheme = input_rel->part_scheme;
+		tlist_rel->nparts = input_rel->nparts;
+		tlist_rel->boundinfo = input_rel->boundinfo;
+		tlist_rel->partexprs = input_rel->partexprs;
+		tlist_rel->nullable_partexprs = input_rel->nullable_partexprs;
 
-		/* If we had to add a Result, newpath is different from subpath */
-		if (newpath != subpath)
+		/*
+		 * Populate part_rels by generating a child tlist rel for each child
+		 * input rel.
+		 */
+		tlist_rel->part_rels =
+			palloc(tlist_rel->nparts * sizeof(RelOptInfo *));
+		for (partition_idx = 0; partition_idx < nparts; partition_idx++)
 		{
-			lfirst(lc) = newpath;
-			if (subpath == rel->cheapest_startup_path)
-				rel->cheapest_startup_path = newpath;
-			if (subpath == rel->cheapest_total_path)
-				rel->cheapest_total_path = newpath;
+			RelOptInfo *child_input_rel = input_rel->part_rels[partition_idx];
+			RelOptInfo *child_tlist_rel;
+			ListCell   *lc;
+			AppendRelInfo **appinfos;
+			int			nappinfos;
+			List	   *child_scanjoin_targets = NIL;
+
+			/* Translate scan/join targets for this child. */
+			appinfos = find_appinfos_by_relids(root, child_input_rel->relids,
+											   &nappinfos);
+			foreach(lc, scanjoin_targets)
+			{
+				PathTarget *target = lfirst_node(PathTarget, lc);
+
+				target = copy_pathtarget(target);
+				target->exprs = (List *)
+					adjust_appendrel_attrs(root,
+										   (Node *) target->exprs,
+										   nappinfos, appinfos);
+				child_scanjoin_targets = lappend(child_scanjoin_targets,
+												 target);
+			}
+			pfree(appinfos);
+
+			/* Now we can build the child rel. */
+			child_tlist_rel =
+				create_tlist_paths(root, child_input_rel,
+								   child_scanjoin_targets,
+								   scanjoin_targets_contain_srfs,
+								   scanjoin_target_parallel_safe);
+			tlist_rel->part_rels[partition_idx] = child_tlist_rel;
+
+			/* Save non-dummy children for Append paths. */
+			if (!IS_DUMMY_REL(child_tlist_rel))
+				live_children = lappend(live_children, child_tlist_rel);
 		}
-	}
 
-	/*
-	 * Upper planning steps which make use of the top scan/join rel's partial
-	 * pathlist will expect partial paths for that rel to produce the same
-	 * output as complete paths ... and we just changed the output for the
-	 * complete paths, so we'll need to do the same thing for partial paths.
-	 * But only parallel-safe expressions can be computed by partial paths.
-	 */
-	if (rel->partial_pathlist && scanjoin_target_parallel_safe)
+		/* Build paths for this relation by appending child paths. */
+		add_paths_to_append_rel(root, tlist_rel, live_children);
+	}
+	else
 	{
-		/* Apply the scan/join target to each partial path */
-		foreach(lc, rel->partial_pathlist)
+		PathTarget *scanjoin_target;
+
+		/* Extract SRF-free scan/join target. */
+		scanjoin_target = linitial_node(PathTarget, scanjoin_targets);
+
+		/*
+		 * This is not a partitioned relation, so just create a projection
+		 * path for each input path, in each case applying the SRF-free
+		 * scan/join target.
+		 */
+		foreach(lc, input_rel->pathlist)
 		{
 			Path	   *subpath = (Path *) lfirst(lc);
 			Path	   *newpath;
 
-			/* Shouldn't have any parameterized paths anymore */
 			Assert(subpath->param_info == NULL);
 
-			/*
-			 * Don't use apply_projection_to_path() here, because there could
-			 * be other pointers to these paths, and therefore we mustn't
-			 * modify them in place.
-			 */
-			newpath = (Path *) create_projection_path(root,
-													  rel,
-													  subpath,
+			newpath = (Path *) create_projection_path(root, tlist_rel, subpath,
 													  scanjoin_target);
-			lfirst(lc) = newpath;
+
+			add_path(tlist_rel, newpath);
 		}
-	}
-	else
-	{
+
 		/*
-		 * In the unfortunate event that scanjoin_target is not parallel-safe,
-		 * we can't apply it to the partial paths; in that case, we'll need to
-		 * forget about the partial paths, which aren't valid input for upper
-		 * planning steps.
+		 * If parallel query is possible at this level, also generate partial
+		 * paths for the tlist_rel; these may be useful to upper planning
+		 * stages.
 		 */
-		rel->partial_pathlist = NIL;
+		if (tlist_rel->consider_parallel)
+		{
+			/* Apply the scan/join target to each partial path */
+			foreach(lc, input_rel->partial_pathlist)
+			{
+				Path	   *subpath = (Path *) lfirst(lc);
+				Path	   *newpath;
+
+				/* Shouldn't have any parameterized paths anymore */
+				Assert(subpath->param_info == NULL);
+
+				newpath = (Path *) create_projection_path(root,
+														  tlist_rel,
+														  subpath,
+														  scanjoin_target);
+				add_partial_path(tlist_rel, newpath);
+			}
+		}
+
+		/* Now fix things up if scan/join target contains SRFs */
+		if (root->parse->hasTargetSRFs)
+			adjust_paths_for_srfs(root, tlist_rel,
+								  scanjoin_targets,
+								  scanjoin_targets_contain_srfs);
 	}
+
+	/* Determine cheapest paths, for the benefit of future planning steps. */
+	set_cheapest(tlist_rel);
+
+	return tlist_rel;
 }
 
 /*
@@ -6917,7 +6998,6 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
 		PathTarget *child_target = copy_pathtarget(target);
 		AppendRelInfo **appinfos;
 		int			nappinfos;
-		PathTarget *scanjoin_target;
 		GroupPathExtraData child_extra;
 		RelOptInfo *child_grouped_rel;
 		RelOptInfo *child_partially_grouped_rel;
@@ -6974,26 +7054,6 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
 			continue;
 		}
 
-		/*
-		 * Copy pathtarget from underneath scan/join as we are modifying it
-		 * and translate its Vars with respect to this appendrel.  The input
-		 * relation's reltarget might not be the final scanjoin_target, but
-		 * the pathtarget any given individual path should be.
-		 */
-		scanjoin_target =
-			copy_pathtarget(input_rel->cheapest_startup_path->pathtarget);
-		scanjoin_target->exprs = (List *)
-			adjust_appendrel_attrs(root,
-								   (Node *) scanjoin_target->exprs,
-								   nappinfos, appinfos);
-
-		/*
-		 * Forcibly apply scan/join target to all the Paths for the scan/join
-		 * rel.
-		 */
-		apply_scanjoin_target_to_paths(root, child_input_rel, scanjoin_target,
-									   extra->target_parallel_safe, false);
-
 		/* Create grouping paths for this child relation. */
 		create_ordinary_grouping_paths(root, child_input_rel,
 									   child_grouped_rel,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index abbbda9e91..d4bffbc281 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -71,6 +71,7 @@ typedef struct AggClauseCosts
 typedef enum UpperRelationKind
 {
 	UPPERREL_SETOP,				/* result of UNION/INTERSECT/EXCEPT, if any */
+	UPPERREL_TLIST,				/* result of projecting final scan/join rel */
 	UPPERREL_PARTIAL_GROUP_AGG, /* result of partial grouping/aggregation, if
 								 * any */
 	UPPERREL_GROUP_AGG,			/* result of grouping/aggregation, if any */
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 4fccd9ae54..b983f9c506 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -65,31 +65,30 @@ SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1, prt2 t2 WHERE t1.a = t2.b AND t1.b =
 -- left outer join, with whole-row reference
 EXPLAIN (COSTS OFF)
 SELECT t1, t2 FROM prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b WHERE t1.b = 0 ORDER BY t1.a, t2.b;
-                       QUERY PLAN                       
---------------------------------------------------------
+                    QUERY PLAN                    
+--------------------------------------------------
  Sort
    Sort Key: t1.a, t2.b
-   ->  Result
-         ->  Append
-               ->  Hash Right Join
-                     Hash Cond: (t2.b = t1.a)
-                     ->  Seq Scan on prt2_p1 t2
-                     ->  Hash
-                           ->  Seq Scan on prt1_p1 t1
-                                 Filter: (b = 0)
-               ->  Hash Right Join
-                     Hash Cond: (t2_1.b = t1_1.a)
-                     ->  Seq Scan on prt2_p2 t2_1
-                     ->  Hash
-                           ->  Seq Scan on prt1_p2 t1_1
-                                 Filter: (b = 0)
-               ->  Hash Right Join
-                     Hash Cond: (t2_2.b = t1_2.a)
-                     ->  Seq Scan on prt2_p3 t2_2
-                     ->  Hash
-                           ->  Seq Scan on prt1_p3 t1_2
-                                 Filter: (b = 0)
-(22 rows)
+   ->  Append
+         ->  Hash Right Join
+               Hash Cond: (t2.b = t1.a)
+               ->  Seq Scan on prt2_p1 t2
+               ->  Hash
+                     ->  Seq Scan on prt1_p1 t1
+                           Filter: (b = 0)
+         ->  Hash Right Join
+               Hash Cond: (t2_1.b = t1_1.a)
+               ->  Seq Scan on prt2_p2 t2_1
+               ->  Hash
+                     ->  Seq Scan on prt1_p2 t1_1
+                           Filter: (b = 0)
+         ->  Hash Right Join
+               Hash Cond: (t2_2.b = t1_2.a)
+               ->  Seq Scan on prt2_p3 t2_2
+               ->  Hash
+                     ->  Seq Scan on prt1_p3 t1_2
+                           Filter: (b = 0)
+(21 rows)
 
 SELECT t1, t2 FROM prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b WHERE t1.b = 0 ORDER BY t1.a, t2.b;
       t1      |      t2      
@@ -111,30 +110,29 @@ SELECT t1, t2 FROM prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b WHERE t1.b = 0 ORDER
 -- right outer join
 EXPLAIN (COSTS OFF)
 SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1 RIGHT JOIN prt2 t2 ON t1.a = t2.b WHERE t2.a = 0 ORDER BY t1.a, t2.b;
-                             QUERY PLAN                              
----------------------------------------------------------------------
+                          QUERY PLAN                           
+---------------------------------------------------------------
  Sort
    Sort Key: t1.a, t2.b
-   ->  Result
-         ->  Append
-               ->  Hash Right Join
-                     Hash Cond: (t1.a = t2.b)
-                     ->  Seq Scan on prt1_p1 t1
-                     ->  Hash
-                           ->  Seq Scan on prt2_p1 t2
-                                 Filter: (a = 0)
-               ->  Hash Right Join
-                     Hash Cond: (t1_1.a = t2_1.b)
-                     ->  Seq Scan on prt1_p2 t1_1
-                     ->  Hash
-                           ->  Seq Scan on prt2_p2 t2_1
-                                 Filter: (a = 0)
-               ->  Nested Loop Left Join
-                     ->  Seq Scan on prt2_p3 t2_2
+   ->  Append
+         ->  Hash Right Join
+               Hash Cond: (t1.a = t2.b)
+               ->  Seq Scan on prt1_p1 t1
+               ->  Hash
+                     ->  Seq Scan on prt2_p1 t2
                            Filter: (a = 0)
-                     ->  Index Scan using iprt1_p3_a on prt1_p3 t1_2
-                           Index Cond: (a = t2_2.b)
-(21 rows)
+         ->  Hash Right Join
+               Hash Cond: (t1_1.a = t2_1.b)
+               ->  Seq Scan on prt1_p2 t1_1
+               ->  Hash
+                     ->  Seq Scan on prt2_p2 t2_1
+                           Filter: (a = 0)
+         ->  Nested Loop Left Join
+               ->  Seq Scan on prt2_p3 t2_2
+                     Filter: (a = 0)
+               ->  Index Scan using iprt1_p3_a on prt1_p3 t1_2
+                     Index Cond: (a = t2_2.b)
+(20 rows)
 
 SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1 RIGHT JOIN prt2 t2 ON t1.a = t2.b WHERE t2.a = 0 ORDER BY t1.a, t2.b;
   a  |  c   |  b  |  c   
@@ -375,37 +373,36 @@ EXPLAIN (COSTS OFF)
 SELECT * FROM prt1 t1 LEFT JOIN LATERAL
 			  (SELECT t2.a AS t2a, t3.a AS t3a, least(t1.a,t2.a,t3.b) FROM prt1 t2 JOIN prt2 t3 ON (t2.a = t3.b)) ss
 			  ON t1.a = ss.t2a WHERE t1.b = 0 ORDER BY t1.a;
-                                   QUERY PLAN                                   
---------------------------------------------------------------------------------
+                                QUERY PLAN                                
+--------------------------------------------------------------------------
  Sort
    Sort Key: t1.a
-   ->  Result
-         ->  Append
-               ->  Nested Loop Left Join
-                     ->  Seq Scan on prt1_p1 t1
-                           Filter: (b = 0)
-                     ->  Nested Loop
-                           ->  Index Only Scan using iprt1_p1_a on prt1_p1 t2
-                                 Index Cond: (a = t1.a)
-                           ->  Index Scan using iprt2_p1_b on prt2_p1 t3
-                                 Index Cond: (b = t2.a)
-               ->  Nested Loop Left Join
-                     ->  Seq Scan on prt1_p2 t1_1
-                           Filter: (b = 0)
-                     ->  Nested Loop
-                           ->  Index Only Scan using iprt1_p2_a on prt1_p2 t2_1
-                                 Index Cond: (a = t1_1.a)
-                           ->  Index Scan using iprt2_p2_b on prt2_p2 t3_1
-                                 Index Cond: (b = t2_1.a)
-               ->  Nested Loop Left Join
-                     ->  Seq Scan on prt1_p3 t1_2
-                           Filter: (b = 0)
-                     ->  Nested Loop
-                           ->  Index Only Scan using iprt1_p3_a on prt1_p3 t2_2
-                                 Index Cond: (a = t1_2.a)
-                           ->  Index Scan using iprt2_p3_b on prt2_p3 t3_2
-                                 Index Cond: (b = t2_2.a)
-(28 rows)
+   ->  Append
+         ->  Nested Loop Left Join
+               ->  Seq Scan on prt1_p1 t1
+                     Filter: (b = 0)
+               ->  Nested Loop
+                     ->  Index Only Scan using iprt1_p1_a on prt1_p1 t2
+                           Index Cond: (a = t1.a)
+                     ->  Index Scan using iprt2_p1_b on prt2_p1 t3
+                           Index Cond: (b = t2.a)
+         ->  Nested Loop Left Join
+               ->  Seq Scan on prt1_p2 t1_1
+                     Filter: (b = 0)
+               ->  Nested Loop
+                     ->  Index Only Scan using iprt1_p2_a on prt1_p2 t2_1
+                           Index Cond: (a = t1_1.a)
+                     ->  Index Scan using iprt2_p2_b on prt2_p2 t3_1
+                           Index Cond: (b = t2_1.a)
+         ->  Nested Loop Left Join
+               ->  Seq Scan on prt1_p3 t1_2
+                     Filter: (b = 0)
+               ->  Nested Loop
+                     ->  Index Only Scan using iprt1_p3_a on prt1_p3 t2_2
+                           Index Cond: (a = t1_2.a)
+                     ->  Index Scan using iprt2_p3_b on prt2_p3 t3_2
+                           Index Cond: (b = t2_2.a)
+(27 rows)
 
 SELECT * FROM prt1 t1 LEFT JOIN LATERAL
 			  (SELECT t2.a AS t2a, t3.a AS t3a, least(t1.a,t2.a,t3.b) FROM prt1 t2 JOIN prt2 t3 ON (t2.a = t3.b)) ss
@@ -538,92 +535,90 @@ SELECT t1.a, t1.c, t2.b, t2.c FROM prt1_e t1, prt2_e t2 WHERE (t1.a + t1.b)/2 =
 --
 EXPLAIN (COSTS OFF)
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM prt1 t1, prt2 t2, prt1_e t3 WHERE t1.a = t2.b AND t1.a = (t3.a + t3.b)/2 AND t1.b = 0 ORDER BY t1.a, t2.b;
-                                QUERY PLAN                                 
----------------------------------------------------------------------------
+                             QUERY PLAN                              
+---------------------------------------------------------------------
  Sort
    Sort Key: t1.a
-   ->  Result
-         ->  Append
-               ->  Nested Loop
-                     Join Filter: (t1.a = ((t3.a + t3.b) / 2))
-                     ->  Hash Join
+   ->  Append
+         ->  Nested Loop
+               Join Filter: (t1.a = ((t3.a + t3.b) / 2))
+               ->  Hash Join
+                     Hash Cond: (t2.b = t1.a)
+                     ->  Seq Scan on prt2_p1 t2
+                     ->  Hash
+                           ->  Seq Scan on prt1_p1 t1
+                                 Filter: (b = 0)
+               ->  Index Scan using iprt1_e_p1_ab2 on prt1_e_p1 t3
+                     Index Cond: (((a + b) / 2) = t2.b)
+         ->  Nested Loop
+               Join Filter: (t1_1.a = ((t3_1.a + t3_1.b) / 2))
+               ->  Hash Join
+                     Hash Cond: (t2_1.b = t1_1.a)
+                     ->  Seq Scan on prt2_p2 t2_1
+                     ->  Hash
+                           ->  Seq Scan on prt1_p2 t1_1
+                                 Filter: (b = 0)
+               ->  Index Scan using iprt1_e_p2_ab2 on prt1_e_p2 t3_1
+                     Index Cond: (((a + b) / 2) = t2_1.b)
+         ->  Nested Loop
+               Join Filter: (t1_2.a = ((t3_2.a + t3_2.b) / 2))
+               ->  Hash Join
+                     Hash Cond: (t2_2.b = t1_2.a)
+                     ->  Seq Scan on prt2_p3 t2_2
+                     ->  Hash
+                           ->  Seq Scan on prt1_p3 t1_2
+                                 Filter: (b = 0)
+               ->  Index Scan using iprt1_e_p3_ab2 on prt1_e_p3 t3_2
+                     Index Cond: (((a + b) / 2) = t2_2.b)
+(33 rows)
+
+SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM prt1 t1, prt2 t2, prt1_e t3 WHERE t1.a = t2.b AND t1.a = (t3.a + t3.b)/2 AND t1.b = 0 ORDER BY t1.a, t2.b;
+  a  |  c   |  b  |  c   | ?column? | c 
+-----+------+-----+------+----------+---
+   0 | 0000 |   0 | 0000 |        0 | 0
+ 150 | 0150 | 150 | 0150 |      300 | 0
+ 300 | 0300 | 300 | 0300 |      600 | 0
+ 450 | 0450 | 450 | 0450 |      900 | 0
+(4 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) LEFT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t1.b = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
+                          QUERY PLAN                          
+--------------------------------------------------------------
+ Sort
+   Sort Key: t1.a, t2.b, ((t3.a + t3.b))
+   ->  Append
+         ->  Hash Right Join
+               Hash Cond: (((t3.a + t3.b) / 2) = t1.a)
+               ->  Seq Scan on prt1_e_p1 t3
+               ->  Hash
+                     ->  Hash Right Join
                            Hash Cond: (t2.b = t1.a)
                            ->  Seq Scan on prt2_p1 t2
                            ->  Hash
                                  ->  Seq Scan on prt1_p1 t1
                                        Filter: (b = 0)
-                     ->  Index Scan using iprt1_e_p1_ab2 on prt1_e_p1 t3
-                           Index Cond: (((a + b) / 2) = t2.b)
-               ->  Nested Loop
-                     Join Filter: (t1_1.a = ((t3_1.a + t3_1.b) / 2))
-                     ->  Hash Join
+         ->  Hash Right Join
+               Hash Cond: (((t3_1.a + t3_1.b) / 2) = t1_1.a)
+               ->  Seq Scan on prt1_e_p2 t3_1
+               ->  Hash
+                     ->  Hash Right Join
                            Hash Cond: (t2_1.b = t1_1.a)
                            ->  Seq Scan on prt2_p2 t2_1
                            ->  Hash
                                  ->  Seq Scan on prt1_p2 t1_1
                                        Filter: (b = 0)
-                     ->  Index Scan using iprt1_e_p2_ab2 on prt1_e_p2 t3_1
-                           Index Cond: (((a + b) / 2) = t2_1.b)
-               ->  Nested Loop
-                     Join Filter: (t1_2.a = ((t3_2.a + t3_2.b) / 2))
-                     ->  Hash Join
+         ->  Hash Right Join
+               Hash Cond: (((t3_2.a + t3_2.b) / 2) = t1_2.a)
+               ->  Seq Scan on prt1_e_p3 t3_2
+               ->  Hash
+                     ->  Hash Right Join
                            Hash Cond: (t2_2.b = t1_2.a)
                            ->  Seq Scan on prt2_p3 t2_2
                            ->  Hash
                                  ->  Seq Scan on prt1_p3 t1_2
                                        Filter: (b = 0)
-                     ->  Index Scan using iprt1_e_p3_ab2 on prt1_e_p3 t3_2
-                           Index Cond: (((a + b) / 2) = t2_2.b)
-(34 rows)
-
-SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM prt1 t1, prt2 t2, prt1_e t3 WHERE t1.a = t2.b AND t1.a = (t3.a + t3.b)/2 AND t1.b = 0 ORDER BY t1.a, t2.b;
-  a  |  c   |  b  |  c   | ?column? | c 
------+------+-----+------+----------+---
-   0 | 0000 |   0 | 0000 |        0 | 0
- 150 | 0150 | 150 | 0150 |      300 | 0
- 300 | 0300 | 300 | 0300 |      600 | 0
- 450 | 0450 | 450 | 0450 |      900 | 0
-(4 rows)
-
-EXPLAIN (COSTS OFF)
-SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) LEFT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t1.b = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
-                             QUERY PLAN                             
---------------------------------------------------------------------
- Sort
-   Sort Key: t1.a, t2.b, ((t3.a + t3.b))
-   ->  Result
-         ->  Append
-               ->  Hash Right Join
-                     Hash Cond: (((t3.a + t3.b) / 2) = t1.a)
-                     ->  Seq Scan on prt1_e_p1 t3
-                     ->  Hash
-                           ->  Hash Right Join
-                                 Hash Cond: (t2.b = t1.a)
-                                 ->  Seq Scan on prt2_p1 t2
-                                 ->  Hash
-                                       ->  Seq Scan on prt1_p1 t1
-                                             Filter: (b = 0)
-               ->  Hash Right Join
-                     Hash Cond: (((t3_1.a + t3_1.b) / 2) = t1_1.a)
-                     ->  Seq Scan on prt1_e_p2 t3_1
-                     ->  Hash
-                           ->  Hash Right Join
-                                 Hash Cond: (t2_1.b = t1_1.a)
-                                 ->  Seq Scan on prt2_p2 t2_1
-                                 ->  Hash
-                                       ->  Seq Scan on prt1_p2 t1_1
-                                             Filter: (b = 0)
-               ->  Hash Right Join
-                     Hash Cond: (((t3_2.a + t3_2.b) / 2) = t1_2.a)
-                     ->  Seq Scan on prt1_e_p3 t3_2
-                     ->  Hash
-                           ->  Hash Right Join
-                                 Hash Cond: (t2_2.b = t1_2.a)
-                                 ->  Seq Scan on prt2_p3 t2_2
-                                 ->  Hash
-                                       ->  Seq Scan on prt1_p3 t1_2
-                                             Filter: (b = 0)
-(34 rows)
+(33 rows)
 
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) LEFT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t1.b = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
   a  |  c   |  b  |  c   | ?column? | c 
@@ -644,40 +639,39 @@ SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2
 
 EXPLAIN (COSTS OFF)
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) RIGHT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t3.c = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
-                               QUERY PLAN                                
--------------------------------------------------------------------------
+                            QUERY PLAN                             
+-------------------------------------------------------------------
  Sort
    Sort Key: t1.a, t2.b, ((t3.a + t3.b))
-   ->  Result
-         ->  Append
-               ->  Nested Loop Left Join
-                     ->  Hash Right Join
-                           Hash Cond: (t1.a = ((t3.a + t3.b) / 2))
-                           ->  Seq Scan on prt1_p1 t1
-                           ->  Hash
-                                 ->  Seq Scan on prt1_e_p1 t3
-                                       Filter: (c = 0)
-                     ->  Index Scan using iprt2_p1_b on prt2_p1 t2
-                           Index Cond: (t1.a = b)
-               ->  Nested Loop Left Join
-                     ->  Hash Right Join
-                           Hash Cond: (t1_1.a = ((t3_1.a + t3_1.b) / 2))
-                           ->  Seq Scan on prt1_p2 t1_1
-                           ->  Hash
-                                 ->  Seq Scan on prt1_e_p2 t3_1
-                                       Filter: (c = 0)
-                     ->  Index Scan using iprt2_p2_b on prt2_p2 t2_1
-                           Index Cond: (t1_1.a = b)
-               ->  Nested Loop Left Join
-                     ->  Hash Right Join
-                           Hash Cond: (t1_2.a = ((t3_2.a + t3_2.b) / 2))
-                           ->  Seq Scan on prt1_p3 t1_2
-                           ->  Hash
-                                 ->  Seq Scan on prt1_e_p3 t3_2
-                                       Filter: (c = 0)
-                     ->  Index Scan using iprt2_p3_b on prt2_p3 t2_2
-                           Index Cond: (t1_2.a = b)
-(31 rows)
+   ->  Append
+         ->  Nested Loop Left Join
+               ->  Hash Right Join
+                     Hash Cond: (t1.a = ((t3.a + t3.b) / 2))
+                     ->  Seq Scan on prt1_p1 t1
+                     ->  Hash
+                           ->  Seq Scan on prt1_e_p1 t3
+                                 Filter: (c = 0)
+               ->  Index Scan using iprt2_p1_b on prt2_p1 t2
+                     Index Cond: (t1.a = b)
+         ->  Nested Loop Left Join
+               ->  Hash Right Join
+                     Hash Cond: (t1_1.a = ((t3_1.a + t3_1.b) / 2))
+                     ->  Seq Scan on prt1_p2 t1_1
+                     ->  Hash
+                           ->  Seq Scan on prt1_e_p2 t3_1
+                                 Filter: (c = 0)
+               ->  Index Scan using iprt2_p2_b on prt2_p2 t2_1
+                     Index Cond: (t1_1.a = b)
+         ->  Nested Loop Left Join
+               ->  Hash Right Join
+                     Hash Cond: (t1_2.a = ((t3_2.a + t3_2.b) / 2))
+                     ->  Seq Scan on prt1_p3 t1_2
+                     ->  Hash
+                           ->  Seq Scan on prt1_e_p3 t3_2
+                                 Filter: (c = 0)
+               ->  Index Scan using iprt2_p3_b on prt2_p3 t2_2
+                     Index Cond: (t1_2.a = b)
+(30 rows)
 
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) RIGHT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t3.c = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
   a  |  c   |  b  |  c   | ?column? | c 
@@ -700,52 +694,51 @@ SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2
 -- make sure these go to null as expected
 EXPLAIN (COSTS OFF)
 SELECT t1.a, t1.phv, t2.b, t2.phv, t3.a + t3.b, t3.phv FROM ((SELECT 50 phv, * FROM prt1 WHERE prt1.b = 0) t1 FULL JOIN (SELECT 75 phv, * FROM prt2 WHERE prt2.a = 0) t2 ON (t1.a = t2.b)) FULL JOIN (SELECT 50 phv, * FROM prt1_e WHERE prt1_e.c = 0) t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t1.a = t1.phv OR t2.b = t2.phv OR (t3.a + t3.b)/2 = t3.phv ORDER BY t1.a, t2.b, t3.a + t3.b;
-                                                      QUERY PLAN                                                      
-----------------------------------------------------------------------------------------------------------------------
+                                                   QUERY PLAN                                                   
+----------------------------------------------------------------------------------------------------------------
  Sort
    Sort Key: prt1_p1.a, prt2_p1.b, ((prt1_e_p1.a + prt1_e_p1.b))
-   ->  Result
-         ->  Append
+   ->  Append
+         ->  Hash Full Join
+               Hash Cond: (prt1_p1.a = ((prt1_e_p1.a + prt1_e_p1.b) / 2))
+               Filter: ((prt1_p1.a = (50)) OR (prt2_p1.b = (75)) OR (((prt1_e_p1.a + prt1_e_p1.b) / 2) = (50)))
                ->  Hash Full Join
-                     Hash Cond: (prt1_p1.a = ((prt1_e_p1.a + prt1_e_p1.b) / 2))
-                     Filter: ((prt1_p1.a = (50)) OR (prt2_p1.b = (75)) OR (((prt1_e_p1.a + prt1_e_p1.b) / 2) = (50)))
-                     ->  Hash Full Join
-                           Hash Cond: (prt1_p1.a = prt2_p1.b)
-                           ->  Seq Scan on prt1_p1
-                                 Filter: (b = 0)
-                           ->  Hash
-                                 ->  Seq Scan on prt2_p1
-                                       Filter: (a = 0)
+                     Hash Cond: (prt1_p1.a = prt2_p1.b)
+                     ->  Seq Scan on prt1_p1
+                           Filter: (b = 0)
                      ->  Hash
-                           ->  Seq Scan on prt1_e_p1
-                                 Filter: (c = 0)
+                           ->  Seq Scan on prt2_p1
+                                 Filter: (a = 0)
+               ->  Hash
+                     ->  Seq Scan on prt1_e_p1
+                           Filter: (c = 0)
+         ->  Hash Full Join
+               Hash Cond: (prt1_p2.a = ((prt1_e_p2.a + prt1_e_p2.b) / 2))
+               Filter: ((prt1_p2.a = (50)) OR (prt2_p2.b = (75)) OR (((prt1_e_p2.a + prt1_e_p2.b) / 2) = (50)))
                ->  Hash Full Join
-                     Hash Cond: (prt1_p2.a = ((prt1_e_p2.a + prt1_e_p2.b) / 2))
-                     Filter: ((prt1_p2.a = (50)) OR (prt2_p2.b = (75)) OR (((prt1_e_p2.a + prt1_e_p2.b) / 2) = (50)))
-                     ->  Hash Full Join
-                           Hash Cond: (prt1_p2.a = prt2_p2.b)
-                           ->  Seq Scan on prt1_p2
-                                 Filter: (b = 0)
-                           ->  Hash
-                                 ->  Seq Scan on prt2_p2
-                                       Filter: (a = 0)
+                     Hash Cond: (prt1_p2.a = prt2_p2.b)
+                     ->  Seq Scan on prt1_p2
+                           Filter: (b = 0)
                      ->  Hash
-                           ->  Seq Scan on prt1_e_p2
-                                 Filter: (c = 0)
+                           ->  Seq Scan on prt2_p2
+                                 Filter: (a = 0)
+               ->  Hash
+                     ->  Seq Scan on prt1_e_p2
+                           Filter: (c = 0)
+         ->  Hash Full Join
+               Hash Cond: (prt1_p3.a = ((prt1_e_p3.a + prt1_e_p3.b) / 2))
+               Filter: ((prt1_p3.a = (50)) OR (prt2_p3.b = (75)) OR (((prt1_e_p3.a + prt1_e_p3.b) / 2) = (50)))
                ->  Hash Full Join
-                     Hash Cond: (prt1_p3.a = ((prt1_e_p3.a + prt1_e_p3.b) / 2))
-                     Filter: ((prt1_p3.a = (50)) OR (prt2_p3.b = (75)) OR (((prt1_e_p3.a + prt1_e_p3.b) / 2) = (50)))
-                     ->  Hash Full Join
-                           Hash Cond: (prt1_p3.a = prt2_p3.b)
-                           ->  Seq Scan on prt1_p3
-                                 Filter: (b = 0)
-                           ->  Hash
-                                 ->  Seq Scan on prt2_p3
-                                       Filter: (a = 0)
+                     Hash Cond: (prt1_p3.a = prt2_p3.b)
+                     ->  Seq Scan on prt1_p3
+                           Filter: (b = 0)
                      ->  Hash
-                           ->  Seq Scan on prt1_e_p3
-                                 Filter: (c = 0)
-(43 rows)
+                           ->  Seq Scan on prt2_p3
+                                 Filter: (a = 0)
+               ->  Hash
+                     ->  Seq Scan on prt1_e_p3
+                           Filter: (c = 0)
+(42 rows)
 
 SELECT t1.a, t1.phv, t2.b, t2.phv, t3.a + t3.b, t3.phv FROM ((SELECT 50 phv, * FROM prt1 WHERE prt1.b = 0) t1 FULL JOIN (SELECT 75 phv, * FROM prt2 WHERE prt2.a = 0) t2 ON (t1.a = t2.b)) FULL JOIN (SELECT 50 phv, * FROM prt1_e WHERE prt1_e.c = 0) t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t1.a = t1.phv OR t2.b = t2.phv OR (t3.a + t3.b)/2 = t3.phv ORDER BY t1.a, t2.b, t3.a + t3.b;
  a  | phv | b  | phv | ?column? | phv 
@@ -933,61 +926,60 @@ SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN (
 
 EXPLAIN (COSTS OFF)
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) RIGHT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t3.c = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
-                                    QUERY PLAN                                    
-----------------------------------------------------------------------------------
+                                 QUERY PLAN                                 
+----------------------------------------------------------------------------
  Sort
    Sort Key: t1.a, t2.b, ((t3.a + t3.b))
-   ->  Result
-         ->  Append
-               ->  Merge Left Join
-                     Merge Cond: (t1.a = t2.b)
-                     ->  Sort
-                           Sort Key: t1.a
-                           ->  Merge Left Join
-                                 Merge Cond: ((((t3.a + t3.b) / 2)) = t1.a)
-                                 ->  Sort
-                                       Sort Key: (((t3.a + t3.b) / 2))
-                                       ->  Seq Scan on prt1_e_p1 t3
-                                             Filter: (c = 0)
-                                 ->  Sort
-                                       Sort Key: t1.a
-                                       ->  Seq Scan on prt1_p1 t1
-                     ->  Sort
-                           Sort Key: t2.b
-                           ->  Seq Scan on prt2_p1 t2
-               ->  Merge Left Join
-                     Merge Cond: (t1_1.a = t2_1.b)
-                     ->  Sort
-                           Sort Key: t1_1.a
-                           ->  Merge Left Join
-                                 Merge Cond: ((((t3_1.a + t3_1.b) / 2)) = t1_1.a)
-                                 ->  Sort
-                                       Sort Key: (((t3_1.a + t3_1.b) / 2))
-                                       ->  Seq Scan on prt1_e_p2 t3_1
-                                             Filter: (c = 0)
-                                 ->  Sort
-                                       Sort Key: t1_1.a
-                                       ->  Seq Scan on prt1_p2 t1_1
-                     ->  Sort
-                           Sort Key: t2_1.b
-                           ->  Seq Scan on prt2_p2 t2_1
-               ->  Merge Left Join
-                     Merge Cond: (t1_2.a = t2_2.b)
-                     ->  Sort
-                           Sort Key: t1_2.a
-                           ->  Merge Left Join
-                                 Merge Cond: ((((t3_2.a + t3_2.b) / 2)) = t1_2.a)
-                                 ->  Sort
-                                       Sort Key: (((t3_2.a + t3_2.b) / 2))
-                                       ->  Seq Scan on prt1_e_p3 t3_2
-                                             Filter: (c = 0)
-                                 ->  Sort
-                                       Sort Key: t1_2.a
-                                       ->  Seq Scan on prt1_p3 t1_2
-                     ->  Sort
-                           Sort Key: t2_2.b
-                           ->  Seq Scan on prt2_p3 t2_2
-(52 rows)
+   ->  Append
+         ->  Merge Left Join
+               Merge Cond: (t1.a = t2.b)
+               ->  Sort
+                     Sort Key: t1.a
+                     ->  Merge Left Join
+                           Merge Cond: ((((t3.a + t3.b) / 2)) = t1.a)
+                           ->  Sort
+                                 Sort Key: (((t3.a + t3.b) / 2))
+                                 ->  Seq Scan on prt1_e_p1 t3
+                                       Filter: (c = 0)
+                           ->  Sort
+                                 Sort Key: t1.a
+                                 ->  Seq Scan on prt1_p1 t1
+               ->  Sort
+                     Sort Key: t2.b
+                     ->  Seq Scan on prt2_p1 t2
+         ->  Merge Left Join
+               Merge Cond: (t1_1.a = t2_1.b)
+               ->  Sort
+                     Sort Key: t1_1.a
+                     ->  Merge Left Join
+                           Merge Cond: ((((t3_1.a + t3_1.b) / 2)) = t1_1.a)
+                           ->  Sort
+                                 Sort Key: (((t3_1.a + t3_1.b) / 2))
+                                 ->  Seq Scan on prt1_e_p2 t3_1
+                                       Filter: (c = 0)
+                           ->  Sort
+                                 Sort Key: t1_1.a
+                                 ->  Seq Scan on prt1_p2 t1_1
+               ->  Sort
+                     Sort Key: t2_1.b
+                     ->  Seq Scan on prt2_p2 t2_1
+         ->  Merge Left Join
+               Merge Cond: (t1_2.a = t2_2.b)
+               ->  Sort
+                     Sort Key: t1_2.a
+                     ->  Merge Left Join
+                           Merge Cond: ((((t3_2.a + t3_2.b) / 2)) = t1_2.a)
+                           ->  Sort
+                                 Sort Key: (((t3_2.a + t3_2.b) / 2))
+                                 ->  Seq Scan on prt1_e_p3 t3_2
+                                       Filter: (c = 0)
+                           ->  Sort
+                                 Sort Key: t1_2.a
+                                 ->  Seq Scan on prt1_p3 t1_2
+               ->  Sort
+                     Sort Key: t2_2.b
+                     ->  Seq Scan on prt2_p3 t2_2
+(51 rows)
 
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) RIGHT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t3.c = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
   a  |  c   |  b  |  c   | ?column? | c 
@@ -1145,42 +1137,41 @@ ANALYZE plt1_e;
 -- test partition matching with N-way join
 EXPLAIN (COSTS OFF)
 SELECT avg(t1.a), avg(t2.b), avg(t3.a + t3.b), t1.c, t2.c, t3.c FROM plt1 t1, plt2 t2, plt1_e t3 WHERE t1.b = t2.b AND t1.c = t2.c AND ltrim(t3.c, 'A') = t1.c GROUP BY t1.c, t2.c, t3.c ORDER BY t1.c, t2.c, t3.c;
-                                      QUERY PLAN                                      
---------------------------------------------------------------------------------------
+                                   QUERY PLAN                                   
+--------------------------------------------------------------------------------
  GroupAggregate
    Group Key: t1.c, t2.c, t3.c
    ->  Sort
          Sort Key: t1.c, t3.c
-         ->  Result
-               ->  Append
+         ->  Append
+               ->  Hash Join
+                     Hash Cond: (t1.c = ltrim(t3.c, 'A'::text))
                      ->  Hash Join
-                           Hash Cond: (t1.c = ltrim(t3.c, 'A'::text))
-                           ->  Hash Join
-                                 Hash Cond: ((t1.b = t2.b) AND (t1.c = t2.c))
-                                 ->  Seq Scan on plt1_p1 t1
-                                 ->  Hash
-                                       ->  Seq Scan on plt2_p1 t2
+                           Hash Cond: ((t1.b = t2.b) AND (t1.c = t2.c))
+                           ->  Seq Scan on plt1_p1 t1
                            ->  Hash
-                                 ->  Seq Scan on plt1_e_p1 t3
+                                 ->  Seq Scan on plt2_p1 t2
+                     ->  Hash
+                           ->  Seq Scan on plt1_e_p1 t3
+               ->  Hash Join
+                     Hash Cond: (t1_1.c = ltrim(t3_1.c, 'A'::text))
                      ->  Hash Join
-                           Hash Cond: (t1_1.c = ltrim(t3_1.c, 'A'::text))
-                           ->  Hash Join
-                                 Hash Cond: ((t1_1.b = t2_1.b) AND (t1_1.c = t2_1.c))
-                                 ->  Seq Scan on plt1_p2 t1_1
-                                 ->  Hash
-                                       ->  Seq Scan on plt2_p2 t2_1
+                           Hash Cond: ((t1_1.b = t2_1.b) AND (t1_1.c = t2_1.c))
+                           ->  Seq Scan on plt1_p2 t1_1
                            ->  Hash
-                                 ->  Seq Scan on plt1_e_p2 t3_1
+                                 ->  Seq Scan on plt2_p2 t2_1
+                     ->  Hash
+                           ->  Seq Scan on plt1_e_p2 t3_1
+               ->  Hash Join
+                     Hash Cond: (t1_2.c = ltrim(t3_2.c, 'A'::text))
                      ->  Hash Join
-                           Hash Cond: (t1_2.c = ltrim(t3_2.c, 'A'::text))
-                           ->  Hash Join
-                                 Hash Cond: ((t1_2.b = t2_2.b) AND (t1_2.c = t2_2.c))
-                                 ->  Seq Scan on plt1_p3 t1_2
-                                 ->  Hash
-                                       ->  Seq Scan on plt2_p3 t2_2
+                           Hash Cond: ((t1_2.b = t2_2.b) AND (t1_2.c = t2_2.c))
+                           ->  Seq Scan on plt1_p3 t1_2
                            ->  Hash
-                                 ->  Seq Scan on plt1_e_p3 t3_2
-(33 rows)
+                                 ->  Seq Scan on plt2_p3 t2_2
+                     ->  Hash
+                           ->  Seq Scan on plt1_e_p3 t3_2
+(32 rows)
 
 SELECT avg(t1.a), avg(t2.b), avg(t3.a + t3.b), t1.c, t2.c, t3.c FROM plt1 t1, plt2 t2, plt1_e t3 WHERE t1.b = t2.b AND t1.c = t2.c AND ltrim(t3.c, 'A') = t1.c GROUP BY t1.c, t2.c, t3.c ORDER BY t1.c, t2.c, t3.c;
          avg          |         avg          |          avg          |  c   |  c   |   c   
@@ -1290,42 +1281,41 @@ ANALYZE pht1_e;
 -- test partition matching with N-way join
 EXPLAIN (COSTS OFF)
 SELECT avg(t1.a), avg(t2.b), avg(t3.a + t3.b), t1.c, t2.c, t3.c FROM pht1 t1, pht2 t2, pht1_e t3 WHERE t1.b = t2.b AND t1.c = t2.c AND ltrim(t3.c, 'A') = t1.c GROUP BY t1.c, t2.c, t3.c ORDER BY t1.c, t2.c, t3.c;
-                                      QUERY PLAN                                      
---------------------------------------------------------------------------------------
+                                   QUERY PLAN                                   
+--------------------------------------------------------------------------------
  GroupAggregate
    Group Key: t1.c, t2.c, t3.c
    ->  Sort
          Sort Key: t1.c, t3.c
-         ->  Result
-               ->  Append
+         ->  Append
+               ->  Hash Join
+                     Hash Cond: (t1.c = ltrim(t3.c, 'A'::text))
                      ->  Hash Join
-                           Hash Cond: (t1.c = ltrim(t3.c, 'A'::text))
-                           ->  Hash Join
-                                 Hash Cond: ((t1.b = t2.b) AND (t1.c = t2.c))
-                                 ->  Seq Scan on pht1_p1 t1
-                                 ->  Hash
-                                       ->  Seq Scan on pht2_p1 t2
+                           Hash Cond: ((t1.b = t2.b) AND (t1.c = t2.c))
+                           ->  Seq Scan on pht1_p1 t1
                            ->  Hash
-                                 ->  Seq Scan on pht1_e_p1 t3
+                                 ->  Seq Scan on pht2_p1 t2
+                     ->  Hash
+                           ->  Seq Scan on pht1_e_p1 t3
+               ->  Hash Join
+                     Hash Cond: (t1_1.c = ltrim(t3_1.c, 'A'::text))
                      ->  Hash Join
-                           Hash Cond: (t1_1.c = ltrim(t3_1.c, 'A'::text))
-                           ->  Hash Join
-                                 Hash Cond: ((t1_1.b = t2_1.b) AND (t1_1.c = t2_1.c))
-                                 ->  Seq Scan on pht1_p2 t1_1
-                                 ->  Hash
-                                       ->  Seq Scan on pht2_p2 t2_1
+                           Hash Cond: ((t1_1.b = t2_1.b) AND (t1_1.c = t2_1.c))
+                           ->  Seq Scan on pht1_p2 t1_1
                            ->  Hash
-                                 ->  Seq Scan on pht1_e_p2 t3_1
+                                 ->  Seq Scan on pht2_p2 t2_1
+                     ->  Hash
+                           ->  Seq Scan on pht1_e_p2 t3_1
+               ->  Hash Join
+                     Hash Cond: (t1_2.c = ltrim(t3_2.c, 'A'::text))
                      ->  Hash Join
-                           Hash Cond: (t1_2.c = ltrim(t3_2.c, 'A'::text))
-                           ->  Hash Join
-                                 Hash Cond: ((t1_2.b = t2_2.b) AND (t1_2.c = t2_2.c))
-                                 ->  Seq Scan on pht1_p3 t1_2
-                                 ->  Hash
-                                       ->  Seq Scan on pht2_p3 t2_2
+                           Hash Cond: ((t1_2.b = t2_2.b) AND (t1_2.c = t2_2.c))
+                           ->  Seq Scan on pht1_p3 t1_2
                            ->  Hash
-                                 ->  Seq Scan on pht1_e_p3 t3_2
-(33 rows)
+                                 ->  Seq Scan on pht2_p3 t2_2
+                     ->  Hash
+                           ->  Seq Scan on pht1_e_p3 t3_2
+(32 rows)
 
 SELECT avg(t1.a), avg(t2.b), avg(t3.a + t3.b), t1.c, t2.c, t3.c FROM pht1 t1, pht2 t2, pht1_e t3 WHERE t1.b = t2.b AND t1.c = t2.c AND ltrim(t3.c, 'A') = t1.c GROUP BY t1.c, t2.c, t3.c ORDER BY t1.c, t2.c, t3.c;
          avg          |         avg          |         avg          |  c   |  c   |   c   
@@ -1463,40 +1453,39 @@ SELECT t1.a, t1.c, t2.b, t2.c FROM prt1_l t1 LEFT JOIN prt2_l t2 ON t1.a = t2.b
 -- right join
 EXPLAIN (COSTS OFF)
 SELECT t1.a, t1.c, t2.b, t2.c FROM prt1_l t1 RIGHT JOIN prt2_l t2 ON t1.a = t2.b AND t1.c = t2.c WHERE t2.a = 0 ORDER BY t1.a, t2.b;
-                                        QUERY PLAN                                        
-------------------------------------------------------------------------------------------
+                                     QUERY PLAN                                     
+------------------------------------------------------------------------------------
  Sort
    Sort Key: t1.a, t2.b
-   ->  Result
-         ->  Append
-               ->  Hash Right Join
-                     Hash Cond: ((t1.a = t2.b) AND ((t1.c)::text = (t2.c)::text))
-                     ->  Seq Scan on prt1_l_p1 t1
-                     ->  Hash
-                           ->  Seq Scan on prt2_l_p1 t2
-                                 Filter: (a = 0)
-               ->  Hash Right Join
-                     Hash Cond: ((t1_1.a = t2_1.b) AND ((t1_1.c)::text = (t2_1.c)::text))
-                     ->  Seq Scan on prt1_l_p2_p1 t1_1
-                     ->  Hash
-                           ->  Seq Scan on prt2_l_p2_p1 t2_1
-                                 Filter: (a = 0)
-               ->  Hash Right Join
-                     Hash Cond: ((t1_2.a = t2_2.b) AND ((t1_2.c)::text = (t2_2.c)::text))
-                     ->  Seq Scan on prt1_l_p2_p2 t1_2
-                     ->  Hash
-                           ->  Seq Scan on prt2_l_p2_p2 t2_2
-                                 Filter: (a = 0)
-               ->  Hash Right Join
-                     Hash Cond: ((t1_3.a = t2_3.b) AND ((t1_3.c)::text = (t2_3.c)::text))
+   ->  Append
+         ->  Hash Right Join
+               Hash Cond: ((t1.a = t2.b) AND ((t1.c)::text = (t2.c)::text))
+               ->  Seq Scan on prt1_l_p1 t1
+               ->  Hash
+                     ->  Seq Scan on prt2_l_p1 t2
+                           Filter: (a = 0)
+         ->  Hash Right Join
+               Hash Cond: ((t1_1.a = t2_1.b) AND ((t1_1.c)::text = (t2_1.c)::text))
+               ->  Seq Scan on prt1_l_p2_p1 t1_1
+               ->  Hash
+                     ->  Seq Scan on prt2_l_p2_p1 t2_1
+                           Filter: (a = 0)
+         ->  Hash Right Join
+               Hash Cond: ((t1_2.a = t2_2.b) AND ((t1_2.c)::text = (t2_2.c)::text))
+               ->  Seq Scan on prt1_l_p2_p2 t1_2
+               ->  Hash
+                     ->  Seq Scan on prt2_l_p2_p2 t2_2
+                           Filter: (a = 0)
+         ->  Hash Right Join
+               Hash Cond: ((t1_3.a = t2_3.b) AND ((t1_3.c)::text = (t2_3.c)::text))
+               ->  Append
+                     ->  Seq Scan on prt1_l_p3_p1 t1_3
+                     ->  Seq Scan on prt1_l_p3_p2 t1_4
+               ->  Hash
                      ->  Append
-                           ->  Seq Scan on prt1_l_p3_p1 t1_3
-                           ->  Seq Scan on prt1_l_p3_p2 t1_4
-                     ->  Hash
-                           ->  Append
-                                 ->  Seq Scan on prt2_l_p3_p1 t2_3
-                                       Filter: (a = 0)
-(31 rows)
+                           ->  Seq Scan on prt2_l_p3_p1 t2_3
+                                 Filter: (a = 0)
+(30 rows)
 
 SELECT t1.a, t1.c, t2.b, t2.c FROM prt1_l t1 RIGHT JOIN prt2_l t2 ON t1.a = t2.b AND t1.c = t2.c WHERE t2.a = 0 ORDER BY t1.a, t2.b;
   a  |  c   |  b  |  c   
@@ -1577,55 +1566,54 @@ EXPLAIN (COSTS OFF)
 SELECT * FROM prt1_l t1 LEFT JOIN LATERAL
 			  (SELECT t2.a AS t2a, t2.c AS t2c, t2.b AS t2b, t3.b AS t3b, least(t1.a,t2.a,t3.b) FROM prt1_l t2 JOIN prt2_l t3 ON (t2.a = t3.b AND t2.c = t3.c)) ss
 			  ON t1.a = ss.t2a AND t1.c = ss.t2c WHERE t1.b = 0 ORDER BY t1.a;
-                                             QUERY PLAN                                              
------------------------------------------------------------------------------------------------------
+                                          QUERY PLAN                                           
+-----------------------------------------------------------------------------------------------
  Sort
    Sort Key: t1.a
-   ->  Result
-         ->  Append
-               ->  Nested Loop Left Join
-                     ->  Seq Scan on prt1_l_p1 t1
-                           Filter: (b = 0)
-                     ->  Hash Join
-                           Hash Cond: ((t3.b = t2.a) AND ((t3.c)::text = (t2.c)::text))
-                           ->  Seq Scan on prt2_l_p1 t3
-                           ->  Hash
-                                 ->  Seq Scan on prt1_l_p1 t2
-                                       Filter: ((t1.a = a) AND ((t1.c)::text = (c)::text))
-               ->  Nested Loop Left Join
-                     ->  Seq Scan on prt1_l_p2_p1 t1_1
-                           Filter: (b = 0)
-                     ->  Hash Join
-                           Hash Cond: ((t3_1.b = t2_1.a) AND ((t3_1.c)::text = (t2_1.c)::text))
-                           ->  Seq Scan on prt2_l_p2_p1 t3_1
-                           ->  Hash
-                                 ->  Seq Scan on prt1_l_p2_p1 t2_1
-                                       Filter: ((t1_1.a = a) AND ((t1_1.c)::text = (c)::text))
-               ->  Nested Loop Left Join
-                     ->  Seq Scan on prt1_l_p2_p2 t1_2
+   ->  Append
+         ->  Nested Loop Left Join
+               ->  Seq Scan on prt1_l_p1 t1
+                     Filter: (b = 0)
+               ->  Hash Join
+                     Hash Cond: ((t3.b = t2.a) AND ((t3.c)::text = (t2.c)::text))
+                     ->  Seq Scan on prt2_l_p1 t3
+                     ->  Hash
+                           ->  Seq Scan on prt1_l_p1 t2
+                                 Filter: ((t1.a = a) AND ((t1.c)::text = (c)::text))
+         ->  Nested Loop Left Join
+               ->  Seq Scan on prt1_l_p2_p1 t1_1
+                     Filter: (b = 0)
+               ->  Hash Join
+                     Hash Cond: ((t3_1.b = t2_1.a) AND ((t3_1.c)::text = (t2_1.c)::text))
+                     ->  Seq Scan on prt2_l_p2_p1 t3_1
+                     ->  Hash
+                           ->  Seq Scan on prt1_l_p2_p1 t2_1
+                                 Filter: ((t1_1.a = a) AND ((t1_1.c)::text = (c)::text))
+         ->  Nested Loop Left Join
+               ->  Seq Scan on prt1_l_p2_p2 t1_2
+                     Filter: (b = 0)
+               ->  Hash Join
+                     Hash Cond: ((t3_2.b = t2_2.a) AND ((t3_2.c)::text = (t2_2.c)::text))
+                     ->  Seq Scan on prt2_l_p2_p2 t3_2
+                     ->  Hash
+                           ->  Seq Scan on prt1_l_p2_p2 t2_2
+                                 Filter: ((t1_2.a = a) AND ((t1_2.c)::text = (c)::text))
+         ->  Nested Loop Left Join
+               ->  Append
+                     ->  Seq Scan on prt1_l_p3_p1 t1_3
                            Filter: (b = 0)
-                     ->  Hash Join
-                           Hash Cond: ((t3_2.b = t2_2.a) AND ((t3_2.c)::text = (t2_2.c)::text))
-                           ->  Seq Scan on prt2_l_p2_p2 t3_2
-                           ->  Hash
-                                 ->  Seq Scan on prt1_l_p2_p2 t2_2
-                                       Filter: ((t1_2.a = a) AND ((t1_2.c)::text = (c)::text))
-               ->  Nested Loop Left Join
+               ->  Hash Join
+                     Hash Cond: ((t3_3.b = t2_3.a) AND ((t3_3.c)::text = (t2_3.c)::text))
                      ->  Append
-                           ->  Seq Scan on prt1_l_p3_p1 t1_3
-                                 Filter: (b = 0)
-                     ->  Hash Join
-                           Hash Cond: ((t3_3.b = t2_3.a) AND ((t3_3.c)::text = (t2_3.c)::text))
+                           ->  Seq Scan on prt2_l_p3_p1 t3_3
+                           ->  Seq Scan on prt2_l_p3_p2 t3_4
+                     ->  Hash
                            ->  Append
-                                 ->  Seq Scan on prt2_l_p3_p1 t3_3
-                                 ->  Seq Scan on prt2_l_p3_p2 t3_4
-                           ->  Hash
-                                 ->  Append
-                                       ->  Seq Scan on prt1_l_p3_p1 t2_3
-                                             Filter: ((t1_3.a = a) AND ((t1_3.c)::text = (c)::text))
-                                       ->  Seq Scan on prt1_l_p3_p2 t2_4
-                                             Filter: ((t1_3.a = a) AND ((t1_3.c)::text = (c)::text))
-(46 rows)
+                                 ->  Seq Scan on prt1_l_p3_p1 t2_3
+                                       Filter: ((t1_3.a = a) AND ((t1_3.c)::text = (c)::text))
+                                 ->  Seq Scan on prt1_l_p3_p2 t2_4
+                                       Filter: ((t1_3.a = a) AND ((t1_3.c)::text = (c)::text))
+(45 rows)
 
 SELECT * FROM prt1_l t1 LEFT JOIN LATERAL
 			  (SELECT t2.a AS t2a, t2.c AS t2c, t2.b AS t2b, t3.b AS t3b, least(t1.a,t2.a,t3.b) FROM prt1_l t2 JOIN prt2_l t3 ON (t2.a = t3.b AND t2.c = t3.c)) ss
-- 
2.14.3 (Apple Git-98)
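The expected-output changes to partition_join.out above all follow one pattern. The Result node that used to sit directly above Append disappears, the rest of the plan moves up one level, and each plan shrinks by one row (52 to 51, 33 to 32, 31 to 30, 46 to 45). The commit message of the later 0004 patch gives the reason: Append is not projection-capable, but its children may be. Applying the scan/join target to every child therefore makes the Result unnecessary.

The C sketch below illustrates that idea on a toy plan tree. The `Node` type and the function names are hypothetical and much simpler than PostgreSQL's Path and Plan structures. The sketch also gives every child the same target, whereas the real planner translates the target for each child relation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy plan node; not PostgreSQL's Plan or Path structs. */
typedef enum { NODE_SCAN, NODE_JOIN, NODE_SORT, NODE_APPEND } NodeKind;

typedef struct Node {
    NodeKind kind;
    int target_id;           /* identifies the target list the node emits */
    struct Node **children;  /* used only by NODE_APPEND */
    size_t nchildren;
} Node;

/* In this toy model only scans and joins can evaluate a target list;
 * Sort and Append just pass their input tuples through. */
static bool can_project(const Node *n)
{
    return n->kind == NODE_SCAN || n->kind == NODE_JOIN;
}

/* Old strategy: apply the target at the top only.  Returns the number of
 * Result nodes that must be stacked on top (0 or 1). */
static int results_needed_top_only(const Node *n)
{
    return can_project(n) ? 0 : 1;
}

/* New strategy: push the target down through Append into every child.
 * Returns the number of Result nodes that are still required. */
static int apply_target_recursively(Node *n, int target_id)
{
    if (n->kind == NODE_APPEND)
    {
        int results = 0;

        for (size_t i = 0; i < n->nchildren; i++)
            results += apply_target_recursively(n->children[i], target_id);
        /* Every child now emits the target, so Append can pass tuples up. */
        n->target_id = target_id;
        return results;
    }
    n->target_id = target_id;
    /* A non-projecting leaf such as Sort would need a Result of its own. */
    return can_project(n) ? 0 : 1;
}
```

For an Append over a scan and a join, `results_needed_top_only` reports one Result, while `apply_target_recursively` reports none. This mirrors the disappearing `->  Result` lines in the diffs.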

0002-Adjust-use_physical_tlist-to-work-on-upper-rels.patch

From 6329bf690a327e003a8de6e21aef3ffa019c561d Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Mon, 12 Mar 2018 13:52:01 -0400
Subject: [PATCH 2/7] Adjust use_physical_tlist to work on upper rels.

Instead of testing for the inheritance case by checking specifically
for RELOPT_BASEREL, use IS_OTHER_REL().  This requires a small
adjustment later in the function: upper rels won't have attr_needed
set, so just skip that test if the information is not present.

Patch by me, reviewed by Amit Kapila.
---
 src/backend/optimizer/plan/createplan.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 997d032939..b5107988d6 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -805,7 +805,7 @@ use_physical_tlist(PlannerInfo *root, Path *path, int flags)
 	 * doesn't project; this test may be unnecessary now that
 	 * create_append_plan instructs its children to return an exact tlist).
 	 */
-	if (rel->reloptkind != RELOPT_BASEREL)
+	if (IS_OTHER_REL(rel))
 		return false;
 
 	/*
@@ -831,10 +831,13 @@ use_physical_tlist(PlannerInfo *root, Path *path, int flags)
 	 * (This could possibly be fixed but would take some fragile assumptions
 	 * in setrefs.c, I think.)
 	 */
-	for (i = rel->min_attr; i <= 0; i++)
+	if (rel->attr_needed)
 	{
-		if (!bms_is_empty(rel->attr_needed[i - rel->min_attr]))
-			return false;
+		for (i = rel->min_attr; i <= 0; i++)
+		{
+			if (!bms_is_empty(rel->attr_needed[i - rel->min_attr]))
+				return false;
+		}
 	}
 
 	/*
-- 
2.14.3 (Apple Git-98)
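The 0002 patch above makes two independent changes to use_physical_tlist. First, only "other" (child) rels are rejected outright, so upper rels can now reach the rest of the function. Second, the system-attribute loop is skipped when attr_needed is absent, as it is for upper rels.

The minimal C sketch below models just those two checks, assuming that a simplified `Rel` struct can stand in for RelOptInfo. `RelKind`, `is_other_rel` and `may_use_physical_tlist` are hypothetical names. The real IS_OTHER_REL() macro accepts more than one reloptkind, and the real function performs several further checks that are omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified relation kinds; REL_OTHER_MEMBER stands in for
 * everything PostgreSQL's IS_OTHER_REL() macro accepts. */
typedef enum { REL_BASE, REL_JOIN, REL_UPPER, REL_OTHER_MEMBER } RelKind;

typedef struct {
    RelKind kind;
    int min_attr;            /* lowest attribute number; system attrs are <= 0 */
    /* Indexed by attno - min_attr; NULL for upper rels.  A bool stands in
     * for the real Relids set: true means !bms_is_empty(). */
    const bool *attr_needed;
} Rel;

static bool is_other_rel(const Rel *rel)
{
    return rel->kind == REL_OTHER_MEMBER;
}

/* Only the two checks touched by the 0002 patch are modeled here. */
static bool may_use_physical_tlist(const Rel *rel)
{
    /* Previously any non-base rel was rejected; now only child rels are. */
    if (is_other_rel(rel))
        return false;

    /* Upper rels carry no attr_needed array, so skip the test for them. */
    if (rel->attr_needed != NULL)
    {
        for (int i = rel->min_attr; i <= 0; i++)
        {
            /* A needed system or whole-row attribute rules this out. */
            if (rel->attr_needed[i - rel->min_attr])
                return false;
        }
    }
    return true;
}
```

An upper rel with `attr_needed == NULL` now passes both checks, while a child rel is still rejected before attr_needed is ever consulted.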

0001-Teach-create_projection_plan-to-omit-projection-wher.patch

From 61be68ba0ac7df3fc308410e8f7aaea8739ac051 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Mon, 12 Mar 2018 12:36:57 -0400
Subject: [PATCH 1/7] Teach create_projection_plan to omit projection where
 possible.

We sometimes insert a ProjectionPath into a plan tree when it isn't
actually needed.  The existing code already provides for the case
where the ProjectionPath's subpath can perform the projection itself
instead of needing a Result node to do it, but previously it didn't
consider the possibility that the parent node might not actually
require the projection.  This optimization also allows the "physical
tlist" optimization to be preserved in some cases where it would not
otherwise happen.

Patch by me, reviewed by Amit Kapila.
---
 src/backend/optimizer/plan/createplan.c | 26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 8b4f031d96..997d032939 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -87,7 +87,9 @@ static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path
 static Plan *create_unique_plan(PlannerInfo *root, UniquePath *best_path,
 				   int flags);
 static Gather *create_gather_plan(PlannerInfo *root, GatherPath *best_path);
-static Plan *create_projection_plan(PlannerInfo *root, ProjectionPath *best_path);
+static Plan *create_projection_plan(PlannerInfo *root,
+					   ProjectionPath *best_path,
+					   int flags);
 static Plan *inject_projection_plan(Plan *subplan, List *tlist, bool parallel_safe);
 static Sort *create_sort_plan(PlannerInfo *root, SortPath *best_path, int flags);
 static Group *create_group_plan(PlannerInfo *root, GroupPath *best_path);
@@ -400,7 +402,8 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
 			if (IsA(best_path, ProjectionPath))
 			{
 				plan = create_projection_plan(root,
-											  (ProjectionPath *) best_path);
+											  (ProjectionPath *) best_path,
+											  flags);
 			}
 			else if (IsA(best_path, MinMaxAggPath))
 			{
@@ -1567,7 +1570,7 @@ create_gather_merge_plan(PlannerInfo *root, GatherMergePath *best_path)
  *	  but sometimes we can just let the subplan do the work.
  */
 static Plan *
-create_projection_plan(PlannerInfo *root, ProjectionPath *best_path)
+create_projection_plan(PlannerInfo *root, ProjectionPath *best_path, int flags)
 {
 	Plan	   *plan;
 	Plan	   *subplan;
@@ -1576,7 +1579,22 @@ create_projection_plan(PlannerInfo *root, ProjectionPath *best_path)
 	/* Since we intend to project, we don't need to constrain child tlist */
 	subplan = create_plan_recurse(root, best_path->subpath, 0);
 
-	tlist = build_path_tlist(root, &best_path->path);
+	/*
+	 * If our caller doesn't really care what tlist we return, then we might
+	 * not really need to project.  If use_physical_tlist returns false, then
+	 * we're obliged to project.  If it returns true, we can skip actually
+	 * projecting but must still correctly label the input path's tlist with
+	 * the sortgroupref information if the caller has so requested.
+	 */
+	if (!use_physical_tlist(root, &best_path->path, flags))
+		tlist = build_path_tlist(root, &best_path->path);
+	else if ((flags & CP_LABEL_TLIST) != 0)
+	{
+		tlist = copyObject(subplan->targetlist);
+		apply_pathtarget_labeling_to_tlist(tlist, best_path->path.pathtarget);
+	}
+	else
+		return subplan;
 
 	/*
 	 * We might not really need a Result node here, either because the subplan
-- 
2.14.3 (Apple Git-98)
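The 0001 change to create_projection_plan reduces to a three-way decision driven by use_physical_tlist and the caller's flags. The sketch below isolates that decision in plain C.

`ProjectionChoice`, `choose_projection` and `FLAG_LABEL_TLIST` are hypothetical stand-ins. The flag value is arbitrary and is not PostgreSQL's CP_LABEL_TLIST value. The sketch also leaves out the later step in the real function that decides whether a Result node is needed at all.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag; PostgreSQL's CP_LABEL_TLIST plays this role. */
#define FLAG_LABEL_TLIST (1u << 0)

typedef enum {
    BUILD_PATH_TLIST,     /* must project: build the path's own target list */
    LABEL_SUBPLAN_TLIST,  /* reuse subplan's tlist, adding sortgroupref labels */
    RETURN_SUBPLAN        /* caller does not care: skip projecting entirely */
} ProjectionChoice;

/* physical_tlist_ok models the result of use_physical_tlist(). */
static ProjectionChoice choose_projection(bool physical_tlist_ok,
                                          unsigned flags)
{
    if (!physical_tlist_ok)
        return BUILD_PATH_TLIST;
    if (flags & FLAG_LABEL_TLIST)
        return LABEL_SUBPLAN_TLIST;
    return RETURN_SUBPLAN;
}
```

The order of the tests matters. The labeling request is honored only when projection can be skipped, because a freshly built path tlist is already labeled.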

#82

amit.kapila16@gmail.com

almost 8 years ago

In reply to: Robert Haas (#81)

Re: [HACKERS] why not parallel seq scan for slow functions

On Sat, Mar 24, 2018 at 8:41 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Mar 23, 2018 at 12:12 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Yeah, sometimes that kind of stuff changes performance characteristics,
but I think what is going on here is that create_projection_plan is
causing the lower node to build a physical tlist, which takes some
additional time. I have tried the change below on top of the patch series
and it brings back the performance for me.

I tried another approach inspired by this, which is to altogether skip
building the child scan tlist if it will just be replaced. See 0006.
In testing here, that seems to be a bit better than your proposal, but
I wonder what your results will be.

It looks in my testing like this still underperforms master on your
test case. Do you get the same result?

For me, it is equivalent to master. The average of ten runs on
master is 20664.3683 and with all the patches applied it is
20590.4734. I think there is some run-to-run variation, but more or
less there is no visible degradation. I think we have found the root
cause and eliminated it. OTOH, I have found another case where the new
patch series seems to cause a regression.

Test case
--------------
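DO $$
DECLARE count integer;
BEGIN
    -- EXPLAIN without ANALYZE plans the query but does not execute it,
    -- so this loop mainly exercises the planner.
    FOR count IN 1..1000000 LOOP
        EXECUTE 'explain Select count(ten) from tenk1';
    END LOOP;
END;
$$;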

The average of ten runs on master is 31593.9533 and with all the
patches applied it is 34008.7341, so the patched build takes approximately
7.6% more time. I think this patch series is doing something costly in a
common code path. I am also worried that the new code you proposed in
the 0003* patch might degrade planner performance for partitioned rels,
though I have not tested it yet. It is difficult to say without
testing it, but before going there, I think we should first
investigate what's happening in the non-partitioned case.
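As a quick check on the figures above, the relative change is (patched - baseline) / baseline. The small C helper below is only an arithmetic aid. The timings are copied from the message, and the message does not state their unit.

```c
#include <assert.h>

/* Relative change of `patched` against `baseline`, in percent.
 * A positive result means the patched build took longer. */
static double percent_change(double baseline, double patched)
{
    assert(baseline != 0.0);
    return (patched - baseline) / baseline * 100.0;
}
```

With the reported numbers, `percent_change(31593.9533, 34008.7341)` is about 7.64, which matches the "approximately 7.6%" figure. `percent_change(20664.3683, 20590.4734)` is about -0.36, so the patched build was marginally faster on the first test. That difference is small enough to be consistent with the run-to-run variation mentioned.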

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#83

robertmhaas@gmail.com

almost 8 years ago

In reply to: Amit Kapila (#82)

4 attachment(s)

Re: [HACKERS] why not parallel seq scan for slow functions

On Sat, Mar 24, 2018 at 9:40 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

For me, it is equivalent to master. The average of ten runs on
master is 20664.3683 and with all the patches applied it is
20590.4734. I think there is some run-to-run variation, but more or
less there is no visible degradation. I think we have found the root
cause and eliminated it. OTOH, I have found another case where the new
patch series seems to cause a regression.

All right, I have scaled my ambitions back further. Here is a revised
and slimmed-down version of the patch series. If we forget about
"Remove explicit path construction logic in create_ordered_paths" for
now, then we don't really need a new upperrel. So this patch just
modifies the toplevel scan/join rel in place, which should avoid a
bunch of overhead in add_path() and other places, while hopefully
still fixing the originally-reported problem. I haven't tested this
beyond verifying that it passes the regression test, but I've run out
of time for today.
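The originally-reported problem was that an expensive target-list function was charged above the Gather, where only the leader evaluates it. Applying the scan/join target before generating Gather paths, as the attached 0004 patch does when the target is parallel-safe, lets each participant evaluate its own share of the rows.

The sketch below models only that projection term. The parallel divisor is modeled on the leader-contribution heuristic in PostgreSQL's cost model of this era: one share per worker, plus 1 - 0.3 * workers for the leader when that is positive. Treat that formula, and the names `parallel_divisor` and `projection_cost`, as assumptions for illustration. The per-tuple cost of 10000000 in the usage note is the cost Dilip reported for slow() at the start of the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed model of how many "shares" of the rows a parallel plan divides
 * into: one per worker, plus a shrinking contribution from the leader. */
static double parallel_divisor(int workers)
{
    double divisor = workers;
    double leader_share = 1.0 - 0.3 * workers;

    if (leader_share > 0.0)
        divisor += leader_share;
    return divisor;
}

/* Cost charged for evaluating the target list over `rows` tuples.
 * below_gather = true: each participant projects its own share of rows.
 * below_gather = false: the leader projects every row after the Gather. */
static double projection_cost(double rows, double per_tuple_cost,
                              int workers, bool below_gather)
{
    assert(workers > 0);
    if (below_gather)
        return rows / parallel_divisor(workers) * per_tuple_cost;
    return rows * per_tuple_cost;
}
```

With two workers the divisor is 2.4. For 100000 rows at a per-tuple cost of 10000000, the projection charge therefore drops by a factor of 2.4 when it is placed below the Gather. Gather's own per-tuple transfer charge (parallel_tuple_cost) depends on the row count rather than on where the projection happens, so it is left out here.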

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0004-Rewrite-the-code-that-applies-scan-join-targets-to-p.patch

From cef36ce24e368b4f7b94478b35d7a9db8d39b482 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Mon, 12 Mar 2018 13:16:30 -0400
Subject: [PATCH 4/4] Rewrite the code that applies scan/join targets to paths.

If the toplevel scan/join target list is parallel-safe, postpone
generating Gather (or Gather Merge) paths until after the toplevel
scan/join rel has been adjusted to return it.

If the toplevel scan/join relation is partitioned, recursively apply
the changes to all partitions.  This sometimes allows us to get rid of
Result nodes, because Append is not projection-capable but its
children may be.  It also cleans up what appears to be incorrect SRF
handling from commit e2f1eb0ee30d144628ab523432320f174a2c8966: the old
code had no knowledge of SRFs for child scan/join rels.

Patch by me.
---
 src/backend/optimizer/plan/planner.c         | 277 ++++++----
 src/include/nodes/relation.h                 |   1 +
 src/test/regress/expected/partition_join.out | 772 +++++++++++++--------------
 3 files changed, 552 insertions(+), 498 deletions(-)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index cffb90999a..d69be7a013 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -222,9 +222,9 @@ static bool can_partial_agg(PlannerInfo *root,
 				const AggClauseCosts *agg_costs);
 static void apply_scanjoin_target_to_paths(PlannerInfo *root,
 							   RelOptInfo *rel,
-							   PathTarget *scanjoin_target,
-							   bool scanjoin_target_parallel_safe,
-							   bool modify_in_place);
+							   List *scanjoin_targets,
+							   List *scanjoin_targets_contain_srfs,
+							   bool scanjoin_target_parallel_safe);
 static void create_partitionwise_grouping_paths(PlannerInfo *root,
 									RelOptInfo *input_rel,
 									RelOptInfo *grouped_rel,
@@ -1962,34 +1962,30 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 		}
 		else
 		{
-			/* initialize lists, just to keep compiler quiet */
+			/* initialize lists; for most of these, dummy values are OK */
 			final_targets = final_targets_contain_srfs = NIL;
 			sort_input_targets = sort_input_targets_contain_srfs = NIL;
 			grouping_targets = grouping_targets_contain_srfs = NIL;
-			scanjoin_targets = scanjoin_targets_contain_srfs = NIL;
+			scanjoin_targets = list_make1(scanjoin_target);
+			scanjoin_targets_contain_srfs = NIL;
 		}
 
 		/*
-		 * Generate Gather or Gather Merge paths for the topmost scan/join
-		 * relation.  Once that's done, we must re-determine which paths are
-		 * cheapest.  (The previously-cheapest path might even have been
-		 * pfree'd!)
+		 * If the final scan/join target is not parallel-safe, we must
+		 * generate Gather paths now, since no partial paths will be generated
+		 * with the final scan/join targetlist.  Otherwise, the Gather or
+		 * Gather Merge paths generated within apply_scanjoin_target_to_paths
+		 * will be superior to any we might generate now in that the
+		 * projection will be done by each participant rather than only in
+		 * the leader.
 		 */
-		generate_gather_paths(root, current_rel, false);
-		set_cheapest(current_rel);
+		if (!scanjoin_target_parallel_safe)
+			generate_gather_paths(root, current_rel, false);
 
-		/*
-		 * Forcibly apply SRF-free scan/join target to all the Paths for the
-		 * scan/join rel.
-		 */
-		apply_scanjoin_target_to_paths(root, current_rel, scanjoin_target,
-									   scanjoin_target_parallel_safe, true);
-
-		/* Now fix things up if scan/join target contains SRFs */
-		if (parse->hasTargetSRFs)
-			adjust_paths_for_srfs(root, current_rel,
-								  scanjoin_targets,
-								  scanjoin_targets_contain_srfs);
+		/* Apply scan/join target. */
+		apply_scanjoin_target_to_paths(root, current_rel, scanjoin_targets,
+									   scanjoin_targets_contain_srfs,
+									   scanjoin_target_parallel_safe);
 
 		/*
 		 * Save the various upper-rel PathTargets we just computed into
@@ -2001,6 +1997,7 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 		root->upper_targets[UPPERREL_FINAL] = final_target;
 		root->upper_targets[UPPERREL_WINDOW] = sort_input_target;
 		root->upper_targets[UPPERREL_GROUP_AGG] = grouping_target;
+		root->upper_targets[UPPERREL_TLIST] = scanjoin_target;
 
 		/*
 		 * If we have grouping and/or aggregation, consider ways to implement
@@ -6791,24 +6788,80 @@ can_partial_agg(PlannerInfo *root, const AggClauseCosts *agg_costs)
 /*
  * apply_scanjoin_target_to_paths
  *
- * Applies scan/join target to all the Paths for the scan/join rel.
+ * Adjust the final scan/join relation, and recursively all of its children,
+ * to generate the final scan/join target.  It would be more correct to model
+ * this as a separate planning step with a new RelOptInfo at the toplevel and
+ * for each child relation, but doing it this way is noticeably cheaper.
+ * Maybe that problem can be solved at some point, but for now we do this.
  */
 static void
 apply_scanjoin_target_to_paths(PlannerInfo *root,
 							   RelOptInfo *rel,
-							   PathTarget *scanjoin_target,
-							   bool scanjoin_target_parallel_safe,
-							   bool modify_in_place)
+							   List *scanjoin_targets,
+							   List *scanjoin_targets_contain_srfs,
+							   bool scanjoin_target_parallel_safe)
 {
 	ListCell   *lc;
+	PathTarget *scanjoin_target;
+
+	check_stack_depth();
 
 	/*
-	 * In principle we should re-run set_cheapest() here to identify the
-	 * cheapest path, but it seems unlikely that adding the same tlist eval
-	 * costs to all the paths would change that, so we don't bother. Instead,
-	 * just assume that the cheapest-startup and cheapest-total paths remain
-	 * so.  (There should be no parameterized paths anymore, so we needn't
-	 * worry about updating cheapest_parameterized_paths.)
+	 * If the scan/join target is not parallel-safe, then the new partial
+	 * pathlist is the empty list.
+	 */
+	if (!scanjoin_target_parallel_safe)
+	{
+		rel->partial_pathlist = NIL;
+		rel->consider_parallel = false;
+	}
+
+	/*
+	 * Update the reltarget.  This may not be strictly necessary in all cases,
+	 * but it is at least necessary when create_append_path() gets called
+	 * below directly or indirectly, since that function uses the reltarget as
+	 * the pathtarget for the resulting path.  It seems like a good idea to do
+	 * it unconditionally.
+	 */
+	rel->reltarget = llast_node(PathTarget, scanjoin_targets);
+
+	/* Special case: handle dummy relations separately. */
+	if (IS_DUMMY_REL(rel))
+	{
+		/*
+		 * Since this is a dummy rel, it's got a single Append path with no
+		 * child paths.  Replace it with a new path having the final scan/join
+		 * target.  (Note that since Append is not projection-capable, it would
+		 * be bad to handle this using the general purpose code below; we'd
+		 * end up putting a ProjectionPath on top of the existing Append node,
+		 * which would cause this relation to stop appearing to be a dummy
+		 * rel.)
+		 */
+		rel->pathlist = list_make1(create_append_path(rel, NIL, NIL, NULL,
+													  0, false, NIL, -1));
+		rel->partial_pathlist = NIL;
+		set_cheapest(rel);
+		Assert(IS_DUMMY_REL(rel));
+
+		/*
+		 * Forget about any child relations.  There's no point in adjusting
+		 * them and no point in using them for later planning stages (in
+		 * particular, partitionwise aggregate).
+		 */
+		rel->nparts = 0;
+		rel->part_rels = NULL;
+		rel->boundinfo = NULL;
+
+		return;
+	}
+
+	/* Extract SRF-free scan/join target. */
+	scanjoin_target = linitial_node(PathTarget, scanjoin_targets);
+
+	/*
+	 * Create a projection path for each input path, in each case applying the
+	 * SRF-free scan/join target.  This can't change the ordering of paths
+	 * within rel->pathlist, so we just modify the list in place.
 	 */
 	foreach(lc, rel->pathlist)
 	{
@@ -6817,69 +6870,102 @@ apply_scanjoin_target_to_paths(PlannerInfo *root,
 
 		Assert(subpath->param_info == NULL);
 
-		/*
-		 * Don't use apply_projection_to_path() when modify_in_place is false,
-		 * because there could be other pointers to these paths, and therefore
-		 * we mustn't modify them in place.
-		 */
-		if (modify_in_place)
-			newpath = apply_projection_to_path(root, rel, subpath,
-											   scanjoin_target);
-		else
-			newpath = (Path *) create_projection_path(root, rel, subpath,
-													  scanjoin_target);
+		newpath = (Path *) create_projection_path(root, rel, subpath,
+												  scanjoin_target);
+		lfirst(lc) = newpath;
+	}
 
-		/* If we had to add a Result, newpath is different from subpath */
-		if (newpath != subpath)
-		{
-			lfirst(lc) = newpath;
-			if (subpath == rel->cheapest_startup_path)
-				rel->cheapest_startup_path = newpath;
-			if (subpath == rel->cheapest_total_path)
-				rel->cheapest_total_path = newpath;
-		}
+	/* Same for partial paths. */
+	foreach(lc, rel->partial_pathlist)
+	{
+		Path	   *subpath = (Path *) lfirst(lc);
+		Path	   *newpath;
+
+		/* Shouldn't have any parameterized paths anymore */
+		Assert(subpath->param_info == NULL);
+
+		newpath = (Path *) create_projection_path(root,
+												  rel,
+												  subpath,
+												  scanjoin_target);
+		lfirst(lc) = newpath;
 	}
 
+	/* Now fix things up if scan/join target contains SRFs */
+	if (root->parse->hasTargetSRFs)
+		adjust_paths_for_srfs(root, rel,
+							  scanjoin_targets,
+							  scanjoin_targets_contain_srfs);
+
 	/*
-	 * Upper planning steps which make use of the top scan/join rel's partial
-	 * pathlist will expect partial paths for that rel to produce the same
-	 * output as complete paths ... and we just changed the output for the
-	 * complete paths, so we'll need to do the same thing for partial paths.
-	 * But only parallel-safe expressions can be computed by partial paths.
+	 * If the relation is partitioned, recursively apply the same changes to
+	 * all partitions and generate new Append paths. Since Append is not
+	 * projection-capable, that might save a separate Result node, and it also
+	 * is important for partitionwise aggregate.
 	 */
-	if (rel->partial_pathlist && scanjoin_target_parallel_safe)
+	if (rel->part_scheme && rel->part_rels)
 	{
-		/* Apply the scan/join target to each partial path */
-		foreach(lc, rel->partial_pathlist)
+		int			partition_idx;
+		List	   *live_children = NIL;
+
+		/* Adjust each partition. */
+		for (partition_idx = 0; partition_idx < rel->nparts; partition_idx++)
 		{
-			Path	   *subpath = (Path *) lfirst(lc);
-			Path	   *newpath;
+			RelOptInfo *child_rel = rel->part_rels[partition_idx];
+			ListCell   *lc;
+			AppendRelInfo **appinfos;
+			int			nappinfos;
+			List	   *child_scanjoin_targets = NIL;
+
+			/* Translate scan/join targets for this child. */
+			appinfos = find_appinfos_by_relids(root, child_rel->relids,
+											   &nappinfos);
+			foreach(lc, scanjoin_targets)
+			{
+				PathTarget *target = lfirst_node(PathTarget, lc);
+
+				target = copy_pathtarget(target);
+				target->exprs = (List *)
+					adjust_appendrel_attrs(root,
+										   (Node *) target->exprs,
+										   nappinfos, appinfos);
+				child_scanjoin_targets = lappend(child_scanjoin_targets,
+												 target);
+			}
+			pfree(appinfos);
 
-			/* Shouldn't have any parameterized paths anymore */
-			Assert(subpath->param_info == NULL);
+			/* Recursion does the real work. */
+			apply_scanjoin_target_to_paths(root, child_rel,
+										   child_scanjoin_targets,
+										   scanjoin_targets_contain_srfs,
+										   scanjoin_target_parallel_safe);
 
-			/*
-			 * Don't use apply_projection_to_path() here, because there could
-			 * be other pointers to these paths, and therefore we mustn't
-			 * modify them in place.
-			 */
-			newpath = (Path *) create_projection_path(root,
-													  rel,
-													  subpath,
-													  scanjoin_target);
-			lfirst(lc) = newpath;
+			/* Save non-dummy children for Append paths. */
+			if (!IS_DUMMY_REL(child_rel))
+				live_children = lappend(live_children, child_rel);
 		}
+
+		/* Build new paths for this relation by appending child paths. */
+		if (live_children != NIL)
+			add_paths_to_append_rel(root, rel, live_children);
 	}
-	else
-	{
-		/*
-		 * In the unfortunate event that scanjoin_target is not parallel-safe,
-		 * we can't apply it to the partial paths; in that case, we'll need to
-		 * forget about the partial paths, which aren't valid input for upper
-		 * planning steps.
-		 */
-		rel->partial_pathlist = NIL;
-	}
+
+	/*
+	 * Consider generating Gather or Gather Merge paths.  We must only do this
+	 * if the relation is parallel safe, and we don't do it for child rels to
+	 * avoid creating multiple Gather nodes within the same plan. We must do
+	 * this after all paths have been generated and before set_cheapest, since
+	 * one of the generated paths may turn out to be the cheapest one.
+	 */
+	if (rel->consider_parallel && !IS_OTHER_REL(rel))
+		generate_gather_paths(root, rel, false);
+
+	/*
+	 * Reassess which paths are the cheapest, now that we've potentially added
+	 * new Gather (or Gather Merge) and/or Append (or MergeAppend) paths to
+	 * this relation.
+	 */
+	set_cheapest(rel);
 }
 
 /*
@@ -6926,7 +7012,6 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
 		PathTarget *child_target = copy_pathtarget(target);
 		AppendRelInfo **appinfos;
 		int			nappinfos;
-		PathTarget *scanjoin_target;
 		GroupPathExtraData child_extra;
 		RelOptInfo *child_grouped_rel;
 		RelOptInfo *child_partially_grouped_rel;
@@ -6983,26 +7068,6 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
 			continue;
 		}
 
-		/*
-		 * Copy pathtarget from underneath scan/join as we are modifying it
-		 * and translate its Vars with respect to this appendrel.  The input
-		 * relation's reltarget might not be the final scanjoin_target, but
-		 * the pathtarget any given individual path should be.
-		 */
-		scanjoin_target =
-			copy_pathtarget(input_rel->cheapest_startup_path->pathtarget);
-		scanjoin_target->exprs = (List *)
-			adjust_appendrel_attrs(root,
-								   (Node *) scanjoin_target->exprs,
-								   nappinfos, appinfos);
-
-		/*
-		 * Forcibly apply scan/join target to all the Paths for the scan/join
-		 * rel.
-		 */
-		apply_scanjoin_target_to_paths(root, child_input_rel, scanjoin_target,
-									   extra->target_parallel_safe, false);
-
 		/* Create grouping paths for this child relation. */
 		create_ordinary_grouping_paths(root, child_input_rel,
 									   child_grouped_rel,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index abbbda9e91..d4bffbc281 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -71,6 +71,7 @@ typedef struct AggClauseCosts
 typedef enum UpperRelationKind
 {
 	UPPERREL_SETOP,				/* result of UNION/INTERSECT/EXCEPT, if any */
+	UPPERREL_TLIST,				/* result of projecting final scan/join rel */
 	UPPERREL_PARTIAL_GROUP_AGG, /* result of partial grouping/aggregation, if
 								 * any */
 	UPPERREL_GROUP_AGG,			/* result of grouping/aggregation, if any */
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 4fccd9ae54..b983f9c506 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -65,31 +65,30 @@ SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1, prt2 t2 WHERE t1.a = t2.b AND t1.b =
 -- left outer join, with whole-row reference
 EXPLAIN (COSTS OFF)
 SELECT t1, t2 FROM prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b WHERE t1.b = 0 ORDER BY t1.a, t2.b;
-                       QUERY PLAN                       
---------------------------------------------------------
+                    QUERY PLAN                    
+--------------------------------------------------
  Sort
    Sort Key: t1.a, t2.b
-   ->  Result
-         ->  Append
-               ->  Hash Right Join
-                     Hash Cond: (t2.b = t1.a)
-                     ->  Seq Scan on prt2_p1 t2
-                     ->  Hash
-                           ->  Seq Scan on prt1_p1 t1
-                                 Filter: (b = 0)
-               ->  Hash Right Join
-                     Hash Cond: (t2_1.b = t1_1.a)
-                     ->  Seq Scan on prt2_p2 t2_1
-                     ->  Hash
-                           ->  Seq Scan on prt1_p2 t1_1
-                                 Filter: (b = 0)
-               ->  Hash Right Join
-                     Hash Cond: (t2_2.b = t1_2.a)
-                     ->  Seq Scan on prt2_p3 t2_2
-                     ->  Hash
-                           ->  Seq Scan on prt1_p3 t1_2
-                                 Filter: (b = 0)
-(22 rows)
+   ->  Append
+         ->  Hash Right Join
+               Hash Cond: (t2.b = t1.a)
+               ->  Seq Scan on prt2_p1 t2
+               ->  Hash
+                     ->  Seq Scan on prt1_p1 t1
+                           Filter: (b = 0)
+         ->  Hash Right Join
+               Hash Cond: (t2_1.b = t1_1.a)
+               ->  Seq Scan on prt2_p2 t2_1
+               ->  Hash
+                     ->  Seq Scan on prt1_p2 t1_1
+                           Filter: (b = 0)
+         ->  Hash Right Join
+               Hash Cond: (t2_2.b = t1_2.a)
+               ->  Seq Scan on prt2_p3 t2_2
+               ->  Hash
+                     ->  Seq Scan on prt1_p3 t1_2
+                           Filter: (b = 0)
+(21 rows)
 
 SELECT t1, t2 FROM prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b WHERE t1.b = 0 ORDER BY t1.a, t2.b;
       t1      |      t2      
@@ -111,30 +110,29 @@ SELECT t1, t2 FROM prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b WHERE t1.b = 0 ORDER
 -- right outer join
 EXPLAIN (COSTS OFF)
 SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1 RIGHT JOIN prt2 t2 ON t1.a = t2.b WHERE t2.a = 0 ORDER BY t1.a, t2.b;
-                             QUERY PLAN                              
----------------------------------------------------------------------
+                          QUERY PLAN                           
+---------------------------------------------------------------
  Sort
    Sort Key: t1.a, t2.b
-   ->  Result
-         ->  Append
-               ->  Hash Right Join
-                     Hash Cond: (t1.a = t2.b)
-                     ->  Seq Scan on prt1_p1 t1
-                     ->  Hash
-                           ->  Seq Scan on prt2_p1 t2
-                                 Filter: (a = 0)
-               ->  Hash Right Join
-                     Hash Cond: (t1_1.a = t2_1.b)
-                     ->  Seq Scan on prt1_p2 t1_1
-                     ->  Hash
-                           ->  Seq Scan on prt2_p2 t2_1
-                                 Filter: (a = 0)
-               ->  Nested Loop Left Join
-                     ->  Seq Scan on prt2_p3 t2_2
+   ->  Append
+         ->  Hash Right Join
+               Hash Cond: (t1.a = t2.b)
+               ->  Seq Scan on prt1_p1 t1
+               ->  Hash
+                     ->  Seq Scan on prt2_p1 t2
                            Filter: (a = 0)
-                     ->  Index Scan using iprt1_p3_a on prt1_p3 t1_2
-                           Index Cond: (a = t2_2.b)
-(21 rows)
+         ->  Hash Right Join
+               Hash Cond: (t1_1.a = t2_1.b)
+               ->  Seq Scan on prt1_p2 t1_1
+               ->  Hash
+                     ->  Seq Scan on prt2_p2 t2_1
+                           Filter: (a = 0)
+         ->  Nested Loop Left Join
+               ->  Seq Scan on prt2_p3 t2_2
+                     Filter: (a = 0)
+               ->  Index Scan using iprt1_p3_a on prt1_p3 t1_2
+                     Index Cond: (a = t2_2.b)
+(20 rows)
 
 SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1 RIGHT JOIN prt2 t2 ON t1.a = t2.b WHERE t2.a = 0 ORDER BY t1.a, t2.b;
   a  |  c   |  b  |  c   
@@ -375,37 +373,36 @@ EXPLAIN (COSTS OFF)
 SELECT * FROM prt1 t1 LEFT JOIN LATERAL
 			  (SELECT t2.a AS t2a, t3.a AS t3a, least(t1.a,t2.a,t3.b) FROM prt1 t2 JOIN prt2 t3 ON (t2.a = t3.b)) ss
 			  ON t1.a = ss.t2a WHERE t1.b = 0 ORDER BY t1.a;
-                                   QUERY PLAN                                   
---------------------------------------------------------------------------------
+                                QUERY PLAN                                
+--------------------------------------------------------------------------
  Sort
    Sort Key: t1.a
-   ->  Result
-         ->  Append
-               ->  Nested Loop Left Join
-                     ->  Seq Scan on prt1_p1 t1
-                           Filter: (b = 0)
-                     ->  Nested Loop
-                           ->  Index Only Scan using iprt1_p1_a on prt1_p1 t2
-                                 Index Cond: (a = t1.a)
-                           ->  Index Scan using iprt2_p1_b on prt2_p1 t3
-                                 Index Cond: (b = t2.a)
-               ->  Nested Loop Left Join
-                     ->  Seq Scan on prt1_p2 t1_1
-                           Filter: (b = 0)
-                     ->  Nested Loop
-                           ->  Index Only Scan using iprt1_p2_a on prt1_p2 t2_1
-                                 Index Cond: (a = t1_1.a)
-                           ->  Index Scan using iprt2_p2_b on prt2_p2 t3_1
-                                 Index Cond: (b = t2_1.a)
-               ->  Nested Loop Left Join
-                     ->  Seq Scan on prt1_p3 t1_2
-                           Filter: (b = 0)
-                     ->  Nested Loop
-                           ->  Index Only Scan using iprt1_p3_a on prt1_p3 t2_2
-                                 Index Cond: (a = t1_2.a)
-                           ->  Index Scan using iprt2_p3_b on prt2_p3 t3_2
-                                 Index Cond: (b = t2_2.a)
-(28 rows)
+   ->  Append
+         ->  Nested Loop Left Join
+               ->  Seq Scan on prt1_p1 t1
+                     Filter: (b = 0)
+               ->  Nested Loop
+                     ->  Index Only Scan using iprt1_p1_a on prt1_p1 t2
+                           Index Cond: (a = t1.a)
+                     ->  Index Scan using iprt2_p1_b on prt2_p1 t3
+                           Index Cond: (b = t2.a)
+         ->  Nested Loop Left Join
+               ->  Seq Scan on prt1_p2 t1_1
+                     Filter: (b = 0)
+               ->  Nested Loop
+                     ->  Index Only Scan using iprt1_p2_a on prt1_p2 t2_1
+                           Index Cond: (a = t1_1.a)
+                     ->  Index Scan using iprt2_p2_b on prt2_p2 t3_1
+                           Index Cond: (b = t2_1.a)
+         ->  Nested Loop Left Join
+               ->  Seq Scan on prt1_p3 t1_2
+                     Filter: (b = 0)
+               ->  Nested Loop
+                     ->  Index Only Scan using iprt1_p3_a on prt1_p3 t2_2
+                           Index Cond: (a = t1_2.a)
+                     ->  Index Scan using iprt2_p3_b on prt2_p3 t3_2
+                           Index Cond: (b = t2_2.a)
+(27 rows)
 
 SELECT * FROM prt1 t1 LEFT JOIN LATERAL
 			  (SELECT t2.a AS t2a, t3.a AS t3a, least(t1.a,t2.a,t3.b) FROM prt1 t2 JOIN prt2 t3 ON (t2.a = t3.b)) ss
@@ -538,92 +535,90 @@ SELECT t1.a, t1.c, t2.b, t2.c FROM prt1_e t1, prt2_e t2 WHERE (t1.a + t1.b)/2 =
 --
 EXPLAIN (COSTS OFF)
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM prt1 t1, prt2 t2, prt1_e t3 WHERE t1.a = t2.b AND t1.a = (t3.a + t3.b)/2 AND t1.b = 0 ORDER BY t1.a, t2.b;
-                                QUERY PLAN                                 
----------------------------------------------------------------------------
+                             QUERY PLAN                              
+---------------------------------------------------------------------
  Sort
    Sort Key: t1.a
-   ->  Result
-         ->  Append
-               ->  Nested Loop
-                     Join Filter: (t1.a = ((t3.a + t3.b) / 2))
-                     ->  Hash Join
+   ->  Append
+         ->  Nested Loop
+               Join Filter: (t1.a = ((t3.a + t3.b) / 2))
+               ->  Hash Join
+                     Hash Cond: (t2.b = t1.a)
+                     ->  Seq Scan on prt2_p1 t2
+                     ->  Hash
+                           ->  Seq Scan on prt1_p1 t1
+                                 Filter: (b = 0)
+               ->  Index Scan using iprt1_e_p1_ab2 on prt1_e_p1 t3
+                     Index Cond: (((a + b) / 2) = t2.b)
+         ->  Nested Loop
+               Join Filter: (t1_1.a = ((t3_1.a + t3_1.b) / 2))
+               ->  Hash Join
+                     Hash Cond: (t2_1.b = t1_1.a)
+                     ->  Seq Scan on prt2_p2 t2_1
+                     ->  Hash
+                           ->  Seq Scan on prt1_p2 t1_1
+                                 Filter: (b = 0)
+               ->  Index Scan using iprt1_e_p2_ab2 on prt1_e_p2 t3_1
+                     Index Cond: (((a + b) / 2) = t2_1.b)
+         ->  Nested Loop
+               Join Filter: (t1_2.a = ((t3_2.a + t3_2.b) / 2))
+               ->  Hash Join
+                     Hash Cond: (t2_2.b = t1_2.a)
+                     ->  Seq Scan on prt2_p3 t2_2
+                     ->  Hash
+                           ->  Seq Scan on prt1_p3 t1_2
+                                 Filter: (b = 0)
+               ->  Index Scan using iprt1_e_p3_ab2 on prt1_e_p3 t3_2
+                     Index Cond: (((a + b) / 2) = t2_2.b)
+(33 rows)
+
+SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM prt1 t1, prt2 t2, prt1_e t3 WHERE t1.a = t2.b AND t1.a = (t3.a + t3.b)/2 AND t1.b = 0 ORDER BY t1.a, t2.b;
+  a  |  c   |  b  |  c   | ?column? | c 
+-----+------+-----+------+----------+---
+   0 | 0000 |   0 | 0000 |        0 | 0
+ 150 | 0150 | 150 | 0150 |      300 | 0
+ 300 | 0300 | 300 | 0300 |      600 | 0
+ 450 | 0450 | 450 | 0450 |      900 | 0
+(4 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) LEFT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t1.b = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
+                          QUERY PLAN                          
+--------------------------------------------------------------
+ Sort
+   Sort Key: t1.a, t2.b, ((t3.a + t3.b))
+   ->  Append
+         ->  Hash Right Join
+               Hash Cond: (((t3.a + t3.b) / 2) = t1.a)
+               ->  Seq Scan on prt1_e_p1 t3
+               ->  Hash
+                     ->  Hash Right Join
                            Hash Cond: (t2.b = t1.a)
                            ->  Seq Scan on prt2_p1 t2
                            ->  Hash
                                  ->  Seq Scan on prt1_p1 t1
                                        Filter: (b = 0)
-                     ->  Index Scan using iprt1_e_p1_ab2 on prt1_e_p1 t3
-                           Index Cond: (((a + b) / 2) = t2.b)
-               ->  Nested Loop
-                     Join Filter: (t1_1.a = ((t3_1.a + t3_1.b) / 2))
-                     ->  Hash Join
+         ->  Hash Right Join
+               Hash Cond: (((t3_1.a + t3_1.b) / 2) = t1_1.a)
+               ->  Seq Scan on prt1_e_p2 t3_1
+               ->  Hash
+                     ->  Hash Right Join
                            Hash Cond: (t2_1.b = t1_1.a)
                            ->  Seq Scan on prt2_p2 t2_1
                            ->  Hash
                                  ->  Seq Scan on prt1_p2 t1_1
                                        Filter: (b = 0)
-                     ->  Index Scan using iprt1_e_p2_ab2 on prt1_e_p2 t3_1
-                           Index Cond: (((a + b) / 2) = t2_1.b)
-               ->  Nested Loop
-                     Join Filter: (t1_2.a = ((t3_2.a + t3_2.b) / 2))
-                     ->  Hash Join
+         ->  Hash Right Join
+               Hash Cond: (((t3_2.a + t3_2.b) / 2) = t1_2.a)
+               ->  Seq Scan on prt1_e_p3 t3_2
+               ->  Hash
+                     ->  Hash Right Join
                            Hash Cond: (t2_2.b = t1_2.a)
                            ->  Seq Scan on prt2_p3 t2_2
                            ->  Hash
                                  ->  Seq Scan on prt1_p3 t1_2
                                        Filter: (b = 0)
-                     ->  Index Scan using iprt1_e_p3_ab2 on prt1_e_p3 t3_2
-                           Index Cond: (((a + b) / 2) = t2_2.b)
-(34 rows)
-
-SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM prt1 t1, prt2 t2, prt1_e t3 WHERE t1.a = t2.b AND t1.a = (t3.a + t3.b)/2 AND t1.b = 0 ORDER BY t1.a, t2.b;
-  a  |  c   |  b  |  c   | ?column? | c 
------+------+-----+------+----------+---
-   0 | 0000 |   0 | 0000 |        0 | 0
- 150 | 0150 | 150 | 0150 |      300 | 0
- 300 | 0300 | 300 | 0300 |      600 | 0
- 450 | 0450 | 450 | 0450 |      900 | 0
-(4 rows)
-
-EXPLAIN (COSTS OFF)
-SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) LEFT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t1.b = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
-                             QUERY PLAN                             
---------------------------------------------------------------------
- Sort
-   Sort Key: t1.a, t2.b, ((t3.a + t3.b))
-   ->  Result
-         ->  Append
-               ->  Hash Right Join
-                     Hash Cond: (((t3.a + t3.b) / 2) = t1.a)
-                     ->  Seq Scan on prt1_e_p1 t3
-                     ->  Hash
-                           ->  Hash Right Join
-                                 Hash Cond: (t2.b = t1.a)
-                                 ->  Seq Scan on prt2_p1 t2
-                                 ->  Hash
-                                       ->  Seq Scan on prt1_p1 t1
-                                             Filter: (b = 0)
-               ->  Hash Right Join
-                     Hash Cond: (((t3_1.a + t3_1.b) / 2) = t1_1.a)
-                     ->  Seq Scan on prt1_e_p2 t3_1
-                     ->  Hash
-                           ->  Hash Right Join
-                                 Hash Cond: (t2_1.b = t1_1.a)
-                                 ->  Seq Scan on prt2_p2 t2_1
-                                 ->  Hash
-                                       ->  Seq Scan on prt1_p2 t1_1
-                                             Filter: (b = 0)
-               ->  Hash Right Join
-                     Hash Cond: (((t3_2.a + t3_2.b) / 2) = t1_2.a)
-                     ->  Seq Scan on prt1_e_p3 t3_2
-                     ->  Hash
-                           ->  Hash Right Join
-                                 Hash Cond: (t2_2.b = t1_2.a)
-                                 ->  Seq Scan on prt2_p3 t2_2
-                                 ->  Hash
-                                       ->  Seq Scan on prt1_p3 t1_2
-                                             Filter: (b = 0)
-(34 rows)
+(33 rows)
 
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) LEFT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t1.b = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
   a  |  c   |  b  |  c   | ?column? | c 
@@ -644,40 +639,39 @@ SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2
 
 EXPLAIN (COSTS OFF)
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) RIGHT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t3.c = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
-                               QUERY PLAN                                
--------------------------------------------------------------------------
+                            QUERY PLAN                             
+-------------------------------------------------------------------
  Sort
    Sort Key: t1.a, t2.b, ((t3.a + t3.b))
-   ->  Result
-         ->  Append
-               ->  Nested Loop Left Join
-                     ->  Hash Right Join
-                           Hash Cond: (t1.a = ((t3.a + t3.b) / 2))
-                           ->  Seq Scan on prt1_p1 t1
-                           ->  Hash
-                                 ->  Seq Scan on prt1_e_p1 t3
-                                       Filter: (c = 0)
-                     ->  Index Scan using iprt2_p1_b on prt2_p1 t2
-                           Index Cond: (t1.a = b)
-               ->  Nested Loop Left Join
-                     ->  Hash Right Join
-                           Hash Cond: (t1_1.a = ((t3_1.a + t3_1.b) / 2))
-                           ->  Seq Scan on prt1_p2 t1_1
-                           ->  Hash
-                                 ->  Seq Scan on prt1_e_p2 t3_1
-                                       Filter: (c = 0)
-                     ->  Index Scan using iprt2_p2_b on prt2_p2 t2_1
-                           Index Cond: (t1_1.a = b)
-               ->  Nested Loop Left Join
-                     ->  Hash Right Join
-                           Hash Cond: (t1_2.a = ((t3_2.a + t3_2.b) / 2))
-                           ->  Seq Scan on prt1_p3 t1_2
-                           ->  Hash
-                                 ->  Seq Scan on prt1_e_p3 t3_2
-                                       Filter: (c = 0)
-                     ->  Index Scan using iprt2_p3_b on prt2_p3 t2_2
-                           Index Cond: (t1_2.a = b)
-(31 rows)
+   ->  Append
+         ->  Nested Loop Left Join
+               ->  Hash Right Join
+                     Hash Cond: (t1.a = ((t3.a + t3.b) / 2))
+                     ->  Seq Scan on prt1_p1 t1
+                     ->  Hash
+                           ->  Seq Scan on prt1_e_p1 t3
+                                 Filter: (c = 0)
+               ->  Index Scan using iprt2_p1_b on prt2_p1 t2
+                     Index Cond: (t1.a = b)
+         ->  Nested Loop Left Join
+               ->  Hash Right Join
+                     Hash Cond: (t1_1.a = ((t3_1.a + t3_1.b) / 2))
+                     ->  Seq Scan on prt1_p2 t1_1
+                     ->  Hash
+                           ->  Seq Scan on prt1_e_p2 t3_1
+                                 Filter: (c = 0)
+               ->  Index Scan using iprt2_p2_b on prt2_p2 t2_1
+                     Index Cond: (t1_1.a = b)
+         ->  Nested Loop Left Join
+               ->  Hash Right Join
+                     Hash Cond: (t1_2.a = ((t3_2.a + t3_2.b) / 2))
+                     ->  Seq Scan on prt1_p3 t1_2
+                     ->  Hash
+                           ->  Seq Scan on prt1_e_p3 t3_2
+                                 Filter: (c = 0)
+               ->  Index Scan using iprt2_p3_b on prt2_p3 t2_2
+                     Index Cond: (t1_2.a = b)
+(30 rows)
 
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) RIGHT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t3.c = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
   a  |  c   |  b  |  c   | ?column? | c 
@@ -700,52 +694,51 @@ SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2
 -- make sure these go to null as expected
 EXPLAIN (COSTS OFF)
 SELECT t1.a, t1.phv, t2.b, t2.phv, t3.a + t3.b, t3.phv FROM ((SELECT 50 phv, * FROM prt1 WHERE prt1.b = 0) t1 FULL JOIN (SELECT 75 phv, * FROM prt2 WHERE prt2.a = 0) t2 ON (t1.a = t2.b)) FULL JOIN (SELECT 50 phv, * FROM prt1_e WHERE prt1_e.c = 0) t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t1.a = t1.phv OR t2.b = t2.phv OR (t3.a + t3.b)/2 = t3.phv ORDER BY t1.a, t2.b, t3.a + t3.b;
-                                                      QUERY PLAN                                                      
-----------------------------------------------------------------------------------------------------------------------
+                                                   QUERY PLAN                                                   
+----------------------------------------------------------------------------------------------------------------
  Sort
    Sort Key: prt1_p1.a, prt2_p1.b, ((prt1_e_p1.a + prt1_e_p1.b))
-   ->  Result
-         ->  Append
+   ->  Append
+         ->  Hash Full Join
+               Hash Cond: (prt1_p1.a = ((prt1_e_p1.a + prt1_e_p1.b) / 2))
+               Filter: ((prt1_p1.a = (50)) OR (prt2_p1.b = (75)) OR (((prt1_e_p1.a + prt1_e_p1.b) / 2) = (50)))
                ->  Hash Full Join
-                     Hash Cond: (prt1_p1.a = ((prt1_e_p1.a + prt1_e_p1.b) / 2))
-                     Filter: ((prt1_p1.a = (50)) OR (prt2_p1.b = (75)) OR (((prt1_e_p1.a + prt1_e_p1.b) / 2) = (50)))
-                     ->  Hash Full Join
-                           Hash Cond: (prt1_p1.a = prt2_p1.b)
-                           ->  Seq Scan on prt1_p1
-                                 Filter: (b = 0)
-                           ->  Hash
-                                 ->  Seq Scan on prt2_p1
-                                       Filter: (a = 0)
+                     Hash Cond: (prt1_p1.a = prt2_p1.b)
+                     ->  Seq Scan on prt1_p1
+                           Filter: (b = 0)
                      ->  Hash
-                           ->  Seq Scan on prt1_e_p1
-                                 Filter: (c = 0)
+                           ->  Seq Scan on prt2_p1
+                                 Filter: (a = 0)
+               ->  Hash
+                     ->  Seq Scan on prt1_e_p1
+                           Filter: (c = 0)
+         ->  Hash Full Join
+               Hash Cond: (prt1_p2.a = ((prt1_e_p2.a + prt1_e_p2.b) / 2))
+               Filter: ((prt1_p2.a = (50)) OR (prt2_p2.b = (75)) OR (((prt1_e_p2.a + prt1_e_p2.b) / 2) = (50)))
                ->  Hash Full Join
-                     Hash Cond: (prt1_p2.a = ((prt1_e_p2.a + prt1_e_p2.b) / 2))
-                     Filter: ((prt1_p2.a = (50)) OR (prt2_p2.b = (75)) OR (((prt1_e_p2.a + prt1_e_p2.b) / 2) = (50)))
-                     ->  Hash Full Join
-                           Hash Cond: (prt1_p2.a = prt2_p2.b)
-                           ->  Seq Scan on prt1_p2
-                                 Filter: (b = 0)
-                           ->  Hash
-                                 ->  Seq Scan on prt2_p2
-                                       Filter: (a = 0)
+                     Hash Cond: (prt1_p2.a = prt2_p2.b)
+                     ->  Seq Scan on prt1_p2
+                           Filter: (b = 0)
                      ->  Hash
-                           ->  Seq Scan on prt1_e_p2
-                                 Filter: (c = 0)
+                           ->  Seq Scan on prt2_p2
+                                 Filter: (a = 0)
+               ->  Hash
+                     ->  Seq Scan on prt1_e_p2
+                           Filter: (c = 0)
+         ->  Hash Full Join
+               Hash Cond: (prt1_p3.a = ((prt1_e_p3.a + prt1_e_p3.b) / 2))
+               Filter: ((prt1_p3.a = (50)) OR (prt2_p3.b = (75)) OR (((prt1_e_p3.a + prt1_e_p3.b) / 2) = (50)))
                ->  Hash Full Join
-                     Hash Cond: (prt1_p3.a = ((prt1_e_p3.a + prt1_e_p3.b) / 2))
-                     Filter: ((prt1_p3.a = (50)) OR (prt2_p3.b = (75)) OR (((prt1_e_p3.a + prt1_e_p3.b) / 2) = (50)))
-                     ->  Hash Full Join
-                           Hash Cond: (prt1_p3.a = prt2_p3.b)
-                           ->  Seq Scan on prt1_p3
-                                 Filter: (b = 0)
-                           ->  Hash
-                                 ->  Seq Scan on prt2_p3
-                                       Filter: (a = 0)
+                     Hash Cond: (prt1_p3.a = prt2_p3.b)
+                     ->  Seq Scan on prt1_p3
+                           Filter: (b = 0)
                      ->  Hash
-                           ->  Seq Scan on prt1_e_p3
-                                 Filter: (c = 0)
-(43 rows)
+                           ->  Seq Scan on prt2_p3
+                                 Filter: (a = 0)
+               ->  Hash
+                     ->  Seq Scan on prt1_e_p3
+                           Filter: (c = 0)
+(42 rows)
 
 SELECT t1.a, t1.phv, t2.b, t2.phv, t3.a + t3.b, t3.phv FROM ((SELECT 50 phv, * FROM prt1 WHERE prt1.b = 0) t1 FULL JOIN (SELECT 75 phv, * FROM prt2 WHERE prt2.a = 0) t2 ON (t1.a = t2.b)) FULL JOIN (SELECT 50 phv, * FROM prt1_e WHERE prt1_e.c = 0) t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t1.a = t1.phv OR t2.b = t2.phv OR (t3.a + t3.b)/2 = t3.phv ORDER BY t1.a, t2.b, t3.a + t3.b;
  a  | phv | b  | phv | ?column? | phv 
@@ -933,61 +926,60 @@ SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN (
 
 EXPLAIN (COSTS OFF)
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) RIGHT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t3.c = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
-                                    QUERY PLAN                                    
-----------------------------------------------------------------------------------
+                                 QUERY PLAN                                 
+----------------------------------------------------------------------------
  Sort
    Sort Key: t1.a, t2.b, ((t3.a + t3.b))
-   ->  Result
-         ->  Append
-               ->  Merge Left Join
-                     Merge Cond: (t1.a = t2.b)
-                     ->  Sort
-                           Sort Key: t1.a
-                           ->  Merge Left Join
-                                 Merge Cond: ((((t3.a + t3.b) / 2)) = t1.a)
-                                 ->  Sort
-                                       Sort Key: (((t3.a + t3.b) / 2))
-                                       ->  Seq Scan on prt1_e_p1 t3
-                                             Filter: (c = 0)
-                                 ->  Sort
-                                       Sort Key: t1.a
-                                       ->  Seq Scan on prt1_p1 t1
-                     ->  Sort
-                           Sort Key: t2.b
-                           ->  Seq Scan on prt2_p1 t2
-               ->  Merge Left Join
-                     Merge Cond: (t1_1.a = t2_1.b)
-                     ->  Sort
-                           Sort Key: t1_1.a
-                           ->  Merge Left Join
-                                 Merge Cond: ((((t3_1.a + t3_1.b) / 2)) = t1_1.a)
-                                 ->  Sort
-                                       Sort Key: (((t3_1.a + t3_1.b) / 2))
-                                       ->  Seq Scan on prt1_e_p2 t3_1
-                                             Filter: (c = 0)
-                                 ->  Sort
-                                       Sort Key: t1_1.a
-                                       ->  Seq Scan on prt1_p2 t1_1
-                     ->  Sort
-                           Sort Key: t2_1.b
-                           ->  Seq Scan on prt2_p2 t2_1
-               ->  Merge Left Join
-                     Merge Cond: (t1_2.a = t2_2.b)
-                     ->  Sort
-                           Sort Key: t1_2.a
-                           ->  Merge Left Join
-                                 Merge Cond: ((((t3_2.a + t3_2.b) / 2)) = t1_2.a)
-                                 ->  Sort
-                                       Sort Key: (((t3_2.a + t3_2.b) / 2))
-                                       ->  Seq Scan on prt1_e_p3 t3_2
-                                             Filter: (c = 0)
-                                 ->  Sort
-                                       Sort Key: t1_2.a
-                                       ->  Seq Scan on prt1_p3 t1_2
-                     ->  Sort
-                           Sort Key: t2_2.b
-                           ->  Seq Scan on prt2_p3 t2_2
-(52 rows)
+   ->  Append
+         ->  Merge Left Join
+               Merge Cond: (t1.a = t2.b)
+               ->  Sort
+                     Sort Key: t1.a
+                     ->  Merge Left Join
+                           Merge Cond: ((((t3.a + t3.b) / 2)) = t1.a)
+                           ->  Sort
+                                 Sort Key: (((t3.a + t3.b) / 2))
+                                 ->  Seq Scan on prt1_e_p1 t3
+                                       Filter: (c = 0)
+                           ->  Sort
+                                 Sort Key: t1.a
+                                 ->  Seq Scan on prt1_p1 t1
+               ->  Sort
+                     Sort Key: t2.b
+                     ->  Seq Scan on prt2_p1 t2
+         ->  Merge Left Join
+               Merge Cond: (t1_1.a = t2_1.b)
+               ->  Sort
+                     Sort Key: t1_1.a
+                     ->  Merge Left Join
+                           Merge Cond: ((((t3_1.a + t3_1.b) / 2)) = t1_1.a)
+                           ->  Sort
+                                 Sort Key: (((t3_1.a + t3_1.b) / 2))
+                                 ->  Seq Scan on prt1_e_p2 t3_1
+                                       Filter: (c = 0)
+                           ->  Sort
+                                 Sort Key: t1_1.a
+                                 ->  Seq Scan on prt1_p2 t1_1
+               ->  Sort
+                     Sort Key: t2_1.b
+                     ->  Seq Scan on prt2_p2 t2_1
+         ->  Merge Left Join
+               Merge Cond: (t1_2.a = t2_2.b)
+               ->  Sort
+                     Sort Key: t1_2.a
+                     ->  Merge Left Join
+                           Merge Cond: ((((t3_2.a + t3_2.b) / 2)) = t1_2.a)
+                           ->  Sort
+                                 Sort Key: (((t3_2.a + t3_2.b) / 2))
+                                 ->  Seq Scan on prt1_e_p3 t3_2
+                                       Filter: (c = 0)
+                           ->  Sort
+                                 Sort Key: t1_2.a
+                                 ->  Seq Scan on prt1_p3 t1_2
+               ->  Sort
+                     Sort Key: t2_2.b
+                     ->  Seq Scan on prt2_p3 t2_2
+(51 rows)
 
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) RIGHT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t3.c = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
   a  |  c   |  b  |  c   | ?column? | c 
@@ -1145,42 +1137,41 @@ ANALYZE plt1_e;
 -- test partition matching with N-way join
 EXPLAIN (COSTS OFF)
 SELECT avg(t1.a), avg(t2.b), avg(t3.a + t3.b), t1.c, t2.c, t3.c FROM plt1 t1, plt2 t2, plt1_e t3 WHERE t1.b = t2.b AND t1.c = t2.c AND ltrim(t3.c, 'A') = t1.c GROUP BY t1.c, t2.c, t3.c ORDER BY t1.c, t2.c, t3.c;
-                                      QUERY PLAN                                      
---------------------------------------------------------------------------------------
+                                   QUERY PLAN                                   
+--------------------------------------------------------------------------------
  GroupAggregate
    Group Key: t1.c, t2.c, t3.c
    ->  Sort
          Sort Key: t1.c, t3.c
-         ->  Result
-               ->  Append
+         ->  Append
+               ->  Hash Join
+                     Hash Cond: (t1.c = ltrim(t3.c, 'A'::text))
                      ->  Hash Join
-                           Hash Cond: (t1.c = ltrim(t3.c, 'A'::text))
-                           ->  Hash Join
-                                 Hash Cond: ((t1.b = t2.b) AND (t1.c = t2.c))
-                                 ->  Seq Scan on plt1_p1 t1
-                                 ->  Hash
-                                       ->  Seq Scan on plt2_p1 t2
+                           Hash Cond: ((t1.b = t2.b) AND (t1.c = t2.c))
+                           ->  Seq Scan on plt1_p1 t1
                            ->  Hash
-                                 ->  Seq Scan on plt1_e_p1 t3
+                                 ->  Seq Scan on plt2_p1 t2
+                     ->  Hash
+                           ->  Seq Scan on plt1_e_p1 t3
+               ->  Hash Join
+                     Hash Cond: (t1_1.c = ltrim(t3_1.c, 'A'::text))
                      ->  Hash Join
-                           Hash Cond: (t1_1.c = ltrim(t3_1.c, 'A'::text))
-                           ->  Hash Join
-                                 Hash Cond: ((t1_1.b = t2_1.b) AND (t1_1.c = t2_1.c))
-                                 ->  Seq Scan on plt1_p2 t1_1
-                                 ->  Hash
-                                       ->  Seq Scan on plt2_p2 t2_1
+                           Hash Cond: ((t1_1.b = t2_1.b) AND (t1_1.c = t2_1.c))
+                           ->  Seq Scan on plt1_p2 t1_1
                            ->  Hash
-                                 ->  Seq Scan on plt1_e_p2 t3_1
+                                 ->  Seq Scan on plt2_p2 t2_1
+                     ->  Hash
+                           ->  Seq Scan on plt1_e_p2 t3_1
+               ->  Hash Join
+                     Hash Cond: (t1_2.c = ltrim(t3_2.c, 'A'::text))
                      ->  Hash Join
-                           Hash Cond: (t1_2.c = ltrim(t3_2.c, 'A'::text))
-                           ->  Hash Join
-                                 Hash Cond: ((t1_2.b = t2_2.b) AND (t1_2.c = t2_2.c))
-                                 ->  Seq Scan on plt1_p3 t1_2
-                                 ->  Hash
-                                       ->  Seq Scan on plt2_p3 t2_2
+                           Hash Cond: ((t1_2.b = t2_2.b) AND (t1_2.c = t2_2.c))
+                           ->  Seq Scan on plt1_p3 t1_2
                            ->  Hash
-                                 ->  Seq Scan on plt1_e_p3 t3_2
-(33 rows)
+                                 ->  Seq Scan on plt2_p3 t2_2
+                     ->  Hash
+                           ->  Seq Scan on plt1_e_p3 t3_2
+(32 rows)
 
 SELECT avg(t1.a), avg(t2.b), avg(t3.a + t3.b), t1.c, t2.c, t3.c FROM plt1 t1, plt2 t2, plt1_e t3 WHERE t1.b = t2.b AND t1.c = t2.c AND ltrim(t3.c, 'A') = t1.c GROUP BY t1.c, t2.c, t3.c ORDER BY t1.c, t2.c, t3.c;
          avg          |         avg          |          avg          |  c   |  c   |   c   
@@ -1290,42 +1281,41 @@ ANALYZE pht1_e;
 -- test partition matching with N-way join
 EXPLAIN (COSTS OFF)
 SELECT avg(t1.a), avg(t2.b), avg(t3.a + t3.b), t1.c, t2.c, t3.c FROM pht1 t1, pht2 t2, pht1_e t3 WHERE t1.b = t2.b AND t1.c = t2.c AND ltrim(t3.c, 'A') = t1.c GROUP BY t1.c, t2.c, t3.c ORDER BY t1.c, t2.c, t3.c;
-                                      QUERY PLAN                                      
---------------------------------------------------------------------------------------
+                                   QUERY PLAN                                   
+--------------------------------------------------------------------------------
  GroupAggregate
    Group Key: t1.c, t2.c, t3.c
    ->  Sort
          Sort Key: t1.c, t3.c
-         ->  Result
-               ->  Append
+         ->  Append
+               ->  Hash Join
+                     Hash Cond: (t1.c = ltrim(t3.c, 'A'::text))
                      ->  Hash Join
-                           Hash Cond: (t1.c = ltrim(t3.c, 'A'::text))
-                           ->  Hash Join
-                                 Hash Cond: ((t1.b = t2.b) AND (t1.c = t2.c))
-                                 ->  Seq Scan on pht1_p1 t1
-                                 ->  Hash
-                                       ->  Seq Scan on pht2_p1 t2
+                           Hash Cond: ((t1.b = t2.b) AND (t1.c = t2.c))
+                           ->  Seq Scan on pht1_p1 t1
                            ->  Hash
-                                 ->  Seq Scan on pht1_e_p1 t3
+                                 ->  Seq Scan on pht2_p1 t2
+                     ->  Hash
+                           ->  Seq Scan on pht1_e_p1 t3
+               ->  Hash Join
+                     Hash Cond: (t1_1.c = ltrim(t3_1.c, 'A'::text))
                      ->  Hash Join
-                           Hash Cond: (t1_1.c = ltrim(t3_1.c, 'A'::text))
-                           ->  Hash Join
-                                 Hash Cond: ((t1_1.b = t2_1.b) AND (t1_1.c = t2_1.c))
-                                 ->  Seq Scan on pht1_p2 t1_1
-                                 ->  Hash
-                                       ->  Seq Scan on pht2_p2 t2_1
+                           Hash Cond: ((t1_1.b = t2_1.b) AND (t1_1.c = t2_1.c))
+                           ->  Seq Scan on pht1_p2 t1_1
                            ->  Hash
-                                 ->  Seq Scan on pht1_e_p2 t3_1
+                                 ->  Seq Scan on pht2_p2 t2_1
+                     ->  Hash
+                           ->  Seq Scan on pht1_e_p2 t3_1
+               ->  Hash Join
+                     Hash Cond: (t1_2.c = ltrim(t3_2.c, 'A'::text))
                      ->  Hash Join
-                           Hash Cond: (t1_2.c = ltrim(t3_2.c, 'A'::text))
-                           ->  Hash Join
-                                 Hash Cond: ((t1_2.b = t2_2.b) AND (t1_2.c = t2_2.c))
-                                 ->  Seq Scan on pht1_p3 t1_2
-                                 ->  Hash
-                                       ->  Seq Scan on pht2_p3 t2_2
+                           Hash Cond: ((t1_2.b = t2_2.b) AND (t1_2.c = t2_2.c))
+                           ->  Seq Scan on pht1_p3 t1_2
                            ->  Hash
-                                 ->  Seq Scan on pht1_e_p3 t3_2
-(33 rows)
+                                 ->  Seq Scan on pht2_p3 t2_2
+                     ->  Hash
+                           ->  Seq Scan on pht1_e_p3 t3_2
+(32 rows)
 
 SELECT avg(t1.a), avg(t2.b), avg(t3.a + t3.b), t1.c, t2.c, t3.c FROM pht1 t1, pht2 t2, pht1_e t3 WHERE t1.b = t2.b AND t1.c = t2.c AND ltrim(t3.c, 'A') = t1.c GROUP BY t1.c, t2.c, t3.c ORDER BY t1.c, t2.c, t3.c;
          avg          |         avg          |         avg          |  c   |  c   |   c   
@@ -1463,40 +1453,39 @@ SELECT t1.a, t1.c, t2.b, t2.c FROM prt1_l t1 LEFT JOIN prt2_l t2 ON t1.a = t2.b
 -- right join
 EXPLAIN (COSTS OFF)
 SELECT t1.a, t1.c, t2.b, t2.c FROM prt1_l t1 RIGHT JOIN prt2_l t2 ON t1.a = t2.b AND t1.c = t2.c WHERE t2.a = 0 ORDER BY t1.a, t2.b;
-                                        QUERY PLAN                                        
-------------------------------------------------------------------------------------------
+                                     QUERY PLAN                                     
+------------------------------------------------------------------------------------
  Sort
    Sort Key: t1.a, t2.b
-   ->  Result
-         ->  Append
-               ->  Hash Right Join
-                     Hash Cond: ((t1.a = t2.b) AND ((t1.c)::text = (t2.c)::text))
-                     ->  Seq Scan on prt1_l_p1 t1
-                     ->  Hash
-                           ->  Seq Scan on prt2_l_p1 t2
-                                 Filter: (a = 0)
-               ->  Hash Right Join
-                     Hash Cond: ((t1_1.a = t2_1.b) AND ((t1_1.c)::text = (t2_1.c)::text))
-                     ->  Seq Scan on prt1_l_p2_p1 t1_1
-                     ->  Hash
-                           ->  Seq Scan on prt2_l_p2_p1 t2_1
-                                 Filter: (a = 0)
-               ->  Hash Right Join
-                     Hash Cond: ((t1_2.a = t2_2.b) AND ((t1_2.c)::text = (t2_2.c)::text))
-                     ->  Seq Scan on prt1_l_p2_p2 t1_2
-                     ->  Hash
-                           ->  Seq Scan on prt2_l_p2_p2 t2_2
-                                 Filter: (a = 0)
-               ->  Hash Right Join
-                     Hash Cond: ((t1_3.a = t2_3.b) AND ((t1_3.c)::text = (t2_3.c)::text))
+   ->  Append
+         ->  Hash Right Join
+               Hash Cond: ((t1.a = t2.b) AND ((t1.c)::text = (t2.c)::text))
+               ->  Seq Scan on prt1_l_p1 t1
+               ->  Hash
+                     ->  Seq Scan on prt2_l_p1 t2
+                           Filter: (a = 0)
+         ->  Hash Right Join
+               Hash Cond: ((t1_1.a = t2_1.b) AND ((t1_1.c)::text = (t2_1.c)::text))
+               ->  Seq Scan on prt1_l_p2_p1 t1_1
+               ->  Hash
+                     ->  Seq Scan on prt2_l_p2_p1 t2_1
+                           Filter: (a = 0)
+         ->  Hash Right Join
+               Hash Cond: ((t1_2.a = t2_2.b) AND ((t1_2.c)::text = (t2_2.c)::text))
+               ->  Seq Scan on prt1_l_p2_p2 t1_2
+               ->  Hash
+                     ->  Seq Scan on prt2_l_p2_p2 t2_2
+                           Filter: (a = 0)
+         ->  Hash Right Join
+               Hash Cond: ((t1_3.a = t2_3.b) AND ((t1_3.c)::text = (t2_3.c)::text))
+               ->  Append
+                     ->  Seq Scan on prt1_l_p3_p1 t1_3
+                     ->  Seq Scan on prt1_l_p3_p2 t1_4
+               ->  Hash
                      ->  Append
-                           ->  Seq Scan on prt1_l_p3_p1 t1_3
-                           ->  Seq Scan on prt1_l_p3_p2 t1_4
-                     ->  Hash
-                           ->  Append
-                                 ->  Seq Scan on prt2_l_p3_p1 t2_3
-                                       Filter: (a = 0)
-(31 rows)
+                           ->  Seq Scan on prt2_l_p3_p1 t2_3
+                                 Filter: (a = 0)
+(30 rows)
 
 SELECT t1.a, t1.c, t2.b, t2.c FROM prt1_l t1 RIGHT JOIN prt2_l t2 ON t1.a = t2.b AND t1.c = t2.c WHERE t2.a = 0 ORDER BY t1.a, t2.b;
   a  |  c   |  b  |  c   
@@ -1577,55 +1566,54 @@ EXPLAIN (COSTS OFF)
 SELECT * FROM prt1_l t1 LEFT JOIN LATERAL
 			  (SELECT t2.a AS t2a, t2.c AS t2c, t2.b AS t2b, t3.b AS t3b, least(t1.a,t2.a,t3.b) FROM prt1_l t2 JOIN prt2_l t3 ON (t2.a = t3.b AND t2.c = t3.c)) ss
 			  ON t1.a = ss.t2a AND t1.c = ss.t2c WHERE t1.b = 0 ORDER BY t1.a;
-                                             QUERY PLAN                                              
------------------------------------------------------------------------------------------------------
+                                          QUERY PLAN                                           
+-----------------------------------------------------------------------------------------------
  Sort
    Sort Key: t1.a
-   ->  Result
-         ->  Append
-               ->  Nested Loop Left Join
-                     ->  Seq Scan on prt1_l_p1 t1
-                           Filter: (b = 0)
-                     ->  Hash Join
-                           Hash Cond: ((t3.b = t2.a) AND ((t3.c)::text = (t2.c)::text))
-                           ->  Seq Scan on prt2_l_p1 t3
-                           ->  Hash
-                                 ->  Seq Scan on prt1_l_p1 t2
-                                       Filter: ((t1.a = a) AND ((t1.c)::text = (c)::text))
-               ->  Nested Loop Left Join
-                     ->  Seq Scan on prt1_l_p2_p1 t1_1
-                           Filter: (b = 0)
-                     ->  Hash Join
-                           Hash Cond: ((t3_1.b = t2_1.a) AND ((t3_1.c)::text = (t2_1.c)::text))
-                           ->  Seq Scan on prt2_l_p2_p1 t3_1
-                           ->  Hash
-                                 ->  Seq Scan on prt1_l_p2_p1 t2_1
-                                       Filter: ((t1_1.a = a) AND ((t1_1.c)::text = (c)::text))
-               ->  Nested Loop Left Join
-                     ->  Seq Scan on prt1_l_p2_p2 t1_2
+   ->  Append
+         ->  Nested Loop Left Join
+               ->  Seq Scan on prt1_l_p1 t1
+                     Filter: (b = 0)
+               ->  Hash Join
+                     Hash Cond: ((t3.b = t2.a) AND ((t3.c)::text = (t2.c)::text))
+                     ->  Seq Scan on prt2_l_p1 t3
+                     ->  Hash
+                           ->  Seq Scan on prt1_l_p1 t2
+                                 Filter: ((t1.a = a) AND ((t1.c)::text = (c)::text))
+         ->  Nested Loop Left Join
+               ->  Seq Scan on prt1_l_p2_p1 t1_1
+                     Filter: (b = 0)
+               ->  Hash Join
+                     Hash Cond: ((t3_1.b = t2_1.a) AND ((t3_1.c)::text = (t2_1.c)::text))
+                     ->  Seq Scan on prt2_l_p2_p1 t3_1
+                     ->  Hash
+                           ->  Seq Scan on prt1_l_p2_p1 t2_1
+                                 Filter: ((t1_1.a = a) AND ((t1_1.c)::text = (c)::text))
+         ->  Nested Loop Left Join
+               ->  Seq Scan on prt1_l_p2_p2 t1_2
+                     Filter: (b = 0)
+               ->  Hash Join
+                     Hash Cond: ((t3_2.b = t2_2.a) AND ((t3_2.c)::text = (t2_2.c)::text))
+                     ->  Seq Scan on prt2_l_p2_p2 t3_2
+                     ->  Hash
+                           ->  Seq Scan on prt1_l_p2_p2 t2_2
+                                 Filter: ((t1_2.a = a) AND ((t1_2.c)::text = (c)::text))
+         ->  Nested Loop Left Join
+               ->  Append
+                     ->  Seq Scan on prt1_l_p3_p1 t1_3
                            Filter: (b = 0)
-                     ->  Hash Join
-                           Hash Cond: ((t3_2.b = t2_2.a) AND ((t3_2.c)::text = (t2_2.c)::text))
-                           ->  Seq Scan on prt2_l_p2_p2 t3_2
-                           ->  Hash
-                                 ->  Seq Scan on prt1_l_p2_p2 t2_2
-                                       Filter: ((t1_2.a = a) AND ((t1_2.c)::text = (c)::text))
-               ->  Nested Loop Left Join
+               ->  Hash Join
+                     Hash Cond: ((t3_3.b = t2_3.a) AND ((t3_3.c)::text = (t2_3.c)::text))
                      ->  Append
-                           ->  Seq Scan on prt1_l_p3_p1 t1_3
-                                 Filter: (b = 0)
-                     ->  Hash Join
-                           Hash Cond: ((t3_3.b = t2_3.a) AND ((t3_3.c)::text = (t2_3.c)::text))
+                           ->  Seq Scan on prt2_l_p3_p1 t3_3
+                           ->  Seq Scan on prt2_l_p3_p2 t3_4
+                     ->  Hash
                            ->  Append
-                                 ->  Seq Scan on prt2_l_p3_p1 t3_3
-                                 ->  Seq Scan on prt2_l_p3_p2 t3_4
-                           ->  Hash
-                                 ->  Append
-                                       ->  Seq Scan on prt1_l_p3_p1 t2_3
-                                             Filter: ((t1_3.a = a) AND ((t1_3.c)::text = (c)::text))
-                                       ->  Seq Scan on prt1_l_p3_p2 t2_4
-                                             Filter: ((t1_3.a = a) AND ((t1_3.c)::text = (c)::text))
-(46 rows)
+                                 ->  Seq Scan on prt1_l_p3_p1 t2_3
+                                       Filter: ((t1_3.a = a) AND ((t1_3.c)::text = (c)::text))
+                                 ->  Seq Scan on prt1_l_p3_p2 t2_4
+                                       Filter: ((t1_3.a = a) AND ((t1_3.c)::text = (c)::text))
+(45 rows)
 
 SELECT * FROM prt1_l t1 LEFT JOIN LATERAL
 			  (SELECT t2.a AS t2a, t2.c AS t2c, t2.b AS t2b, t3.b AS t3b, least(t1.a,t2.a,t3.b) FROM prt1_l t2 JOIN prt2_l t3 ON (t2.a = t3.b AND t2.c = t3.c)) ss
-- 
2.14.3 (Apple Git-98)

0003-Postpone-generate_gather_paths-for-topmost-scan-join.patch (application/octet-stream)

From 1c7e19721a6c3dd606996bf3de735e88319be44c Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Mon, 12 Mar 2018 16:45:15 -0400
Subject: [PATCH 3/4] Postpone generate_gather_paths for topmost scan/join rel.

Don't call generate_gather_paths for the topmost scan/join relation
when it is initially populated with paths.  If the scan/join target
is parallel-safe, we actually skip this for the topmost scan/join rel
altogether and instead do it for the tlist_rel, so that the
projection is done in the worker and costs are computed accordingly.

Amit Kapila and Robert Haas
---
 src/backend/optimizer/geqo/geqo_eval.c | 21 ++++++++++++++-------
 src/backend/optimizer/path/allpaths.c  | 26 +++++++++++++++++++-------
 src/backend/optimizer/plan/planner.c   |  9 +++++++++
 3 files changed, 42 insertions(+), 14 deletions(-)
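The commit message says the point of postponing generate_gather_paths is that the projection is done in the worker, with costs computed accordingly. The sketch below makes that cost argument concrete with a deliberately simplified model. It is not PostgreSQL's costsize.c: the CostInputs struct, both function names and the example numbers are hypothetical. parallel_tuple_cost plays the role of the GUC of that name (default 0.1), and the rows handled by each participant are approximated as total rows divided by a parallel divisor.

```c
#include <assert.h>

/*
 * Hypothetical, simplified cost inputs.  Real PostgreSQL costing
 * (cost_gather, apply_projection_to_path) has more terms than this.
 */
typedef struct CostInputs
{
    double      rows_total;          /* rows the scan returns in total */
    double      parallel_divisor;    /* approx. share of rows per participant */
    double      proj_per_tuple;      /* per-tuple cost of the target list */
    double      parallel_tuple_cost; /* cost of passing one tuple to Gather */
} CostInputs;

/* Projection evaluated in the leader, above the Gather: every row, serially. */
static double
cost_project_above_gather(const CostInputs *in)
{
    return in->rows_total * in->parallel_tuple_cost +
           in->rows_total * in->proj_per_tuple;
}

/* Projection evaluated in the workers: each participant handles its share. */
static double
cost_project_below_gather(const CostInputs *in)
{
    double      rows_per_participant;

    assert(in->parallel_divisor > 0.0);
    rows_per_participant = in->rows_total / in->parallel_divisor;

    /* Gather still pays parallel_tuple_cost for every row it receives. */
    return rows_per_participant * in->proj_per_tuple +
           in->rows_total * in->parallel_tuple_cost;
}
```

With 1000 rows, a divisor of 2, a projection cost of 100 per tuple and parallel_tuple_cost = 0.1, this model gives 100100 for projecting above the Gather and 50100 for projecting below it. Only the second placement lets the planner see the benefit of parallelism for an expensive target list.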

diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index 0be2a73e05..3ef7d7d8aa 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -40,7 +40,7 @@ typedef struct
 } Clump;
 
 static List *merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump,
-			bool force);
+			int num_gene, bool force);
 static bool desirable_join(PlannerInfo *root,
 			   RelOptInfo *outer_rel, RelOptInfo *inner_rel);
 
@@ -196,7 +196,7 @@ gimme_tree(PlannerInfo *root, Gene *tour, int num_gene)
 		cur_clump->size = 1;
 
 		/* Merge it into the clumps list, using only desirable joins */
-		clumps = merge_clump(root, clumps, cur_clump, false);
+		clumps = merge_clump(root, clumps, cur_clump, num_gene, false);
 	}
 
 	if (list_length(clumps) > 1)
@@ -210,7 +210,7 @@ gimme_tree(PlannerInfo *root, Gene *tour, int num_gene)
 		{
 			Clump	   *clump = (Clump *) lfirst(lc);
 
-			fclumps = merge_clump(root, fclumps, clump, true);
+			fclumps = merge_clump(root, fclumps, clump, num_gene, true);
 		}
 		clumps = fclumps;
 	}
@@ -235,7 +235,8 @@ gimme_tree(PlannerInfo *root, Gene *tour, int num_gene)
  * "desirable" joins.
  */
 static List *
-merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
+merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
+			bool force)
 {
 	ListCell   *prev;
 	ListCell   *lc;
@@ -267,8 +268,14 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
 				/* Create paths for partitionwise joins. */
 				generate_partitionwise_join_paths(root, joinrel);
 
-				/* Create GatherPaths for any useful partial paths for rel */
-				generate_gather_paths(root, joinrel, false);
+				/*
+				 * Except for the topmost scan/join rel, consider gathering
+				 * partial paths.  We'll do the same for the topmost scan/join
+				 * rel once we know the final targetlist (see
+				 * grouping_planner).
+				 */
+				if (old_clump->size + new_clump->size < num_gene)
+					generate_gather_paths(root, joinrel, false);
 
 				/* Find and save the cheapest paths for this joinrel */
 				set_cheapest(joinrel);
@@ -286,7 +293,7 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
 				 * others.  When no further merge is possible, we'll reinsert
 				 * it into the list.
 				 */
-				return merge_clump(root, clumps, old_clump, force);
+				return merge_clump(root, clumps, old_clump, num_gene, force);
 			}
 		}
 		prev = lc;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 43f4e75748..c4e4db15a6 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -479,13 +479,20 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	}
 
 	/*
-	 * If this is a baserel, consider gathering any partial paths we may have
-	 * created for it.  (If we tried to gather inheritance children, we could
+	 * If this is a baserel, we should normally consider gathering any partial
+	 * paths we may have created for it.
+	 *
+	 * However, if this is an inheritance child, skip it.  Otherwise, we could
 	 * end up with a very large number of gather nodes, each trying to grab
-	 * its own pool of workers, so don't do this for otherrels.  Instead,
-	 * we'll consider gathering partial paths for the parent appendrel.)
+	 * its own pool of workers.  Instead, we'll consider gathering partial
+	 * paths for the parent appendrel.
+	 *
+	 * Also, if this is the topmost scan/join rel (that is, the only baserel),
+	 * we postpone this until the final scan/join targetlist is available (see
+	 * grouping_planner).
 	 */
-	if (rel->reloptkind == RELOPT_BASEREL)
+	if (rel->reloptkind == RELOPT_BASEREL &&
+		bms_membership(root->all_baserels) != BMS_SINGLETON)
 		generate_gather_paths(root, rel, false);
 
 	/*
@@ -2699,8 +2706,13 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
 			/* Create paths for partitionwise joins. */
 			generate_partitionwise_join_paths(root, rel);
 
-			/* Create GatherPaths for any useful partial paths for rel */
-			generate_gather_paths(root, rel, false);
+			/*
+			 * Except for the topmost scan/join rel, consider gathering
+			 * partial paths.  We'll do the same for the topmost scan/join rel
+			 * once we know the final targetlist (see grouping_planner).
+			 */
+			if (lev < levels_needed)
+				generate_gather_paths(root, rel, false);
 
 			/* Find and save the cheapest paths for this rel */
 			set_cheapest(rel);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 50f858e420..cffb90999a 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -1969,6 +1969,15 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 			scanjoin_targets = scanjoin_targets_contain_srfs = NIL;
 		}
 
+		/*
+		 * Generate Gather or Gather Merge paths for the topmost scan/join
+		 * relation.  Once that's done, we must re-determine which paths are
+		 * cheapest.  (The previously-cheapest path might even have been
+		 * pfree'd!)
+		 */
+		generate_gather_paths(root, current_rel, false);
+		set_cheapest(current_rel);
+
 		/*
 		 * Forcibly apply SRF-free scan/join target to all the Paths for the
 		 * scan/join rel.
-- 
2.14.3 (Apple Git-98)
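The reason for deferring generate_gather_paths() on the topmost scan/join rel is that the final target's evaluation cost then lands on the partial paths, where it is divided among the participants, instead of being charged once per row to the leader above Gather. The C sketch below illustrates this with a simplified cost model. parallel_divisor() is modeled on the planner's get_parallel_divisor() heuristic and assumes the leader participates. gather_now() is a hypothetical helper that expresses the guard shared by merge_clump(), standard_join_search() and set_rel_pathlist(). The Gather node's own per-tuple transfer charge (parallel_tuple_cost) is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified parallel divisor, modeled on get_parallel_divisor() and
 * assuming the leader participates: each worker counts as 1.0, and the
 * leader adds 1.0 - 0.3 * workers while that is still positive.
 */
static double
parallel_divisor(int workers)
{
	double		divisor = workers;
	double		leader_contribution = 1.0 - (0.3 * workers);

	assert(workers > 0);
	if (leader_contribution > 0)
		divisor += leader_contribution;
	return divisor;
}

/* Target evaluated above Gather: the leader computes it for every row. */
static double
projection_cost_in_leader(double rows, double per_tuple_cost)
{
	return rows * per_tuple_cost;
}

/* Target evaluated in the partial path: each participant sees its share. */
static double
projection_cost_per_participant(double rows, double per_tuple_cost,
								int workers)
{
	return (rows / parallel_divisor(workers)) * per_tuple_cost;
}

/*
 * Hypothetical helper for the guard added by the patch: gather partial
 * paths now only if this rel does not already cover every base rel.  In
 * the single-baserel case (set_rel_pathlist, RELOPT_BASEREL only) both
 * counts are 1, so Gather generation is postponed to grouping_planner.
 */
static bool
gather_now(int rels_in_this_rel, int total_base_rels)
{
	return rels_in_this_rel < total_base_rels;
}
```

For example, with 2 workers the divisor is about 2.4, so a function costing 1000 per row over 100000 rows is charged roughly 4.2e7 per participant rather than 1e8 in the leader.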

0002-CP_IGNORE_TLIST.patch (application/octet-stream)

From 2217bfc9e92a5a865c20fb60e6b4e6d31fa347db Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 23 Mar 2018 21:32:36 -0400
Subject: [PATCH 2/4] CP_IGNORE_TLIST.

---
 src/backend/optimizer/plan/createplan.c | 98 +++++++++++++++++++++++----------
 1 file changed, 68 insertions(+), 30 deletions(-)

diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 997d032939..4344557a1d 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -62,10 +62,14 @@
  * any sortgrouprefs specified in its pathtarget, with appropriate
  * ressortgroupref labels.  This is passed down by parent nodes such as Sort
  * and Group, which need these values to be available in their inputs.
+ *
+ * CP_IGNORE_TLIST specifies that the caller plans to replace the targetlist,
+ * and therefore it doesn't matter a bit what target list gets generated.
  */
 #define CP_EXACT_TLIST		0x0001	/* Plan must return specified tlist */
 #define CP_SMALL_TLIST		0x0002	/* Prefer narrower tlists */
 #define CP_LABEL_TLIST		0x0004	/* tlist must contain sortgrouprefs */
+#define CP_IGNORE_TLIST		0x0008	/* caller will replace tlist */
 
 
 static Plan *create_plan_recurse(PlannerInfo *root, Path *best_path,
@@ -566,8 +570,16 @@ create_scan_plan(PlannerInfo *root, Path *best_path, int flags)
 	 * only those Vars actually needed by the query), we prefer to generate a
 	 * tlist containing all Vars in order.  This will allow the executor to
 	 * optimize away projection of the table tuples, if possible.
+	 *
+	 * But if the caller is going to ignore our tlist anyway, then don't
+	 * bother generating one at all.  We use an exact equality test here,
+	 * so that this only applies when CP_IGNORE_TLIST is the only flag set.
 	 */
-	if (use_physical_tlist(root, best_path, flags))
+	if (flags == CP_IGNORE_TLIST)
+	{
+		tlist = NULL;
+	}
+	else if (use_physical_tlist(root, best_path, flags))
 	{
 		if (best_path->pathtype == T_IndexOnlyScan)
 		{
@@ -1575,44 +1587,70 @@ create_projection_plan(PlannerInfo *root, ProjectionPath *best_path, int flags)
 	Plan	   *plan;
 	Plan	   *subplan;
 	List	   *tlist;
-
-	/* Since we intend to project, we don't need to constrain child tlist */
-	subplan = create_plan_recurse(root, best_path->subpath, 0);
+	bool		needs_result_node = false;
 
 	/*
-	 * If our caller doesn't really care what tlist we return, then we might
-	 * not really need to project.  If use_physical_tlist returns false, then
-	 * we're obliged to project.  If it returns true, we can skip actually
-	 * projecting but must still correctly label the input path's tlist with
-	 * the sortgroupref information if the caller has so requested.
+	 * Convert our subpath to a Plan and determine whether we need a Result
+	 * node.
+	 *
+	 * In most cases where we don't need to project, create_projection_path
+	 * will have set dummypp, but not always.  First, some createplan.c
+	 * routines change the tlists of their nodes.  (An example is that
+	 * create_merge_append_plan might add resjunk sort columns to a
+	 * MergeAppend.)  Second, create_projection_path has no way of knowing
+	 * what path node will be placed on top of the projection path and
+	 * therefore can't predict whether it will require an exact tlist.
+	 * For both of these reasons, we have to recheck here.
 	 */
-	if (!use_physical_tlist(root, &best_path->path, flags))
-		tlist = build_path_tlist(root, &best_path->path);
-	else if ((flags & CP_LABEL_TLIST) != 0)
+	if (use_physical_tlist(root, &best_path->path, flags))
 	{
-		tlist = copyObject(subplan->targetlist);
-		apply_pathtarget_labeling_to_tlist(tlist, best_path->path.pathtarget);
+		/*
+		 * Our caller doesn't really care what tlist we return, so we don't
+		 * actually need to project.  However, we may still need to ensure
+		 * proper sortgroupref labels, if the caller cares about those.
+		 */
+		subplan = create_plan_recurse(root, best_path->subpath, 0);
+		if ((flags & CP_LABEL_TLIST) == 0)
+			tlist = subplan->targetlist;
+		else
+		{
+			tlist = copyObject(subplan->targetlist);
+			apply_pathtarget_labeling_to_tlist(tlist,
+											   best_path->path.pathtarget);
+		}
+	}
+	else if (is_projection_capable_path(best_path->subpath))
+	{
+		/*
+		 * Our caller requires that we return the exact tlist, but no separate
+		 * result node is needed because the subpath is projection-capable.
+		 * Tell create_plan_recurse that we're going to ignore the tlist it
+		 * produces.
+		 */
+		subplan = create_plan_recurse(root, best_path->subpath,
+									  CP_IGNORE_TLIST);
+		tlist = build_path_tlist(root, &best_path->path);
 	}
 	else
-		return subplan;
+	{
+		/*
+		 * It looks like we need a result node, unless by good fortune the
+		 * requested tlist is exactly the one the child wants to produce.
+		 */
+		subplan = create_plan_recurse(root, best_path->subpath, 0);
+		tlist = build_path_tlist(root, &best_path->path);
+		needs_result_node = !tlist_same_exprs(tlist, subplan->targetlist);
+	}
 
 	/*
-	 * We might not really need a Result node here, either because the subplan
-	 * can project or because it's returning the right list of expressions
-	 * anyway.  Usually create_projection_path will have detected that and set
-	 * dummypp if we don't need a Result; but its decision can't be final,
-	 * because some createplan.c routines change the tlists of their nodes.
-	 * (An example is that create_merge_append_plan might add resjunk sort
-	 * columns to a MergeAppend.)  So we have to recheck here.  If we do
-	 * arrive at a different answer than create_projection_path did, we'll
-	 * have made slightly wrong cost estimates; but label the plan with the
-	 * cost estimates we actually used, not "corrected" ones.  (XXX this could
-	 * be cleaned up if we moved more of the sortcolumn setup logic into Path
-	 * creation, but that would add expense to creating Paths we might end up
-	 * not using.)
+	 * If we make a different decision about whether to include a Result node
+	 * than create_projection_path did, we'll have made slightly wrong cost
+	 * estimates; but label the plan with the cost estimates we actually used,
+	 * not "corrected" ones.  (XXX this could be cleaned up if we moved more of
+	 * the sortcolumn setup logic into Path creation, but that would add
+	 * expense to creating Paths we might end up not using.)
 	 */
-	if (is_projection_capable_path(best_path->subpath) ||
-		tlist_same_exprs(tlist, subplan->targetlist))
+	if (!needs_result_node)
 	{
 		/* Don't need a separate Result, just assign tlist to subplan */
 		plan = subplan;
-- 
2.14.3 (Apple Git-98)
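create_scan_plan() skips building a tlist only when flags is exactly CP_IGNORE_TLIST. The sketch below copies the flag values from the patch; the two helper functions are hypothetical. It shows why an equality test is stricter than a bitwise one: any additional requirement from the caller disables the shortcut.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as defined in the patch's createplan.c hunk. */
#define CP_EXACT_TLIST		0x0001	/* Plan must return specified tlist */
#define CP_SMALL_TLIST		0x0002	/* Prefer narrower tlists */
#define CP_LABEL_TLIST		0x0004	/* tlist must contain sortgrouprefs */
#define CP_IGNORE_TLIST		0x0008	/* caller will replace tlist */

/* The test create_scan_plan uses: CP_IGNORE_TLIST and nothing else. */
static bool
scan_may_skip_tlist(int flags)
{
	return flags == CP_IGNORE_TLIST;
}

/* A bitwise test, which would also fire alongside other requirements. */
static bool
ignore_bit_is_set(int flags)
{
	return (flags & CP_IGNORE_TLIST) != 0;
}
```

With CP_IGNORE_TLIST | CP_LABEL_TLIST, the bitwise check would skip the tlist even though the caller still expects labeled columns. The equality check keeps generating one in that case.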

0001-Teach-create_projection_plan-to-omit-projection-wher.patch (application/octet-stream)

From c9e1e168e2baeeb7203b549c25fb4b7ce305092b Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Mon, 12 Mar 2018 12:36:57 -0400
Subject: [PATCH 1/4] Teach create_projection_plan to omit projection where
 possible.

We sometimes insert a ProjectionPath into a plan tree when it isn't
actually needed.  The existing code already provides for the case
where the ProjectionPath's subpath can perform the projection itself
instead of needing a Result node to do it, but previously it didn't
consider the possibility that the parent node might not actually
require the projection.  This optimization also allows the "physical
tlist" optimization to be preserved in some cases where it would not
otherwise happen.

Patch by me, reviewed by Amit Kapila.
---
 src/backend/optimizer/plan/createplan.c | 26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 8b4f031d96..997d032939 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -87,7 +87,9 @@ static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path
 static Plan *create_unique_plan(PlannerInfo *root, UniquePath *best_path,
 				   int flags);
 static Gather *create_gather_plan(PlannerInfo *root, GatherPath *best_path);
-static Plan *create_projection_plan(PlannerInfo *root, ProjectionPath *best_path);
+static Plan *create_projection_plan(PlannerInfo *root,
+					   ProjectionPath *best_path,
+					   int flags);
 static Plan *inject_projection_plan(Plan *subplan, List *tlist, bool parallel_safe);
 static Sort *create_sort_plan(PlannerInfo *root, SortPath *best_path, int flags);
 static Group *create_group_plan(PlannerInfo *root, GroupPath *best_path);
@@ -400,7 +402,8 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
 			if (IsA(best_path, ProjectionPath))
 			{
 				plan = create_projection_plan(root,
-											  (ProjectionPath *) best_path);
+											  (ProjectionPath *) best_path,
+											  flags);
 			}
 			else if (IsA(best_path, MinMaxAggPath))
 			{
@@ -1567,7 +1570,7 @@ create_gather_merge_plan(PlannerInfo *root, GatherMergePath *best_path)
  *	  but sometimes we can just let the subplan do the work.
  */
 static Plan *
-create_projection_plan(PlannerInfo *root, ProjectionPath *best_path)
+create_projection_plan(PlannerInfo *root, ProjectionPath *best_path, int flags)
 {
 	Plan	   *plan;
 	Plan	   *subplan;
@@ -1576,7 +1579,22 @@ create_projection_plan(PlannerInfo *root, ProjectionPath *best_path)
 	/* Since we intend to project, we don't need to constrain child tlist */
 	subplan = create_plan_recurse(root, best_path->subpath, 0);
 
-	tlist = build_path_tlist(root, &best_path->path);
+	/*
+	 * If our caller doesn't really care what tlist we return, then we might
+	 * not really need to project.  If use_physical_tlist returns false, then
+	 * we're obliged to project.  If it returns true, we can skip actually
+	 * projecting but must still correctly label the input path's tlist with
+	 * the sortgroupref information if the caller has so requested.
+	 */
+	if (!use_physical_tlist(root, &best_path->path, flags))
+		tlist = build_path_tlist(root, &best_path->path);
+	else if ((flags & CP_LABEL_TLIST) != 0)
+	{
+		tlist = copyObject(subplan->targetlist);
+		apply_pathtarget_labeling_to_tlist(tlist, best_path->path.pathtarget);
+	}
+	else
+		return subplan;
 
 	/*
 	 * We might not really need a Result node here, either because the subplan
-- 
2.14.3 (Apple Git-98)
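With the CP_IGNORE_TLIST and create_projection_plan patches applied, create_projection_plan() effectively chooses among four outcomes. The following C sketch summarizes that decision. The enum and function names are hypothetical, and the boolean inputs stand in for use_physical_tlist(), is_projection_capable_path() and tlist_same_exprs() in createplan.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the possible outcomes. */
typedef enum ProjectionStrategy
{
	REUSE_CHILD_TLIST,			/* caller doesn't care; only add labels if
								 * CP_LABEL_TLIST was requested */
	CHILD_PROJECTS,				/* plan child with CP_IGNORE_TLIST, then give
								 * it the requested tlist */
	CHILD_ALREADY_MATCHES,		/* child's tlist equals the requested one */
	ADD_RESULT_NODE				/* a separate Result node must project */
} ProjectionStrategy;

static ProjectionStrategy
choose_projection_strategy(bool physical_tlist_ok,
						   bool subpath_can_project,
						   bool child_tlist_matches)
{
	if (physical_tlist_ok)
		return REUSE_CHILD_TLIST;
	if (subpath_can_project)
		return CHILD_PROJECTS;
	/* Child planned with flags = 0; compare its tlist after the fact. */
	if (child_tlist_matches)
		return CHILD_ALREADY_MATCHES;
	return ADD_RESULT_NODE;
}
```

The ordering matters: the cheap "caller doesn't care" check comes first, and tlist equality is consulted only when neither the caller nor the child can absorb the projection.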

#84

amit.kapila16@gmail.com

almost 8 years ago

In reply to: Robert Haas (#83)

Re: [HACKERS] why not parallel seq scan for slow functions

On Tue, Mar 27, 2018 at 3:08 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Sat, Mar 24, 2018 at 9:40 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

For me, it is equivalent to the master. The average of ten runs on
the master is 20664.3683 and with all the patches applied it is
20590.4734. I think there is some run-to-run variation, but more or
less there is no visible degradation. I think we have found the root
cause and eliminated it. OTOH, I have found another case where the new
patch series seems to degrade performance.

All right, I have scaled my ambitions back further. Here is a revised
and slimmed-down version of the patch series.

It still didn't help much. I am seeing a similar regression in one
of the tests [1] posted previously.

If we forget about
"Remove explicit path construction logic in create_ordered_paths" for
now, then we don't really need a new upperrel. So this patch just
modifies the toplevel scan/join rel in place, which should avoid a
bunch of overhead in add_path() and other places, while hopefully
still fixing the originally-reported problem.

If we don't want to go with the upperrel logic, then maybe we should
consider just merging some of the other changes from my previous patch
in 0003* patch you have posted and then see if it gets rid of all the
cases where we are seeing a regression with this new approach. I
think that with this approach you want to target the problem of
partitionwise aggregate, but maybe we can deal with it in a separate
patch.

[1] Test case:

DO $$
DECLARE
    count integer;
BEGIN
    FOR count IN 1..1000000 LOOP
        EXECUTE 'explain Select count(ten) from tenk1';
    END LOOP;
END;
$$;

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#85

robertmhaas@gmail.com

almost 8 years ago

In reply to: Amit Kapila (#84)

Re: [HACKERS] why not parallel seq scan for slow functions

On Tue, Mar 27, 2018 at 1:45 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

If we don't want to go with the upperrel logic, then maybe we should
consider just merging some of the other changes from my previous patch
in 0003* patch you have posted and then see if it gets rid of all the
cases where we are seeing a regression with this new approach.

Which changes are you talking about?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#86

robertmhaas@gmail.com

almost 8 years ago

In reply to: Robert Haas (#85)

4 attachment(s)

Re: [HACKERS] why not parallel seq scan for slow functions

On Tue, Mar 27, 2018 at 7:42 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Mar 27, 2018 at 1:45 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

If we don't want to go with the upperrel logic, then maybe we should
consider just merging some of the other changes from my previous patch
in 0003* patch you have posted and then see if it gets rid of all the
cases where we are seeing a regression with this new approach.

Which changes are you talking about?

I realized that this version could be optimized in the case where the
scanjoin_target and the topmost scan/join rel have the same
expressions in the target list. Here's a revised patch series that
does that. For me, this is faster than master on your last test case.
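The optimization described here corresponds to the tlist_same_exprs flag in apply_scanjoin_target_to_paths() in the 0004 patch attached to this message. When the final scan/join target has the same expressions as the rel's existing reltarget, each path only needs the sortgroupref labels copied in, and no ProjectionPath is created. A toy C sketch of that choice follows. ToyTarget, ToyPath, same_exprs() and apply_target() are hypothetical stand-ins for PathTarget, Path and the per-path loop, with expressions represented as strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_COLS 8

/* Toy PathTarget: expressions as strings, plus optional sortgroupref labels. */
typedef struct ToyTarget
{
	int			ncols;
	const char *exprs[MAX_COLS];
	const unsigned *sortgrouprefs;	/* NULL means "no labels" */
} ToyTarget;

/* Toy Path: a target and, for a projection wrapper, the path it projects. */
typedef struct ToyPath
{
	ToyTarget  *target;
	struct ToyPath *subpath;
} ToyPath;

/* Computed once per rel, by comparing the two expression lists. */
static bool
same_exprs(const ToyTarget *a, const ToyTarget *b)
{
	if (a->ncols != b->ncols)
		return false;
	for (int i = 0; i < a->ncols; i++)
	{
		if (strcmp(a->exprs[i], b->exprs[i]) != 0)
			return false;
	}
	return true;
}

/*
 * Return the path that should replace 'path' in the rel's pathlist.  With
 * matching expressions we only inject labels; otherwise we wrap the path
 * in a (caller-freed) projection node that produces final_target.
 */
static ToyPath *
apply_target(ToyPath *path, ToyTarget *final_target, bool tlist_same_exprs)
{
	ToyPath    *proj;

	if (tlist_same_exprs)
	{
		path->target->sortgrouprefs = final_target->sortgrouprefs;
		return path;
	}

	proj = malloc(sizeof(ToyPath));
	if (proj == NULL)
	{
		perror("malloc");
		exit(EXIT_FAILURE);
	}
	proj->target = final_target;
	proj->subpath = path;
	return proj;
}
```

Computing the comparison once per rel, rather than once per path, mirrors how the patch computes scanjoin_target_same_exprs in grouping_planner and passes it down.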

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0004-Rewrite-the-code-that-applies-scan-join-targets-to-p.patch (application/octet-stream)

From a04f7590a0fdc093e5f606a3bbcb48019d7df3e1 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Mon, 12 Mar 2018 13:16:30 -0400
Subject: [PATCH 4/4] Rewrite the code that applies scan/join targets to paths.

If the toplevel scan/join target list is parallel-safe, postpone
generating Gather (or Gather Merge) paths until after the toplevel has
been adjusted to return it.  This (correctly) makes queries with
expensive functions in the target list more likely to choose a
parallel plan, since the cost of the plan now reflects the fact that
the evaluation will happen in the workers rather than the leader.

If the toplevel scan/join relation is partitioned, recursively apply
the changes to all partitions.  This sometimes allows us to get rid of
Result nodes, because Append is not projection-capable but its
children may be.  It also cleans up what appears to be incorrect SRF
handling from commit e2f1eb0ee30d144628ab523432320f174a2c8966: the old
code had no knowledge of SRFs for child scan/join rels.

Patch by me, reviewed by Amit Kapila.
---
 src/backend/optimizer/plan/planner.c         | 283 ++++++----
 src/include/nodes/relation.h                 |   1 +
 src/test/regress/expected/partition_join.out | 772 +++++++++++++--------------
 3 files changed, 568 insertions(+), 488 deletions(-)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 11b20d546b..8562f20f62 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -222,9 +222,10 @@ static bool can_partial_agg(PlannerInfo *root,
 				const AggClauseCosts *agg_costs);
 static void apply_scanjoin_target_to_paths(PlannerInfo *root,
 							   RelOptInfo *rel,
-							   PathTarget *scanjoin_target,
+							   List *scanjoin_targets,
+							   List *scanjoin_targets_contain_srfs,
 							   bool scanjoin_target_parallel_safe,
-							   bool modify_in_place);
+							   bool tlist_same_exprs);
 static void create_partitionwise_grouping_paths(PlannerInfo *root,
 									RelOptInfo *input_rel,
 									RelOptInfo *grouped_rel,
@@ -1746,6 +1747,7 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 		List	   *scanjoin_targets;
 		List	   *scanjoin_targets_contain_srfs;
 		bool		scanjoin_target_parallel_safe;
+		bool		scanjoin_target_same_exprs;
 		bool		have_grouping;
 		AggClauseCosts agg_costs;
 		WindowFuncLists *wflists = NULL;
@@ -1964,34 +1966,33 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 		}
 		else
 		{
-			/* initialize lists, just to keep compiler quiet */
+			/* initialize lists; for most of these, dummy values are OK */
 			final_targets = final_targets_contain_srfs = NIL;
 			sort_input_targets = sort_input_targets_contain_srfs = NIL;
 			grouping_targets = grouping_targets_contain_srfs = NIL;
-			scanjoin_targets = scanjoin_targets_contain_srfs = NIL;
+			scanjoin_targets = list_make1(scanjoin_target);
+			scanjoin_targets_contain_srfs = NIL;
 		}
 
 		/*
-		 * Generate Gather or Gather Merge paths for the topmost scan/join
-		 * relation.  Once that's done, we must re-determine which paths are
-		 * cheapest.  (The previously-cheapest path might even have been
-		 * pfree'd!)
+		 * If the final scan/join target is not parallel-safe, we must
+		 * generate Gather paths now, since no partial paths will be generated
+		 * with the final scan/join targetlist.  Otherwise, the Gather or
+		 * Gather Merge paths generated within apply_scanjoin_target_to_paths
+		 * will be superior to any we might generate now in that the
+		 * projection will be done by each participant rather than only in
+		 * the leader.
 		 */
-		generate_gather_paths(root, current_rel, false);
-		set_cheapest(current_rel);
+		if (!scanjoin_target_parallel_safe)
+			generate_gather_paths(root, current_rel, false);
 
-		/*
-		 * Forcibly apply SRF-free scan/join target to all the Paths for the
-		 * scan/join rel.
-		 */
-		apply_scanjoin_target_to_paths(root, current_rel, scanjoin_target,
-									   scanjoin_target_parallel_safe, true);
-
-		/* Now fix things up if scan/join target contains SRFs */
-		if (parse->hasTargetSRFs)
-			adjust_paths_for_srfs(root, current_rel,
-								  scanjoin_targets,
-								  scanjoin_targets_contain_srfs);
+		/* Apply scan/join target. */
+		scanjoin_target_same_exprs = list_length(scanjoin_targets) == 1
+			&& equal(scanjoin_target->exprs, current_rel->reltarget->exprs);
+		apply_scanjoin_target_to_paths(root, current_rel, scanjoin_targets,
+									   scanjoin_targets_contain_srfs,
+									   scanjoin_target_parallel_safe,
+									   scanjoin_target_same_exprs);
 
 		/*
 		 * Save the various upper-rel PathTargets we just computed into
@@ -2003,6 +2004,7 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 		root->upper_targets[UPPERREL_FINAL] = final_target;
 		root->upper_targets[UPPERREL_WINDOW] = sort_input_target;
 		root->upper_targets[UPPERREL_GROUP_AGG] = grouping_target;
+		root->upper_targets[UPPERREL_TLIST] = scanjoin_target;
 
 		/*
 		 * If we have grouping and/or aggregation, consider ways to implement
@@ -6793,24 +6795,88 @@ can_partial_agg(PlannerInfo *root, const AggClauseCosts *agg_costs)
 /*
  * apply_scanjoin_target_to_paths
  *
- * Applies scan/join target to all the Paths for the scan/join rel.
+ * Adjust the final scan/join relation, and recursively all of its children,
+ * to generate the final scan/join target.  It would be more correct to model
+ * this as a separate planning step with a new RelOptInfo at the toplevel and
+ * for each child relation, but doing it this way is noticeably cheaper.
+ * Maybe that problem can be solved at some point, but for now we do this.
+ *
+ * If tlist_same_exprs is true, then the scan/join target to be applied has
+ * the same expressions as the existing reltarget, so we need only insert the
+ * appropriate sortgroupref information.  By avoiding the creation of
+ * projection paths we save effort both immediately and at plan creation time.
  */
 static void
 apply_scanjoin_target_to_paths(PlannerInfo *root,
 							   RelOptInfo *rel,
-							   PathTarget *scanjoin_target,
+							   List *scanjoin_targets,
+							   List *scanjoin_targets_contain_srfs,
 							   bool scanjoin_target_parallel_safe,
-							   bool modify_in_place)
+							   bool tlist_same_exprs)
 {
 	ListCell   *lc;
+	PathTarget *scanjoin_target;
+
+	check_stack_depth();
 
 	/*
-	 * In principle we should re-run set_cheapest() here to identify the
-	 * cheapest path, but it seems unlikely that adding the same tlist eval
-	 * costs to all the paths would change that, so we don't bother. Instead,
-	 * just assume that the cheapest-startup and cheapest-total paths remain
-	 * so.  (There should be no parameterized paths anymore, so we needn't
-	 * worry about updating cheapest_parameterized_paths.)
+	 * If the scan/join target is not parallel-safe, then the new partial
+	 * pathlist is the empty list.
+	 */
+	if (!scanjoin_target_parallel_safe)
+	{
+		rel->partial_pathlist = NIL;
+		rel->consider_parallel = false;
+	}
+
+	/*
+	 * Update the reltarget.  This may not be strictly necessary in all cases,
+	 * but it is at least necessary when create_append_path() gets called
+	 * below directly or indirectly, since that function uses the reltarget as
+	 * the pathtarget for the resulting path.  It seems like a good idea to do
+	 * it unconditionally.
+	 */
+	rel->reltarget = llast_node(PathTarget, scanjoin_targets);
+
+	/* Special case: handle dummy relations separately. */
+	if (IS_DUMMY_REL(rel))
+	{
+		/*
+		 * Since this is a dummy rel, it's got a single Append path with no
+		 * child paths.  Replace it with a new path having the final scan/join
+		 * target.  (Note that since Append is not projection-capable, it
+		 * would be bad to handle this using the general purpose code below;
+		 * we'd end up putting a ProjectionPath on top of the existing Append
+		 * node, which would cause this relation to stop appearing to be a
+		 * dummy rel.)
+		 */
+		rel->pathlist = list_make1(create_append_path(rel, NIL, NIL, NULL,
+													  0, false, NIL, -1));
+		rel->partial_pathlist = NIL;
+		set_cheapest(rel);
+		Assert(IS_DUMMY_REL(rel));
+
+		/*
+		 * Forget about any child relations.  There's no point in adjusting
+		 * them and no point in using them for later planning stages (in
+		 * particular, partitionwise aggregate).
+		 */
+		rel->nparts = 0;
+		rel->part_rels = NULL;
+		rel->boundinfo = NULL;
+
+		return;
+	}
+
+	/* Extract SRF-free scan/join target. */
+	scanjoin_target = linitial_node(PathTarget, scanjoin_targets);
+
+	/*
+	 * Adjust each input path.  If the tlist exprs are the same, we can just
+	 * inject the sortgroupref information into the existing pathtarget.
+	 * Otherwise, replace each path with a projection path that generates the
+	 * SRF-free scan/join target.  This can't change the ordering of paths
+	 * within rel->pathlist, so we just modify the list in place.
 	 */
 	foreach(lc, rel->pathlist)
 	{
@@ -6819,52 +6885,31 @@ apply_scanjoin_target_to_paths(PlannerInfo *root,
 
 		Assert(subpath->param_info == NULL);
 
-		/*
-		 * Don't use apply_projection_to_path() when modify_in_place is false,
-		 * because there could be other pointers to these paths, and therefore
-		 * we mustn't modify them in place.
-		 */
-		if (modify_in_place)
-			newpath = apply_projection_to_path(root, rel, subpath,
-											   scanjoin_target);
+		if (tlist_same_exprs)
+			subpath->pathtarget->sortgrouprefs =
+				scanjoin_target->sortgrouprefs;
 		else
+		{
 			newpath = (Path *) create_projection_path(root, rel, subpath,
 													  scanjoin_target);
-
-		/* If we had to add a Result, newpath is different from subpath */
-		if (newpath != subpath)
-		{
 			lfirst(lc) = newpath;
-			if (subpath == rel->cheapest_startup_path)
-				rel->cheapest_startup_path = newpath;
-			if (subpath == rel->cheapest_total_path)
-				rel->cheapest_total_path = newpath;
 		}
 	}
 
-	/*
-	 * Upper planning steps which make use of the top scan/join rel's partial
-	 * pathlist will expect partial paths for that rel to produce the same
-	 * output as complete paths ... and we just changed the output for the
-	 * complete paths, so we'll need to do the same thing for partial paths.
-	 * But only parallel-safe expressions can be computed by partial paths.
-	 */
-	if (rel->partial_pathlist && scanjoin_target_parallel_safe)
+	/* Same for partial paths. */
+	foreach(lc, rel->partial_pathlist)
 	{
-		/* Apply the scan/join target to each partial path */
-		foreach(lc, rel->partial_pathlist)
-		{
-			Path	   *subpath = (Path *) lfirst(lc);
-			Path	   *newpath;
+		Path	   *subpath = (Path *) lfirst(lc);
+		Path	   *newpath;
 
-			/* Shouldn't have any parameterized paths anymore */
-			Assert(subpath->param_info == NULL);
+		/* Shouldn't have any parameterized paths anymore */
+		Assert(subpath->param_info == NULL);
 
-			/*
-			 * Don't use apply_projection_to_path() here, because there could
-			 * be other pointers to these paths, and therefore we mustn't
-			 * modify them in place.
-			 */
+		if (tlist_same_exprs)
+			subpath->pathtarget->sortgrouprefs =
+				scanjoin_target->sortgrouprefs;
+		else
+		{
 			newpath = (Path *) create_projection_path(root,
 													  rel,
 													  subpath,
@@ -6872,16 +6917,83 @@ apply_scanjoin_target_to_paths(PlannerInfo *root,
 			lfirst(lc) = newpath;
 		}
 	}
-	else
+
+	/* Now fix things up if scan/join target contains SRFs */
+	if (root->parse->hasTargetSRFs)
+		adjust_paths_for_srfs(root, rel,
+							  scanjoin_targets,
+							  scanjoin_targets_contain_srfs);
+
+	/*
+	 * If the relation is partitioned, recursively apply the same changes to
+	 * all partitions and generate new Append paths. Since Append is not
+	 * projection-capable, that might save a separate Result node, and it also
+	 * is important for partitionwise aggregate.
+	 */
+	if (rel->part_scheme && rel->part_rels)
 	{
-		/*
-		 * In the unfortunate event that scanjoin_target is not parallel-safe,
-		 * we can't apply it to the partial paths; in that case, we'll need to
-		 * forget about the partial paths, which aren't valid input for upper
-		 * planning steps.
-		 */
-		rel->partial_pathlist = NIL;
+		int			partition_idx;
+		List	   *live_children = NIL;
+
+		/* Adjust each partition. */
+		for (partition_idx = 0; partition_idx < rel->nparts; partition_idx++)
+		{
+			RelOptInfo *child_rel = rel->part_rels[partition_idx];
+			ListCell   *lc;
+			AppendRelInfo **appinfos;
+			int			nappinfos;
+			List	   *child_scanjoin_targets = NIL;
+
+			/* Translate scan/join targets for this child. */
+			appinfos = find_appinfos_by_relids(root, child_rel->relids,
+											   &nappinfos);
+			foreach(lc, scanjoin_targets)
+			{
+				PathTarget *target = lfirst_node(PathTarget, lc);
+
+				target = copy_pathtarget(target);
+				target->exprs = (List *)
+					adjust_appendrel_attrs(root,
+										   (Node *) target->exprs,
+										   nappinfos, appinfos);
+				child_scanjoin_targets = lappend(child_scanjoin_targets,
+												 target);
+			}
+			pfree(appinfos);
+
+			/* Recursion does the real work. */
+			apply_scanjoin_target_to_paths(root, child_rel,
+										   child_scanjoin_targets,
+										   scanjoin_targets_contain_srfs,
+										   scanjoin_target_parallel_safe,
+										   tlist_same_exprs);
+
+			/* Save non-dummy children for Append paths. */
+			if (!IS_DUMMY_REL(child_rel))
+				live_children = lappend(live_children, child_rel);
+		}
+
+		/* Build new paths for this relation by appending child paths. */
+		if (live_children != NIL)
+			add_paths_to_append_rel(root, rel, live_children);
 	}
+
+	/*
+	 * Consider generating Gather or Gather Merge paths.  We must only do this
+	 * if the relation is parallel safe, and we don't do it for child rels to
+	 * avoid creating multiple Gather nodes within the same plan. We must do
+	 * this after all paths have been generated and before set_cheapest, since
+	 * one of the generated paths may turn out to be the cheapest one.
+	 */
+	if (rel->consider_parallel && !IS_OTHER_REL(rel))
+		generate_gather_paths(root, rel, false);
+
+	/*
+	 * Reassess which paths are the cheapest, now that we've potentially added
+	 * new Gather (or Gather Merge) and/or Append (or MergeAppend) paths to
+	 * this relation.
+	 */
+	set_cheapest(rel);
 }
 
 /*
@@ -6928,7 +7040,6 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
 		PathTarget *child_target = copy_pathtarget(target);
 		AppendRelInfo **appinfos;
 		int			nappinfos;
-		PathTarget *scanjoin_target;
 		GroupPathExtraData child_extra;
 		RelOptInfo *child_grouped_rel;
 		RelOptInfo *child_partially_grouped_rel;
@@ -6985,26 +7096,6 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
 			continue;
 		}
 
-		/*
-		 * Copy pathtarget from underneath scan/join as we are modifying it
-		 * and translate its Vars with respect to this appendrel.  The input
-		 * relation's reltarget might not be the final scanjoin_target, but
-		 * the pathtarget any given individual path should be.
-		 */
-		scanjoin_target =
-			copy_pathtarget(input_rel->cheapest_startup_path->pathtarget);
-		scanjoin_target->exprs = (List *)
-			adjust_appendrel_attrs(root,
-								   (Node *) scanjoin_target->exprs,
-								   nappinfos, appinfos);
-
-		/*
-		 * Forcibly apply scan/join target to all the Paths for the scan/join
-		 * rel.
-		 */
-		apply_scanjoin_target_to_paths(root, child_input_rel, scanjoin_target,
-									   extra->target_parallel_safe, false);
-
 		/* Create grouping paths for this child relation. */
 		create_ordinary_grouping_paths(root, child_input_rel,
 									   child_grouped_rel,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index abbbda9e91..d4bffbc281 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -71,6 +71,7 @@ typedef struct AggClauseCosts
 typedef enum UpperRelationKind
 {
 	UPPERREL_SETOP,				/* result of UNION/INTERSECT/EXCEPT, if any */
+	UPPERREL_TLIST,				/* result of projecting final scan/join rel */
 	UPPERREL_PARTIAL_GROUP_AGG, /* result of partial grouping/aggregation, if
 								 * any */
 	UPPERREL_GROUP_AGG,			/* result of grouping/aggregation, if any */
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 4fccd9ae54..b983f9c506 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -65,31 +65,30 @@ SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1, prt2 t2 WHERE t1.a = t2.b AND t1.b =
 -- left outer join, with whole-row reference
 EXPLAIN (COSTS OFF)
 SELECT t1, t2 FROM prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b WHERE t1.b = 0 ORDER BY t1.a, t2.b;
-                       QUERY PLAN                       
---------------------------------------------------------
+                    QUERY PLAN                    
+--------------------------------------------------
  Sort
    Sort Key: t1.a, t2.b
-   ->  Result
-         ->  Append
-               ->  Hash Right Join
-                     Hash Cond: (t2.b = t1.a)
-                     ->  Seq Scan on prt2_p1 t2
-                     ->  Hash
-                           ->  Seq Scan on prt1_p1 t1
-                                 Filter: (b = 0)
-               ->  Hash Right Join
-                     Hash Cond: (t2_1.b = t1_1.a)
-                     ->  Seq Scan on prt2_p2 t2_1
-                     ->  Hash
-                           ->  Seq Scan on prt1_p2 t1_1
-                                 Filter: (b = 0)
-               ->  Hash Right Join
-                     Hash Cond: (t2_2.b = t1_2.a)
-                     ->  Seq Scan on prt2_p3 t2_2
-                     ->  Hash
-                           ->  Seq Scan on prt1_p3 t1_2
-                                 Filter: (b = 0)
-(22 rows)
+   ->  Append
+         ->  Hash Right Join
+               Hash Cond: (t2.b = t1.a)
+               ->  Seq Scan on prt2_p1 t2
+               ->  Hash
+                     ->  Seq Scan on prt1_p1 t1
+                           Filter: (b = 0)
+         ->  Hash Right Join
+               Hash Cond: (t2_1.b = t1_1.a)
+               ->  Seq Scan on prt2_p2 t2_1
+               ->  Hash
+                     ->  Seq Scan on prt1_p2 t1_1
+                           Filter: (b = 0)
+         ->  Hash Right Join
+               Hash Cond: (t2_2.b = t1_2.a)
+               ->  Seq Scan on prt2_p3 t2_2
+               ->  Hash
+                     ->  Seq Scan on prt1_p3 t1_2
+                           Filter: (b = 0)
+(21 rows)
 
 SELECT t1, t2 FROM prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b WHERE t1.b = 0 ORDER BY t1.a, t2.b;
       t1      |      t2      
@@ -111,30 +110,29 @@ SELECT t1, t2 FROM prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b WHERE t1.b = 0 ORDER
 -- right outer join
 EXPLAIN (COSTS OFF)
 SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1 RIGHT JOIN prt2 t2 ON t1.a = t2.b WHERE t2.a = 0 ORDER BY t1.a, t2.b;
-                             QUERY PLAN                              
----------------------------------------------------------------------
+                          QUERY PLAN                           
+---------------------------------------------------------------
  Sort
    Sort Key: t1.a, t2.b
-   ->  Result
-         ->  Append
-               ->  Hash Right Join
-                     Hash Cond: (t1.a = t2.b)
-                     ->  Seq Scan on prt1_p1 t1
-                     ->  Hash
-                           ->  Seq Scan on prt2_p1 t2
-                                 Filter: (a = 0)
-               ->  Hash Right Join
-                     Hash Cond: (t1_1.a = t2_1.b)
-                     ->  Seq Scan on prt1_p2 t1_1
-                     ->  Hash
-                           ->  Seq Scan on prt2_p2 t2_1
-                                 Filter: (a = 0)
-               ->  Nested Loop Left Join
-                     ->  Seq Scan on prt2_p3 t2_2
+   ->  Append
+         ->  Hash Right Join
+               Hash Cond: (t1.a = t2.b)
+               ->  Seq Scan on prt1_p1 t1
+               ->  Hash
+                     ->  Seq Scan on prt2_p1 t2
                            Filter: (a = 0)
-                     ->  Index Scan using iprt1_p3_a on prt1_p3 t1_2
-                           Index Cond: (a = t2_2.b)
-(21 rows)
+         ->  Hash Right Join
+               Hash Cond: (t1_1.a = t2_1.b)
+               ->  Seq Scan on prt1_p2 t1_1
+               ->  Hash
+                     ->  Seq Scan on prt2_p2 t2_1
+                           Filter: (a = 0)
+         ->  Nested Loop Left Join
+               ->  Seq Scan on prt2_p3 t2_2
+                     Filter: (a = 0)
+               ->  Index Scan using iprt1_p3_a on prt1_p3 t1_2
+                     Index Cond: (a = t2_2.b)
+(20 rows)
 
 SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1 RIGHT JOIN prt2 t2 ON t1.a = t2.b WHERE t2.a = 0 ORDER BY t1.a, t2.b;
   a  |  c   |  b  |  c   
@@ -375,37 +373,36 @@ EXPLAIN (COSTS OFF)
 SELECT * FROM prt1 t1 LEFT JOIN LATERAL
 			  (SELECT t2.a AS t2a, t3.a AS t3a, least(t1.a,t2.a,t3.b) FROM prt1 t2 JOIN prt2 t3 ON (t2.a = t3.b)) ss
 			  ON t1.a = ss.t2a WHERE t1.b = 0 ORDER BY t1.a;
-                                   QUERY PLAN                                   
---------------------------------------------------------------------------------
+                                QUERY PLAN                                
+--------------------------------------------------------------------------
  Sort
    Sort Key: t1.a
-   ->  Result
-         ->  Append
-               ->  Nested Loop Left Join
-                     ->  Seq Scan on prt1_p1 t1
-                           Filter: (b = 0)
-                     ->  Nested Loop
-                           ->  Index Only Scan using iprt1_p1_a on prt1_p1 t2
-                                 Index Cond: (a = t1.a)
-                           ->  Index Scan using iprt2_p1_b on prt2_p1 t3
-                                 Index Cond: (b = t2.a)
-               ->  Nested Loop Left Join
-                     ->  Seq Scan on prt1_p2 t1_1
-                           Filter: (b = 0)
-                     ->  Nested Loop
-                           ->  Index Only Scan using iprt1_p2_a on prt1_p2 t2_1
-                                 Index Cond: (a = t1_1.a)
-                           ->  Index Scan using iprt2_p2_b on prt2_p2 t3_1
-                                 Index Cond: (b = t2_1.a)
-               ->  Nested Loop Left Join
-                     ->  Seq Scan on prt1_p3 t1_2
-                           Filter: (b = 0)
-                     ->  Nested Loop
-                           ->  Index Only Scan using iprt1_p3_a on prt1_p3 t2_2
-                                 Index Cond: (a = t1_2.a)
-                           ->  Index Scan using iprt2_p3_b on prt2_p3 t3_2
-                                 Index Cond: (b = t2_2.a)
-(28 rows)
+   ->  Append
+         ->  Nested Loop Left Join
+               ->  Seq Scan on prt1_p1 t1
+                     Filter: (b = 0)
+               ->  Nested Loop
+                     ->  Index Only Scan using iprt1_p1_a on prt1_p1 t2
+                           Index Cond: (a = t1.a)
+                     ->  Index Scan using iprt2_p1_b on prt2_p1 t3
+                           Index Cond: (b = t2.a)
+         ->  Nested Loop Left Join
+               ->  Seq Scan on prt1_p2 t1_1
+                     Filter: (b = 0)
+               ->  Nested Loop
+                     ->  Index Only Scan using iprt1_p2_a on prt1_p2 t2_1
+                           Index Cond: (a = t1_1.a)
+                     ->  Index Scan using iprt2_p2_b on prt2_p2 t3_1
+                           Index Cond: (b = t2_1.a)
+         ->  Nested Loop Left Join
+               ->  Seq Scan on prt1_p3 t1_2
+                     Filter: (b = 0)
+               ->  Nested Loop
+                     ->  Index Only Scan using iprt1_p3_a on prt1_p3 t2_2
+                           Index Cond: (a = t1_2.a)
+                     ->  Index Scan using iprt2_p3_b on prt2_p3 t3_2
+                           Index Cond: (b = t2_2.a)
+(27 rows)
 
 SELECT * FROM prt1 t1 LEFT JOIN LATERAL
 			  (SELECT t2.a AS t2a, t3.a AS t3a, least(t1.a,t2.a,t3.b) FROM prt1 t2 JOIN prt2 t3 ON (t2.a = t3.b)) ss
@@ -538,92 +535,90 @@ SELECT t1.a, t1.c, t2.b, t2.c FROM prt1_e t1, prt2_e t2 WHERE (t1.a + t1.b)/2 =
 --
 EXPLAIN (COSTS OFF)
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM prt1 t1, prt2 t2, prt1_e t3 WHERE t1.a = t2.b AND t1.a = (t3.a + t3.b)/2 AND t1.b = 0 ORDER BY t1.a, t2.b;
-                                QUERY PLAN                                 
----------------------------------------------------------------------------
+                             QUERY PLAN                              
+---------------------------------------------------------------------
  Sort
    Sort Key: t1.a
-   ->  Result
-         ->  Append
-               ->  Nested Loop
-                     Join Filter: (t1.a = ((t3.a + t3.b) / 2))
-                     ->  Hash Join
+   ->  Append
+         ->  Nested Loop
+               Join Filter: (t1.a = ((t3.a + t3.b) / 2))
+               ->  Hash Join
+                     Hash Cond: (t2.b = t1.a)
+                     ->  Seq Scan on prt2_p1 t2
+                     ->  Hash
+                           ->  Seq Scan on prt1_p1 t1
+                                 Filter: (b = 0)
+               ->  Index Scan using iprt1_e_p1_ab2 on prt1_e_p1 t3
+                     Index Cond: (((a + b) / 2) = t2.b)
+         ->  Nested Loop
+               Join Filter: (t1_1.a = ((t3_1.a + t3_1.b) / 2))
+               ->  Hash Join
+                     Hash Cond: (t2_1.b = t1_1.a)
+                     ->  Seq Scan on prt2_p2 t2_1
+                     ->  Hash
+                           ->  Seq Scan on prt1_p2 t1_1
+                                 Filter: (b = 0)
+               ->  Index Scan using iprt1_e_p2_ab2 on prt1_e_p2 t3_1
+                     Index Cond: (((a + b) / 2) = t2_1.b)
+         ->  Nested Loop
+               Join Filter: (t1_2.a = ((t3_2.a + t3_2.b) / 2))
+               ->  Hash Join
+                     Hash Cond: (t2_2.b = t1_2.a)
+                     ->  Seq Scan on prt2_p3 t2_2
+                     ->  Hash
+                           ->  Seq Scan on prt1_p3 t1_2
+                                 Filter: (b = 0)
+               ->  Index Scan using iprt1_e_p3_ab2 on prt1_e_p3 t3_2
+                     Index Cond: (((a + b) / 2) = t2_2.b)
+(33 rows)
+
+SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM prt1 t1, prt2 t2, prt1_e t3 WHERE t1.a = t2.b AND t1.a = (t3.a + t3.b)/2 AND t1.b = 0 ORDER BY t1.a, t2.b;
+  a  |  c   |  b  |  c   | ?column? | c 
+-----+------+-----+------+----------+---
+   0 | 0000 |   0 | 0000 |        0 | 0
+ 150 | 0150 | 150 | 0150 |      300 | 0
+ 300 | 0300 | 300 | 0300 |      600 | 0
+ 450 | 0450 | 450 | 0450 |      900 | 0
+(4 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) LEFT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t1.b = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
+                          QUERY PLAN                          
+--------------------------------------------------------------
+ Sort
+   Sort Key: t1.a, t2.b, ((t3.a + t3.b))
+   ->  Append
+         ->  Hash Right Join
+               Hash Cond: (((t3.a + t3.b) / 2) = t1.a)
+               ->  Seq Scan on prt1_e_p1 t3
+               ->  Hash
+                     ->  Hash Right Join
                            Hash Cond: (t2.b = t1.a)
                            ->  Seq Scan on prt2_p1 t2
                            ->  Hash
                                  ->  Seq Scan on prt1_p1 t1
                                        Filter: (b = 0)
-                     ->  Index Scan using iprt1_e_p1_ab2 on prt1_e_p1 t3
-                           Index Cond: (((a + b) / 2) = t2.b)
-               ->  Nested Loop
-                     Join Filter: (t1_1.a = ((t3_1.a + t3_1.b) / 2))
-                     ->  Hash Join
+         ->  Hash Right Join
+               Hash Cond: (((t3_1.a + t3_1.b) / 2) = t1_1.a)
+               ->  Seq Scan on prt1_e_p2 t3_1
+               ->  Hash
+                     ->  Hash Right Join
                            Hash Cond: (t2_1.b = t1_1.a)
                            ->  Seq Scan on prt2_p2 t2_1
                            ->  Hash
                                  ->  Seq Scan on prt1_p2 t1_1
                                        Filter: (b = 0)
-                     ->  Index Scan using iprt1_e_p2_ab2 on prt1_e_p2 t3_1
-                           Index Cond: (((a + b) / 2) = t2_1.b)
-               ->  Nested Loop
-                     Join Filter: (t1_2.a = ((t3_2.a + t3_2.b) / 2))
-                     ->  Hash Join
+         ->  Hash Right Join
+               Hash Cond: (((t3_2.a + t3_2.b) / 2) = t1_2.a)
+               ->  Seq Scan on prt1_e_p3 t3_2
+               ->  Hash
+                     ->  Hash Right Join
                            Hash Cond: (t2_2.b = t1_2.a)
                            ->  Seq Scan on prt2_p3 t2_2
                            ->  Hash
                                  ->  Seq Scan on prt1_p3 t1_2
                                        Filter: (b = 0)
-                     ->  Index Scan using iprt1_e_p3_ab2 on prt1_e_p3 t3_2
-                           Index Cond: (((a + b) / 2) = t2_2.b)
-(34 rows)
-
-SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM prt1 t1, prt2 t2, prt1_e t3 WHERE t1.a = t2.b AND t1.a = (t3.a + t3.b)/2 AND t1.b = 0 ORDER BY t1.a, t2.b;
-  a  |  c   |  b  |  c   | ?column? | c 
------+------+-----+------+----------+---
-   0 | 0000 |   0 | 0000 |        0 | 0
- 150 | 0150 | 150 | 0150 |      300 | 0
- 300 | 0300 | 300 | 0300 |      600 | 0
- 450 | 0450 | 450 | 0450 |      900 | 0
-(4 rows)
-
-EXPLAIN (COSTS OFF)
-SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) LEFT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t1.b = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
-                             QUERY PLAN                             
---------------------------------------------------------------------
- Sort
-   Sort Key: t1.a, t2.b, ((t3.a + t3.b))
-   ->  Result
-         ->  Append
-               ->  Hash Right Join
-                     Hash Cond: (((t3.a + t3.b) / 2) = t1.a)
-                     ->  Seq Scan on prt1_e_p1 t3
-                     ->  Hash
-                           ->  Hash Right Join
-                                 Hash Cond: (t2.b = t1.a)
-                                 ->  Seq Scan on prt2_p1 t2
-                                 ->  Hash
-                                       ->  Seq Scan on prt1_p1 t1
-                                             Filter: (b = 0)
-               ->  Hash Right Join
-                     Hash Cond: (((t3_1.a + t3_1.b) / 2) = t1_1.a)
-                     ->  Seq Scan on prt1_e_p2 t3_1
-                     ->  Hash
-                           ->  Hash Right Join
-                                 Hash Cond: (t2_1.b = t1_1.a)
-                                 ->  Seq Scan on prt2_p2 t2_1
-                                 ->  Hash
-                                       ->  Seq Scan on prt1_p2 t1_1
-                                             Filter: (b = 0)
-               ->  Hash Right Join
-                     Hash Cond: (((t3_2.a + t3_2.b) / 2) = t1_2.a)
-                     ->  Seq Scan on prt1_e_p3 t3_2
-                     ->  Hash
-                           ->  Hash Right Join
-                                 Hash Cond: (t2_2.b = t1_2.a)
-                                 ->  Seq Scan on prt2_p3 t2_2
-                                 ->  Hash
-                                       ->  Seq Scan on prt1_p3 t1_2
-                                             Filter: (b = 0)
-(34 rows)
+(33 rows)
 
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) LEFT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t1.b = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
   a  |  c   |  b  |  c   | ?column? | c 
@@ -644,40 +639,39 @@ SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2
 
 EXPLAIN (COSTS OFF)
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) RIGHT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t3.c = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
-                               QUERY PLAN                                
--------------------------------------------------------------------------
+                            QUERY PLAN                             
+-------------------------------------------------------------------
  Sort
    Sort Key: t1.a, t2.b, ((t3.a + t3.b))
-   ->  Result
-         ->  Append
-               ->  Nested Loop Left Join
-                     ->  Hash Right Join
-                           Hash Cond: (t1.a = ((t3.a + t3.b) / 2))
-                           ->  Seq Scan on prt1_p1 t1
-                           ->  Hash
-                                 ->  Seq Scan on prt1_e_p1 t3
-                                       Filter: (c = 0)
-                     ->  Index Scan using iprt2_p1_b on prt2_p1 t2
-                           Index Cond: (t1.a = b)
-               ->  Nested Loop Left Join
-                     ->  Hash Right Join
-                           Hash Cond: (t1_1.a = ((t3_1.a + t3_1.b) / 2))
-                           ->  Seq Scan on prt1_p2 t1_1
-                           ->  Hash
-                                 ->  Seq Scan on prt1_e_p2 t3_1
-                                       Filter: (c = 0)
-                     ->  Index Scan using iprt2_p2_b on prt2_p2 t2_1
-                           Index Cond: (t1_1.a = b)
-               ->  Nested Loop Left Join
-                     ->  Hash Right Join
-                           Hash Cond: (t1_2.a = ((t3_2.a + t3_2.b) / 2))
-                           ->  Seq Scan on prt1_p3 t1_2
-                           ->  Hash
-                                 ->  Seq Scan on prt1_e_p3 t3_2
-                                       Filter: (c = 0)
-                     ->  Index Scan using iprt2_p3_b on prt2_p3 t2_2
-                           Index Cond: (t1_2.a = b)
-(31 rows)
+   ->  Append
+         ->  Nested Loop Left Join
+               ->  Hash Right Join
+                     Hash Cond: (t1.a = ((t3.a + t3.b) / 2))
+                     ->  Seq Scan on prt1_p1 t1
+                     ->  Hash
+                           ->  Seq Scan on prt1_e_p1 t3
+                                 Filter: (c = 0)
+               ->  Index Scan using iprt2_p1_b on prt2_p1 t2
+                     Index Cond: (t1.a = b)
+         ->  Nested Loop Left Join
+               ->  Hash Right Join
+                     Hash Cond: (t1_1.a = ((t3_1.a + t3_1.b) / 2))
+                     ->  Seq Scan on prt1_p2 t1_1
+                     ->  Hash
+                           ->  Seq Scan on prt1_e_p2 t3_1
+                                 Filter: (c = 0)
+               ->  Index Scan using iprt2_p2_b on prt2_p2 t2_1
+                     Index Cond: (t1_1.a = b)
+         ->  Nested Loop Left Join
+               ->  Hash Right Join
+                     Hash Cond: (t1_2.a = ((t3_2.a + t3_2.b) / 2))
+                     ->  Seq Scan on prt1_p3 t1_2
+                     ->  Hash
+                           ->  Seq Scan on prt1_e_p3 t3_2
+                                 Filter: (c = 0)
+               ->  Index Scan using iprt2_p3_b on prt2_p3 t2_2
+                     Index Cond: (t1_2.a = b)
+(30 rows)
 
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) RIGHT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t3.c = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
   a  |  c   |  b  |  c   | ?column? | c 
@@ -700,52 +694,51 @@ SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2
 -- make sure these go to null as expected
 EXPLAIN (COSTS OFF)
 SELECT t1.a, t1.phv, t2.b, t2.phv, t3.a + t3.b, t3.phv FROM ((SELECT 50 phv, * FROM prt1 WHERE prt1.b = 0) t1 FULL JOIN (SELECT 75 phv, * FROM prt2 WHERE prt2.a = 0) t2 ON (t1.a = t2.b)) FULL JOIN (SELECT 50 phv, * FROM prt1_e WHERE prt1_e.c = 0) t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t1.a = t1.phv OR t2.b = t2.phv OR (t3.a + t3.b)/2 = t3.phv ORDER BY t1.a, t2.b, t3.a + t3.b;
-                                                      QUERY PLAN                                                      
-----------------------------------------------------------------------------------------------------------------------
+                                                   QUERY PLAN                                                   
+----------------------------------------------------------------------------------------------------------------
  Sort
    Sort Key: prt1_p1.a, prt2_p1.b, ((prt1_e_p1.a + prt1_e_p1.b))
-   ->  Result
-         ->  Append
+   ->  Append
+         ->  Hash Full Join
+               Hash Cond: (prt1_p1.a = ((prt1_e_p1.a + prt1_e_p1.b) / 2))
+               Filter: ((prt1_p1.a = (50)) OR (prt2_p1.b = (75)) OR (((prt1_e_p1.a + prt1_e_p1.b) / 2) = (50)))
                ->  Hash Full Join
-                     Hash Cond: (prt1_p1.a = ((prt1_e_p1.a + prt1_e_p1.b) / 2))
-                     Filter: ((prt1_p1.a = (50)) OR (prt2_p1.b = (75)) OR (((prt1_e_p1.a + prt1_e_p1.b) / 2) = (50)))
-                     ->  Hash Full Join
-                           Hash Cond: (prt1_p1.a = prt2_p1.b)
-                           ->  Seq Scan on prt1_p1
-                                 Filter: (b = 0)
-                           ->  Hash
-                                 ->  Seq Scan on prt2_p1
-                                       Filter: (a = 0)
+                     Hash Cond: (prt1_p1.a = prt2_p1.b)
+                     ->  Seq Scan on prt1_p1
+                           Filter: (b = 0)
                      ->  Hash
-                           ->  Seq Scan on prt1_e_p1
-                                 Filter: (c = 0)
+                           ->  Seq Scan on prt2_p1
+                                 Filter: (a = 0)
+               ->  Hash
+                     ->  Seq Scan on prt1_e_p1
+                           Filter: (c = 0)
+         ->  Hash Full Join
+               Hash Cond: (prt1_p2.a = ((prt1_e_p2.a + prt1_e_p2.b) / 2))
+               Filter: ((prt1_p2.a = (50)) OR (prt2_p2.b = (75)) OR (((prt1_e_p2.a + prt1_e_p2.b) / 2) = (50)))
                ->  Hash Full Join
-                     Hash Cond: (prt1_p2.a = ((prt1_e_p2.a + prt1_e_p2.b) / 2))
-                     Filter: ((prt1_p2.a = (50)) OR (prt2_p2.b = (75)) OR (((prt1_e_p2.a + prt1_e_p2.b) / 2) = (50)))
-                     ->  Hash Full Join
-                           Hash Cond: (prt1_p2.a = prt2_p2.b)
-                           ->  Seq Scan on prt1_p2
-                                 Filter: (b = 0)
-                           ->  Hash
-                                 ->  Seq Scan on prt2_p2
-                                       Filter: (a = 0)
+                     Hash Cond: (prt1_p2.a = prt2_p2.b)
+                     ->  Seq Scan on prt1_p2
+                           Filter: (b = 0)
                      ->  Hash
-                           ->  Seq Scan on prt1_e_p2
-                                 Filter: (c = 0)
+                           ->  Seq Scan on prt2_p2
+                                 Filter: (a = 0)
+               ->  Hash
+                     ->  Seq Scan on prt1_e_p2
+                           Filter: (c = 0)
+         ->  Hash Full Join
+               Hash Cond: (prt1_p3.a = ((prt1_e_p3.a + prt1_e_p3.b) / 2))
+               Filter: ((prt1_p3.a = (50)) OR (prt2_p3.b = (75)) OR (((prt1_e_p3.a + prt1_e_p3.b) / 2) = (50)))
                ->  Hash Full Join
-                     Hash Cond: (prt1_p3.a = ((prt1_e_p3.a + prt1_e_p3.b) / 2))
-                     Filter: ((prt1_p3.a = (50)) OR (prt2_p3.b = (75)) OR (((prt1_e_p3.a + prt1_e_p3.b) / 2) = (50)))
-                     ->  Hash Full Join
-                           Hash Cond: (prt1_p3.a = prt2_p3.b)
-                           ->  Seq Scan on prt1_p3
-                                 Filter: (b = 0)
-                           ->  Hash
-                                 ->  Seq Scan on prt2_p3
-                                       Filter: (a = 0)
+                     Hash Cond: (prt1_p3.a = prt2_p3.b)
+                     ->  Seq Scan on prt1_p3
+                           Filter: (b = 0)
                      ->  Hash
-                           ->  Seq Scan on prt1_e_p3
-                                 Filter: (c = 0)
-(43 rows)
+                           ->  Seq Scan on prt2_p3
+                                 Filter: (a = 0)
+               ->  Hash
+                     ->  Seq Scan on prt1_e_p3
+                           Filter: (c = 0)
+(42 rows)
 
 SELECT t1.a, t1.phv, t2.b, t2.phv, t3.a + t3.b, t3.phv FROM ((SELECT 50 phv, * FROM prt1 WHERE prt1.b = 0) t1 FULL JOIN (SELECT 75 phv, * FROM prt2 WHERE prt2.a = 0) t2 ON (t1.a = t2.b)) FULL JOIN (SELECT 50 phv, * FROM prt1_e WHERE prt1_e.c = 0) t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t1.a = t1.phv OR t2.b = t2.phv OR (t3.a + t3.b)/2 = t3.phv ORDER BY t1.a, t2.b, t3.a + t3.b;
  a  | phv | b  | phv | ?column? | phv 
@@ -933,61 +926,60 @@ SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN (
 
 EXPLAIN (COSTS OFF)
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) RIGHT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t3.c = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
-                                    QUERY PLAN                                    
-----------------------------------------------------------------------------------
+                                 QUERY PLAN                                 
+----------------------------------------------------------------------------
  Sort
    Sort Key: t1.a, t2.b, ((t3.a + t3.b))
-   ->  Result
-         ->  Append
-               ->  Merge Left Join
-                     Merge Cond: (t1.a = t2.b)
-                     ->  Sort
-                           Sort Key: t1.a
-                           ->  Merge Left Join
-                                 Merge Cond: ((((t3.a + t3.b) / 2)) = t1.a)
-                                 ->  Sort
-                                       Sort Key: (((t3.a + t3.b) / 2))
-                                       ->  Seq Scan on prt1_e_p1 t3
-                                             Filter: (c = 0)
-                                 ->  Sort
-                                       Sort Key: t1.a
-                                       ->  Seq Scan on prt1_p1 t1
-                     ->  Sort
-                           Sort Key: t2.b
-                           ->  Seq Scan on prt2_p1 t2
-               ->  Merge Left Join
-                     Merge Cond: (t1_1.a = t2_1.b)
-                     ->  Sort
-                           Sort Key: t1_1.a
-                           ->  Merge Left Join
-                                 Merge Cond: ((((t3_1.a + t3_1.b) / 2)) = t1_1.a)
-                                 ->  Sort
-                                       Sort Key: (((t3_1.a + t3_1.b) / 2))
-                                       ->  Seq Scan on prt1_e_p2 t3_1
-                                             Filter: (c = 0)
-                                 ->  Sort
-                                       Sort Key: t1_1.a
-                                       ->  Seq Scan on prt1_p2 t1_1
-                     ->  Sort
-                           Sort Key: t2_1.b
-                           ->  Seq Scan on prt2_p2 t2_1
-               ->  Merge Left Join
-                     Merge Cond: (t1_2.a = t2_2.b)
-                     ->  Sort
-                           Sort Key: t1_2.a
-                           ->  Merge Left Join
-                                 Merge Cond: ((((t3_2.a + t3_2.b) / 2)) = t1_2.a)
-                                 ->  Sort
-                                       Sort Key: (((t3_2.a + t3_2.b) / 2))
-                                       ->  Seq Scan on prt1_e_p3 t3_2
-                                             Filter: (c = 0)
-                                 ->  Sort
-                                       Sort Key: t1_2.a
-                                       ->  Seq Scan on prt1_p3 t1_2
-                     ->  Sort
-                           Sort Key: t2_2.b
-                           ->  Seq Scan on prt2_p3 t2_2
-(52 rows)
+   ->  Append
+         ->  Merge Left Join
+               Merge Cond: (t1.a = t2.b)
+               ->  Sort
+                     Sort Key: t1.a
+                     ->  Merge Left Join
+                           Merge Cond: ((((t3.a + t3.b) / 2)) = t1.a)
+                           ->  Sort
+                                 Sort Key: (((t3.a + t3.b) / 2))
+                                 ->  Seq Scan on prt1_e_p1 t3
+                                       Filter: (c = 0)
+                           ->  Sort
+                                 Sort Key: t1.a
+                                 ->  Seq Scan on prt1_p1 t1
+               ->  Sort
+                     Sort Key: t2.b
+                     ->  Seq Scan on prt2_p1 t2
+         ->  Merge Left Join
+               Merge Cond: (t1_1.a = t2_1.b)
+               ->  Sort
+                     Sort Key: t1_1.a
+                     ->  Merge Left Join
+                           Merge Cond: ((((t3_1.a + t3_1.b) / 2)) = t1_1.a)
+                           ->  Sort
+                                 Sort Key: (((t3_1.a + t3_1.b) / 2))
+                                 ->  Seq Scan on prt1_e_p2 t3_1
+                                       Filter: (c = 0)
+                           ->  Sort
+                                 Sort Key: t1_1.a
+                                 ->  Seq Scan on prt1_p2 t1_1
+               ->  Sort
+                     Sort Key: t2_1.b
+                     ->  Seq Scan on prt2_p2 t2_1
+         ->  Merge Left Join
+               Merge Cond: (t1_2.a = t2_2.b)
+               ->  Sort
+                     Sort Key: t1_2.a
+                     ->  Merge Left Join
+                           Merge Cond: ((((t3_2.a + t3_2.b) / 2)) = t1_2.a)
+                           ->  Sort
+                                 Sort Key: (((t3_2.a + t3_2.b) / 2))
+                                 ->  Seq Scan on prt1_e_p3 t3_2
+                                       Filter: (c = 0)
+                           ->  Sort
+                                 Sort Key: t1_2.a
+                                 ->  Seq Scan on prt1_p3 t1_2
+               ->  Sort
+                     Sort Key: t2_2.b
+                     ->  Seq Scan on prt2_p3 t2_2
+(51 rows)
 
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) RIGHT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t3.c = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
   a  |  c   |  b  |  c   | ?column? | c 
@@ -1145,42 +1137,41 @@ ANALYZE plt1_e;
 -- test partition matching with N-way join
 EXPLAIN (COSTS OFF)
 SELECT avg(t1.a), avg(t2.b), avg(t3.a + t3.b), t1.c, t2.c, t3.c FROM plt1 t1, plt2 t2, plt1_e t3 WHERE t1.b = t2.b AND t1.c = t2.c AND ltrim(t3.c, 'A') = t1.c GROUP BY t1.c, t2.c, t3.c ORDER BY t1.c, t2.c, t3.c;
-                                      QUERY PLAN                                      
---------------------------------------------------------------------------------------
+                                   QUERY PLAN                                   
+--------------------------------------------------------------------------------
  GroupAggregate
    Group Key: t1.c, t2.c, t3.c
    ->  Sort
          Sort Key: t1.c, t3.c
-         ->  Result
-               ->  Append
+         ->  Append
+               ->  Hash Join
+                     Hash Cond: (t1.c = ltrim(t3.c, 'A'::text))
                      ->  Hash Join
-                           Hash Cond: (t1.c = ltrim(t3.c, 'A'::text))
-                           ->  Hash Join
-                                 Hash Cond: ((t1.b = t2.b) AND (t1.c = t2.c))
-                                 ->  Seq Scan on plt1_p1 t1
-                                 ->  Hash
-                                       ->  Seq Scan on plt2_p1 t2
+                           Hash Cond: ((t1.b = t2.b) AND (t1.c = t2.c))
+                           ->  Seq Scan on plt1_p1 t1
                            ->  Hash
-                                 ->  Seq Scan on plt1_e_p1 t3
+                                 ->  Seq Scan on plt2_p1 t2
+                     ->  Hash
+                           ->  Seq Scan on plt1_e_p1 t3
+               ->  Hash Join
+                     Hash Cond: (t1_1.c = ltrim(t3_1.c, 'A'::text))
                      ->  Hash Join
-                           Hash Cond: (t1_1.c = ltrim(t3_1.c, 'A'::text))
-                           ->  Hash Join
-                                 Hash Cond: ((t1_1.b = t2_1.b) AND (t1_1.c = t2_1.c))
-                                 ->  Seq Scan on plt1_p2 t1_1
-                                 ->  Hash
-                                       ->  Seq Scan on plt2_p2 t2_1
+                           Hash Cond: ((t1_1.b = t2_1.b) AND (t1_1.c = t2_1.c))
+                           ->  Seq Scan on plt1_p2 t1_1
                            ->  Hash
-                                 ->  Seq Scan on plt1_e_p2 t3_1
+                                 ->  Seq Scan on plt2_p2 t2_1
+                     ->  Hash
+                           ->  Seq Scan on plt1_e_p2 t3_1
+               ->  Hash Join
+                     Hash Cond: (t1_2.c = ltrim(t3_2.c, 'A'::text))
                      ->  Hash Join
-                           Hash Cond: (t1_2.c = ltrim(t3_2.c, 'A'::text))
-                           ->  Hash Join
-                                 Hash Cond: ((t1_2.b = t2_2.b) AND (t1_2.c = t2_2.c))
-                                 ->  Seq Scan on plt1_p3 t1_2
-                                 ->  Hash
-                                       ->  Seq Scan on plt2_p3 t2_2
+                           Hash Cond: ((t1_2.b = t2_2.b) AND (t1_2.c = t2_2.c))
+                           ->  Seq Scan on plt1_p3 t1_2
                            ->  Hash
-                                 ->  Seq Scan on plt1_e_p3 t3_2
-(33 rows)
+                                 ->  Seq Scan on plt2_p3 t2_2
+                     ->  Hash
+                           ->  Seq Scan on plt1_e_p3 t3_2
+(32 rows)
 
 SELECT avg(t1.a), avg(t2.b), avg(t3.a + t3.b), t1.c, t2.c, t3.c FROM plt1 t1, plt2 t2, plt1_e t3 WHERE t1.b = t2.b AND t1.c = t2.c AND ltrim(t3.c, 'A') = t1.c GROUP BY t1.c, t2.c, t3.c ORDER BY t1.c, t2.c, t3.c;
          avg          |         avg          |          avg          |  c   |  c   |   c   
@@ -1290,42 +1281,41 @@ ANALYZE pht1_e;
 -- test partition matching with N-way join
 EXPLAIN (COSTS OFF)
 SELECT avg(t1.a), avg(t2.b), avg(t3.a + t3.b), t1.c, t2.c, t3.c FROM pht1 t1, pht2 t2, pht1_e t3 WHERE t1.b = t2.b AND t1.c = t2.c AND ltrim(t3.c, 'A') = t1.c GROUP BY t1.c, t2.c, t3.c ORDER BY t1.c, t2.c, t3.c;
-                                      QUERY PLAN                                      
---------------------------------------------------------------------------------------
+                                   QUERY PLAN                                   
+--------------------------------------------------------------------------------
  GroupAggregate
    Group Key: t1.c, t2.c, t3.c
    ->  Sort
          Sort Key: t1.c, t3.c
-         ->  Result
-               ->  Append
+         ->  Append
+               ->  Hash Join
+                     Hash Cond: (t1.c = ltrim(t3.c, 'A'::text))
                      ->  Hash Join
-                           Hash Cond: (t1.c = ltrim(t3.c, 'A'::text))
-                           ->  Hash Join
-                                 Hash Cond: ((t1.b = t2.b) AND (t1.c = t2.c))
-                                 ->  Seq Scan on pht1_p1 t1
-                                 ->  Hash
-                                       ->  Seq Scan on pht2_p1 t2
+                           Hash Cond: ((t1.b = t2.b) AND (t1.c = t2.c))
+                           ->  Seq Scan on pht1_p1 t1
                            ->  Hash
-                                 ->  Seq Scan on pht1_e_p1 t3
+                                 ->  Seq Scan on pht2_p1 t2
+                     ->  Hash
+                           ->  Seq Scan on pht1_e_p1 t3
+               ->  Hash Join
+                     Hash Cond: (t1_1.c = ltrim(t3_1.c, 'A'::text))
                      ->  Hash Join
-                           Hash Cond: (t1_1.c = ltrim(t3_1.c, 'A'::text))
-                           ->  Hash Join
-                                 Hash Cond: ((t1_1.b = t2_1.b) AND (t1_1.c = t2_1.c))
-                                 ->  Seq Scan on pht1_p2 t1_1
-                                 ->  Hash
-                                       ->  Seq Scan on pht2_p2 t2_1
+                           Hash Cond: ((t1_1.b = t2_1.b) AND (t1_1.c = t2_1.c))
+                           ->  Seq Scan on pht1_p2 t1_1
                            ->  Hash
-                                 ->  Seq Scan on pht1_e_p2 t3_1
+                                 ->  Seq Scan on pht2_p2 t2_1
+                     ->  Hash
+                           ->  Seq Scan on pht1_e_p2 t3_1
+               ->  Hash Join
+                     Hash Cond: (t1_2.c = ltrim(t3_2.c, 'A'::text))
                      ->  Hash Join
-                           Hash Cond: (t1_2.c = ltrim(t3_2.c, 'A'::text))
-                           ->  Hash Join
-                                 Hash Cond: ((t1_2.b = t2_2.b) AND (t1_2.c = t2_2.c))
-                                 ->  Seq Scan on pht1_p3 t1_2
-                                 ->  Hash
-                                       ->  Seq Scan on pht2_p3 t2_2
+                           Hash Cond: ((t1_2.b = t2_2.b) AND (t1_2.c = t2_2.c))
+                           ->  Seq Scan on pht1_p3 t1_2
                            ->  Hash
-                                 ->  Seq Scan on pht1_e_p3 t3_2
-(33 rows)
+                                 ->  Seq Scan on pht2_p3 t2_2
+                     ->  Hash
+                           ->  Seq Scan on pht1_e_p3 t3_2
+(32 rows)
 
 SELECT avg(t1.a), avg(t2.b), avg(t3.a + t3.b), t1.c, t2.c, t3.c FROM pht1 t1, pht2 t2, pht1_e t3 WHERE t1.b = t2.b AND t1.c = t2.c AND ltrim(t3.c, 'A') = t1.c GROUP BY t1.c, t2.c, t3.c ORDER BY t1.c, t2.c, t3.c;
          avg          |         avg          |         avg          |  c   |  c   |   c   
@@ -1463,40 +1453,39 @@ SELECT t1.a, t1.c, t2.b, t2.c FROM prt1_l t1 LEFT JOIN prt2_l t2 ON t1.a = t2.b
 -- right join
 EXPLAIN (COSTS OFF)
 SELECT t1.a, t1.c, t2.b, t2.c FROM prt1_l t1 RIGHT JOIN prt2_l t2 ON t1.a = t2.b AND t1.c = t2.c WHERE t2.a = 0 ORDER BY t1.a, t2.b;
-                                        QUERY PLAN                                        
-------------------------------------------------------------------------------------------
+                                     QUERY PLAN                                     
+------------------------------------------------------------------------------------
  Sort
    Sort Key: t1.a, t2.b
-   ->  Result
-         ->  Append
-               ->  Hash Right Join
-                     Hash Cond: ((t1.a = t2.b) AND ((t1.c)::text = (t2.c)::text))
-                     ->  Seq Scan on prt1_l_p1 t1
-                     ->  Hash
-                           ->  Seq Scan on prt2_l_p1 t2
-                                 Filter: (a = 0)
-               ->  Hash Right Join
-                     Hash Cond: ((t1_1.a = t2_1.b) AND ((t1_1.c)::text = (t2_1.c)::text))
-                     ->  Seq Scan on prt1_l_p2_p1 t1_1
-                     ->  Hash
-                           ->  Seq Scan on prt2_l_p2_p1 t2_1
-                                 Filter: (a = 0)
-               ->  Hash Right Join
-                     Hash Cond: ((t1_2.a = t2_2.b) AND ((t1_2.c)::text = (t2_2.c)::text))
-                     ->  Seq Scan on prt1_l_p2_p2 t1_2
-                     ->  Hash
-                           ->  Seq Scan on prt2_l_p2_p2 t2_2
-                                 Filter: (a = 0)
-               ->  Hash Right Join
-                     Hash Cond: ((t1_3.a = t2_3.b) AND ((t1_3.c)::text = (t2_3.c)::text))
+   ->  Append
+         ->  Hash Right Join
+               Hash Cond: ((t1.a = t2.b) AND ((t1.c)::text = (t2.c)::text))
+               ->  Seq Scan on prt1_l_p1 t1
+               ->  Hash
+                     ->  Seq Scan on prt2_l_p1 t2
+                           Filter: (a = 0)
+         ->  Hash Right Join
+               Hash Cond: ((t1_1.a = t2_1.b) AND ((t1_1.c)::text = (t2_1.c)::text))
+               ->  Seq Scan on prt1_l_p2_p1 t1_1
+               ->  Hash
+                     ->  Seq Scan on prt2_l_p2_p1 t2_1
+                           Filter: (a = 0)
+         ->  Hash Right Join
+               Hash Cond: ((t1_2.a = t2_2.b) AND ((t1_2.c)::text = (t2_2.c)::text))
+               ->  Seq Scan on prt1_l_p2_p2 t1_2
+               ->  Hash
+                     ->  Seq Scan on prt2_l_p2_p2 t2_2
+                           Filter: (a = 0)
+         ->  Hash Right Join
+               Hash Cond: ((t1_3.a = t2_3.b) AND ((t1_3.c)::text = (t2_3.c)::text))
+               ->  Append
+                     ->  Seq Scan on prt1_l_p3_p1 t1_3
+                     ->  Seq Scan on prt1_l_p3_p2 t1_4
+               ->  Hash
                      ->  Append
-                           ->  Seq Scan on prt1_l_p3_p1 t1_3
-                           ->  Seq Scan on prt1_l_p3_p2 t1_4
-                     ->  Hash
-                           ->  Append
-                                 ->  Seq Scan on prt2_l_p3_p1 t2_3
-                                       Filter: (a = 0)
-(31 rows)
+                           ->  Seq Scan on prt2_l_p3_p1 t2_3
+                                 Filter: (a = 0)
+(30 rows)
 
 SELECT t1.a, t1.c, t2.b, t2.c FROM prt1_l t1 RIGHT JOIN prt2_l t2 ON t1.a = t2.b AND t1.c = t2.c WHERE t2.a = 0 ORDER BY t1.a, t2.b;
   a  |  c   |  b  |  c   
@@ -1577,55 +1566,54 @@ EXPLAIN (COSTS OFF)
 SELECT * FROM prt1_l t1 LEFT JOIN LATERAL
 			  (SELECT t2.a AS t2a, t2.c AS t2c, t2.b AS t2b, t3.b AS t3b, least(t1.a,t2.a,t3.b) FROM prt1_l t2 JOIN prt2_l t3 ON (t2.a = t3.b AND t2.c = t3.c)) ss
 			  ON t1.a = ss.t2a AND t1.c = ss.t2c WHERE t1.b = 0 ORDER BY t1.a;
-                                             QUERY PLAN                                              
------------------------------------------------------------------------------------------------------
+                                          QUERY PLAN                                           
+-----------------------------------------------------------------------------------------------
  Sort
    Sort Key: t1.a
-   ->  Result
-         ->  Append
-               ->  Nested Loop Left Join
-                     ->  Seq Scan on prt1_l_p1 t1
-                           Filter: (b = 0)
-                     ->  Hash Join
-                           Hash Cond: ((t3.b = t2.a) AND ((t3.c)::text = (t2.c)::text))
-                           ->  Seq Scan on prt2_l_p1 t3
-                           ->  Hash
-                                 ->  Seq Scan on prt1_l_p1 t2
-                                       Filter: ((t1.a = a) AND ((t1.c)::text = (c)::text))
-               ->  Nested Loop Left Join
-                     ->  Seq Scan on prt1_l_p2_p1 t1_1
-                           Filter: (b = 0)
-                     ->  Hash Join
-                           Hash Cond: ((t3_1.b = t2_1.a) AND ((t3_1.c)::text = (t2_1.c)::text))
-                           ->  Seq Scan on prt2_l_p2_p1 t3_1
-                           ->  Hash
-                                 ->  Seq Scan on prt1_l_p2_p1 t2_1
-                                       Filter: ((t1_1.a = a) AND ((t1_1.c)::text = (c)::text))
-               ->  Nested Loop Left Join
-                     ->  Seq Scan on prt1_l_p2_p2 t1_2
+   ->  Append
+         ->  Nested Loop Left Join
+               ->  Seq Scan on prt1_l_p1 t1
+                     Filter: (b = 0)
+               ->  Hash Join
+                     Hash Cond: ((t3.b = t2.a) AND ((t3.c)::text = (t2.c)::text))
+                     ->  Seq Scan on prt2_l_p1 t3
+                     ->  Hash
+                           ->  Seq Scan on prt1_l_p1 t2
+                                 Filter: ((t1.a = a) AND ((t1.c)::text = (c)::text))
+         ->  Nested Loop Left Join
+               ->  Seq Scan on prt1_l_p2_p1 t1_1
+                     Filter: (b = 0)
+               ->  Hash Join
+                     Hash Cond: ((t3_1.b = t2_1.a) AND ((t3_1.c)::text = (t2_1.c)::text))
+                     ->  Seq Scan on prt2_l_p2_p1 t3_1
+                     ->  Hash
+                           ->  Seq Scan on prt1_l_p2_p1 t2_1
+                                 Filter: ((t1_1.a = a) AND ((t1_1.c)::text = (c)::text))
+         ->  Nested Loop Left Join
+               ->  Seq Scan on prt1_l_p2_p2 t1_2
+                     Filter: (b = 0)
+               ->  Hash Join
+                     Hash Cond: ((t3_2.b = t2_2.a) AND ((t3_2.c)::text = (t2_2.c)::text))
+                     ->  Seq Scan on prt2_l_p2_p2 t3_2
+                     ->  Hash
+                           ->  Seq Scan on prt1_l_p2_p2 t2_2
+                                 Filter: ((t1_2.a = a) AND ((t1_2.c)::text = (c)::text))
+         ->  Nested Loop Left Join
+               ->  Append
+                     ->  Seq Scan on prt1_l_p3_p1 t1_3
                            Filter: (b = 0)
-                     ->  Hash Join
-                           Hash Cond: ((t3_2.b = t2_2.a) AND ((t3_2.c)::text = (t2_2.c)::text))
-                           ->  Seq Scan on prt2_l_p2_p2 t3_2
-                           ->  Hash
-                                 ->  Seq Scan on prt1_l_p2_p2 t2_2
-                                       Filter: ((t1_2.a = a) AND ((t1_2.c)::text = (c)::text))
-               ->  Nested Loop Left Join
+               ->  Hash Join
+                     Hash Cond: ((t3_3.b = t2_3.a) AND ((t3_3.c)::text = (t2_3.c)::text))
                      ->  Append
-                           ->  Seq Scan on prt1_l_p3_p1 t1_3
-                                 Filter: (b = 0)
-                     ->  Hash Join
-                           Hash Cond: ((t3_3.b = t2_3.a) AND ((t3_3.c)::text = (t2_3.c)::text))
+                           ->  Seq Scan on prt2_l_p3_p1 t3_3
+                           ->  Seq Scan on prt2_l_p3_p2 t3_4
+                     ->  Hash
                            ->  Append
-                                 ->  Seq Scan on prt2_l_p3_p1 t3_3
-                                 ->  Seq Scan on prt2_l_p3_p2 t3_4
-                           ->  Hash
-                                 ->  Append
-                                       ->  Seq Scan on prt1_l_p3_p1 t2_3
-                                             Filter: ((t1_3.a = a) AND ((t1_3.c)::text = (c)::text))
-                                       ->  Seq Scan on prt1_l_p3_p2 t2_4
-                                             Filter: ((t1_3.a = a) AND ((t1_3.c)::text = (c)::text))
-(46 rows)
+                                 ->  Seq Scan on prt1_l_p3_p1 t2_3
+                                       Filter: ((t1_3.a = a) AND ((t1_3.c)::text = (c)::text))
+                                 ->  Seq Scan on prt1_l_p3_p2 t2_4
+                                       Filter: ((t1_3.a = a) AND ((t1_3.c)::text = (c)::text))
+(45 rows)
 
 SELECT * FROM prt1_l t1 LEFT JOIN LATERAL
 			  (SELECT t2.a AS t2a, t2.c AS t2c, t2.b AS t2b, t3.b AS t3b, least(t1.a,t2.a,t3.b) FROM prt1_l t2 JOIN prt2_l t3 ON (t2.a = t3.b AND t2.c = t3.c)) ss
-- 
2.14.3 (Apple Git-98)

0003-Postpone-generate_gather_paths-for-topmost-scan-join.patch (application/octet-stream)

From 8132b6d5a247ae9ebf83623b29fec80a50fb6047 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Mon, 12 Mar 2018 16:45:15 -0400
Subject: [PATCH 3/4] Postpone generate_gather_paths for topmost scan/join rel.

Don't call generate_gather_paths for the topmost scan/join relation
when it is initially populated with paths.  If the scan/join target
is parallel-safe, we actually skip this for the topmost scan/join rel
altogether and instead do it for the tlist_rel, so that the
projection is done in the worker and costs are computed accordingly.

Amit Kapila and Robert Haas
---
 src/backend/optimizer/geqo/geqo_eval.c | 21 ++++++++++++++-------
 src/backend/optimizer/path/allpaths.c  | 26 +++++++++++++++++++-------
 src/backend/optimizer/plan/planner.c   |  9 +++++++++
 3 files changed, 42 insertions(+), 14 deletions(-)

diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index 0be2a73e05..3ef7d7d8aa 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -40,7 +40,7 @@ typedef struct
 } Clump;
 
 static List *merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump,
-			bool force);
+			int num_gene, bool force);
 static bool desirable_join(PlannerInfo *root,
 			   RelOptInfo *outer_rel, RelOptInfo *inner_rel);
 
@@ -196,7 +196,7 @@ gimme_tree(PlannerInfo *root, Gene *tour, int num_gene)
 		cur_clump->size = 1;
 
 		/* Merge it into the clumps list, using only desirable joins */
-		clumps = merge_clump(root, clumps, cur_clump, false);
+		clumps = merge_clump(root, clumps, cur_clump, num_gene, false);
 	}
 
 	if (list_length(clumps) > 1)
@@ -210,7 +210,7 @@ gimme_tree(PlannerInfo *root, Gene *tour, int num_gene)
 		{
 			Clump	   *clump = (Clump *) lfirst(lc);
 
-			fclumps = merge_clump(root, fclumps, clump, true);
+			fclumps = merge_clump(root, fclumps, clump, num_gene, true);
 		}
 		clumps = fclumps;
 	}
@@ -235,7 +235,8 @@ gimme_tree(PlannerInfo *root, Gene *tour, int num_gene)
  * "desirable" joins.
  */
 static List *
-merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
+merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
+			bool force)
 {
 	ListCell   *prev;
 	ListCell   *lc;
@@ -267,8 +268,14 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
 				/* Create paths for partitionwise joins. */
 				generate_partitionwise_join_paths(root, joinrel);
 
-				/* Create GatherPaths for any useful partial paths for rel */
-				generate_gather_paths(root, joinrel, false);
+				/*
+				 * Except for the topmost scan/join rel, consider gathering
+				 * partial paths.  We'll do the same for the topmost scan/join
+				 * rel once we know the final targetlist (see
+				 * grouping_planner).
+				 */
+				if (old_clump->size + new_clump->size < num_gene)
+					generate_gather_paths(root, joinrel, false);
 
 				/* Find and save the cheapest paths for this joinrel */
 				set_cheapest(joinrel);
@@ -286,7 +293,7 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
 				 * others.  When no further merge is possible, we'll reinsert
 				 * it into the list.
 				 */
-				return merge_clump(root, clumps, old_clump, force);
+				return merge_clump(root, clumps, old_clump, num_gene, force);
 			}
 		}
 		prev = lc;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 43f4e75748..c4e4db15a6 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -479,13 +479,20 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	}
 
 	/*
-	 * If this is a baserel, consider gathering any partial paths we may have
-	 * created for it.  (If we tried to gather inheritance children, we could
+	 * If this is a baserel, we should normally consider gathering any partial
+	 * paths we may have created for it.
+	 *
+	 * However, if this is an inheritance child, skip it.  Otherwise, we could
 	 * end up with a very large number of gather nodes, each trying to grab
-	 * its own pool of workers, so don't do this for otherrels.  Instead,
-	 * we'll consider gathering partial paths for the parent appendrel.)
+	 * its own pool of workers.  Instead, we'll consider gathering partial
+	 * paths for the parent appendrel.
+	 *
+	 * Also, if this is the topmost scan/join rel (that is, the only baserel),
+	 * we postpone this until the final scan/join targetlist is available (see
+	 * grouping_planner).
 	 */
-	if (rel->reloptkind == RELOPT_BASEREL)
+	if (rel->reloptkind == RELOPT_BASEREL &&
+		bms_membership(root->all_baserels) != BMS_SINGLETON)
 		generate_gather_paths(root, rel, false);
 
 	/*
@@ -2699,8 +2706,13 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
 			/* Create paths for partitionwise joins. */
 			generate_partitionwise_join_paths(root, rel);
 
-			/* Create GatherPaths for any useful partial paths for rel */
-			generate_gather_paths(root, rel, false);
+			/*
+			 * Except for the topmost scan/join rel, consider gathering
+			 * partial paths.  We'll do the same for the topmost scan/join rel
+			 * once we know the final targetlist (see grouping_planner).
+			 */
+			if (lev < levels_needed)
+				generate_gather_paths(root, rel, false);
 
 			/* Find and save the cheapest paths for this rel */
 			set_cheapest(rel);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 52c21e6870..11b20d546b 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -1971,6 +1971,15 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 			scanjoin_targets = scanjoin_targets_contain_srfs = NIL;
 		}
 
+		/*
+		 * Generate Gather or Gather Merge paths for the topmost scan/join
+		 * relation.  Once that's done, we must re-determine which paths are
+		 * cheapest.  (The previously-cheapest path might even have been
+		 * pfree'd!)
+		 */
+		generate_gather_paths(root, current_rel, false);
+		set_cheapest(current_rel);
+
 		/*
 		 * Forcibly apply SRF-free scan/join target to all the Paths for the
 		 * scan/join rel.
-- 
2.14.3 (Apple Git-98)

0002-CP_IGNORE_TLIST.patch (application/octet-stream)

From 0f9a5e1cfd5808bda400cec03d62c72665d72d54 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Fri, 23 Mar 2018 21:32:36 -0400
Subject: [PATCH 2/4] CP_IGNORE_TLIST.

---
 src/backend/optimizer/plan/createplan.c | 98 +++++++++++++++++++++++----------
 1 file changed, 68 insertions(+), 30 deletions(-)

diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 997d032939..4344557a1d 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -62,10 +62,14 @@
  * any sortgrouprefs specified in its pathtarget, with appropriate
  * ressortgroupref labels.  This is passed down by parent nodes such as Sort
  * and Group, which need these values to be available in their inputs.
+ *
+ * CP_IGNORE_TLIST specifies that the caller plans to replace the targetlist,
+ * and therefore it doesn't matter a bit what target list gets generated.
  */
 #define CP_EXACT_TLIST		0x0001	/* Plan must return specified tlist */
 #define CP_SMALL_TLIST		0x0002	/* Prefer narrower tlists */
 #define CP_LABEL_TLIST		0x0004	/* tlist must contain sortgrouprefs */
+#define CP_IGNORE_TLIST		0x0008	/* caller will replace tlist */
 
 
 static Plan *create_plan_recurse(PlannerInfo *root, Path *best_path,
@@ -566,8 +570,16 @@ create_scan_plan(PlannerInfo *root, Path *best_path, int flags)
 	 * only those Vars actually needed by the query), we prefer to generate a
 	 * tlist containing all Vars in order.  This will allow the executor to
 	 * optimize away projection of the table tuples, if possible.
+	 *
+	 * But if the caller is going to ignore our tlist anyway, then don't
+	 * bother generating one at all.  We use an exact equality test here,
+	 * so that this only applies when CP_IGNORE_TLIST is the only flag set.
 	 */
-	if (use_physical_tlist(root, best_path, flags))
+	if (flags == CP_IGNORE_TLIST)
+	{
+		tlist = NULL;
+	}
+	else if (use_physical_tlist(root, best_path, flags))
 	{
 		if (best_path->pathtype == T_IndexOnlyScan)
 		{
@@ -1575,44 +1587,70 @@ create_projection_plan(PlannerInfo *root, ProjectionPath *best_path, int flags)
 	Plan	   *plan;
 	Plan	   *subplan;
 	List	   *tlist;
-
-	/* Since we intend to project, we don't need to constrain child tlist */
-	subplan = create_plan_recurse(root, best_path->subpath, 0);
+	bool		needs_result_node = false;
 
 	/*
-	 * If our caller doesn't really care what tlist we return, then we might
-	 * not really need to project.  If use_physical_tlist returns false, then
-	 * we're obliged to project.  If it returns true, we can skip actually
-	 * projecting but must still correctly label the input path's tlist with
-	 * the sortgroupref information if the caller has so requested.
+	 * Convert our subpath to a Plan and determine whether we need a Result
+	 * node.
+	 *
+	 * In most cases where we don't need to project, create_projection_path
+	 * will have set dummypp, but not always.  First, some createplan.c
+	 * routines change the tlists of their nodes.  (An example is that
+	 * create_merge_append_plan might add resjunk sort columns to a
+	 * MergeAppend.)  Second, create_projection_path has no way of knowing
+	 * what path node will be placed on top of the projection path and
+	 * therefore can't predict whether it will require an exact tlist.
+	 * For both of these reasons, we have to recheck here.
 	 */
-	if (!use_physical_tlist(root, &best_path->path, flags))
-		tlist = build_path_tlist(root, &best_path->path);
-	else if ((flags & CP_LABEL_TLIST) != 0)
+	if (use_physical_tlist(root, &best_path->path, flags))
 	{
-		tlist = copyObject(subplan->targetlist);
-		apply_pathtarget_labeling_to_tlist(tlist, best_path->path.pathtarget);
+		/*
+		 * Our caller doesn't really care what tlist we return, so we don't
+		 * actually need to project.  However, we may still need to ensure
+		 * proper sortgroupref labels, if the caller cares about those.
+		 */
+		subplan = create_plan_recurse(root, best_path->subpath, 0);
+		if ((flags & CP_LABEL_TLIST) == 0)
+			tlist = subplan->targetlist;
+		else
+		{
+			tlist = copyObject(subplan->targetlist);
+			apply_pathtarget_labeling_to_tlist(tlist,
+											   best_path->path.pathtarget);
+		}
+	}
+	else if (is_projection_capable_path(best_path->subpath))
+	{
+		/*
+		 * Our caller requires that we return the exact tlist, but no separate
+		 * result node is needed because the subpath is projection-capable.
+		 * Tell create_plan_recurse that we're going to ignore the tlist it
+		 * produces.
+		 */
+		subplan = create_plan_recurse(root, best_path->subpath,
+									  CP_IGNORE_TLIST);
+		tlist = build_path_tlist(root, &best_path->path);
 	}
 	else
-		return subplan;
+	{
+		/*
+		 * It looks like we need a result node, unless by good fortune the
+		 * requested tlist is exactly the one the child wants to produce.
+		 */
+		subplan = create_plan_recurse(root, best_path->subpath, 0);
+		tlist = build_path_tlist(root, &best_path->path);
+		needs_result_node = !tlist_same_exprs(tlist, subplan->targetlist);
+	}
 
 	/*
-	 * We might not really need a Result node here, either because the subplan
-	 * can project or because it's returning the right list of expressions
-	 * anyway.  Usually create_projection_path will have detected that and set
-	 * dummypp if we don't need a Result; but its decision can't be final,
-	 * because some createplan.c routines change the tlists of their nodes.
-	 * (An example is that create_merge_append_plan might add resjunk sort
-	 * columns to a MergeAppend.)  So we have to recheck here.  If we do
-	 * arrive at a different answer than create_projection_path did, we'll
-	 * have made slightly wrong cost estimates; but label the plan with the
-	 * cost estimates we actually used, not "corrected" ones.  (XXX this could
-	 * be cleaned up if we moved more of the sortcolumn setup logic into Path
-	 * creation, but that would add expense to creating Paths we might end up
-	 * not using.)
+	 * If we make a different decision about whether to include a Result node
+	 * than create_projection_path did, we'll have made slightly wrong cost
+	 * estimates; but label the plan with the cost estimates we actually used,
+	 * not "corrected" ones.  (XXX this could be cleaned up if we moved more of
+	 * the sortcolumn setup logic into Path creation, but that would add
+	 * expense to creating Paths we might end up not using.)
 	 */
-	if (is_projection_capable_path(best_path->subpath) ||
-		tlist_same_exprs(tlist, subplan->targetlist))
+	if (!needs_result_node)
 	{
 		/* Don't need a separate Result, just assign tlist to subplan */
 		plan = subplan;
-- 
2.14.3 (Apple Git-98)

0001-Teach-create_projection_plan-to-omit-projection-wher.patch (application/octet-stream)

From a312e38f8f78e384a505761e0fc2f514060d6d06 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Mon, 12 Mar 2018 12:36:57 -0400
Subject: [PATCH 1/4] Teach create_projection_plan to omit projection where
 possible.

We sometimes insert a ProjectionPath into a plan tree when it isn't
actually needed.  The existing code already provides for the case
where the ProjectionPath's subpath can perform the projection itself
instead of needing a Result node to do it, but previously it didn't
consider the possibility that the parent node might not actually
require the projection.  This optimization also allows the "physical
tlist" optimization to be preserved in some cases where it would not
otherwise happen.

Patch by me, reviewed by Amit Kapila.
---
 src/backend/optimizer/plan/createplan.c | 26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 8b4f031d96..997d032939 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -87,7 +87,9 @@ static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path
 static Plan *create_unique_plan(PlannerInfo *root, UniquePath *best_path,
 				   int flags);
 static Gather *create_gather_plan(PlannerInfo *root, GatherPath *best_path);
-static Plan *create_projection_plan(PlannerInfo *root, ProjectionPath *best_path);
+static Plan *create_projection_plan(PlannerInfo *root,
+					   ProjectionPath *best_path,
+					   int flags);
 static Plan *inject_projection_plan(Plan *subplan, List *tlist, bool parallel_safe);
 static Sort *create_sort_plan(PlannerInfo *root, SortPath *best_path, int flags);
 static Group *create_group_plan(PlannerInfo *root, GroupPath *best_path);
@@ -400,7 +402,8 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
 			if (IsA(best_path, ProjectionPath))
 			{
 				plan = create_projection_plan(root,
-											  (ProjectionPath *) best_path);
+											  (ProjectionPath *) best_path,
+											  flags);
 			}
 			else if (IsA(best_path, MinMaxAggPath))
 			{
@@ -1567,7 +1570,7 @@ create_gather_merge_plan(PlannerInfo *root, GatherMergePath *best_path)
  *	  but sometimes we can just let the subplan do the work.
  */
 static Plan *
-create_projection_plan(PlannerInfo *root, ProjectionPath *best_path)
+create_projection_plan(PlannerInfo *root, ProjectionPath *best_path, int flags)
 {
 	Plan	   *plan;
 	Plan	   *subplan;
@@ -1576,7 +1579,22 @@ create_projection_plan(PlannerInfo *root, ProjectionPath *best_path)
 	/* Since we intend to project, we don't need to constrain child tlist */
 	subplan = create_plan_recurse(root, best_path->subpath, 0);
 
-	tlist = build_path_tlist(root, &best_path->path);
+	/*
+	 * If our caller doesn't really care what tlist we return, then we might
+	 * not really need to project.  If use_physical_tlist returns false, then
+	 * we're obliged to project.  If it returns true, we can skip actually
+	 * projecting but must still correctly label the input path's tlist with
+	 * the sortgroupref information if the caller has so requested.
+	 */
+	if (!use_physical_tlist(root, &best_path->path, flags))
+		tlist = build_path_tlist(root, &best_path->path);
+	else if ((flags & CP_LABEL_TLIST) != 0)
+	{
+		tlist = copyObject(subplan->targetlist);
+		apply_pathtarget_labeling_to_tlist(tlist, best_path->path.pathtarget);
+	}
+	else
+		return subplan;
 
 	/*
 	 * We might not really need a Result node here, either because the subplan
-- 
2.14.3 (Apple Git-98)
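Without PostgreSQL's data structures, the three-way decision that patch 0001 adds to create_projection_plan() can be sketched as a small C function. This is a hypothetical model: ProjectionChoice, choose_projection() and MODEL_CP_LABEL_TLIST are illustrative names (the flag value is arbitrary), and the physical_tlist_ok argument stands in for the result of use_physical_tlist().

```c
#include <assert.h>		/* used by the usage checks */
#include <stdbool.h>

/* Hypothetical flag bit standing in for the planner's CP_LABEL_TLIST. */
#define MODEL_CP_LABEL_TLIST 0x0004

/* Possible outcomes of the (modeled) create_projection_plan decision. */
typedef enum ProjectionChoice
{
	BUILD_NEW_TLIST,			/* must project: build tlist from the path */
	LABEL_SUBPLAN_TLIST,		/* reuse subplan tlist, add sortgroupref labels */
	RETURN_SUBPLAN				/* caller doesn't care: skip projecting */
} ProjectionChoice;

/*
 * physical_tlist_ok models use_physical_tlist(): true when the caller does
 * not need the exact target list of the ProjectionPath.
 */
static ProjectionChoice
choose_projection(bool physical_tlist_ok, int flags)
{
	if (!physical_tlist_ok)
		return BUILD_NEW_TLIST;
	if ((flags & MODEL_CP_LABEL_TLIST) != 0)
		return LABEL_SUBPLAN_TLIST;
	return RETURN_SUBPLAN;
}
```

Under this model, choose_projection(true, 0) yields RETURN_SUBPLAN (the subplan is returned with no projection at all), while choose_projection(true, MODEL_CP_LABEL_TLIST) yields LABEL_SUBPLAN_TLIST, the case Amit later identifies as the source of the remaining overhead.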

#87

amit.kapila16@gmail.com

almost 8 years ago

In reply to: Robert Haas (#86)

Re: [HACKERS] why not parallel seq scan for slow functions

On Tue, Mar 27, 2018 at 10:59 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Mar 27, 2018 at 7:42 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Mar 27, 2018 at 1:45 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

If we don't want to go with the upperrel logic, then maybe we should
consider just merging some of the other changes from my previous patch
into the 0003* patch you have posted and then see if it gets rid of all the
cases where we are seeing a regression with this new approach.

Which changes are you talking about?

I realized that this version could be optimized in the case where the
scanjoin_target and the topmost scan/join rel have the same
expressions in the target list.

Good idea; such an optimization will ensure that the cases reported
above do not regress. However, doesn't it somewhat defeat the
idea you are using in the patch, which is to avoid modifying the path
in place? In any case, I think one will still see a regression in cases
where this optimization doesn't apply. For example,

DO $$
DECLARE count integer;
BEGIN
For count In 1..1000000 Loop
Execute 'explain Select sum(thousand)from tenk1 group by ten';
END LOOP;
END;
$$;

The above block takes 43700.0289 ms on HEAD and 45025.3779 ms with the
patch, which is approximately a 3% regression. In this case, the
regression is smaller than in the previously reported cases, but I guess
we might see a bigger regression with more columns or with a different
set of queries. I think the cases that can slow down are those where we
need to use the physical tlist in create_projection_plan and the caller
has requested CP_LABEL_TLIST. I have checked the regression tests and
there seem to be more cases that will be impacted. Another such example
from the regression tests is:
select count(*) >= 0 as ok from pg_available_extension_versions;

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#88

robertmhaas@gmail.com

almost 8 years ago

In reply to: Amit Kapila (#87)

3 attachment(s)

Re: [HACKERS] why not parallel seq scan for slow functions

On Wed, Mar 28, 2018 at 3:06 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

Good idea; such an optimization will ensure that the cases reported
above do not regress. However, doesn't it somewhat defeat the
idea you are using in the patch, which is to avoid modifying the path
in place?

Sure, but you can't have everything. I don't think modifying the
sortgroupref data in place is really quite the same thing as changing
the pathtarget in place; the sortgroupref stuff is some extra
information about the targets being computed, not really a change in
targets per se. But in any case if you want to eliminate extra work
then we've gotta eliminate it...
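The distinction between relabeling the existing target and replacing it corresponds to the tlist_same_exprs branch of apply_scanjoin_target_to_paths() in patch 0003. A toy C model of that branch follows; ModelTarget, ModelPath and apply_target() are hypothetical simplifications (expressions are plain integer ids), not planner structures.

```c
#include <assert.h>		/* used by the usage checks */
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for PathTarget: expressions are integer ids. */
typedef struct ModelTarget
{
	int			nexprs;
	const int  *exprs;
	const unsigned *sortgrouprefs;	/* NULL if unlabeled */
} ModelTarget;

/* Hypothetical stand-in for Path. */
typedef struct ModelPath
{
	bool		is_projection;	/* true if this models a ProjectionPath */
	ModelTarget *target;
	struct ModelPath *subpath;	/* input of a projection, else NULL */
} ModelPath;

static bool
same_exprs(const ModelTarget *a, const ModelTarget *b)
{
	if (a->nexprs != b->nexprs)
		return false;
	for (int i = 0; i < a->nexprs; i++)
		if (a->exprs[i] != b->exprs[i])
			return false;
	return true;
}

/*
 * Apply "target" to "path".  If the expressions already match, only the
 * sortgroupref labels are attached to the existing target, in place.
 * Otherwise a new projection node is stacked on top of the path.
 */
static ModelPath *
apply_target(ModelPath *path, ModelTarget *target)
{
	ModelPath  *proj;

	if (same_exprs(path->target, target))
	{
		path->target->sortgrouprefs = target->sortgrouprefs;
		return path;
	}
	proj = malloc(sizeof(ModelPath));
	if (proj == NULL)
		abort();
	proj->is_projection = true;
	proj->target = target;
	proj->subpath = path;
	return proj;
}
```

When the expressions match, the same path comes back with the new labels attached and nothing is allocated; otherwise the caller receives (and owns) a freshly allocated projection node wrapping the original path.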

In any case, I think one will still see a regression in cases
where this optimization doesn't apply. For example,

DO $$
DECLARE count integer;
BEGIN
For count In 1..1000000 Loop
Execute 'explain Select sum(thousand)from tenk1 group by ten';
END LOOP;
END;
$$;

The above block takes 43700.0289 ms on HEAD and 45025.3779 ms with the
patch, which is approximately a 3% regression.

Thanks for the analysis -- the observation that this seemed to affect
cases where CP_LABEL_TLIST gets passed to create_projection_plan
allowed me to recognize that I was doing an unnecessary copyObject()
call. Removing that seems to have reduced this regression below 1% in
my testing.

Also, keep in mind that we're talking about extremely small amounts of
time here. On a trivial query that you're not even executing, you're
seeing a difference of (45025.3779 - 43700.0289)/1000000 = 0.00132 ms
per execution. Sure, it's still 3%, but it's 3% of the time in an
artificial case where you don't actually run the query. In real life,
nobody runs EXPLAIN in a tight loop a million times without ever
running the query, because that's not a useful thing to do. The
overhead on a realistic test case will be smaller. Furthermore, at
least in my testing, there are now cases where this is faster than
master. Now, I welcome further ideas for optimization, but a patch
that makes some cases slightly slower while making others slightly
faster, and also improving the quality of plans in some cases, is not
to my mind an unreasonable thing.
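The two figures in this exchange, the roughly 3% slowdown and the per-execution cost, follow directly from the measured totals. A minimal C sketch of that arithmetic (the helper names are hypothetical):

```c
#include <assert.h>

/* Relative slowdown of the patched run, in percent of the baseline. */
static double
regression_percent(double baseline_ms, double patched_ms)
{
	assert(baseline_ms > 0.0);
	return (patched_ms - baseline_ms) / baseline_ms * 100.0;
}

/* Average extra time per loop iteration, in milliseconds. */
static double
overhead_per_iteration_ms(double baseline_ms, double patched_ms,
						  long iterations)
{
	assert(iterations > 0);
	return (patched_ms - baseline_ms) / (double) iterations;
}
```

With 43700.0289 ms and 45025.3779 ms over 1,000,000 iterations, these give about 3.03% and about 0.001325 ms (roughly 1.3 microseconds) per EXPLAIN.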

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0003-Rewrite-the-code-that-applies-scan-join-targets-to-p.patch (application/octet-stream)

From 8ed4b4b18f47e54ced0c4f68f349f5436fe01376 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Mon, 12 Mar 2018 13:16:30 -0400
Subject: [PATCH 3/3] Rewrite the code that applies scan/join targets to paths.

If the toplevel scan/join target list is parallel-safe, postpone
generating Gather (or Gather Merge) paths until after the toplevel has
been adjusted to return it.  This (correctly) makes queries with
expensive functions in the target list more likely to choose a
parallel plan, since the cost of the plan now reflects the fact that
the evaluation will happen in the workers rather than the leader.
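To see why postponing Gather matters for expensive target lists, here is a deliberately simplified cost model in C. parallel_divisor() is modeled on get_parallel_divisor() in costsize.c, and the constants are the default values of parallel_setup_cost and parallel_tuple_cost; everything else (function names, the cost formula) is a hypothetical illustration, not the planner's actual arithmetic. In particular, cost_seqscan() divides only the CPU part of a scan's cost among participants, whereas this sketch divides the whole scan cost.

```c
#include <assert.h>
#include <stdbool.h>

/* Default values of the parallel_setup_cost and parallel_tuple_cost GUCs. */
#define MODEL_PARALLEL_SETUP_COST 1000.0
#define MODEL_PARALLEL_TUPLE_COST 0.1

/*
 * Modeled on get_parallel_divisor(): each worker counts as one participant,
 * and the leader adds 1 - 0.3 * workers while that is still positive.
 */
static double
parallel_divisor(int workers)
{
	double		divisor = (double) workers;
	double		leader = 1.0 - 0.3 * workers;

	if (leader > 0.0)
		divisor += leader;
	return divisor;
}

/*
 * Simplified total cost of a Gather over a partial scan.  scan_cost is the
 * serial scan cost, divided in full among participants.  The target list
 * costs tlist_cost_per_row per output row and is evaluated either below the
 * Gather (by every participant) or above it (by the leader alone).
 */
static double
gather_cost(double scan_cost, double rows, double tlist_cost_per_row,
			int workers, bool project_below_gather)
{
	double		divisor;
	double		cost;

	assert(workers > 0);
	divisor = parallel_divisor(workers);
	cost = scan_cost / divisor
		+ MODEL_PARALLEL_SETUP_COST
		+ MODEL_PARALLEL_TUPLE_COST * rows;
	if (project_below_gather)
		cost += tlist_cost_per_row * rows / divisor;
	else
		cost += tlist_cost_per_row * rows;
	return cost;
}
```

With 2 workers the divisor is 2.4, so in this model an expensive per-row function evaluated below the Gather is charged about 1/2.4 of what it costs above it; when the target list is free, the two placements cost exactly the same.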

If the toplevel scan/join relation is partitioned, recursively apply
the changes to all partitions.  This sometimes allows us to get rid of
Result nodes, because Append is not projection-capable but its
children may be.  It also cleans up what appears to be incorrect SRF
handling from commit e2f1eb0ee30d144628ab523432320f174a2c8966: the old
code had no knowledge of SRFs for child scan/join rels.

Patch by me, reviewed by Amit Kapila.
---
 src/backend/optimizer/plan/planner.c         | 283 ++++++----
 src/include/nodes/relation.h                 |   1 +
 src/test/regress/expected/partition_join.out | 772 +++++++++++++--------------
 3 files changed, 568 insertions(+), 488 deletions(-)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 11b20d546b..8562f20f62 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -222,9 +222,10 @@ static bool can_partial_agg(PlannerInfo *root,
 				const AggClauseCosts *agg_costs);
 static void apply_scanjoin_target_to_paths(PlannerInfo *root,
 							   RelOptInfo *rel,
-							   PathTarget *scanjoin_target,
+							   List *scanjoin_targets,
+							   List *scanjoin_targets_contain_srfs,
 							   bool scanjoin_target_parallel_safe,
-							   bool modify_in_place);
+							   bool tlist_same_exprs);
 static void create_partitionwise_grouping_paths(PlannerInfo *root,
 									RelOptInfo *input_rel,
 									RelOptInfo *grouped_rel,
@@ -1746,6 +1747,7 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 		List	   *scanjoin_targets;
 		List	   *scanjoin_targets_contain_srfs;
 		bool		scanjoin_target_parallel_safe;
+		bool		scanjoin_target_same_exprs;
 		bool		have_grouping;
 		AggClauseCosts agg_costs;
 		WindowFuncLists *wflists = NULL;
@@ -1964,34 +1966,33 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 		}
 		else
 		{
-			/* initialize lists, just to keep compiler quiet */
+			/* initialize lists; for most of these, dummy values are OK */
 			final_targets = final_targets_contain_srfs = NIL;
 			sort_input_targets = sort_input_targets_contain_srfs = NIL;
 			grouping_targets = grouping_targets_contain_srfs = NIL;
-			scanjoin_targets = scanjoin_targets_contain_srfs = NIL;
+			scanjoin_targets = list_make1(scanjoin_target);
+			scanjoin_targets_contain_srfs = NIL;
 		}
 
 		/*
-		 * Generate Gather or Gather Merge paths for the topmost scan/join
-		 * relation.  Once that's done, we must re-determine which paths are
-		 * cheapest.  (The previously-cheapest path might even have been
-		 * pfree'd!)
+		 * If the final scan/join target is not parallel-safe, we must
+		 * generate Gather paths now, since no partial paths will be generated
+		 * with the final scan/join targetlist.  Otherwise, the Gather or
+		 * Gather Merge paths generated within apply_scanjoin_target_to_paths
+		 * will be superior to any we might generate now in that the
+		 * projection will be done by each participant rather than only in
+		 * the leader.
 		 */
-		generate_gather_paths(root, current_rel, false);
-		set_cheapest(current_rel);
+		if (!scanjoin_target_parallel_safe)
+			generate_gather_paths(root, current_rel, false);
 
-		/*
-		 * Forcibly apply SRF-free scan/join target to all the Paths for the
-		 * scan/join rel.
-		 */
-		apply_scanjoin_target_to_paths(root, current_rel, scanjoin_target,
-									   scanjoin_target_parallel_safe, true);
-
-		/* Now fix things up if scan/join target contains SRFs */
-		if (parse->hasTargetSRFs)
-			adjust_paths_for_srfs(root, current_rel,
-								  scanjoin_targets,
-								  scanjoin_targets_contain_srfs);
+		/* Apply scan/join target. */
+		scanjoin_target_same_exprs = list_length(scanjoin_targets) == 1
+			&& equal(scanjoin_target->exprs, current_rel->reltarget->exprs);
+		apply_scanjoin_target_to_paths(root, current_rel, scanjoin_targets,
+									   scanjoin_targets_contain_srfs,
+									   scanjoin_target_parallel_safe,
+									   scanjoin_target_same_exprs);
 
 		/*
 		 * Save the various upper-rel PathTargets we just computed into
@@ -2003,6 +2004,7 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 		root->upper_targets[UPPERREL_FINAL] = final_target;
 		root->upper_targets[UPPERREL_WINDOW] = sort_input_target;
 		root->upper_targets[UPPERREL_GROUP_AGG] = grouping_target;
+		root->upper_targets[UPPERREL_TLIST] = scanjoin_target;
 
 		/*
 		 * If we have grouping and/or aggregation, consider ways to implement
@@ -6793,24 +6795,88 @@ can_partial_agg(PlannerInfo *root, const AggClauseCosts *agg_costs)
 /*
  * apply_scanjoin_target_to_paths
  *
- * Applies scan/join target to all the Paths for the scan/join rel.
+ * Adjust the final scan/join relation, and recursively all of its children,
+ * to generate the final scan/join target.  It would be more correct to model
+ * this as a separate planning step with a new RelOptInfo at the toplevel and
+ * for each child relation, but doing it this way is noticeably cheaper.
+ * Maybe that problem can be solved at some point, but for now we do this.
+ *
+ * If tlist_same_exprs is true, then the scan/join target to be applied has
+ * the same expressions as the existing reltarget, so we need only insert the
+ * appropriate sortgroupref information.  By avoiding the creation of
+ * projection paths we save effort both immediately and at plan creation time.
  */
 static void
 apply_scanjoin_target_to_paths(PlannerInfo *root,
 							   RelOptInfo *rel,
-							   PathTarget *scanjoin_target,
+							   List *scanjoin_targets,
+							   List *scanjoin_targets_contain_srfs,
 							   bool scanjoin_target_parallel_safe,
-							   bool modify_in_place)
+							   bool tlist_same_exprs)
 {
 	ListCell   *lc;
+	PathTarget *scanjoin_target;
+
+	check_stack_depth();
 
 	/*
-	 * In principle we should re-run set_cheapest() here to identify the
-	 * cheapest path, but it seems unlikely that adding the same tlist eval
-	 * costs to all the paths would change that, so we don't bother. Instead,
-	 * just assume that the cheapest-startup and cheapest-total paths remain
-	 * so.  (There should be no parameterized paths anymore, so we needn't
-	 * worry about updating cheapest_parameterized_paths.)
+	 * If the scan/join target is not parallel-safe, then the new partial
+	 * pathlist is the empty list.
+	 */
+	if (!scanjoin_target_parallel_safe)
+	{
+		rel->partial_pathlist = NIL;
+		rel->consider_parallel = false;
+	}
+
+	/*
+	 * Update the reltarget.  This may not be strictly necessary in all cases,
+	 * but it is at least necessary when create_append_path() gets called
+	 * below directly or indirectly, since that function uses the reltarget as
+	 * the pathtarget for the resulting path.  It seems like a good idea to do
+	 * it unconditionally.
+	 */
+	rel->reltarget = llast_node(PathTarget, scanjoin_targets);
+
+	/* Special case: handle dummy relations separately. */
+	if (IS_DUMMY_REL(rel))
+	{
+		/*
+		 * Since this is a dummy rel, it's got a single Append path with no
+		 * child paths.  Replace it with a new path having the final scan/join
+		 * target.  (Note that since Append is not projection-capable, it
+		 * would be bad to handle this using the general purpose code below;
+		 * we'd end up putting a ProjectionPath on top of the existing Append
+		 * node, which would cause this relation to stop appearing to be a
+		 * dummy rel.)
+		 */
+		rel->pathlist = list_make1(create_append_path(rel, NIL, NIL, NULL,
+													  0, false, NIL, -1));
+		rel->partial_pathlist = NIL;
+		set_cheapest(rel);
+		Assert(IS_DUMMY_REL(rel));
+
+		/*
+		 * Forget about any child relations.  There's no point in adjusting
+		 * them and no point in using them for later planning stages (in
+		 * particular, partitionwise aggregate).
+		 */
+		rel->nparts = 0;
+		rel->part_rels = NULL;
+		rel->boundinfo = NULL;
+
+		return;
+	}
+
+	/* Extract SRF-free scan/join target. */
+	scanjoin_target = linitial_node(PathTarget, scanjoin_targets);
+
+	/*
+	 * Adjust each input path.  If the tlist exprs are the same, we can just
+	 * inject the sortgroupref information into the existing pathtarget.
+	 * Otherwise, replace each path with a projection path that generates the
+	 * SRF-free scan/join target.  This can't change the ordering of paths
+	 * within rel->pathlist, so we just modify the list in place.
 	 */
 	foreach(lc, rel->pathlist)
 	{
@@ -6819,52 +6885,31 @@ apply_scanjoin_target_to_paths(PlannerInfo *root,
 
 		Assert(subpath->param_info == NULL);
 
-		/*
-		 * Don't use apply_projection_to_path() when modify_in_place is false,
-		 * because there could be other pointers to these paths, and therefore
-		 * we mustn't modify them in place.
-		 */
-		if (modify_in_place)
-			newpath = apply_projection_to_path(root, rel, subpath,
-											   scanjoin_target);
+		if (tlist_same_exprs)
+			subpath->pathtarget->sortgrouprefs =
+				scanjoin_target->sortgrouprefs;
 		else
+		{
 			newpath = (Path *) create_projection_path(root, rel, subpath,
 													  scanjoin_target);
-
-		/* If we had to add a Result, newpath is different from subpath */
-		if (newpath != subpath)
-		{
 			lfirst(lc) = newpath;
-			if (subpath == rel->cheapest_startup_path)
-				rel->cheapest_startup_path = newpath;
-			if (subpath == rel->cheapest_total_path)
-				rel->cheapest_total_path = newpath;
 		}
 	}
 
-	/*
-	 * Upper planning steps which make use of the top scan/join rel's partial
-	 * pathlist will expect partial paths for that rel to produce the same
-	 * output as complete paths ... and we just changed the output for the
-	 * complete paths, so we'll need to do the same thing for partial paths.
-	 * But only parallel-safe expressions can be computed by partial paths.
-	 */
-	if (rel->partial_pathlist && scanjoin_target_parallel_safe)
+	/* Same for partial paths. */
+	foreach(lc, rel->partial_pathlist)
 	{
-		/* Apply the scan/join target to each partial path */
-		foreach(lc, rel->partial_pathlist)
-		{
-			Path	   *subpath = (Path *) lfirst(lc);
-			Path	   *newpath;
+		Path	   *subpath = (Path *) lfirst(lc);
+		Path	   *newpath;
 
-			/* Shouldn't have any parameterized paths anymore */
-			Assert(subpath->param_info == NULL);
+		/* Shouldn't have any parameterized paths anymore */
+		Assert(subpath->param_info == NULL);
 
-			/*
-			 * Don't use apply_projection_to_path() here, because there could
-			 * be other pointers to these paths, and therefore we mustn't
-			 * modify them in place.
-			 */
+		if (tlist_same_exprs)
+			subpath->pathtarget->sortgrouprefs =
+				scanjoin_target->sortgrouprefs;
+		else
+		{
 			newpath = (Path *) create_projection_path(root,
 													  rel,
 													  subpath,
@@ -6872,16 +6917,83 @@ apply_scanjoin_target_to_paths(PlannerInfo *root,
 			lfirst(lc) = newpath;
 		}
 	}
-	else
+
+	/* Now fix things up if scan/join target contains SRFs */
+	if (root->parse->hasTargetSRFs)
+		adjust_paths_for_srfs(root, rel,
+							  scanjoin_targets,
+							  scanjoin_targets_contain_srfs);
+
+	/*
+	 * If the relation is partitioned, recursively apply the same changes to
+	 * all partitions and generate new Append paths. Since Append is not
+	 * projection-capable, that might save a separate Result node, and it also
+	 * is important for partitionwise aggregate.
+	 */
+	if (rel->part_scheme && rel->part_rels)
 	{
-		/*
-		 * In the unfortunate event that scanjoin_target is not parallel-safe,
-		 * we can't apply it to the partial paths; in that case, we'll need to
-		 * forget about the partial paths, which aren't valid input for upper
-		 * planning steps.
-		 */
-		rel->partial_pathlist = NIL;
+		int			partition_idx;
+		List	   *live_children = NIL;
+
+		/* Adjust each partition. */
+		for (partition_idx = 0; partition_idx < rel->nparts; partition_idx++)
+		{
+			RelOptInfo *child_rel = rel->part_rels[partition_idx];
+			ListCell   *lc;
+			AppendRelInfo **appinfos;
+			int			nappinfos;
+			List	   *child_scanjoin_targets = NIL;
+
+			/* Translate scan/join targets for this child. */
+			appinfos = find_appinfos_by_relids(root, child_rel->relids,
+											   &nappinfos);
+			foreach(lc, scanjoin_targets)
+			{
+				PathTarget *target = lfirst_node(PathTarget, lc);
+
+				target = copy_pathtarget(target);
+				target->exprs = (List *)
+					adjust_appendrel_attrs(root,
+										   (Node *) target->exprs,
+										   nappinfos, appinfos);
+				child_scanjoin_targets = lappend(child_scanjoin_targets,
+												 target);
+			}
+			pfree(appinfos);
+
+			/* Recursion does the real work. */
+			apply_scanjoin_target_to_paths(root, child_rel,
+										   child_scanjoin_targets,
+										   scanjoin_targets_contain_srfs,
+										   scanjoin_target_parallel_safe,
+										   tlist_same_exprs);
+
+			/* Save non-dummy children for Append paths. */
+			if (!IS_DUMMY_REL(child_rel))
+				live_children = lappend(live_children, child_rel);
+		}
+
+		/* Build new paths for this relation by appending child paths. */
+		if (live_children != NIL)
+			add_paths_to_append_rel(root, rel, live_children);
 	}
+
+	/*
+	 * Consider generating Gather or Gather Merge paths.  We must only do this
+	 * if the relation is parallel safe, and we don't do it for child rels to
+	 * avoid creating multiple Gather nodes within the same plan. We must do
+	 * this after all paths have been generated and before set_cheapest, since
+	 * one of the generated paths may turn out to be the cheapest one.
+	 */
+	if (rel->consider_parallel && !IS_OTHER_REL(rel))
+		generate_gather_paths(root, rel, false);
+
+	/*
+	 * Reassess which paths are the cheapest, now that we've potentially added
+	 * new Gather (or Gather Merge) and/or Append (or MergeAppend) paths to
+	 * this relation.
+	 */
+	set_cheapest(rel);
 }
 
 /*
@@ -6928,7 +7040,6 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
 		PathTarget *child_target = copy_pathtarget(target);
 		AppendRelInfo **appinfos;
 		int			nappinfos;
-		PathTarget *scanjoin_target;
 		GroupPathExtraData child_extra;
 		RelOptInfo *child_grouped_rel;
 		RelOptInfo *child_partially_grouped_rel;
@@ -6985,26 +7096,6 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
 			continue;
 		}
 
-		/*
-		 * Copy pathtarget from underneath scan/join as we are modifying it
-		 * and translate its Vars with respect to this appendrel.  The input
-		 * relation's reltarget might not be the final scanjoin_target, but
-		 * the pathtarget any given individual path should be.
-		 */
-		scanjoin_target =
-			copy_pathtarget(input_rel->cheapest_startup_path->pathtarget);
-		scanjoin_target->exprs = (List *)
-			adjust_appendrel_attrs(root,
-								   (Node *) scanjoin_target->exprs,
-								   nappinfos, appinfos);
-
-		/*
-		 * Forcibly apply scan/join target to all the Paths for the scan/join
-		 * rel.
-		 */
-		apply_scanjoin_target_to_paths(root, child_input_rel, scanjoin_target,
-									   extra->target_parallel_safe, false);
-
 		/* Create grouping paths for this child relation. */
 		create_ordinary_grouping_paths(root, child_input_rel,
 									   child_grouped_rel,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index abbbda9e91..d4bffbc281 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -71,6 +71,7 @@ typedef struct AggClauseCosts
 typedef enum UpperRelationKind
 {
 	UPPERREL_SETOP,				/* result of UNION/INTERSECT/EXCEPT, if any */
+	UPPERREL_TLIST,				/* result of projecting final scan/join rel */
 	UPPERREL_PARTIAL_GROUP_AGG, /* result of partial grouping/aggregation, if
 								 * any */
 	UPPERREL_GROUP_AGG,			/* result of grouping/aggregation, if any */
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index 4fccd9ae54..b983f9c506 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -65,31 +65,30 @@ SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1, prt2 t2 WHERE t1.a = t2.b AND t1.b =
 -- left outer join, with whole-row reference
 EXPLAIN (COSTS OFF)
 SELECT t1, t2 FROM prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b WHERE t1.b = 0 ORDER BY t1.a, t2.b;
-                       QUERY PLAN                       
---------------------------------------------------------
+                    QUERY PLAN                    
+--------------------------------------------------
  Sort
    Sort Key: t1.a, t2.b
-   ->  Result
-         ->  Append
-               ->  Hash Right Join
-                     Hash Cond: (t2.b = t1.a)
-                     ->  Seq Scan on prt2_p1 t2
-                     ->  Hash
-                           ->  Seq Scan on prt1_p1 t1
-                                 Filter: (b = 0)
-               ->  Hash Right Join
-                     Hash Cond: (t2_1.b = t1_1.a)
-                     ->  Seq Scan on prt2_p2 t2_1
-                     ->  Hash
-                           ->  Seq Scan on prt1_p2 t1_1
-                                 Filter: (b = 0)
-               ->  Hash Right Join
-                     Hash Cond: (t2_2.b = t1_2.a)
-                     ->  Seq Scan on prt2_p3 t2_2
-                     ->  Hash
-                           ->  Seq Scan on prt1_p3 t1_2
-                                 Filter: (b = 0)
-(22 rows)
+   ->  Append
+         ->  Hash Right Join
+               Hash Cond: (t2.b = t1.a)
+               ->  Seq Scan on prt2_p1 t2
+               ->  Hash
+                     ->  Seq Scan on prt1_p1 t1
+                           Filter: (b = 0)
+         ->  Hash Right Join
+               Hash Cond: (t2_1.b = t1_1.a)
+               ->  Seq Scan on prt2_p2 t2_1
+               ->  Hash
+                     ->  Seq Scan on prt1_p2 t1_1
+                           Filter: (b = 0)
+         ->  Hash Right Join
+               Hash Cond: (t2_2.b = t1_2.a)
+               ->  Seq Scan on prt2_p3 t2_2
+               ->  Hash
+                     ->  Seq Scan on prt1_p3 t1_2
+                           Filter: (b = 0)
+(21 rows)
 
 SELECT t1, t2 FROM prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b WHERE t1.b = 0 ORDER BY t1.a, t2.b;
       t1      |      t2      
@@ -111,30 +110,29 @@ SELECT t1, t2 FROM prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b WHERE t1.b = 0 ORDER
 -- right outer join
 EXPLAIN (COSTS OFF)
 SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1 RIGHT JOIN prt2 t2 ON t1.a = t2.b WHERE t2.a = 0 ORDER BY t1.a, t2.b;
-                             QUERY PLAN                              
----------------------------------------------------------------------
+                          QUERY PLAN                           
+---------------------------------------------------------------
  Sort
    Sort Key: t1.a, t2.b
-   ->  Result
-         ->  Append
-               ->  Hash Right Join
-                     Hash Cond: (t1.a = t2.b)
-                     ->  Seq Scan on prt1_p1 t1
-                     ->  Hash
-                           ->  Seq Scan on prt2_p1 t2
-                                 Filter: (a = 0)
-               ->  Hash Right Join
-                     Hash Cond: (t1_1.a = t2_1.b)
-                     ->  Seq Scan on prt1_p2 t1_1
-                     ->  Hash
-                           ->  Seq Scan on prt2_p2 t2_1
-                                 Filter: (a = 0)
-               ->  Nested Loop Left Join
-                     ->  Seq Scan on prt2_p3 t2_2
+   ->  Append
+         ->  Hash Right Join
+               Hash Cond: (t1.a = t2.b)
+               ->  Seq Scan on prt1_p1 t1
+               ->  Hash
+                     ->  Seq Scan on prt2_p1 t2
                            Filter: (a = 0)
-                     ->  Index Scan using iprt1_p3_a on prt1_p3 t1_2
-                           Index Cond: (a = t2_2.b)
-(21 rows)
+         ->  Hash Right Join
+               Hash Cond: (t1_1.a = t2_1.b)
+               ->  Seq Scan on prt1_p2 t1_1
+               ->  Hash
+                     ->  Seq Scan on prt2_p2 t2_1
+                           Filter: (a = 0)
+         ->  Nested Loop Left Join
+               ->  Seq Scan on prt2_p3 t2_2
+                     Filter: (a = 0)
+               ->  Index Scan using iprt1_p3_a on prt1_p3 t1_2
+                     Index Cond: (a = t2_2.b)
+(20 rows)
 
 SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1 RIGHT JOIN prt2 t2 ON t1.a = t2.b WHERE t2.a = 0 ORDER BY t1.a, t2.b;
   a  |  c   |  b  |  c   
@@ -375,37 +373,36 @@ EXPLAIN (COSTS OFF)
 SELECT * FROM prt1 t1 LEFT JOIN LATERAL
 			  (SELECT t2.a AS t2a, t3.a AS t3a, least(t1.a,t2.a,t3.b) FROM prt1 t2 JOIN prt2 t3 ON (t2.a = t3.b)) ss
 			  ON t1.a = ss.t2a WHERE t1.b = 0 ORDER BY t1.a;
-                                   QUERY PLAN                                   
---------------------------------------------------------------------------------
+                                QUERY PLAN                                
+--------------------------------------------------------------------------
  Sort
    Sort Key: t1.a
-   ->  Result
-         ->  Append
-               ->  Nested Loop Left Join
-                     ->  Seq Scan on prt1_p1 t1
-                           Filter: (b = 0)
-                     ->  Nested Loop
-                           ->  Index Only Scan using iprt1_p1_a on prt1_p1 t2
-                                 Index Cond: (a = t1.a)
-                           ->  Index Scan using iprt2_p1_b on prt2_p1 t3
-                                 Index Cond: (b = t2.a)
-               ->  Nested Loop Left Join
-                     ->  Seq Scan on prt1_p2 t1_1
-                           Filter: (b = 0)
-                     ->  Nested Loop
-                           ->  Index Only Scan using iprt1_p2_a on prt1_p2 t2_1
-                                 Index Cond: (a = t1_1.a)
-                           ->  Index Scan using iprt2_p2_b on prt2_p2 t3_1
-                                 Index Cond: (b = t2_1.a)
-               ->  Nested Loop Left Join
-                     ->  Seq Scan on prt1_p3 t1_2
-                           Filter: (b = 0)
-                     ->  Nested Loop
-                           ->  Index Only Scan using iprt1_p3_a on prt1_p3 t2_2
-                                 Index Cond: (a = t1_2.a)
-                           ->  Index Scan using iprt2_p3_b on prt2_p3 t3_2
-                                 Index Cond: (b = t2_2.a)
-(28 rows)
+   ->  Append
+         ->  Nested Loop Left Join
+               ->  Seq Scan on prt1_p1 t1
+                     Filter: (b = 0)
+               ->  Nested Loop
+                     ->  Index Only Scan using iprt1_p1_a on prt1_p1 t2
+                           Index Cond: (a = t1.a)
+                     ->  Index Scan using iprt2_p1_b on prt2_p1 t3
+                           Index Cond: (b = t2.a)
+         ->  Nested Loop Left Join
+               ->  Seq Scan on prt1_p2 t1_1
+                     Filter: (b = 0)
+               ->  Nested Loop
+                     ->  Index Only Scan using iprt1_p2_a on prt1_p2 t2_1
+                           Index Cond: (a = t1_1.a)
+                     ->  Index Scan using iprt2_p2_b on prt2_p2 t3_1
+                           Index Cond: (b = t2_1.a)
+         ->  Nested Loop Left Join
+               ->  Seq Scan on prt1_p3 t1_2
+                     Filter: (b = 0)
+               ->  Nested Loop
+                     ->  Index Only Scan using iprt1_p3_a on prt1_p3 t2_2
+                           Index Cond: (a = t1_2.a)
+                     ->  Index Scan using iprt2_p3_b on prt2_p3 t3_2
+                           Index Cond: (b = t2_2.a)
+(27 rows)
 
 SELECT * FROM prt1 t1 LEFT JOIN LATERAL
 			  (SELECT t2.a AS t2a, t3.a AS t3a, least(t1.a,t2.a,t3.b) FROM prt1 t2 JOIN prt2 t3 ON (t2.a = t3.b)) ss
@@ -538,92 +535,90 @@ SELECT t1.a, t1.c, t2.b, t2.c FROM prt1_e t1, prt2_e t2 WHERE (t1.a + t1.b)/2 =
 --
 EXPLAIN (COSTS OFF)
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM prt1 t1, prt2 t2, prt1_e t3 WHERE t1.a = t2.b AND t1.a = (t3.a + t3.b)/2 AND t1.b = 0 ORDER BY t1.a, t2.b;
-                                QUERY PLAN                                 
----------------------------------------------------------------------------
+                             QUERY PLAN                              
+---------------------------------------------------------------------
  Sort
    Sort Key: t1.a
-   ->  Result
-         ->  Append
-               ->  Nested Loop
-                     Join Filter: (t1.a = ((t3.a + t3.b) / 2))
-                     ->  Hash Join
+   ->  Append
+         ->  Nested Loop
+               Join Filter: (t1.a = ((t3.a + t3.b) / 2))
+               ->  Hash Join
+                     Hash Cond: (t2.b = t1.a)
+                     ->  Seq Scan on prt2_p1 t2
+                     ->  Hash
+                           ->  Seq Scan on prt1_p1 t1
+                                 Filter: (b = 0)
+               ->  Index Scan using iprt1_e_p1_ab2 on prt1_e_p1 t3
+                     Index Cond: (((a + b) / 2) = t2.b)
+         ->  Nested Loop
+               Join Filter: (t1_1.a = ((t3_1.a + t3_1.b) / 2))
+               ->  Hash Join
+                     Hash Cond: (t2_1.b = t1_1.a)
+                     ->  Seq Scan on prt2_p2 t2_1
+                     ->  Hash
+                           ->  Seq Scan on prt1_p2 t1_1
+                                 Filter: (b = 0)
+               ->  Index Scan using iprt1_e_p2_ab2 on prt1_e_p2 t3_1
+                     Index Cond: (((a + b) / 2) = t2_1.b)
+         ->  Nested Loop
+               Join Filter: (t1_2.a = ((t3_2.a + t3_2.b) / 2))
+               ->  Hash Join
+                     Hash Cond: (t2_2.b = t1_2.a)
+                     ->  Seq Scan on prt2_p3 t2_2
+                     ->  Hash
+                           ->  Seq Scan on prt1_p3 t1_2
+                                 Filter: (b = 0)
+               ->  Index Scan using iprt1_e_p3_ab2 on prt1_e_p3 t3_2
+                     Index Cond: (((a + b) / 2) = t2_2.b)
+(33 rows)
+
+SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM prt1 t1, prt2 t2, prt1_e t3 WHERE t1.a = t2.b AND t1.a = (t3.a + t3.b)/2 AND t1.b = 0 ORDER BY t1.a, t2.b;
+  a  |  c   |  b  |  c   | ?column? | c 
+-----+------+-----+------+----------+---
+   0 | 0000 |   0 | 0000 |        0 | 0
+ 150 | 0150 | 150 | 0150 |      300 | 0
+ 300 | 0300 | 300 | 0300 |      600 | 0
+ 450 | 0450 | 450 | 0450 |      900 | 0
+(4 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) LEFT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t1.b = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
+                          QUERY PLAN                          
+--------------------------------------------------------------
+ Sort
+   Sort Key: t1.a, t2.b, ((t3.a + t3.b))
+   ->  Append
+         ->  Hash Right Join
+               Hash Cond: (((t3.a + t3.b) / 2) = t1.a)
+               ->  Seq Scan on prt1_e_p1 t3
+               ->  Hash
+                     ->  Hash Right Join
                            Hash Cond: (t2.b = t1.a)
                            ->  Seq Scan on prt2_p1 t2
                            ->  Hash
                                  ->  Seq Scan on prt1_p1 t1
                                        Filter: (b = 0)
-                     ->  Index Scan using iprt1_e_p1_ab2 on prt1_e_p1 t3
-                           Index Cond: (((a + b) / 2) = t2.b)
-               ->  Nested Loop
-                     Join Filter: (t1_1.a = ((t3_1.a + t3_1.b) / 2))
-                     ->  Hash Join
+         ->  Hash Right Join
+               Hash Cond: (((t3_1.a + t3_1.b) / 2) = t1_1.a)
+               ->  Seq Scan on prt1_e_p2 t3_1
+               ->  Hash
+                     ->  Hash Right Join
                            Hash Cond: (t2_1.b = t1_1.a)
                            ->  Seq Scan on prt2_p2 t2_1
                            ->  Hash
                                  ->  Seq Scan on prt1_p2 t1_1
                                        Filter: (b = 0)
-                     ->  Index Scan using iprt1_e_p2_ab2 on prt1_e_p2 t3_1
-                           Index Cond: (((a + b) / 2) = t2_1.b)
-               ->  Nested Loop
-                     Join Filter: (t1_2.a = ((t3_2.a + t3_2.b) / 2))
-                     ->  Hash Join
+         ->  Hash Right Join
+               Hash Cond: (((t3_2.a + t3_2.b) / 2) = t1_2.a)
+               ->  Seq Scan on prt1_e_p3 t3_2
+               ->  Hash
+                     ->  Hash Right Join
                            Hash Cond: (t2_2.b = t1_2.a)
                            ->  Seq Scan on prt2_p3 t2_2
                            ->  Hash
                                  ->  Seq Scan on prt1_p3 t1_2
                                        Filter: (b = 0)
-                     ->  Index Scan using iprt1_e_p3_ab2 on prt1_e_p3 t3_2
-                           Index Cond: (((a + b) / 2) = t2_2.b)
-(34 rows)
-
-SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM prt1 t1, prt2 t2, prt1_e t3 WHERE t1.a = t2.b AND t1.a = (t3.a + t3.b)/2 AND t1.b = 0 ORDER BY t1.a, t2.b;
-  a  |  c   |  b  |  c   | ?column? | c 
------+------+-----+------+----------+---
-   0 | 0000 |   0 | 0000 |        0 | 0
- 150 | 0150 | 150 | 0150 |      300 | 0
- 300 | 0300 | 300 | 0300 |      600 | 0
- 450 | 0450 | 450 | 0450 |      900 | 0
-(4 rows)
-
-EXPLAIN (COSTS OFF)
-SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) LEFT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t1.b = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
-                             QUERY PLAN                             
---------------------------------------------------------------------
- Sort
-   Sort Key: t1.a, t2.b, ((t3.a + t3.b))
-   ->  Result
-         ->  Append
-               ->  Hash Right Join
-                     Hash Cond: (((t3.a + t3.b) / 2) = t1.a)
-                     ->  Seq Scan on prt1_e_p1 t3
-                     ->  Hash
-                           ->  Hash Right Join
-                                 Hash Cond: (t2.b = t1.a)
-                                 ->  Seq Scan on prt2_p1 t2
-                                 ->  Hash
-                                       ->  Seq Scan on prt1_p1 t1
-                                             Filter: (b = 0)
-               ->  Hash Right Join
-                     Hash Cond: (((t3_1.a + t3_1.b) / 2) = t1_1.a)
-                     ->  Seq Scan on prt1_e_p2 t3_1
-                     ->  Hash
-                           ->  Hash Right Join
-                                 Hash Cond: (t2_1.b = t1_1.a)
-                                 ->  Seq Scan on prt2_p2 t2_1
-                                 ->  Hash
-                                       ->  Seq Scan on prt1_p2 t1_1
-                                             Filter: (b = 0)
-               ->  Hash Right Join
-                     Hash Cond: (((t3_2.a + t3_2.b) / 2) = t1_2.a)
-                     ->  Seq Scan on prt1_e_p3 t3_2
-                     ->  Hash
-                           ->  Hash Right Join
-                                 Hash Cond: (t2_2.b = t1_2.a)
-                                 ->  Seq Scan on prt2_p3 t2_2
-                                 ->  Hash
-                                       ->  Seq Scan on prt1_p3 t1_2
-                                             Filter: (b = 0)
-(34 rows)
+(33 rows)
 
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) LEFT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t1.b = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
   a  |  c   |  b  |  c   | ?column? | c 
@@ -644,40 +639,39 @@ SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2
 
 EXPLAIN (COSTS OFF)
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) RIGHT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t3.c = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
-                               QUERY PLAN                                
--------------------------------------------------------------------------
+                            QUERY PLAN                             
+-------------------------------------------------------------------
  Sort
    Sort Key: t1.a, t2.b, ((t3.a + t3.b))
-   ->  Result
-         ->  Append
-               ->  Nested Loop Left Join
-                     ->  Hash Right Join
-                           Hash Cond: (t1.a = ((t3.a + t3.b) / 2))
-                           ->  Seq Scan on prt1_p1 t1
-                           ->  Hash
-                                 ->  Seq Scan on prt1_e_p1 t3
-                                       Filter: (c = 0)
-                     ->  Index Scan using iprt2_p1_b on prt2_p1 t2
-                           Index Cond: (t1.a = b)
-               ->  Nested Loop Left Join
-                     ->  Hash Right Join
-                           Hash Cond: (t1_1.a = ((t3_1.a + t3_1.b) / 2))
-                           ->  Seq Scan on prt1_p2 t1_1
-                           ->  Hash
-                                 ->  Seq Scan on prt1_e_p2 t3_1
-                                       Filter: (c = 0)
-                     ->  Index Scan using iprt2_p2_b on prt2_p2 t2_1
-                           Index Cond: (t1_1.a = b)
-               ->  Nested Loop Left Join
-                     ->  Hash Right Join
-                           Hash Cond: (t1_2.a = ((t3_2.a + t3_2.b) / 2))
-                           ->  Seq Scan on prt1_p3 t1_2
-                           ->  Hash
-                                 ->  Seq Scan on prt1_e_p3 t3_2
-                                       Filter: (c = 0)
-                     ->  Index Scan using iprt2_p3_b on prt2_p3 t2_2
-                           Index Cond: (t1_2.a = b)
-(31 rows)
+   ->  Append
+         ->  Nested Loop Left Join
+               ->  Hash Right Join
+                     Hash Cond: (t1.a = ((t3.a + t3.b) / 2))
+                     ->  Seq Scan on prt1_p1 t1
+                     ->  Hash
+                           ->  Seq Scan on prt1_e_p1 t3
+                                 Filter: (c = 0)
+               ->  Index Scan using iprt2_p1_b on prt2_p1 t2
+                     Index Cond: (t1.a = b)
+         ->  Nested Loop Left Join
+               ->  Hash Right Join
+                     Hash Cond: (t1_1.a = ((t3_1.a + t3_1.b) / 2))
+                     ->  Seq Scan on prt1_p2 t1_1
+                     ->  Hash
+                           ->  Seq Scan on prt1_e_p2 t3_1
+                                 Filter: (c = 0)
+               ->  Index Scan using iprt2_p2_b on prt2_p2 t2_1
+                     Index Cond: (t1_1.a = b)
+         ->  Nested Loop Left Join
+               ->  Hash Right Join
+                     Hash Cond: (t1_2.a = ((t3_2.a + t3_2.b) / 2))
+                     ->  Seq Scan on prt1_p3 t1_2
+                     ->  Hash
+                           ->  Seq Scan on prt1_e_p3 t3_2
+                                 Filter: (c = 0)
+               ->  Index Scan using iprt2_p3_b on prt2_p3 t2_2
+                     Index Cond: (t1_2.a = b)
+(30 rows)
 
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) RIGHT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t3.c = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
   a  |  c   |  b  |  c   | ?column? | c 
@@ -700,52 +694,51 @@ SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2
 -- make sure these go to null as expected
 EXPLAIN (COSTS OFF)
 SELECT t1.a, t1.phv, t2.b, t2.phv, t3.a + t3.b, t3.phv FROM ((SELECT 50 phv, * FROM prt1 WHERE prt1.b = 0) t1 FULL JOIN (SELECT 75 phv, * FROM prt2 WHERE prt2.a = 0) t2 ON (t1.a = t2.b)) FULL JOIN (SELECT 50 phv, * FROM prt1_e WHERE prt1_e.c = 0) t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t1.a = t1.phv OR t2.b = t2.phv OR (t3.a + t3.b)/2 = t3.phv ORDER BY t1.a, t2.b, t3.a + t3.b;
-                                                      QUERY PLAN                                                      
-----------------------------------------------------------------------------------------------------------------------
+                                                   QUERY PLAN                                                   
+----------------------------------------------------------------------------------------------------------------
  Sort
    Sort Key: prt1_p1.a, prt2_p1.b, ((prt1_e_p1.a + prt1_e_p1.b))
-   ->  Result
-         ->  Append
+   ->  Append
+         ->  Hash Full Join
+               Hash Cond: (prt1_p1.a = ((prt1_e_p1.a + prt1_e_p1.b) / 2))
+               Filter: ((prt1_p1.a = (50)) OR (prt2_p1.b = (75)) OR (((prt1_e_p1.a + prt1_e_p1.b) / 2) = (50)))
                ->  Hash Full Join
-                     Hash Cond: (prt1_p1.a = ((prt1_e_p1.a + prt1_e_p1.b) / 2))
-                     Filter: ((prt1_p1.a = (50)) OR (prt2_p1.b = (75)) OR (((prt1_e_p1.a + prt1_e_p1.b) / 2) = (50)))
-                     ->  Hash Full Join
-                           Hash Cond: (prt1_p1.a = prt2_p1.b)
-                           ->  Seq Scan on prt1_p1
-                                 Filter: (b = 0)
-                           ->  Hash
-                                 ->  Seq Scan on prt2_p1
-                                       Filter: (a = 0)
+                     Hash Cond: (prt1_p1.a = prt2_p1.b)
+                     ->  Seq Scan on prt1_p1
+                           Filter: (b = 0)
                      ->  Hash
-                           ->  Seq Scan on prt1_e_p1
-                                 Filter: (c = 0)
+                           ->  Seq Scan on prt2_p1
+                                 Filter: (a = 0)
+               ->  Hash
+                     ->  Seq Scan on prt1_e_p1
+                           Filter: (c = 0)
+         ->  Hash Full Join
+               Hash Cond: (prt1_p2.a = ((prt1_e_p2.a + prt1_e_p2.b) / 2))
+               Filter: ((prt1_p2.a = (50)) OR (prt2_p2.b = (75)) OR (((prt1_e_p2.a + prt1_e_p2.b) / 2) = (50)))
                ->  Hash Full Join
-                     Hash Cond: (prt1_p2.a = ((prt1_e_p2.a + prt1_e_p2.b) / 2))
-                     Filter: ((prt1_p2.a = (50)) OR (prt2_p2.b = (75)) OR (((prt1_e_p2.a + prt1_e_p2.b) / 2) = (50)))
-                     ->  Hash Full Join
-                           Hash Cond: (prt1_p2.a = prt2_p2.b)
-                           ->  Seq Scan on prt1_p2
-                                 Filter: (b = 0)
-                           ->  Hash
-                                 ->  Seq Scan on prt2_p2
-                                       Filter: (a = 0)
+                     Hash Cond: (prt1_p2.a = prt2_p2.b)
+                     ->  Seq Scan on prt1_p2
+                           Filter: (b = 0)
                      ->  Hash
-                           ->  Seq Scan on prt1_e_p2
-                                 Filter: (c = 0)
+                           ->  Seq Scan on prt2_p2
+                                 Filter: (a = 0)
+               ->  Hash
+                     ->  Seq Scan on prt1_e_p2
+                           Filter: (c = 0)
+         ->  Hash Full Join
+               Hash Cond: (prt1_p3.a = ((prt1_e_p3.a + prt1_e_p3.b) / 2))
+               Filter: ((prt1_p3.a = (50)) OR (prt2_p3.b = (75)) OR (((prt1_e_p3.a + prt1_e_p3.b) / 2) = (50)))
                ->  Hash Full Join
-                     Hash Cond: (prt1_p3.a = ((prt1_e_p3.a + prt1_e_p3.b) / 2))
-                     Filter: ((prt1_p3.a = (50)) OR (prt2_p3.b = (75)) OR (((prt1_e_p3.a + prt1_e_p3.b) / 2) = (50)))
-                     ->  Hash Full Join
-                           Hash Cond: (prt1_p3.a = prt2_p3.b)
-                           ->  Seq Scan on prt1_p3
-                                 Filter: (b = 0)
-                           ->  Hash
-                                 ->  Seq Scan on prt2_p3
-                                       Filter: (a = 0)
+                     Hash Cond: (prt1_p3.a = prt2_p3.b)
+                     ->  Seq Scan on prt1_p3
+                           Filter: (b = 0)
                      ->  Hash
-                           ->  Seq Scan on prt1_e_p3
-                                 Filter: (c = 0)
-(43 rows)
+                           ->  Seq Scan on prt2_p3
+                                 Filter: (a = 0)
+               ->  Hash
+                     ->  Seq Scan on prt1_e_p3
+                           Filter: (c = 0)
+(42 rows)
 
 SELECT t1.a, t1.phv, t2.b, t2.phv, t3.a + t3.b, t3.phv FROM ((SELECT 50 phv, * FROM prt1 WHERE prt1.b = 0) t1 FULL JOIN (SELECT 75 phv, * FROM prt2 WHERE prt2.a = 0) t2 ON (t1.a = t2.b)) FULL JOIN (SELECT 50 phv, * FROM prt1_e WHERE prt1_e.c = 0) t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t1.a = t1.phv OR t2.b = t2.phv OR (t3.a + t3.b)/2 = t3.phv ORDER BY t1.a, t2.b, t3.a + t3.b;
  a  | phv | b  | phv | ?column? | phv 
@@ -933,61 +926,60 @@ SELECT t1.* FROM prt1 t1 WHERE t1.a IN (SELECT t1.b FROM prt2 t1 WHERE t1.b IN (
 
 EXPLAIN (COSTS OFF)
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) RIGHT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t3.c = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
-                                    QUERY PLAN                                    
-----------------------------------------------------------------------------------
+                                 QUERY PLAN                                 
+----------------------------------------------------------------------------
  Sort
    Sort Key: t1.a, t2.b, ((t3.a + t3.b))
-   ->  Result
-         ->  Append
-               ->  Merge Left Join
-                     Merge Cond: (t1.a = t2.b)
-                     ->  Sort
-                           Sort Key: t1.a
-                           ->  Merge Left Join
-                                 Merge Cond: ((((t3.a + t3.b) / 2)) = t1.a)
-                                 ->  Sort
-                                       Sort Key: (((t3.a + t3.b) / 2))
-                                       ->  Seq Scan on prt1_e_p1 t3
-                                             Filter: (c = 0)
-                                 ->  Sort
-                                       Sort Key: t1.a
-                                       ->  Seq Scan on prt1_p1 t1
-                     ->  Sort
-                           Sort Key: t2.b
-                           ->  Seq Scan on prt2_p1 t2
-               ->  Merge Left Join
-                     Merge Cond: (t1_1.a = t2_1.b)
-                     ->  Sort
-                           Sort Key: t1_1.a
-                           ->  Merge Left Join
-                                 Merge Cond: ((((t3_1.a + t3_1.b) / 2)) = t1_1.a)
-                                 ->  Sort
-                                       Sort Key: (((t3_1.a + t3_1.b) / 2))
-                                       ->  Seq Scan on prt1_e_p2 t3_1
-                                             Filter: (c = 0)
-                                 ->  Sort
-                                       Sort Key: t1_1.a
-                                       ->  Seq Scan on prt1_p2 t1_1
-                     ->  Sort
-                           Sort Key: t2_1.b
-                           ->  Seq Scan on prt2_p2 t2_1
-               ->  Merge Left Join
-                     Merge Cond: (t1_2.a = t2_2.b)
-                     ->  Sort
-                           Sort Key: t1_2.a
-                           ->  Merge Left Join
-                                 Merge Cond: ((((t3_2.a + t3_2.b) / 2)) = t1_2.a)
-                                 ->  Sort
-                                       Sort Key: (((t3_2.a + t3_2.b) / 2))
-                                       ->  Seq Scan on prt1_e_p3 t3_2
-                                             Filter: (c = 0)
-                                 ->  Sort
-                                       Sort Key: t1_2.a
-                                       ->  Seq Scan on prt1_p3 t1_2
-                     ->  Sort
-                           Sort Key: t2_2.b
-                           ->  Seq Scan on prt2_p3 t2_2
-(52 rows)
+   ->  Append
+         ->  Merge Left Join
+               Merge Cond: (t1.a = t2.b)
+               ->  Sort
+                     Sort Key: t1.a
+                     ->  Merge Left Join
+                           Merge Cond: ((((t3.a + t3.b) / 2)) = t1.a)
+                           ->  Sort
+                                 Sort Key: (((t3.a + t3.b) / 2))
+                                 ->  Seq Scan on prt1_e_p1 t3
+                                       Filter: (c = 0)
+                           ->  Sort
+                                 Sort Key: t1.a
+                                 ->  Seq Scan on prt1_p1 t1
+               ->  Sort
+                     Sort Key: t2.b
+                     ->  Seq Scan on prt2_p1 t2
+         ->  Merge Left Join
+               Merge Cond: (t1_1.a = t2_1.b)
+               ->  Sort
+                     Sort Key: t1_1.a
+                     ->  Merge Left Join
+                           Merge Cond: ((((t3_1.a + t3_1.b) / 2)) = t1_1.a)
+                           ->  Sort
+                                 Sort Key: (((t3_1.a + t3_1.b) / 2))
+                                 ->  Seq Scan on prt1_e_p2 t3_1
+                                       Filter: (c = 0)
+                           ->  Sort
+                                 Sort Key: t1_1.a
+                                 ->  Seq Scan on prt1_p2 t1_1
+               ->  Sort
+                     Sort Key: t2_1.b
+                     ->  Seq Scan on prt2_p2 t2_1
+         ->  Merge Left Join
+               Merge Cond: (t1_2.a = t2_2.b)
+               ->  Sort
+                     Sort Key: t1_2.a
+                     ->  Merge Left Join
+                           Merge Cond: ((((t3_2.a + t3_2.b) / 2)) = t1_2.a)
+                           ->  Sort
+                                 Sort Key: (((t3_2.a + t3_2.b) / 2))
+                                 ->  Seq Scan on prt1_e_p3 t3_2
+                                       Filter: (c = 0)
+                           ->  Sort
+                                 Sort Key: t1_2.a
+                                 ->  Seq Scan on prt1_p3 t1_2
+               ->  Sort
+                     Sort Key: t2_2.b
+                     ->  Seq Scan on prt2_p3 t2_2
+(51 rows)
 
 SELECT t1.a, t1.c, t2.b, t2.c, t3.a + t3.b, t3.c FROM (prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b) RIGHT JOIN prt1_e t3 ON (t1.a = (t3.a + t3.b)/2) WHERE t3.c = 0 ORDER BY t1.a, t2.b, t3.a + t3.b;
   a  |  c   |  b  |  c   | ?column? | c 
@@ -1145,42 +1137,41 @@ ANALYZE plt1_e;
 -- test partition matching with N-way join
 EXPLAIN (COSTS OFF)
 SELECT avg(t1.a), avg(t2.b), avg(t3.a + t3.b), t1.c, t2.c, t3.c FROM plt1 t1, plt2 t2, plt1_e t3 WHERE t1.b = t2.b AND t1.c = t2.c AND ltrim(t3.c, 'A') = t1.c GROUP BY t1.c, t2.c, t3.c ORDER BY t1.c, t2.c, t3.c;
-                                      QUERY PLAN                                      
---------------------------------------------------------------------------------------
+                                   QUERY PLAN                                   
+--------------------------------------------------------------------------------
  GroupAggregate
    Group Key: t1.c, t2.c, t3.c
    ->  Sort
          Sort Key: t1.c, t3.c
-         ->  Result
-               ->  Append
+         ->  Append
+               ->  Hash Join
+                     Hash Cond: (t1.c = ltrim(t3.c, 'A'::text))
                      ->  Hash Join
-                           Hash Cond: (t1.c = ltrim(t3.c, 'A'::text))
-                           ->  Hash Join
-                                 Hash Cond: ((t1.b = t2.b) AND (t1.c = t2.c))
-                                 ->  Seq Scan on plt1_p1 t1
-                                 ->  Hash
-                                       ->  Seq Scan on plt2_p1 t2
+                           Hash Cond: ((t1.b = t2.b) AND (t1.c = t2.c))
+                           ->  Seq Scan on plt1_p1 t1
                            ->  Hash
-                                 ->  Seq Scan on plt1_e_p1 t3
+                                 ->  Seq Scan on plt2_p1 t2
+                     ->  Hash
+                           ->  Seq Scan on plt1_e_p1 t3
+               ->  Hash Join
+                     Hash Cond: (t1_1.c = ltrim(t3_1.c, 'A'::text))
                      ->  Hash Join
-                           Hash Cond: (t1_1.c = ltrim(t3_1.c, 'A'::text))
-                           ->  Hash Join
-                                 Hash Cond: ((t1_1.b = t2_1.b) AND (t1_1.c = t2_1.c))
-                                 ->  Seq Scan on plt1_p2 t1_1
-                                 ->  Hash
-                                       ->  Seq Scan on plt2_p2 t2_1
+                           Hash Cond: ((t1_1.b = t2_1.b) AND (t1_1.c = t2_1.c))
+                           ->  Seq Scan on plt1_p2 t1_1
                            ->  Hash
-                                 ->  Seq Scan on plt1_e_p2 t3_1
+                                 ->  Seq Scan on plt2_p2 t2_1
+                     ->  Hash
+                           ->  Seq Scan on plt1_e_p2 t3_1
+               ->  Hash Join
+                     Hash Cond: (t1_2.c = ltrim(t3_2.c, 'A'::text))
                      ->  Hash Join
-                           Hash Cond: (t1_2.c = ltrim(t3_2.c, 'A'::text))
-                           ->  Hash Join
-                                 Hash Cond: ((t1_2.b = t2_2.b) AND (t1_2.c = t2_2.c))
-                                 ->  Seq Scan on plt1_p3 t1_2
-                                 ->  Hash
-                                       ->  Seq Scan on plt2_p3 t2_2
+                           Hash Cond: ((t1_2.b = t2_2.b) AND (t1_2.c = t2_2.c))
+                           ->  Seq Scan on plt1_p3 t1_2
                            ->  Hash
-                                 ->  Seq Scan on plt1_e_p3 t3_2
-(33 rows)
+                                 ->  Seq Scan on plt2_p3 t2_2
+                     ->  Hash
+                           ->  Seq Scan on plt1_e_p3 t3_2
+(32 rows)
 
 SELECT avg(t1.a), avg(t2.b), avg(t3.a + t3.b), t1.c, t2.c, t3.c FROM plt1 t1, plt2 t2, plt1_e t3 WHERE t1.b = t2.b AND t1.c = t2.c AND ltrim(t3.c, 'A') = t1.c GROUP BY t1.c, t2.c, t3.c ORDER BY t1.c, t2.c, t3.c;
          avg          |         avg          |          avg          |  c   |  c   |   c   
@@ -1290,42 +1281,41 @@ ANALYZE pht1_e;
 -- test partition matching with N-way join
 EXPLAIN (COSTS OFF)
 SELECT avg(t1.a), avg(t2.b), avg(t3.a + t3.b), t1.c, t2.c, t3.c FROM pht1 t1, pht2 t2, pht1_e t3 WHERE t1.b = t2.b AND t1.c = t2.c AND ltrim(t3.c, 'A') = t1.c GROUP BY t1.c, t2.c, t3.c ORDER BY t1.c, t2.c, t3.c;
-                                      QUERY PLAN                                      
---------------------------------------------------------------------------------------
+                                   QUERY PLAN                                   
+--------------------------------------------------------------------------------
  GroupAggregate
    Group Key: t1.c, t2.c, t3.c
    ->  Sort
          Sort Key: t1.c, t3.c
-         ->  Result
-               ->  Append
+         ->  Append
+               ->  Hash Join
+                     Hash Cond: (t1.c = ltrim(t3.c, 'A'::text))
                      ->  Hash Join
-                           Hash Cond: (t1.c = ltrim(t3.c, 'A'::text))
-                           ->  Hash Join
-                                 Hash Cond: ((t1.b = t2.b) AND (t1.c = t2.c))
-                                 ->  Seq Scan on pht1_p1 t1
-                                 ->  Hash
-                                       ->  Seq Scan on pht2_p1 t2
+                           Hash Cond: ((t1.b = t2.b) AND (t1.c = t2.c))
+                           ->  Seq Scan on pht1_p1 t1
                            ->  Hash
-                                 ->  Seq Scan on pht1_e_p1 t3
+                                 ->  Seq Scan on pht2_p1 t2
+                     ->  Hash
+                           ->  Seq Scan on pht1_e_p1 t3
+               ->  Hash Join
+                     Hash Cond: (t1_1.c = ltrim(t3_1.c, 'A'::text))
                      ->  Hash Join
-                           Hash Cond: (t1_1.c = ltrim(t3_1.c, 'A'::text))
-                           ->  Hash Join
-                                 Hash Cond: ((t1_1.b = t2_1.b) AND (t1_1.c = t2_1.c))
-                                 ->  Seq Scan on pht1_p2 t1_1
-                                 ->  Hash
-                                       ->  Seq Scan on pht2_p2 t2_1
+                           Hash Cond: ((t1_1.b = t2_1.b) AND (t1_1.c = t2_1.c))
+                           ->  Seq Scan on pht1_p2 t1_1
                            ->  Hash
-                                 ->  Seq Scan on pht1_e_p2 t3_1
+                                 ->  Seq Scan on pht2_p2 t2_1
+                     ->  Hash
+                           ->  Seq Scan on pht1_e_p2 t3_1
+               ->  Hash Join
+                     Hash Cond: (t1_2.c = ltrim(t3_2.c, 'A'::text))
                      ->  Hash Join
-                           Hash Cond: (t1_2.c = ltrim(t3_2.c, 'A'::text))
-                           ->  Hash Join
-                                 Hash Cond: ((t1_2.b = t2_2.b) AND (t1_2.c = t2_2.c))
-                                 ->  Seq Scan on pht1_p3 t1_2
-                                 ->  Hash
-                                       ->  Seq Scan on pht2_p3 t2_2
+                           Hash Cond: ((t1_2.b = t2_2.b) AND (t1_2.c = t2_2.c))
+                           ->  Seq Scan on pht1_p3 t1_2
                            ->  Hash
-                                 ->  Seq Scan on pht1_e_p3 t3_2
-(33 rows)
+                                 ->  Seq Scan on pht2_p3 t2_2
+                     ->  Hash
+                           ->  Seq Scan on pht1_e_p3 t3_2
+(32 rows)
 
 SELECT avg(t1.a), avg(t2.b), avg(t3.a + t3.b), t1.c, t2.c, t3.c FROM pht1 t1, pht2 t2, pht1_e t3 WHERE t1.b = t2.b AND t1.c = t2.c AND ltrim(t3.c, 'A') = t1.c GROUP BY t1.c, t2.c, t3.c ORDER BY t1.c, t2.c, t3.c;
          avg          |         avg          |         avg          |  c   |  c   |   c   
@@ -1463,40 +1453,39 @@ SELECT t1.a, t1.c, t2.b, t2.c FROM prt1_l t1 LEFT JOIN prt2_l t2 ON t1.a = t2.b
 -- right join
 EXPLAIN (COSTS OFF)
 SELECT t1.a, t1.c, t2.b, t2.c FROM prt1_l t1 RIGHT JOIN prt2_l t2 ON t1.a = t2.b AND t1.c = t2.c WHERE t2.a = 0 ORDER BY t1.a, t2.b;
-                                        QUERY PLAN                                        
-------------------------------------------------------------------------------------------
+                                     QUERY PLAN                                     
+------------------------------------------------------------------------------------
  Sort
    Sort Key: t1.a, t2.b
-   ->  Result
-         ->  Append
-               ->  Hash Right Join
-                     Hash Cond: ((t1.a = t2.b) AND ((t1.c)::text = (t2.c)::text))
-                     ->  Seq Scan on prt1_l_p1 t1
-                     ->  Hash
-                           ->  Seq Scan on prt2_l_p1 t2
-                                 Filter: (a = 0)
-               ->  Hash Right Join
-                     Hash Cond: ((t1_1.a = t2_1.b) AND ((t1_1.c)::text = (t2_1.c)::text))
-                     ->  Seq Scan on prt1_l_p2_p1 t1_1
-                     ->  Hash
-                           ->  Seq Scan on prt2_l_p2_p1 t2_1
-                                 Filter: (a = 0)
-               ->  Hash Right Join
-                     Hash Cond: ((t1_2.a = t2_2.b) AND ((t1_2.c)::text = (t2_2.c)::text))
-                     ->  Seq Scan on prt1_l_p2_p2 t1_2
-                     ->  Hash
-                           ->  Seq Scan on prt2_l_p2_p2 t2_2
-                                 Filter: (a = 0)
-               ->  Hash Right Join
-                     Hash Cond: ((t1_3.a = t2_3.b) AND ((t1_3.c)::text = (t2_3.c)::text))
+   ->  Append
+         ->  Hash Right Join
+               Hash Cond: ((t1.a = t2.b) AND ((t1.c)::text = (t2.c)::text))
+               ->  Seq Scan on prt1_l_p1 t1
+               ->  Hash
+                     ->  Seq Scan on prt2_l_p1 t2
+                           Filter: (a = 0)
+         ->  Hash Right Join
+               Hash Cond: ((t1_1.a = t2_1.b) AND ((t1_1.c)::text = (t2_1.c)::text))
+               ->  Seq Scan on prt1_l_p2_p1 t1_1
+               ->  Hash
+                     ->  Seq Scan on prt2_l_p2_p1 t2_1
+                           Filter: (a = 0)
+         ->  Hash Right Join
+               Hash Cond: ((t1_2.a = t2_2.b) AND ((t1_2.c)::text = (t2_2.c)::text))
+               ->  Seq Scan on prt1_l_p2_p2 t1_2
+               ->  Hash
+                     ->  Seq Scan on prt2_l_p2_p2 t2_2
+                           Filter: (a = 0)
+         ->  Hash Right Join
+               Hash Cond: ((t1_3.a = t2_3.b) AND ((t1_3.c)::text = (t2_3.c)::text))
+               ->  Append
+                     ->  Seq Scan on prt1_l_p3_p1 t1_3
+                     ->  Seq Scan on prt1_l_p3_p2 t1_4
+               ->  Hash
                      ->  Append
-                           ->  Seq Scan on prt1_l_p3_p1 t1_3
-                           ->  Seq Scan on prt1_l_p3_p2 t1_4
-                     ->  Hash
-                           ->  Append
-                                 ->  Seq Scan on prt2_l_p3_p1 t2_3
-                                       Filter: (a = 0)
-(31 rows)
+                           ->  Seq Scan on prt2_l_p3_p1 t2_3
+                                 Filter: (a = 0)
+(30 rows)
 
 SELECT t1.a, t1.c, t2.b, t2.c FROM prt1_l t1 RIGHT JOIN prt2_l t2 ON t1.a = t2.b AND t1.c = t2.c WHERE t2.a = 0 ORDER BY t1.a, t2.b;
   a  |  c   |  b  |  c   
@@ -1577,55 +1566,54 @@ EXPLAIN (COSTS OFF)
 SELECT * FROM prt1_l t1 LEFT JOIN LATERAL
 			  (SELECT t2.a AS t2a, t2.c AS t2c, t2.b AS t2b, t3.b AS t3b, least(t1.a,t2.a,t3.b) FROM prt1_l t2 JOIN prt2_l t3 ON (t2.a = t3.b AND t2.c = t3.c)) ss
 			  ON t1.a = ss.t2a AND t1.c = ss.t2c WHERE t1.b = 0 ORDER BY t1.a;
-                                             QUERY PLAN                                              
------------------------------------------------------------------------------------------------------
+                                          QUERY PLAN                                           
+-----------------------------------------------------------------------------------------------
  Sort
    Sort Key: t1.a
-   ->  Result
-         ->  Append
-               ->  Nested Loop Left Join
-                     ->  Seq Scan on prt1_l_p1 t1
-                           Filter: (b = 0)
-                     ->  Hash Join
-                           Hash Cond: ((t3.b = t2.a) AND ((t3.c)::text = (t2.c)::text))
-                           ->  Seq Scan on prt2_l_p1 t3
-                           ->  Hash
-                                 ->  Seq Scan on prt1_l_p1 t2
-                                       Filter: ((t1.a = a) AND ((t1.c)::text = (c)::text))
-               ->  Nested Loop Left Join
-                     ->  Seq Scan on prt1_l_p2_p1 t1_1
-                           Filter: (b = 0)
-                     ->  Hash Join
-                           Hash Cond: ((t3_1.b = t2_1.a) AND ((t3_1.c)::text = (t2_1.c)::text))
-                           ->  Seq Scan on prt2_l_p2_p1 t3_1
-                           ->  Hash
-                                 ->  Seq Scan on prt1_l_p2_p1 t2_1
-                                       Filter: ((t1_1.a = a) AND ((t1_1.c)::text = (c)::text))
-               ->  Nested Loop Left Join
-                     ->  Seq Scan on prt1_l_p2_p2 t1_2
+   ->  Append
+         ->  Nested Loop Left Join
+               ->  Seq Scan on prt1_l_p1 t1
+                     Filter: (b = 0)
+               ->  Hash Join
+                     Hash Cond: ((t3.b = t2.a) AND ((t3.c)::text = (t2.c)::text))
+                     ->  Seq Scan on prt2_l_p1 t3
+                     ->  Hash
+                           ->  Seq Scan on prt1_l_p1 t2
+                                 Filter: ((t1.a = a) AND ((t1.c)::text = (c)::text))
+         ->  Nested Loop Left Join
+               ->  Seq Scan on prt1_l_p2_p1 t1_1
+                     Filter: (b = 0)
+               ->  Hash Join
+                     Hash Cond: ((t3_1.b = t2_1.a) AND ((t3_1.c)::text = (t2_1.c)::text))
+                     ->  Seq Scan on prt2_l_p2_p1 t3_1
+                     ->  Hash
+                           ->  Seq Scan on prt1_l_p2_p1 t2_1
+                                 Filter: ((t1_1.a = a) AND ((t1_1.c)::text = (c)::text))
+         ->  Nested Loop Left Join
+               ->  Seq Scan on prt1_l_p2_p2 t1_2
+                     Filter: (b = 0)
+               ->  Hash Join
+                     Hash Cond: ((t3_2.b = t2_2.a) AND ((t3_2.c)::text = (t2_2.c)::text))
+                     ->  Seq Scan on prt2_l_p2_p2 t3_2
+                     ->  Hash
+                           ->  Seq Scan on prt1_l_p2_p2 t2_2
+                                 Filter: ((t1_2.a = a) AND ((t1_2.c)::text = (c)::text))
+         ->  Nested Loop Left Join
+               ->  Append
+                     ->  Seq Scan on prt1_l_p3_p1 t1_3
                            Filter: (b = 0)
-                     ->  Hash Join
-                           Hash Cond: ((t3_2.b = t2_2.a) AND ((t3_2.c)::text = (t2_2.c)::text))
-                           ->  Seq Scan on prt2_l_p2_p2 t3_2
-                           ->  Hash
-                                 ->  Seq Scan on prt1_l_p2_p2 t2_2
-                                       Filter: ((t1_2.a = a) AND ((t1_2.c)::text = (c)::text))
-               ->  Nested Loop Left Join
+               ->  Hash Join
+                     Hash Cond: ((t3_3.b = t2_3.a) AND ((t3_3.c)::text = (t2_3.c)::text))
                      ->  Append
-                           ->  Seq Scan on prt1_l_p3_p1 t1_3
-                                 Filter: (b = 0)
-                     ->  Hash Join
-                           Hash Cond: ((t3_3.b = t2_3.a) AND ((t3_3.c)::text = (t2_3.c)::text))
+                           ->  Seq Scan on prt2_l_p3_p1 t3_3
+                           ->  Seq Scan on prt2_l_p3_p2 t3_4
+                     ->  Hash
                            ->  Append
-                                 ->  Seq Scan on prt2_l_p3_p1 t3_3
-                                 ->  Seq Scan on prt2_l_p3_p2 t3_4
-                           ->  Hash
-                                 ->  Append
-                                       ->  Seq Scan on prt1_l_p3_p1 t2_3
-                                             Filter: ((t1_3.a = a) AND ((t1_3.c)::text = (c)::text))
-                                       ->  Seq Scan on prt1_l_p3_p2 t2_4
-                                             Filter: ((t1_3.a = a) AND ((t1_3.c)::text = (c)::text))
-(46 rows)
+                                 ->  Seq Scan on prt1_l_p3_p1 t2_3
+                                       Filter: ((t1_3.a = a) AND ((t1_3.c)::text = (c)::text))
+                                 ->  Seq Scan on prt1_l_p3_p2 t2_4
+                                       Filter: ((t1_3.a = a) AND ((t1_3.c)::text = (c)::text))
+(45 rows)
 
 SELECT * FROM prt1_l t1 LEFT JOIN LATERAL
 			  (SELECT t2.a AS t2a, t2.c AS t2c, t2.b AS t2b, t3.b AS t3b, least(t1.a,t2.a,t3.b) FROM prt1_l t2 JOIN prt2_l t3 ON (t2.a = t3.b AND t2.c = t3.c)) ss
-- 
2.14.3 (Apple Git-98)

0002-Postpone-generate_gather_paths-for-topmost-scan-join.patch (application/octet-stream)

From 1aa0b52f5c9b5c59c4fd00f8c2550646911432ce Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Mon, 12 Mar 2018 16:45:15 -0400
Subject: [PATCH 2/3] Postpone generate_gather_paths for topmost scan/join rel.

Don't call generate_gather_paths for the topmost scan/join relation
when it is initially populated with paths.  Instead, do the work in
grouping_planner.  By itself, this gains nothing; in fact it loses
slightly because we end up calling set_cheapest() for the topmost
scan/join rel twice rather than once.  However, it paves the way for
a future commit which will postpone generate_gather_paths for the
topmost scan/join relation even further, allowing more accurate
costing of parallel paths.

Amit Kapila and Robert Haas
---
 src/backend/optimizer/geqo/geqo_eval.c | 21 ++++++++++++++-------
 src/backend/optimizer/path/allpaths.c  | 26 +++++++++++++++++++-------
 src/backend/optimizer/plan/planner.c   |  9 +++++++++
 3 files changed, 42 insertions(+), 14 deletions(-)

diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index 0be2a73e05..3ef7d7d8aa 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -40,7 +40,7 @@ typedef struct
 } Clump;
 
 static List *merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump,
-			bool force);
+			int num_gene, bool force);
 static bool desirable_join(PlannerInfo *root,
 			   RelOptInfo *outer_rel, RelOptInfo *inner_rel);
 
@@ -196,7 +196,7 @@ gimme_tree(PlannerInfo *root, Gene *tour, int num_gene)
 		cur_clump->size = 1;
 
 		/* Merge it into the clumps list, using only desirable joins */
-		clumps = merge_clump(root, clumps, cur_clump, false);
+		clumps = merge_clump(root, clumps, cur_clump, num_gene, false);
 	}
 
 	if (list_length(clumps) > 1)
@@ -210,7 +210,7 @@ gimme_tree(PlannerInfo *root, Gene *tour, int num_gene)
 		{
 			Clump	   *clump = (Clump *) lfirst(lc);
 
-			fclumps = merge_clump(root, fclumps, clump, true);
+			fclumps = merge_clump(root, fclumps, clump, num_gene, true);
 		}
 		clumps = fclumps;
 	}
@@ -235,7 +235,8 @@ gimme_tree(PlannerInfo *root, Gene *tour, int num_gene)
  * "desirable" joins.
  */
 static List *
-merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
+merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
+			bool force)
 {
 	ListCell   *prev;
 	ListCell   *lc;
@@ -267,8 +268,14 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
 				/* Create paths for partitionwise joins. */
 				generate_partitionwise_join_paths(root, joinrel);
 
-				/* Create GatherPaths for any useful partial paths for rel */
-				generate_gather_paths(root, joinrel, false);
+				/*
+				 * Except for the topmost scan/join rel, consider gathering
+				 * partial paths.  We'll do the same for the topmost scan/join
+				 * rel once we know the final targetlist (see
+				 * grouping_planner).
+				 */
+				if (old_clump->size + new_clump->size < num_gene)
+					generate_gather_paths(root, joinrel, false);
 
 				/* Find and save the cheapest paths for this joinrel */
 				set_cheapest(joinrel);
@@ -286,7 +293,7 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, bool force)
 				 * others.  When no further merge is possible, we'll reinsert
 				 * it into the list.
 				 */
-				return merge_clump(root, clumps, old_clump, force);
+				return merge_clump(root, clumps, old_clump, num_gene, force);
 			}
 		}
 		prev = lc;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 43f4e75748..c4e4db15a6 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -479,13 +479,20 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	}
 
 	/*
-	 * If this is a baserel, consider gathering any partial paths we may have
-	 * created for it.  (If we tried to gather inheritance children, we could
+	 * If this is a baserel, we should normally consider gathering any partial
+	 * paths we may have created for it.
+	 *
+	 * However, if this is an inheritance child, skip it.  Otherwise, we could
 	 * end up with a very large number of gather nodes, each trying to grab
-	 * its own pool of workers, so don't do this for otherrels.  Instead,
-	 * we'll consider gathering partial paths for the parent appendrel.)
+	 * its own pool of workers.  Instead, we'll consider gathering partial
+	 * paths for the parent appendrel.
+	 *
+	 * Also, if this is the topmost scan/join rel (that is, the only baserel),
+	 * we postpone this until the final scan/join targetlist is available (see
+	 * grouping_planner).
 	 */
-	if (rel->reloptkind == RELOPT_BASEREL)
+	if (rel->reloptkind == RELOPT_BASEREL &&
+		bms_membership(root->all_baserels) != BMS_SINGLETON)
 		generate_gather_paths(root, rel, false);
 
 	/*
@@ -2699,8 +2706,13 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
 			/* Create paths for partitionwise joins. */
 			generate_partitionwise_join_paths(root, rel);
 
-			/* Create GatherPaths for any useful partial paths for rel */
-			generate_gather_paths(root, rel, false);
+			/*
+			 * Except for the topmost scan/join rel, consider gathering
+			 * partial paths.  We'll do the same for the topmost scan/join rel
+			 * once we know the final targetlist (see grouping_planner).
+			 */
+			if (lev < levels_needed)
+				generate_gather_paths(root, rel, false);
 
 			/* Find and save the cheapest paths for this rel */
 			set_cheapest(rel);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 52c21e6870..11b20d546b 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -1971,6 +1971,15 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 			scanjoin_targets = scanjoin_targets_contain_srfs = NIL;
 		}
 
+		/*
+		 * Generate Gather or Gather Merge paths for the topmost scan/join
+		 * relation.  Once that's done, we must re-determine which paths are
+		 * cheapest.  (The previously-cheapest path might even have been
+		 * pfree'd!)
+		 */
+		generate_gather_paths(root, current_rel, false);
+		set_cheapest(current_rel);
+
 		/*
 		 * Forcibly apply SRF-free scan/join target to all the Paths for the
 		 * scan/join rel.
-- 
2.14.3 (Apple Git-98)
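The three guards this patch adds all ask the same question: is the rel being built the topmost scan/join rel? The C sketch below is a hypothetical, simplified model of those conditions. The function names and the plain `unsigned int` bitmask standing in for a `Bitmapset` are illustrative assumptions, not PostgreSQL APIs. It only shows when `generate_gather_paths` would run early versus being left for `grouping_planner`.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the "topmost scan/join rel" guards added by the
 * patch.  These are not PostgreSQL functions; they only mirror the shape
 * of the conditions in standard_join_search(), merge_clump() and
 * set_rel_pathlist().
 */

/* standard_join_search(): rels built at level lev join lev base rels, so
 * only lev == levels_needed yields the topmost rel. */
static bool
gather_now_dp(int lev, int levels_needed)
{
	assert(lev >= 2 && lev <= levels_needed);
	return lev < levels_needed;
}

/* merge_clump() in GEQO: merging clumps that together hold num_gene rels
 * produces the topmost rel. */
static bool
gather_now_geqo(int old_size, int new_size, int num_gene)
{
	assert(old_size + new_size <= num_gene);
	return old_size + new_size < num_gene;
}

/* Stand-in for bms_membership(...) == BMS_SINGLETON, using a plain
 * unsigned bitmask (assumption) instead of a Bitmapset. */
static bool
is_singleton(unsigned int relids)
{
	return relids != 0 && (relids & (relids - 1)) == 0;
}

/* set_rel_pathlist(): gather a baserel's partial paths now unless it is
 * the only baserel, i.e. the topmost scan/join rel itself. */
static bool
gather_now_baserel(bool is_baserel, unsigned int all_baserels)
{
	return is_baserel && !is_singleton(all_baserels);
}
```

For a three-way join, level-2 joinrels still get Gather paths immediately while the level-3 rel waits for the final targetlist; in a single-table query the only baserel is deferred the same way.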

0001-Teach-create_projection_plan-to-omit-projection-wher.patch (application/octet-stream)

From 902872834927d67f08325819f08e201894284e6f Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Mon, 12 Mar 2018 12:36:57 -0400
Subject: [PATCH 1/3] Teach create_projection_plan to omit projection where
 possible.

We sometimes insert a ProjectionPath into a plan tree when projection
is not strictly required. The existing code already arranges to avoid
emitting a Result node when the ProjectionPath's subpath can perform
the projection itself instead of needing a Result node to do it, but
previously it didn't consider the possibility that the parent node
might not actually require the projection to be performed.

Skipping projection when it's not required can not only avoid Result
nodes that aren't needed, but also avoid losing the "physical tlist"
optimization unnecessarily.

Patch by me, reviewed by Amit Kapila.
---
 src/backend/optimizer/plan/createplan.c | 98 +++++++++++++++++++++++++--------
 1 file changed, 75 insertions(+), 23 deletions(-)

diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 8b4f031d96..7a0d1c6951 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -62,10 +62,14 @@
  * any sortgrouprefs specified in its pathtarget, with appropriate
  * ressortgroupref labels.  This is passed down by parent nodes such as Sort
  * and Group, which need these values to be available in their inputs.
+ *
+ * CP_IGNORE_TLIST specifies that the caller plans to replace the targetlist,
+ * and therefore it doesn't matter a bit what target list gets generated.
  */
 #define CP_EXACT_TLIST		0x0001	/* Plan must return specified tlist */
 #define CP_SMALL_TLIST		0x0002	/* Prefer narrower tlists */
 #define CP_LABEL_TLIST		0x0004	/* tlist must contain sortgrouprefs */
+#define CP_IGNORE_TLIST		0x0008	/* caller will replace tlist */
 
 
 static Plan *create_plan_recurse(PlannerInfo *root, Path *best_path,
@@ -87,7 +91,9 @@ static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path
 static Plan *create_unique_plan(PlannerInfo *root, UniquePath *best_path,
 				   int flags);
 static Gather *create_gather_plan(PlannerInfo *root, GatherPath *best_path);
-static Plan *create_projection_plan(PlannerInfo *root, ProjectionPath *best_path);
+static Plan *create_projection_plan(PlannerInfo *root,
+					   ProjectionPath *best_path,
+					   int flags);
 static Plan *inject_projection_plan(Plan *subplan, List *tlist, bool parallel_safe);
 static Sort *create_sort_plan(PlannerInfo *root, SortPath *best_path, int flags);
 static Group *create_group_plan(PlannerInfo *root, GroupPath *best_path);
@@ -400,7 +406,8 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
 			if (IsA(best_path, ProjectionPath))
 			{
 				plan = create_projection_plan(root,
-											  (ProjectionPath *) best_path);
+											  (ProjectionPath *) best_path,
+											  flags);
 			}
 			else if (IsA(best_path, MinMaxAggPath))
 			{
@@ -563,8 +570,16 @@ create_scan_plan(PlannerInfo *root, Path *best_path, int flags)
 	 * only those Vars actually needed by the query), we prefer to generate a
 	 * tlist containing all Vars in order.  This will allow the executor to
 	 * optimize away projection of the table tuples, if possible.
+	 *
+	 * But if the caller is going to ignore our tlist anyway, then don't
+	 * bother generating one at all.  We use an exact equality test here,
+	 * so that this only applies when CP_IGNORE_TLIST is the only flag set.
 	 */
-	if (use_physical_tlist(root, best_path, flags))
+	if (flags == CP_IGNORE_TLIST)
+	{
+		tlist = NULL;
+	}
+	else if (use_physical_tlist(root, best_path, flags))
 	{
 		if (best_path->pathtype == T_IndexOnlyScan)
 		{
@@ -1567,34 +1582,71 @@ create_gather_merge_plan(PlannerInfo *root, GatherMergePath *best_path)
  *	  but sometimes we can just let the subplan do the work.
  */
 static Plan *
-create_projection_plan(PlannerInfo *root, ProjectionPath *best_path)
+create_projection_plan(PlannerInfo *root, ProjectionPath *best_path, int flags)
 {
 	Plan	   *plan;
 	Plan	   *subplan;
 	List	   *tlist;
+	bool		needs_result_node = false;
 
-	/* Since we intend to project, we don't need to constrain child tlist */
-	subplan = create_plan_recurse(root, best_path->subpath, 0);
-
-	tlist = build_path_tlist(root, &best_path->path);
+	/*
+	 * Convert our subpath to a Plan and determine whether we need a Result
+	 * node.
+	 *
+	 * In most cases where we don't need to project, create_projection_path
+	 * will have set dummypp, but not always.  First, some createplan.c
+	 * routines change the tlists of their nodes.  (An example is that
+	 * create_merge_append_plan might add resjunk sort columns to a
+	 * MergeAppend.)  Second, create_projection_path has no way of knowing
+	 * what path node will be placed on top of the projection path and
+	 * therefore can't predict whether it will require an exact tlist.
+	 * For both of these reasons, we have to recheck here.
+	 */
+	if (use_physical_tlist(root, &best_path->path, flags))
+	{
+		/*
+		 * Our caller doesn't really care what tlist we return, so we don't
+		 * actually need to project.  However, we may still need to ensure
+		 * proper sortgroupref labels, if the caller cares about those.
+		 */
+		subplan = create_plan_recurse(root, best_path->subpath, 0);
+		tlist = subplan->targetlist;
+		if ((flags & CP_LABEL_TLIST) != 0)
+			apply_pathtarget_labeling_to_tlist(tlist,
+											   best_path->path.pathtarget);
+	}
+	else if (is_projection_capable_path(best_path->subpath))
+	{
+		/*
+		 * Our caller requires that we return the exact tlist, but no separate
+		 * result node is needed because the subpath is projection-capable.
+		 * Tell create_plan_recurse that we're going to ignore the tlist it
+		 * produces.
+		 */
+		subplan = create_plan_recurse(root, best_path->subpath,
+									  CP_IGNORE_TLIST);
+		tlist = build_path_tlist(root, &best_path->path);
+	}
+	else
+	{
+		/*
+		 * It looks like we need a result node, unless by good fortune the
+		 * requested tlist is exactly the one the child wants to produce.
+		 */
+		subplan = create_plan_recurse(root, best_path->subpath, 0);
+		tlist = build_path_tlist(root, &best_path->path);
+		needs_result_node = !tlist_same_exprs(tlist, subplan->targetlist);
+	}
 
 	/*
-	 * We might not really need a Result node here, either because the subplan
-	 * can project or because it's returning the right list of expressions
-	 * anyway.  Usually create_projection_path will have detected that and set
-	 * dummypp if we don't need a Result; but its decision can't be final,
-	 * because some createplan.c routines change the tlists of their nodes.
-	 * (An example is that create_merge_append_plan might add resjunk sort
-	 * columns to a MergeAppend.)  So we have to recheck here.  If we do
-	 * arrive at a different answer than create_projection_path did, we'll
-	 * have made slightly wrong cost estimates; but label the plan with the
-	 * cost estimates we actually used, not "corrected" ones.  (XXX this could
-	 * be cleaned up if we moved more of the sortcolumn setup logic into Path
-	 * creation, but that would add expense to creating Paths we might end up
-	 * not using.)
+	 * If we make a different decision about whether to include a Result node
+	 * than create_projection_path did, we'll have made slightly wrong cost
+	 * estimates; but label the plan with the cost estimates we actually used,
+	 * not "corrected" ones.  (XXX this could be cleaned up if we moved more of
+	 * the sortcolumn setup logic into Path creation, but that would add
+	 * expense to creating Paths we might end up not using.)
 	 */
-	if (is_projection_capable_path(best_path->subpath) ||
-		tlist_same_exprs(tlist, subplan->targetlist))
+	if (!needs_result_node)
 	{
 		/* Don't need a separate Result, just assign tlist to subplan */
 		plan = subplan;
-- 
2.14.3 (Apple Git-98)
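The reworked create_projection_plan boils down to a three-way choice, plus the exact-equality flag test in create_scan_plan. As a rough model, it can be sketched in C as follows. Only the flag values come from the patch; the enum and function names are hypothetical, and the flag check in `decide_projection` is an assumption standing in for use_physical_tlist(), which also inspects the relation and sortgroupref labels.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as defined in the patched createplan.c. */
#define CP_EXACT_TLIST		0x0001	/* Plan must return specified tlist */
#define CP_SMALL_TLIST		0x0002	/* Prefer narrower tlists */
#define CP_LABEL_TLIST		0x0004	/* tlist must contain sortgrouprefs */
#define CP_IGNORE_TLIST		0x0008	/* caller will replace tlist */

/* Hypothetical: which branch create_projection_plan() takes. */
typedef enum
{
	PROJ_REUSE_CHILD_TLIST,		/* caller doesn't care; keep child's tlist */
	PROJ_CHILD_PROJECTS,		/* child projects; recurse with CP_IGNORE_TLIST */
	PROJ_MAYBE_RESULT			/* build tlist; Result unless tlists match */
} ProjDecision;

static ProjDecision
decide_projection(int flags, bool subpath_projection_capable)
{
	assert((flags & ~0x000F) == 0);	/* only the four flags above */

	/*
	 * Assumption: models only the flag part of use_physical_tlist(); the
	 * real function also checks the relation before saying yes.
	 */
	if ((flags & (CP_EXACT_TLIST | CP_SMALL_TLIST)) == 0)
		return PROJ_REUSE_CHILD_TLIST;
	if (subpath_projection_capable)
		return PROJ_CHILD_PROJECTS;
	return PROJ_MAYBE_RESULT;
}

/* Mirrors needs_result_node = !tlist_same_exprs(...) in the third branch. */
static bool
needs_result_node(int flags, bool subpath_projection_capable, bool tlists_match)
{
	return decide_projection(flags, subpath_projection_capable) == PROJ_MAYBE_RESULT
		&& !tlists_match;
}

/* create_scan_plan(): skip building a tlist only when CP_IGNORE_TLIST is
 * the sole flag, hence == rather than a bit test. */
static bool
scan_may_skip_tlist(int flags)
{
	return flags == CP_IGNORE_TLIST;
}
```

The equality test matters because a caller that passes CP_IGNORE_TLIST together with, say, CP_LABEL_TLIST still needs a real tlist to carry the labels.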

#89

amit.kapila16@gmail.com

almost 8 years ago

In reply to: Robert Haas (#88)

1 attachment(s)

Re: [HACKERS] why not parallel seq scan for slow functions

On Thu, Mar 29, 2018 at 2:31 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Mar 28, 2018 at 3:06 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

The above block takes 43700.0289 ms on Head and 45025.3779 ms with the
patch, which is approximately a 3% regression.

Thanks for the analysis -- the observation that this seemed to affect
cases where CP_LABEL_TLIST gets passed to create_projection_plan
allowed me to recognize that I was doing an unnecessary copyObject()
call. Removing that seems to have reduced this regression below 1% in
my testing.

I think that is within an acceptable range. I am seeing a few regression
failures with the patch series: the order of the targetlist seems to
have changed in the postgres_fdw Remote SQL. Kindly find the failure
report attached. I asked my colleague Ashutosh Sharma to cross-verify
this, and he is seeing the same failures.

A few comments/suggestions:

1.
typedef enum UpperRelationKind
 {
  UPPERREL_SETOP, /* result of UNION/INTERSECT/EXCEPT, if any */
+ UPPERREL_TLIST, /* result of projecting final scan/join rel */
  UPPERREL_PARTIAL_GROUP_AGG, /* result of partial grouping/aggregation, if
  * any */
  UPPERREL_GROUP_AGG, /* result of grouping/aggregation, if any */
...
...
  /*
  * Save the various upper-rel PathTargets we just computed into
@@ -2003,6 +2004,7 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
  root->upper_targets[UPPERREL_FINAL] = final_target;
  root->upper_targets[UPPERREL_WINDOW] = sort_input_target;
  root->upper_targets[UPPERREL_GROUP_AGG] = grouping_target;
+ root->upper_targets[UPPERREL_TLIST] = scanjoin_target;

It seems UPPERREL_TLIST is redundant in the patch now. I think we can
remove it unless you have something else in mind.

2.
+ /*
+ * If the relation is partitioned, recursively apply the same changes to
+ * all partitions and generate new Append paths. Since Append is not
+ * projection-capable, that might save a separate Result node, and it also
+ * is important for partitionwise aggregate.
+ */
+ if (rel->part_scheme && rel->part_rels)
  {

I think the handling of partitioned rels looks okay, but we might want
to check its overhead once, unless you are sure it shouldn't be a
problem. If you think we should check it, could we do that as a
separate patch? It doesn't look directly linked to the main patch and
can be treated as an optimization for partitionwise aggregates. We
could also handle it along with the main patch, but it might be
somewhat simpler to verify if we do it separately.
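The quoted hunk recursively applies the new target to every partition before rebuilding Append paths. A toy C model of that traversal follows. The `Rel` struct and `apply_target_recursively` are hypothetical stand-ins; the real code works on RelOptInfo paths and also regenerates the Append paths, which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a RelOptInfo: a relation is either a leaf or
 * has child partitions. */
typedef struct Rel
{
	int			nparts;		/* 0 for a leaf partition */
	struct Rel **part_rels;	/* nparts child partitions, or NULL */
	int			target_id;	/* which scan/join target its paths emit */
} Rel;

/*
 * Apply target_id to every partition below rel, then to rel itself, so an
 * Append rebuilt over the children would already emit the target.
 * Returns the number of leaf partitions visited.
 */
static int
apply_target_recursively(Rel *rel, int target_id)
{
	int			leaves = 0;

	assert(rel != NULL);
	if (rel->nparts > 0 && rel->part_rels != NULL)
	{
		for (int i = 0; i < rel->nparts; i++)
			leaves += apply_target_recursively(rel->part_rels[i], target_id);
	}
	else
		leaves = 1;

	rel->target_id = target_id;
	return leaves;
}
```

For a tree shaped like prt1_l in the regression output above (p1; p2 with p2_p1 and p2_p2; p3 with p3_p1 and p3_p2), the call visits five leaf partitions, so the overhead Amit asks about grows with the number of partitions.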

Also, keep in mind that we're talking about extremely small amounts of
time here. On a trivial query that you're not even executing, you're
seeing a difference of (45025.3779 - 43700.0289)/1000000 = 0.00132 ms
per execution. Sure, it's still 3%, but it's 3% of the time in an
artificial case where you don't actually run the query. In real life,
nobody runs EXPLAIN in a tight loop a million times without ever
running the query, because that's not a useful thing to do.

Agreed, but the point of the test was to make sure this optimization
doesn't come at the cost of adding significant cycles to planning time.
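The per-execution figure above can be reproduced from the two timings. A small C check, assuming (as the division by 1000000 does) one million EXPLAIN iterations in each run; the helper names are illustrative only.

```c
#include <assert.h>

/* Average extra time per iteration, in ms, given two total run times. */
static double
per_run_overhead_ms(double base_total_ms, double patched_total_ms,
					long iterations)
{
	assert(iterations > 0);
	return (patched_total_ms - base_total_ms) / (double) iterations;
}

/* Relative slowdown of the patched run, as a percentage of the base run. */
static double
regression_pct(double base_total_ms, double patched_total_ms)
{
	assert(base_total_ms > 0.0);
	return (patched_total_ms - base_total_ms) / base_total_ms * 100.0;
}
```

With 43700.0289 ms on Head and 45025.3779 ms patched, this gives roughly 0.0013 ms of extra planning per EXPLAIN and a relative regression of about 3.03%.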

The overhead on a realistic test case will be smaller. Furthermore, at
least in my testing, there are now cases where this is faster than
master. Now, I welcome further ideas for optimization, but a patch
that makes some cases slightly slower while making others slightly
faster, and also improving the quality of plans in some cases, is not
to my mind an unreasonable thing.

Agreed.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachments:

regression.diffs (application/octet-stream)

*** /home/amitkapila/mywork/pg/postgresql/contrib/postgres_fdw/expected/postgres_fdw.out	2018-03-24 17:26:25.054235359 +0530
--- /home/amitkapila/mywork/pg/postgresql/contrib/postgres_fdw/results/postgres_fdw.out	2018-03-29 08:13:53.953759447 +0530
***************
*** 1030,1036 ****
     ->  Foreign Scan
           Output: t1.c1, t2.c1, t1.c3
           Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
!          Remote SQL: SELECT r1."C 1", r1.c3, r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST
  (6 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
--- 1030,1036 ----
     ->  Foreign Scan
           Output: t1.c1, t2.c1, t1.c3
           Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
!          Remote SQL: SELECT r1."C 1", r2."C 1", r1.c3 FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST
  (6 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
***************
*** 1061,1067 ****
           ->  Foreign Scan
                 Output: t1.c1, t2.c2, t3.c3, t1.c3
                 Relations: ((public.ft1 t1) INNER JOIN (public.ft2 t2)) INNER JOIN (public.ft4 t3)
!                Remote SQL: SELECT r1."C 1", r1.c3, r2.c2, r4.c3 FROM (("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) INNER JOIN "S 1"."T 3" r4 ON (((r1."C 1" = r4.c1))))
  (9 rows)
  
  SELECT t1.c1, t2.c2, t3.c3 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) JOIN ft4 t3 ON (t3.c1 = t1.c1) ORDER BY t1.c3, t1.c1 OFFSET 10 LIMIT 10;
--- 1061,1067 ----
           ->  Foreign Scan
                 Output: t1.c1, t2.c2, t3.c3, t1.c3
                 Relations: ((public.ft1 t1) INNER JOIN (public.ft2 t2)) INNER JOIN (public.ft4 t3)
!                Remote SQL: SELECT r1."C 1", r2.c2, r4.c3, r1.c3 FROM (("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) INNER JOIN "S 1"."T 3" r4 ON (((r1."C 1" = r4.c1))))
  (9 rows)
  
  SELECT t1.c1, t2.c2, t3.c3 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) JOIN ft4 t3 ON (t3.c1 = t1.c1) ORDER BY t1.c3, t1.c1 OFFSET 10 LIMIT 10;
***************
*** 1190,1196 ****
     ->  Foreign Scan
           Output: t1.c1, t2.c1
           Relations: (public.ft4 t2) LEFT JOIN (public.ft5 t1)
!          Remote SQL: SELECT r2.c1, r1.c1 FROM ("S 1"."T 3" r2 LEFT JOIN "S 1"."T 4" r1 ON (((r1.c1 = r2.c1)))) ORDER BY r2.c1 ASC NULLS LAST, r1.c1 ASC NULLS LAST
  (6 rows)
  
  SELECT t1.c1, t2.c1 FROM ft5 t1 RIGHT JOIN ft4 t2 ON (t1.c1 = t2.c1) ORDER BY t2.c1, t1.c1 OFFSET 10 LIMIT 10;
--- 1190,1196 ----
     ->  Foreign Scan
           Output: t1.c1, t2.c1
           Relations: (public.ft4 t2) LEFT JOIN (public.ft5 t1)
!          Remote SQL: SELECT r1.c1, r2.c1 FROM ("S 1"."T 3" r2 LEFT JOIN "S 1"."T 4" r1 ON (((r1.c1 = r2.c1)))) ORDER BY r2.c1 ASC NULLS LAST, r1.c1 ASC NULLS LAST
  (6 rows)
  
  SELECT t1.c1, t2.c1 FROM ft5 t1 RIGHT JOIN ft4 t2 ON (t1.c1 = t2.c1) ORDER BY t2.c1, t1.c1 OFFSET 10 LIMIT 10;
***************
*** 1218,1224 ****
     ->  Foreign Scan
           Output: t1.c1, t2.c2, t3.c3
           Relations: ((public.ft4 t3) LEFT JOIN (public.ft2 t2)) LEFT JOIN (public.ft2 t1)
!          Remote SQL: SELECT r4.c3, r2.c2, r1."C 1" FROM (("S 1"."T 3" r4 LEFT JOIN "S 1"."T 1" r2 ON (((r2."C 1" = r4.c1)))) LEFT JOIN "S 1"."T 1" r1 ON (((r1."C 1" = r2."C 1"))))
  (6 rows)
  
  SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
--- 1218,1224 ----
     ->  Foreign Scan
           Output: t1.c1, t2.c2, t3.c3
           Relations: ((public.ft4 t3) LEFT JOIN (public.ft2 t2)) LEFT JOIN (public.ft2 t1)
!          Remote SQL: SELECT r1."C 1", r2.c2, r4.c3 FROM (("S 1"."T 3" r4 LEFT JOIN "S 1"."T 1" r2 ON (((r2."C 1" = r4.c1)))) LEFT JOIN "S 1"."T 1" r1 ON (((r1."C 1" = r2."C 1"))))
  (6 rows)
  
  SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
***************
*** 1477,1483 ****
     ->  Foreign Scan
           Output: t1.c1, t2.c2, t3.c3
           Relations: ((public.ft4 t3) LEFT JOIN (public.ft2 t2)) LEFT JOIN (public.ft2 t1)
!          Remote SQL: SELECT r4.c3, r2.c2, r1."C 1" FROM (("S 1"."T 3" r4 LEFT JOIN "S 1"."T 1" r2 ON (((r2."C 1" = r4.c1)))) LEFT JOIN "S 1"."T 1" r1 ON (((r1."C 1" = r2."C 1"))))
  (6 rows)
  
  SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 FULL JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
--- 1477,1483 ----
     ->  Foreign Scan
           Output: t1.c1, t2.c2, t3.c3
           Relations: ((public.ft4 t3) LEFT JOIN (public.ft2 t2)) LEFT JOIN (public.ft2 t1)
!          Remote SQL: SELECT r1."C 1", r2.c2, r4.c3 FROM (("S 1"."T 3" r4 LEFT JOIN "S 1"."T 1" r2 ON (((r2."C 1" = r4.c1)))) LEFT JOIN "S 1"."T 1" r1 ON (((r1."C 1" = r2."C 1"))))
  (6 rows)
  
  SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 FULL JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
***************
*** 1505,1511 ****
     ->  Foreign Scan
           Output: t1.c1, t2.c2, t3.c3
           Relations: ((public.ft2 t2) LEFT JOIN (public.ft2 t1)) FULL JOIN (public.ft4 t3)
!          Remote SQL: SELECT r2.c2, r1."C 1", r4.c3 FROM (("S 1"."T 1" r2 LEFT JOIN "S 1"."T 1" r1 ON (((r1."C 1" = r2."C 1")))) FULL JOIN "S 1"."T 3" r4 ON (((r2."C 1" = r4.c1))))
  (6 rows)
  
  SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
--- 1505,1511 ----
     ->  Foreign Scan
           Output: t1.c1, t2.c2, t3.c3
           Relations: ((public.ft2 t2) LEFT JOIN (public.ft2 t1)) FULL JOIN (public.ft4 t3)
!          Remote SQL: SELECT r1."C 1", r2.c2, r4.c3 FROM (("S 1"."T 1" r2 LEFT JOIN "S 1"."T 1" r1 ON (((r1."C 1" = r2."C 1")))) FULL JOIN "S 1"."T 3" r4 ON (((r2."C 1" = r4.c1))))
  (6 rows)
  
  SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) FULL JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
***************
*** 1589,1595 ****
     ->  Foreign Scan
           Output: t1.c1, t2.c2, t3.c3
           Relations: ((public.ft2 t2) LEFT JOIN (public.ft2 t1)) LEFT JOIN (public.ft4 t3)
!          Remote SQL: SELECT r2.c2, r1."C 1", r4.c3 FROM (("S 1"."T 1" r2 LEFT JOIN "S 1"."T 1" r1 ON (((r1."C 1" = r2."C 1")))) LEFT JOIN "S 1"."T 3" r4 ON (((r2."C 1" = r4.c1))))
  (6 rows)
  
  SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
--- 1589,1595 ----
     ->  Foreign Scan
           Output: t1.c1, t2.c2, t3.c3
           Relations: ((public.ft2 t2) LEFT JOIN (public.ft2 t1)) LEFT JOIN (public.ft4 t3)
!          Remote SQL: SELECT r1."C 1", r2.c2, r4.c3 FROM (("S 1"."T 1" r2 LEFT JOIN "S 1"."T 1" r1 ON (((r1."C 1" = r2."C 1")))) LEFT JOIN "S 1"."T 3" r4 ON (((r2."C 1" = r4.c1))))
  (6 rows)
  
  SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 RIGHT JOIN ft2 t2 ON (t1.c1 = t2.c1) LEFT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
***************
*** 1617,1623 ****
     ->  Foreign Scan
           Output: t1.c1, t2.c2, t3.c3
           Relations: (public.ft4 t3) LEFT JOIN ((public.ft2 t1) INNER JOIN (public.ft2 t2))
!          Remote SQL: SELECT r4.c3, r1."C 1", r2.c2 FROM ("S 1"."T 3" r4 LEFT JOIN ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ON (((r2."C 1" = r4.c1))))
  (6 rows)
  
  SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
--- 1617,1623 ----
     ->  Foreign Scan
           Output: t1.c1, t2.c2, t3.c3
           Relations: (public.ft4 t3) LEFT JOIN ((public.ft2 t1) INNER JOIN (public.ft2 t2))
!          Remote SQL: SELECT r1."C 1", r2.c2, r4.c3 FROM ("S 1"."T 3" r4 LEFT JOIN ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ON (((r2."C 1" = r4.c1))))
  (6 rows)
  
  SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT JOIN ft4 t3 ON (t2.c1 = t3.c1) OFFSET 10 LIMIT 10;
***************
*** 1676,1682 ****
     ->  Foreign Scan
           Output: t1.c1, t2.c2, t1.c3
           Relations: (public.ft1 t1) FULL JOIN (public.ft2 t2)
!          Remote SQL: SELECT r1."C 1", r1.c3, r2.c2 FROM ("S 1"."T 1" r1 FULL JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) WHERE ((public.postgres_fdw_abs(r1."C 1") > 0))
  (6 rows)
  
  ALTER SERVER loopback OPTIONS (DROP extensions);
--- 1676,1682 ----
     ->  Foreign Scan
           Output: t1.c1, t2.c2, t1.c3
           Relations: (public.ft1 t1) FULL JOIN (public.ft2 t2)
!          Remote SQL: SELECT r1."C 1", r2.c2, r1.c3 FROM ("S 1"."T 1" r1 FULL JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) WHERE ((public.postgres_fdw_abs(r1."C 1") > 0))
  (6 rows)
  
  ALTER SERVER loopback OPTIONS (DROP extensions);
***************
*** 1691,1697 ****
           Output: t1.c1, t2.c2, t1.c3
           Filter: (postgres_fdw_abs(t1.c1) > 0)
           Relations: (public.ft1 t1) FULL JOIN (public.ft2 t2)
!          Remote SQL: SELECT r1."C 1", r1.c3, r2.c2 FROM ("S 1"."T 1" r1 FULL JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1"))))
  (7 rows)
  
  ALTER SERVER loopback OPTIONS (ADD extensions 'postgres_fdw');
--- 1691,1697 ----
           Output: t1.c1, t2.c2, t1.c3
           Filter: (postgres_fdw_abs(t1.c1) > 0)
           Relations: (public.ft1 t1) FULL JOIN (public.ft2 t2)
!          Remote SQL: SELECT r1."C 1", r2.c2, r1.c3 FROM ("S 1"."T 1" r1 FULL JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1"))))
  (7 rows)
  
  ALTER SERVER loopback OPTIONS (ADD extensions 'postgres_fdw');
***************
*** 1708,1717 ****
           ->  Foreign Scan
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
!                Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
                 ->  Sort
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Sort Key: t1.c3, t1.c1
                       ->  Merge Join
                             Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
                             Merge Cond: (t1.c1 = t2.c1)
--- 1708,1717 ----
           ->  Foreign Scan
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
!                Remote SQL: SELECT r1."C 1", r2."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
                 ->  Sort
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Sort Key: t1.c3 USING <, t1.c1
                       ->  Merge Join
                             Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
                             Merge Cond: (t1.c1 = t2.c1)
***************
*** 1755,1764 ****
           ->  Foreign Scan
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
!                Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1 FOR UPDATE OF r2
                 ->  Sort
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Sort Key: t1.c3, t1.c1
                       ->  Merge Join
                             Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
                             Merge Cond: (t1.c1 = t2.c1)
--- 1755,1764 ----
           ->  Foreign Scan
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
!                Remote SQL: SELECT r1."C 1", r2."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1 FOR UPDATE OF r2
                 ->  Sort
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Sort Key: t1.c3 USING <, t1.c1
                       ->  Merge Join
                             Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
                             Merge Cond: (t1.c1 = t2.c1)
***************
*** 1803,1812 ****
           ->  Foreign Scan
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
!                Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1
                 ->  Sort
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Sort Key: t1.c3, t1.c1
                       ->  Merge Join
                             Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
                             Merge Cond: (t1.c1 = t2.c1)
--- 1803,1812 ----
           ->  Foreign Scan
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
!                Remote SQL: SELECT r1."C 1", r2."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1
                 ->  Sort
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Sort Key: t1.c3 USING <, t1.c1
                       ->  Merge Join
                             Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
                             Merge Cond: (t1.c1 = t2.c1)
***************
*** 1850,1859 ****
           ->  Foreign Scan
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
!                Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1 FOR SHARE OF r2
                 ->  Sort
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Sort Key: t1.c3, t1.c1
                       ->  Merge Join
                             Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
                             Merge Cond: (t1.c1 = t2.c1)
--- 1850,1859 ----
           ->  Foreign Scan
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
!                Remote SQL: SELECT r1."C 1", r2."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1 FOR SHARE OF r2
                 ->  Sort
                       Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Sort Key: t1.c3 USING <, t1.c1
                       ->  Merge Join
                             Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
                             Merge Cond: (t1.c1 = t2.c1)
***************
*** 1930,1936 ****
     ->  Foreign Scan
           Output: t1.ctid, t1.*, t2.*, t1.c1, t1.c3
           Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
!          Remote SQL: SELECT r1.ctid, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r1."C 1", r1.c3, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST
  (6 rows)
  
  -- SEMI JOIN, not pushed down
--- 1930,1936 ----
     ->  Foreign Scan
           Output: t1.ctid, t1.*, t2.*, t1.c1, t1.c3
           Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
!          Remote SQL: SELECT r1.ctid, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END, r1."C 1", r1.c3 FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST
  (6 rows)
  
  -- SEMI JOIN, not pushed down
***************
*** 2160,2166 ****
                 Output: t1.c1, t2.c1, t1.c3
                 Filter: (t1.c8 = t2.c8)
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
!                Remote SQL: SELECT r1."C 1", r1.c3, r2."C 1", r1.c8, r2.c8 FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1"))))
  (10 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) WHERE t1.c8 = t2.c8 ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
--- 2160,2166 ----
                 Output: t1.c1, t2.c1, t1.c3
                 Filter: (t1.c8 = t2.c8)
                 Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
!                Remote SQL: SELECT r1."C 1", r2."C 1", r1.c3, r1.c8, r2.c8 FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1"))))
  (10 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) WHERE t1.c8 = t2.c8 ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
***************
*** 2348,2356 ****
     ->  Foreign Scan
           Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft4.c1, ft4.c2, ft4.c3, ft5.c1, ft5.c2, ft5.c3, ft1.*, ft2.*, ft4.*, ft5.*
           Relations: (((public.ft1) INNER JOIN (public.ft2)) INNER JOIN (public.ft4)) INNER JOIN (public.ft5)
!          Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END, r3.c1, r3.c2, r3.c3, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END, r4.c1, r4.c2, r4.c3, CASE WHEN (r4.*)::text IS NOT NULL THEN ROW(r4.c1, r4.c2, r4.c3) END FROM ((("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")) AND ((r2."C 1" < 100)) AND ((r1."C 1" < 100)))) INNER JOIN "S 1"."T 3" r3 ON (((r1.c2 = r3.c1)))) INNER JOIN "S 1"."T 4" r4 ON (((r1.c2 = r4.c1)))) FOR UPDATE OF r1 FOR UPDATE OF r2 FOR UPDATE OF r3 FOR UPDATE OF r4
           ->  Merge Join
!                Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft1.*, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.*, ft4.c1, ft4.c2, ft4.c3, ft4.*, ft5.c1, ft5.c2, ft5.c3, ft5.*
                 Merge Cond: (ft1.c2 = ft5.c1)
                 ->  Merge Join
                       Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft1.*, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.*, ft4.c1, ft4.c2, ft4.c3, ft4.*
--- 2348,2356 ----
     ->  Foreign Scan
           Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft4.c1, ft4.c2, ft4.c3, ft5.c1, ft5.c2, ft5.c3, ft1.*, ft2.*, ft4.*, ft5.*
           Relations: (((public.ft1) INNER JOIN (public.ft2)) INNER JOIN (public.ft4)) INNER JOIN (public.ft5)
!          Remote SQL: SELECT r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8, r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8, r3.c1, r3.c2, r3.c3, r4.c1, r4.c2, r4.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END, CASE WHEN (r3.*)::text IS NOT NULL THEN ROW(r3.c1, r3.c2, r3.c3) END, CASE WHEN (r4.*)::text IS NOT NULL THEN ROW(r4.c1, r4.c2, r4.c3) END FROM ((("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")) AND ((r2."C 1" < 100)) AND ((r1."C 1" < 100)))) INNER JOIN "S 1"."T 3" r3 ON (((r1.c2 = r3.c1)))) INNER JOIN "S 1"."T 4" r4 ON (((r1.c2 = r4.c1)))) FOR UPDATE OF r1 FOR UPDATE OF r2 FOR UPDATE OF r3 FOR UPDATE OF r4
           ->  Merge Join
!                Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft4.c1, ft4.c2, ft4.c3, ft5.c1, ft5.c2, ft5.c3, ft1.*, ft2.*, ft4.*, ft5.*
                 Merge Cond: (ft1.c2 = ft5.c1)
                 ->  Merge Join
                       Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft1.*, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft2.*, ft4.c1, ft4.c2, ft4.c3, ft4.*
***************
*** 2815,2821 ****
           Group Key: ft1.c2
           Filter: (avg((ft1.c1 * ((random() <= '1'::double precision))::integer)) > '100'::numeric)
           ->  Foreign Scan on public.ft1
!                Output: c2, c1
                 Remote SQL: SELECT "C 1", c2 FROM "S 1"."T 1"
  (10 rows)
  
--- 2815,2821 ----
           Group Key: ft1.c2
           Filter: (avg((ft1.c1 * ((random() <= '1'::double precision))::integer)) > '100'::numeric)
           ->  Foreign Scan on public.ft1
!                Output: c1, c2
                 Remote SQL: SELECT "C 1", c2 FROM "S 1"."T 1"
  (10 rows)
  
***************
*** 3039,3045 ****
           Output: sum(c1) FILTER (WHERE ((((c1 / c1))::double precision * random()) <= '1'::double precision)), c2
           Group Key: ft1.c2
           ->  Foreign Scan on public.ft1
!                Output: c2, c1
                 Remote SQL: SELECT "C 1", c2 FROM "S 1"."T 1"
  (9 rows)
  
--- 3039,3045 ----
           Output: sum(c1) FILTER (WHERE ((((c1 / c1))::double precision * random()) <= '1'::double precision)), c2
           Group Key: ft1.c2
           ->  Foreign Scan on public.ft1
!                Output: c1, c2
                 Remote SQL: SELECT "C 1", c2 FROM "S 1"."T 1"
  (9 rows)
  
***************
*** 3213,3219 ****
     Output: array_agg(c1 ORDER BY c1 USING <^ NULLS LAST), c2
     Group Key: ft2.c2
     ->  Foreign Scan on public.ft2
!          Output: c2, c1
           Remote SQL: SELECT "C 1", c2 FROM "S 1"."T 1" WHERE (("C 1" < 100)) AND ((c2 = 6))
  (6 rows)
  
--- 3213,3219 ----
     Output: array_agg(c1 ORDER BY c1 USING <^ NULLS LAST), c2
     Group Key: ft2.c2
     ->  Foreign Scan on public.ft2
!          Output: c1, c2
           Remote SQL: SELECT "C 1", c2 FROM "S 1"."T 1" WHERE (("C 1" < 100)) AND ((c2 = 6))
  (6 rows)
  
***************
*** 3259,3265 ****
     Output: array_agg(c1 ORDER BY c1 USING <^ NULLS LAST), c2
     Group Key: ft2.c2
     ->  Foreign Scan on public.ft2
!          Output: c2, c1
           Remote SQL: SELECT "C 1", c2 FROM "S 1"."T 1" WHERE (("C 1" < 100)) AND ((c2 = 6))
  (6 rows)
  
--- 3259,3265 ----
     Output: array_agg(c1 ORDER BY c1 USING <^ NULLS LAST), c2
     Group Key: ft2.c2
     ->  Foreign Scan on public.ft2
!          Output: c1, c2
           Remote SQL: SELECT "C 1", c2 FROM "S 1"."T 1" WHERE (("C 1" < 100)) AND ((c2 = 6))
  (6 rows)
  
***************
*** 7814,7836 ****
  -- with PHVs, partition-wise join selected but no join pushdown
  EXPLAIN (COSTS OFF)
  SELECT t1.a, t1.phv, t2.b, t2.phv FROM (SELECT 't1_phv' phv, * FROM fprt1 WHERE a % 25 = 0) t1 FULL JOIN (SELECT 't2_phv' phv, * FROM fprt2 WHERE b % 25 = 0) t2 ON (t1.a = t2.b) ORDER BY t1.a, t2.b;
!                          QUERY PLAN                         
! ------------------------------------------------------------
   Sort
     Sort Key: ftprt1_p1.a, ftprt2_p1.b
!    ->  Result
!          ->  Append
!                ->  Hash Full Join
!                      Hash Cond: (ftprt1_p1.a = ftprt2_p1.b)
!                      ->  Foreign Scan on ftprt1_p1
!                      ->  Hash
!                            ->  Foreign Scan on ftprt2_p1
!                ->  Hash Full Join
!                      Hash Cond: (ftprt1_p2.a = ftprt2_p2.b)
!                      ->  Foreign Scan on ftprt1_p2
!                      ->  Hash
!                            ->  Foreign Scan on ftprt2_p2
! (14 rows)
  
  SELECT t1.a, t1.phv, t2.b, t2.phv FROM (SELECT 't1_phv' phv, * FROM fprt1 WHERE a % 25 = 0) t1 FULL JOIN (SELECT 't2_phv' phv, * FROM fprt2 WHERE b % 25 = 0) t2 ON (t1.a = t2.b) ORDER BY t1.a, t2.b;
    a  |  phv   |  b  |  phv   
--- 7814,7835 ----
  -- with PHVs, partition-wise join selected but no join pushdown
  EXPLAIN (COSTS OFF)
  SELECT t1.a, t1.phv, t2.b, t2.phv FROM (SELECT 't1_phv' phv, * FROM fprt1 WHERE a % 25 = 0) t1 FULL JOIN (SELECT 't2_phv' phv, * FROM fprt2 WHERE b % 25 = 0) t2 ON (t1.a = t2.b) ORDER BY t1.a, t2.b;
!                       QUERY PLAN                      
! ------------------------------------------------------
   Sort
     Sort Key: ftprt1_p1.a, ftprt2_p1.b
!    ->  Append
!          ->  Hash Full Join
!                Hash Cond: (ftprt1_p1.a = ftprt2_p1.b)
!                ->  Foreign Scan on ftprt1_p1
!                ->  Hash
!                      ->  Foreign Scan on ftprt2_p1
!          ->  Hash Full Join
!                Hash Cond: (ftprt1_p2.a = ftprt2_p2.b)
!                ->  Foreign Scan on ftprt1_p2
!                ->  Hash
!                      ->  Foreign Scan on ftprt2_p2
! (13 rows)
  
  SELECT t1.a, t1.phv, t2.b, t2.phv FROM (SELECT 't1_phv' phv, * FROM fprt1 WHERE a % 25 = 0) t1 FULL JOIN (SELECT 't2_phv' phv, * FROM fprt2 WHERE b % 25 = 0) t2 ON (t1.a = t2.b) ORDER BY t1.a, t2.b;
    a  |  phv   |  b  |  phv   

======================================================================

#90

robertmhaas@gmail.com

almost 8 years ago

In reply to: Amit Kapila (#89)

Re: [HACKERS] why not parallel seq scan for slow functions

On Thu, Mar 29, 2018 at 12:55 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

I think that is within the acceptable range. I am seeing a few regression
failures with the patch series. The order of the target list seems to
have changed in the Remote SQL. Kindly find the failure report attached.
I asked my colleague Ashutosh Sharma to cross-verify this, and he is
seeing the same failures.

Oops. Those just require an expected output change.
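In the hunks shown in the diff, most changes only reorder the columns in the Remote SQL select list. The new order follows the Foreign Scan's Output line, for example `r1."C 1", r2.c2, r4.c3` for `t1.c1, t2.c2, t3.c3`. That is why an expected-output change is enough.

The minimal C sketch below checks that two select lists hold the same columns in any order. It assumes the lists have already been split into individual column strings. The helper `same_columns_any_order` and the sample arrays are hypothetical and are not part of PostgreSQL's regression tooling. A check like this would also flag hunks that are not pure reorderings for a closer look, such as the Sort Key change from `t1.c3` to `t1.c3 USING <`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator for an array of C strings. */
static int
cmp_cstr(const void *x, const void *y)
{
    const char *const *a = x;
    const char *const *b = y;

    return strcmp(*a, *b);
}

/*
 * Hypothetical helper: return true if both column lists hold exactly the
 * same entries (duplicates included), ignoring order.  An allocation
 * failure is reported as false.
 */
static bool
same_columns_any_order(const char *const *a, size_t na,
                       const char *const *b, size_t nb)
{
    if (na != nb)
        return false;
    if (na == 0)
        return true;
    assert(a != NULL && b != NULL);

    const char **sa = malloc(na * sizeof *sa);
    const char **sb = malloc(nb * sizeof *sb);
    bool        same = (sa != NULL && sb != NULL);

    if (same)
    {
        /* Sort private copies so the caller's arrays stay untouched. */
        memcpy(sa, a, na * sizeof *sa);
        memcpy(sb, b, nb * sizeof *sb);
        qsort(sa, na, sizeof *sa, cmp_cstr);
        qsort(sb, nb, sizeof *sb, cmp_cstr);
        for (size_t i = 0; i < na && same; i++)
            same = (strcmp(sa[i], sb[i]) == 0);
    }
    free(sa);
    free(sb);
    return same;
}

/* Remote SQL select lists from the first hunk: old vs. new expected output. */
static const char *const old_remote_cols[] = {"r4.c3", "r2.c2", "r1.\"C 1\""};
static const char *const new_remote_cols[] = {"r1.\"C 1\"", "r2.c2", "r4.c3"};

/* Sort Key from a FOR UPDATE hunk: this change is not a mere reordering. */
static const char *const old_sort_key[] = {"t1.c3", "t1.c1"};
static const char *const new_sort_key[] = {"t1.c3 USING <", "t1.c1"};
```

With these sample arrays, the Remote SQL lists compare equal and the two Sort Key lists do not.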

It seems UPPERREL_TLIST is redundant in the patch now. I think we can
remove it unless you have something else in mind.

Yes.

I think the handling of partitioned rels looks okay, but we might want
to check its overhead once, unless you are sure that it shouldn't be a
problem. If you think we should check it, could we do that as a
separate patch? It doesn't look directly linked to the main patch and
can be treated as an optimization for partitionwise aggregates. We could
also handle it along with the main patch, but it might be somewhat
simpler to verify if we do it separately.

I don't think it should be a problem, although you're welcome to test
it if you're concerned about it. I think it would probably be
penny-wise and pound-foolish to worry about the overhead of
eliminating the Result nodes, which can occur not only with
partition-wise aggregate but also with partition-wise join and, I
think, really any case where the top scan/join plan would be an Append
node. We're talking about a very small amount of additional planning
time to potentially get a better plan.
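The PHV test case in the diff shows the effect described here. The old plan projected through a Result node above the Append. The new plan applies the target directly to the Hash Full Join children.

The toy C model below is a sketch under stated assumptions, not PostgreSQL's planner code. It counts how many Result nodes are needed when an Append, which cannot project, must emit a new target list. It compares two cases: with and without pushing the target down into the Append's children. `ToyNode`, `results_needed`, and the simplification that only Append is non-projecting are all hypothetical. The real planner works on Paths and PathTargets, and Hash nodes are omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy plan-node kinds; these are not PostgreSQL's NodeTag values. */
typedef enum
{
    TOY_SCAN,
    TOY_JOIN,
    TOY_APPEND
} ToyKind;

typedef struct ToyNode
{
    ToyKind     kind;
    const struct ToyNode *const *children;
    int         nchildren;
} ToyNode;

/* In this model, scans and joins can compute a new target list; Append cannot. */
static bool
toy_can_project(const ToyNode *n)
{
    return n->kind != TOY_APPEND;
}

/*
 * Number of Result nodes needed so that "n" emits a new target list.
 * With push_down = false, a non-projecting Append gets one Result on top.
 * With push_down = true, the target is applied to each Append child instead.
 */
static int
results_needed(const ToyNode *n, bool push_down)
{
    int         total = 0;

    assert(n != NULL);
    if (toy_can_project(n))
        return 0;
    if (!push_down)
        return 1;
    for (int i = 0; i < n->nchildren; i++)
        total += results_needed(n->children[i], push_down);
    return total;
}

/* The PHV plan from the diff: Append over two per-partition full joins. */
static const ToyNode ftprt1_p1 = {TOY_SCAN, NULL, 0};
static const ToyNode ftprt2_p1 = {TOY_SCAN, NULL, 0};
static const ToyNode ftprt1_p2 = {TOY_SCAN, NULL, 0};
static const ToyNode ftprt2_p2 = {TOY_SCAN, NULL, 0};
static const ToyNode *const join_p1_kids[] = {&ftprt1_p1, &ftprt2_p1};
static const ToyNode *const join_p2_kids[] = {&ftprt1_p2, &ftprt2_p2};
static const ToyNode join_p1 = {TOY_JOIN, join_p1_kids, 2};
static const ToyNode join_p2 = {TOY_JOIN, join_p2_kids, 2};
static const ToyNode *const append_kids[] = {&join_p1, &join_p2};
static const ToyNode phv_append = {TOY_APPEND, append_kids, 2};
```

In this model, `results_needed(&phv_append, false)` is 1, matching the old plan's Result node, and `results_needed(&phv_append, true)` is 0, matching the new plan. Pushing the target down costs a little planning work per child. In exchange, the executor avoids one extra node per output row.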

I've committed all of these now.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#91