Why not parallel seq scan for slow functions?
If I have a slow function which is evaluated in a simple seq scan, I do not
get parallel execution, even though it would be massively useful, unless
force_parallel_mode=on, in which case I get a dummy parallel plan with one worker.
explain select aid, slow(abalance) from pgbench_accounts;

CREATE OR REPLACE FUNCTION slow(integer)
RETURNS integer
LANGUAGE plperl
IMMUTABLE PARALLEL SAFE STRICT COST 10000000
AS $function$
    my $thing = $_[0];
    foreach (1..1_000_000) {
        $thing = sqrt($thing);
        $thing *= $thing;
    }
    return ($thing + 0);
$function$;
The partial path is getting added to the list of paths, it is just not
getting chosen, even if parallel_*_cost are set to zero. Why not?
If I do an aggregate, then it does use parallel workers:
explain select sum(slow(abalance)) from pgbench_accounts;
It doesn't use as many workers as I would like, because there is a limit based
on the logarithm of the table size (I'm using -s 10 and get 3 parallel
processes), but at least I know where to start looking into that.
Also, how do you debug stuff like this? Are there gdb tricks that make it
easier to introspect the plans?
Cheers,
Jeff
On Tue, Jul 11, 2017 at 9:02 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
If I have a slow function which is evaluated in a simple seq scan, I do not
get parallel execution, even though it would be massively useful. Unless
force_parallel_mode=ON, then I get a dummy parallel plan with one worker.

explain select aid, slow(abalance) from pgbench_accounts;
After analysing this, I see multiple reasons why it is not selected:

1. The query selects every tuple, and the benefit we get from parallelism
comes from dividing up cpu_tuple_cost, which is 0.01; but each tuple sent
from a worker to the Gather node incurs parallel_tuple_cost, which is 0.1
per tuple (and far fewer tuples cross the Gather in the aggregate case).
Maybe you can try a select with some condition, like below:
postgres=# explain select slow(abalance) from pgbench_accounts where abalance > 1;
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Gather  (cost=0.00..46602.33 rows=1 width=4)
   Workers Planned: 2
   ->  Parallel Seq Scan on pgbench_accounts  (cost=0.00..46602.33 rows=1 width=4)
         Filter: (abalance > 1)
2. The second problem I am seeing (maybe a code problem) is that the
"slow" function is very costly (10000000), and in apply_projection_to_path
we account for this cost. But I have noticed that for the Gather node we
also add this cost for all the rows, when actually, if the lower node is
already doing the projection, the Gather node just needs to send the tuples
out rather than apply the projection itself.

In the function below, we always multiply target->cost.per_tuple by
path->rows, but in the case of a Gather it should be multiplied by
subpath->rows:
apply_projection_to_path()
....
    path->startup_cost += target->cost.startup - oldcost.startup;
    path->total_cost += target->cost.startup - oldcost.startup +
        (target->cost.per_tuple - oldcost.per_tuple) * path->rows;
So because of this high projection cost the seqscan path and the parallel
path have fuzzily the same cost, but the seqscan path wins because it's
parallel safe.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Jul 10, 2017 at 9:51 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
So because of this high projection cost the seqscan path and the parallel
path have fuzzily the same cost, but the seqscan path wins because it's
parallel safe.
I think you are correct. However, unless parallel_tuple_cost is set very
low, apply_projection_to_path never gets called with the Gather path as an
argument. It gets ruled out at some earlier stage, presumably because it
assumes the projection step cannot make it win if it is already behind by
enough.
So the attached patch improves things, but doesn't go far enough.
Cheers,
Jeff
Attachments:
subpath_projection_cost.patch
On Wed, Jul 12, 2017 at 1:50 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
I think you are correct. However, unless parallel_tuple_cost is set very
low, apply_projection_to_path never gets called with the Gather path as an
argument. It gets ruled out at some earlier stage, presumably because it
assumes the projection step cannot make it win if it is already behind by
enough.
I think that is genuine because tuple communication cost is very high.
If your table is reasonably large then you might want to try increasing
parallel workers (Alter Table ... Set (parallel_workers = ..)).
So the attached patch improves things, but doesn't go far enough.
It seems to me that we need to adjust the cost based on whether the node
below is projection capable. See attached.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
subpath_projection_cost.2.patch
On Wed, Jul 12, 2017 at 10:55 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
It seems to me that we need to adjust the cost based on whether the node
below is projection capable. See attached.
Patch looks good to me.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Tue, Jul 11, 2017 at 10:25 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
I think that is genuine because tuple communication cost is very high.
Sorry, I don't know which you think is genuine, the early pruning or my
complaint about the early pruning.
I agree that the communication cost is high, which is why I don't want to
have to set parallel_tuple_cost very low. For example, to get the benefit
of your patch, I have to set parallel_tuple_cost to 0.0049 or less (in my
real-world case, not the dummy test case I posted, although the numbers are
around the same for that one too). But with a setting that low, all kinds
of other things also start using parallel plans, even if they don't benefit
from them and are harmed.
I realize we need to do some aggressive pruning to avoid an exponential
explosion in planning time, but in this case it has some rather unfortunate
consequences. I wanted to explore it, but I can't figure out where this
particular pruning is taking place.

By the time we get to planner.c line 1787, current_rel->pathlist already
does not contain the parallel plan if parallel_tuple_cost >= 0.0050, so the
pruning is happening earlier than that.
If your table is reasonable large then you might want to try by
increasing parallel workers (Alter Table ... Set (parallel_workers =
..))
Setting parallel_workers to 8 changes the threshold for the parallel plan to
even be considered from parallel_tuple_cost <= 0.0049 to <= 0.0076. So it
is going in the correct direction, but not by enough to matter.
Cheers,
Jeff
On Wed, Jul 12, 2017 at 11:20 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
I think that is genuine because tuple communication cost is very high.
Sorry, I don't know which you think is genuine, the early pruning or my
complaint about the early pruning.
Early pruning. See, currently we don't have a way to maintain both
parallel and non-parallel paths till a later stage and then decide which
one is better. If we want to maintain both parallel and non-parallel
paths, it can increase planning cost substantially in the case of
joins. Now, surely it can have a benefit in many cases, so it is a
worthwhile direction to pursue.
By the time we get to planner.c line 1787, current_rel->pathlist already
does not contain the parallel plan if parallel_tuple_cost >= 0.0050, so the
pruning is happening earlier than that.
Check generate_gather_paths.
Setting parallel_workers to 8 changes the threshold for the parallel plan to
even be considered from parallel_tuple_cost <= 0.0049 to <= 0.0076. So it
is going in the correct direction, but not by enough to matter.
You might want to play with cpu_tuple_cost and/or seq_page_cost.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Thu, Jul 13, 2017 at 7:38 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
You might want to play with cpu_tuple_cost and/or seq_page_cost.
I don't know whether the patch will completely solve your problem, but
this seems to be the right thing to do. Do you think we should stick
this into the next CF?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Sat, Jul 22, 2017 at 8:53 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
I don't know whether the patch will completely solve your problem, but
this seems to be the right thing to do. Do you think we should stick
this into the next CF?
It doesn't solve the problem for me, but I agree it is an improvement we
should commit.
Cheers,
Jeff
On Mon, Jul 24, 2017 at 9:21 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
It doesn't solve the problem for me, but I agree it is an improvement we
should commit.
Okay, added the patch for next CF.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Wed, Jul 12, 2017 at 7:08 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
Early pruning. See, currently we don't have a way to maintain both
parallel and non-parallel paths till a later stage and then decide which
one is better. If we want to maintain both parallel and non-parallel
paths, it can increase planning cost substantially in the case of
joins. Now, surely it can have a benefit in many cases, so it is a
worthwhile direction to pursue.
If I understand it correctly, we have a way, it just can lead to the
exponential explosion problem, so we are afraid to use it, correct? If I
just lobotomize the path domination code (make pathnode.c line 466 always
test false):

    if (JJ_all_paths==0 && costcmp != COSTS_DIFFERENT)

then it keeps the parallel plan and later chooses to use it (after applying
your other patch in this thread) as the overall best plan. It doesn't even
slow down "make installcheck-parallel" by very much, which I guess just
means the regression tests don't have a lot of complex joins.
But what is an acceptable solution? Is there a heuristic for when
retaining a parallel path could be helpful, the same way there is for
fast-start paths? It seems like the best thing would be to include the
evaluation costs in the first place at this step.
Why is the path-cost domination code run before the cost of the function
evaluation is included? Is that because the information needed to compute
it is not available at that point, or because it would be too slow to
include it at that point? Or just because no one thought it important to do?
Cheers,
Jeff
On Wed, Aug 2, 2017 at 11:12 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
Why is the path-cost domination code run before the cost of the function
evaluation is included?
Because the function evaluation is part of the target list, and we create
the path target after the creation of the base paths (see the call to
create_pathtarget at planner.c:1696).
Is that because the information needed to compute
it is not available at that point,
Right.
I see two ways to include the cost of the target list for parallel
paths before rejecting them: (a) don't reject parallel paths
(Gather/GatherMerge) during add_path, which has the danger of path
explosion; or (b) in the case of parallel paths, somehow try to identify
that the path has a costly target list (maybe just check whether the target
list has anything other than Vars) and use that as a heuristic to decide
whether a parallel path can be retained.

I think the preference will be to do something on the lines of
approach (b), but I am not sure whether we can easily do that.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Aug 8, 2017 at 3:50 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
I see two ways to include the cost of the target list for parallel
paths before rejecting them: (a) don't reject parallel paths
(Gather/GatherMerge) during add_path, which has the danger of path
explosion; or (b) in the case of parallel paths, somehow try to identify
that the path has a costly target list (maybe just check whether the target
list has anything other than Vars) and use that as a heuristic to decide
whether a parallel path can be retained.
I think the right approach to this problem is to get the cost of the
GatherPath correct when it's initially created. The proposed patch
changes the cost after-the-fact, but that (1) doesn't prevent a
promising path from being rejected before we reach this point and (2)
is probably unsafe, because it might confuse code that reaches the
modified-in-place path through some other pointer (e.g. code which
expects the RelOptInfo's paths to still be sorted by cost). Perhaps
the way to do that is to skip generate_gather_paths() for the toplevel
scan/join node and do something similar later, after we know what
target list we want.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Aug 10, 2017 at 1:07 AM, Robert Haas <robertmhaas@gmail.com> wrote:
Perhaps the way to do that is to skip generate_gather_paths() for the toplevel
scan/join node and do something similar later, after we know what
target list we want.
I think skipping the generation of gather paths for a scan node or a
top-level join node generated via standard_join_search seems
straightforward, but skipping it for paths generated via geqo seems to be
tricky (see the use of generate_gather_paths in merge_clump). Assuming we
find some way to skip it for the top-level scan/join node, I don't think
that will be sufficient: we have a special way to push the target list
below the Gather node in apply_projection_to_path, and we would need to
move that part into generate_gather_paths as well.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Sat, Aug 12, 2017 at 9:18 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
Assuming we find some way to skip it for the top-level scan/join node, I
don't think that will be sufficient: we have a special way to push the
target list below the Gather node in apply_projection_to_path, and we would
need to move that part into generate_gather_paths as well.
I don't think that can work, because at that point we don't know what
target list the upper node wants to impose.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Aug 15, 2017 at 7:15 PM, Robert Haas <robertmhaas@gmail.com> wrote:
I don't think that can work, because at that point we don't know what
target list the upper node wants to impose.
I am suggesting calling generate_gather_paths just before we try to
apply the projection to paths in grouping_planner (file: planner.c;
line: 1787; commit: 004a9702). Won't the target list for the upper nodes
be available at that point?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Wed, Aug 16, 2017 at 7:23 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
I am suggesting calling generate_gather_paths just before we try to
apply the projection to paths in grouping_planner (file: planner.c;
line: 1787; commit: 004a9702). Won't the target list for the upper nodes
be available at that point?
Oh, yes. Apparently I misunderstood your proposal.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, Aug 12, 2017 at 6:48 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
I think skipping the generation of gather paths for a scan node or a
top-level join node generated via standard_join_search seems
straightforward, but skipping it for paths generated via geqo seems to be
tricky (see the use of generate_gather_paths in merge_clump).
Either we can pass "num_gene" to merge_clump, or we can store num_gene
in the root and check it inside merge_clump. Do you see any more
complexity?
if (joinrel)
{
    /* Create GatherPaths for any useful partial paths for rel */
    if (old_clump->size + new_clump->size < num_gene)
        generate_gather_paths(root, joinrel);
}
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Aug 17, 2017 at 2:09 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
Either we can pass "num_gene" to merge_clump or we can store num_gene
in the root. And inside merge_clump we can check. Do you see some more
complexity?
After putting some more thought into it, I see one more problem, though I
am not sure whether we can solve it easily. If we skip generating the
gather path at the top-level node, then our cost comparison while adding an
element to the pool will not be correct, as we are skipping some of the
paths (the gather paths). It's very much possible that path1 is cheaper
than path2 without a gather on top of it, but with a gather, path2 becomes
cheaper. And there is no easy way to handle this, because without the
target list we cannot calculate the cost of the gather (which is the actual
problem).
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Thu, Aug 17, 2017 at 2:45 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Thu, Aug 17, 2017 at 2:09 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
Either we can pass "num_gene" to merge_clump, or we can store num_gene
in the root and check it inside merge_clump. Do you see any more
complexity?
I think something like that should work.
After putting some more thought I see one more problem but not sure
whether we can solve it easily. Now, if we skip generating the gather
path at top level node then our cost comparison while adding the
element to pool will not be correct as we are skipping some of the
paths (gather path). And, it's very much possible that the path1 is
cheaper than path2 without adding gather on top of it but with gather,
path2 can be cheaper.
I think that should not matter, because the costing of gather is mainly
based on the number of rows, and that should be the same for both path1 and
path2 in this case.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com