Removing INNER JOINs
Hi,
Starting a new thread which continues on from
/messages/by-id/CAApHDvoeC8YGWoahVSri-84eN2k0TnH6GPXp1K59y9juC1WWBg@mail.gmail.com
To give a brief summary for any new readers:
The attached patch allows INNER JOINed relations to be removed from the
plan, provided none of the relation's columns are used for anything, and a
foreign key exists which proves that a record matching the join condition
must exist in the table being removed:
Example:
test=# create table b (id int primary key);
CREATE TABLE
test=# create table a (id int primary key, b_id int not null references
b(id));
CREATE TABLE
test=# explain (costs off) select a.* from a inner join b on a.b_id = b.id;
QUERY PLAN
---------------
Seq Scan on a
This has worked for a few years now for LEFT JOINs, so this patch just
extends the join types that can be removed.
This optimisation should prove to be quite useful for views in which only a
subset of the columns are queried.
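For example (the table, view and column names here are illustrative, not
taken from the patch or its regression tests):

```sql
-- A view that joins in customer details. A query touching only the
-- invoice columns doesn't need the join at all, and the NOT NULL foreign
-- key proves a matching customer row always exists.
CREATE TABLE customer (id int PRIMARY KEY, name text);
CREATE TABLE invoice (
    id int PRIMARY KEY,
    customer_id int NOT NULL REFERENCES customer (id),
    amount numeric
);
CREATE VIEW invoice_details AS
    SELECT i.id, i.amount, c.name AS customer_name
    FROM invoice i
    INNER JOIN customer c ON i.customer_id = c.id;

-- With the patch, this should reduce to a scan of "invoice" alone:
SELECT id, amount FROM invoice_details;
```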
The attached is an updated patch which fixes a conflict from a recent
commit and
also fixes a bug where join removals did not properly work for PREPAREd
statements.
I'm looking for a bit of feedback around the method I'm using to prune the
redundant
plan nodes out of the plan tree at executor startup. Particularly around
not stripping
the Sort nodes out from below a merge join, even if the sort order is no
longer
required due to the merge join node being removed. This potentially could
leave
the plan suboptimal when compared to a plan that the planner could generate
when the removed relation was never asked for in the first place.
There are also other cases such as MergeJoins performing btree index scans
in order to obtain ordered results for a MergeJoin that would be better
executed
as a SeqScan when the MergeJoin can be removed.
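To illustrate the shape of the problem (reusing the a and b tables from the
example above; the enable_* settings are only there to coax the planner
into a merge join for demonstration):

```sql
-- Force a merge join so its inputs must be ordered; the planner may then
-- choose ordered btree index scans (or explicit Sorts) purely to feed it.
SET enable_hashjoin = off;
SET enable_nestloop = off;
EXPLAIN (COSTS OFF)
SELECT a.* FROM a INNER JOIN b ON a.b_id = b.id;
-- If the merge join is later removed at executor startup, the surviving
-- scan on "a" may still be an ordered index scan, even though a plain
-- SeqScan would now be cheaper since no ordering is required.
RESET enable_hashjoin;
RESET enable_nestloop;
```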
Perhaps some costs could be adjusted at planning time when there's a
possibility
that joins could be removed at execution time, although I'm not quite sure
about
this as it risks generating a poor plan in the case when the joins cannot
be removed.
I currently can't see much of a way around these cases, but in both cases
removing
the join should prove to be a win, though just perhaps not with the most
optimal of
plans.
There are some more details around the reasons behind doing this weird
executor
startup plan pruning around here:
/messages/by-id/20141006145957.GA20577@awork2.anarazel.de
Comments are most welcome
Regards
David Rowley
Attachments:
inner_join_removals_2014-11-29_be69869.patch
Hi David (and others),
David Rowley wrote:
Hi,
Starting a new thread which continues on from
/messages/by-id/CAApHDvoeC8YGWoahVSri-84eN2k0TnH6GPXp1K59y9juC1WWBg@mail.gmail.com
To give a brief summary for any new readers:
The attached patch allows for INNER JOINed relations to be removed from
the plan, providing none of the columns are used for anything, and a
foreign key exists which proves that a record must exist in the table
being removed which matches the join condition:
I'm looking for a bit of feedback around the method I'm using to prune the
redundant plan nodes out of the plan tree at executor startup.
Particularly around not stripping the Sort nodes out from below a merge
join, even if the sort order is no longer required due to the merge join
node being removed. This potentially could leave the plan suboptimal when
compared to a plan that the planner could generate when the removed
relation was never asked for in the first place.
I did read this patch (and the previous patch about removing SEMI-joins)
with great interest. I don't know the code well enough to say much about the
patch itself, but I hope to have some useful ideas about the global
process.
I think performance can be greatly improved if the planner is able to use
information based on the current data. I think these patches are just two
examples of where assumptions during planning are useful. I think there are
more possibilities for this kind of assumption (for example unique
constraints, empty tables).
There are some more details around the reasons behind doing this weird
executor startup plan pruning around here:
The problem here is that assumptions made during planning might not hold
during execution. That is why you placed the final decision about removing a
join in the executor.
If a plan is made, you know which assumptions were made in the final
plan. In this case, the assumption is that a foreign key is still valid. In
general, there are a lot more assumptions, such as the continued existence
of an index or of columns. There are also soft assumptions,
assuming that the statistics used are still reasonable.
My suggestion is to check the assumptions at the start of the executor. If
they still hold, you can just execute the plan as it is.
If one or more assumptions don't hold, there are a couple of things you
might do:
* Make a new plan. The plan is certain to match all conditions because at
that time, a snapshot has already been taken.
* Check the assumption. This can be a costly operation with no guarantee of
success.
* Change the existing plan to not rely on the failed assumption.
* Use an already stored alternate plan (generated during initial planning).
You currently change the plan in executor code. I suggest going back to the
planner if the assumption doesn't hold. The planner can then decide to change
the plan. The planner can also decide to fully replan if there are reasons
for it.
If the planner knows that it needs to replan when an assumption does not hold
during execution, then the cost of replanning multiplied by the chance of the
assumption not holding during execution should be part of the decision to
deliver a plan with an assumption in the first place.
There are also other cases such as MergeJoins performing btree index scans
in order to obtain ordered results for a MergeJoin that would be better
executed as a SeqScan when the MergeJoin can be removed.
Perhaps some costs could be adjusted at planning time when there's a
possibility that joins could be removed at execution time, although I'm
not quite sure about this as it risks generating a poor plan in the case
when the joins cannot be removed.
Maybe this is a case where you are better off replanning if the assumption
doesn't hold instead of changing the generated execution plan. In that case
you can remove the join before the path is made.
Comments are most welcome
Regards
David Rowley
Regards,
Mart
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 30 November 2014 at 23:19, Mart Kelder <mart@kelder31.nl> wrote:
I think performance can be greatly improved if the planner is able to use
information based on the current data. I think these patches are just two
examples of where assumptions during planning are usefull. I think there
are
more possibilities for this kind of assumption (for example unique
constraints, empty tables).
The problem here is that assumptions made during planning might not hold
during execution. That is why you placed the final decision about removing
a
join in the executor.
If a plan is made, you know which assumptions were made in the final
plan. In this case, the assumption is that a foreign key is still valid. In
general, there are a lot more assumptions, such as the continued existence
of an index or of columns. There are also soft assumptions,
assuming that the statistics used are still reasonable.
Hi Mart,
That's an interesting idea. Though I think it would be much harder to
decide if it's a good idea to go off and replan for things like empty
tables as that's not known at executor startup, and may only be discovered
99% of the way through the plan execution, in that case going off and
replanning and starting execution all over again might throw away too much
hard work.
It does seem like a good idea for things that could be known at executor
start-up, I guess this would likely include LEFT JOIN removals using
deferrable unique indexes... Currently these indexes are ignored by the
current join removal code as they mightn't be unique until the transaction
finishes.
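For reference, a deferrable unique constraint behaves like this (a sketch,
not from the patch's tests):

```sql
-- Uniqueness is only enforced at COMMIT, so mid-transaction the index
-- cannot be trusted to prove at-most-one match for a LEFT JOIN.
CREATE TABLE t (id int UNIQUE DEFERRABLE INITIALLY DEFERRED);
BEGIN;
INSERT INTO t VALUES (1), (1);  -- accepted for now; checked at COMMIT
-- At this point t.id is not unique, so a LEFT JOIN on t.id could match
-- two rows; removing such a join here would change the result.
ROLLBACK;  -- (a COMMIT would instead fail with a unique violation)
```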
I'm imagining this being implemented by passing the planner a set of flags
which are assumptions that the planner is allowed to make... During the
planner's work, if it generated a plan which required this assumption to be
met, then it could set this flag in the plan somewhere which would force
the executor to check this at executor init. If the executor found any
required flag's conditions to be not met, then the executor would request a
new plan passing all the original flags, minus the ones that the conditions
have been broken on.
I see this is quite a fundamental change to how things currently work and
it could cause planning to take place during the execution of PREPAREd
statements, which might not impress people too much, but it would certainly
fix the weird anomalies that I'm currently facing by trimming the plan at
executor startup. e.g left over Sort nodes after a MergeJoin was removed.
It would be interesting to hear Tom's opinion on this.
Regards
David Rowley
David Rowley <dgrowleyml@gmail.com> writes:
I see this is quite a fundamental change to how things currently work and
it could cause planning to take place during the execution of PREPAREd
statements, which might not impress people too much, but it would certainly
fix the weird anomalies that I'm currently facing by trimming the plan at
executor startup. e.g left over Sort nodes after a MergeJoin was removed.
It would be interesting to hear Tom's opinion on this.
TBH I don't like this patch at all even in its current form, let alone
a form that's several times more invasive. I do not think there is a
big enough use-case to justify such an ad-hoc and fundamentally different
way of doing things. I think it's probably buggy as can be --- one thing
that definitely is a huge bug is that it modifies the plan tree in-place,
ignoring the rule that the plan tree is read-only to the executor.
Another question is what effect this has on EXPLAIN; there's basically
no way you can avoid lying to the user about what's going to happen at
runtime.
One idea you might think about to ameliorate those two objections is two
separate plan trees underneath an AlternativeSubPlan or similar kind of
node.
At a more macro level, there's the issue of how can the planner possibly
make intelligent decisions at other levels of the join tree when it
doesn't know the cost of this join. For that matter there's nothing
particularly driving the planner to arrange the tree so that the
optimization is possible at all.
Bottom line, given all the restrictions on whether the optimization can
happen, I have very little enthusiasm for the whole idea. I do not think
the benefit will be big enough to justify the amount of mess this will
introduce.
regards, tom lane
On 1 December 2014 at 06:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Rowley <dgrowleyml@gmail.com> writes:
I see this is quite a fundamental change to how things currently work and
it could cause planning to take place during the execution of PREPAREd
statements, which might not impress people too much, but it would certainly
fix the weird anomalies that I'm currently facing by trimming the plan at
executor startup. e.g. left over Sort nodes after a MergeJoin was removed.
It would be interesting to hear Tom's opinion on this.
Another question is what effect this has on EXPLAIN; there's basically
no way you can avoid lying to the user about what's going to happen at
runtime.
One of us must be missing something here. As far as I see it, there are no
lies told, the EXPLAIN shows exactly the plan that will be executed. All of
the regression tests I've added rely on this.
Regards
David Rowley
On Sun, Nov 30, 2014 at 12:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Bottom line, given all the restrictions on whether the optimization can
happen, I have very little enthusiasm for the whole idea. I do not think
the benefit will be big enough to justify the amount of mess this will
introduce.
This optimization applies to a tremendous number of real-world cases,
and we really need to have it. This was a huge problem for me in my
previous life as a web developer. The previous work that we did to
remove LEFT JOINs was an enormous help, but it's not enough; we need a
way to remove INNER JOINs as well.
I thought that David's original approach of doing this in the planner
was a good one. That fell down because of the possibility that
apparently-valid referential integrity constraints might not be valid
at execution time if the triggers were deferred. But frankly, that
seems like an awfully nitpicky thing for this to fall down on. Lots
of web applications are going to issue only SELECT statements that run
as single-statement transactions, and so that issue, so troubling
in theory, will never occur in practice. That doesn't mean that we
don't need to account for it somehow to make the code safe, but any
argument that it abridges the use case significantly is, in my
opinion, not credible.
Anyway, David was undeterred by the rejection of that initial approach
and rearranged everything, based on suggestions from Andres and later
Simon, into the form it's reached now. Kudos to him for his
persistence. But your point that we might have chosen a whole
different plan if it had known that this join was cheaper is a good
one. However, that takes us right back to square one, which is to do
this at plan time. I happen to think that's probably better anyway,
but I fear we're just going around in circles here. We can either do
it at plan time and find some way of handling the fact that there
might be deferred triggers that haven't fired yet; or we can do it at
execution time and live with the fact that we might have chosen a plan
that is not optimal, though still better than executing a
completely-unnecessary join.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
* Robert Haas (robertmhaas@gmail.com) wrote:
On Sun, Nov 30, 2014 at 12:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Bottom line, given all the restrictions on whether the optimization can
happen, I have very little enthusiasm for the whole idea. I do not think
the benefit will be big enough to justify the amount of mess this will
introduce.
This optimization applies to a tremendous number of real-world cases,
and we really need to have it. This was a huge problem for me in my
previous life as a web developer. The previous work that we did to
remove LEFT JOINs was an enormous help, but it's not enough; we need a
way to remove INNER JOINs as well.
For my 2c, I'm completely with Robert on this one. There are a lot of
cases this could help with, particularly things coming out of ORMs
(which, yes, might possibly be better written, but that's a different
issue).
I thought that David's original approach of doing this in the planner
was a good one. That fell down because of the possibility that
apparently-valid referential integrity constraints might not be valid
at execution time if the triggers were deferred. But frankly, that
seems like an awfully nitpicky thing for this to fall down on. Lots
of web applications are going to issue only SELECT statements that run
as single-statement transactions, and so that issue, so troubling
in theory, will never occur in practice. That doesn't mean that we
don't need to account for it somehow to make the code safe, but any
argument that it abridges the use case significantly is, in my
opinion, not credible.
Agreed with this also, deferred triggers are not common-place in my
experience and when it *does* happen, ime at least, it's because you
have a long-running data load or similar where you're not going to
care one bit that large, complicated JOINs aren't as fast as they
might have been otherwise.
Anyway, David was undeterred by the rejection of that initial approach
and rearranged everything, based on suggestions from Andres and later
Simon, into the form it's reached now. Kudos to him for his
persistence. But your point that we might have chosen a whole
different plan if it had known that this join was cheaper is a good
one. However, that takes us right back to square one, which is to do
this at plan time. I happen to think that's probably better anyway,
but I fear we're just going around in circles here. We can either do
it at plan time and find some way of handling the fact that there
might be deferred triggers that haven't fired yet; or we can do it at
execution time and live with the fact that we might have chosen a plan
that is not optimal, though still better than executing a
completely-unnecessary join.
Right, we can't get it wrong in the face of deferred triggers either.
Have we considered only doing the optimization for read-only
transactions? I'm not thrilled with that, but at least we'd get out
from under this deferred triggers concern. Another way might be an
option to say "use the optimization, but throw an error if you run
into a deferred trigger", or perhaps save both plans and use whichever
one we can when we get to execution time? That could make planning
time go up too much to work, but perhaps it's worth testing..
Thanks,
Stephen
On 3 December 2014 at 08:13, Robert Haas <robertmhaas@gmail.com> wrote:
On Sun, Nov 30, 2014 at 12:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Bottom line, given all the restrictions on whether the optimization can
happen, I have very little enthusiasm for the whole idea. I do not think
the benefit will be big enough to justify the amount of mess this will
introduce.
This optimization applies to a tremendous number of real-world cases,
and we really need to have it. This was a huge problem for me in my
previous life as a web developer. The previous work that we did to
remove LEFT JOINs was an enormous help, but it's not enough; we need a
way to remove INNER JOINs as well.
I thought that David's original approach of doing this in the planner
was a good one. That fell down because of the possibility that
apparently-valid referential integrity constraints might not be valid
at execution time if the triggers were deferred. But frankly, that
seems like an awfully nitpicky thing for this to fall down on. Lots
of web applications are going to issue only SELECT statements that run
as single-statement transactions, and so that issue, so troubling
in theory, will never occur in practice. That doesn't mean that we
don't need to account for it somehow to make the code safe, but any
argument that it abridges the use case significantly is, in my
opinion, not credible.
Anyway, David was undeterred by the rejection of that initial approach
and rearranged everything, based on suggestions from Andres and later
Simon, into the form it's reached now. Kudos to him for his
persistence. But your point that we might have chosen a whole
different plan if it had known that this join was cheaper is a good
one. However, that takes us right back to square one, which is to do
this at plan time. I happen to think that's probably better anyway,
but I fear we're just going around in circles here. We can either do
it at plan time and find some way of handling the fact that there
might be deferred triggers that haven't fired yet; or we can do it at
execution time and live with the fact that we might have chosen a plan
that is not optimal, though still better than executing a
completely-unnecessary join.
Just so that I don't end up going around in circles again, let me
summarise my understanding of the pros and cons of each of the states that
this patch has been in.
*** Method 1: Removing Inner Joins at planning time:
Pros:
1. Plan generated should be optimal, i.e should generate the same plan for
the query as if the removed relations were never included in the query's
text.
2. On successful join removal, planning will likely be faster, as having
fewer relations and join combinations leaves fewer paths to consider.
Cons:
1. Assumptions must be made during planning about the trigger queue being
empty or not. During execution, if there are pending fk triggers which need
to be executed then we could produce wrong results.
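To sketch the wrong-results hazard (assuming the foreign key from the
earlier example is made deferrable; this is illustrative, not a regression
test from the patch):

```sql
CREATE TABLE b (id int PRIMARY KEY);
CREATE TABLE a (
    id int PRIMARY KEY,
    b_id int NOT NULL REFERENCES b (id) DEFERRABLE INITIALLY DEFERRED
);
BEGIN;
INSERT INTO a VALUES (1, 42);  -- no matching row in b yet; FK check queued
-- The real join correctly returns no rows at this point...
SELECT a.* FROM a INNER JOIN b ON a.b_id = b.id;
-- ...but a plan with the join removed would wrongly return (1, 42).
INSERT INTO b VALUES (42);     -- satisfy the FK before the check runs
COMMIT;
```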
*** Method 2: Marking scans as possibly skippable during planning, and
skipping joins at execution (Andres' method)
Pros:
1. The plan can be executed as normal if there are any foreign key triggers
pending.
Cons:
1. Planner may not generate an optimal plan, e.g. Sort nodes may be useless
for Merge Joins.
2. Code needed to be added to all join methods to allow skipping, nested
loop joins suffered from a small overhead.
3. Small overhead from visiting extra nodes in the plan which would not be
present if those nodes had been removed.
4. Problems writing regression tests due to having to use EXPLAIN ANALYZE
to try to work out what's going on, and the output containing variable
runtime values.
*** Method 3: Marking scans as possibly skippable during planning and
removing redundant join nodes at executor startup (Simon's method)
Pros:
1. The plan can be executed as normal if there are any foreign key triggers
pending.
2. Does not require extra code in all join types (see cons #2 above)
3. Does not suffer from extra node visiting overhead (see cons #3 above)
Cons:
1. Executor must modify the plan.
2. Planner may have generated a plan which is not optimal for modification
by the executor (e.g. Sort nodes for merge join, or index scans for
pre-sorted input won't become seqscans which may be more efficient as
ordering may not be required after removing a merge join)
Someone has had a problem with each of the methods listed above, and based
on the feedback given I've made changes and ended up with the next
revision of the patch.
Tom has now pointed out that he does not like the executor modifying the
plan, which I agree with to an extent, as I really do hate the extra
useless nodes that I'm unable to remove from the plan.
I'd like to propose Method 4 which I believe solves quite a few of the
problems seen in the other method.
Method 4: (Which is I think what Mart had in mind, I've only expanded on it
a bit with thoughts about possible implementations methods)
1. Invent planner flags which control the optimiser's ability to perform
join removals
2. Add a GUC for the default planner flags. (PLANFLAG_REMOVE_INNER_JOINS)
3. Join removal code checks if the appropriate planner flag is set before
performing join removal.
4. If join removals are performed, planner sets flags which were "utilised"
by the planner.
5. At Executor startup check plan's "utilised" flags and verifies the plan
is compatible for current executor status. e.g if
PLANFLAG_REMOVE_INNER_JOINS is set, then we'd better be sure there's no
pending foreign key triggers, if there are then the executor invokes the
planner with: planflags & ~(all_flags_which_are_not_compatible)
6. planner generates a plan without removing inner joins. (does not set
utilised flag)
7. goto step 5
If any users are suffering the overhead of this replanning then they can
zero out the planner_flags GUC and get the standard behaviour back.
This would also allow deferrable unique indexes to be used for LEFT JOIN
removals... We'd just need to tag
PLANFLAG_REMOVE_LEFT_JOIN_WITH_DEFERRED_UNIQUE_IDX (or something shorter),
onto the utilised flags and have the executor check that no unique indexes
are waiting to be updated.
Things I'm currently not sure about are:
a. can we invoke the planner during executor init?
b. PREPAREd statements... Which plan do we cache? It might not be very nice
to force the executor to re-plan if the generated plan was not compatible
with the current executor state. Or if we then replaced the cached plan,
then subsequent executions of the prepared statement could contain
redundant joins. Perhaps we can just stash both plans having planned them
lazily as and when required.
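The PREPARE case can be sketched like this (illustrative only, and assuming
a's foreign key is deferrable):

```sql
-- A cached plan is built once but may run under very different executor
-- states: with or without pending deferred FK triggers.
PREPARE q AS SELECT a.* FROM a INNER JOIN b ON a.b_id = b.id;
EXECUTE q;  -- no pending triggers: join removal would be safe
BEGIN;
INSERT INTO a VALUES (2, 99);  -- deferred FK: triggers are now pending
EXECUTE q;  -- same cached plan, but join removal would now be unsafe
ROLLBACK;
```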
Pros:
1. Generates optimal plan
2. Could speed up planning when useless joins are removed.
3. Executor does not have to modify the plan.
4. No wrong results from removing joins when there's pending fk triggers.
5. No extra overhead from visiting useless plan nodes at execution time.
Cons:
1. Executor may have to invoke planner.
2. May have to plan queries twice.
I'm not seeing cons #2 as massively bad, as likely this won't happen too
often. This seems far better than generating an alternative plan which may
never be used. Though, it all hangs on whether it's even possible for the
executor to call planner() or standard_planner().
Regards
David Rowley
On 3 December 2014 at 09:29, David Rowley <dgrowleyml@gmail.com> wrote:
*** Method 3: Marking scans as possibly skippable during planning and
removing redundant join nodes at executor startup (Simon's method)
Pros:
1. The plan can be executed as normal if there are any foreign key triggers
pending.
2. Does not require extra code in all join types (see cons #2 above)
3. Does not suffer from extra node visiting overhead (see cons #3 above)
Cons:
1. Executor must modify the plan.
2. Planner may have generated a plan which is not optimal for modification
by the executor (e.g. Sort nodes for merge join, or index scans for
pre-sorted input won't become seqscans which may be more efficient as
ordering may not be required after removing a merge join)
With each of the methods listed above, someone has had a problem with, and
from the feedback given I've made changes based and ended up with the next
revision of the patch.
Tom has now pointed out that he does not like the executor modifying the
plan, which I agree with to an extent as it I really do hate the extra
useless nodes that I'm unable to remove from the plan.
I guess we need an Option node. Tom and I discussed that about an aeon ago.
The Option node has a plan for each situation. At execution time, we
make the test specified in the plan and then select the appropriate
subplan.
That way we can see what is happening in the plan and the executor
doesn't need to edit anything.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Dec 3, 2014 at 5:00 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 3 December 2014 at 09:29, David Rowley <dgrowleyml@gmail.com> wrote:
*** Method 3: Marking scans as possibly skippable during planning and
removing redundant join nodes at executor startup (Simon's method)
Pros:
1. The plan can be executed as normal if there are any foreign key triggers
pending.
2. Does not require extra code in all join types (see cons #2 above)
3. Does not suffer from extra node visiting overhead (see cons #3 above)
Cons:
1. Executor must modify the plan.
2. Planner may have generated a plan which is not optimal for modification
by the executor (e.g. Sort nodes for merge join, or index scans for
pre-sorted input won't become seqscans which may be more efficient as
ordering may not be required after removing a merge join)
With each of the methods listed above, someone has had a problem with,
and
from the feedback given I've made changes based and ended up with the
next
revision of the patch.
Tom has now pointed out that he does not like the executor modifying the
plan, which I agree with to an extent as it I really do hate the extra
useless nodes that I'm unable to remove from the plan.
I guess we need an Option node. Tom and I discussed that about an aeon ago.
The Option node has a plan for each situation. At execution time, we
make the test specified in the plan and then select the appropriate
subplan.
That way we can see what is happening in the plan and the executor
doesn't need to edit anything.
So the planner keeps all possibly-satisfying plans, or it looks at the
possible conditions (like the presence of a foreign key in this case, for
example) and then lets the executor choose between them?
So is the idea essentially making the planner return a set of "best" plans,
one for each condition? Are we assured of their optimality at the local
level i.e. at each possibility?
IMO this sounds like punting the planner's task to executor. Not to mention
some overhead for maintaining various plans that might have been discarded
early in the planning and path cost evaluation phase (consider a path with
pathkeys specified, like with ORDINALITY. Can there be edge cases where we
might end up invalidating the entire path if we let executor modify it, or,
maybe just lose the ordinality optimization?)
I agree that executor should not modify plans, but letting executor choose
the plan to execute (out of a set from planner, of course) rather than
planner giving executor a single plan and executor not caring about the
semantics, seems a bit counterintuitive to me. It might be just me though.
Regards,
Atri
--
Regards,
Atri
*l'apprenant*
* Atri Sharma (atri.jiit@gmail.com) wrote:
So the planner keeps all possibility satisfying plans, or it looks at the
possible conditions (like presence of foreign key for this case, for eg)
and then lets executor choose between them?
Right, this was one of the thoughts that I had.
So is the idea essentially making the planner return a set of "best" plans,
one for each condition? Are we assured of their optimality at the local
level i.e. at each possibility?
We *already* have an idea of there being multiple plans (see
plancache.c).
IMO this sounds like punting the planner's task to executor. Not to mention
some overhead for maintaining various plans that might have been discarded
early in the planning and path cost evaluation phase (consider a path with
pathkeys specified, like with ORDINALITY. Can there be edge cases where we
might end up invalidating the entire path if we let executor modify it, or,
maybe just lose the ordinality optimization?)
The executor isn't modifying the plan, it's just picking one based on
what the current situation is (which is information that only the
executor can have, such as if there are pending deferred triggers).
I agree that executor should not modify plans, but letting executor choose
the plan to execute (out of a set from planner, of course) rather than
planner giving executor a single plan and executor not caring about the
semantics, seems a bit counterintuitive to me. It might be just me though.
I don't think it follows that the executor is now required to care about
semantics. The planner says "use plan A if X is true; use plan B if X
is not true" and then the executor does exactly that. There's nothing
about the plans provided by the planner which are being changed and
there is no re-planning going on (though, as I point out, we actually
*do* re-plan in cases where we think the new plan is much much better
than the prior plan..).
Thanks!
Stephen
* Stephen Frost (sfrost@snowman.net) wrote:
* Atri Sharma (atri.jiit@gmail.com) wrote:
So the planner keeps all possibility satisfying plans, or it looks at the
possible conditions (like presence of foreign key for this case, for eg)
and then lets executor choose between them?
Right, this was one of the thoughts that I had.
Erm, "I had also". Don't mean to imply that it was all my idea or
something silly like that.
Thanks,
Stephen
On 2014-12-03 11:30:32 +0000, Simon Riggs wrote:
I guess we need an Option node. Tom and I discussed that about an aeon ago.
The Option node has a plan for each situation. At execution time, we
make the test specified in the plan and then select the appropriate
subplan.
That way we can see what is happening in the plan and the executor
doesn't need to edit anything.
Given David's result where he noticed a performance impact due to the
additional branch in the join code - which I still have a bit of a hard
time believing - it seems likely that a whole separate node that has to
pass stuff around will be more expensive.
I think the switch would actually have to be done in ExecInitNode() et
al. David, if you essentially take your previous solution and move the
if into ExecInitNode(), does it work well?
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Dec 3, 2014 at 4:29 AM, David Rowley <dgrowleyml@gmail.com> wrote:
*** Method 1: Removing Inner Joins at planning time:
*** Method 2: Marking scans as possibly skippable during planning, and
skipping joins at execution (Andres' method)

*** Method 3: Marking scans as possibly skippable during planning and
removing redundant join nodes at executor startup (Simon's method)
[....]
a. can we invoke the planner during executor init?
I'm pretty sure that we can't safely invoke the planner during
executor startup, and that doing surgery on the plan tree (option #3)
is unsafe also. I'm pretty clear why the latter is unsafe: it might
be a copy of a data structure that's going to be reused. I am less
clear on the specifics of why the former is unsafe, but what I think
it boils down to is that the plan per se needs to be finalized before
we begin execution; any replanning needs to be handled in the
plancache code. I am not sure whether it's feasible to do something
about this at the plancache layer; we have an is_oneshot flag there,
so perhaps one-shot plans could simply test whether there are pending
triggers, and non-oneshot plans could forego the optimization until we
come up with something better.
If that doesn't work for some reason, then I think we basically have
to give up on the idea of replanning if the situation becomes unsafe
between planning and execution. That leaves us with two alternatives.
One is to create a plan incorporating the optimization and another not
incorporating the optimization and decide between them at runtime,
which sounds expensive. The second is to create a plan that
contemplates performing the join and skip the join if it turns out to
be possible, living with the fact that the resulting plan might be
less than optimal - in other words, option #2. I am not sure that's
all that bad. Planning is ALWAYS an exercise in predicting the
future: we use statistics gathered at some point in the past, which
are furthermore imprecise, to predict what will happen if we try to
execute a given plan at some point in the future. Sometimes we are
wrong, but that doesn't prevent us from trying our best to predict
the outcome; so here.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Dec 3, 2014 at 8:32 PM, Stephen Frost <sfrost@snowman.net> wrote:
* Atri Sharma (atri.jiit@gmail.com) wrote:
So the planner keeps all possibility-satisfying plans, or it looks at the
possible conditions (like presence of a foreign key for this case, for eg)
and then lets the executor choose between them?

Right, this was one of the thoughts that I had.
So is the idea essentially making the planner return a set of "best"
plans, one for each condition? Are we assured of their optimality at the
local level, i.e. at each possibility?

We *already* have an idea of there being multiple plans (see
plancache.c).

Thanks for pointing me there.
What I am concerned about is that in this case, the option plans are
competing plans rather than separate plans.
My main concern is that we might not be able to discard plans that we know
are not optimal early in planning. My understanding is that the planner is
aggressive when discarding potential paths. Maintaining them ahead, and
storing and returning them, might have issues, but that is only my thought.
--
Regards,
Atri
*l'apprenant*
On 2014-12-03 10:51:19 -0500, Robert Haas wrote:
On Wed, Dec 3, 2014 at 4:29 AM, David Rowley <dgrowleyml@gmail.com> wrote:
*** Method 1: Removing Inner Joins at planning time:
*** Method 2: Marking scans as possibly skippable during planning, and
skipping joins at execution (Andres' method)

*** Method 3: Marking scans as possibly skippable during planning and
removing redundant join nodes at executor startup (Simon's method)

[....]
a. can we invoke the planner during executor init?
I'm pretty sure that we can't safely invoke the planner during
executor startup, and that doing surgery on the plan tree (option #3)
is unsafe also. I'm pretty clear why the latter is unsafe: it might
be a copy of a data structure that's going to be reused.
We already have a transformation between the plan and execution
tree. I'm right now not seeing why transforming the trees in
ExecInitNode() et al. would be unsafe - it looks fairly simple to
switch between different execution plans there.
Andres Freund
On Wed, Dec 3, 2014 at 10:56 AM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2014-12-03 10:51:19 -0500, Robert Haas wrote:
On Wed, Dec 3, 2014 at 4:29 AM, David Rowley <dgrowleyml@gmail.com> wrote:
*** Method 1: Removing Inner Joins at planning time:
*** Method 2: Marking scans as possibly skippable during planning, and
skipping joins at execution (Andres' method)

*** Method 3: Marking scans as possibly skippable during planning and
removing redundant join nodes at executor startup (Simon's method)

[....]
a. can we invoke the planner during executor init?
I'm pretty sure that we can't safely invoke the planner during
executor startup, and that doing surgery on the plan tree (option #3)
is unsafe also. I'm pretty clear why the latter is unsafe: it might
be a copy of a data structure that's going to be reused.

We already have a transformation between the plan and execution
tree.
We do?
I think what we have is a plan tree, which is potentially stored in a
plan cache someplace and thus must be read-only, and a planstate tree,
which contains the stuff that is for this specific execution. There's
probably some freedom to do exciting things in the planstate nodes,
but I don't think you can tinker with the plan itself.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2014-12-03 11:11:49 -0500, Robert Haas wrote:
On Wed, Dec 3, 2014 at 10:56 AM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2014-12-03 10:51:19 -0500, Robert Haas wrote:
On Wed, Dec 3, 2014 at 4:29 AM, David Rowley <dgrowleyml@gmail.com> wrote:
*** Method 1: Removing Inner Joins at planning time:
*** Method 2: Marking scans as possibly skippable during planning, and
skipping joins at execution (Andres' method)

*** Method 3: Marking scans as possibly skippable during planning and
removing redundant join nodes at executor startup (Simon's method)

[....]
a. can we invoke the planner during executor init?
I'm pretty sure that we can't safely invoke the planner during
executor startup, and that doing surgery on the plan tree (option #3)
is unsafe also. I'm pretty clear why the latter is unsafe: it might
be a copy of a data structure that's going to be reused.

We already have a transformation between the plan and execution
tree.

We do?
I think what we have is a plan tree, which is potentially stored in a
plan cache someplace and thus must be read-only, and a planstate tree,
which contains the stuff that is for this specific execution. There's
probably some freedom to do exciting things in the planstate nodes,
but I don't think you can tinker with the plan itself.
Well, the planstate tree is what determines the execution, right? I
don't see what would stop us from doing something like replacing:
PlanState *
ExecInitNode(Plan *node, EState *estate, int eflags)
{
...
case T_NestLoop:
result = (PlanState *) ExecInitNestLoop((NestLoop *) node,
estate, eflags);
by
case T_NestLoop:
if (JoinCanBeSkipped(node))
result = NonSkippedJoinNode(node);
else
result = (PlanState *) ExecInitNestLoop((NestLoop *) node,
estate, eflags);
Where JoinCanBeSkipped() and NonSkippedJoinNode() contain the logic
from David's early patch where he put the logic entirely into the actual
execution phase.
We'd probably want to move the join nodes into a separate ExecInitJoin()
function and do the JoinCanBeSkipped() and NonSkippedJoinNode() calls in
the generic code.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
* Atri Sharma (atri.jiit@gmail.com) wrote:
What I am concerned about is that in this case, the option plans are
competing plans rather than separate plans.
Not sure I follow this thought entirely... The plans in the plancache
are competing, but separate, plans.
My main concern is that we might not be able to discard plans that we know
are not optimal early in planning. My understanding is that the planner is
aggressive when discarding potential paths. Maintaining them ahead, and
storing and returning them, might have issues, but that is only my thought.
The planner is aggressive at discarding potential paths, but this is all
a consideration for how expensive this particular optimization is, not
an issue with the approach itself. We certainly don't want an
optimization that doubles the time for 100% of queries planned but only
saves time in 5% of the cases, but if we can spend an extra 5% of the
time required for planning in the 1% of cases where the optimization
could possibly happen to save a huge amount of time for those queries,
then it's something to consider.
We would definitely want to spend as little time as possible checking
for this optimization in cases where it isn't possible to use the
optimization.
Thanks,
Stephen
On Wed, Dec 3, 2014 at 11:23 AM, Andres Freund <andres@2ndquadrant.com> wrote:
Well, the planstate tree is what determines the execution, right? I
don't see what would stop us from doing something like replacing:
PlanState *
ExecInitNode(Plan *node, EState *estate, int eflags)
{
...
case T_NestLoop:
result = (PlanState *) ExecInitNestLoop((NestLoop *) node,
estate, eflags);
by
case T_NestLoop:
if (JoinCanBeSkipped(node))
result = NonSkippedJoinNode(node);
else
result = (PlanState *) ExecInitNestLoop((NestLoop *) node,
estate, eflags);

Where JoinCanBeSkipped() and NonSkippedJoinNode() contain the logic
from David's early patch where he put the logic entirely into the actual
execution phase.
Yeah, maybe. I think there's sort of a coding principle that the plan
and planstate trees should match up one-to-one, but it's possible that
nothing breaks if they don't, or that I've misunderstood the coding
rule in the first instance.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company