Proposal: relaxing link between explicit JOINs and execution order
There's been some recent discussion about the fact that Postgres treats
explicit JOIN syntax as constraining the actual join plan, cf
http://www.ca.postgresql.org/users-lounge/docs/7.3/postgres/explicit-joins.html
This behavior was originally in there simply because of lack of time to
consider alternatives. I now realize that it wouldn't be hard to get
the planner to do better --- basically, preprocess_jointree just has to
be willing to fold JoinExpr-under-JoinExpr into a FromExpr when the
joins are inner joins.
But in the meantime, some folks have found the present behavior to be
a feature rather than a bug, since it lets them control planning time
on many-table queries. If we are going to change it, I think we need
some way to accommodate both camps.
What I've been toying with is inventing a GUC variable or two. I am
thinking of defining a variable that sets the maximum size of a FromExpr
that preprocess_jointree is allowed to create by folding JoinExprs.
If this were set to 2, the behavior would be the same as before: no
collapsing of JoinExprs can occur. If it were set to a large number,
inner JOIN syntax would never affect the planner at all. In practice
it'd be smart to leave it at some value less than GEQO_THRESHOLD, so
that folding a large number of JOINs wouldn't leave you with a query
that takes a long time to plan or produces unpredictable plans.
There is already a need for a GUC variable to control the existing
behavior of preprocess_jointree: right now, it arbitrarily uses
GEQO_THRESHOLD/2 as the limit for the size of a FromExpr that can be
made by collapsing FromExprs together. This ought to be a separately
settable parameter, I think.
Comments? In particular, can anyone think of pithy names for these
variables? The best I'd been able to come up with is MAX_JOIN_COLLAPSE
and MAX_FROM_COLLAPSE, but neither of these exactly sing...
regards, tom lane
There's been some recent discussion about the fact that Postgres
treats explicit JOIN syntax as constraining the actual join plan, cf
http://www.ca.postgresql.org/users-lounge/docs/7.3/postgres/explicit-joins.htmlThis behavior was originally in there simply because of lack of time
to consider alternatives. I now realize that it wouldn't be hard to
get the planner to do better --- basically, preprocess_jointree just
has to be willing to fold JoinExpr-under-JoinExpr into a FromExpr
when the joins are inner joins.But in the meantime, some folks have found the present behavior to be
a feature rather than a bug, since it lets them control planning time
on many-table queries. If we are going to change it, I think we need
some way to accommodate both camps.
[snip]
Comments? In particular, can anyone think of pithy names for these
variables? The best I'd been able to come up with is
MAX_JOIN_COLLAPSE and MAX_FROM_COLLAPSE, but neither of these
exactly sing...
How about something that's runtime tunable via a SET/SHOW config var?
There are some queries that I have that I haven't spent any time
tuning and would love to have the planner spend its CPU thinking about
it instead of mine. Setting it to 2 by default, then on my tuned
queries, setting to something obscenely high so the planner won't muck
with what I know is fastest (or so I think at least).
I know this is a can of worms, but what about piggy backing on an
Oracle notation and having an inline way of setting this inside of a
comment?
SELECT /* +planner:collapse_tables=12 */ .... ?
^^^^^^^ ^^^^^^^^^^^^^^^ ^^^
system variable value
::shrug:: In brainstorm mode. Anyway, a few names:
auto_order_join
auto_order_join_max
auto_reorder_table_limit
auto_collapse_join
auto_collapse_num_join
auto_join_threshold
When I'm thinking about what this variable will do for me as a DBA, I
think it will make the plan more intelligent by reordering the joins.
My $0.02. -sc
--
Sean Chittenden
Sean Chittenden <sean@chittenden.org> writes:
How about something that's runtime tunable via a SET/SHOW config var?
Er, that's what I was talking about.
I know this is a can of worms, but what about piggy backing on an
Oracle notation and having an inline way of setting this inside of a
comment?
I don't want to go there ...
regards, tom lane
Tom,
I am very strongly in favor of this idea. I would personally prefer it if
the Join collapsing parmeter could be set at query time through a SET
statement, but will of course defer to the difficulty level in doing so.
Comments? In particular, can anyone think of pithy names for these
variables? The best I'd been able to come up with is MAX_JOIN_COLLAPSE
and MAX_FROM_COLLAPSE, but neither of these exactly sing...
How about:
EXPLICIT_JOIN_MINIMUM
and
FROM_COLLAPSE_LIMIT
Just to make the two params not sound so identical?
--
-Josh Berkus
Aglio Database Solutions
San Francisco
Josh Berkus <josh@agliodbs.com> writes:
I am very strongly in favor of this idea. I would personally prefer it if
the Join collapsing parmeter could be set at query time through a SET
statement, but will of course defer to the difficulty level in doing so.
I guess I failed to make it clear that that's what I meant. GUC
variables are those things that you can set via SET, or in the
postgresql.conf file, etc. These values would be just as manipulable
as, say, ENABLE_SEQSCAN.
How about:
EXPLICIT_JOIN_MINIMUM
and
FROM_COLLAPSE_LIMIT
Just to make the two params not sound so identical?
Hmm. The two parameters would have closely related functions, so I'd
sort of think that the names *should* be pretty similar.
regards, tom lane
Josh Berkus <josh@agliodbs.com> writes:
How about:
EXPLICIT_JOIN_MINIMUM
and
FROM_COLLAPSE_LIMIT
I've implemented this using FROM_COLLAPSE_LIMIT and JOIN_COLLAPSE_LIMIT
as the variable names. It'd be easy enough to change if someone comes
up with better names. You can read updated documentation at
http://developer.postgresql.org/docs/postgres/explicit-joins.html
regards, tom lane