LATERAL

Started by Robert Haasover 16 years ago40 messageshackers
Jump to latest
#1Robert Haas
robertmhaas@gmail.com

I've attempted to search the archives for references to the SQL
LATERAL feature, which AIUI is fairly-frequently requested. Most of
the discussion that I've found harks back to 2003, and that seems to
discuss more the need for eventual support of the feature than exactly
what it is or how to implement it. Looking around the web a little, I
found these links:

http://farrago.sourceforge.net/design/CollectionTypes.html
http://iablog.sybase.com/paulley/2008/07/cross-and-outer-apply/

Based on reading through this discussion, it appears that LATERAL is
mostly a bit of syntactic sugar that requests that the parser allow
you to reference tables at the same query level. Assuming that the
necessary executor support were present (which it's currently not),
it's unclear to me why one couldn't simply allow such references
unconditionally. We currently explicitly block such references in
parse_clause.c, but the comments indicate that this is for reasons of
standards-conformance, not semantic ambiguity.

It appears that we're not completely without the ability to handle
queries of this type. For example, this works:

select g, generate_series(1,g) as h from generate_series(1,10) g;

But this doesn't:

select g, h from generate_series(1,10) g, generate_series(1,g) h;

Just for kicks, I tried removing the code that throws a syntax error
on the latter query, which resulted in a core dump inside
ExecEvalVar(), execQual.c:546, trying to deference a TupleTableSlot.
I'm guessing that this is because it can't get access to the right
variables - the plan actually looks quite sane:

rhaas=# explain select g, h from generate_series(1,10) g,
generate_series(1,g) h;
QUERY PLAN
--------------------------------------------------------------------------------
Nested Loop (cost=0.00..22512.50 rows=1000000 width=8)
-> Function Scan on generate_series g (cost=0.00..12.50 rows=1000 width=4)
-> Function Scan on generate_series h (cost=0.00..12.50 rows=1000 width=4)
(3 rows)

The first query just shows up a function scan, which means there's
some projection operation happening here that isn't exposed at the
plan level. I'm guessing that what needs to happen here is that the
planner needs to be taught that LATERAL queries can only be
implemented as a nested loop (right? unless they're not really using
the LATERAL-ness...) and that the executor needs to be taught how to
make the vars for the outer side of a nestloop visible to the inner
side, when necessary.

Has anyone poked at this at all?

...Robert

#2David Fetter
david@fetter.org
In reply to: Robert Haas (#1)
Re: LATERAL

On Sun, Sep 06, 2009 at 11:59:22PM -0400, Robert Haas wrote:

I've attempted to search the archives for references to the SQL
LATERAL feature, which AIUI is fairly-frequently requested.
[snip]
Has anyone poked at this at all?

I believe Andrew (RhodiumToad) Gierth is taking a look at
implementing the full LATERAL feature from SQL:2008.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#3Peter Eisentraut
peter_e@gmx.net
In reply to: Robert Haas (#1)
Re: LATERAL

On Sun, 2009-09-06 at 23:59 -0400, Robert Haas wrote:

Based on reading through this discussion, it appears that LATERAL is
mostly a bit of syntactic sugar that requests that the parser allow
you to reference tables at the same query level. Assuming that the
necessary executor support were present (which it's currently not),
it's unclear to me why one couldn't simply allow such references
unconditionally.

Because joins can be reordered, whereas LATERAL creates a kind of
syntactic sequence point for join reordering. To pick up your example:

But this doesn't [work]:

select g, h from generate_series(1,10) g, generate_series(1,g) h;

You need to constrain the order of the from items in some way so the "g"
refers to something well-defined. That's what LATERAL does.

You could argue that the parser could infer the references and the
resultant join ordering restrictions automatically, but perhaps it was
deemed that an explicit specification would be less error-prone.

#4Robert Haas
robertmhaas@gmail.com
In reply to: Peter Eisentraut (#3)
Re: LATERAL

On Mon, Sep 7, 2009 at 3:43 AM, Peter Eisentraut<peter_e@gmx.net> wrote:

On Sun, 2009-09-06 at 23:59 -0400, Robert Haas wrote:

Based on reading through this discussion, it appears that LATERAL is
mostly a bit of syntactic sugar that requests that the parser allow
you to reference tables at the same query level.  Assuming that the
necessary executor support were present (which it's currently not),
it's unclear to me why one couldn't simply allow such references
unconditionally.

Because joins can be reordered, whereas LATERAL creates a kind of
syntactic sequence point for join reordering.  To pick up your example:

But this doesn't [work]:

select g, h from generate_series(1,10) g, generate_series(1,g) h;

You need to constrain the order of the from items in some way so the "g"
refers to something well-defined.  That's what LATERAL does.

I don't think so. All joins constrain the join order, but none of
them except FULL JOIN constrain it completely, and this is no
exception. Consider this query:

select * from foo f, generate_series(1, 10) g, generate_series(1, g) h
WHERE f.x = g;

There are two legal join orderings here: f {g h} and {f g} h, just as
there would be for a three-way inner join:

select * from foo f, goo g, hoo h WHERE f.x = g.x and g.x = h.x;

I believe that the join-order restrictions imposed by a LATERAL SRF
are identical to those that would be imposed by an inner join against
a table with join clauses referencing the same tables used in the
inputs to the SRF. What is different is that there is only one
possible method of implementing the join, namely, for each outer row,
evaluate the SRF. We can't hash or merge, and we can't swap the inner
and outer rels, but we do still have a choice as to whether to join
against h before or after joining against f.

You could argue that the parser could infer the references and the
resultant join ordering restrictions automatically, but perhaps it was
deemed that an explicit specification would be less error-prone.

Well, the irony is that our current code does already infer the
references - and then disallows them. It seems to me that if we
implement LATERAL(), we're basically going to be conditionalizing
those checks on whether the relevant subexpression has been wrapped in
LATERAL or not. Introducing a new keyword would make sense if it were
possible to write a query that is valid without LATERAL(), but means
something different with LATERAL() - but I'm currently unable to
devise such a scenario. Whether this is for want of creativity or
because none exists I'm not entirely sure. Any ideas?

...Robert

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#4)
Re: LATERAL

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Sep 7, 2009 at 3:43 AM, Peter Eisentraut<peter_e@gmx.net> wrote:

You could argue that the parser could infer the references and the
resultant join ordering restrictions automatically, but perhaps it was
deemed that an explicit specification would be less error-prone.

Well, the irony is that our current code does already infer the
references - and then disallows them.

Because as often as not, they're mistakes. Please don't think
you're smarter than the spec here.

regards, tom lane

#6Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#5)
Re: LATERAL

On Mon, Sep 7, 2009 at 7:47 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Sep 7, 2009 at 3:43 AM, Peter Eisentraut<peter_e@gmx.net> wrote:

You could argue that the parser could infer the references and the
resultant join ordering restrictions automatically, but perhaps it was
deemed that an explicit specification would be less error-prone.

Well, the irony is that our current code does already infer the
references - and then disallows them.

Because as often as not, they're mistakes.  Please don't think
you're smarter than the spec here.

You're frequently the first to criticize the spec, but I have no
interest in second-guessing whatever behavior the spec specifies for
this construct. I'm just trying to understand it, and as far as I can
tell, LATERAL() is just a piece of syntactic sugar that allows
expressions within to reference FROM items at the same query level.
As far as I can tell, it doesn't change the semantic of any legal
queries - it just renders legal notation that otherwise would be
rejected. But is my understanding correct?

I haven't got a copy of the spec, so that's a bit of a handicap. If
someone who does can look this up and comment on how it's supposed to
work, I would certainly appreciate that. My understanding of it is
currently based on random articles cherry-picked around the Internet
and a handful of emails from archives.postgresql.org, which seems a
little thin.

...Robert

#7Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#6)
Re: LATERAL

Robert Haas escribi�:

I haven't got a copy of the spec, so that's a bit of a handicap. If
someone who does can look this up and comment on how it's supposed to
work, I would certainly appreciate that. My understanding of it is
currently based on random articles cherry-picked around the Internet
and a handful of emails from archives.postgresql.org, which seems a
little thin.

http://wiki.postgresql.org/wiki/Developer_FAQ#Where_can_I_get_a_copy_of_the_SQL_standards.3F

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#8Stephen Frost
sfrost@snowman.net
In reply to: Robert Haas (#6)
Re: LATERAL

Robert,

* Robert Haas (robertmhaas@gmail.com) wrote:

On Mon, Sep 7, 2009 at 7:47 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Because as often as not, they're mistakes.  Please don't think
you're smarter than the spec here.

You're frequently the first to criticize the spec, but I have no
interest in second-guessing whatever behavior the spec specifies for
this construct.

I'm not entirely sure you followed what Tom was getting at here. If you
did, feel free to ignore me.

I'm just trying to understand it, and as far as I can
tell, LATERAL() is just a piece of syntactic sugar that allows
expressions within to reference FROM items at the same query level.

What I'm gathering is that this may be correct, though I don't know for
sure. The point I think Tom was making is that even if it *is* just
syntactic sugar, we don't want to allow expressions to reference FROM
items at the same query level *unless* LATERAL is specified. Your
earlier comments sounded like you would want to implement allowing
expressions to refer to FROM items at the same query level without
LATERAL being specified.

I haven't got a copy of the spec, so that's a bit of a handicap. If
someone who does can look this up and comment on how it's supposed to
work, I would certainly appreciate that. My understanding of it is
currently based on random articles cherry-picked around the Internet
and a handful of emails from archives.postgresql.org, which seems a
little thin.

You can get a 'draft' that's very close to the spec pretty easily..
Just do '??sql' in IRC sometime..

Enjoy,

Stephen

#9Robert Haas
robertmhaas@gmail.com
In reply to: Stephen Frost (#8)
Re: LATERAL

On Mon, Sep 7, 2009 at 9:12 PM, Stephen Frost<sfrost@snowman.net> wrote:

Robert,

* Robert Haas (robertmhaas@gmail.com) wrote:

On Mon, Sep 7, 2009 at 7:47 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Because as often as not, they're mistakes.  Please don't think
you're smarter than the spec here.

You're frequently the first to criticize the spec, but I have no
interest in second-guessing whatever behavior the spec specifies for
this construct.

I'm not entirely sure you followed what Tom was getting at here.  If you
did, feel free to ignore me.

I'm just trying to understand it, and as far as I can
tell, LATERAL() is just a piece of syntactic sugar that allows
expressions within to reference FROM items at the same query level.

What I'm gathering is that this may be correct, though I don't know for
sure.  The point I think Tom was making is that even if it *is* just
syntactic sugar, we don't want to allow expressions to reference FROM
items at the same query level *unless* LATERAL is specified.  Your
earlier comments sounded like you would want to implement allowing
expressions to refer to FROM items at the same query level without
LATERAL being specified.

Fair enough. I think I started to drift off in the direction of
making that argument, but it wasn't really my point. The original
point I was trying to make is that we may not need to invent any kind
of new name-resolution or scoping in order to make LATERAL() work.
Instead, LATERAL() can just do everything normally with the exception
of not throwing the following errors when they would otherwise be
thrown:

subquery in FROM cannot refer to other relations of the same query level
function expression in FROM cannot refer to other relations of the
same query level

I'm not sure if I'm right about this, but if I am, it seems likely to
be a pretty straightforward change.

I haven't got a copy of the spec, so that's a bit of a handicap.  If
someone who does can look this up and comment on how it's supposed to
work, I would certainly appreciate that.  My understanding of it is
currently based on random articles cherry-picked around the Internet
and a handful of emails from archives.postgresql.org, which seems a
little thin.

You can get a 'draft' that's very close to the spec pretty easily..
Just do '??sql' in IRC sometime..

Thanks for the tip.

...Robert

#10Stephen Frost
sfrost@snowman.net
In reply to: Robert Haas (#9)
Re: LATERAL

* Robert Haas (robertmhaas@gmail.com) wrote:

Fair enough. I think I started to drift off in the direction of
making that argument, but it wasn't really my point.

To be honest, I'm not sure I agree with Tom here on the value of
requiring a keyword to tell the system that you really mean what you
wrote. On the other hand, it sounds like the spec is pretty clear on
this, and I don't feel we should violate it just because we think it's
being silly on this point.

The original
point I was trying to make is that we may not need to invent any kind
of new name-resolution or scoping in order to make LATERAL() work.
Instead, LATERAL() can just do everything normally with the exception
of not throwing the following errors when they would otherwise be
thrown:

I don't know for sure, but I do hope you're right. I'd certainly love
to be able to do this in general.. There's a number of cases where I've
had to do the hokey-pokey to get around our lack of ability to do this
when using set-returning functions.

I'm not sure if I'm right about this, but if I am, it seems likely to
be a pretty straightforward change.

Please continue to explore it and propose a patch. :)

Thanks,

Stephen

#11Robert Haas
robertmhaas@gmail.com
In reply to: Stephen Frost (#10)
Re: LATERAL

On Mon, Sep 7, 2009 at 10:08 PM, Stephen Frost<sfrost@snowman.net> wrote:

* Robert Haas (robertmhaas@gmail.com) wrote:

Fair enough.  I think I started to drift off in the direction of
making that argument, but it wasn't really my point.

To be honest, I'm not sure I agree with Tom here on the value of
requiring a keyword to tell the system that you really mean what you
wrote.  On the other hand, it sounds like the spec is pretty clear on
this, and I don't feel we should violate it just because we think it's
being silly on this point.

That's pretty much where I am, too, modulo needing to better
understand the aforesaid spec.

The original
point I was trying to make is that we may not need to invent any kind
of new name-resolution or scoping in order to make LATERAL() work.
Instead, LATERAL() can just do everything normally with the exception
of not throwing the following errors when they would otherwise be
thrown:

I don't know for sure, but I do hope you're right.  I'd certainly love
to be able to do this in general..  There's a number of cases where I've
had to do the hokey-pokey to get around our lack of ability to do this
when using set-returning functions.

Me too.

I'm not sure if I'm right about this, but if I am, it seems likely to
be a pretty straightforward change.

Please continue to explore it and propose a patch. :)

Yeah, that's the not-so-easy part. Gotta grok this executor stuff
first before I can even think about implementing this.

...Robert

#12Peter Eisentraut
peter_e@gmx.net
In reply to: Robert Haas (#4)
Re: LATERAL

On mån, 2009-09-07 at 19:06 -0400, Robert Haas wrote:

On Mon, Sep 7, 2009 at 3:43 AM, Peter Eisentraut<peter_e@gmx.net> wrote:

Because joins can be reordered, whereas LATERAL creates a kind of
syntactic sequence point for join reordering. To pick up your example:

But this doesn't [work]:

select g, h from generate_series(1,10) g, generate_series(1,g) h;

You need to constrain the order of the from items in some way so the "g"
refers to something well-defined. That's what LATERAL does.

I don't think so. All joins constrain the join order,

I carefully did not say "constrain the join order" but "constraint the
order of the from items" and "a *syntactic* sequence point". If the
order of the from items is free, then you could also write the above
example as

select g, h from generate_series(1,g) h, generate_series(1,10) g;

but that would no longer be valid with LATERAL in there.

but none of
them except FULL JOIN constrain it completely, and this is no
exception.

FWIW, the spec says somewhere that LATERAL is not allowed at or near an
outer join. I'll have to read up on it again.

#13Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: David Fetter (#2)
Re: LATERAL

"David" == David Fetter <david@fetter.org> writes:

I've attempted to search the archives for references to the SQL
LATERAL feature, which AIUI is fairly-frequently requested.
[snip]
Has anyone poked at this at all?

David> I believe Andrew (RhodiumToad) Gierth is taking a look at
David> implementing the full LATERAL feature from SQL:2008.

I've looked, but I've not actually had time to try any actual work on
implementation, so anyone else who fancies a go shouldn't hesitate.

Just to pick up on some points from the discussion:

1. LATERAL has to be explicit because it changes the scope of
references. For example, in:
... (select ... FROM (select a AS b), (select b)) ...
the "b" in the second subselect could be an outer reference, but
it can't be a reference to the first subquery; whereas in:
... (select ... FROM (select a AS b), LATERAL (select b)) ...
the "b" in the second subselect refers to the result of the first
subselect.

2. LATERAL in general constrains both the join order and the join
plan, assuming any lateral references are actually made.

3. LATERAL specifically IS allowed with left outer joins, though the
syntax productions in the spec are sufficiently obscure that this
isn't obvious. In general there are (as far as I can tell from the
syntax rules) two ways to use it:

SELECT ... FROM foo, LATERAL (bar)

or

SELECT ... FROM foo [LEFT] JOIN LATERAL (bar) ON ...

Note that RIGHT JOIN LATERAL and FULL JOIN LATERAL are expressly excluded
(syntax rule 2 for "<joined table>").

4. LATERAL allows some optimizations that aren't currently done, either
by explicitly rewriting the query, or (in theory) the optimizer itself
could consider a lateral plan (I believe Oracle does this). This would
apply to queries of this form:

SELECT ... FROM t1 LEFT JOIN (t2 JOIN t3 ON (t2.a=t3.a)) on (t1.a=t2.a);

which currently forces the t2/t3 join to occur first even where t1 is
small; this could be rewritten with LATERAL as:

SELECT ...
FROM t1
LEFT JOIN LATERAL (select * from t2 join t3 on (t2.a=t3.a)
where t2.a=t1.a) s
ON true;

--
Andrew (irc:RhodiumToad)

#14Robert Haas
robertmhaas@gmail.com
In reply to: Andrew Gierth (#13)
Re: LATERAL

On Tue, Sep 8, 2009 at 6:29 PM, Andrew
Gierth<andrew@tao11.riddles.org.uk> wrote:

"David" == David Fetter <david@fetter.org> writes:

 >> I've attempted to search the archives for references to the SQL
 >> LATERAL feature, which AIUI is fairly-frequently requested.
 >> [snip]
 >> Has anyone poked at this at all?

 David> I believe Andrew (RhodiumToad) Gierth is taking a look at
 David> implementing the full LATERAL feature from SQL:2008.

I've looked, but I've not actually had time to try any actual work on
implementation, so anyone else who fancies a go shouldn't hesitate.

Thanks for your thoughts - I appreciate it.

Just to pick up on some points from the discussion:

1. LATERAL has to be explicit because it changes the scope of
references.  For example, in:
... (select ... FROM (select a AS b), (select b)) ...
the "b" in the second subselect could be an outer reference, but
it can't be a reference to the first subquery; whereas in:
... (select ... FROM (select a AS b), LATERAL (select b)) ...
the "b" in the second subselect refers to the result of the first
subselect.

Can you provide a more complete example? I'm unable to construct a
working example of this type. For example:

rhaas=# select (select 1 from (select a as b) x, (select b) y) from t1;
ERROR: subquery in FROM cannot refer to other relations of same query
level at character 50

Though this works as expected:
rhaas=# select (select 1 from (select a) x, (select b) y) from t1;

2. LATERAL in general constrains both the join order and the join
plan, assuming any lateral references are actually made.

Peter seemed to be saying that LATERAL() must syntactically follow the
same-level FROM items to which it refers. Is that your understanding
also?

3. LATERAL specifically IS allowed with left outer joins, though the
syntax productions in the spec are sufficiently obscure that this
isn't obvious.  In general there are (as far as I can tell from the
syntax rules) two ways to use it:

SELECT ... FROM foo, LATERAL (bar)

or

SELECT ... FROM foo [LEFT] JOIN LATERAL (bar) ON ...

Note that RIGHT JOIN LATERAL and FULL JOIN LATERAL are expressly excluded
(syntax rule 2 for "<joined table>").

Makes sense to me.

4. LATERAL allows some optimizations that aren't currently done, either
by explicitly rewriting the query, or (in theory) the optimizer itself
could consider a lateral plan (I believe Oracle does this). This would
apply to queries of this form:

SELECT ... FROM t1 LEFT JOIN (t2 JOIN t3 ON (t2.a=t3.a)) on (t1.a=t2.a);

which currently forces the t2/t3 join to occur first even where t1 is
small; this could be rewritten with LATERAL as:

SELECT ...
 FROM t1
      LEFT JOIN LATERAL (select * from t2 join t3 on (t2.a=t3.a)
                                  where t2.a=t1.a) s
      ON true;

Well, you haven't actually commuted the joins here - how do you have
in mind for PostgreSQL to execute this? I'm guessing that it's
something like a nest loop with t1 as the outer side and the lateral
subquery as the inner side, so that the executor repeatedly executes
"select * from t2 join t3 on t2.a = t3.a where t2.a = $1"?

...Robert

#15Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Robert Haas (#14)
Re: LATERAL

"Robert" == Robert Haas <robertmhaas@gmail.com> writes:

Just to pick up on some points from the discussion:

1. LATERAL has to be explicit because it changes the scope of
references.  For example, in:
... (select ... FROM (select a AS b), (select b)) ...
the "b" in the second subselect could be an outer reference, but
it can't be a reference to the first subquery; whereas in:
... (select ... FROM (select a AS b), LATERAL (select b)) ...
the "b" in the second subselect refers to the result of the first
subselect.

Robert> Can you provide a more complete example? I'm unable to
Robert> construct a working example of this type. For example:

Robert> rhaas=# select (select 1 from (select a as b) x, (select b) y) from t1;
Robert> ERROR: subquery in FROM cannot refer to other relations of same query
Robert> level at character 50

That looks like a bug to me. The spec is explicit that the inner definition
of b is not in scope in the second subquery, and therefore that should parse
as an outer reference.

2. LATERAL in general constrains both the join order and the join
plan, assuming any lateral references are actually made.

Robert> Peter seemed to be saying that LATERAL() must syntactically
Robert> follow the same-level FROM items to which it refers. Is that
Robert> your understanding also?

LATERAL references must be to items defined in the same FROM clause and
to the left of the LATERAL.

The relevant language of the spec seems to be:

a) If TR is contained in a <from clause> FC with no intervening
<query expression>, then the scope clause SC of TR is the <select
statement: single row> or innermost <query specification> that
contains FC. The scope of a range variable of TR is the <select
list>, <where clause>, <group by clause>, <having clause>, and
<window clause> of SC, together with every <lateral derived table>
that is simply contained in FC and is preceded by TR, and every
<collection derived table> that is simply contained in FC and is
preceded by TR, and the <join condition> of all <joined table>s
contained in SC that contain TR. If SC is the <query specification>
that is the <query expression body> of a simple table query STQ,
then the scope of a range variable of TR also includes the <order
by clause> of STQ.

4. LATERAL allows some optimizations that aren't currently done, either
by explicitly rewriting the query, or (in theory) the optimizer itself
could consider a lateral plan (I believe Oracle does this). This would
apply to queries of this form:

SELECT ... FROM t1 LEFT JOIN (t2 JOIN t3 ON (t2.a=t3.a)) on (t1.a=t2.a);

which currently forces the t2/t3 join to occur first even where t1 is
small; this could be rewritten with LATERAL as:

SELECT ...
FROM t1
LEFT JOIN LATERAL (select * from t2 join t3 on (t2.a=t3.a)
where t2.a=t1.a) s
ON true;

Robert> Well, you haven't actually commuted the joins here - how do
Robert> you have in mind for PostgreSQL to execute this? I'm
Robert> guessing that it's something like a nest loop with t1 as the
Robert> outer side and the lateral subquery as the inner side, so
Robert> that the executor repeatedly executes
Robert> "select * from t2 join t3 on t2.a = t3.a where t2.a = $1"?

Yup.

The current execution plans for this type of query are completely
disastrous if t1 is small (or qualified so as to be small) and t2 and
t3 are large. Having LATERAL would allow the query to be rewritten to
perform reasonably; a bonus would be for the planner to consider the
lateral join automatically without requiring it to be explicitly
requested.

--
Andrew (irc:RhodiumToad)

#16Robert Haas
robertmhaas@gmail.com
In reply to: Andrew Gierth (#15)
Re: LATERAL

On Wed, Sep 9, 2009 at 11:25 PM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:

 >> 4. LATERAL allows some optimizations that aren't currently done, either
 >> by explicitly rewriting the query, or (in theory) the optimizer itself
 >> could consider a lateral plan (I believe Oracle does this). This would
 >> apply to queries of this form:
 >>
 >> SELECT ... FROM t1 LEFT JOIN (t2 JOIN t3 ON (t2.a=t3.a)) on (t1.a=t2.a);
 >>
 >> which currently forces the t2/t3 join to occur first even where t1 is
 >> small; this could be rewritten with LATERAL as:
 >>
 >> SELECT ...
 >>        FROM t1
 >>        LEFT JOIN LATERAL (select * from t2 join t3 on (t2.a=t3.a)
 >>                            where t2.a=t1.a) s
 >>        ON true;

 Robert> Well, you haven't actually commuted the joins here - how do
 Robert> you have in mind for PostgreSQL to execute this?  I'm
 Robert> guessing that it's something like a nest loop with t1 as the
 Robert> outer side and the lateral subquery as the inner side, so
 Robert> that the executor repeatedly executes
 Robert> "select * from t2 join t3 on t2.a = t3.a where t2.a = $1"?

Yup.

The current execution plans for this type of query are completely
disastrous if t1 is small (or qualified so as to be small) and t2 and
t3 are large. Having LATERAL would allow the query to be rewritten to
perform reasonably; a bonus would be for the planner to consider the
lateral join automatically without requiring it to be explicitly
requested.

I've been turning this one over in my head. It seems to me that this
is very similar to what we already do with inner index-scans, only
generalized to joinrels. When we consider nested loop plans in
match_unsorted_outer(), we really consider two quite different types
of plans - call them parameterized and unparameterized. The
unparameterized variants considered regardless of the details of the
inner plan, and basically rescan the same identical inner side once
per outer row, possibly with a materialize node interposed. The
parameterized variants are what the inner index-scan code is looking
at: they also involve rescanning the inner path, but not the same set
of tuples every time. Instead, we look for cases where an index is
available that can support a lookup of the specific values that the
outer side needs on each pass as an index condition.

Currently, however, we only consider this possibility when the inner
rel is NOT a joinrel. It seems like it might be possible to change
this, but it doesn't look straightforward. When the inner rel is a
baserel, there's really nothing to do except grovel through the
indexes looking for something useful, and then estimate the cost of
using it. When the inner rel is a joinrel, it's a lot more
complicated. A parameterized nested loop might be right even if only
some of the rels making up the joinrel are indexed (imagine t2 IJ t3
IJ t4 where t2 and t3 are large and indexed and t4 is small and
unindexed). In addition, the relations making up the inner rel could
be joined in any order and using a variety of strategies. The path
that is best when trying to suck down the ENTIRE inner rel doesn't
necessarily bear any resemblence to the plan that is best when trying
to suck down only a part of it.

To some extent, this seems like a tangent as far as LATERAL is
concerned: I think the primary reason why people want LATERAL
(certainly, why I want it) is for SRFs, for which there isn't any
choice of execution plan anyway. But it's certainly an interesting
optimization problem, and I wish I had some better ideas on how to
tackle it.

...Robert

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#16)
Re: LATERAL

Robert Haas <robertmhaas@gmail.com> writes:

I've been turning this one over in my head. It seems to me that this
is very similar to what we already do with inner index-scans, only
generalized to joinrels.

Right.

Currently, however, we only consider this possibility when the inner
rel is NOT a joinrel. It seems like it might be possible to change
this, but it doesn't look straightforward.

Well, it's straightforward enough in theory, it's just the exponential
growth in the number of join paths to consider that's a problem :-(.
I think what we'd need is a heuristic to limit the paths considered.

I think Andrew was suggesting that we not attempt to consider this
automatically, but only when the user writes the query in a way that
exposes it directly via LATERAL. While that goes against my normal
instincts for the planner, it isn't unreasonable as a first step.

regards, tom lane

#18Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#17)
Re: LATERAL

On Tue, Sep 22, 2009 at 11:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Currently, however, we only consider this possibility when the inner
rel is NOT a joinrel.  It seems like it might be possible to change
this, but it doesn't look straightforward.

Well, it's straightforward enough in theory, it's just the exponential
growth in the number of join paths to consider that's a problem :-(.
I think what we'd need is a heuristic to limit the paths considered.

[ picking this up again now that CommitFest is over ]

So that I don't have to keep referring to "this possibility" and
similar circumlocution, let's define an index-accelerated path (feel
free to suggest a different term) to be the generalization to joinrels
of what we now call a nested inner index-scan. In other words, they
can only be usefully used on the inner side of a nested loop, where
they'll be rescanned with different parameters for each outer row, and
they can only ever win when some part of the path involves a *partial*
index scan that can used one of the passed parameters as an index
qual.

For a fixed a set of parameters to be pushed down from the outer side,
it seems to me that we could build up a set of index-accelerated paths
for a joinrel {A B} by considering the combination of (1) an
index-accelerated path for A joined to an index-accelerated path for
B, or (2) an index-accelerated path for A joined to the cheapest total
path for B. I believe we don't need to worry about path keys because
this will only ever be used on the inner side of a nested loop, so the
pathkeys will be ignored anyway. In other words, the first heuristic
is that you can't build up an index-accelerated path for the joinrel
unless there is an index-accelerated path for a least one of its
baserels.

That still leaves a lot of silly paths, though. In many cases, if
you're thinking about joining A to {B C} using an index-accelerated
path, you'd be just as well off joining to B first and then to C. So
it might be that we only need to consider index-accelerated paths when
there is no legal join between the LHS and a subset of the RHS. But
I'm not completely sure that I'm right about this, nor whether it's
easy to check for.

The other problem I see here is that the bottom-up approach that we
use in general is going to be difficult to apply here, because the set
of paths will vary depending on what parameters are pushed down from
the outer side.

I think Andrew was suggesting that we not attempt to consider this
automatically, but only when the user writes the query in a way that
exposes it directly via LATERAL.  While that goes against my normal
instincts for the planner, it isn't unreasonable as a first step.

Possibly, but I'm not sure we have a good enough design sketch at this
point to even know whether such a restriction would be useful.

Another thought, only semi-related. One of the big use cases for
LATERAL in general is to use a set-returning function in the FROM
clause that uses vars from a preceding FROM item. I am idly wondering
if there's a reason why ExecProject is not its own node type. It
seems that we have close to the same logic scattered through a whole
bunch of different node types...

...Robert

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#18)
Re: LATERAL

Robert Haas <robertmhaas@gmail.com> writes:

Another thought, only semi-related. One of the big use cases for
LATERAL in general is to use a set-returning function in the FROM
clause that uses vars from a preceding FROM item. I am idly wondering
if there's a reason why ExecProject is not its own node type.

You generally need it everywhere. Scan nodes need it because you don't
want unused columns propagating upwards, and join nodes need it because
the two input tuples have to be unified somehow. We do skip projection
ability in a few node types, but I doubt it would be profitable to
remove it from the rest.

regards, tom lane

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#18)
Re: LATERAL

Robert Haas <robertmhaas@gmail.com> writes:

That still leaves a lot of silly paths, though. In many cases, if
you're thinking about joining A to {B C} using an index-accelerated
path, you'd be just as well off joining to B first and then to C. So
it might be that we only need to consider index-accelerated paths when
there is no legal join between the LHS and a subset of the RHS.

Yeah. If there are no join order constraints, it's always possible to
form a plan such that you join a given rel only once you have all the
rels needed (to provide values for the inner indexscan) on the lefthand
side. I think this is probably the main reason why the issue was not
treated in the planner originally. So maybe the way to think about
this is as a way of dealing with join order constraints without losing
the benefits of inner indexscans.

The other problem I see here is that the bottom-up approach that we
use in general is going to be difficult to apply here, because the set
of paths will vary depending on what parameters are pushed down from
the outer side.

Well, we deal with that already --- the set of possible inner indexscan
paths already varies depending on what the LHS is. I think the point
here is that we'd be considering an inner path that's against an LHS
that it's not legal to join the inner rel to *directly*. Such a path
would only be legal if we later join to that LHS at a higher join level.
So we'd be keeping around some tentative paths that might not ever form
a valid join plan.

Maybe we should turn around the way that inner indexscan paths are
made. Currently we form them on-the-fly while considering a valid
join combination. Maybe we should build them all at the first level
(driving this off the set of available join clauses for each base rel)
and mark each such path as "requires a join to this other set of rels
to be valid". But then we'd go ahead and join such paths to *other*
rels, keeping the resulting join paths still marked as requiring the
same future join. Once that join actually happens, the resulting path
becomes fully valid. Only a join to a proper subset of the future-join
requirement would be disallowed meanwhile.

I'm not even sure this would be slower or more complicated than what we
do now --- if you look at the logic that caches potential inner
indexscan plans, it's almost doing this already. It would result in
considering more join paths, but only ones that have some plausible use.

regards, tom lane

#21Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#20)
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#21)
#23Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#22)
#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#23)
#25Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#24)
#26Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#24)
#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#26)
#28Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Bruce Momjian (#26)
#29Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#24)
#30Hitoshi Harada
umi.tanuki@gmail.com
In reply to: Andrew Gierth (#28)
#31Robert Haas
robertmhaas@gmail.com
In reply to: Hitoshi Harada (#30)
#32Hitoshi Harada
umi.tanuki@gmail.com
In reply to: Robert Haas (#31)
#33Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#29)
#34Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#33)
#35Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#34)
#36Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#35)
#37Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#36)
#38Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#37)
#39Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#38)
#40Bruce Momjian
bruce@momjian.us
In reply to: Robert Haas (#38)