Re: CTE optimization fence on the todo list?

Started by Chris Rogersover 10 years ago19 messages

Chris Rogers

teukros@gmail.com

over 10 years ago

Has there been any movement on this in the last couple years?

I could really use the ability to optimize across CTE boundaries, and it
seems like a lot of other people could too.

Robert Haas

robertmhaas@gmail.com

over 10 years ago

In reply to: Chris Rogers (#1)

On Thu, Apr 30, 2015 at 12:44 AM, Chris Rogers <teukros@gmail.com> wrote:

Has there been any movement on this in the last couple years?

I could really use the ability to optimize across CTE boundaries, and it
seems like a lot of other people could too.

I'm not aware that anyone is working on it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Jim Nasby

Jim.Nasby@BlueTreble.com

over 10 years ago

In reply to: Robert Haas (#2)

On 4/30/15 6:35 AM, Robert Haas wrote:

On Thu, Apr 30, 2015 at 12:44 AM, Chris Rogers <teukros@gmail.com> wrote:

Has there been any movement on this in the last couple years?

I could really use the ability to optimize across CTE boundaries, and it
seems like a lot of other people could too.

I'm not aware that anyone is working on it.

ISTR a comment to the effect of the SQL standard effectively requires
current behavior.

I'd still love to see a way around that though, even if it means some
kind of additional syntax; WITH is a lot nicer way to factor a query
than 10 nested subselects...
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Robert Haas

robertmhaas@gmail.com

over 10 years ago

In reply to: Jim Nasby (#3)

On Fri, May 1, 2015 at 4:53 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

ISTR a comment to the effect of the SQL standard effectively requires
current behavior.

I'd be astonished. The SQL standard doesn't even know that there is
such a thing as an index, so I presume it doesn't dictate the behavior
of the query planner either.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Tom Lane

tgl@sss.pgh.pa.us

over 10 years ago

In reply to: Jim Nasby (#3)

Jim Nasby <Jim.Nasby@BlueTreble.com> writes:

On 4/30/15 6:35 AM, Robert Haas wrote:

On Thu, Apr 30, 2015 at 12:44 AM, Chris Rogers <teukros@gmail.com> wrote:

I could really use the ability to optimize across CTE boundaries, and it
seems like a lot of other people could too.

I'm not aware that anyone is working on it.

ISTR a comment to the effect of the SQL standard effectively requires
current behavior.

I doubt that the spec says anything about it one way or another.
However, there are a lot of cases where we definitely can't push
constraints into a WITH:
* Data-modifying query in the WITH, eg UPDATE RETURNING --- pushing
outer constraints into it would change the set of rows updated.
* Multiply-referenced WITH item (unless the outer query applies
identical constraints to each reference, which seems silly and not
worth the cycles to check for).
* Recursive WITH item (well, maybe in some cases you could push down a
clause and not change the results, but it seems very hard to analyze).

So initially we just punted and didn't consider flattening WITHs at
all. I'm not sure to what extent people are now expecting that behavior
and would be annoyed if we changed it.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Peter Geoghegan

pg@heroku.com

over 10 years ago

In reply to: Robert Haas (#4)

On Fri, May 1, 2015 at 2:36 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, May 1, 2015 at 4:53 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

ISTR a comment to the effect of the SQL standard effectively requires
current behavior.

I'd be astonished. The SQL standard doesn't even know that there is
such a thing as an index, so I presume it doesn't dictate the behavior
of the query planner either.

I agree. Somehow, the idea that this is within the standard caught on,
but I'm almost certain it's false.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

David Steele

david@pgmasters.net

over 10 years ago

In reply to: Tom Lane (#5)

On 5/1/15 5:39 PM, Tom Lane wrote:

Jim Nasby <Jim.Nasby@BlueTreble.com> writes:

On 4/30/15 6:35 AM, Robert Haas wrote:

On Thu, Apr 30, 2015 at 12:44 AM, Chris Rogers <teukros@gmail.com> wrote:

I could really use the ability to optimize across CTE boundaries, and it
seems like a lot of other people could too.

I'm not aware that anyone is working on it.

ISTR a comment to the effect of the SQL standard effectively requires
current behavior.

I doubt that the spec says anything about it one way or another.
However, there are a lot of cases where we definitely can't push
constraints into a WITH:
* Data-modifying query in the WITH, eg UPDATE RETURNING --- pushing
outer constraints into it would change the set of rows updated.
* Multiply-referenced WITH item (unless the outer query applies
identical constraints to each reference, which seems silly and not
worth the cycles to check for).
* Recursive WITH item (well, maybe in some cases you could push down a
clause and not change the results, but it seems very hard to analyze).

So initially we just punted and didn't consider flattening WITHs at
all. I'm not sure to what extent people are now expecting that behavior
and would be annoyed if we changed it.

I use CTEs for both organizational purposes and as optimization barriers
(in preference to using temp tables, when possible).

I'd definitely prefer to keep the barriers in place by default, perhaps
with a keyword to allow optimization across boundaries when appropriate.
However, when I really need optimization across boundaries I just use a
subquery.

It doesn't seem like there's much to be gained in terms of net
functionality.

--
- David Steele
david@pgmasters.net

Tom Lane

tgl@sss.pgh.pa.us

over 10 years ago

In reply to: David Steele (#7)

David Steele <david@pgmasters.net> writes:

On 5/1/15 5:39 PM, Tom Lane wrote:

I doubt that the spec says anything about it one way or another.
However, there are a lot of cases where we definitely can't push
constraints into a WITH:
* Data-modifying query in the WITH, eg UPDATE RETURNING --- pushing
outer constraints into it would change the set of rows updated.
* Multiply-referenced WITH item (unless the outer query applies
identical constraints to each reference, which seems silly and not
worth the cycles to check for).
* Recursive WITH item (well, maybe in some cases you could push down a
clause and not change the results, but it seems very hard to analyze).

So initially we just punted and didn't consider flattening WITHs at
all. I'm not sure to what extent people are now expecting that behavior
and would be annoyed if we changed it.

I use CTEs for both organizational purposes and as optimization barriers
(in preference to using temp tables, when possible).

I'd definitely prefer to keep the barriers in place by default, perhaps
with a keyword to allow optimization across boundaries when appropriate.
However, when I really need optimization across boundaries I just use a
subquery.

FWIW, a bit of thought suggests that it would not take a lot of code to
handle this: you'd just have to check the conditions mentioned above
and then convert the RTE_CTE item into an RTE_SUBQUERY, much like
inline_set_returning_functions does with RTE_FUNCTION items.

Assuming that that sketch is accurate, it would take more code to provide
a new user-visible knob to enable/disable the behavior than it would to
implement the optimization, which makes me pretty much -1 on providing
such a knob. We should either do it or not. If we do, people who want
optimization fences should use the traditional "OFFSET 0" hack.

(A possible compromise position would be to offer a new GUC to
enable/disable the optimization globally; that would add only a reasonably
small amount of control code, and people who were afraid of the change
breaking their apps would probably want a global disable anyway.)

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Peter Geoghegan

pg@heroku.com

over 10 years ago

In reply to: Tom Lane (#8)

On Fri, May 1, 2015 at 3:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Assuming that that sketch is accurate, it would take more code to provide
a new user-visible knob to enable/disable the behavior than it would to
implement the optimization, which makes me pretty much -1 on providing
such a knob. We should either do it or not. If we do, people who want
optimization fences should use the traditional "OFFSET 0" hack.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10

David Steele

david@pgmasters.net

over 10 years ago

In reply to: Peter Geoghegan (#9)

On 5/1/15 6:32 PM, Peter Geoghegan wrote:

On Fri, May 1, 2015 at 3:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Assuming that that sketch is accurate, it would take more code to provide
a new user-visible knob to enable/disable the behavior than it would to
implement the optimization, which makes me pretty much -1 on providing
such a knob. We should either do it or not. If we do, people who want
optimization fences should use the traditional "OFFSET 0" hack.

+1

Not sure if I'm thrilled with the "OFFSET 0" hack but I guess it's not
much different from the CTE hack I've been using.

An "enable_cte_optimization" GUC would serve to keep old code from
breaking while giving new users/queries the advantage of optimization.

I'm not sure it's worth adding the complexity, though. In my experience
not that many developers use CTEs.

--
- David Steele
david@pgmasters.net

#11

Josh Berkus

josh@agliodbs.com

over 10 years ago

In reply to: Chris Rogers (#1)

On 05/01/2015 03:30 PM, Tom Lane wrote:

Assuming that that sketch is accurate, it would take more code to provide
a new user-visible knob to enable/disable the behavior than it would to
implement the optimization, which makes me pretty much -1 on providing
such a knob. We should either do it or not. If we do, people who want
optimization fences should use the traditional "OFFSET 0" hack.

Yes.

(A possible compromise position would be to offer a new GUC to
enable/disable the optimization globally; that would add only a reasonably
small amount of control code, and people who were afraid of the change
breaking their apps would probably want a global disable anyway.)

We'd need the GUC. I know of a lot of cases where people are using WITH
clauses specifically to override the query planner, and requiring them
to edit all of their queries in order to enable the old behavior would
become an upgrade barrier.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Import Notes

Reply to msg id not found: WMcb1bf737ad9f62bfffe194a78293a06d4924dbd0a6cc2de5061ced0534b644b6e852b0ca1d8115f8c58f5a6a06ba72c5@asav-3.01.com

#12

David G. Johnston

david.g.johnston@gmail.com

over 10 years ago

In reply to: Tom Lane (#8)

On Fri, May 1, 2015 at 3:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Assuming that that sketch is accurate, it would take more code to provide
a new user-visible knob to enable/disable the behavior than it would to
implement the optimization, which makes me pretty much -1 on providing
such a knob. We should either do it or not. If we do, people who want
optimization fences should use the traditional "OFFSET 0" hack.

(A possible compromise position would be to offer a new GUC to
enable/disable the optimization globally; that would add only a reasonably
small amount of control code, and people who were afraid of the change
breaking their apps would probably want a global disable anyway.)

+1 to both. The default should be to allow the user to choose between CTE
and inline subqueries for style reasons alone - as much as possible since
you cannot have a correlated CTE nor a recursive subquery.

Trust in the planner, the planner is good. If it isn't then requiring
OFFSET 0 as the only means to create an optimization fence seems reasonable.

I like the GUC as an cheap means to keep the status-quo for those who
desire it.

While the idea of overriding the status-quo on a per-query basis has some
appeal the apparent cost-benefit ratio doesn't seem convincing.

David J.

#13

Andrew Dunstan

andrew@dunslane.net

over 10 years ago

In reply to: Josh Berkus (#11)

On 05/01/2015 07:24 PM, Josh Berkus wrote:

O

(A possible compromise position would be to offer a new GUC to
enable/disable the optimization globally; that would add only a reasonably
small amount of control code, and people who were afraid of the change
breaking their apps would probably want a global disable anyway.)

We'd need the GUC. I know of a lot of cases where people are using WITH
clauses specifically to override the query planner, and requiring them
to edit all of their queries in order to enable the old behavior would
become an upgrade barrier.

+100

This could be a very bad, almost impossible to catch, behaviour break.
Even if we add the GUC, we're probably going to be imposing very
significant code audit costs on some users.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14

Tom Lane

tgl@sss.pgh.pa.us

over 10 years ago

In reply to: Andrew Dunstan (#13)

Andrew Dunstan <andrew@dunslane.net> writes:

On 05/01/2015 07:24 PM, Josh Berkus wrote:

(A possible compromise position would be to offer a new GUC to
enable/disable the optimization globally; that would add only a reasonably
small amount of control code, and people who were afraid of the change
breaking their apps would probably want a global disable anyway.)

This could be a very bad, almost impossible to catch, behaviour break.
Even if we add the GUC, we're probably going to be imposing very
significant code audit costs on some users.

On what grounds do you claim it'd be a behavior break? It's possible
that the subquery flattening would result in less-desirable plans not
more-desirable ones, but the results should still be correct.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15

Andrew Dunstan

andrew@dunslane.net

over 10 years ago

In reply to: Tom Lane (#14)

On 05/03/2015 11:49 AM, Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

On 05/01/2015 07:24 PM, Josh Berkus wrote:

(A possible compromise position would be to offer a new GUC to
enable/disable the optimization globally; that would add only a reasonably
small amount of control code, and people who were afraid of the change
breaking their apps would probably want a global disable anyway.)

This could be a very bad, almost impossible to catch, behaviour break.
Even if we add the GUC, we're probably going to be imposing very
significant code audit costs on some users.

On what grounds do you claim it'd be a behavior break? It's possible
that the subquery flattening would result in less-desirable plans not
more-desirable ones, but the results should still be correct.

I meant w.r.t. performance. Sorry if that wasn't clear.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16

Jim Nasby

Jim.Nasby@BlueTreble.com

over 10 years ago

In reply to: Andrew Dunstan (#15)

On 5/3/15 11:59 AM, Andrew Dunstan wrote:

On 05/03/2015 11:49 AM, Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

On 05/01/2015 07:24 PM, Josh Berkus wrote:

(A possible compromise position would be to offer a new GUC to
enable/disable the optimization globally; that would add only a
reasonably
small amount of control code, and people who were afraid of the change
breaking their apps would probably want a global disable anyway.)

This could be a very bad, almost impossible to catch, behaviour break.
Even if we add the GUC, we're probably going to be imposing very
significant code audit costs on some users.

On what grounds do you claim it'd be a behavior break? It's possible
that the subquery flattening would result in less-desirable plans not
more-desirable ones, but the results should still be correct.

I meant w.r.t. performance. Sorry if that wasn't clear.

To put this in perspective... I've seen things like this take query
runtime from minutes to multiple hours or worse; bad enough that
"behavior break" becomes a valid description.

We definitely need to highlight this in the release notes, and I think
the GUC would be mandatory.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17

Chris Rogers

teukros@gmail.com

over 10 years ago

In reply to: Jim Nasby (#16)

I need this feature a lot. Can anyone point me to a place in the code
where I can hack together a quick-and-dirty, compatibility-breaking
implementation? Thanks!

On Sun, May 3, 2015 at 10:03 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

Show quoted text

On 5/3/15 11:59 AM, Andrew Dunstan wrote:

On 05/03/2015 11:49 AM, Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

On 05/01/2015 07:24 PM, Josh Berkus wrote:

(A possible compromise position would be to offer a new GUC to

enable/disable the optimization globally; that would add only a
reasonably
small amount of control code, and people who were afraid of the change
breaking their apps would probably want a global disable anyway.)

This could be a very bad, almost impossible to catch, behaviour break.

Even if we add the GUC, we're probably going to be imposing very
significant code audit costs on some users.

On what grounds do you claim it'd be a behavior break? It's possible
that the subquery flattening would result in less-desirable plans not
more-desirable ones, but the results should still be correct.

I meant w.r.t. performance. Sorry if that wasn't clear.

To put this in perspective... I've seen things like this take query
runtime from minutes to multiple hours or worse; bad enough that "behavior
break" becomes a valid description.

We definitely need to highlight this in the release notes, and I think the
GUC would be mandatory.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18

Robert Haas

robertmhaas@gmail.com

over 10 years ago

In reply to: Chris Rogers (#17)

On Wed, May 20, 2015 at 12:58 AM, Chris Rogers <teukros@gmail.com> wrote:

I need this feature a lot. Can anyone point me to a place in the code where
I can hack together a quick-and-dirty, compatibility-breaking
implementation? Thanks!

Does this help?

/messages/by-id/38448.1430519406@sss.pgh.pa.us

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19

Qingqing Zhou

zhouqq.postgres@gmail.com

over 10 years ago

In reply to: Tom Lane (#5)

On Fri, May 1, 2015 at 2:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

* Multiply-referenced WITH item (unless the outer query applies
identical constraints to each reference, which seems silly and not
worth the cycles to check for).

Not sure if I understand this correctly. Look at this query, CTE q is
referenced twice and it is obviously expand it helps:

postgres=# explain with q as (select * from a) select * from q q1 join
q q2 on q1.i=q2.i where q1.i <= 10 and q2.i >=2;
QUERY PLAN
----------------------------------------------------------------------
Nested Loop (cost=1443.59..1526.35 rows=9 width=16)
CTE q
-> Seq Scan on a a_2 (cost=0.00..1443.00 rows=100000 width=8)
-> Index Scan using ai on a (cost=0.29..8.45 rows=9 width=8)
Index Cond: (i <= 10)
-> Index Scan using ai on a a_1 (cost=0.29..8.31 rows=1 width=8)
Index Cond: ((i = a.i) AND (i >= 2))
(7 rows)

Another question is that CTEs might be used as an optimization fence.
Think about a query like this:

WITH q1 as /* 5 table joins */, q2 as /* 5 tables join */, q3 ...,
q4 SELECT ...

If we expand all CTEs, we may end up optimize join with many tables
(could be bad). Or it is possible that users intentionally arrange
join in that way (more or less like hints) to override the optimizer.

We could look at geqo_threshold and decide how shall we expand, but
this may not be better than a GUC variable.

Regards,
Qingqing

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers