Early WIP/PoC for inlining CTEs
About a year ago I was briefly in discussion/collaboration with Adam Sah
regarding the topic of inlining CTEs into the query rather than treating
them as optimization barriers. We didn't take it very far (he sent me
some stuff, I wrote some stuff and sent it back, things kind of got
dropped at that point); but there's been some recent discussion of this
and some people have expressed an interest in seeing the code.
So I'm posting the parts that I wrote for the benefit of anyone wanting
to pick up the issue again. The assumption of this code is that some
form of syntax would exist to mark materialized CTEs and set the
"ctematerialized" flag.
I haven't rebased this or tested it since last year; this patch is
against b81eba6a65.
Posted for discussion, further development, criticism, whatever; feel
free to include this (with credit) in any relevant patch. Consider this
released under the PG license.
--
Andrew (irc:RhodiumToad)
Attachments:
cteinline.patchtext/x-patchDownload+157-0
On Tue, Jul 24, 2018 at 5:28 PM Andrew Gierth <andrew@tao11.riddles.org.uk>
wrote:
About a year ago I was briefly in discussion/collaboration with Adam Sah
regarding the topic of inlining CTEs into the query rather than treating
them as optimization barriers. We didn't take it very far (he sent me
some stuff, I wrote some stuff and sent it back, things kind of got
dropped at that point); but there's been some recent discussion of this
and some people have expressed an interest in seeing the code.So I'm posting the parts that I wrote for the benefit of anyone wanting
to pick up the issue again. The assumption of this code is that some
form of syntax would exist to mark materialized CTEs and set the
"ctematerialized" flag.I haven't rebased this or tested it since last year; this patch is
against b81eba6a65.Posted for discussion, further development, criticism, whatever; feel
free to include this (with credit) in any relevant patch. Consider this
released under the PG license.--
Andrew (irc:RhodiumToad)In our environment we often want this to be a fence. For example it can
be used to only have smaller numbers of joins in each cte and not hit the
join collapse limit, or when we really know more about the subquery than
the optimizer and have something really specific there . So in general I
would not want the default functionality to change all of the queries we
have already written with this in mind. I do however like the idea of this
feature being an option, but I would question whether it perhaps worked the
other way around where you have to mark a CTE as not being a fence.
Curious what other RDBMSs do here?
Thanks,
Jeremy
On 2018-07-24 18:03:43 -0500, Jeremy Finzel wrote:
On Tue, Jul 24, 2018 at 5:28 PM Andrew Gierth <andrew@tao11.riddles.org.uk>
wrote:About a year ago I was briefly in discussion/collaboration with Adam Sah
regarding the topic of inlining CTEs into the query rather than treating
them as optimization barriers. We didn't take it very far (he sent me
some stuff, I wrote some stuff and sent it back, things kind of got
dropped at that point); but there's been some recent discussion of this
and some people have expressed an interest in seeing the code.So I'm posting the parts that I wrote for the benefit of anyone wanting
to pick up the issue again. The assumption of this code is that some
form of syntax would exist to mark materialized CTEs and set the
"ctematerialized" flag.I haven't rebased this or tested it since last year; this patch is
against b81eba6a65.Posted for discussion, further development, criticism, whatever; feel
free to include this (with credit) in any relevant patch. Consider this
released under the PG license.--
Andrew (irc:RhodiumToad)In our environment we often want this to be a fence. For example it can
be used to only have smaller numbers of joins in each cte and not hit the
join collapse limit, or when we really know more about the subquery than
the optimizer and have something really specific there . So in general I
would not want the default functionality to change all of the queries we
have already written with this in mind. I do however like the idea of this
feature being an option, but I would question whether it perhaps worked the
other way around where you have to mark a CTE as not being a fence.
This essentially has been discussed already:
http://archives.postgresql.org/message-id/5351711493487900%40web53g.yandex.ru
My read of the concensus (in which I am in the majority, so I might be
biased) is that we do want inlining to be the default. We were thinking
that it'd be necessary to provide a way to force inlining on the SQL
level for individual CTEs.
Curious what other RDBMSs do here?
They largely inline by default.
Greetings,
Andres Freund
"Andres" == Andres Freund <andres@anarazel.de> writes:
Andres> My read of the concensus (in which I am in the majority, so I
Andres> might be biased) is that we do want inlining to be the default.
Andres> We were thinking that it'd be necessary to provide a way to
Andres> force inlining on the SQL level for individual CTEs.
For the record, here is where it came up in the original discussion
exactly 10 years ago when the feature was being added:
/messages/by-id/87myk1rg4z.fsf@news-spur.riddles.org.uk
--
Andrew (irc:RhodiumToad)
Andres Freund <andres@anarazel.de> writes:
On 2018-07-24 18:03:43 -0500, Jeremy Finzel wrote:
On Tue, Jul 24, 2018 at 5:28 PM Andrew Gierth <andrew@tao11.riddles.org.uk>
wrote:Posted for discussion, further development, criticism, whatever; feel
free to include this (with credit) in any relevant patch. Consider this
released under the PG license.
In our environment we often want this to be a fence. For example it can
be used to only have smaller numbers of joins in each cte and not hit the
join collapse limit, or when we really know more about the subquery than
the optimizer and have something really specific there . So in general I
would not want the default functionality to change all of the queries we
have already written with this in mind. I do however like the idea of this
feature being an option, but I would question whether it perhaps worked the
other way around where you have to mark a CTE as not being a fence.
This essentially has been discussed already:
http://archives.postgresql.org/message-id/5351711493487900%40web53g.yandex.ru
My read of the concensus (in which I am in the majority, so I might be
biased) is that we do want inlining to be the default. We were thinking
that it'd be necessary to provide a way to force inlining on the SQL
level for individual CTEs.
We can't inline wCTEs (those containing insert/update/delete) without
risk of semantics change. I'd also not favor changing the semantics
for CTEs that are read more than once by the parent query. However,
a singly-referenced SELECT CTE could reasonably be treated as equivalent
to a sub-select-in-FROM, and then you would have the same mechanisms
for preventing inlining as you do for those cases, e.g. OFFSET 0.
And sticking in OFFSET 0 would be backwards-compatible too: your
code would still work the same in older releases, unlike if we invent
new syntax for this.
So if we're going to go in this direction, that's pretty much how
I'd envision it working.
Curious what other RDBMSs do here?
They largely inline by default.
Even for multi-referenced CTEs?
regards, tom lane
On 25/07/18 11:10, Andres Freund wrote:
On 2018-07-24 18:03:43 -0500, Jeremy Finzel wrote:
On Tue, Jul 24, 2018 at 5:28 PM Andrew Gierth <andrew@tao11.riddles.org.uk>
wrote:
[...]
In our environment we often want this to be a fence. For example it can
[...]
This essentially has been discussed already:
http://archives.postgresql.org/message-id/5351711493487900%40web53g.yandex.ruMy read of the concensus (in which I am in the majority, so I might be
biased) is that we do want inlining to be the default. We were thinking
that it'd be necessary to provide a way to force inlining on the SQL
level for individual CTEs.Curious what other RDBMSs do here?
They largely inline by default.
Greetings,
Andres Freund
If I'd not read anything about CTE's being a fence, I would have
implicitly assumed that they were optimised together with the main part
of the SQL statement, and I suspect that is the case for most people.
So I'm very much a favour of optimisation of CTE's being the default.
Cheers,
Gavin
Hi,
On 2018-07-24 19:49:19 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
On 2018-07-24 18:03:43 -0500, Jeremy Finzel wrote:
On Tue, Jul 24, 2018 at 5:28 PM Andrew Gierth <andrew@tao11.riddles.org.uk>
wrote:Posted for discussion, further development, criticism, whatever; feel
free to include this (with credit) in any relevant patch. Consider this
released under the PG license.In our environment we often want this to be a fence. For example it can
be used to only have smaller numbers of joins in each cte and not hit the
join collapse limit, or when we really know more about the subquery than
the optimizer and have something really specific there . So in general I
would not want the default functionality to change all of the queries we
have already written with this in mind. I do however like the idea of this
feature being an option, but I would question whether it perhaps worked the
other way around where you have to mark a CTE as not being a fence.This essentially has been discussed already:
http://archives.postgresql.org/message-id/5351711493487900%40web53g.yandex.ru
My read of the concensus (in which I am in the majority, so I might be
biased) is that we do want inlining to be the default. We were thinking
that it'd be necessary to provide a way to force inlining on the SQL
level for individual CTEs.We can't inline wCTEs (those containing insert/update/delete) without
risk of semantics change.
Right.
I'd also not favor changing the semantics for CTEs that are read more
than once by the parent query.
I think medium term it'd be good to do a cost based analysis for that. I
think it's fine to not do that immediately, but we should imo keep that
in mind.
However, a singly-referenced SELECT CTE could reasonably be treated as
equivalent to a sub-select-in-FROM, and then you would have the same
mechanisms for preventing inlining as you do for those cases,
e.g. OFFSET 0. And sticking in OFFSET 0 would be backwards-compatible
too: your code would still work the same in older releases, unlike if
we invent new syntax for this.
I still think this is just doubling down on prior mistakes.
Curious what other RDBMSs do here?
They largely inline by default.
Even for multi-referenced CTEs?
I don't know.
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
On 2018-07-24 19:49:19 -0400, Tom Lane wrote:
However, a singly-referenced SELECT CTE could reasonably be treated as
equivalent to a sub-select-in-FROM, and then you would have the same
mechanisms for preventing inlining as you do for those cases,
e.g. OFFSET 0. And sticking in OFFSET 0 would be backwards-compatible
too: your code would still work the same in older releases, unlike if
we invent new syntax for this.
I still think this is just doubling down on prior mistakes.
Not following what you think a better alternative is? I'd be the
first to agree that OFFSET 0 is a hack, but people are used to it.
Assuming that we go for inline-by-default for at least some cases,
there's a separate discussion to be had about whether it's worth
making a planner-control GUC to force the old behavior. I'm not
very excited about that, but I bet some people will be.
regards, tom lane
Hi,
On 2018-07-24 19:57:49 -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
On 2018-07-24 19:49:19 -0400, Tom Lane wrote:
However, a singly-referenced SELECT CTE could reasonably be treated as
equivalent to a sub-select-in-FROM, and then you would have the same
mechanisms for preventing inlining as you do for those cases,
e.g. OFFSET 0. And sticking in OFFSET 0 would be backwards-compatible
too: your code would still work the same in older releases, unlike if
we invent new syntax for this.I still think this is just doubling down on prior mistakes.
Not following what you think a better alternative is?
An explicit keyword controlling the behaviour. WITH ... foo AS
MATERIALIZED (query) or whatnot.
I'd be the first to agree that OFFSET 0 is a hack, but people are used
to it.
So are they to CTEs being blocking.
Consider e.g. the case of being able to push down constraints into CTEs
(which have been inlined to allow for that). Even in queries with a
non-0 OFFSET you can push down in a number of cases, making CTE inlining
useful. You can't tack on an OFFSET 0 controlling that, without going
for a superflous subquery that just has an OFFSET 0, which is fairly
ridiculous. What if we learn to inline subqueries with some offsets?
Assuming that we go for inline-by-default for at least some cases,
there's a separate discussion to be had about whether it's worth
making a planner-control GUC to force the old behavior. I'm not
very excited about that, but I bet some people will be.
Yea, I am not either. I think we shouldn't go for it.
Greetings,
Andres Freund
"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
Tom> We can't inline wCTEs (those containing insert/update/delete)
Tom> without risk of semantics change.
Clearly.
Tom> I'd also not favor changing the semantics for CTEs that are read
Tom> more than once by the parent query.
This one's more debatable. There will still be cases where a CTE
referenced multiple times will be better inlined.
(It's obviously trivial to make the posted code do it that way, just by
checking cterefcount.)
Tom> However, a singly-referenced SELECT CTE could reasonably be
Tom> treated as equivalent to a sub-select-in-FROM,
In the PoC code I also excluded SELECT FOR UPDATE from inlining.
(There's already a difference between how SELECT FOR UPDATE works for
CTEs compared to subqueries and views, the comments mention it)
There might also be some merit in checking for volatility?
--
Andrew (irc:RhodiumToad)
"Andres" == Andres Freund <andres@anarazel.de> writes:
Andres> Even in queries with a non-0 OFFSET you can push down in a
Andres> number of cases,
really?
--
Andrew (irc:RhodiumToad)
On 2018-07-25 01:08:44 +0100, Andrew Gierth wrote:
"Andres" == Andres Freund <andres@anarazel.de> writes:
Andres> Even in queries with a non-0 OFFSET you can push down in a
Andres> number of cases,really?
Yea. I guess it's a bit dependant on what kind of behaviour you consider
as "pushing down". I'm doubtful it's worth the analytical complexity on
ensuring it's safe, however. With knowledge from the outer query you
e.g. can: trim the target list; remove outer joins below the OFFSET 0;
push down a restriction into an outer join below the OFFSET if that's
guaranteed to only return max one row, and not needed if not matching
the restrcition. I'm sure you can come up with more?
Greetings,
Andres Freund
On Tue, Jul 24, 2018 at 11:28:21PM +0100, Andrew Gierth wrote:
About a year ago I was briefly in discussion/collaboration with Adam Sah
regarding the topic of inlining CTEs into the query rather than treating
them as optimization barriers. We didn't take it very far (he sent me
some stuff, I wrote some stuff and sent it back, things kind of got
dropped at that point); but there's been some recent discussion of this
and some people have expressed an interest in seeing the code.So I'm posting the parts that I wrote for the benefit of anyone wanting
to pick up the issue again. The assumption of this code is that some
form of syntax would exist to mark materialized CTEs and set the
"ctematerialized" flag.I haven't rebased this or tested it since last year; this patch is
against b81eba6a65.
Please find attached a version rebased atop 167075be3ab1547e18 with
what I believe are appropriate changes to regression test output. The
other changes to the regression tests output are somewhat puzzling, as
they change the actual results of queries. I've also attached both
the "leftover" diff and the files to which it should be applied.
Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
"David" == David Fetter <david@fetter.org> writes:
David> Please find attached a version rebased atop 167075be3ab1547e18
David> with what I believe are appropriate changes to regression test
David> output. The other changes to the regression tests output are
David> somewhat puzzling, as they change the actual results of queries.
Both of those changes are the result of volatile CTEs being inlined more
than once (in one case, as part of an explicit test to ensure that CTEs
are being materialized and not multiply evaluated).
If you look for the XXX comment in the patch, it should be easy to add a
check that skips inlining if cterefcount > 1 and
contains_volatile_functions is true.
--
Andrew (irc:RhodiumToad)
On Wed, Jul 25, 2018 at 07:42:37AM +0200, David Fetter wrote:
Please find attached a version rebased atop 167075be3ab1547e18 with
what I believe are appropriate changes to regression test output. The
other changes to the regression tests output are somewhat puzzling, as
they change the actual results of queries. I've also attached both
the "leftover" diff and the files to which it should be applied.
I think the SQL programmer needs some control over whether a CTE is:
- a materialized view -- and therefore a barrier
- a view (which can then be inlined by the optimizer)
It is possible to add a keyword for this purpose in the WITH syntax:
WITH VIEW (...) AS a_view
, MATERIALIZED VIEW (...) AS a_barrier
...;
This would be a lot like creating TEMP views, but without the catalog
overhead.
(I wonder how hard it would be to partiion the OID namespace into
temp/persistent ranges so that temp schema elements need not be written
into the catalog.)
Nico
--
"Nico" == Nico Williams <nico@cryptonector.com> writes:
Nico> It is possible to add a keyword for this purpose in the WITH syntax:
Nico> WITH VIEW (...) AS a_view
The existing (and standard) syntax is WITH ctename AS (query).
Syntaxes that have been suggested for explicitly controlling the
materialization are along the lines of:
WITH ctename AS [[NOT] MATERIALIZED] (query)
(MATERIALIZED is already an un-reserved keyword)
--
Andrew (irc:RhodiumToad)
On Wed, Jul 25, 2018 at 05:08:43PM +0100, Andrew Gierth wrote:
Nico> It is possible to add a keyword for this purpose in the WITH syntax:
Nico> WITH VIEW (...) AS a_view
The existing (and standard) syntax is WITH ctename AS (query).
Oy, I flubbed that up.
Syntaxes that have been suggested for explicitly controlling the
materialization are along the lines of:WITH ctename AS [[NOT] MATERIALIZED] (query)
(MATERIALIZED is already an un-reserved keyword)
Works for me.
On Tue, Jul 24, 2018 at 07:57:49PM -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
On 2018-07-24 19:49:19 -0400, Tom Lane wrote:
However, a singly-referenced SELECT CTE could reasonably be treated as
equivalent to a sub-select-in-FROM, and then you would have the same
mechanisms for preventing inlining as you do for those cases,
e.g. OFFSET 0. And sticking in OFFSET 0 would be backwards-compatible
too: your code would still work the same in older releases, unlike if
we invent new syntax for this.I still think this is just doubling down on prior mistakes.
Not following what you think a better alternative is? I'd be the
first to agree that OFFSET 0 is a hack, but people are used to it.Assuming that we go for inline-by-default for at least some cases,
there's a separate discussion to be had about whether it's worth
making a planner-control GUC to force the old behavior. I'm not
very excited about that, but I bet some people will be.
It is widely known that CTEs in PG are optimizer barriers.
That actually is useful, and I do make use of that fact (though I'm not
proud of it).
My proposal is that PG add an extension for specifying that a CTE is to
be materialized (barrier) or not (then inlined).
Nico
--
On Wed, Jul 25, 2018 at 04:18:42PM +0100, Andrew Gierth wrote:
"David" == David Fetter <david@fetter.org> writes:
David> Please find attached a version rebased atop 167075be3ab1547e18
David> with what I believe are appropriate changes to regression test
David> output. The other changes to the regression tests output are
David> somewhat puzzling, as they change the actual results of queries.Both of those changes are the result of volatile CTEs being inlined more
than once (in one case, as part of an explicit test to ensure that CTEs
are being materialized and not multiply evaluated).If you look for the XXX comment in the patch, it should be easy to add a
check that skips inlining if cterefcount > 1 and
contains_volatile_functions is true.
Thanks for the broad hints!
Please find attached the next version, which passes 'make check'.
Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Attachments:
0001-Inlining-CTEs-v0003.patchtext/x-diff; charset=us-asciiDownload+201-62
On 07/25/2018 06:08 PM, Andrew Gierth wrote:
WITH ctename AS [[NOT] MATERIALIZED] (query)
I think "NOT MATERIALIZED" would be a bit misleading since the planner
may choose to materialize anyway, so I suggest skipping that part of the
syntax unless there is a really strong reason for having it.
Andreas