Backport "WITH ... AS MATERIALIZED" syntax to <12?
I've been struggling with how we're going to upgrade launchpad.net to
PostgreSQL 12 or newer (we're currently on 10). We're one of those
applications that deliberately uses CTEs as optimization fences in a few
difficult places. The provision of the MATERIALIZED keyword in 12 is
great, but the fact that it doesn't exist in earlier versions is
awkward. We certainly don't want to upgrade our application code at the
same time as upgrading the database, and dealing with performance
degradation between the database upgrade and the application upgrade
doesn't seem great either (not to mention that it would be hard to
coordinate). That leaves us with hacking our query compiler to add the
MATERIALIZED keyword only if it's running on 12 or newer, which would be
possible but is pretty cumbersome.
However, an alternative would be to backport the new syntax to some
earlier versions. "WITH ... AS MATERIALIZED" can easily just be
synonymous with "WITH ... AS" in versions prior to 12; there's no need
to support "NOT MATERIALIZED" since that's explicitly requesting the new
query-folding feature that only exists in 12. Would something like the
attached patch against REL_11_STABLE be acceptable? I'd like to
backpatch it at least as far as PostgreSQL 10.
This compiles and passes regression tests.
Thanks,
--
Colin Watson [cjwatson@canonical.com]
Attachments:
0001-Backport-WITH-.-AS-MATERIALIZED-syntax.patchtext/x-diff; charset=us-asciiDownload
From 063186eb678ad9831961d6319f7a4279f1029358 Mon Sep 17 00:00:00 2001
From: Colin Watson <cjwatson@canonical.com>
Date: Fri, 18 Oct 2019 14:08:11 +0100
Subject: [PATCH] Backport "WITH ... AS MATERIALIZED" syntax
Applications that deliberately use CTEs as optimization fences need to
adjust their code to prepare for PostgreSQL 12. Unfortunately, the
MATERIALIZED keyword that they need to add isn't valid syntax in earlier
versions of PostgreSQL, so they're stuck with either upgrading the
application and the database simultaneously, accepting performance
degradation between the two parts of the upgrade, or doing complex query
compiler work to add MATERIALIZED conditionally.
It makes things much easier in these cases if the MATERIALIZED keyword
is accepted and ignored in earlier releases. Users can then upgrade to
a suitable point release, change their application code to add
MATERIALIZED, and then upgrade to PostgreSQL 12.
---
doc/src/sgml/queries.sgml | 12 ++++++++++
doc/src/sgml/ref/select.sgml | 18 +++++++++++++-
src/backend/parser/gram.y | 12 +++++++---
src/test/regress/expected/subselect.out | 31 +++++++++++++++++++++++++
src/test/regress/sql/subselect.sql | 14 +++++++++++
5 files changed, 83 insertions(+), 4 deletions(-)
diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml
index 88bc189646..cc33d92133 100644
--- a/doc/src/sgml/queries.sgml
+++ b/doc/src/sgml/queries.sgml
@@ -2215,6 +2215,18 @@ SELECT n FROM t LIMIT 100;
rows.)
</para>
+ <para>
+ In some cases, <productname>PostgreSQL</productname> 12 folds
+ <literal>WITH</literal> queries into the parent query, allowing joint
+ optimization of the two query levels. You can override that decision by
+ specifying <literal>MATERIALIZED</literal> to force separate calculation
+ of the <literal>WITH</literal> query. While versions of
+ <productname>PostgreSQL</productname> before 12 do not support folding of
+ <literal>WITH</literal> queries, specifying
+ <literal>MATERIALIZED</literal> is permitted to ease application
+ upgrades.
+ </para>
+
<para>
The examples above only show <literal>WITH</literal> being used with
<command>SELECT</command>, but it can be attached in the same way to
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 4db8142afa..1bd711a3cb 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -72,7 +72,7 @@ SELECT [ ALL | DISTINCT [ ON ( <replaceable class="parameter">expression</replac
<phrase>and <replaceable class="parameter">with_query</replaceable> is:</phrase>
- <replaceable class="parameter">with_query_name</replaceable> [ ( <replaceable class="parameter">column_name</replaceable> [, ...] ) ] AS ( <replaceable class="parameter">select</replaceable> | <replaceable class="parameter">values</replaceable> | <replaceable class="parameter">insert</replaceable> | <replaceable class="parameter">update</replaceable> | <replaceable class="parameter">delete</replaceable> )
+ <replaceable class="parameter">with_query_name</replaceable> [ ( <replaceable class="parameter">column_name</replaceable> [, ...] ) ] AS [ MATERIALIZED ] ( <replaceable class="parameter">select</replaceable> | <replaceable class="parameter">values</replaceable> | <replaceable class="parameter">insert</replaceable> | <replaceable class="parameter">update</replaceable> | <replaceable class="parameter">delete</replaceable> )
TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ]
</synopsis>
@@ -290,6 +290,17 @@ TABLE [ ONLY ] <replaceable class="parameter">table_name</replaceable> [ * ]
row, the results are unspecified.
</para>
+ <para>
+ <productname>PostgreSQL</productname> 12 folds side-effect-free
+ <literal>WITH</literal> queries into the primary query in some cases.
+ To override this and retain the behaviour up to
+ <productname>PostgreSQL</productname> 11, mark the
+ <literal>WITH</literal> query as <literal>MATERIALIZED</literal>. That
+ might be useful, for example, if the <literal>WITH</literal> query is
+ being used as an optimization fence to prevent the planner from choosing
+ a bad plan.
+ </para>
+
<para>
See <xref linkend="queries-with"/> for additional information.
</para>
@@ -2087,6 +2098,11 @@ SELECT distributors.* WHERE distributors.name = 'Westward';
<para>
<literal>ROWS FROM( ... )</literal> is an extension of the SQL standard.
</para>
+
+ <para>
+ The <literal>MATERIALIZED</literal> option of <literal>WITH</literal> is
+ an extension of the SQL standard.
+ </para>
</refsect2>
</refsect1>
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index bc65319c2c..70df09f409 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -479,7 +479,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%type <list> row explicit_row implicit_row type_list array_expr_list
%type <node> case_expr case_arg when_clause case_default
%type <list> when_clause_list
-%type <ival> sub_type
+%type <ival> sub_type opt_materialized
%type <value> NumericOnly
%type <list> NumericOnly_list
%type <alias> alias_clause opt_alias_clause
@@ -11419,17 +11419,23 @@ cte_list:
| cte_list ',' common_table_expr { $$ = lappend($1, $3); }
;
-common_table_expr: name opt_name_list AS '(' PreparableStmt ')'
+common_table_expr: name opt_name_list AS opt_materialized '(' PreparableStmt ')'
{
CommonTableExpr *n = makeNode(CommonTableExpr);
n->ctename = $1;
n->aliascolnames = $2;
- n->ctequery = $5;
+ n->ctequery = $6;
n->location = @1;
$$ = (Node *) n;
}
;
+/* Stub for forward-compatibility with PostgreSQL 12. */
+opt_materialized:
+ MATERIALIZED {}
+ | /*EMPTY*/ {}
+ ;
+
opt_with_clause:
with_clause { $$ = $1; }
| /*EMPTY*/ { $$ = NULL; }
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index a288c6d33b..85cd1e57cb 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -1178,3 +1178,34 @@ fetch backward all in c1;
(2 rows)
commit;
+--
+-- Tests for CTE inlining behavior (forward-compatibility with PostgreSQL 12)
+--
+-- Basic subquery
+explain (verbose, costs off)
+with x as (select * from (select f1 from subselect_tbl) ss)
+select * from x where f1 = 1;
+ QUERY PLAN
+------------------------------------------
+ CTE Scan on x
+ Output: x.f1
+ Filter: (x.f1 = 1)
+ CTE x
+ -> Seq Scan on public.subselect_tbl
+ Output: subselect_tbl.f1
+(6 rows)
+
+-- Explicitly request materialization
+explain (verbose, costs off)
+with x as materialized (select * from (select f1 from subselect_tbl) ss)
+select * from x where f1 = 1;
+ QUERY PLAN
+------------------------------------------
+ CTE Scan on x
+ Output: x.f1
+ Filter: (x.f1 = 1)
+ CTE x
+ -> Seq Scan on public.subselect_tbl
+ Output: subselect_tbl.f1
+(6 rows)
+
diff --git a/src/test/regress/sql/subselect.sql b/src/test/regress/sql/subselect.sql
index eafd927e82..5a8ece180f 100644
--- a/src/test/regress/sql/subselect.sql
+++ b/src/test/regress/sql/subselect.sql
@@ -635,3 +635,17 @@ move forward all in c1;
fetch backward all in c1;
commit;
+
+--
+-- Tests for CTE inlining behavior (forward-compatibility with PostgreSQL 12)
+--
+
+-- Basic subquery
+explain (verbose, costs off)
+with x as (select * from (select f1 from subselect_tbl) ss)
+select * from x where f1 = 1;
+
+-- Explicitly request materialization
+explain (verbose, costs off)
+with x as materialized (select * from (select f1 from subselect_tbl) ss)
+select * from x where f1 = 1;
--
2.17.1
On Fri, Oct 18, 2019 at 02:21:30PM +0100, Colin Watson wrote:
However, an alternative would be to backport the new syntax to some
earlier versions. "WITH ... AS MATERIALIZED" can easily just be
synonymous with "WITH ... AS" in versions prior to 12; there's no need
to support "NOT MATERIALIZED" since that's explicitly requesting the new
query-folding feature that only exists in 12. Would something like the
attached patch against REL_11_STABLE be acceptable? I'd like to
backpatch it at least as far as PostgreSQL 10.
I am afraid that new features don't gain a backpatch. This is a
project policy. Back-branches should just include bug fixes.
--
Michael
"Michael" == Michael Paquier <michael@paquier.xyz> writes:
On Fri, Oct 18, 2019 at 02:21:30PM +0100, Colin Watson wrote:
However, an alternative would be to backport the new syntax to some
earlier versions. "WITH ... AS MATERIALIZED" can easily just be
synonymous with "WITH ... AS" in versions prior to 12; there's no
need to support "NOT MATERIALIZED" since that's explicitly
requesting the new query-folding feature that only exists in 12.
Would something like the attached patch against REL_11_STABLE be
acceptable? I'd like to backpatch it at least as far as PostgreSQL
10.
Michael> I am afraid that new features don't gain a backpatch. This is
Michael> a project policy. Back-branches should just include bug fixes.
I do think an argument can be made for making an exception in this
particular case. This wouldn't be backpatching a feature, just accepting
and ignoring some of the new syntax to make upgrading easier.
--
Andrew (irc:RhodiumToad)
On Sat, Oct 19, 2019 at 05:01:04AM +0100, Andrew Gierth wrote:
"Michael" == Michael Paquier <michael@paquier.xyz> writes:
On Fri, Oct 18, 2019 at 02:21:30PM +0100, Colin Watson wrote:
However, an alternative would be to backport the new syntax to some
earlier versions. "WITH ... AS MATERIALIZED" can easily just be
synonymous with "WITH ... AS" in versions prior to 12; there's no
need to support "NOT MATERIALIZED" since that's explicitly
requesting the new query-folding feature that only exists in 12.
Would something like the attached patch against REL_11_STABLE be
acceptable? I'd like to backpatch it at least as far as PostgreSQL
10.Michael> I am afraid that new features don't gain a backpatch. This is
Michael> a project policy. Back-branches should just include bug fixes.I do think an argument can be made for making an exception in this
particular case. This wouldn't be backpatching a feature, just accepting
and ignoring some of the new syntax to make upgrading easier.
Right, this is my position too. I'm explicitly not asking for
backpatching of the CTE-inlining feature, just trying to cope with the
fact that we now have to spell some particular queries differently to
retain the performance characteristics we need for them.
I suppose an alternative would be to add a configuration option to 12
that allows disabling inlining of CTEs cluster-wide: we could then
upgrade to 12 with inlining disabled, add MATERIALIZED to the relevant
queries, and then re-enable inlining. But I like that less because it
would end up leaving cruft around in PostgreSQL's configuration code
somewhat indefinitely for the sake of an edge case in upgrading to a
particular version.
--
Colin Watson [cjwatson@canonical.com]
Hi,
On October 19, 2019 6:01:04 AM GMT+02:00, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
"Michael" == Michael Paquier <michael@paquier.xyz> writes:
On Fri, Oct 18, 2019 at 02:21:30PM +0100, Colin Watson wrote:
However, an alternative would be to backport the new syntax to some
earlier versions. "WITH ... AS MATERIALIZED" can easily just be
synonymous with "WITH ... AS" in versions prior to 12; there's no
need to support "NOT MATERIALIZED" since that's explicitly
requesting the new query-folding feature that only exists in 12.
Would something like the attached patch against REL_11_STABLE be
acceptable? I'd like to backpatch it at least as far as PostgreSQL
10.Michael> I am afraid that new features don't gain a backpatch. This is
Michael> a project policy. Back-branches should just include bug fixes.I do think an argument can be made for making an exception in this
particular case. This wouldn't be backpatching a feature, just
accepting
and ignoring some of the new syntax to make upgrading easier.
+1
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
On Sat, Oct 19, 2019 at 11:56:56AM +0200, Andres Freund wrote:
Hi,
On October 19, 2019 6:01:04 AM GMT+02:00, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
"Michael" == Michael Paquier <michael@paquier.xyz> writes:
On Fri, Oct 18, 2019 at 02:21:30PM +0100, Colin Watson wrote:
However, an alternative would be to backport the new syntax to some
earlier versions. "WITH ... AS MATERIALIZED" can easily just be
synonymous with "WITH ... AS" in versions prior to 12; there's no
need to support "NOT MATERIALIZED" since that's explicitly
requesting the new query-folding feature that only exists in 12.
Would something like the attached patch against REL_11_STABLE be
acceptable? I'd like to backpatch it at least as far as PostgreSQL
10.Michael> I am afraid that new features don't gain a backpatch. This is
Michael> a project policy. Back-branches should just include bug fixes.I do think an argument can be made for making an exception in this
particular case. This wouldn't be backpatching a feature, just
accepting
and ignoring some of the new syntax to make upgrading easier.+1
+0.5
In general, I'm not opposed to accepting and ignoring the MATERIALIZED
syntax (assuming we'd only accept AS MATERIALIZED, but not the negative
variant).
FWIW I'm not sure the "we don't want to upgrade application code at the
same time as the database" is really tenable. I don't think we really
promise that anywhere, and adding the AS MATERIALIZED seems quite
mechanical. I think we could find cases where we caused worse breaks
between major versions.
One disadvantage is that this will increase confusion for users, who'll
get used to the behavior on 12, and then they'll get confused on older
releases (e.g. if you don't specify AS MATERIALIZED you'd expect the CTE
to get inlined, but that won't happen).
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Oct 19, 2019 at 10:22:39AM +0100, Colin Watson wrote:
On Sat, Oct 19, 2019 at 05:01:04AM +0100, Andrew Gierth wrote:
"Michael" == Michael Paquier <michael@paquier.xyz> writes:
On Fri, Oct 18, 2019 at 02:21:30PM +0100, Colin Watson wrote:
However, an alternative would be to backport the new syntax to some
earlier versions. "WITH ... AS MATERIALIZED" can easily just be
synonymous with "WITH ... AS" in versions prior to 12; there's no
need to support "NOT MATERIALIZED" since that's explicitly
requesting the new query-folding feature that only exists in 12.
Would something like the attached patch against REL_11_STABLE be
acceptable? I'd like to backpatch it at least as far as PostgreSQL
10.Michael> I am afraid that new features don't gain a backpatch. This is
Michael> a project policy. Back-branches should just include bug fixes.I do think an argument can be made for making an exception in this
particular case. This wouldn't be backpatching a feature, just accepting
and ignoring some of the new syntax to make upgrading easier.Right, this is my position too. I'm explicitly not asking for
backpatching of the CTE-inlining feature, just trying to cope with the
fact that we now have to spell some particular queries differently to
retain the performance characteristics we need for them.I suppose an alternative would be to add a configuration option to 12
that allows disabling inlining of CTEs cluster-wide: we could then
upgrade to 12 with inlining disabled, add MATERIALIZED to the relevant
queries, and then re-enable inlining. But I like that less because it
would end up leaving cruft around in PostgreSQL's configuration code
somewhat indefinitely for the sake of an edge case in upgrading to a
particular version.
I think having a GUC option was proposed and discussed while developping
the feature, and it was rejected - one of the reasons being experience
with similar GUCs in the past, which essentially just allowed users to
postpone the fix indefinitely, and increased our maintenance burden.
I wonder if an extension could do something like that, though. It can
install a hook after parse analysis, so I guess it could walk the CTEs
and mark them as materialized.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 10/19/19 6:48 AM, Tomas Vondra wrote:
On Sat, Oct 19, 2019 at 11:56:56AM +0200, Andres Freund wrote:
Hi,
On October 19, 2019 6:01:04 AM GMT+02:00, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:"Michael" == Michael Paquier <michael@paquier.xyz> writes:
On Fri, Oct 18, 2019 at 02:21:30PM +0100, Colin Watson wrote:
However, an alternative would be to backport the new syntax to some
earlier versions. "WITH ... AS MATERIALIZED" can easily just be
synonymous with "WITH ... AS" in versions prior to 12; there's no
need to support "NOT MATERIALIZED" since that's explicitly
requesting the new query-folding feature that only exists in 12.
Would something like the attached patch against REL_11_STABLE be
acceptable? I'd like to backpatch it at least as far as PostgreSQL
10.Michael> I am afraid that new features don't gain a backpatch. This is
Michael> a project policy. Back-branches should just include bug fixes.I do think an argument can be made for making an exception in this
particular case. This wouldn't be backpatching a feature, just
accepting
and ignoring some of the new syntax to make upgrading easier.+1
+0.5
In general, I'm not opposed to accepting and ignoring the MATERIALIZED
syntax (assuming we'd only accept AS MATERIALIZED, but not the negative
variant).FWIW I'm not sure the "we don't want to upgrade application code at the
same time as the database" is really tenable.
I'm -1 for exactly this reason.
In any case, if you insist on using the same code with pre-12 and
post-12 releases, this should be achievable (at least in most cases) by
using the "offset 0" trick, shouldn't it?
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, 19 Oct 2019 at 10:53, Andrew Dunstan <andrew.dunstan@2ndquadrant.com>
wrote:
In general, I'm not opposed to accepting and ignoring the MATERIALIZED
syntax (assuming we'd only accept AS MATERIALIZED, but not the negative
variant).FWIW I'm not sure the "we don't want to upgrade application code at the
same time as the database" is really tenable.I'm -1 for exactly this reason.
In any case, if you insist on using the same code with pre-12 and
post-12 releases, this should be achievable (at least in most cases) by
using the "offset 0" trick, shouldn't it?
That embeds a temporary hack in the application code indefinitely.
If only we had Guido's (Python) time machine. We could go back and start
accepting "AS MATERIALIZED" as noise words starting from version 7 or
something.
On 10/19/19 11:10 AM, Isaac Morland wrote:
On Sat, 19 Oct 2019 at 10:53, Andrew Dunstan
<andrew.dunstan@2ndquadrant.com
<mailto:andrew.dunstan@2ndquadrant.com>> wrote:In general, I'm not opposed to accepting and ignoring the
MATERIALIZED
syntax (assuming we'd only accept AS MATERIALIZED, but not the
negative
variant).
FWIW I'm not sure the "we don't want to upgrade application code
at the
same time as the database" is really tenable.
I'm -1 for exactly this reason.
In any case, if you insist on using the same code with pre-12 and
post-12 releases, this should be achievable (at least in most
cases) by
using the "offset 0" trick, shouldn't it?That embeds a temporary hack in the application code indefinitely.
If only we had Guido's (Python) time machine. We could go back and
start accepting "AS MATERIALIZED" as noise words starting from version
7 or something.
let me know when that's materialized :-)
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Greetings,
* Isaac Morland (isaac.morland@gmail.com) wrote:
That embeds a temporary hack in the application code indefinitely.
... one could argue the same about having to say AS MATERIALIZED.
Thanks,
Stephen
On Sat, 19 Oct 2019 at 13:36, Stephen Frost <sfrost@snowman.net> wrote:
Greetings,
* Isaac Morland (isaac.morland@gmail.com) wrote:
That embeds a temporary hack in the application code indefinitely.
... one could argue the same about having to say AS MATERIALIZED.
I think OFFSET 0 is a hack - the fact that it forces an optimization fence
feels like an oddity. By contrast, saying AS MATERIALIZED means materialize
the CTE. I suppose you could argue that the need to be able to request that
is a temporary hack until query optimization improves further, but I don't
think that's realistic. For the foreseeable future we will need to be able
to tell the query planner that it is wrong. I mean, in principle the DB
should figure out for itself which (non-constraint) indexes are needed. But
I don't see any proposals to attempt to implement that.
Side note: I am frequently disappointed by the query planner. I have had
many situations in which a nice simple strategy of looking up some tiny
number of records in an index and then following more indices to get joined
records would have worked, but instead it did a linear scan through the
wrong starting table. So I'm very glad the AS MATERIALIZED now exists for
when it's needed. On the other hand, I recognize that the reason I'm
disappointed is because my expectations are so high: often I've written a
query that joins several views together, meaning that under the covers it's
really joining maybe 20 tables, and it comes back with the answer
instantly. So in effect the query planner is just good enough to make me
expect it to be even better than it is.
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
+0.5
In general, I'm not opposed to accepting and ignoring the MATERIALIZED
syntax (assuming we'd only accept AS MATERIALIZED, but not the negative
variant).
FWIW, I'm +0.1 or thereabouts. I'd vote -1 if the patch required
introducing a new lexer keyword (even an unreserved one); but it
doesn't. So it's hard to argue that there's much downside.
(If we do this, I wonder if we should make the back branches parse
NOT MATERIALIZED as well, and then throw a "not implemented" error
rather than the unhelpful syntax error you'd get today.)
(Also, if we do this, I think we should patch all supported branches.
The OP's proposal to patch back to 10 has no foundation that I can see.)
FWIW I'm not sure the "we don't want to upgrade application code at the
same time as the database" is really tenable. I don't think we really
promise that anywhere, and adding the AS MATERIALIZED seems quite
mechanical. I think we could find cases where we caused worse breaks
between major versions.
That's certainly true, which is why I'm only lukewarm about the proposal.
One disadvantage is that this will increase confusion for users, who'll
get used to the behavior on 12, and then they'll get confused on older
releases (e.g. if you don't specify AS MATERIALIZED you'd expect the CTE
to get inlined, but that won't happen).
I'm less concerned about that aspect than about the aspect of (for
instance) 11.6 and up allowing a syntax that 11.0-11.5 don't. People
are likely to write code relying on this and then be surprised when
it doesn't work on a slightly older server. Still, is that so much
different from cases where we fix a bug that prevented some statement
from working?
regards, tom lane
+1 for the configuration option. Otherwise, migration is a nightmare -- so
many CTEs were written specifically to use the "optimization fence"
behavior. The lack of such configuration options is now a "migration fence".
On Sat, Oct 19, 2019 at 2:49 AM Colin Watson <cjwatson@canonical.com> wrote:
Show quoted text
On Sat, Oct 19, 2019 at 05:01:04AM +0100, Andrew Gierth wrote:
"Michael" == Michael Paquier <michael@paquier.xyz> writes:
On Fri, Oct 18, 2019 at 02:21:30PM +0100, Colin Watson wrote:
However, an alternative would be to backport the new syntax to some
earlier versions. "WITH ... AS MATERIALIZED" can easily just be
synonymous with "WITH ... AS" in versions prior to 12; there's no
need to support "NOT MATERIALIZED" since that's explicitly
requesting the new query-folding feature that only exists in 12.
Would something like the attached patch against REL_11_STABLE be
acceptable? I'd like to backpatch it at least as far as PostgreSQL
10.Michael> I am afraid that new features don't gain a backpatch. This is
Michael> a project policy. Back-branches should just include bug fixes.I do think an argument can be made for making an exception in this
particular case. This wouldn't be backpatching a feature, just accepting
and ignoring some of the new syntax to make upgrading easier.Right, this is my position too. I'm explicitly not asking for
backpatching of the CTE-inlining feature, just trying to cope with the
fact that we now have to spell some particular queries differently to
retain the performance characteristics we need for them.I suppose an alternative would be to add a configuration option to 12
that allows disabling inlining of CTEs cluster-wide: we could then
upgrade to 12 with inlining disabled, add MATERIALIZED to the relevant
queries, and then re-enable inlining. But I like that less because it
would end up leaving cruft around in PostgreSQL's configuration code
somewhat indefinitely for the sake of an edge case in upgrading to a
particular version.--
Colin Watson [cjwatson@canonical.com]
On Sat, Oct 19, 2019 at 8:11 AM Isaac Morland <isaac.morland@gmail.com>
wrote:
That embeds a temporary hack in the application code indefinitely.
Or postpone the migration indefinitely. I saw so many times how migration
in large companies was postponed because of similar "small" issues -- when
the code base is large, it's easier for managers to say something like "no,
we will better live without cool new features for a couple of more years
than put our systems at risk due to lack of testing".
Nobody invented an excellent way to test production workloads in
non-production environments yet. I know it very well because I'm also
working in this direction for a couple of years. So this feature (a great
one) seems to me as a major roadblock for DBAs and developers who would
like to migrate to 12 and have better performance and all the new features.
Ironically, including this one for the new or the updated code!
If you need to patch all your code adding "AS MATERIALIZED" (and you will
need it if you want to minimize risks of performance degradation after the
upgrade), then it's also a temporary hack. But much, much more expensive in
implementation in large projects, and sometimes even not possible.
I do think that the lack of this configuration option will prevent some
projects from migration for a long time.
Correct me if I'm wrong. Maybe somebody already thought about migration
options here and have good answers? What is the best way to upgrade if you
have dozens of multi-terabyte databases, a lot of legacy code and workloads
where 1 minute of downtime or even performance degradation would cost a lot
of money so it is not acceptable? What would be the good answers here?
On Sat, Oct 19, 2019 at 10:53 AM Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:
FWIW I'm not sure the "we don't want to upgrade application code at the
same time as the database" is really tenable.I'm -1 for exactly this reason.
-1 from me, too, also for this reason. I bet if we started looking
we'd find many changes every year that we could justify partially or
completely back-porting on similar grounds, and if we start doing
that, we'll certainly screw it up sometimes, turning what should have
been a smooth minor release upgrade process into one that breaks. And
we'll still not satisfy the people who don't want to upgrade the
application and the database at the same time, because there will
always be changes where nothing like this is remotely reasonable.
Also, we'll then have a lot more behavior differences between minor
releases, which sounds like a bad thing to me. In particular, nobody
will be happy if a pg_dump taken on version X.Y fails to restore on
version X.Z. But even apart from that, it just doesn't sound like a
good idea to have the user-facing behavior vary significantly across
minor releases...
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, Oct 19, 2019 at 02:35:42PM -0400, Isaac Morland wrote:
On Sat, 19 Oct 2019 at 13:36, Stephen Frost <sfrost@snowman.net> wrote:
Greetings,
* Isaac Morland (isaac.morland@gmail.com) wrote:
That embeds a temporary hack in the application code indefinitely.
... one could argue the same about having to say AS MATERIALIZED.
�
I think OFFSET 0 is a hack - the fact that it forces an optimization fence
feels like an oddity. By contrast, saying AS MATERIALIZED means materialize the
CTE. I suppose you could argue that the need to be able to request that is a
temporary hack until query optimization improves further, but I don't think
that's realistic. For the foreseeable future we will need to be able to tell
the query planner that it is wrong. I mean, in principle the DB should figure
out for itself which (non-constraint) indexes are needed. But I don't see any
proposals to attempt to implement that.Side note: I am frequently disappointed by the query planner. I have had many
situations in which a nice simple strategy of looking up some tiny number of
records in an index and then following more indices to get joined records would
have worked, but instead it did a linear scan through the wrong starting table.
So I'm very glad the AS MATERIALIZED now exists for when it's needed. On the
other hand, I recognize that the reason I'm disappointed is because my
expectations are so high: often I've written a query that joins several views
together, meaning that under the covers it's really joining maybe 20 tables,
and it comes back with the answer instantly. So in effect the query planner is
just good enough to make me expect it to be even better than it is.
Well, since geqo_threshold = 12 is the default, for a 20-table join, you
are using genetic query optimization (GEQO) in PG 12 without
MATERIALIZED:
https://www.postgresql.org/docs/12/geqo.html
and GEQO assumes it would take too long to fully test all optimization
possibilities, so it randomly checks just some of them. Therefore, it
is no surprise you are disappointed in its output.
In a way, when you are using materialized CTEs, you are doing the
optimization yourself, in your SQL code, and then the table join count
drops low enough that GEQO is not used and Postgres fully tests all
optimization possibilities. This is behavior I had never considered ---
the idea that the user is partly replacing the optimizer, and that using
materialized CTEs prevents the problems that require the use of GEQO.
I guess my big take-away is that not only can MATERIALIZE change the
quality of query plans but it can also improve the quality of query
plans if it prevents GEQO from being used.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
On Sat, Oct 19, 2019 at 11:52 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
I wonder if an extension could do something like that, though. It can
install a hook after parse analysis, so I guess it could walk the CTEs
and mark them as materialized.
I wonder if the existing pg_hint_plan extension could be extended to
do that using something like /*+ MATERIALIZE */. That'd presumably be
ignored when not installed/not understood etc. I've never used
pg_hint_plan myself and don't know how or how well it works, but it
look like it supports Oracle-style hints hidden in comments like /*+
HashJoin(a b) */ rather than SQL Server-style hints that are in the
SQL grammar itself like SELECT ... FROM a HASH JOIN b.