Text-any concatenation volatility acting as optimization barrier

Started by Marti Raudseppalmost 14 years ago11 messages
#1Marti Raudsepp
marti@juffo.org

Hi list,

Andrew Dunstan reported an awkward-seeming case on IRC where shifting
around a concatenation expression in a view made the planner choose a
good or a bad execution plan.

Simplified, it boils down to this:

db=# create table foo(i int);
db=# explain verbose select i from (select i, i::text || 'x' as asd
from foo) as subq;
Seq Scan on public.foo (cost=0.00..34.00 rows=2400 width=4)
Output: foo.i

db=# explain verbose select i from (select i, i || 'x'::text as asd
from foo) as subq;
Subquery Scan on subq (cost=0.00..76.00 rows=2400 width=4)
Output: subq.i
-> Seq Scan on public.foo (cost=0.00..52.00 rows=2400 width=4)
Output: foo.i, ((foo.i)::text || 'x'::text)

Case #1 uses the normal textcat(text, text) operator by automatically
coercing 'x' as text.
However, case #2 uses the anytextcat(anynonarray, text), which is
marked as volatile thus acts as an optimization barrier. Later, the
anytextcat SQL function is inlined and the EXPLAIN VERBOSE output has
no trace of what happened.

Is this something we can, or want, to fix?

One way would be doing preprocess_expression() before
pull_up_subqueries() so function inlining happens earlier, but I can't
imagine what unintended consequences that might have.

Another option would be creating explicit immutable text || foo
operators for common types, but that sounds pretty hacky.

Regards,
Marti

#2Andrew Dunstan
andrew@dunslane.net
In reply to: Marti Raudsepp (#1)
Re: Text-any concatenation volatility acting as optimization barrier

On 02/07/2012 03:18 PM, Marti Raudsepp wrote:

Hi list,

Andrew Dunstan reported an awkward-seeming case on IRC where shifting
around a concatenation expression in a view made the planner choose a
good or a bad execution plan.

Simplified, it boils down to this:

db=# create table foo(i int);
db=# explain verbose select i from (select i, i::text || 'x' as asd
from foo) as subq;
Seq Scan on public.foo (cost=0.00..34.00 rows=2400 width=4)
Output: foo.i

db=# explain verbose select i from (select i, i || 'x'::text as asd
from foo) as subq;
Subquery Scan on subq (cost=0.00..76.00 rows=2400 width=4)
Output: subq.i
-> Seq Scan on public.foo (cost=0.00..52.00 rows=2400 width=4)
Output: foo.i, ((foo.i)::text || 'x'::text)

Case #1 uses the normal textcat(text, text) operator by automatically
coercing 'x' as text.
However, case #2 uses the anytextcat(anynonarray, text), which is
marked as volatile thus acts as an optimization barrier. Later, the
anytextcat SQL function is inlined and the EXPLAIN VERBOSE output has
no trace of what happened.

Is this something we can, or want, to fix?

One way would be doing preprocess_expression() before
pull_up_subqueries() so function inlining happens earlier, but I can't
imagine what unintended consequences that might have.

Another option would be creating explicit immutable text || foo
operators for common types, but that sounds pretty hacky.

It gets worse if you replace the expression with a call to a (non-sql)
function returning text, which was in fact the original use case. Then
you're pretty much hosed.

cheers

andrew

#3Marti Raudsepp
marti@juffo.org
In reply to: Andrew Dunstan (#2)
Re: Text-any concatenation volatility acting as optimization barrier

On Tue, Feb 7, 2012 at 22:31, Andrew Dunstan <andrew@dunslane.net> wrote:

It gets worse if you replace the expression with a call to a (non-sql)
function returning text, which was in fact the original use case. Then
you're pretty  much hosed.

Oh, if it's a non-SQL function then marking it as STABLE/IMMUTABLE
should solve it -- did you try that?

Regards,
Marti

#4Andrew Dunstan
andrew@dunslane.net
In reply to: Marti Raudsepp (#3)
Re: Text-any concatenation volatility acting as optimization barrier

On 02/07/2012 03:36 PM, Marti Raudsepp wrote:

On Tue, Feb 7, 2012 at 22:31, Andrew Dunstan<andrew@dunslane.net> wrote:

It gets worse if you replace the expression with a call to a (non-sql)
function returning text, which was in fact the original use case. Then
you're pretty much hosed.

Oh, if it's a non-SQL function then marking it as STABLE/IMMUTABLE
should solve it -- did you try that?

Hmm, your're right. I could have sworn I tried that. That still leaves
the oddity you reported, though.

cheers

andrew

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marti Raudsepp (#1)
Re: Text-any concatenation volatility acting as optimization barrier

Marti Raudsepp <marti@juffo.org> writes:

Case #1 uses the normal textcat(text, text) operator by automatically
coercing 'x' as text.
However, case #2 uses the anytextcat(anynonarray, text), which is
marked as volatile thus acts as an optimization barrier.

Hmm ... since those operators were invented (in 8.3), we have adopted a
policy that I/O functions are presumed to be no worse than stable:
http://archives.postgresql.org/pgsql-committers/2010-07/msg00307.php
ISTM that would justify relabeling anytextcat/textanycat as stable,
which should fix this.

One way would be doing preprocess_expression() before
pull_up_subqueries() so function inlining happens earlier, but I can't
imagine what unintended consequences that might have.

I'm pretty sure that would be a bad idea.

regards, tom lane

#6Marti Raudsepp
marti@juffo.org
In reply to: Tom Lane (#5)
1 attachment(s)
Re: Text-any concatenation volatility acting as optimization barrier

On Wed, Feb 8, 2012 at 06:21, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Marti Raudsepp <marti@juffo.org> writes:

Case #1 uses the normal textcat(text, text) operator by automatically
coercing 'x' as text.
However, case #2 uses the anytextcat(anynonarray, text), which is
marked as volatile thus acts as an optimization barrier.

Hmm ... since those operators were invented (in 8.3), we have adopted a
policy that I/O functions are presumed to be no worse than stable:
http://archives.postgresql.org/pgsql-committers/2010-07/msg00307.php
ISTM that would justify relabeling anytextcat/textanycat as stable,
which should fix this.

Yes, we should definitely take advantage of that.

I scanned through all of pg_proc, there are 4 functions like this that
can be changed: textanycat, anytextcat, quote_literal and
quote_nullable. All of these have SQL wrappers to cast their argument
to ::text.

quote_literal | select pg_catalog.quote_literal($1::pg_catalog.text)
quote_nullable | select pg_catalog.quote_nullable($1::pg_catalog.text)
textanycat | select $1 || $2::pg_catalog.text
anytextcat | select $1::pg_catalog.text || $2

Patch attached (in git am format). Passes all regression tests (except
'json' which fails on my machine even on git master).

No documentation changes necessary AFAICT.

Regards,
Marti

Attachments:

0001-Mark-textanycat-quote_literal-quote_nullable-functio.patchtext/x-patch; charset=US-ASCII; name=0001-Mark-textanycat-quote_literal-quote_nullable-functio.patchDownload
From e1943868d21316ff9126283efec54146c14e00fc Mon Sep 17 00:00:00 2001
From: Marti Raudsepp <marti@juffo.org>
Date: Wed, 8 Feb 2012 11:26:03 +0200
Subject: [PATCH] Mark textanycat/quote_literal/quote_nullable functions as
 stable

These are SQL functions that rely on immutable functions/operators, but
were previously marked volatile, since they relied on unknown ::text
casts. As of commit aab353a60b95aadc00f81da0c6d99bde696c4b75, all text
I/O functions are guaranteed to be stable or immutable.

Author: Marti Raudsepp <marti@juffo.org>
---
 src/include/catalog/catversion.h |    2 +-
 src/include/catalog/pg_proc.h    |    8 ++++----
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index ae4e5f5..1d92ee3 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -53,6 +53,6 @@
  */
 
 /*							yyyymmddN */
-#define CATALOG_VERSION_NO	201202072
+#define CATALOG_VERSION_NO	201202081
 
 #endif
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index f8d01fb..d4206f1 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2271,11 +2271,11 @@ DATA(insert OID =  1282 ( quote_ident	   PGNSP PGUID 12 1 0 0 0 f f f t f i 1 0
 DESCR("quote an identifier for usage in a querystring");
 DATA(insert OID =  1283 ( quote_literal    PGNSP PGUID 12 1 0 0 0 f f f t f i 1 0 25 "25" _null_ _null_ _null_ _null_ quote_literal _null_ _null_ _null_ ));
 DESCR("quote a literal for usage in a querystring");
-DATA(insert OID =  1285 ( quote_literal    PGNSP PGUID 14 1 0 0 0 f f f t f v 1 0 25 "2283" _null_ _null_ _null_ _null_ "select pg_catalog.quote_literal($1::pg_catalog.text)" _null_ _null_ _null_ ));
+DATA(insert OID =  1285 ( quote_literal    PGNSP PGUID 14 1 0 0 0 f f f t f s 1 0 25 "2283" _null_ _null_ _null_ _null_ "select pg_catalog.quote_literal($1::pg_catalog.text)" _null_ _null_ _null_ ));
 DESCR("quote a data value for usage in a querystring");
 DATA(insert OID =  1289 ( quote_nullable   PGNSP PGUID 12 1 0 0 0 f f f f f i 1 0 25 "25" _null_ _null_ _null_ _null_ quote_nullable _null_ _null_ _null_ ));
 DESCR("quote a possibly-null literal for usage in a querystring");
-DATA(insert OID =  1290 ( quote_nullable   PGNSP PGUID 14 1 0 0 0 f f f f f v 1 0 25 "2283" _null_ _null_ _null_ _null_ "select pg_catalog.quote_nullable($1::pg_catalog.text)" _null_ _null_ _null_ ));
+DATA(insert OID =  1290 ( quote_nullable   PGNSP PGUID 14 1 0 0 0 f f f f f s 1 0 25 "2283" _null_ _null_ _null_ _null_ "select pg_catalog.quote_nullable($1::pg_catalog.text)" _null_ _null_ _null_ ));
 DESCR("quote a possibly-null data value for usage in a querystring");
 
 DATA(insert OID = 1798 (  oidin			   PGNSP PGUID 12 1 0 0 0 f f f t f i 1 0 26 "2275" _null_ _null_ _null_ _null_ oidin _null_ _null_ _null_ ));
@@ -2747,8 +2747,8 @@ DESCR("adjust time precision");
 DATA(insert OID = 1969 (  timetz		   PGNSP PGUID 12 1 0 0 0 f f f t f i 2 0 1266 "1266 23" _null_ _null_ _null_ _null_ timetz_scale _null_ _null_ _null_ ));
 DESCR("adjust time with time zone precision");
 
-DATA(insert OID = 2003 (  textanycat	   PGNSP PGUID 14 1 0 0 0 f f f t f v 2 0 25 "25 2776" _null_ _null_ _null_ _null_ "select $1 || $2::pg_catalog.text" _null_ _null_ _null_ ));
-DATA(insert OID = 2004 (  anytextcat	   PGNSP PGUID 14 1 0 0 0 f f f t f v 2 0 25 "2776 25" _null_ _null_ _null_ _null_ "select $1::pg_catalog.text || $2" _null_ _null_ _null_ ));
+DATA(insert OID = 2003 (  textanycat	   PGNSP PGUID 14 1 0 0 0 f f f t f s 2 0 25 "25 2776" _null_ _null_ _null_ _null_ "select $1 || $2::pg_catalog.text" _null_ _null_ _null_ ));
+DATA(insert OID = 2004 (  anytextcat	   PGNSP PGUID 14 1 0 0 0 f f f t f s 2 0 25 "2776 25" _null_ _null_ _null_ _null_ "select $1::pg_catalog.text || $2" _null_ _null_ _null_ ));
 
 DATA(insert OID = 2005 (  bytealike		   PGNSP PGUID 12 1 0 0 0 f f f t f i 2 0 16 "17 17" _null_ _null_ _null_ _null_ bytealike _null_ _null_ _null_ ));
 DATA(insert OID = 2006 (  byteanlike	   PGNSP PGUID 12 1 0 0 0 f f f t f i 2 0 16 "17 17" _null_ _null_ _null_ _null_ byteanlike _null_ _null_ _null_ ));
-- 
1.7.9

#7Robert Haas
robertmhaas@gmail.com
In reply to: Marti Raudsepp (#6)
Re: Text-any concatenation volatility acting as optimization barrier

On Wed, Feb 8, 2012 at 4:53 AM, Marti Raudsepp <marti@juffo.org> wrote:

Patch attached (in git am format). Passes all regression tests (except
'json' which fails on my machine even on git master).

Can you provide the diffs from the json test on your machine? I don't
see any build-farm failures off-hand...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#7)
Re: Text-any concatenation volatility acting as optimization barrier

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Feb 8, 2012 at 4:53 AM, Marti Raudsepp <marti@juffo.org> wrote:

Patch attached (in git am format). Passes all regression tests (except
'json' which fails on my machine even on git master).

Can you provide the diffs from the json test on your machine? I don't
see any build-farm failures off-hand...

I'm seeing diffs too after applying Marti's patch: instead of "z", "b",
etc, the field labels in the json values look like "f1", "f2", etc in
the output of queries such as

SELECT row_to_json(q)
FROM (SELECT $$a$$ || x AS b,
y AS c,
ARRAY[ROW(x.*,ARRAY[1,2,3]),
ROW(y.*,ARRAY[4,5,6])] AS z
FROM generate_series(1,2) x,
generate_series(4,5) y) q;

I believe what is happening is that now that the planner can flatten the
sub-select, the RowExprs are getting expanded differently, and that ties
into the "when do we lose column names" business that Andrew has been
on about.

However, I was not seeing that before applying the patch, so maybe Marti
has another issue too.

I am going to go ahead and commit the patch with the altered json
results, because IMO it is mere accident that these regression cases
were coming out with "nice" field labels anyway. When and if Andrew
gets the RowExpr cases fixed properly, that will show up as these cases
going back to nicer-looking output.

regards, tom lane

#9Marti Raudsepp
marti@juffo.org
In reply to: Tom Lane (#8)
Re: Text-any concatenation volatility acting as optimization barrier

On Wed, Feb 8, 2012 at 18:20, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Feb 8, 2012 at 4:53 AM, Marti Raudsepp <marti@juffo.org> wrote:

Patch attached (in git am format). Passes all regression tests (except
'json' which fails on my machine even on git master).

Can you provide the diffs from the json test on your machine?  I don't
see any build-farm failures off-hand...

I'm seeing diffs too after applying Marti's patch: instead of "z", "b",
etc, the field labels in the json values look like "f1", "f2", etc in
the output of queries such as

Sorry, my bad. I guess I got the two versions (patched and unpatched)
mixed up. Indeed this difference disappears when I remove my patch.
I'll have to be more careful when checking regression diffs in the
future. :)

Regards,
Marti

#10Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#8)
Re: Text-any concatenation volatility acting as optimization barrier

On 02/08/2012 11:20 AM, Tom Lane wrote:

I am going to go ahead and commit the patch with the altered json
results, because IMO it is mere accident that these regression cases
were coming out with "nice" field labels anyway. When and if Andrew
gets the RowExpr cases fixed properly, that will show up as these
cases going back to nicer-looking output.

I take it this is your way of making me hurry up with that work :-)

cheers

andrew

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#10)
Re: Text-any concatenation volatility acting as optimization barrier

Andrew Dunstan <andrew@dunslane.net> writes:

On 02/08/2012 11:20 AM, Tom Lane wrote:

I am going to go ahead and commit the patch with the altered json
results, because IMO it is mere accident that these regression cases
were coming out with "nice" field labels anyway. When and if Andrew
gets the RowExpr cases fixed properly, that will show up as these
cases going back to nicer-looking output.

I take it this is your way of making me hurry up with that work :-)

Well, right now the behavior is more consistent than it was before;
we would surely have gotten questions about it as it was. Whether
you find time to improve it or not isn't my concern at the moment.

BTW, the order of the items in the json output doesn't match the column
order of the sub-select, with or without the patch. This seems ... odd.
Is it intentional?

regards, tom lane