BUG #18097: Immutable expression not allowed in generated at

Started by PG Bug reporting formover 2 years ago16 messages

noreply@postgresql.org

over 2 years ago

The following bug has been logged on the website:

Bug reference: 18097
Logged by: Jim Keener
Email address: jim@jimkeener.com
PostgreSQL version: 15.0
Operating system: Linux
Description:

Given this table:

CREATE TABLE test_table (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
created_at timestamptz NOT NULL DEFAULT now()
);

The following work:

* alter table test_table add created_local_y text GENERATED ALWAYS AS
(EXTRACT(isoyear FROM created_at AT TIME ZONE 'America/New_York')) STORED;

* alter table test_table add created_local_w text GENERATED ALWAYS AS
(EXTRACT(week FROM created_at AT TIME ZONE 'America/New_York')) STORED;

* alter table test_table add created_local text GENERATED ALWAYS AS
(EXTRACT(isoyear FROM created_at AT TIME ZONE 'America/New_York')::text ||
'|' || EXTRACT(week FROM created_at AT TIME ZONE 'America/New_York')::text)
STORED;

* CREATE INDEX ON test_table ((EXTRACT(isoyear FROM created_at AT TIME ZONE
'America/New_York') || '|' || EXTRACT(week FROM created_at AT TIME ZONE
'America/New_York')));

However, the following DOES NOT work with an error of (ERROR: generation
expression is not immutable):

* alter table test_table add created_local text GENERATED ALWAYS AS
(EXTRACT(isoyear FROM created_at AT TIME ZONE 'America/New_York') || '|' ||
EXTRACT(week FROM created_at AT TIME ZONE 'America/New_York')) STORED;

Given that casting shouldn't "increase" the immutability of an expression,
and expression indexes need also be immutable afaik, I think that there is a
bug somewhere here?

Thank you,
Jim

David G. Johnston

david.g.johnston@gmail.com

over 2 years ago

In reply to: PG Bug reporting form (#1)

Re: BUG #18097: Immutable expression not allowed in generated at

On Thursday, September 7, 2023, PG Bug reporting form <
noreply@postgresql.org> wrote:

The following bug has been logged on the website:

Bug reference: 18097
Logged by: Jim Keener
Email address: jim@jimkeener.com
PostgreSQL version: 15.0
Operating system: Linux
Description:

However, the following DOES NOT work with an error of (ERROR: generation
expression is not immutable):

* alter table test_table add created_local text GENERATED ALWAYS AS
(EXTRACT(isoyear FROM created_at AT TIME ZONE 'America/New_York') || '|' ||
EXTRACT(week FROM created_at AT TIME ZONE 'America/New_York')) STORED;

Given that casting shouldn't "increase" the immutability of an expression,
and expression indexes need also be immutable afaik, I think that there is
a
bug somewhere here?

Casting very much can be a non-immutable activity, dates being the prime
example, and I presume going from numeric to text is indeed defined to be
stable hence the error. This is probably due to needing to consult locale
for deciding how to represent the decimal places divider. This is one of
the few places, assuming you write the function to set an environment
fixing locale to some know value like you did with the time zones, where
creating an immutable function around a stable expression makes sense.

David J.

James Keener

jim@jimkeener.com

over 2 years ago

In reply to: David G. Johnston (#2)

Re: BUG #18097: Immutable expression not allowed in generated at

The issue here, though, is that it works as an expression for an index, but doesn't work as a generated column unless I explicitly cast it to text (which should have happened implicitly anyways). (The cast is turning a non-immutable expression to be immutable.)

I'm also able to make generated fields for the individual function calls, but concatenation doesn't work without the explicit cast.

Jim

On September 8, 2023 11:11:42 AM EDT, "David G. Johnston" <david.g.johnston@gmail.com> wrote:

On Thursday, September 7, 2023, PG Bug reporting form <
noreply@postgresql.org> wrote:

The following bug has been logged on the website:

Bug reference: 18097
Logged by: Jim Keener
Email address: jim@jimkeener.com
PostgreSQL version: 15.0
Operating system: Linux
Description:

However, the following DOES NOT work with an error of (ERROR: generation
expression is not immutable):

* alter table test_table add created_local text GENERATED ALWAYS AS
(EXTRACT(isoyear FROM created_at AT TIME ZONE 'America/New_York') || '|' ||
EXTRACT(week FROM created_at AT TIME ZONE 'America/New_York')) STORED;

Given that casting shouldn't "increase" the immutability of an expression,
and expression indexes need also be immutable afaik, I think that there is
a
bug somewhere here?

Casting very much can be a non-immutable activity, dates being the prime
example, and I presume going from numeric to text is indeed defined to be
stable hence the error. This is probably due to needing to consult locale
for deciding how to represent the decimal places divider. This is one of
the few places, assuming you write the function to set an environment
fixing locale to some know value like you did with the time zones, where
creating an immutable function around a stable expression makes sense.

David J.

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Tom Lane

tgl@sss.pgh.pa.us

over 2 years ago

In reply to: James Keener (#3)

Re: BUG #18097: Immutable expression not allowed in generated at

James Keener <jim@jimkeener.com> writes:

The issue here, though, is that it works as an expression for an index, but doesn't work as a generated column unless I explicitly cast it to text (which should have happened implicitly anyways). (The cast is turning a non-immutable expression to be immutable.)

The reason that the generated expression fails is that (if you don't
explicitly cast to text) then it relies on anytextcat(anynonarray,text),
which is only stable, and can't be marked any more restrictively because
depending on the type of the non-text argument the corresponding output
function might not be immutable.

But then why doesn't the equivalent index definition spit up?
I found the answer in indexcmds.c's CheckMutability():

/*
* First run the expression through the planner. This has a couple of
* important consequences. First, function default arguments will get
* inserted, which may affect volatility (consider "default now()").
* Second, inline-able functions will get inlined, which may allow us to
* conclude that the function is really less volatile than it's marked. As
* an example, polymorphic functions must be marked with the most volatile
* behavior that they have for any input type, but once we inline the
* function we may be able to conclude that it's not so volatile for the
* particular input type we're dealing with.
*
* We assume here that expression_planner() won't scribble on its input.
*/
expr = expression_planner(expr);

/* Now we can search for non-immutable functions */
return contain_mutable_functions((Node *) expr);

Applying expression_planner() solves the problem because it inlines
anytextcat(anynonarray,text), resolving that the required cast is
numeric->text which is immutable. The code for generated expressions
omits that step and arrives at the less desirable answer. I wonder
where else we have the same issue.

regards, tom lane

Tom Lane

tgl@sss.pgh.pa.us

over 2 years ago

In reply to: Tom Lane (#4)

3 attachment(s)

Re: BUG #18097: Immutable expression not allowed in generated at

[ moving to pgsql-hackers ]

I wrote:

Applying expression_planner() solves the problem because it inlines
anytextcat(anynonarray,text), resolving that the required cast is
numeric->text which is immutable. The code for generated expressions
omits that step and arrives at the less desirable answer. I wonder
where else we have the same issue.

After digging around, I could only find one other place where
outside-the-planner code was doing this wrong: AddRelationNewConstraints
can come to the wrong conclusion about whether it's safe to use
missingMode. So here's a patch series to resolve this. I split it
into three parts mostly because 0002 will only go back to v12 where
we added GENERATED, but the missingMode bug exists in v11.

There are a couple of points worth bikeshedding perhaps. I didn't
spend much thought on the wrapper functions' names, but it's surely
true that the semantic difference between contain_mutable_functions
and ContainMutableFunctions is quite un-apparent from those names.
Anybody got a better idea? It also seemed about fifty-fifty whether
to make the wrappers' argument types be Node * or Expr *. I stuck
with Expr * because that's what the predecessor code CheckMutability()
used, but that's not a very strong argument.

BTW, the test function in 0003 might look funny:

CREATE FUNCTION foolme(timestamptz DEFAULT clock_timestamp())
RETURNS timestamptz
IMMUTABLE AS 'select $1' LANGUAGE sql;

but AFAICS it's perfectly legit. The function itself is indeed immutable,
since it's only "select $1"; it's the default argument that's volatile.

I'll add this to the open CF 2023-11, but we really ought to
get it committed before that so we can ship these bug fixes in
November's releases.

regards, tom lane

Attachments:

v1-0001-Refactor-to-add-wrappers-for-contain_mutable-vola.patchtext/x-diff; charset=us-ascii; name*0=v1-0001-Refactor-to-add-wrappers-for-contain_mutable-vola.p; name*1=atchDownload

From b8387f19c846de52083121e36c0a40cad61d594b Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sat, 9 Sep 2023 13:54:16 -0400
Subject: [PATCH v1 1/3] Refactor to add wrappers for
 contain_mutable/volatile_functions().

contain_mutable_functions and contain_volatile_functions give
reliable answers only after expression preprocessing (specifically
eval_const_expressions).  Some places understand this, but some did
not get the memo --- which is not entirely their fault, because the
problem is documented only in places far away from those functions.
Introduce wrapper functions that allow doing the right thing easily,
and add commentary in hopes of preventing future mistakes from
copy-and-paste of code that's only conditionally safe.

This patch doesn't fix any actual bugs, just provide the infrastructure
for doing so.  The only behavioral change is that I re-ordered the
steps in ComputePartitionAttrs so that its checks for invalid column
references are done before applying expression_planner, rather than
after.  The previous coding would not complain if a partition expression
contains a disallowed column reference that gets optimized away by
constant folding, which seems to me to be a behavior we do not want.

Discussion: https://postgr.es/m/18097-ebb179674f22932f@postgresql.org
---
 src/backend/commands/copyfrom.c      |  6 ++-
 src/backend/commands/indexcmds.c     | 31 +-------------
 src/backend/commands/tablecmds.c     | 50 +++++++++++-----------
 src/backend/optimizer/util/clauses.c | 64 ++++++++++++++++++++++++++++
 src/include/optimizer/optimizer.h    |  2 +
 5 files changed, 99 insertions(+), 54 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 70871ed819..a7ec68c25d 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -887,6 +887,9 @@ CopyFrom(CopyFromState cstate)
 		 * Can't support multi-inserts if there are any volatile function
 		 * expressions in WHERE clause.  Similarly to the trigger case above,
 		 * such expressions may query the table we're inserting into.
+		 *
+		 * Note: the whereClause was already preprocessed in DoCopy(), so it's
+		 * okay to use contain_volatile_functions() directly.
 		 */
 		insertMethod = CIM_SINGLE;
 	}
@@ -1599,7 +1602,8 @@ BeginCopyFrom(ParseState *pstate,
 				 * known to be safe for use with the multi-insert
 				 * optimization. Hence we use this special case function
 				 * checker rather than the standard check for
-				 * contain_volatile_functions().
+				 * contain_volatile_functions().  Note also that we already
+				 * ran the expression through expression_planner().
 				 */
 				if (!volatile_defexprs)
 					volatile_defexprs = contain_volatile_functions_not_nextval((Node *) defexpr);
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index ab8b81b302..9f9e2612cf 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -1785,33 +1785,6 @@ DefineIndex(Oid tableId,
 }
 
 
-/*
- * CheckMutability
- *		Test whether given expression is mutable
- */
-static bool
-CheckMutability(Expr *expr)
-{
-	/*
-	 * First run the expression through the planner.  This has a couple of
-	 * important consequences.  First, function default arguments will get
-	 * inserted, which may affect volatility (consider "default now()").
-	 * Second, inline-able functions will get inlined, which may allow us to
-	 * conclude that the function is really less volatile than it's marked. As
-	 * an example, polymorphic functions must be marked with the most volatile
-	 * behavior that they have for any input type, but once we inline the
-	 * function we may be able to conclude that it's not so volatile for the
-	 * particular input type we're dealing with.
-	 *
-	 * We assume here that expression_planner() won't scribble on its input.
-	 */
-	expr = expression_planner(expr);
-
-	/* Now we can search for non-immutable functions */
-	return contain_mutable_functions((Node *) expr);
-}
-
-
 /*
  * CheckPredicate
  *		Checks that the given partial-index predicate is valid.
@@ -1835,7 +1808,7 @@ CheckPredicate(Expr *predicate)
 	 * A predicate using mutable functions is probably wrong, for the same
 	 * reasons that we don't allow an index expression to use one.
 	 */
-	if (CheckMutability(predicate))
+	if (ContainMutableFunctions(predicate))
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
 				 errmsg("functions in index predicate must be marked IMMUTABLE")));
@@ -1978,7 +1951,7 @@ ComputeIndexAttrs(IndexInfo *indexInfo,
 				 * same data every time, it's not clear what the index entries
 				 * mean at all.
 				 */
-				if (CheckMutability((Expr *) expr))
+				if (ContainMutableFunctions((Expr *) expr))
 					ereport(ERROR,
 							(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
 							 errmsg("functions in index expression must be marked IMMUTABLE")));
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 8a2c671b66..dfba7abbe0 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -18079,30 +18079,6 @@ ComputePartitionAttrs(ParseState *pstate, Relation rel, List *partParams, AttrNu
 				partattrs[attn] = 0;	/* marks the column as expression */
 				*partexprs = lappend(*partexprs, expr);
 
-				/*
-				 * Try to simplify the expression before checking for
-				 * mutability.  The main practical value of doing it in this
-				 * order is that an inline-able SQL-language function will be
-				 * accepted if its expansion is immutable, whether or not the
-				 * function itself is marked immutable.
-				 *
-				 * Note that expression_planner does not change the passed in
-				 * expression destructively and we have already saved the
-				 * expression to be stored into the catalog above.
-				 */
-				expr = (Node *) expression_planner((Expr *) expr);
-
-				/*
-				 * Partition expression cannot contain mutable functions,
-				 * because a given row must always map to the same partition
-				 * as long as there is no change in the partition boundary
-				 * structure.
-				 */
-				if (contain_mutable_functions(expr))
-					ereport(ERROR,
-							(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
-							 errmsg("functions in partition key expression must be marked IMMUTABLE")));
-
 				/*
 				 * transformPartitionSpec() should have already rejected
 				 * subqueries, aggregates, window functions, and SRFs, based
@@ -18144,6 +18120,32 @@ ComputePartitionAttrs(ParseState *pstate, Relation rel, List *partParams, AttrNu
 								 parser_errposition(pstate, pelem->location)));
 				}
 
+				/*
+				 * Preprocess the expression before checking for mutability.
+				 * This is essential for the reasons described in
+				 * ContainMutableFunctions.  However, we call
+				 * expression_planner for ourselves rather than using that
+				 * function, because if constant-folding reduces the
+				 * expression to a constant, we'd like to know that so we can
+				 * complain below.
+				 *
+				 * Like ContainMutableFunctions, assume that
+				 * expression_planner won't scribble on its input, so this
+				 * doesn't affect the partexprs entry we saved above.
+				 */
+				expr = (Node *) expression_planner((Expr *) expr);
+
+				/*
+				 * Partition expressions cannot contain mutable functions,
+				 * because a given row must always map to the same partition
+				 * as long as there is no change in the partition boundary
+				 * structure.
+				 */
+				if (contain_mutable_functions(expr))
+					ereport(ERROR,
+							(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+							 errmsg("functions in partition key expression must be marked IMMUTABLE")));
+
 				/*
 				 * While it is not exactly *wrong* for a partition expression
 				 * to be a constant, it seems better to reject such keys.
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index da258968b8..484d8e6b3f 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -360,6 +360,10 @@ contain_subplans_walker(Node *node, void *context)
  * mistakenly think that something like "WHERE random() < 0.5" can be treated
  * as a constant qualification.
  *
+ * This will give the right answer only for clauses that have been put
+ * through expression preprocessing.  Callers outside the planner typically
+ * should use ContainMutableFunctions() instead, for the reasons given there.
+ *
  * We will recursively look into Query nodes (i.e., SubLink sub-selects)
  * but not into SubPlans.  See comments for contain_volatile_functions().
  */
@@ -446,6 +450,34 @@ contain_mutable_functions_walker(Node *node, void *context)
 								  context);
 }
 
+/*
+ * ContainMutableFunctions
+ *	  Test whether given expression contains mutable functions.
+ *
+ * This is a wrapper for contain_mutable_functions() that is safe to use from
+ * outside the planner.  The difference is that it first runs the expression
+ * through expression_planner().  There are two key reasons why we need that:
+ *
+ * First, function default arguments will get inserted, which may affect
+ * volatility (consider "default now()").
+ *
+ * Second, inline-able functions will get inlined, which may allow us to
+ * conclude that the function is really less volatile than it's marked.
+ * As an example, polymorphic functions must be marked with the most volatile
+ * behavior that they have for any input type, but once we inline the
+ * function we may be able to conclude that it's not so volatile for the
+ * particular input type we're dealing with.
+ */
+bool
+ContainMutableFunctions(Expr *expr)
+{
+	/* We assume here that expression_planner() won't scribble on its input */
+	expr = expression_planner(expr);
+
+	/* Now we can search for non-immutable functions */
+	return contain_mutable_functions((Node *) expr);
+}
+
 
 /*****************************************************************************
  *		Check clauses for volatile functions
@@ -459,6 +491,10 @@ contain_mutable_functions_walker(Node *node, void *context)
  * volatile function) is found. This test prevents, for example,
  * invalid conversions of volatile expressions into indexscan quals.
  *
+ * This will give the right answer only for clauses that have been put
+ * through expression preprocessing.  Callers outside the planner typically
+ * should use ContainVolatileFunctions() instead, for the reasons given there.
+ *
  * We will recursively look into Query nodes (i.e., SubLink sub-selects)
  * but not into SubPlans.  This is a bit odd, but intentional.  If we are
  * looking at a SubLink, we are probably deciding whether a query tree
@@ -582,6 +618,34 @@ contain_volatile_functions_walker(Node *node, void *context)
 								  context);
 }
 
+/*
+ * ContainVolatileFunctions
+ *	  Test whether given expression contains volatile functions.
+ *
+ * This is a wrapper for contain_volatile_functions() that is safe to use from
+ * outside the planner.  The difference is that it first runs the expression
+ * through expression_planner().  There are two key reasons why we need that:
+ *
+ * First, function default arguments will get inserted, which may affect
+ * volatility (consider "default random()").
+ *
+ * Second, inline-able functions will get inlined, which may allow us to
+ * conclude that the function is really less volatile than it's marked.
+ * As an example, polymorphic functions must be marked with the most volatile
+ * behavior that they have for any input type, but once we inline the
+ * function we may be able to conclude that it's not so volatile for the
+ * particular input type we're dealing with.
+ */
+bool
+ContainVolatileFunctions(Expr *expr)
+{
+	/* We assume here that expression_planner() won't scribble on its input */
+	expr = expression_planner(expr);
+
+	/* Now we can search for volatile functions */
+	return contain_volatile_functions((Node *) expr);
+}
+
 /*
  * Special purpose version of contain_volatile_functions() for use in COPY:
  * ignore nextval(), but treat all other functions normally.
diff --git a/src/include/optimizer/optimizer.h b/src/include/optimizer/optimizer.h
index 514746c585..8254da373b 100644
--- a/src/include/optimizer/optimizer.h
+++ b/src/include/optimizer/optimizer.h
@@ -138,7 +138,9 @@ extern Expr *canonicalize_qual(Expr *qual, bool is_check);
 /* in util/clauses.c: */
 
 extern bool contain_mutable_functions(Node *clause);
+extern bool ContainMutableFunctions(Expr *expr);
 extern bool contain_volatile_functions(Node *clause);
+extern bool ContainVolatileFunctions(Expr *expr);
 extern bool contain_volatile_functions_not_nextval(Node *clause);
 
 extern Node *eval_const_expressions(PlannerInfo *root, Node *node);
-- 
2.39.3

v1-0002-Preprocess-column-GENERATED-expressions-before-ch.patchtext/x-diff; charset=us-ascii; name*0=v1-0002-Preprocess-column-GENERATED-expressions-before-ch.p; name*1=atchDownload

From 1e836fa15ff30c22de9b555196da3e8fbb5a2ad7 Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sat, 9 Sep 2023 14:40:15 -0400
Subject: [PATCH v1 2/3] Preprocess column GENERATED expressions before
 checking mutability.

The previous coding skipped the preprocessing step and thus had
(at least) two failure modes: it could fail to notice the use of a
volatile function default-argument expression, or it could reject a
polymorphic function that is actually immutable on the datatype of
interest.

Per bug #18097 from Jim Keener.  Back-patch to v12 where we added
generated columns.

Discussion: https://postgr.es/m/18097-ebb179674f22932f@postgresql.org
---
 src/backend/catalog/heap.c              | 4 +++-
 src/test/regress/expected/generated.out | 3 +++
 src/test/regress/sql/generated.sql      | 3 +++
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index b42711f574..9b7544e9b7 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -3087,9 +3087,11 @@ cookDefault(ParseState *pstate,
 
 	if (attgenerated)
 	{
+		/* Disallow refs to other generated columns */
 		check_nested_generated(pstate, expr);
 
-		if (contain_mutable_functions(expr))
+		/* Disallow mutable functions */
+		if (ContainMutableFunctions((Expr *) expr))
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
 					 errmsg("generation expression is not immutable")));
diff --git a/src/test/regress/expected/generated.out b/src/test/regress/expected/generated.out
index dc97ed3fe0..a2f38d0f50 100644
--- a/src/test/regress/expected/generated.out
+++ b/src/test/regress/expected/generated.out
@@ -61,6 +61,9 @@ LINE 1: ..._3 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (c * 2) STO...
 -- generation expression must be immutable
 CREATE TABLE gtest_err_4 (a int PRIMARY KEY, b double precision GENERATED ALWAYS AS (random()) STORED);
 ERROR:  generation expression is not immutable
+-- ... but be sure that the immutability test is accurate
+CREATE TABLE gtest2 (a int, b text GENERATED ALWAYS AS (a || ' sec') STORED);
+DROP TABLE gtest2;
 -- cannot have default/identity and generated
 CREATE TABLE gtest_err_5a (a int PRIMARY KEY, b int DEFAULT 5 GENERATED ALWAYS AS (a * 2) STORED);
 ERROR:  both default and generation expression specified for column "b" of table "gtest_err_5a"
diff --git a/src/test/regress/sql/generated.sql b/src/test/regress/sql/generated.sql
index 8ddecf0cc3..298f6b3aa8 100644
--- a/src/test/regress/sql/generated.sql
+++ b/src/test/regress/sql/generated.sql
@@ -26,6 +26,9 @@ CREATE TABLE gtest_err_3 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (c * 2) S
 
 -- generation expression must be immutable
 CREATE TABLE gtest_err_4 (a int PRIMARY KEY, b double precision GENERATED ALWAYS AS (random()) STORED);
+-- ... but be sure that the immutability test is accurate
+CREATE TABLE gtest2 (a int, b text GENERATED ALWAYS AS (a || ' sec') STORED);
+DROP TABLE gtest2;
 
 -- cannot have default/identity and generated
 CREATE TABLE gtest_err_5a (a int PRIMARY KEY, b int DEFAULT 5 GENERATED ALWAYS AS (a * 2) STORED);
-- 
2.39.3

v1-0003-Preprocess-column-DEFAULT-expressions-before-chec.patchtext/x-diff; charset=us-ascii; name*0=v1-0003-Preprocess-column-DEFAULT-expressions-before-chec.p; name*1=atchDownload

From 26cfe11c9487d53982c903d929d9686fecf04911 Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sat, 9 Sep 2023 14:53:24 -0400
Subject: [PATCH v1 3/3] Preprocess column DEFAULT expressions before checking
 volatility.

The previous coding skipped the preprocessing step and thus had
(at least) two failure modes: it could fail to notice the use of a
volatile function default-argument expression, or it could reject a
polymorphic function that is actually immutable on the datatype of
interest.  The latter would just result in an unnecessary table
rewrite, but the former could allow the attmissingval functionality
to be used in a case where it should not be.

Noted while investigating bug #18097 from Jim Keener.  Back-patch
to all supported versions.

Discussion: https://postgr.es/m/18097-ebb179674f22932f@postgresql.org
---
 src/backend/catalog/heap.c                 |  2 +-
 src/test/regress/expected/fast_default.out | 18 ++++++++++++++++++
 src/test/regress/sql/fast_default.sql      | 11 +++++++++++
 3 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 9b7544e9b7..b44a4c96bb 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -2365,7 +2365,7 @@ AddRelationNewConstraints(Relation rel,
 			continue;
 
 		/* If the DEFAULT is volatile we cannot use a missing value */
-		if (colDef->missingMode && contain_volatile_functions((Node *) expr))
+		if (colDef->missingMode && ContainVolatileFunctions((Expr *) expr))
 			colDef->missingMode = false;
 
 		defOid = StoreAttrDefault(rel, colDef->attnum, expr, is_internal,
diff --git a/src/test/regress/expected/fast_default.out b/src/test/regress/expected/fast_default.out
index 91f25717b5..59365dad96 100644
--- a/src/test/regress/expected/fast_default.out
+++ b/src/test/regress/expected/fast_default.out
@@ -272,7 +272,25 @@ SELECT comp();
  Rewritten
 (1 row)
 
+-- check that we notice insertion of a volatile default argument
+CREATE FUNCTION foolme(timestamptz DEFAULT clock_timestamp())
+  RETURNS timestamptz
+  IMMUTABLE AS 'select $1' LANGUAGE sql;
+ALTER TABLE T ADD COLUMN c3 timestamptz DEFAULT foolme();
+NOTICE:  rewriting table t for reason 2
+SELECT attname, atthasmissing, attmissingval FROM pg_attribute
+  WHERE attrelid = 't'::regclass AND attnum > 0
+  ORDER BY attnum;
+ attname | atthasmissing | attmissingval 
+---------+---------------+---------------
+ pk      | f             | 
+ c1      | f             | 
+ c2      | f             | 
+ c3      | f             | 
+(4 rows)
+
 DROP TABLE T;
+DROP FUNCTION foolme(timestamptz);
 -- Simple querie
 CREATE TABLE T (pk INT NOT NULL PRIMARY KEY);
 SELECT set('t');
diff --git a/src/test/regress/sql/fast_default.sql b/src/test/regress/sql/fast_default.sql
index 16a3b7ca51..dc9df78a35 100644
--- a/src/test/regress/sql/fast_default.sql
+++ b/src/test/regress/sql/fast_default.sql
@@ -256,7 +256,18 @@ ALTER TABLE T ADD COLUMN c2 TIMESTAMP DEFAULT clock_timestamp();
 
 SELECT comp();
 
+-- check that we notice insertion of a volatile default argument
+CREATE FUNCTION foolme(timestamptz DEFAULT clock_timestamp())
+  RETURNS timestamptz
+  IMMUTABLE AS 'select $1' LANGUAGE sql;
+ALTER TABLE T ADD COLUMN c3 timestamptz DEFAULT foolme();
+
+SELECT attname, atthasmissing, attmissingval FROM pg_attribute
+  WHERE attrelid = 't'::regclass AND attnum > 0
+  ORDER BY attnum;
+
 DROP TABLE T;
+DROP FUNCTION foolme(timestamptz);
 
 -- Simple querie
 CREATE TABLE T (pk INT NOT NULL PRIMARY KEY);
-- 
2.39.3

Tom Lane

tgl@sss.pgh.pa.us

over 2 years ago

In reply to: Tom Lane (#5)

Re: BUG #18097: Immutable expression not allowed in generated at

I wrote:

After digging around, I could only find one other place where
outside-the-planner code was doing this wrong: AddRelationNewConstraints
can come to the wrong conclusion about whether it's safe to use
missingMode. So here's a patch series to resolve this.

Argh ... I forgot to mention that there's one other place that
this patch series doesn't address, which is that publicationcmds.c's
check_simple_rowfilter_expr() also checks for volatile functions
without having preprocessed the expression. I'm not entirely sure
that there's a reachable problem in the direction of underestimating
the expression's volatility, given that that logic rejects non-builtin
functions entirely: it seems unlikely that any immutable builtin
function would have a volatile default expression. But it definitely
seems possible that there would be a problem in the other direction,
leading to rejecting row filter expressions that we'd like to allow,
much as in bug #18097.

I'm not sure about a good way to resolve this. Simply applying
expression simplification ahead of the check would break the code's
intent of rejecting non-builtin operators, in the case where such
an operator resolves to an inline-able builtin function. I find
the entire design of check_simple_rowfilter_expr pretty questionable
anyway, and there are a bunch of dubious (and undocumented) individual
decisions like allowing whole-row Vars, using FirstNormalObjectId
rather than FirstUnpinnedObjectId as the cutoff, etc. So I'm not
planning to touch that code, but somebody who was paying attention
when it was written might want to take a second look.

regards, tom lane

Aleksander Alekseev

aleksander@timescale.com

about 2 years ago

In reply to: Tom Lane (#6)

Re: BUG #18097: Immutable expression not allowed in generated at

Hi,

I noticed that the patchset needs a review and decided to take a look.

There are a couple of points worth bikeshedding perhaps. I didn't
spend much thought on the wrapper functions' names, but it's surely
true that the semantic difference between contain_mutable_functions
and ContainMutableFunctions is quite un-apparent from those names.
Anybody got a better idea?

Oh no! We encountered one of the most difficult problems in computer
science [1]https://martinfowler.com/bliki/TwoHardThings.html.

ContainMutableFunctionsAfterPerformingPlannersTransformations() would
be somewhat long but semantically correct. It can be shortened to
ContainMutableFunctionsAfterTransformations() or perhaps
TransformedExprContainMutableFunctions(). Personally I don't mind long
names. This being said, ContainMutableFunctions() doesn't disgusts my
sense of beauty too much either. All in all any name will do IMO.
Naturally ContainVolatileFunctions() should be renamed consistently
with ContainMutableFunctions().

I couldn't find anything wrong with 0001..0003. The parches were
tested in several environments and passed `make check-world`. I
suggest merging them.

[1]: https://martinfowler.com/bliki/TwoHardThings.html

--
Best regards,
Aleksander Alekseev

Tom Lane

tgl@sss.pgh.pa.us

about 2 years ago

In reply to: Aleksander Alekseev (#7)

Re: BUG #18097: Immutable expression not allowed in generated at

Aleksander Alekseev <aleksander@timescale.com> writes:

There are a couple of points worth bikeshedding perhaps. I didn't
spend much thought on the wrapper functions' names, but it's surely
true that the semantic difference between contain_mutable_functions
and ContainMutableFunctions is quite un-apparent from those names.
Anybody got a better idea?

Oh no! We encountered one of the most difficult problems in computer
science [1].

Indeed :-(. Looking at it again this morning, I'm thinking of
using "contain_mutable_functions_after_planning" --- what do you
think of that?

regards, tom lane

Aleksander Alekseev

aleksander@timescale.com

about 2 years ago

In reply to: Tom Lane (#8)

Re: BUG #18097: Immutable expression not allowed in generated at

Hi,

Oh no! We encountered one of the most difficult problems in computer
science [1].

Indeed :-(. Looking at it again this morning, I'm thinking of
using "contain_mutable_functions_after_planning" --- what do you
think of that?

It's better but creates an impression that the actual planning will be
involved. According to the comments for expression_planner():

```
* Currently, we disallow sublinks in standalone expressions, so there's no
* real "planning" involved here. (That might not always be true though.)
```

I'm not very well familiar with the part of code responsible for
planning, but I find this inconsistency confusing.

Since the code is written for people to be read and is read more often
than written personally I believe that longer and more descriptive
names are better. Something like
contain_mutable_functions_after_planner_transformations(). This being
said, in practice one should read the comments to learn about corner
cases, pre- and postconditions anyway, so maybe it's not that a big
deal. I think of contain_mutable_functions_after_transformations() as
a good compromise between the length and descriptiveness.

--
Best regards,
Aleksander Alekseev

#10

Tom Lane

tgl@sss.pgh.pa.us

about 2 years ago

In reply to: Aleksander Alekseev (#9)

Re: BUG #18097: Immutable expression not allowed in generated at

Aleksander Alekseev <aleksander@timescale.com> writes:

Oh no! We encountered one of the most difficult problems in computer
science [1].

Indeed :-(. Looking at it again this morning, I'm thinking of
using "contain_mutable_functions_after_planning" --- what do you
think of that?

It's better but creates an impression that the actual planning will be
involved.

True, but from the perspective of the affected code, the question is
basically "did you call expression_planner() yet". So I like this
naming for that connection, whereas something based on "transformation"
doesn't really connect to anything in existing function names.

regards, tom lane

#11

Aleksander Alekseev

aleksander@timescale.com

about 2 years ago

In reply to: Tom Lane (#10)

Re: BUG #18097: Immutable expression not allowed in generated at

Hi,

True, but from the perspective of the affected code, the question is
basically "did you call expression_planner() yet". So I like this
naming for that connection, whereas something based on "transformation"
doesn't really connect to anything in existing function names.

Fair enough.

--
Best regards,
Aleksander Alekseev

#12

Tom Lane

tgl@sss.pgh.pa.us

about 2 years ago

In reply to: Aleksander Alekseev (#11)

Re: BUG #18097: Immutable expression not allowed in generated at

Aleksander Alekseev <aleksander@timescale.com> writes:

True, but from the perspective of the affected code, the question is
basically "did you call expression_planner() yet". So I like this
naming for that connection, whereas something based on "transformation"
doesn't really connect to anything in existing function names.

Fair enough.

Pushed like that, then. Thanks for reviewing!

regards, tom lane

#13

Adrien Nayrat

adrien.nayrat@anayrat.info

over 1 year ago

In reply to: Tom Lane (#12)

2 attachment(s)

Re: BUG #18097: Immutable expression not allowed in generated at

Hello,

A customer encountered an issue while restoring a dump of its database
after applying 15.6 minor version.

It seems due to this fix :

Fix function volatility checking for GENERATED and DEFAULT

expressions (Tom Lane)

These places could fail to detect insertion of a volatile function

default-argument expression, or decide that a polymorphic function is
volatile although it is actually immutable on the datatype of interest.
This could lead to improperly rejecting or accepting a GENERATED clause,
or to mistakenly applying the constant-default-value optimization in
ALTER TABLE ADD COLUMN.

Related commit 9057ddbef

I managed to reproduce it with a simple test case :

CREATE SCHEMA s1;
CREATE SCHEMA s2;

CREATE FUNCTION s2.f1 (c1 text) RETURNS text
LANGUAGE SQL IMMUTABLE
AS $$
SELECT c1
$$;

CREATE FUNCTION s2.f2 (c1 text) RETURNS text
LANGUAGE SQL IMMUTABLE
AS $$
SELECT s2.f1 (c1);
$$;

CREATE TABLE s1.t1 (c1 text, c2 text GENERATED ALWAYS AS (s2.f2 (c1))
STORED);

CREATE FUNCTION s1.f3 () RETURNS SETOF s1.t1
LANGUAGE sql
AS $$
SELECT *
FROM s1.t1
$$;

The resulting dump is attached.

You will notice that the table s1.t1 is created before the function
s2.f1. This is due to the function s1.f3 which returns a SETOF s1.t1

I understand Postgres has to create s1.t1 before s1.f3. Unfortunately,
the function s2.f1 is created later.

When we try to restore the dump, we have this error :
CREATE TABLE s1.t1 (
c1 text,
c2 text GENERATED ALWAYS AS (s2.f2(c1)) STORED
);
psql:b2.sql:61: ERROR: function s2.f1(text) does not exist
LINE 2: SELECT s2.f1(c1);
^
HINT: No function matches the given name and argument types. You might
need to add explicit type casts.
QUERY:
SELECT s2.f1(c1);

CONTEXT: SQL function "f2" during inlining

Thanks to Jordi Morillo, Alexis Lucazeau, Matthieu Honel for reporting this.

Regards,

--
Adrien NAYRAT

#14

Tom Lane

tgl@sss.pgh.pa.us

over 1 year ago

In reply to: Adrien Nayrat (#13)

Re: BUG #18097: Immutable expression not allowed in generated at

Adrien Nayrat <adrien.nayrat@anayrat.info> writes:

A customer encountered an issue while restoring a dump of its database
after applying 15.6 minor version.
It seems due to this fix :

Fix function volatility checking for GENERATED and DEFAULT
expressions (Tom Lane)

I don't believe this example has anything to do with that.

CREATE SCHEMA s1;
CREATE SCHEMA s2;
CREATE FUNCTION s2.f1 (c1 text) RETURNS text
LANGUAGE SQL IMMUTABLE
AS $$
SELECT c1
$$;
CREATE FUNCTION s2.f2 (c1 text) RETURNS text
LANGUAGE SQL IMMUTABLE
AS $$
SELECT s2.f1 (c1);
$$;
CREATE TABLE s1.t1 (c1 text, c2 text GENERATED ALWAYS AS (s2.f2 (c1))
STORED);

The problem here is that to pg_dump, the body of s2.f2 is just an
opaque string, so it has no idea that that depends on s2.f1, and
it ends up picking a dump order that doesn't respect that
dependency.

It used to be that there wasn't much you could do about this
except choose object names that wouldn't cause the problem.
In v14 and up there's another way, at least for SQL-language
functions: you can write the function in SQL spec style.

CREATE FUNCTION s2.f2 (c1 text) RETURNS text
IMMUTABLE
BEGIN ATOMIC
SELECT s2.f1 (c1);
END;

Then the dependency is visible, both to the server and to pg_dump,
and you get a valid dump order.

regards, tom lane

#15

Adrien Nayrat

adrien.nayrat@anayrat.info

over 1 year ago

In reply to: Tom Lane (#14)

Re: BUG #18097: Immutable expression not allowed in generated at

On 9/25/24 4:41 PM, Tom Lane wrote:

Adrien Nayrat <adrien.nayrat@anayrat.info> writes:

A customer encountered an issue while restoring a dump of its database
after applying 15.6 minor version.
It seems due to this fix :

Fix function volatility checking for GENERATED and DEFAULT
expressions (Tom Lane)

I don't believe this example has anything to do with that.

I've done a git bisect between 15.5 and 15.6 and this commit trigger the
error.

CREATE SCHEMA s1;
CREATE SCHEMA s2;
CREATE FUNCTION s2.f1 (c1 text) RETURNS text
LANGUAGE SQL IMMUTABLE
AS $$
SELECT c1
$$;
CREATE FUNCTION s2.f2 (c1 text) RETURNS text
LANGUAGE SQL IMMUTABLE
AS $$
SELECT s2.f1 (c1);
$$;
CREATE TABLE s1.t1 (c1 text, c2 text GENERATED ALWAYS AS (s2.f2 (c1))
STORED);

The problem here is that to pg_dump, the body of s2.f2 is just an
opaque string, so it has no idea that that depends on s2.f1, and
it ends up picking a dump order that doesn't respect that
dependency.

It used to be that there wasn't much you could do about this
except choose object names that wouldn't cause the problem.

I see. So I understand we were lucky it worked before the commit added
the check of volatility in generated column ?

In v14 and up there's another way, at least for SQL-language
functions: you can write the function in SQL spec style.

CREATE FUNCTION s2.f2 (c1 text) RETURNS text
IMMUTABLE
BEGIN ATOMIC
SELECT s2.f1 (c1);
END;

Then the dependency is visible, both to the server and to pg_dump,
and you get a valid dump order.

Oh, thanks !

--
Adrien NAYRAT
https://pro.anayrat.info

#16

Tom Lane

tgl@sss.pgh.pa.us

over 1 year ago

In reply to: Adrien Nayrat (#15)

Re: BUG #18097: Immutable expression not allowed in generated at

Adrien Nayrat <adrien.nayrat@anayrat.info> writes:

I see. So I understand we were lucky it worked before the commit added
the check of volatility in generated column ?

Pretty much. There are other cases that could trigger expansion
of such a function before the restore is complete. It is unfortunate
that this bit you in a minor release, but there are lots of other
ways you could have tripped over the missing dependency unexpectedly.

regards, tom lane