Unexpected results from CALL and AUTOCOMMIT=off

Started by Victor Yegorovover 1 year ago8 messages

vyegorov@gmail.com

over 1 year ago

Greetings.

I am observing the following results on PostgreSQL 15.7
First, setup:

create table t_test(x bigint);
insert into t_test values(0);

create or replace function f_get_x()
returns bigint
language plpgsql
stable
as $function$
declare
l_result bigint;
begin
select x into l_result from t_test;
--raise notice 'f_get_x() >> x=%', l_result;
--raise notice 'f_get_x() >> xact=%', txid_current_if_assigned();
return l_result;
end;
$function$;

create or replace procedure f_print_x(x bigint)
language plpgsql
as $procedure$
begin
raise notice 'f_print_x() >> x=%', x;
--raise notice 'f_print_x() >> xact=%', txid_current_if_assigned();
end;
$procedure$;

Now, the case:
\set AUTOCOMMIT off
do
$$ begin
--raise notice 'do >> xact=%', txid_current_if_assigned();
update t_test set x = 1;
--raise notice 'do >> xact=%', txid_current_if_assigned();
raise notice 'do >> x=%', f_get_x();
--raise notice 'do >> xact=%', txid_current_if_assigned();
call f_print_x(f_get_x());
end; $$;
NOTICE: do >> x=1
NOTICE: f_print_x() >> x=0
DO

I don't understand why CALL statement is not seeing an updated record.
With AUTOCOMMIT=on, all goes as expected.

I tried to examine snapshots and xids (commented lines), but they're always
the same.

Can you explain this behavior, please? Is it expected?

--
Victor Yegorov

Pierre Forstmann

pierre.forstmann@gmail.com

over 1 year ago

In reply to: Victor Yegorov (#1)

Re: Unexpected results from CALL and AUTOCOMMIT=off

You declared function f_get_x as stable which means:

https://www.postgresql.org/docs/15/sql-createfunction.html

STABLE indicates that the function cannot modify the database, and that
within a single table scan it will consistently return the same result for
the same argument values, but that its result could change across SQL
statements. This is the appropriate selection for functions whose results
depend on database lookups, parameter variables (such as the current time
zone), etc. (It is inappropriate for AFTER triggers that wish to query rows
modified by the current command.) Also note that the current_timestamp
family of functions qualify as stable, since their values do not change
within a transaction.

If you remove stable from function declaration, it works as expected:

drop table t_test;
DROP TABLE
create table t_test(x bigint);
CREATE TABLE
insert into t_test values(0);
INSERT 0 1
create or replace function f_get_x()
returns bigint
language plpgsql
-- stable
as $function$
declare
l_result bigint;
begin
select x into l_result from t_test;
--raise notice 'f_get_x() >> x=%', l_result;
--raise notice 'f_get_x() >> xact=%', txid_current_if_assigned();
return l_result;
end;
$function$;
CREATE FUNCTION
create or replace procedure f_print_x(x bigint)
language plpgsql
as $procedure$
begin
raise notice 'f_print_x() >> x=%', x;
--raise notice 'f_print_x() >> xact=%', txid_current_if_assigned();
end;
$procedure$;
CREATE PROCEDURE
do
$$ begin
--raise notice 'do >> xact=%', txid_current_if_assigned();
update t_test set x = 1;
--raise notice 'do >> xact=%', txid_current_if_assigned();
raise notice 'do >> x=%', f_get_x();
--raise notice 'do >> xact=%', txid_current_if_assigned();
call f_print_x(f_get_x());
end; $$;
psql:test.sql:38: NOTICE: do >> x=1
psql:test.sql:38: NOTICE: f_print_x() >> x=1
DO

Le lun. 3 juin 2024 à 16:42, Victor Yegorov <vyegorov@gmail.com> a écrit :

Show quoted text

Greetings.

I am observing the following results on PostgreSQL 15.7
First, setup:

create table t_test(x bigint);
insert into t_test values(0);

create or replace function f_get_x()
returns bigint
language plpgsql
stable
as $function$
declare
l_result bigint;
begin
select x into l_result from t_test;
--raise notice 'f_get_x() >> x=%', l_result;
--raise notice 'f_get_x() >> xact=%', txid_current_if_assigned();
return l_result;
end;
$function$;

create or replace procedure f_print_x(x bigint)
language plpgsql
as $procedure$
begin
raise notice 'f_print_x() >> x=%', x;
--raise notice 'f_print_x() >> xact=%', txid_current_if_assigned();
end;
$procedure$;

Now, the case:
\set AUTOCOMMIT off
do
$$ begin
--raise notice 'do >> xact=%', txid_current_if_assigned();
update t_test set x = 1;
--raise notice 'do >> xact=%', txid_current_if_assigned();
raise notice 'do >> x=%', f_get_x();
--raise notice 'do >> xact=%', txid_current_if_assigned();
call f_print_x(f_get_x());
end; $$;
NOTICE: do >> x=1
NOTICE: f_print_x() >> x=0
DO

I don't understand why CALL statement is not seeing an updated record.
With AUTOCOMMIT=on, all goes as expected.

I tried to examine snapshots and xids (commented lines), but they're
always the same.

Can you explain this behavior, please? Is it expected?

--
Victor Yegorov

Victor Yegorov

vyegorov@gmail.com

over 1 year ago

In reply to: Pierre Forstmann (#2)

Re: Unexpected results from CALL and AUTOCOMMIT=off

пн, 3 июн. 2024 г. в 20:40, Pierre Forstmann <pierre.forstmann@gmail.com>:

You declared function f_get_x as stable which means:

…

If you remove stable from function declaration, it works as expected:

Well, I checked
https://www.postgresql.org/docs/current/xfunc-volatility.html
There's a paragraph describing why STABLE (and IMMUTABLE) use different
snapshots:

For functions written in SQL or in any of the standard procedural

languages, there is a second important property determined by the
volatility category, namely the visibility of any data changes that have
been made by the SQL command that is calling the function. A > VOLATILE
function will see such changes, a STABLE or IMMUTABLE function will not.
This behavior is implemented using the snapshotting behavior of MVCC (see
Chapter 13): STABLE and IMMUTABLE functions use a snapshot established as
of the start of the

calling query, whereas VOLATILE functions obtain a fresh snapshot at the

start of each query they execute.

But later, docs state, that

Because of this snapshotting behavior, a function containing only SELECT

commands can safely be marked STABLE, even if it selects from tables that
might be undergoing modifications by concurrent queries. PostgreSQL will
execute all commands of a STABLE function using the snapshot established
for the calling query, and so it will see a fixed view of the database
throughout that query.

And therefore I assume STABLE should work in this case. Well, it seems not
to.

I assume there's smth to do with implicit BEGIN issued in non-AUTOCOMMIT
mode and non-atomic DO block behaviour.

--
Victor Yegorov

Tom Lane

tgl@sss.pgh.pa.us

over 1 year ago

In reply to: Victor Yegorov (#3)

Re: Unexpected results from CALL and AUTOCOMMIT=off

Victor Yegorov <vyegorov@gmail.com> writes:

пн, 3 июн. 2024 г. в 20:40, Pierre Forstmann <pierre.forstmann@gmail.com>:

If you remove stable from function declaration, it works as expected:

... therefore I assume STABLE should work in this case. Well, it seems not
to.

I agree that this looks like a bug, since your example shows that the
same function works as-expected in an ordinary expression but not in
a CALL. The dependency on AUTOCOMMIT (that is, being within an outer
transaction block) seems even odder. I've not dug into it yet, but
I suppose we're passing the wrong snapshot to the CALL arguments.
A volatile function wouldn't use that snapshot, explaining Pierre's
result.

regards, tom lane

Tom Lane

tgl@sss.pgh.pa.us

over 1 year ago

In reply to: Tom Lane (#4)

1 attachment(s)

Re: Unexpected results from CALL and AUTOCOMMIT=off

[ redirecting to pgsql-hackers ]

I wrote:

I agree that this looks like a bug, since your example shows that the
same function works as-expected in an ordinary expression but not in
a CALL. The dependency on AUTOCOMMIT (that is, being within an outer
transaction block) seems even odder. I've not dug into it yet, but
I suppose we're passing the wrong snapshot to the CALL arguments.

I poked into this and found that the source of the problem is that
plpgsql's exec_stmt_call passes allow_nonatomic = true even when
it's running in an atomic context. So we can fix it with basically
a one-line change:

-	options.allow_nonatomic = true;
+	options.allow_nonatomic = !estate->atomic;

I'm worried about whether external callers might've made a comparable
mistake, but I think all we can do is document it a little better.
AFAICS there isn't any good way for spi.c to realize that this mistake
has been made, else we could have it patch up the mistake centrally.
I've not attempted to make those doc updates in the attached draft
patch though, nor have I added a test case yet.

Before realizing that this was the issue, I spent a fair amount of
time on the idea that _SPI_execute_plan() was doing things wrong,
and that led me to notice that its comment about having four modes
of snapshot operation has been falsified in multiple ways. So this
draft does include fixes for that comment.

Thoughts?

regards, tom lane

Attachments:

fix-call-in-atomic-context-draft.patchtext/x-diff; charset=us-ascii; name=fix-call-in-atomic-context-draft.patchDownload

diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index a97a7e3bd4..c93b6c192f 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -2425,12 +2425,17 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
 	 * snapshot != InvalidSnapshot, read_only = false: use the given snapshot,
 	 * modified by advancing its command ID before each querytree.
 	 *
-	 * snapshot == InvalidSnapshot, read_only = true: use the entry-time
-	 * ActiveSnapshot, if any (if there isn't one, we run with no snapshot).
+	 * snapshot == InvalidSnapshot, read_only = true: do nothing for queries
+	 * that require no snapshot.  For those that do, ensure that a Portal
+	 * snapshot exists; then use that, or use the entry-time ActiveSnapshot if
+	 * that exists and is different.
 	 *
-	 * snapshot == InvalidSnapshot, read_only = false: take a full new
-	 * snapshot for each user command, and advance its command ID before each
-	 * querytree within the command.
+	 * snapshot == InvalidSnapshot, read_only = false: do nothing for queries
+	 * that require no snapshot.  For those that do, ensure that a Portal
+	 * snapshot exists; then, in atomic execution (!allow_nonatomic) take a
+	 * full new snapshot for each user command, and advance its command ID
+	 * before each querytree within the command.  In allow_nonatomic mode we
+	 * just use the Portal snapshot unmodified.
 	 *
 	 * In the first two cases, we can just push the snap onto the stack once
 	 * for the whole plan list.
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 6947575b94..54ede9d85e 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -2218,12 +2218,12 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt)
 	 * for the plan.  This avoids refcount leakage complaints if the called
 	 * procedure ends the current transaction.
 	 *
-	 * Also, tell SPI to allow non-atomic execution.
+	 * Also, tell SPI to allow non-atomic execution if appropriate.
 	 */
 	memset(&options, 0, sizeof(options));
 	options.params = paramLI;
 	options.read_only = estate->readonly_func;
-	options.allow_nonatomic = true;
+	options.allow_nonatomic = !estate->atomic;
 	options.owner = estate->procedure_resowner;
 
 	rc = SPI_execute_plan_extended(expr->plan, &options);

Tom Lane

tgl@sss.pgh.pa.us

over 1 year ago

In reply to: Tom Lane (#5)

1 attachment(s)

Re: Unexpected results from CALL and AUTOCOMMIT=off

I wrote:

I poked into this and found that the source of the problem is that
plpgsql's exec_stmt_call passes allow_nonatomic = true even when
it's running in an atomic context. So we can fix it with basically
a one-line change:

-	options.allow_nonatomic = true;
+	options.allow_nonatomic = !estate->atomic;

I'm worried about whether external callers might've made a comparable
mistake, but I think all we can do is document it a little better.
AFAICS there isn't any good way for spi.c to realize that this mistake
has been made, else we could have it patch up the mistake centrally.

Actually, after poking around some more I found that there *is* a way
to deal with this within spi.c: we can make _SPI_execute_plan ignore
options->allow_nonatomic unless the SPI_OPT_NONATOMIC flag was given
when connecting.

I like this better than my first solution because (a) it seems to
make the allow_nonatomic flag behave in a more intuitive way;
(b) spi.c gates some other behaviors on SPI_OPT_NONATOMIC, so that
gating this one too seems more consistent, and (c) this way, we fix
not only plpgsql but anything that has copied its coding pattern.

Hence, new patch attached, now with docs and tests. Barring
objections I'll push this one.

regards, tom lane

Attachments:

fix-call-in-atomic-context-v1.patchtext/x-diff; charset=us-ascii; name=fix-call-in-atomic-context-v1.patchDownload

diff --git a/doc/src/sgml/spi.sgml b/doc/src/sgml/spi.sgml
index bb3778688b..7d154914b9 100644
--- a/doc/src/sgml/spi.sgml
+++ b/doc/src/sgml/spi.sgml
@@ -752,7 +752,9 @@ int SPI_execute_extended(const char *<parameter>command</parameter>,
     <listitem>
      <para>
       <literal>true</literal> allows non-atomic execution of CALL and DO
-      statements
+      statements (but this field is ignored unless
+      the <symbol>SPI_OPT_NONATOMIC</symbol> flag was passed
+      to <function>SPI_connect_ext</function>)
      </para>
     </listitem>
    </varlistentry>
@@ -1893,7 +1895,9 @@ int SPI_execute_plan_extended(SPIPlanPtr <parameter>plan</parameter>,
     <listitem>
      <para>
       <literal>true</literal> allows non-atomic execution of CALL and DO
-      statements
+      statements (but this field is ignored unless
+      the <symbol>SPI_OPT_NONATOMIC</symbol> flag was passed
+      to <function>SPI_connect_ext</function>)
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index a97a7e3bd4..e516c0a67c 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -2399,6 +2399,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
 	uint64		my_processed = 0;
 	SPITupleTable *my_tuptable = NULL;
 	int			res = 0;
+	bool		allow_nonatomic;
 	bool		pushed_active_snap = false;
 	ResourceOwner plan_owner = options->owner;
 	SPICallbackArg spicallbackarg;
@@ -2406,6 +2407,12 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
 	CachedPlan *cplan = NULL;
 	ListCell   *lc1;
 
+	/*
+	 * We allow nonatomic behavior only if options->allow_nonatomic is set
+	 * *and* the SPI_OPT_NONATOMIC flag was given when connecting.
+	 */
+	allow_nonatomic = options->allow_nonatomic && !_SPI_current->atomic;
+
 	/*
 	 * Setup error traceback support for ereport()
 	 */
@@ -2425,12 +2432,17 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
 	 * snapshot != InvalidSnapshot, read_only = false: use the given snapshot,
 	 * modified by advancing its command ID before each querytree.
 	 *
-	 * snapshot == InvalidSnapshot, read_only = true: use the entry-time
-	 * ActiveSnapshot, if any (if there isn't one, we run with no snapshot).
+	 * snapshot == InvalidSnapshot, read_only = true: do nothing for queries
+	 * that require no snapshot.  For those that do, ensure that a Portal
+	 * snapshot exists; then use that, or use the entry-time ActiveSnapshot if
+	 * that exists and is different.
 	 *
-	 * snapshot == InvalidSnapshot, read_only = false: take a full new
-	 * snapshot for each user command, and advance its command ID before each
-	 * querytree within the command.
+	 * snapshot == InvalidSnapshot, read_only = false: do nothing for queries
+	 * that require no snapshot.  For those that do, ensure that a Portal
+	 * snapshot exists; then, in atomic execution (!allow_nonatomic) take a
+	 * full new snapshot for each user command, and advance its command ID
+	 * before each querytree within the command.  In allow_nonatomic mode we
+	 * just use the Portal snapshot unmodified.
 	 *
 	 * In the first two cases, we can just push the snap onto the stack once
 	 * for the whole plan list.
@@ -2440,6 +2452,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
 	 */
 	if (snapshot != InvalidSnapshot)
 	{
+		/* this intentionally tests the options field not the derived value */
 		Assert(!options->allow_nonatomic);
 		if (options->read_only)
 		{
@@ -2585,7 +2598,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
 			 * Skip it when doing non-atomic execution, though (we rely
 			 * entirely on the Portal snapshot in that case).
 			 */
-			if (!options->read_only && !options->allow_nonatomic)
+			if (!options->read_only && !allow_nonatomic)
 			{
 				if (pushed_active_snap)
 					PopActiveSnapshot();
@@ -2685,14 +2698,13 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
 				QueryCompletion qc;
 
 				/*
-				 * If the SPI context is atomic, or we were not told to allow
-				 * nonatomic operations, tell ProcessUtility this is an atomic
-				 * execution context.
+				 * If we're not allowing nonatomic operations, tell
+				 * ProcessUtility this is an atomic execution context.
 				 */
-				if (_SPI_current->atomic || !options->allow_nonatomic)
-					context = PROCESS_UTILITY_QUERY;
-				else
+				if (allow_nonatomic)
 					context = PROCESS_UTILITY_QUERY_NONATOMIC;
+				else
+					context = PROCESS_UTILITY_QUERY;
 
 				InitializeQueryCompletion(&qc);
 				ProcessUtility(stmt,
diff --git a/src/pl/plpgsql/src/expected/plpgsql_call.out b/src/pl/plpgsql/src/expected/plpgsql_call.out
index 17235fca91..0a63b1d44e 100644
--- a/src/pl/plpgsql/src/expected/plpgsql_call.out
+++ b/src/pl/plpgsql/src/expected/plpgsql_call.out
@@ -564,3 +564,53 @@ NOTICE:  inner_p(44)
  
 (1 row)
 
+-- Check that stable functions in CALL see the correct snapshot
+CREATE TABLE t_test (x int);
+INSERT INTO t_test VALUES (0);
+CREATE FUNCTION f_get_x () RETURNS int
+AS $$
+DECLARE l_result int;
+BEGIN
+  SELECT x INTO l_result FROM t_test;
+  RETURN l_result;
+END
+$$ LANGUAGE plpgsql STABLE;
+CREATE PROCEDURE f_print_x (x int)
+AS $$
+BEGIN
+  RAISE NOTICE 'f_print_x(%)', x;
+END
+$$ LANGUAGE plpgsql;
+-- test in non-atomic context
+DO $$
+BEGIN
+  UPDATE t_test SET x = x + 1;
+  RAISE NOTICE 'f_get_x(%)', f_get_x();
+  CALL f_print_x(f_get_x());
+  UPDATE t_test SET x = x + 1;
+  RAISE NOTICE 'f_get_x(%)', f_get_x();
+  CALL f_print_x(f_get_x());
+  ROLLBACK;
+END
+$$;
+NOTICE:  f_get_x(1)
+NOTICE:  f_print_x(1)
+NOTICE:  f_get_x(2)
+NOTICE:  f_print_x(2)
+-- test in atomic context
+BEGIN;
+DO $$
+BEGIN
+  UPDATE t_test SET x = x + 1;
+  RAISE NOTICE 'f_get_x(%)', f_get_x();
+  CALL f_print_x(f_get_x());
+  UPDATE t_test SET x = x + 1;
+  RAISE NOTICE 'f_get_x(%)', f_get_x();
+  CALL f_print_x(f_get_x());
+END
+$$;
+NOTICE:  f_get_x(1)
+NOTICE:  f_print_x(1)
+NOTICE:  f_get_x(2)
+NOTICE:  f_print_x(2)
+ROLLBACK;
diff --git a/src/pl/plpgsql/src/sql/plpgsql_call.sql b/src/pl/plpgsql/src/sql/plpgsql_call.sql
index 869d021a07..4cbda0382e 100644
--- a/src/pl/plpgsql/src/sql/plpgsql_call.sql
+++ b/src/pl/plpgsql/src/sql/plpgsql_call.sql
@@ -522,3 +522,53 @@ CREATE FUNCTION f(int) RETURNS int AS $$ SELECT $1 + 2 $$ LANGUAGE sql;
 
 CALL outer_p(42);
 SELECT outer_f(42);
+
+-- Check that stable functions in CALL see the correct snapshot
+
+CREATE TABLE t_test (x int);
+INSERT INTO t_test VALUES (0);
+
+CREATE FUNCTION f_get_x () RETURNS int
+AS $$
+DECLARE l_result int;
+BEGIN
+  SELECT x INTO l_result FROM t_test;
+  RETURN l_result;
+END
+$$ LANGUAGE plpgsql STABLE;
+
+CREATE PROCEDURE f_print_x (x int)
+AS $$
+BEGIN
+  RAISE NOTICE 'f_print_x(%)', x;
+END
+$$ LANGUAGE plpgsql;
+
+-- test in non-atomic context
+DO $$
+BEGIN
+  UPDATE t_test SET x = x + 1;
+  RAISE NOTICE 'f_get_x(%)', f_get_x();
+  CALL f_print_x(f_get_x());
+  UPDATE t_test SET x = x + 1;
+  RAISE NOTICE 'f_get_x(%)', f_get_x();
+  CALL f_print_x(f_get_x());
+  ROLLBACK;
+END
+$$;
+
+-- test in atomic context
+BEGIN;
+
+DO $$
+BEGIN
+  UPDATE t_test SET x = x + 1;
+  RAISE NOTICE 'f_get_x(%)', f_get_x();
+  CALL f_print_x(f_get_x());
+  UPDATE t_test SET x = x + 1;
+  RAISE NOTICE 'f_get_x(%)', f_get_x();
+  CALL f_print_x(f_get_x());
+END
+$$;
+
+ROLLBACK;

Nathan Bossart

nathandbossart@gmail.com

over 1 year ago

In reply to: Tom Lane (#6)

Re: Unexpected results from CALL and AUTOCOMMIT=off

On Tue, Jun 04, 2024 at 02:28:43PM -0400, Tom Lane wrote:

Actually, after poking around some more I found that there *is* a way
to deal with this within spi.c: we can make _SPI_execute_plan ignore
options->allow_nonatomic unless the SPI_OPT_NONATOMIC flag was given
when connecting.

I like this better than my first solution because (a) it seems to
make the allow_nonatomic flag behave in a more intuitive way;
(b) spi.c gates some other behaviors on SPI_OPT_NONATOMIC, so that
gating this one too seems more consistent, and (c) this way, we fix
not only plpgsql but anything that has copied its coding pattern.

Hence, new patch attached, now with docs and tests. Barring
objections I'll push this one.

Should we expand the documentation for SPI_connect_ext() to note that
SPI_execute_extended()/SPI_execute_plan_extended() depend on the flag?

--
nathan

Tom Lane

tgl@sss.pgh.pa.us

over 1 year ago

In reply to: Nathan Bossart (#7)

Re: Unexpected results from CALL and AUTOCOMMIT=off

Nathan Bossart <nathandbossart@gmail.com> writes:

On Tue, Jun 04, 2024 at 02:28:43PM -0400, Tom Lane wrote:

Hence, new patch attached, now with docs and tests. Barring
objections I'll push this one.

Should we expand the documentation for SPI_connect_ext() to note that
SPI_execute_extended()/SPI_execute_plan_extended() depend on the flag?

Perhaps. They already did, in that the atomic flag was taken into
account while deciding how to handle a nested CALL; basically what this
fix does is to make sure that the snapshot handling is done the same
way. I think that what I added to the docs is probably sufficient,
but I'll yield to majority opinion if people think not.

regards, tom lane