When IMMUTABLE is not.

Started by Yura Sokolovover 2 years ago14 messages

Yura Sokolov

y.sokolov@postgrespro.ru

over 2 years ago

2 attachment(s)

Good day, hackers.

I found, than declaration of function as IMMUTABLE/STABLE is not enough to be sure
function doesn't manipulate data.

In fact, SPI checks only direct function kind, but fails to check indirect call.

Attached immutable_not.sql creates 3 functions:

- `immutable_direct` is IMMUTABLE and tries to insert into table directly.
PostgreSQL correctly detects and forbids this action.

- `volatile_direct` is VOLATILE and inserts into table directly.
It is allowed and executed well.

- `immutable_indirect` is IMMUTABLE and calls `volatile_direct`.
PostgreSQL failed to detect and prevent this DML manipulation.

Output:

select immutable_direct('immutable_direct'); psql:immutable_not.sql:28:
ERROR: INSERT is not allowed in a non-volatile function CONTEXT: SQL
statement "insert into xxx values(j)" PL/pgSQL function
immutable_direct(character varying) line 3 at SQL statement select
volatile_direct('volatile_direct'); volatile_direct -----------------
volatile_direct (1 row) select immutable_indirect('immutable_indirect');
immutable_indirect -------------------- immutable_indirect (1 row)
select * from xxx; i -------------------- volatile_direct
immutable_indirect (2 rows) Attached forbid-non-volatile-mutations.diff
add checks readonly function didn't made data manipulations. Output for
patched version: select immutable_indirect('immutable_indirect');
psql:immutable_not.sql:32: ERROR: Damn2! Update were done in a
non-volatile function CONTEXT: SQL statement "SELECT
volatile_direct(j)" PL/pgSQL function immutable_indirect(character
varying) line 3 at PERFORM I doubt check should be done this way. This
check is necessary, but it should be FATAL instead of ERROR. And ERROR
should be generated at same place, when it is generated for
`immutable_direct`, but with check of "read_only" status through whole
call stack instead of just direct function kind. ----- regards, Yura
Sokolov Postgres Professional

Attachments:

immutable_not.sqlapplication/sql; name=immutable_not.sqlDownload

forbid-non-volatile-mutations.difftext/x-patch; charset=UTF-8; name=forbid-non-volatile-mutations.diffDownload

diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 33975687b38..82b6127d650 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1584,6 +1584,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
 	Portal		portal;
 	SPICallbackArg spicallbackarg;
 	ErrorContextCallback spierrcontext;
+	CommandId 	this_command = InvalidCommandId;
 
 	/*
 	 * Check that the plan is something the Portal code will special-case as
@@ -1730,6 +1731,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
 	if (read_only)
 	{
 		ListCell   *lc;
+		this_command = GetCurrentCommandId(false);
 
 		foreach(lc, stmt_list)
 		{
@@ -1778,6 +1780,14 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
 	/* Pop the SPI stack */
 	_SPI_end_call(true);
 
+	if (read_only && this_command != GetCurrentCommandId(false))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						/* translator: %s is a SQL statement name */
+						errmsg("Damn1! Update were done in a non-volatile function")));
+	}
+
 	/* Return the created portal */
 	return portal;
 }
@@ -2404,6 +2414,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
 	ErrorContextCallback spierrcontext;
 	CachedPlan *cplan = NULL;
 	ListCell   *lc1;
+	CommandId   this_command = InvalidCommandId;
 
 	/*
 	 * Setup error traceback support for ereport()
@@ -2473,6 +2484,11 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("empty query does not return tuples")));
 
+	if (options->read_only)
+	{
+		this_command = GetCurrentCommandId(false);
+	}
+
 	foreach(lc1, plan->plancache_list)
 	{
 		CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
@@ -2788,6 +2804,14 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
 			CommandCounterIncrement();
 	}
 
+	if (options->read_only && this_command != GetCurrentCommandId(false))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						/* translator: %s is a SQL statement name */
+						errmsg("Damn2! Update were done in a non-volatile function")));
+	}
+
 fail:
 
 	/* Pop the snapshot off the stack if we pushed one */

Yura Sokolov

y.sokolov@postgrespro.ru

over 2 years ago

In reply to: Yura Sokolov (#1)

Re: When IMMUTABLE is not.

Sorry, previous message were smashed for some reason.

I'll try to repeat

I found, than declaration of function as IMMUTABLE/STABLE is not enough
to be sure
function doesn't manipulate data.

In fact, SPI checks only direct function kind, but fails to check
indirect call.

Attached immutable_not.sql creates 3 functions:

- `immutable_direct` is IMMUTABLE and tries to insert into table directly.
PostgreSQL correctly detects and forbids this action.

- `volatile_direct` is VOLATILE and inserts into table directly.
It is allowed and executed well.

- `immutable_indirect` is IMMUTABLE and calls `volatile_direct`.
PostgreSQL failed to detect and prevent this DML manipulation.

Output:

select volatile_direct('volatile_direct');
volatile_direct
-----------------
volatile_direct
(1 row)

select immutable_indirect('immutable_indirect');
immutable_indirect
--------------------
immutable_indirect
(1 row)

select * from xxx;
i
--------------------
volatile_direct
immutable_indirect
(2 rows)

Attached forbid-non-volatile-mutations.diff add checks readonly function
didn't made data manipulations.
Output for patched version:

select immutable_indirect('immutable_indirect');
psql:immutable_not.sql:32: ERROR: Damn2! Update were done in a
non-volatile function
CONTEXT: SQL statement "SELECT volatile_direct(j)"
PL/pgSQL function immutable_indirect(character varying) line 3 at PERFORM

I doubt check should be done this way. This check is necessary, but it
should be
FATAL instead of ERROR. And ERROR should be generated at same place, when
it is generated for `immutable_direct`, but with check of "read_only"
status through
whole call stack instead of just direct function kind.

-----

regards,
Yura Sokolov
Postgres Professional

Laurenz Albe

laurenz.albe@cybertec.at

over 2 years ago

In reply to: Yura Sokolov (#1)

Re: When IMMUTABLE is not.

On Thu, 2023-06-15 at 13:22 +0300, Yura Sokolov wrote:

Good day, hackers.

I found, than declaration of function as IMMUTABLE/STABLE is not enough to be sure
function doesn't manipulate data.

[...]

+ errmsg("Damn1! Update were done in a non-volatile function")));

I think it is project policy to start error messages with a lower case character.

Yours,
Laurenz Albe

Tom Lane

tgl@sss.pgh.pa.us

over 2 years ago

In reply to: Yura Sokolov (#1)

Re: When IMMUTABLE is not.

Yura Sokolov <y.sokolov@postgrespro.ru> writes:

I found, than declaration of function as IMMUTABLE/STABLE is not enough to be sure
function doesn't manipulate data.

Of course not. It is the user's responsibility to mark functions
properly. Trying to enforce that completely is a fool's errand;
you soon get into trying to solve the halting problem.

I don't like anything about the proposed patch. It's necessarily
only a partial solution, and it probably breaks cases that are
perfectly safe in context.

regards, tom lane

Yura Sokolov

y.sokolov@postgrespro.ru

over 2 years ago

In reply to: Tom Lane (#4)

Re: When IMMUTABLE is not.

15.06.2023 16:21, Tom Lane wrote:

Yura Sokolov <y.sokolov@postgrespro.ru> writes:

I found, than declaration of function as IMMUTABLE/STABLE is not enough to be sure
function doesn't manipulate data.

Of course not. It is the user's responsibility to mark functions
properly. Trying to enforce that completely is a fool's errand

https://github.com/postgres/postgres/commit/b2c4071299e02ed96d48d3c8e776de2fab36f88c.patch

https://github.com/postgres/postgres/commit/cdf8b56d5463815244467ea8f5ec6e72b6c65a6c.patch

Noname

chap@anastigmatix.net

over 2 years ago

In reply to: Tom Lane (#4)

Re: When IMMUTABLE is not.

On 2023-06-15 09:21, Tom Lane wrote:

Yura Sokolov <y.sokolov@postgrespro.ru> writes:

not enough to be sure function doesn't manipulate data.

Of course not. It is the user's responsibility to mark functions
properly.

And also, isn't it the case that IMMUTABLE should mark a function,
not merely that "doesn't manipulate data", but whose return value
doesn't depend in any way on data (outside its own arguments)?

The practice among PLs of choosing an SPI readonly flag based on
the IMMUTABLE/STABLE/VOLATILE declaration seems to be a sort of
peculiar heuristic, not something inherent in what that declaration
means to the optimizer. (And also influences what snapshot the
function is looking at, and therefore what it can see, which has
also struck me more as a tacked-on effect than something inherent
in the declaration's meaning.)

Regards,
-Chap

Yura Sokolov

y.sokolov@postgrespro.ru

over 2 years ago

In reply to: Noname (#6)

Re: When IMMUTABLE is not.

15.06.2023 16:58, chap@anastigmatix.net пишет:

On 2023-06-15 09:21, Tom Lane wrote:

Yura Sokolov <y.sokolov@postgrespro.ru> writes:

not enough to be sure function doesn't manipulate data.

Of course not. It is the user's responsibility to mark functions
properly.

And also, isn't it the case that IMMUTABLE should mark a function,
not merely that "doesn't manipulate data", but whose return value
doesn't depend in any way on data (outside its own arguments)?

The practice among PLs of choosing an SPI readonly flag based on
the IMMUTABLE/STABLE/VOLATILE declaration seems to be a sort of
peculiar heuristic, not something inherent in what that declaration
means to the optimizer. (And also influences what snapshot the
function is looking at, and therefore what it can see, which has
also struck me more as a tacked-on effect than something inherent
in the declaration's meaning.)

Documentation disagrees:

https://www.postgresql.org/docs/current/sql-createfunction.html#:~:text=IMMUTABLE%0ASTABLE%0AVOLATILE

|IMMUTABLE|indicates that the function cannot modify the database and

always returns the same result when given the same argument values

|STABLE|indicates that the function cannot modify the database, and

that within a single table scan it will consistently return the same
result for the same argument values, but that its result could change
across SQL statements.

|VOLATILE|indicates that the function value can change even within a

single table scan, so no optimizations can be made... But note that any
function that has side-effects must be classified volatile, even if its
result is quite predictable, to prevent calls from being optimized away

Tom Lane

tgl@sss.pgh.pa.us

over 2 years ago

In reply to: Noname (#6)

Re: When IMMUTABLE is not.

chap@anastigmatix.net writes:

And also, isn't it the case that IMMUTABLE should mark a function,
not merely that "doesn't manipulate data", but whose return value
doesn't depend in any way on data (outside its own arguments)?

Right. We can't realistically enforce that either, so it's
up to the user.

The practice among PLs of choosing an SPI readonly flag based on
the IMMUTABLE/STABLE/VOLATILE declaration seems to be a sort of
peculiar heuristic, not something inherent in what that declaration
means to the optimizer. (And also influences what snapshot the
function is looking at, and therefore what it can see, which has
also struck me more as a tacked-on effect than something inherent
in the declaration's meaning.)

Well, it is a bit odd at first sight, but these properties play
together well. See

https://www.postgresql.org/docs/current/xfunc-volatility.html

regards, tom lane

Noname

chap@anastigmatix.net

over 2 years ago

In reply to: Noname (#6)

Re: When IMMUTABLE is not.

On 2023-06-15 09:58, chap@anastigmatix.net wrote:

also influences what snapshot the
function is looking at, and therefore what it can see, which has
also struck me more as a tacked-on effect than something inherent
in the declaration's meaning.

I just re-read that and realized I should anticipate the obvious
response "but how can it matter what the function can see, if
it's IMMUTABLE and depends on no data?".

So, I ran into the effect while working on PL/Java, where the
code of a function isn't all found in pg_proc.prosrc; that just
indicates what code has to be fetched from sqlj.jar_entry.

So one could take a strict view that "no PL/Java function should
ever be marked IMMUTABLE" because every one depends on fetching
something (once, at least).

But on the other hand, it would seem punctilious to say that
f(int x, int y) { return x + y; } isn't IMMUTABLE, only because
it depends on a fetch /of its own implementation/, and overall
its behavior is better described by marking it IMMUTABLE.

Regards,
-Chap

#10

David G. Johnston

david.g.johnston@gmail.com

over 2 years ago

In reply to: Noname (#9)

Re: When IMMUTABLE is not.

On Thursday, June 15, 2023, <chap@anastigmatix.net> wrote:

So one could take a strict view that "no PL/Java function should
ever be marked IMMUTABLE" because every one depends on fetching
something (once, at least).

The failure to find and execute the function code itself is not a failure
mode that these markers need be concerned with. Assuming one can execute
the function an immutable function will give the same answer for the same
input for all time.

David J.

#11

Noname

chap@anastigmatix.net

over 2 years ago

In reply to: David G. Johnston (#10)

Re: When IMMUTABLE is not.

On 2023-06-15 10:19, David G. Johnston wrote:

The failure to find and execute the function code itself is not a
failure
mode that these markers need be concerned with. Assuming one can
execute
the function an immutable function will give the same answer for the
same
input for all time.

That was the view I ultimately took, and just made PL/Java suppress that
SPI readonly flag when going to look for the function code.

Until that change, you could run into the not-uncommon situation
where you've just loaded a jar of new functions and try to use them
in the same transaction, and hey presto, the VOLATILE ones all work,
and the IMMUTABLE ones aren't there yet.

Regards,
-Chap

#12

Tom Lane

tgl@sss.pgh.pa.us

over 2 years ago

In reply to: David G. Johnston (#10)

Re: When IMMUTABLE is not.

"David G. Johnston" <david.g.johnston@gmail.com> writes:

The failure to find and execute the function code itself is not a failure
mode that these markers need be concerned with. Assuming one can execute
the function an immutable function will give the same answer for the same
input for all time.

The viewpoint taken in the docs I mentioned is that an IMMUTABLE
marker is a promise from the user to the system about the behavior
of a function. While the system does provide a few simple tools
to catch obvious errors and to make it easier to write functions
that obey such promises, it's mostly on the user to get it right.

In particular, we've never enforced that an immutable function can't
call non-immutable functions. While that would seem like a good idea
in the abstract, we've intentionally not tried to do it. (I'm pretty
sure there is more than one round of previous discussions of the point
in the archives, although locating relevant threads seems hard.)
One reason not to is that polymorphic functions have to be marked
with worst-case volatility labels. There are plenty of examples of
functions that are stable for some input types and immutable for
others (array_to_string, for instance); but the marking system can't
represent that so we have to label them stable. Enforcing that a
user-defined immutable function can't use such a function might
just break things for no gain.

regards, tom lane

#13

Isaac Morland

isaac.morland@gmail.com

over 2 years ago

In reply to: Tom Lane (#12)

Re: When IMMUTABLE is not.

On Thu, 15 Jun 2023 at 10:49, Tom Lane <tgl@sss.pgh.pa.us> wrote:

In particular, we've never enforced that an immutable function can't

call non-immutable functions. While that would seem like a good idea
in the abstract, we've intentionally not tried to do it. (I'm pretty
sure there is more than one round of previous discussions of the point
in the archives, although locating relevant threads seems hard.)
One reason not to is that polymorphic functions have to be marked
with worst-case volatility labels. There are plenty of examples of
functions that are stable for some input types and immutable for
others (array_to_string, for instance); but the marking system can't
represent that so we have to label them stable. Enforcing that a
user-defined immutable function can't use such a function might
just break things for no gain.

More sophisticated type systems (which I am *not* volunteering to graft
onto Postgres) can handle some of this, but even Haskell has
unsafePerformIO. The current policy is both wise and practical.

#14

Yura Sokolov

y.sokolov@postgrespro.ru

over 2 years ago

In reply to: Tom Lane (#12)

Re: When IMMUTABLE is not.

15.06.2023 17:49, Tom Lane пишет:

"David G. Johnston" <david.g.johnston@gmail.com> writes:

The failure to find and execute the function code itself is not a failure
mode that these markers need be concerned with. Assuming one can execute
the function an immutable function will give the same answer for the same
input for all time.

The viewpoint taken in the docs I mentioned is that an IMMUTABLE
marker is a promise from the user to the system about the behavior
of a function. While the system does provide a few simple tools
to catch obvious errors and to make it easier to write functions
that obey such promises, it's mostly on the user to get it right.

In particular, we've never enforced that an immutable function can't
call non-immutable functions. While that would seem like a good idea
in the abstract, we've intentionally not tried to do it. (I'm pretty
sure there is more than one round of previous discussions of the point
in the archives, although locating relevant threads seems hard.)
One reason not to is that polymorphic functions have to be marked
with worst-case volatility labels. There are plenty of examples of
functions that are stable for some input types and immutable for
others (array_to_string, for instance); but the marking system can't
represent that so we have to label them stable. Enforcing that a
user-defined immutable function can't use such a function might
just break things for no gain.

"Stable vs Immutable" is much lesser problem compared to "ReadOnly vs
Volatile".

Executing fairly read-only function more times than necessary (or less
times),
doesn't modify data in unexpecting way.

But executing immutable/stable function, that occasionally modifies
data, could
lead to different unexpected effects due to optimizer decided to call
them more
or less times than query assumes.

Some vulnerabilities were present due to user defined functions used in
index
definitions started to modify data. If "read-only" execution were forced
in index
operations, those issues couldn't happen.

it's mostly on the user to get it right.

It is really bad premise. Users does strange things and aren't expected
to be
professionals who really understand whole PostgreSQL internals.

And it is strange to hear it at the same time we don't allow users to do
query hints
since "optimizer does better" :-D

Ok, I'd go and cool myself. Certainly I don't get some point.