BUG #15060: Row in table not found when using pg function in an expression

Started by PG Bug reporting formabout 8 years ago11 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 15060
Logged by: Dejan Petrovic
Email address: dejan.petrovic@islonline.com
PostgreSQL version: 10.2
Operating system: CentOS 6
Description:

Hello,

I tested this in postgresql versions 9.1, 10.1 and 10.2 on centOS.

In short this is what happens (in a plpgsql function):
1.) An insert is done into 'bug' table
2.) A SELECT is done to make sure the INSERT was successful
3.) Another function (get_bug_id) is called which returns id based on
value.
When the function is called directly, it returns the id correctly. When it's
called in an expression, it does not find the inserted row and an exception
is raised.

I have prepared a minimal example:

CREATE TABLE public.bug
(
id integer NOT NULL,
text text,
CONSTRAINT bug_pkey PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
ALTER TABLE public.bug
OWNER TO postgres;

CREATE OR REPLACE FUNCTION public.get_bug_id(in_text text)
RETURNS integer AS
$BODY$
DECLARE
my_int int;
BEGIN
SELECT INTO my_int id from bug WHERE text = in_text;
IF NOT FOUND THEN
RAISE EXCEPTION 'row not found - BUG?';
END IF;

RETURN my_int;
END;
$BODY$
LANGUAGE plpgsql STABLE
COST 100;
ALTER FUNCTION public.get_bug_id(text)
OWNER TO postgres;

CREATE OR REPLACE FUNCTION test_bug()
RETURNS text AS
$BODY$
DECLARE
my_int int;
my_text text;
BEGIN
my_text := 'this is a bug';

INSERT INTO bug (id,text) VALUES (1,my_text);

SELECT INTO my_int id from bug WHERE text = my_text;
IF NOT FOUND THEN
RAISE EXCEPTION 'row does not exist';
END IF;
perform get_bug_id(my_text); -- This is OK - get_bug_id returns '1'
perform id FROM bug WHERE id = get_bug_id(my_text); -- This fails -
get_bug_id raises exception in version 10, works OK in version 9.1
RETURN 'OK';
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
ALTER FUNCTION test_bug()
OWNER TO postgres;

select test_bug()

#2Mark Scheffer
pg2401k@pinkwin.com
In reply to: PG Bug reporting form (#1)
Re: BUG #15060: Row in table not found when using pg function in an expression

PG Bug reporting form wrote

I tested this in postgresql versions 9.1, 10.1 and 10.2 on centOS.

In short this is what happens (in a plpgsql function):
1.) An insert is done into 'bug' table
2.) A SELECT is done to make sure the INSERT was successful
3.) Another function (get_bug_id) is called which returns id based on
value.
When the function is called directly, it returns the id correctly. When
it's
called in an expression, it does not find the inserted row and an
exception
is raised.

Significant is that function get_bug_id() being STABLE is necessary for the
bug to occur.
It appears to be not a bug, since documentation reads:

For functions written in SQL or in any of the standard procedural
languages, there is a second important property determined by the
volatility category,

*

namely the visibility of any data changes that have been made by the SQL
command that is calling the function. A VOLATILE function will see such
changes, a STABLE or IMMUTABLE function will not

*

. This behavior is implemented using the snapshotting behavior of MVCC
(see Chapter 13): STABLE and IMMUTABLE functions use a snapshot
established as of the start of the calling query, whereas VOLATILE
functions obtain a fresh snapshot at the start of each query they execute.

Although i.m.o. it is strange that one produces the proper result and the
other call does not.

--
Sent from: http://www.postgresql-archive.org/PostgreSQL-bugs-f2117394.html

#3Marko Tiikkaja
marko@joh.to
In reply to: Mark Scheffer (#2)
Re: BUG #15060: Row in table not found when using pg function in an expression

On Mon, Feb 12, 2018 at 2:37 PM, Mark Scheffer <pg2401k@pinkwin.com> wrote:

Significant is that function get_bug_id() being STABLE is necessary for the
bug to occur.
It appears to be not a bug, since documentation reads:

For functions written in SQL or in any of the standard procedural
languages, there is a second important property determined by the
volatility category,

*

namely the visibility of any data changes that have been made by the SQL
command that is calling the function. A VOLATILE function will see such
changes, a STABLE or IMMUTABLE function will not

I don't think this is relevant, since the changes were NOT made by the SQL
command calling the function. They were made by the INSERT which executed
earlier in a VOLATILE function.

.m

#4Mark Scheffer
pg2401k@pinkwin.com
In reply to: Marko Tiikkaja (#3)
Re: BUG #15060: Row in table not found when using pg function in an expression

Marko Tiikkaja-4 wrote

I don't think this is relevant, since the changes were NOT made by the SQL
command calling the function. They were made by the INSERT which executed
earlier in a VOLATILE function.

Yes I should have read a few lines further, and wake up:

Because of this snapshotting behavior, a function containing only SELECT
commands can safely be marked STABLE, even if it selects from tables that
might be undergoing modifications by concurrent queries. PostgreSQL will
execute all commands of a STABLE function using the snapshot established
for the calling query, and so it will see a fixed view of the database
throughout that query.

Sorry for inconvenience. Hope the VOLATILE-STABLE difference in behavior
helps in resolving this bug.

--
Sent from: http://www.postgresql-archive.org/PostgreSQL-bugs-f2117394.html

#5Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: PG Bug reporting form (#1)
Re: BUG #15060: Row in table not found when using pg function in an expression

"PG" == PG Bug reporting form <noreply@postgresql.org> writes:

PG> I tested this in postgresql versions 9.1, 10.1 and 10.2 on centOS.

PG> In short this is what happens (in a plpgsql function):
PG> 1.) An insert is done into 'bug' table
PG> 2.) A SELECT is done to make sure the INSERT was successful
PG> 3.) Another function (get_bug_id) is called which returns id based on
PG> value.
PG> When the function is called directly, it returns the id correctly.
PG> When it's called in an expression, it does not find the inserted
PG> row and an exception is raised.

So what's happening here is that the function get_bug_id, being stable,
is being called speculatively at plan time for the query where it
appears in the WHERE clause. For whatever reason, the snapshot it's
being run in at that time is not the same one actually used for the
later execution of the query, and the plan-time snapshot doesn't see the
just-inserted row.

It looks like what's going on here is that SPI does GetCachedPlan -
which is where planning will happen - _before_ establishing the new
snapshot in the non-read-only case (read_only is false here because the
calling function, test_bug(), is volatile).

--
Andrew (irc:RhodiumToad)

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Gierth (#5)
Re: BUG #15060: Row in table not found when using pg function in an expression

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

So what's happening here is that the function get_bug_id, being stable,
is being called speculatively at plan time for the query where it
appears in the WHERE clause. For whatever reason, the snapshot it's
being run in at that time is not the same one actually used for the
later execution of the query, and the plan-time snapshot doesn't see the
just-inserted row.

It looks like what's going on here is that SPI does GetCachedPlan -
which is where planning will happen - _before_ establishing the new
snapshot in the non-read-only case (read_only is false here because the
calling function, test_bug(), is volatile).

Yeah, I came to the same conclusion. I think it's basically accidental
that the given test case works before 9.2: the reason seems to be that
in 9.1, the plancache doesn't pass through the parameter list containing
the value of "my_text", so that the planner is unable to speculatively
execute get_bug_id(). The order of operations in _SPI_execute_plan
is just as wrong though.

The attached patch makes it better and doesn't seem to break any
regression tests. I'm not sure how nervous I am about unexpected
side-effects ...

regards, tom lane

Attachments:

set-snapshot-sooner-in-spi.patchtext/x-diff; charset=us-ascii; name=set-snapshot-sooner-in-spi.patchDownload+12-12
#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#6)
Re: BUG #15060: Row in table not found when using pg function in an expression

I wrote:

The attached patch makes it better and doesn't seem to break any
regression tests. I'm not sure how nervous I am about unexpected
side-effects ...

Oh ... my nervousness was justified. This is no good at all; it'd
create the same problem introduced at the client query level by
commit d573e239f and later reverted by 532994299, namely that we'd
be reading tables using a snapshot acquired before we've acquired
locks on said tables.

To really fix this complaint, it seems like we'd need to take a snapshot
before planning and then a new one after planning, matching what happens
in the main query pipeline. That's pretty nasty for performance :-(
... it would more or less double the load on the ProcArray for SPI-heavy
applications.

One could argue that, since this function is throwing an error, it's
not really stable --- that implies a lack of interesting side-effects.
I don't find that argument totally convincing, but the price of avoiding
the error might be more than we really want to pay.

regards, tom lane

#8Marko Tiikkaja
marko@joh.to
In reply to: Tom Lane (#6)
Re: BUG #15060: Row in table not found when using pg function in an expression

On Mon, Feb 12, 2018 at 9:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

So what's happening here is that the function get_bug_id, being stable,
is being called speculatively at plan time for the query where it
appears in the WHERE clause. For whatever reason, the snapshot it's
being run in at that time is not the same one actually used for the
later execution of the query, and the plan-time snapshot doesn't see the
just-inserted row.

It looks like what's going on here is that SPI does GetCachedPlan -
which is where planning will happen - _before_ establishing the new
snapshot in the non-read-only case (read_only is false here because the
calling function, test_bug(), is volatile).

Yeah, I came to the same conclusion. I think it's basically accidental
that the given test case works before 9.2: the reason seems to be that
in 9.1, the plancache doesn't pass through the parameter list containing
the value of "my_text", so that the planner is unable to speculatively
execute get_bug_id(). The order of operations in _SPI_execute_plan
is just as wrong though.

I'm not sure I understand. When's the snapshot used for planning actually
taken here?

.m

#9Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Marko Tiikkaja (#8)
Re: BUG #15060: Row in table not found when using pg function in an expression

"Marko" == Marko Tiikkaja <marko@joh.to> writes:

It looks like what's going on here is that SPI does GetCachedPlan -
which is where planning will happen - _before_ establishing the new
snapshot in the non-read-only case (read_only is false here because
the calling function, test_bug(), is volatile).

Yeah, I came to the same conclusion. I think it's basically
accidental that the given test case works before 9.2: the reason
seems to be that in 9.1, the plancache doesn't pass through the
parameter list containing the value of "my_text", so that the
planner is unable to speculatively execute get_bug_id(). The order
of operations in _SPI_execute_plan is just as wrong though.

Marko> I'm not sure I understand. When's the snapshot used for planning
Marko> actually taken here?

GetCachedPlan will use either whatever snapshot is already set, if there
is one, or it will set one of its own (actually at least two: separate
snapshots for revalidate + parse analysis and for planning).

In the case of a volatile plpgsql function, the snapshot in which the
function was called will, I believe, still be the active snapshot at the
relevant point, so calls made in planning won't see the function's own
changes.

The recent introduction of procedures exposes this interesting little
variation in behavior (pg11 only):

create table bug (id integer, d text);
create or replace function getbug(text) returns integer
language plpgsql stable
as $$
declare
b_id integer;
begin
select into b_id id from bug where d = $1;
if not found then
raise info 'bug % not found',$1;
else
raise info 'bug % id %',$1,b_id;
end if;
return b_id;
end;
$$;

truncate table bug;
do $$
begin
insert into bug values (1,'foo');
perform * from bug where id = getbug('foo');
end;
$$;
INFO: bug foo not found
INFO: bug foo id 1

truncate table bug;
do $$
begin
commit;
insert into bug values (1,'foo');
perform * from bug where id = getbug('foo');
end;
$$;
INFO: bug foo id 1
INFO: bug foo id 1

I assume that what's going on here is that the commit, which ends the
transaction in which the DO was invoked and begins a new one, doesn't
set a new active snapshot in the new transaction, and so planning of the
perform in the second case is taking new snapshots inside GetCachedPlan.

--
Andrew (irc:RhodiumToad)

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Gierth (#9)
Re: BUG #15060: Row in table not found when using pg function in an expression

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

"Marko" == Marko Tiikkaja <marko@joh.to> writes:
Marko> I'm not sure I understand. When's the snapshot used for planning
Marko> actually taken here?

GetCachedPlan will use either whatever snapshot is already set, if there
is one, or it will set one of its own (actually at least two: separate
snapshots for revalidate + parse analysis and for planning).

Hm. So we could improve on the fix I proposed earlier if there were a
way to tell GetCachedPlan "take a new snapshot even though there is an
active snapshot". It's still expensive, but at least the code path where
we use an existing generic plan doesn't get any added cost.

CC'ing Peter because of this part:

The recent introduction of procedures exposes this interesting little
variation in behavior (pg11 only):
...
I assume that what's going on here is that the commit, which ends the
transaction in which the DO was invoked and begins a new one, doesn't
set a new active snapshot in the new transaction, and so planning of the
perform in the second case is taking new snapshots inside GetCachedPlan.

I find this concerning. No part of plpgsql, or any other PL, has ever
before executed without the constant presence of an ActiveSnapshot.
It seems likely to me that this will expose, if not outright bugs,
at least strange differences in behavior between "procedure" code and
"function" code. Perhaps the commit-and-restart-transaction code ought
to include pushing a new ActiveSnapshot.

regards, tom lane

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#10)
Re: BUG #15060: Row in table not found when using pg function in an expression

I wrote:

Hm. So we could improve on the fix I proposed earlier if there were a
way to tell GetCachedPlan "take a new snapshot even though there is an
active snapshot". It's still expensive, but at least the code path where
we use an existing generic plan doesn't get any added cost.

Here's a draft patch along those lines. Because it changes
GetCachedPlan's API, I'd be a bit worried about back-patching it; there
might be third-party code calling that directly. However, given that it
took so many years for anyone to notice a problem here, I think maybe
just fixing it in HEAD is fine.

regards, tom lane

Attachments:

use-new-snapshot-for-planning-in-non-readonly-SPI.patchtext/x-diff; charset=us-ascii; name=use-new-snapshot-for-planning-in-non-readonly-SPI.patchDownload+164-80