spurious function execution in prepared statements.

Started by Merlin Moncureover 21 years ago7 messages

merlin.moncure@rcsonline.com

over 21 years ago

OK, I have a situation that might be a performance problem, a bug, or an
unavoidable consequence of using prepared statements. The short version
is that I am getting function executions for rows not returned in a
result set when they are in a prepared statement.

In other words, I have a query:
select f(t.c) from t where [boolean expr on t] limit 1;

because of the limit phrase, obviously, at most one record is returned
and f executes at most once regardless of the plan used (in practice,
sometimes index, sometimes seq_scan.

Now, if the same query is executed as a prepared statement,
prepare ps(...) as select f(t.c) from t where [expr] limit 1;
execute ps;

now, if ps ends up using a index scan on t, everything is ok. However,
if ps does a seqscan, f executes for every row on t examined until the
[expr] criteria is met. Is this a bug? If necessary I should be able
to set up a reproducible example. The easy workaround is to not use
prepared statements in these situations, but I need to be able to
guarantee that f only executes once (even if that means exploring
subqueries).

Merlin

Michael Adler

adler@pobox.com

over 21 years ago

In reply to: Merlin Moncure (#1)

Re: [HACKERS] spurious function execution in prepared statements.

On Thu, Sep 30, 2004 at 09:45:51AM -0400, Merlin Moncure wrote:

Now, if the same query is executed as a prepared statement,
prepare ps(...) as select f(t.c) from t where [expr] limit 1;
execute ps;

now, if ps ends up using a index scan on t, everything is ok. However,
if ps does a seqscan, f executes for every row on t examined until the
[expr] criteria is met. Is this a bug? If necessary I should be able
to set up a reproducible example. The easy workaround is to not use
prepared statements in these situations, but I need to be able to
guarantee that f only executes once (even if that means exploring
subqueries).

Here's another workaround that may let you use a prepared statement:

prepare ps(...) as
select f(c) from (select c from t where [expr] limit 1) as t1

-Mike

Merlin Moncure

merlin.moncure@rcsonline.com

over 21 years ago

In reply to: Michael Adler (#2)

Re: [HACKERS] spurious function execution in prepared statements.

Here's another workaround that may let you use a prepared statement:

prepare ps(...) as
select f(c) from (select c from t where [expr] limit 1) as t1

-Mike

I was just exploring that. In fact, the problem is not limited to
prepared statements...it's just that they are more likely to run a
seqscan so I noticed it there first. Since I am in a situation where I
need very strict control over when and why f gets executed, I pretty
much have to go with the subquery option.

That said, it just seems that out of result set excecutions of f should
be in violation of something...

Merlin

Import Notes

Resolved by subject fallback

Stephan Szabo

sszabo@megazone.bigpanda.com

over 21 years ago

In reply to: Merlin Moncure (#1)

Re: spurious function execution in prepared statements.

On Thu, 30 Sep 2004, Merlin Moncure wrote:

OK, I have a situation that might be a performance problem, a bug, or an
unavoidable consequence of using prepared statements. The short version
is that I am getting function executions for rows not returned in a
result set when they are in a prepared statement.

In other words, I have a query:
select f(t.c) from t where [boolean expr on t] limit 1;

An actual boolean expr on t? Or on a column in t?

because of the limit phrase, obviously, at most one record is returned
and f executes at most once regardless of the plan used (in practice,
sometimes index, sometimes seq_scan.

Now, if the same query is executed as a prepared statement,
prepare ps(...) as select f(t.c) from t where [expr] limit 1;
execute ps;

now, if ps ends up using a index scan on t, everything is ok. However,
if ps does a seqscan, f executes for every row on t examined until the
[expr] criteria is met. Is this a bug? If necessary I should be able
to set up a reproducible example. The easy workaround is to not use
prepared statements in these situations, but I need to be able to
guarantee that f only executes once (even if that means exploring
subqueries).

I think a reproducible example would be good. Simple attempts to duplicate
this on 8.0b2 have failed for me, unless I'm using order by.

Tom Lane

tgl@sss.pgh.pa.us

over 21 years ago

In reply to: Merlin Moncure (#1)

Re: [HACKERS] spurious function execution in prepared statements.

"Merlin Moncure" <merlin.moncure@rcsonline.com> writes:

now, if ps ends up using a index scan on t, everything is ok. However,
if ps does a seqscan, f executes for every row on t examined until the
[expr] criteria is met. Is this a bug?

Works for me.

regression=# create function f(int) returns int as '
regression'# begin
regression'# raise notice ''f(%)'', $1;
regression'# return $1;
regression'# end' language plpgsql;
CREATE FUNCTION
regression=# select f(unique2) from tenk1 where unique2%2 = 1 limit 2;
NOTICE: f(1)
NOTICE: f(3)
f
---
1
3
(2 rows)

regression=# prepare ps as
regression-# select f(unique2) from tenk1 where unique2%2 = 1 limit 2;
PREPARE
regression=# execute ps;
NOTICE: f(1)
NOTICE: f(3)
f
---
1
3
(2 rows)

regression=#

You sure you aren't using f() in the WHERE clause?

regards, tom lane

Merlin Moncure

merlin.moncure@rcsonline.com

over 21 years ago

In reply to: Tom Lane (#5)

Re: spurious function execution in prepared statements.

Stephan Szabo wrote:

On Thu, 30 Sep 2004, Merlin Moncure wrote:

OK, I have a situation that might be a performance problem, a bug,

or an

unavoidable consequence of using prepared statements. The short

version

is that I am getting function executions for rows not returned in a
result set when they are in a prepared statement.

An actual boolean expr on t? Or on a column in t?

[...]

I think a reproducible example would be good. Simple attempts to

duplicate

this on 8.0b2 have failed for me, unless I'm using order by.

Note: I confirmed that breaking out the 'where' part of the query into
subquery suppresses the behavior.

Here is the actual query:
select lock_cuid(id), *
from data3.wclaim_line_file
where wcl_vin_no >= '32-MHAB-C-X-7243' and
(wcl_vin_no > '32-MHAB-C-X-7243' or wcl_claim_no >=
001) and
(wcl_vin_no > '32-MHAB-C-X-7243' or wcl_claim_no >
001 or id > 2671212)
order by wcl_vin_no, wcl_claim_no, id
limit 1

Here is the prepared statement declaration:
prepare data3_read_next_wclaim_line_file_1_lock (character varying,
numeric, int8, numeric)
as select lock_cuid(id), *
from data3.wclaim_line_file
where wcl_vin_no >= $1 and
(wcl_vin_no > $1 or wcl_claim_no >= $2) and
(wcl_vin_no > $1 or wcl_claim_no > $2 or id > $3)
order by wcl_vin_no, wcl_claim_no, id limit $4

Here is the plan when it runs lock_cuid repeatedly (aside: disabling
seqscans causes an index plan, but that's not the point):

esp=# explain execute data3_read_next_wclaim_line_file_1_lock
('32-MHAB-C-X-7243', 001, 2671212, 1);

QUERY PLAN

------------------------------------------------------------------------
----------------------------
------------------------------------------------------------------------
----------------------------
--------------------------------
Limit (cost=13108.95..13162.93 rows=21592 width=260)
-> Sort (cost=13108.95..13162.93 rows=21592 width=260)
Sort Key: wcl_vin_no, wcl_claim_no, id
-> Seq Scan on wclaim_line_file (cost=0.00..11554.52
rows=21592 width=260)
Filter: (((wcl_vin_no)::text >= ($1)::text) AND
(((wcl_vin_no)::text > ($1)::text) OR
((wcl_claim_no)::numeric >= $2)) AND (((wcl_vin_no)::text > ($1)::text)
OR ((wcl_claim_no)::numeric

$2) OR ((id)::bigint > $3)))

(5 rows)

Import Notes

Resolved by subject fallback

Tom Lane

tgl@sss.pgh.pa.us

over 21 years ago

In reply to: Merlin Moncure (#6)

Re: spurious function execution in prepared statements.

"Merlin Moncure" <merlin.moncure@rcsonline.com> writes:

Here is the actual query:
select lock_cuid(id), *
...
order by wcl_vin_no, wcl_claim_no, id
limit 1

Looks like Stephan made the right guess.

Logically the LIMIT executes after the ORDER BY, so the sorted result
has to be formed completely. The fact that we are able to optimize
this in some cases does not represent a promise that we can do it in
all cases. Ergo, it's not a bug.

regards, tom lane