spurious function execution in prepared statements.
OK, I have a situation that might be a performance problem, a bug, or an
unavoidable consequence of using prepared statements. The short version
is that I am getting function executions for rows not returned in a
result set when they are in a prepared statement.
In other words, I have a query:
select f(t.c) from t where [boolean expr on t] limit 1;
because of the limit phrase, obviously, at most one record is returned
and f executes at most once regardless of the plan used (in practice,
sometimes index, sometimes seq_scan.
Now, if the same query is executed as a prepared statement,
prepare ps(...) as select f(t.c) from t where [expr] limit 1;
execute ps;
now, if ps ends up using a index scan on t, everything is ok. However,
if ps does a seqscan, f executes for every row on t examined until the
[expr] criteria is met. Is this a bug? If necessary I should be able
to set up a reproducible example. The easy workaround is to not use
prepared statements in these situations, but I need to be able to
guarantee that f only executes once (even if that means exploring
subqueries).
Merlin
On Thu, Sep 30, 2004 at 09:45:51AM -0400, Merlin Moncure wrote:
Now, if the same query is executed as a prepared statement,
prepare ps(...) as select f(t.c) from t where [expr] limit 1;
execute ps;now, if ps ends up using a index scan on t, everything is ok. However,
if ps does a seqscan, f executes for every row on t examined until the
[expr] criteria is met. Is this a bug? If necessary I should be able
to set up a reproducible example. The easy workaround is to not use
prepared statements in these situations, but I need to be able to
guarantee that f only executes once (even if that means exploring
subqueries).
Here's another workaround that may let you use a prepared statement:
prepare ps(...) as
select f(c) from (select c from t where [expr] limit 1) as t1
-Mike
Here's another workaround that may let you use a prepared statement:
prepare ps(...) as
select f(c) from (select c from t where [expr] limit 1) as t1-Mike
I was just exploring that. In fact, the problem is not limited to
prepared statements...it's just that they are more likely to run a
seqscan so I noticed it there first. Since I am in a situation where I
need very strict control over when and why f gets executed, I pretty
much have to go with the subquery option.
That said, it just seems that out of result set excecutions of f should
be in violation of something...
Merlin
Import Notes
Resolved by subject fallback
On Thu, 30 Sep 2004, Merlin Moncure wrote:
OK, I have a situation that might be a performance problem, a bug, or an
unavoidable consequence of using prepared statements. The short version
is that I am getting function executions for rows not returned in a
result set when they are in a prepared statement.In other words, I have a query:
select f(t.c) from t where [boolean expr on t] limit 1;
An actual boolean expr on t? Or on a column in t?
because of the limit phrase, obviously, at most one record is returned
and f executes at most once regardless of the plan used (in practice,
sometimes index, sometimes seq_scan.Now, if the same query is executed as a prepared statement,
prepare ps(...) as select f(t.c) from t where [expr] limit 1;
execute ps;now, if ps ends up using a index scan on t, everything is ok. However,
if ps does a seqscan, f executes for every row on t examined until the
[expr] criteria is met. Is this a bug? If necessary I should be able
to set up a reproducible example. The easy workaround is to not use
prepared statements in these situations, but I need to be able to
guarantee that f only executes once (even if that means exploring
subqueries).
I think a reproducible example would be good. Simple attempts to duplicate
this on 8.0b2 have failed for me, unless I'm using order by.
"Merlin Moncure" <merlin.moncure@rcsonline.com> writes:
now, if ps ends up using a index scan on t, everything is ok. However,
if ps does a seqscan, f executes for every row on t examined until the
[expr] criteria is met. Is this a bug?
Works for me.
regression=# create function f(int) returns int as '
regression'# begin
regression'# raise notice ''f(%)'', $1;
regression'# return $1;
regression'# end' language plpgsql;
CREATE FUNCTION
regression=# select f(unique2) from tenk1 where unique2%2 = 1 limit 2;
NOTICE: f(1)
NOTICE: f(3)
f
---
1
3
(2 rows)
regression=# prepare ps as
regression-# select f(unique2) from tenk1 where unique2%2 = 1 limit 2;
PREPARE
regression=# execute ps;
NOTICE: f(1)
NOTICE: f(3)
f
---
1
3
(2 rows)
regression=#
You sure you aren't using f() in the WHERE clause?
regards, tom lane
Stephan Szabo wrote:
On Thu, 30 Sep 2004, Merlin Moncure wrote:
OK, I have a situation that might be a performance problem, a bug,
or an
unavoidable consequence of using prepared statements. The short
version
is that I am getting function executions for rows not returned in a
result set when they are in a prepared statement.An actual boolean expr on t? Or on a column in t?
[...]
I think a reproducible example would be good. Simple attempts to
duplicate
this on 8.0b2 have failed for me, unless I'm using order by.
Note: I confirmed that breaking out the 'where' part of the query into
subquery suppresses the behavior.
Here is the actual query:
select lock_cuid(id), *
from data3.wclaim_line_file
where wcl_vin_no >= '32-MHAB-C-X-7243' and
(wcl_vin_no > '32-MHAB-C-X-7243' or wcl_claim_no >=
001) and
(wcl_vin_no > '32-MHAB-C-X-7243' or wcl_claim_no >
001 or id > 2671212)
order by wcl_vin_no, wcl_claim_no, id
limit 1
Here is the prepared statement declaration:
prepare data3_read_next_wclaim_line_file_1_lock (character varying,
numeric, int8, numeric)
as select lock_cuid(id), *
from data3.wclaim_line_file
where wcl_vin_no >= $1 and
(wcl_vin_no > $1 or wcl_claim_no >= $2) and
(wcl_vin_no > $1 or wcl_claim_no > $2 or id > $3)
order by wcl_vin_no, wcl_claim_no, id limit $4
Here is the plan when it runs lock_cuid repeatedly (aside: disabling
seqscans causes an index plan, but that's not the point):
esp=# explain execute data3_read_next_wclaim_line_file_1_lock
('32-MHAB-C-X-7243', 001, 2671212, 1);
QUERY PLAN
------------------------------------------------------------------------
----------------------------
------------------------------------------------------------------------
----------------------------
--------------------------------
Limit (cost=13108.95..13162.93 rows=21592 width=260)
-> Sort (cost=13108.95..13162.93 rows=21592 width=260)
Sort Key: wcl_vin_no, wcl_claim_no, id
-> Seq Scan on wclaim_line_file (cost=0.00..11554.52
rows=21592 width=260)
Filter: (((wcl_vin_no)::text >= ($1)::text) AND
(((wcl_vin_no)::text > ($1)::text) OR
((wcl_claim_no)::numeric >= $2)) AND (((wcl_vin_no)::text > ($1)::text)
OR ((wcl_claim_no)::numeric
$2) OR ((id)::bigint > $3)))
(5 rows)
Import Notes
Resolved by subject fallback
"Merlin Moncure" <merlin.moncure@rcsonline.com> writes:
Here is the actual query:
select lock_cuid(id), *
...
order by wcl_vin_no, wcl_claim_no, id
limit 1
Looks like Stephan made the right guess.
Logically the LIMIT executes after the ORDER BY, so the sorted result
has to be formed completely. The fact that we are able to optimize
this in some cases does not represent a promise that we can do it in
all cases. Ergo, it's not a bug.
regards, tom lane