BUG #5228: Execution of prepared query is slow when timestamp parameter is used
The following bug has been logged online:
Bug reference: 5228
Logged by: aftab
Email address: akhangd@hotmail.co.uk
PostgreSQL version: 8.3.8
Operating system: Centos 5
Description: Execution of prepared query is slow when timestamp
parameter is used
Details:
e.g.
prepare testplan (int, int) as
SELECT *
FROM position WHERE
position.POSITION_STATE_ID=$1 AND
position.TARGET_ID=$2
AND position.TIME>='2009-10-30 13:43:32'
ORDER BY position.ID DESC ;
EXPLAIN ANALYZE EXECUTE testplan(2,63)
QUERY PLAN
----------------------------------------------------------------------------
----------------------------------------------------------------
Sort (cost=166238.58..166370.97 rows=52956 width=297) (actual
time=28.618..28.619 rows=1 loops=1)
Sort Key: id
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on "position" (cost=6182.89..147236.51 rows=52956
width=297) (actual time=28.518..28.521 rows=1 loops=1)
Recheck Cond: (target_id = $2)
Filter: (("time" >= '2009-10-30 13:43:32'::timestamp without time
zone) AND (position_state_id = $1))
-> Bitmap Index Scan on position_target_fk (cost=0.00..6169.65
rows=210652 width=0) (actual time=0.624..0.624 rows=1006 loops=1)
Index Cond: (target_id = $2)
Total runtime: 28.763 ms
(9 rows)
When I replace "time" filter with a parameter then the same query takes
longer
prepare testplan (int, int, timestamp) as
SELECT *
FROM position WHERE
position.POSITION_STATE_ID=$1 AND
position.TARGET_ID=$2
AND position.TIME>=$3
ORDER BY position.ID DESC ;
EXPLAIN ANALYZE EXECUTE testplan(2,63,'2009-10-30 13:43:32');
QUERY
PLAN
----------------------------------------------------------------------------
------------------------------------------------------------------
Sort (cost=154260.75..154348.53 rows=35111 width=297) (actual
time=2852.357..2852.358 rows=1 loops=1)
Sort Key: id
Sort Method: quicksort Memory: 25kB
-> Index Scan using position_time on "position" (cost=0.00..146688.94
rows=35111 width=297) (actual time=0.113..2852.338 rows=1 loops=1)
Index Cond: ("time" >= $3)
Filter: ((position_state_id = $1) AND (target_id = $2))
Total runtime: 2852.439 ms
(7 rows)
aftab wrote:
The following bug has been logged online:
Bug reference: 5228
Logged by: aftab
Email address: akhangd@hotmail.co.uk
PostgreSQL version: 8.3.8
Operating system: Centos 5
Description: Execution of prepared query is slow when timestamp
parameter is used
It's far from clear that this is a bug. I've replied to the
pgsql-performance list to direct further discussion there.
Why this is happening: PostgreSQL has more information to use to plan a
query when it knows the actual values of parameters at planning time. It
can use statistics about the distribution of data in the table to make
better choices about query plans.
When you prepare a parameterized query, you're telling PostgreSQL to
plan a query that'll work well for _any_ value in those parameters. It
can't make as much use of statistics about the column(s) involved.
There has been periodic discussion on the mailing list about having an
'PREPARE NOCACHE' or 'EXECUTE REPLAN' command or something like that,
where you can use a parameterized query, but query planning is done each
time the query is executed based on the actual values of the parameters.
I don't know if this has come to anything or if anybody thinks it's even
a good idea.
( If it's thought to be, perhaps a TODO entry would be warranted? It
certainly needs a FAQ entry or an article in [[Category:Performance]] ).
At present, the only way I'm aware of to force re-planning while still
using query parameters is to wrap your parameterized query up in a
PL/PgSQL function that uses EXECUTE ... USING :
http://www.postgresql.org/docs/8.4/static/plpgsql-statements.html#PLPGSQL-STATEMENTS-EXECUTING-DYN
... but PL/PgSQL has its own performance costs.
One thing that might help your query perform better without making any
changes to it is to give it more work_mem, which might let it use a
different sort or sort more efficiently. You can set work_mem per-user,
per-connection, per-database or globally - see the PostgreSQL documentation.
--
Craig Ringer