Prepared statements considered harmful
With time, it becomes ever clearer to me that prepared SQL statements are just
a really bad idea. On some days, it seems like half the performance problems
in PostgreSQL-using systems are because a bad plan was cached somewhere. I'd
say, in the majority of cases the time you save parsing and planning is
irrelevant compared to the possibly disastrous effects of wrong or suboptimal
plans. I wonder if other people have similar experiences.
I'd wish that we reconsider when and how prepared statements are used. The
JDBC interface and PL/pgSQL are frequently noticed perpetrators, but the
problem is really all over the place.
A couple of actions to consider:
- Never use prepared statements unless the user has turned them on. (This is
the opposite of the current behavior.)
- Transparently invalidate and regenerate prepared plans more often. This
could be tied to the transaction count, update activity obtained from the
statistics collector, etc.
- Redefine "prepared" to mean "parsed" rather than "parsed and planned".
Each of these or similar changes would only solve a subset of the possible
problems. Possibly, we need more knobs to adjust these things. But
something needs to be done.
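To make the failure mode concrete, here is a minimal sketch (table, index, and data distribution are made up for illustration):

```sql
-- A table where the distribution of "status" is heavily skewed:
-- say 99% of rows are 'done' and 1% are 'pending'.
CREATE TABLE jobs (id integer, status text);
CREATE INDEX jobs_status_idx ON jobs (status);

-- The plan is chosen once, without knowing the parameter value:
PREPARE q AS SELECT * FROM jobs WHERE status = $1;

-- Both of these run the same cached plan, even though 'pending'
-- (rare) wants the index and 'done' (common) wants a seqscan:
EXECUTE q('pending');
EXECUTE q('done');
```

Whichever plan was cached at PREPARE time, it is the wrong one for one of the two kinds of lookup.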
Comments?
--
Peter Eisentraut
http://developer.postgresql.org/~petere/
On Thu, Aug 31, 2006 at 01:56:29PM +0200, Peter Eisentraut wrote:
With time, it becomes ever clearer to me that prepared SQL statements are just
a really bad idea. On some days, it seems like half the performance problems
in PostgreSQL-using systems are because a bad plan was cached somewhere. I'd
say, in the majority of cases the time you save parsing and planning is
irrelevant compared to the possibly disastrous effects of wrong or suboptimal
plans. I wonder if other people have similar experiences.
Yeah, it seems to me that many of the benefits of not planning each time
are outweighed by the effects of bad plans.
- Redefine "prepared" to mean "parsed" rather than "parsed and planned".
I think this is the best. Some way to specify that you don't want
planning to take place immediately would be good.
One question though: there is a function PQexecParams(). Does this
suffer from the same problem? I imagine most interfaces would like
out-of-band parameters (no escaping issues), so why do they not use this?
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
From each according to his ability. To each according to his ability to litigate.
- Redefine "prepared" to mean "parsed" rather than "parsed and planned".
How about "prepared" means really "prepared"... in the sense of parsed,
analyzed all sensible plans, and save a meta-plan which based on current
statistics and parameter values chooses one of the considered (and
cached) plans ?
That would be immune both to statistics changes and parameter value
changes, within certain limits. It would also be a lot more complex than
a simple plan...
Cheers,
Csaba.
On Thursday, 31 August 2006 14:11, Csaba Nagy wrote:
How about "prepared" means really "prepared"... in the sense of parsed,
analyzed all sensible plans, and save a meta-plan which based on current
statistics and parameter values chooses one of the considered (and
cached) plans ?
I don't think this could solve one particularly frequent problem which is that
pattern matching queries don't get along with prepared plans if the search
pattern isn't known at planning time.
--
Peter Eisentraut
http://developer.postgresql.org/~petere/
On Thu, 2006-08-31 at 14:32, Peter Eisentraut wrote:
On Thursday, 31 August 2006 14:11, Csaba Nagy wrote:
How about "prepared" means really "prepared"... in the sense of parsed,
analyzed all sensible plans, and save a meta-plan which based on current
statistics and parameter values chooses one of the considered (and
cached) plans ?
I don't think this could solve one particularly frequent problem which is that
pattern matching queries don't get along with prepared plans if the search
pattern isn't known at planning time.
Why not ? I specifically said you would prepare a few sensible plans
based on statistics/expected variations of the statistics, and parameter
value ranges which would trigger different plans.
So for the like query case you could save 2 plans, one for the indexable
case, one for the not indexable case. Then at runtime you choose the
proper one based on the pattern value. The meta-plan I mentioned would
be a collection of plans with rules to choose the right one at run time
based on parameter values and perhaps the current statistics.
This of course would need a lot more preparation time than just preparing
one plan, but that's why you want to do it upfront and then cache the
results. A central plan repository mentioned in other posts would fit
nicely here... and you could use prepared plans for non-parameterized
queries too by simply considering the constants as parameters, to
increase the chances for a prepared plan reuse - this of course for
complex enough queries.
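The constants-as-parameters idea could look roughly like this (query and table are hypothetical):

```sql
-- A query arriving with literal constants...
SELECT * FROM orders WHERE customer_id = 42 AND status = 'open';

-- ...would be normalized to a parameterized shape,
SELECT * FROM orders WHERE customer_id = $1 AND status = $2;

-- then looked up in the shared plan repository under that shape,
-- and executed with (42, 'open') supplied as parameter values.
```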
Cheers,
Csaba.
On 8/31/06, Peter Eisentraut <peter_e@gmx.net> wrote:
With time, it becomes ever clearer to me that prepared SQL statements are just
a really bad idea. On some days, it seems like half the performance problems
in PostgreSQL-using systems are because a bad plan was cached somewhere. I'd
say, in the majority of cases the time you save parsing and planning is
irrelevant compared to the possibly disastrous effects of wrong or suboptimal
plans. I wonder if other people have similar experiences.
I have to respectfully disagree. I have used them to great effect in
many of my projects. In the most extreme case, prepared statements can
provide a 50% reduction or greater in overall query time... this is too
good a benefit to simply discard. I worked on converted ISAM projects
which would not have been possible to make efficient without prepared
statements. However, you are correct that the planner does often
create wacky plans which can cause disastrous results in some cases.
My major issue is that you cannot supply hints to the query engine.
For example, one of my favorite tricks is to parameterize the LIMIT
clause in a query, which creates a sliding window over the table for
progressive readahead. Unfortunately, the planner assumes 10%, which
borks the plan. My workaround is to turn off bitmapscan and seqscan
before planning and turn them back on after the PREPARE.
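The workaround described above looks roughly like this (table and statement names are made up):

```sql
-- Steer the planner away from seqscan/bitmapscan while the
-- parameterized-LIMIT statement is being planned...
SET enable_seqscan = off;
SET enable_bitmapscan = off;
PREPARE readahead AS
    SELECT * FROM big_table ORDER BY id LIMIT $1 OFFSET $2;
-- ...then restore normal planning for everything else.
SET enable_seqscan = on;
SET enable_bitmapscan = on;

-- The cached plan now uses the index regardless of the LIMIT value.
EXECUTE readahead(1000, 0);
```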
The proposal to supply hints to statements and functions has been
voted down several times due to the argument that it is better to fix
the planner. I think supplying hints does fix the planner, and is a
balanced solution.
merlin
hello everyone,
I have been added to your mailing list by accident and I don't know how
to stop receiving your mails. Would you please help me remove my mail
address from the mail group pgsql-hackers@postgresql.org?
I appreciate what you will do for me. (my mail address:
tate_zhou@hotmail.com)
thanks.
On Thu, August 31, 2006 18:56, Peter Eisentraut wrote:
With time, it becomes ever clearer to me that prepared SQL statements are just
a really bad idea. On some days, it seems like half the performance problems
in PostgreSQL-using systems are because a bad plan was cached somewhere.
Is there any kind of pattern at all to this problem? Anything
recognizable? A few typical pitfalls?
Without knowing much of the internals, I could imagine [waves hands in
vague gestures] other options--something like recognizing major changes
that upset the cost functions that went into generating a plan, and
invalidating the plan based on those; or noting bad estimates somehow as
they become apparent during execution, and annotating the plan with a
"this assumption was a bad idea" marker so you'll do better next time.
I guess you can't go far wrong if you re-define "prepared" to mean merely
"pre-parsed," but it sounds like such a waste of opportunity...
Jeroen
Peter Eisentraut wrote:
With time, it becomes ever clearer to me that prepared SQL statements are just
a really bad idea. On some days, it seems like half the performance problems
in PostgreSQL-using systems are because a bad plan was cached somewhere. I'd
say, in the majority of cases the time you save parsing and planning is
irrelevant compared to the possibly disastrous effects of wrong or suboptimal
plans. I wonder if other people have similar experiences.
I'd wish that we reconsider when and how prepared statements are used. The
JDBC interface and PL/pgSQL are frequently noticed perpetrators, but the
problem is really all over the place.
A couple of actions to consider:
- Never use prepared statements unless the user has turned them on. (This is
the opposite of the current behavior.)
- Transparently invalidate and regenerate prepared plans more often. This
could be tied to the transaction count, update activity obtained from the
statistics collector, etc.
- Redefine "prepared" to mean "parsed" rather than "parsed and planned".
Each of these or similar changes would only solve a subset of the possible
problems. Possibly, we need more knobs to adjust these things. But
something needs to be done.
Not to mention problems with outdated plans after schema changes.
Replanning views when they are used in joins could lead to improved
resulting plans (e.g. if the view itself contains outer joins).
Regards,
Andreas
On Thursday, 31 August 2006 14:52, Csaba Nagy wrote:
So for the like query case you could save 2 plans, one for the indexable
case, one for the not indexable case. Then at runtime you choose the
proper one based on the pattern value.
OK, why don't you work out an example. Let's look at this query:
SELECT * FROM t1 WHERE a LIKE $1;
What two plans would you prepare?
--
Peter Eisentraut
http://developer.postgresql.org/~petere/
How about "prepared" means really "prepared"... in the sense of
parsed, analyzed all sensible plans, and save a meta-plan which based
on current statistics and parameter values chooses one of the
considered (and cached) plans ?
I don't think this could solve one particularly frequent
problem which is that pattern matching queries don't get
along with prepared plans if the search pattern isn't known
at planning time.
I think what we would actually want is knowledge about how much
difference different parameters actually make in the plan decision
(e.g. the stats show an even distribution and join correlation).
Then we could prepare the plan when there is not much difference,
and postpone planning until we know the parameters when the difference
is big.
An OLTP workload typically benefits from prepared plans, and the one
plan is good for all possible inputs, so imho we cannot just assume
all plans need replanning for different parameters.
Andreas
On Thursday, 31 August 2006 15:05, Merlin Moncure wrote:
The proposal to supply hints to statements and functions has been
voted down several times due to the argument that it is better to fix
the planner. I think supplying hints does fix the planner, and is a
balanced solution.
Planner hints are a way to address a deficient planner. But neither a
manually hinted planner nor a perfectly good planner will help if the
planning decisions are based on outdated information.
--
Peter Eisentraut
http://developer.postgresql.org/~petere/
On Thursday, 31 August 2006 15:06, Jeroen T. Vermeulen wrote:
Is there any kind of pattern at all to this problem? Anything
recognizable? A few typical pitfalls?
If data is not distributed evenly, then any old WHERE foo = $1 is prone to be
the wrong plan for half of the possible values of $1. The more data you have
and the more it changes, the worse this gets.
--
Peter Eisentraut
http://developer.postgresql.org/~petere/
On Thursday, 31 August 2006 15:18, Andreas Pflug wrote:
Not to mention problems with outdated plans after schema changes. Using
views unplanned (replanned) when used in joins could lead to improved
resulting plans (e.g. if the view contains outer joins itself).
Views don't contain execution plans. I don't see how this is relevant.
--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Merlin Moncure wrote:
On 8/31/06, Peter Eisentraut <peter_e@gmx.net> wrote:
With time, it becomes ever clearer to me that prepared SQL statements are just
a really bad idea. On some days, it seems like half the performance problems
in PostgreSQL-using systems are because a bad plan was cached somewhere. I'd
say, in the majority of cases the time you save parsing and planning is
irrelevant compared to the possibly disastrous effects of wrong or suboptimal
plans. I wonder if other people have similar experiences.
I have to respectfully disagree. I have used them to great effect in
many of my projects.
Peter doesn't propose to remove prepared statements as such. They are
certainly of great value, if used carefully and specifically, as in your
case. The problems he's addressing stem from plans _implicitly_ created
and stored.
In the most extreme case, prepared statements can
provide a 50% reduction or greater in overall query time... this is too
good a benefit to simply discard. I worked on converted ISAM projects
which would not have been possible to make efficient without prepared
statements. However you are correct that the planner does often
create wacky plans which can cause disastrous results in some cases.
My major issue is that you cannot supply hints to the query engine.
I don't believe extending this thread to the we-need-hints issue is a
good idea.
Regards,
Andreas
On Thu, 2006-08-31 at 15:19, Peter Eisentraut wrote:
OK, why don't you work out an example. Let's look at this query:
SELECT * FROM t1 WHERE a LIKE $1;
What two plans would you prepare?
if substring($1 from 1 for 1) != '%' then
use plan 1 (see below);
else
use plan 2 (see below);
end if;
Save both plans from below with the meta-plan from above, and call it a
prepared plan.
cnagy=# create table t1 (a text);
CREATE TABLE
cnagy=# insert into t1 select round(10000000 * random()) from
generate_series(1,10000);
INSERT 0 10000
cnagy=# create index idx_t1_a on t1 (a);
CREATE INDEX
cnagy=# analyze verbose t1;
INFO: analyzing "public.t1"
INFO: "t1": scanned 55 of 55 pages, containing 10000 live rows and 0
dead rows; 3000 rows in sample, 10000 estimated total rows
ANALYZE
cnagy=# explain select a from t1 where a like '121%';
QUERY PLAN
------------------------------------------------------------------------
Bitmap Heap Scan on t1 (cost=2.06..27.63 rows=10 width=10)
Filter: (a ~~ '121%'::text)
-> Bitmap Index Scan on idx_t1_a (cost=0.00..2.06 rows=10 width=0)
Index Cond: ((a >= '121'::text) AND (a < '122'::text))
(4 rows)
cnagy=# explain select a from t1 where a like '%121';
QUERY PLAN
------------------------------------------------------
Seq Scan on t1 (cost=0.00..180.00 rows=80 width=10)
Filter: (a ~~ '%121'::text)
(2 rows)
Cheers,
Csaba.
On Thursday, 31 August 2006 15:36, Csaba Nagy wrote:
On Thu, 2006-08-31 at 15:19, Peter Eisentraut wrote:
OK, why don't you work out an example. Let's look at this query:
SELECT * FROM t1 WHERE a LIKE $1;
What two plans would you prepare?
if substring($1 from 1 for 1) != '%' then
use plan 1 (see below);
else
use plan 2 (see below);
end if;
Note that plan 1 can only be created if you know the actual value for $1.
--
Peter Eisentraut
http://developer.postgresql.org/~petere/
On Thu, 2006-08-31 at 15:49, Peter Eisentraut wrote:
Note that plan 1 can only be created if you know the actual value for $1.
Why would that be so ? The plan can contain functions of $1 (both
constants in plan 1 are a function of $1).
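For instance (a sketch only; successor() is a hypothetical helper that bumps the last character of its argument), plan 1's index condition could carry expressions over $1 instead of the baked-in literals:

```sql
-- Instead of the constants from the EXPLAIN output earlier
-- (a >= '121' AND a < '122'), the cached plan would keep the
-- bounds as expressions evaluated at execution time:
--
--   Index Cond: a >= rtrim($1, '%')
--           AND a <  successor(rtrim($1, '%'))
--
-- For $1 = '121%', rtrim gives '121' and successor gives '122'.
```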
Cheers,
Csaba
On Aug 31, 2006, at 9:25 AM, Peter Eisentraut wrote:
On Thursday, 31 August 2006 15:05, Merlin Moncure wrote:
The proposal to supply hints to statements and functions has been
voted down several times due to the argument that it is better to fix
the planner. I think supplying hints does fix the planner, and is a
balanced solution.
Planner hints are a way to address a deficient planner. But neither a
manually hinted planner nor a perfectly good planner will help if the
planning decisions are based on outdated information.
I don't chime in very often, but I do think the refusal to
incorporate hints into the planner system is fantastically stubborn
and nonsensical. I whole-heartedly agree that it is _better_ to fix
the planner, but many of us have production systems and can't just go
check out CVS HEAD to address our day-to-day issues and we suffer
from this decision.
There are many databases out there with better planners than
PostgreSQL -- likely there will always be. Even those databases have
query planner hints. Why? Because the authors of those database had
the humility to realize that the planner they designed wasn't perfect
and that people _still_ need their database to perform well despite a
non-optimal query plan here and there.
A good query planner hint system would act as a catalyst to the
improvement of the current query planner as users could share their
complex queries and associated improved query plans through hinting.
I like Postgres a lot, I think the people that work on it are very
very sharp. I do feel that the consistent refusal to allow query
hinting to be introduced demonstrates an unhealthy amount of hubris
that, in the end, negatively impacts users.
While Postgres is missing a ton of other needed features, I rarely
see the attitude that they are _unwanted_. Instead I see the "if it
is important to you, go build it" attitude which is what I would
expect in an open source project.
// Theo Schlossnagle
// CTO -- http://www.omniti.com/~jesus/
// OmniTI Computer Consulting, Inc. -- http://www.omniti.com/
On Aug 31, 2006, at 8:52 , Csaba Nagy wrote:
This of course would need a lot more preparation time than just preparing
one plan, but that's why you want to do it upfront and then cache the
results. A central plan repository mentioned in other posts would fit
nicely here... and you could use prepared plans for non-parameterized
queries too by simply considering the constants as parameters, to
increase the chances for a prepared plan reuse - this of course for
complex enough queries.
If prepared statements become more expensive to create, then it would
make more sense for them to persist across sessions. All of an
application's prepared statements could be cached.
-M