Date Parameter To Query Confusing Optimizer
I have a JDBC-based application which passes date/time parameters using JDBC query parameters, and it is performing very badly (i.e. doing full table scans). In an effort to narrow down the problem, I took the query and ran it in interactive SQL mode, replacing the date parameters (which are in BETWEEN ? AND ? clauses) with date literals, using the date '2011-01-01' syntax. When I do this, the query runs instantly, obviously using the indices on the tables involved.
Now, I suspect the optimizer is taking the wrong path based on the use of query parameters. I'm pretty surprised by this, because I would have thought the optimizer would do the same thing for any query parameter, however it arrived. Unfortunately, the code which forms the query is shared across several database back-ends, and I don't want to change it to use Postgres-specific query syntax if I can help it. I'm trying to avoid that at all costs.
What's really weird, and makes me suspect an optimizer or protocol bug, is that if I add "protocolVersion=2" as a JDBC connection parameter, the problem goes away. That is, I'm seeing the query take <1 sec as opposed to >3 min when using the legacy protocol. I stumbled on this after reading that the older protocol sent everything as strings and let the server infer the types.
Now, that's a reasonable workaround, but it does seem like I've hit either a Postgres server bug (in the optimizer or elsewhere) or a JDBC driver bug of some kind.
Any help in narrowing down the problem is appreciated!
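For concreteness, the pattern being described can be sketched as below. The activity_log table, event_time column, and connection URL are hypothetical stand-ins; only the BETWEEN ? AND ? parameter binding and the protocolVersion=2 connection parameter come from the report above.

```java
import java.sql.Connection;
import java.sql.Date;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class DateRangeQuery {
    // Hypothetical table and column names, standing in for the real query.
    static final String SQL =
        "SELECT count(*) FROM activity_log WHERE event_time BETWEEN ? AND ?";

    // The workaround described above: protocolVersion=2 selects the legacy
    // protocol, which sends parameters as text for the server to type.
    static final String URL =
        "jdbc:postgresql://localhost/mydb?protocolVersion=2";

    // Binds the date range via JDBC parameters, as the application does.
    static long countBetween(Connection c, Date from, Date to) throws SQLException {
        try (PreparedStatement ps = c.prepareStatement(SQL)) {
            ps.setDate(1, from);
            ps.setDate(2, to);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                return rs.getLong(1);
            }
        }
    }
}
```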
On Mon, Jan 3, 2011 at 2:48 PM, Kurt Westerfeld <kwesterfeld@novell.com> wrote:
Try using PREPARE to simulate your issue. We ran into a similar issue
using PHP and prepared statements, where the plan choices were
occasionally abysmal depending on what the filters were and their
relative distributions within the table.
http://www.postgresql.org/docs/current/interactive/sql-prepare.html
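The suggestion above can be tried directly from psql to reproduce what the driver does; the activity_log table and event_time column here are hypothetical stand-ins for the real query:

```sql
-- Plan chosen before the parameter values are known (the generic plan)
PREPARE q(date, date) AS
  SELECT * FROM activity_log WHERE event_time BETWEEN $1 AND $2;
EXPLAIN EXECUTE q('2011-01-01', '2011-01-31');

-- Plan chosen with the values inlined, for comparison
EXPLAIN SELECT * FROM activity_log
 WHERE event_time BETWEEN date '2011-01-01' AND date '2011-01-31';
```

If the first EXPLAIN shows a sequential scan and the second an index scan, the slowdown is the generic prepared-statement plan rather than anything JDBC-specific.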
On 3 Jan 2011, at 23:48, Kurt Westerfeld wrote:
The problem here is that JDBC uses prepared statements for parameterised queries. By the very definition of a prepared statement the query plan gets stored before the parameter values are known, which forces the database to use a query plan that would work for every possible value of those parameters.
Thus you end up with a generic query plan.
This isn't often a problem, but if some of the possible parameter values match a high percentage of your table rows, then chances are you'll end up with a plan that uses a sequential scan.
You didn't say which version of Postgres you're using - I recall that recent versions (since 8.3?) are smarter about this particular scenario.
Alban Hertroys
--
Screwing up is an excellent way to attach something to the ceiling.
"By the very definition of a prepared statement the query plan gets stored before the parameter values are known"
Is this true for all databases? It would seem to me that this approach would always lead to the wrong query plan, especially in the case I am testing, where the selectivity of the BETWEEN clause I am using is very low. The trouble is, the BETWEEN clause can also yield a highly selective result--which is basically what happens as the database "ages". The date/timestamp column is a good choice here because this is a database activity table--a temporal database. The trouble is that at times there is a huge amount of activity, yielding very low selectivity.
Maybe it would be best for me to just disable the indices on the timestamp fields and expect the join to yield the best results.
But, back on topic: it seems wrong to me that the choice of prepared vs. non-prepared statements, and protocol 2 vs. 3, would influence the optimizer so profoundly. I would think there has to be something I can tune so that prepared statement parameter values are considered when the execution plan is chosen. Is there such a setting?
Can you try the "...BETWEEN ?::date AND ?::date..." syntax, or send
the statement causing problems?
Kind regards,
Radosław Smogura
First, sorry I didn't mention that I am using PostgreSQL 9; the problem also existed on 8.4.
As for the BETWEEN clause, I'm using Hibernate and don't want to put a database-specific SQL query in place.
I also decided to drop a few indexes which were probably causing the optimizer to choose the wrong plan. This "fixes" the problem too, but I think the general "bug" here is that the query plan is bound too early in prepared-statement execution.
In response to "Kurt Westerfeld" <kwesterfeld@novell.com>:
You could do "BETWEEN CAST(? AS DATE) AND CAST(? AS DATE)" and it wouldn't
be database-specific.
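That portable form drops into a JDBC prepared statement without any driver-specific syntax. A minimal sketch, with hypothetical table and column names:

```java
import java.sql.Connection;
import java.sql.Date;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class PortableDateRange {
    // CAST(? AS DATE) is standard SQL, unlike the PostgreSQL-only ?::date,
    // so the same string can be issued against other back-ends as well.
    // Table and column names are hypothetical.
    static final String SQL =
        "SELECT count(*) FROM activity_log "
      + "WHERE event_time BETWEEN CAST(? AS DATE) AND CAST(? AS DATE)";

    static long countBetween(Connection c, Date from, Date to) throws SQLException {
        try (PreparedStatement ps = c.prepareStatement(SQL)) {
            ps.setDate(1, from);
            ps.setDate(2, to);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                return rs.getLong(1);
            }
        }
    }
}
```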
You might also want to consider not top-posting. I'm not sure why I read enough
of this to understand it, as I usually get confused and frustrated when
things have been top-posted, and stop reading.
--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/