SQL Query Optimization

Started by Dav Colemanalmost 24 years ago4 messageshackers
Jump to latest
#1Dav Coleman
dav@serve.com

Hello,

I am using postgresql to house chemical informatics data which consists
of
several interlinked tables with tens of thousands (maximum) of rows.
When
doing search queries against these tables (which always requires
multiple
joins) I have noticed that the semantically equivalent SQL queries can
differ
vastly in speed performance depending on the order of clauses ANDed
together ( "WHERE cond1 AND cond2" takes forever, but "WHERE cond2
AND cond1" comes right back).

So it appears I need to do some pre-optimization of the SQL query
generated
by the user before submitting it to postgresql in order to guarantee (or
at least
increase the likelihood of) the fastest results. I've tried STFW and
RTFM but
haven't found any good pointers on where to start with this, although I
feel that
there must be some published algorithms or theories. Can anyone point me
to
a URL or other source to get me on my way?

Also, I wonder if this sort of query optimization is built into other
databases
such as Oracle?

I did find this URL: http://redbook.cs.berkeley.edu/lec7.html
which seems to be interesting, but honestly I'm far from a DB expert so
I
can't follow most of it, and I can't tell if it is talking about
optimization that
can be done in application space (query rewrite) or something that has
to
be done in the database engine itself. I'm going to try to find the book
it
references though.

Basically I feel a bit in over my head, which is ok but I don't want to
waste
time paddling in the wrong direction, so I'm hoping someone can
recognize
where I need to look and nudge me in that direction. Maybe I just need
proper terminology to plug into google.

Thanks,
Dav

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dav Coleman (#1)
Re: SQL Query Optimization

Dav Coleman <dav@serve.com> writes:

I have noticed that the semantically equivalent SQL queries can
differ
vastly in speed performance depending on the order of clauses ANDed
together ( "WHERE cond1 AND cond2" takes forever, but "WHERE cond2
AND cond1" comes right back).

Could we see a specific example?

It would also be useful to know what PG version you are using, whether
you've VACUUM ANALYZEd the tables, and what EXPLAIN has to say about
your query.

regards, tom lane

#3Dav Coleman
dav@danger-island.com
In reply to: Tom Lane (#2)
Re: [SQL] SQL Query Optimization

I'm sorry, I realized after posting this that it went to the wrong
list, I resent it to pgsql-sql instead.

But basically I haven't done any ANALYZE or EXPLAIN yet because of the
fact that the order -is- making a difference so it can't be executing
the same query inside the database engine. Given that, I figured I would
jsut look for theories on how to rewrite the queries before submitting first.

btw, we are using postgresql 7.1.2 compiled from source on rh linux 7.0.

I also might not have been very clear about the fact that the user is
basically constructing the query dynamically in the application, so it's
not a matter of just optimizing any specifc query, but any possible query.

Tom Lane [tgl@sss.pgh.pa.us] wrote:

Dav Coleman <dav@serve.com> writes:

I have noticed that the semantically equivalent SQL queries can
differ
vastly in speed performance depending on the order of clauses ANDed
together ( "WHERE cond1 AND cond2" takes forever, but "WHERE cond2
AND cond1" comes right back).

Could we see a specific example?

It would also be useful to know what PG version you are using, whether
you've VACUUM ANALYZEd the tables, and what EXPLAIN has to say about
your query.

regards, tom lane

--
Dav Coleman
http://www.danger-island.com/dav/

--
Dav Coleman
http://www.danger-island.com/dav/

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dav Coleman (#3)
Re: [SQL] SQL Query Optimization

Dav Coleman <dav@danger-island.com> writes:

But basically I haven't done any ANALYZE or EXPLAIN yet because of the
fact that the order -is- making a difference so it can't be executing
the same query inside the database engine.

If you haven't ever done VACUUM ANALYZE then the planner is flying
completely blind as to table sizes and data distributions. This would
(among other things) very possibly allow different plans to be estimated
as exactly the same cost --- since all the cost numbers will be based on
exactly the same default statistics. So it's not surprising that you'd
get an arbitrary choice of plans depending on trivial details like
WHERE clause order.

regards, tom lane