reducing number of ANDs speeds up query

Started by T. E. Lawrenceabout 13 years ago17 messagesgeneral
Jump to latest
#1T. E. Lawrence
t.e.lawrence@icloud.com

Hello,

I have a pretty standard query with two tables:

SELECT table_a.id FROM table_a a, table_b b WHERE ... AND ... AND b.value=...;

With the last "AND b.value=..." the query is extremely slow (did not wait for it to end, but more than a minute), because the value column is not indexed (contains items longer than 8K).

However the previous conditions "WHERE ... AND ... AND" should have already reduced the candidate rows to just a few (table_b contains over 50m rows). And indeed, removing the last "AND b.value=..." speeds the query to just a millisecond.

Is there a way to instruct PostgreSQL to do first the initial "WHERE ... AND ... AND" and then the last "AND b.value=..." on the (very small) result?

Thank you and kind regards,
T.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2Amit Kapila
amit.kapila16@gmail.com
In reply to: T. E. Lawrence (#1)
Re: reducing number of ANDs speeds up query

On Saturday, January 12, 2013 7:17 AM T. E. Lawrence wrote:

Hello,

I have a pretty standard query with two tables:

SELECT table_a.id FROM table_a a, table_b b WHERE ... AND ... AND b.value=...;

With the last "AND b.value=..." the query is extremely slow (did not wait for it to end, but more than a minute), because the value column is not indexed (contains items longer than 8K).

However the previous conditions "WHERE ... AND ... AND" should have already reduced the candidate rows to just a few (table_b contains over 50m rows). And indeed, removing the last "AND b.value=..." speeds the query to just a millisecond.

Is there a way to instruct PostgreSQL to do first the initial "WHERE ... AND ... AND" and then the last "AND b.value=..." on the (very small) result?

You can try once with below query:
Select * from (SELECT a.id,b.value FROM table_a a, table_b b WHERE ... AND ... ) X where X.value=...;

If this doesn't work can you send the Explain .. output for both queries(the query you are using and the query I have suggested)

With Regards,
Amit Kapila.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Alban Hertroys
haramrae@gmail.com
In reply to: T. E. Lawrence (#1)
Re: reducing number of ANDs speeds up query

You really ought to include the output of EXPLAIN ANALYZE in cases such as
these (if it doesn't already point you to the culprit).

Most likely you'll find that the last condition added a sequential scan to
the query plan, which can have several causes/reasons. Are the estimated
#rows close to the actual #rows? Is b.value indexed? How selective is the
value you're matching it against (is it uncommon or quite common)? Etc, etc.

Meanwhile, it looks like most of your AND's are involved in joining tables
a and b. Perhaps it helps to use an explicit join instead of an implicit
one?

On 12 January 2013 02:47, T. E. Lawrence <t.e.lawrence@icloud.com> wrote:

Hello,

I have a pretty standard query with two tables:

SELECT table_a.id FROM table_a a, table_b b WHERE ... AND ... AND
b.value=...;

With the last "AND b.value=..." the query is extremely slow (did not wait
for it to end, but more than a minute), because the value column is not
indexed (contains items longer than 8K).

However the previous conditions "WHERE ... AND ... AND" should have
already reduced the candidate rows to just a few (table_b contains over 50m
rows). And indeed, removing the last "AND b.value=..." speeds the query to
just a millisecond.

Is there a way to instruct PostgreSQL to do first the initial "WHERE ...
AND ... AND" and then the last "AND b.value=..." on the (very small) result?

Thank you and kind regards,
T.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.

#4Eduardo Morras
emorrasg@yahoo.es
In reply to: T. E. Lawrence (#1)
Re: reducing number of ANDs speeds up query

On Sat, 12 Jan 2013 02:47:26 +0100
"T. E. Lawrence" <t.e.lawrence@icloud.com> wrote:

Hello,

I have a pretty standard query with two tables:

SELECT table_a.id FROM table_a a, table_b b WHERE ... AND ... AND b.value=...;

With the last "AND b.value=..." the query is extremely slow (did not wait for it to end, but more than a minute), because the value column is not indexed (contains items longer than 8K).

You can construct your own home made index, add a new column in table b, with the first 8-16 bytes/chars of b.value, use this column on your query and refine to a complete b.value. Don't forget tocCreate an index for it too. You can keep this column updated with a trigger.

Perhaps you can use a partial index for b.value column, i never used that feature so documentation/others can point you how to do it.

However the previous conditions "WHERE ... AND ... AND" should have already reduced the candidate rows to just a few (table_b contains over 50m rows). And indeed, removing the last "AND b.value=..." speeds the query to just a millisecond.

Is there a way to instruct PostgreSQL to do first the initial "WHERE ... AND ... AND" and then the last "AND b.value=..." on the (very small) result?

Thank you and kind regards,
T.

---   ---
Eduardo Morras <emorrasg@yahoo.es>

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5T. E. Lawrence
t.e.lawrence@icloud.com
In reply to: Amit Kapila (#2)
Re: reducing number of ANDs speeds up query

On 12.01.2013, at 07:10, Amit kapila <amit.kapila@huawei.com> wrote:

You can try once with below query:
Select * from (SELECT a.id,b.value FROM table_a a, table_b b WHERE ... AND ... ) X where X.value=...;

If this doesn't work can you send the Explain .. output for both queries(the query you are using and the query I have suggested)

With Regards,
Amit Kapila.

Hi and thank you!

I will try this and let you know!

T.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#6T. E. Lawrence
t.e.lawrence@icloud.com
In reply to: Alban Hertroys (#3)
Re: reducing number of ANDs speeds up query

Hi and thank you for your notes!

You really ought to include the output of EXPLAIN ANALYZE in cases such as these (if it doesn't already point you to the culprit).

I'll do so, it takes quite long...

Most likely you'll find that the last condition added a sequential scan to the query plan,

Exactly! EXPLAIN says so.

which can have several causes/reasons. Are the estimated #rows close to the actual #rows?

Yes, this is the problem. I read that in such cases indexes are not read. However if the previous conditions are executed first, the result is zero or just a few rows and there is no need seq scan the whole values column.

Is b.value indexed?

No, because it contains too long values for indexing.

How selective is the value you're matching it against (is it uncommon or quite common)? Etc, etc.

Zero to a few.

Meanwhile, it looks like most of your AND's are involved in joining tables a and b. Perhaps it helps to use an explicit join instead of an implicit one?

I am not quite sure what this means, but will read about it.

There were 2 more suggestions, I'll try now everything and write back.

Thank you very much for your help!
T.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#7T. E. Lawrence
t.e.lawrence@icloud.com
In reply to: Eduardo Morras (#4)
Re: reducing number of ANDs speeds up query

Hi and thank you!

On 12.01.2013, at 11:52, Eduardo Morras <emorrasg@yahoo.es> wrote:

With the last "AND b.value=..." the query is extremely slow (did not wait for it to end, but more than a minute), because the value column is not indexed (contains items longer than 8K).

You can construct your own home made index, add a new column in table b, with the first 8-16 bytes/chars of b.value, use this column on your query and refine to a complete b.value. Don't forget tocCreate an index for it too. You can keep this column updated with a trigger.

Yes, I have been considering this in a slightly different way. value contains short and long values (mixed). Only the short values are queried directly. The long values are queried in a tsearch column or in an external Sphinx Search. So probably I should split the short and long values and then index the short values.

It is nevertheless slightly annoying that I cannot make the query do the value thing last...

Perhaps you can use a partial index for b.value column, i never used that feature so documentation/others can point you how to do it.

Did not know of them, reading. Thank you!
T.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#8Tony Theodore
tony.theodore@gmail.com
In reply to: T. E. Lawrence (#1)
Re: reducing number of ANDs speeds up query

On 12/01/2013, at 12:47 PM, T. E. Lawrence <t.e.lawrence@icloud.com> wrote:

Hello,

I have a pretty standard query with two tables:

SELECT table_a.id FROM table_a a, table_b b WHERE ... AND ... AND b.value=...;

With the last "AND b.value=..." the query is extremely slow (did not wait for it to end, but more than a minute), because the value column is not indexed (contains items longer than 8K).

However the previous conditions "WHERE ... AND ... AND" should have already reduced the candidate rows to just a few (table_b contains over 50m rows). And indeed, removing the last "AND b.value=..." speeds the query to just a millisecond.

Is there a way to instruct PostgreSQL to do first the initial "WHERE ... AND ... AND" and then the last "AND b.value=..." on the (very small) result?

Have you looked at the WITH clause [1,2]:

WITH filtered as (SELECT table_a.id, b.value as val FROM table_a a, table_b b WHERE … AND …)
SELECT * FROM filtered WHERE filtered.val=…

It evaluates the the first SELECT once, then applies the second SELECT to the first in memory (at least that's the way I think about them).

Cheers,

Tony

[1]: http://www.postgresql.org/docs/9.2/static/queries-with.html
[2]: http://www.postgresql.org/docs/9.2/static/sql-select.html#SQL-WITH

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#9Alban Hertroys
haramrae@gmail.com
In reply to: T. E. Lawrence (#6)
Re: reducing number of ANDs speeds up query

On 12 January 2013 12:41, T. E. Lawrence <t.e.lawrence@icloud.com> wrote:

Hi and thank you for your notes!

You really ought to include the output of EXPLAIN ANALYZE in cases such

as these (if it doesn't already point you to the culprit).

I'll do so, it takes quite long...

Most likely you'll find that the last condition added a sequential scan

to the query plan,

Exactly! EXPLAIN says so.

which can have several causes/reasons. Are the estimated #rows close to

the actual #rows?

Yes, this is the problem. I read that in such cases indexes are not read.
However if the previous conditions are executed first, the result is zero
or just a few rows and there is no need seq scan the whole values column.

You mean they don't match, do you?

The database doesn't know what you know and its making the wrong decision
based on incorrect data.

The database won't use an index if it thinks that there aren't many rows to
check against a condition. Most likely (the results from explain analyze
would tell) the database thinks there are much fewer rows in table b than
there actually are.

You'll probably want to read about database maintenance for Postgres and
how to keep its statistics up to date. Autovacuum may need some tuning or
you need to run manual VACUUM more frequently.
In fact, run VACUUM now and see if the problem goes away.

You'll usually also want to run VACUUM after a large batch job.

Meanwhile, it looks like most of your AND's are involved in joining

tables a and b. Perhaps it helps to use an explicit join instead of an
implicit one?

I am not quite sure what this means, but will read about it.

You're currently using implicit joins by combining your join conditions in
the WHERE clause of your query, like this:
SELECT *
FROM a, b
WHERE a.col1 = b.col1 AND a.col2 = b.col2 AND b.value = 'yadayada';

You can also explicitly put your join conditions with the joins, like so:
SELECT *
FROM a INNER JOIN b ON (a.col1 = b.col1 AND a.col2 = b.col2)
WHERE b.value = 'yadayada';

You explicitly tell the database that those are the conditions to be joined
on and that the remaining conditions are filters on the result set. With
just two tables the need for such isn't that obvious, but with more tables
it quickly becomes difficult to see what condition in an implicit join is
part of the joins and which is the result set filter. With explicit joins
that's much clearer.
It wouldn't be the first time that I rewrite a query to use explicit joins,
only to find that the original query was incorrect.

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.

#10T. E. Lawrence
t.e.lawrence@icloud.com
In reply to: Alban Hertroys (#9)
Re: reducing number of ANDs speeds up query RESOLVED

RESOLVED
--
Dear all,

Thank you for your great help and multiple advices.

I discovered the problem and I have to say that it is very stupid and strange.

Here is what happened.

From all advices I tried first partial index. The index was built and there was no change in the speed of the slow query. Which depressed me greatly. In the midst of my depression I ran VACUUM ANALYZE which took about 10 hours (the db is about 170 GB and has more than 500m rows in some tables, running on a 4 core, 8 GB RAM dedicated PostgreSQL cloud server). Towards the end of VACUUM ANALYZE I was playing with some queries and suddenly the slow query became fast! (which partially defeated the notion that one does not need ANALYZE upon CREATE INDEX) And I said "Aha!".

So I decided to try the whole thing properly from the beginning. Dropped the index, did again VACUUM ANALYZE and tried the queries, in order to measure them without and with index. Surprise! - the slow query was blazing fast. The previous indexes (not the dropped partial index) were properly used. All was fine.

Which makes me think that, as we grew the database more than 250 times in size over a 2-3 months period, relying on autovacuum (some tables grew from 200k to 50m records, other from 1m to 500m records), the autovacuum has either let us down or something has happen to the ANALYZE.

Is the autovacuum 100% reliable in relation to VACUUM ANALYZE?

Tank you and all the best,
T.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#11Jeff Janes
jeff.janes@gmail.com
In reply to: T. E. Lawrence (#10)
Re: reducing number of ANDs speeds up query RESOLVED

On Monday, January 14, 2013, T. E. Lawrence wrote:

RESOLVED
--
Dear all,

Thank you for your great help and multiple advices.

I discovered the problem and I have to say that it is very stupid and
strange.

Here is what happened.

...

So I decided to try the whole thing properly from the beginning. Dropped
the index, did again VACUUM ANALYZE and tried the queries, in order to
measure them without and with index. Surprise! - the slow query was blazing
fast. The previous indexes (not the dropped partial index) were properly
used. All was fine.

Which makes me think that, as we grew the database more than 250 times in
size over a 2-3 months period, relying on autovacuum (some tables grew from
200k to 50m records, other from 1m to 500m records), the autovacuum has
either let us down or something has happen to the ANALYZE.

What do pg_stat_user_tables tell you
about last_vacuum, last_autovacuum, last_analyze, last_autoanalyze ?

Is the autovacuum 100% reliable in relation to VACUUM ANALYZE?

No. For example, if you constantly do things that need an access exclusive
lock, then autovac will keep getting interrupted and never finish.

Cheers,

Jeff

#12T. E. Lawrence
t.e.lawrence@icloud.com
In reply to: Jeff Janes (#11)
Re: reducing number of ANDs speeds up query RESOLVED

On 15.01.2013, at 05:45, Jeff Janes <jeff.janes@gmail.com> wrote:

Which makes me think that, as we grew the database more than 250 times in size over a 2-3 months period, relying on autovacuum (some tables grew from 200k to 50m records, other from 1m to 500m records), the autovacuum has either let us down or something has happen to the ANALYZE.

What do pg_stat_user_tables tell you about last_vacuum, last_autovacuum, last_analyze, last_autoanalyze?

relname | last_vacuum | last_autovacuum | last_analyze | last_autoanalyze
------------------------------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------
elements | 2013-01-14 16:14:48.963573+00 | | 2013-01-14 16:19:48.651155+00 | 2012-12-12 12:23:31.308877+00

This is the problematic table. I think it is clear. Last autovacuum has been never and last autoanalyze has been mid-December.

Thank you!

Is the autovacuum 100% reliable in relation to VACUUM ANALYZE?

No. For example, if you constantly do things that need an access exclusive lock, then autovac will keep getting interrupted and never finish.

I see.

So, apparently, we need to interrupt the heavy imports on some reasonable intervals and do manual VACUUM ANALYZE?

Cheers,

Jeff

Thank you very much,
T.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: T. E. Lawrence (#12)
Re: reducing number of ANDs speeds up query RESOLVED

"T. E. Lawrence" <t.e.lawrence@icloud.com> <CAMkU=1y6UuxPYbf_ky8DVDsJi=g=uQ1t0B6kwLEtdc7NLxB_-Q@mail.gmail.com> writes:

On 15.01.2013, at 05:45, Jeff Janes <jeff.janes@gmail.com> wrote:

Is the autovacuum 100% reliable in relation to VACUUM ANALYZE?

No. For example, if you constantly do things that need an access exclusive lock, then autovac will keep getting interrupted and never finish.

I see.

So, apparently, we need to interrupt the heavy imports on some reasonable intervals and do manual VACUUM ANALYZE?

Data import as such, no matter how "heavy", shouldn't be a problem.
The question is what are you doing that takes access-exclusive table
locks frequently, and do you really need to do that?

A quick look at the docs suggests that ALTER TABLE, REINDEX, or CLUSTER
would be the most likely candidates for taking exclusive table locks.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#14Jeff Janes
jeff.janes@gmail.com
In reply to: Tom Lane (#13)
Re: reducing number of ANDs speeds up query RESOLVED

On Tue, Jan 15, 2013 at 7:36 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

"T. E. Lawrence" <t.e.lawrence@icloud.com> <CAMkU=1y6UuxPYbf_ky8DVDsJi=g=uQ1t0B6kwLEtdc7NLxB_-Q@mail.gmail.com> writes:

On 15.01.2013, at 05:45, Jeff Janes <jeff.janes@gmail.com> wrote:

Is the autovacuum 100% reliable in relation to VACUUM ANALYZE?

No. For example, if you constantly do things that need an access exclusive lock, then autovac will keep getting interrupted and never finish.

I see.

So, apparently, we need to interrupt the heavy imports on some reasonable intervals and do manual VACUUM ANALYZE?

Data import as such, no matter how "heavy", shouldn't be a problem.
The question is what are you doing that takes access-exclusive table
locks frequently, and do you really need to do that?

A quick look at the docs suggests that ALTER TABLE, REINDEX, or CLUSTER
would be the most likely candidates for taking exclusive table locks.

But that isn't an exhaustive list--weaker locks will also cancel
autovacuum, for example I think the SHARE lock taken by CREATE INDEX
will and the even weaker one taken by CREATE INDEX CONCURRENTLY will
too.

But will all of those cancel auto-analyze as well as auto-vac? I
guess they will because they use the same lock level.

T.E., Fortunately in point releases from August 2012 (9.0.9, 9.1.5,
etc.), the default server log settings will log both the cancel and
the command triggering the cancel. So if you are running an up to
date server, you can just look in the logs to see what is happening.

Cheers,

Jeff

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#15T. E. Lawrence
t.e.lawrence@icloud.com
In reply to: Tom Lane (#13)
Re: reducing number of ANDs speeds up query RESOLVED

On 15.01.2013, at 16:36, Tom Lane <tgl@sss.pgh.pa.us> wrote:

"T. E. Lawrence" <t.e.lawrence@icloud.com> <CAMkU=1y6UuxPYbf_ky8DVDsJi=g=uQ1t0B6kwLEtdc7NLxB_-

So, apparently, we need to interrupt the heavy imports on some reasonable intervals and do manual VACUUM ANALYZE?

Data import as such, no matter how "heavy", shouldn't be a problem.
The question is what are you doing that takes access-exclusive table
locks frequently, and do you really need to do that?

A quick look at the docs suggests that ALTER TABLE, REINDEX, or CLUSTER
would be the most likely candidates for taking exclusive table locks.

regards, tom lane

Thank you for this.

We will have to research into this, it will take a while.

T.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#16T. E. Lawrence
t.e.lawrence@icloud.com
In reply to: Jeff Janes (#14)
Re: reducing number of ANDs speeds up query RESOLVED

On 15.01.2013, at 17:32, Jeff Janes <jeff.janes@gmail.com> wrote:

T.E., Fortunately in point releases from August 2012 (9.0.9, 9.1.5,
etc.), the default server log settings will log both the cancel and
the command triggering the cancel. So if you are running an up to
date server, you can just look in the logs to see what is happening.

Cheers,

Jeff

That's interesting, I'll check it. Thank you.
T.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#17Eduardo Morras
emorrasg@yahoo.es
In reply to: T. E. Lawrence (#16)
Re: reducing number of ANDs speeds up query RESOLVED

On Wed, 16 Jan 2013 23:42:23 +0100
"T. E. Lawrence" <t.e.lawrence@icloud.com> wrote:

On 15.01.2013, at 17:32, Jeff Janes <jeff.janes@gmail.com> wrote:

T.E., Fortunately in point releases from August 2012 (9.0.9, 9.1.5,
etc.), the default server log settings will log both the cancel and
the command triggering the cancel. So if you are running an up to
date server, you can just look in the logs to see what is happening.

Cheers,

Jeff

That's interesting, I'll check it. Thank you.

And now the million dollars question, do you have any transaction in 'IDLE IN TRANSACTION' state?

If yes, there's one possible problem. For example, we used (note the past) a message queue middleware which uses a postgres db for message passing, but it keeps the transaction in 'IDLE IN TRANSACTION' state, so if i do a select * from message_tbl i get 1-5 rows, but the tbl used (note again the past) several Gb of hd space because autovacuum couldn't clean. Of course once discovered the problem the middleware was changed by another (home-built) one which don't keeps the IDLE IN TRANSACTION state.

T.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

---   ---
Eduardo Morras <emorrasg@yahoo.es>

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general