index not used in joins

Started by Sebastian Böckover 21 years ago5 messagesgeneral

Jump to latest

Sebastian Böck

sebastianboeck@freenet.de

over 21 years ago

Hello all,

i have a problem with index usage and joins.
Attached is some SQL demonstrating my problem;

Why is the index only used in the 2nd query?

Can anybody explain me how to avoid/fix this.

Thanks in advance

Sebastian

Richard Huxton

dev@archonet.com

over 21 years ago

In reply to: Sebastian Böck (#1)

Re: index not used in joins

Sebastian Bï¿½ck wrote:

Richard Huxton wrote:

Can you post the output from your "explain analyse" calls too? The
statistics aren't going to be the same on different machines.

Sure, here it is.

Thanks. (PS - remember to cc the list too).

EXPLAIN ANALYZE SELECT * FROM v;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------

Nested Loop (cost=0.00..263.12 rows=116 width=20) (actual
time=5.171..109.910 rows=1020 loops=1)
Join Filter: (("inner"."version" = "outer"."version") OR
("inner".approved IS NOT NULL))
-> Seq Scan on users u (cost=0.00..1.01 rows=1 width=12) (actual
time=0.005..0.009 rows=1 loops=1)
-> Index Scan using test_ on test t (cost=0.00..155.74 rows=7092
width=20) (actual time=0.012..64.873 rows=21000 loops=1)
Index Cond: (t.datum <= "outer".datum)
Total runtime: 111.879 ms

EXPLAIN ANALYZE SELECT t.* FROM test AS t INNER JOIN users AS u ON
t.datum <= u.datum AND (t.version = '999' OR t.approved IS NOT NULL);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------

Nested Loop (cost=0.00..7.78 rows=133 width=20) (actual
time=0.035..7.733 rows=1020 loops=1)
-> Seq Scan on users u (cost=0.00..1.01 rows=1 width=8) (actual
time=0.006..0.010 rows=1 loops=1)
-> Index Scan using test_999 on test t (cost=0.00..5.11 rows=132
width=20) (actual time=0.017..3.358 rows=1020 loops=1)
Index Cond: (t.datum <= "outer".datum)
Filter: (("version" = 999) OR (approved IS NOT NULL))
Total runtime: 9.528 ms

OK - so what you want to know is why index "test_999" is used in the
second but not the first, even though both return the same rows.

The fact is that the conditional index:
CREATE INDEX test_999 ON test (datum)
WHERE version = '999' OR approved IS NOT NULL;
AFAIK looks at the WHERE clause of your query to determine where it can
run. Don't forget that the planner needs to pick which index is best
*before* it starts fetching data.

So - in the first example there might be rows where e.g. t.version=998
which means test_999 would be a poor choice of index.

--
Richard Huxton
Archonet Ltd

Import Notes

Reply to msg id not found: 41BD713F.6060309@freenet.de

Sebastian Böck

sebastianboeck@freenet.de

over 21 years ago

In reply to: Richard Huxton (#2)

Re: index not used in joins

Richard Huxton wrote:

Sebastian Böck wrote:

Richard Huxton wrote:

Can you post the output from your "explain analyse" calls too? The
statistics aren't going to be the same on different machines.

Sure, here it is.

Thanks. (PS - remember to cc the list too).

[output of EXPLAIN ANALYZE]

OK - so what you want to know is why index "test_999" is used in the
second but not the first, even though both return the same rows.

The fact is that the conditional index:
CREATE INDEX test_999 ON test (datum)
WHERE version = '999' OR approved IS NOT NULL;
AFAIK looks at the WHERE clause of your query to determine where it can
run. Don't forget that the planner needs to pick which index is best
*before* it starts fetching data.

So - in the first example there might be rows where e.g. t.version=998
which means test_999 would be a poor choice of index.

But what if the table users contains only 1 row and the column "version"
has a value of "999"?

Are there any other options to speed up this kind of query?

Thanks so far

Sebastian

Richard Huxton

dev@archonet.com

over 21 years ago

In reply to: Sebastian Böck (#3)

Re: index not used in joins

Sebastian Bï¿½ck wrote:

Richard Huxton wrote:

Sebastian Bï¿½ck wrote:

Richard Huxton wrote:

Can you post the output from your "explain analyse" calls too? The
statistics aren't going to be the same on different machines.

Sure, here it is.

Thanks. (PS - remember to cc the list too).

[output of EXPLAIN ANALYZE]

OK - so what you want to know is why index "test_999" is used in the
second but not the first, even though both return the same rows.

The fact is that the conditional index:
CREATE INDEX test_999 ON test (datum)
WHERE version = '999' OR approved IS NOT NULL;
AFAIK looks at the WHERE clause of your query to determine where it
can run. Don't forget that the planner needs to pick which index is
best *before* it starts fetching data.

So - in the first example there might be rows where e.g. t.version=998
which means test_999 would be a poor choice of index.

But what if the table users contains only 1 row and the column "version"
has a value of "999"?

It still doesn't know that the only value in "version" is 999(*). Let's
say there were 2000 rows and 1900 had the value 999 - the index is still
useless because we'd have to do a sequential scan to check the remaining
200 rows.

Are there any other options to speed up this kind of query?

Well, your problem is the (version=X OR approved IS NOT NULL) clause. I
must admit I can't quite see what this is supposed to do. The "test"
table connects to the "users" table via "version" (and "datum", though
not a simple check) unless the "test" has been "approved", in which case
it applies to all users?
Can you explain what the various tables/columns are really for?

(*) Don't forget the statistics for column values are usually
out-of-date compared to the actual data, so you can't rely on it.

--
Richard Huxton
Archonet Ltd

Sebastian Böck

sebastianboeck@freenet.de

over 21 years ago

In reply to: Richard Huxton (#4)

Re: index not used in joins

Richard Huxton wrote:

Sebastian Böck wrote:

Richard Huxton wrote:

Sebastian Böck wrote:

Richard Huxton wrote:

Can you post the output from your "explain analyse" calls too? The
statistics aren't going to be the same on different machines.

Sure, here it is.

Thanks. (PS - remember to cc the list too).

[output of EXPLAIN ANALYZE]

OK - so what you want to know is why index "test_999" is used in the
second but not the first, even though both return the same rows.

The fact is that the conditional index:
CREATE INDEX test_999 ON test (datum)
WHERE version = '999' OR approved IS NOT NULL;
AFAIK looks at the WHERE clause of your query to determine where it
can run. Don't forget that the planner needs to pick which index is
best *before* it starts fetching data.

So - in the first example there might be rows where e.g.
t.version=998 which means test_999 would be a poor choice of index.

But what if the table users contains only 1 row and the column "version"
has a value of "999"?

It still doesn't know that the only value in "version" is 999(*). Let's
say there were 2000 rows and 1900 had the value 999 - the index is still
useless because we'd have to do a sequential scan to check the remaining
200 rows.

Are there any other options to speed up this kind of query?

Well, your problem is the (version=X OR approved IS NOT NULL) clause. I
must admit I can't quite see what this is supposed to do. The "test"
table connects to the "users" table via "version" (and "datum", though
not a simple check) unless the "test" has been "approved", in which case
it applies to all users?
Can you explain what the various tables/columns are really for?

The whole thing is a multiuser facility managment application.
Every user can plan things like he wants (different versions).
All these changes apply to a common (approved) version.
Things get complicated as everybody should be able to "travel"
through the history via the "datum" field.

That's why i need this "silly OR" in my where-clause.

At the moment i get very exciting results using immutable
functions, but i have another question.

In the docs it is stated that:

IMMUTABLE indicates that the function always returns the same
result when given the same argument values;

What if i define my functions like:

CREATE OR REPLACE FUNCTION datum () RETURNS TIMESTAMP AS '
SELECT datum FROM public.benutzer;
' LANGUAGE sql IMMUTABLE;

They normally (untill now) give the correct results,
also if the values in the underlaying view changes.

Can i relay on this or is it only luck.

(*) Don't forget the statistics for column values are usually
out-of-date compared to the actual data, so you can't rely on it.

I'm aware of that.

Thanks

Sebastian

index not used in joins

Attachments: