index not used in joins
Hello all,
i have a problem with index usage and joins.
Attached is some SQL demonstrating my problem;
Why is the index only used in the 2nd query?
Can anybody explain me how to avoid/fix this.
Thanks in advance
Sebastian
Attachments:
test.sqltext/plain; name=test.sqlDownload
Sebastian B�ck wrote:
Richard Huxton wrote:
Can you post the output from your "explain analyse" calls too? The
statistics aren't going to be the same on different machines.Sure, here it is.
Thanks. (PS - remember to cc the list too).
EXPLAIN ANALYZE SELECT * FROM v;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------Nested Loop (cost=0.00..263.12 rows=116 width=20) (actual
time=5.171..109.910 rows=1020 loops=1)
Join Filter: (("inner"."version" = "outer"."version") OR
("inner".approved IS NOT NULL))
-> Seq Scan on users u (cost=0.00..1.01 rows=1 width=12) (actual
time=0.005..0.009 rows=1 loops=1)
-> Index Scan using test_ on test t (cost=0.00..155.74 rows=7092
width=20) (actual time=0.012..64.873 rows=21000 loops=1)
Index Cond: (t.datum <= "outer".datum)
Total runtime: 111.879 ms
EXPLAIN ANALYZE SELECT t.* FROM test AS t INNER JOIN users AS u ON
t.datum <= u.datum AND (t.version = '999' OR t.approved IS NOT NULL);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------Nested Loop (cost=0.00..7.78 rows=133 width=20) (actual
time=0.035..7.733 rows=1020 loops=1)
-> Seq Scan on users u (cost=0.00..1.01 rows=1 width=8) (actual
time=0.006..0.010 rows=1 loops=1)
-> Index Scan using test_999 on test t (cost=0.00..5.11 rows=132
width=20) (actual time=0.017..3.358 rows=1020 loops=1)
Index Cond: (t.datum <= "outer".datum)
Filter: (("version" = 999) OR (approved IS NOT NULL))
Total runtime: 9.528 ms
OK - so what you want to know is why index "test_999" is used in the
second but not the first, even though both return the same rows.
The fact is that the conditional index:
CREATE INDEX test_999 ON test (datum)
WHERE version = '999' OR approved IS NOT NULL;
AFAIK looks at the WHERE clause of your query to determine where it can
run. Don't forget that the planner needs to pick which index is best
*before* it starts fetching data.
So - in the first example there might be rows where e.g. t.version=998
which means test_999 would be a poor choice of index.
--
Richard Huxton
Archonet Ltd
Import Notes
Reply to msg id not found: 41BD713F.6060309@freenet.de
Richard Huxton wrote:
Sebastian Böck wrote:
Richard Huxton wrote:
Can you post the output from your "explain analyse" calls too? The
statistics aren't going to be the same on different machines.Sure, here it is.
Thanks. (PS - remember to cc the list too).
[output of EXPLAIN ANALYZE]
OK - so what you want to know is why index "test_999" is used in the
second but not the first, even though both return the same rows.The fact is that the conditional index:
CREATE INDEX test_999 ON test (datum)
WHERE version = '999' OR approved IS NOT NULL;
AFAIK looks at the WHERE clause of your query to determine where it can
run. Don't forget that the planner needs to pick which index is best
*before* it starts fetching data.So - in the first example there might be rows where e.g. t.version=998
which means test_999 would be a poor choice of index.
But what if the table users contains only 1 row and the column "version"
has a value of "999"?
Are there any other options to speed up this kind of query?
Thanks so far
Sebastian
Sebastian B�ck wrote:
Richard Huxton wrote:
Sebastian B�ck wrote:
Richard Huxton wrote:
Can you post the output from your "explain analyse" calls too? The
statistics aren't going to be the same on different machines.Sure, here it is.
Thanks. (PS - remember to cc the list too).
[output of EXPLAIN ANALYZE]
OK - so what you want to know is why index "test_999" is used in the
second but not the first, even though both return the same rows.The fact is that the conditional index:
CREATE INDEX test_999 ON test (datum)
WHERE version = '999' OR approved IS NOT NULL;
AFAIK looks at the WHERE clause of your query to determine where it
can run. Don't forget that the planner needs to pick which index is
best *before* it starts fetching data.So - in the first example there might be rows where e.g. t.version=998
which means test_999 would be a poor choice of index.But what if the table users contains only 1 row and the column "version"
has a value of "999"?
It still doesn't know that the only value in "version" is 999(*). Let's
say there were 2000 rows and 1900 had the value 999 - the index is still
useless because we'd have to do a sequential scan to check the remaining
200 rows.
Are there any other options to speed up this kind of query?
Well, your problem is the (version=X OR approved IS NOT NULL) clause. I
must admit I can't quite see what this is supposed to do. The "test"
table connects to the "users" table via "version" (and "datum", though
not a simple check) unless the "test" has been "approved", in which case
it applies to all users?
Can you explain what the various tables/columns are really for?
(*) Don't forget the statistics for column values are usually
out-of-date compared to the actual data, so you can't rely on it.
--
Richard Huxton
Archonet Ltd
Richard Huxton wrote:
Sebastian Böck wrote:
Richard Huxton wrote:
Sebastian Böck wrote:
Richard Huxton wrote:
Can you post the output from your "explain analyse" calls too? The
statistics aren't going to be the same on different machines.Sure, here it is.
Thanks. (PS - remember to cc the list too).
[output of EXPLAIN ANALYZE]
OK - so what you want to know is why index "test_999" is used in the
second but not the first, even though both return the same rows.The fact is that the conditional index:
CREATE INDEX test_999 ON test (datum)
WHERE version = '999' OR approved IS NOT NULL;
AFAIK looks at the WHERE clause of your query to determine where it
can run. Don't forget that the planner needs to pick which index is
best *before* it starts fetching data.So - in the first example there might be rows where e.g.
t.version=998 which means test_999 would be a poor choice of index.But what if the table users contains only 1 row and the column "version"
has a value of "999"?It still doesn't know that the only value in "version" is 999(*). Let's
say there were 2000 rows and 1900 had the value 999 - the index is still
useless because we'd have to do a sequential scan to check the remaining
200 rows.Are there any other options to speed up this kind of query?
Well, your problem is the (version=X OR approved IS NOT NULL) clause. I
must admit I can't quite see what this is supposed to do. The "test"
table connects to the "users" table via "version" (and "datum", though
not a simple check) unless the "test" has been "approved", in which case
it applies to all users?
Can you explain what the various tables/columns are really for?
The whole thing is a multiuser facility managment application.
Every user can plan things like he wants (different versions).
All these changes apply to a common (approved) version.
Things get complicated as everybody should be able to "travel"
through the history via the "datum" field.
That's why i need this "silly OR" in my where-clause.
At the moment i get very exciting results using immutable
functions, but i have another question.
In the docs it is stated that:
IMMUTABLE indicates that the function always returns the same
result when given the same argument values;
What if i define my functions like:
CREATE OR REPLACE FUNCTION datum () RETURNS TIMESTAMP AS '
SELECT datum FROM public.benutzer;
' LANGUAGE sql IMMUTABLE;
They normally (untill now) give the correct results,
also if the values in the underlaying view changes.
Can i relay on this or is it only luck.
(*) Don't forget the statistics for column values are usually
out-of-date compared to the actual data, so you can't rely on it.
I'm aware of that.
Thanks
Sebastian