Problems with non use of indexes

Started by Tyler Durdenabout 14 years ago8 messagesgeneral
Jump to latest
#1Tyler Durden
tylersticky@gmail.com

Hi,
I can't figure out why query planner doesn't use the proper index, anyone
can help me?

This query properly uses indexes:

mydb=# EXPLAIN SELECT U0."object_id" FROM "activity_follow" U0 WHERE
(U0."content_type_id" = 3 AND U0."user_id" = 1);

QUERY PLAN
--------------------------------------------------------------------------------------------------------
Index Scan using activity_follow_user_id on activity_follow u0
(cost=0.00..4875.15 rows=4898 width=4)
Index Cond: (user_id = 1)
Filter: (content_type_id = 3)
(3 rows)

But the same query on a "IN" statement doesn't. The query planner uses Seq
Scan on *U0."user_id" = 1*

mydb=# EXPLAIN SELECT "activity_action"."id", "activity_action"."actor_id",
"activity_action"."verb", "activity_action"."action_content_type_id",
"activity_action"."action_object_id",
"activity_action"."target_content_type_id",
"activity_action"."target_object_id", "activity_action"."public",
"activity_action"."created", "auth_user"."id", "auth_user"."username",
"auth_user"."first_name", "auth_user"."last_name", "auth_user"."email",
"auth_user"."password", "auth_user"."is_staff", "auth_user"."is_active",
"auth_user"."is_superuser", "auth_user"."last_login",
"auth_user"."date_joined" FROM "activity_action" INNER JOIN "auth_user" ON
("activity_action"."actor_id" = "auth_user"."id") WHERE
"activity_action"."actor_id" IN (SELECT U0."object_id" FROM
"activity_follow" U0 WHERE (U0."content_type_id" = 3 AND *U0."user_id"
= 1*)) ORDER BY "activity_action"."created" DESC LIMIT 100;

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=9206.97..9207.22 rows=100 width=155)
-> Sort (cost=9206.97..9320.34 rows=45347 width=155)
Sort Key: activity_action.created
-> Hash Join (cost=5447.39..7473.84 rows=45347 width=155)
Hash Cond: (activity_action.actor_id = auth_user.id)
-> Nested Loop (cost=4887.39..5020.58 rows=45347 width=55)
-> HashAggregate (cost=4887.39..4887.41 rows=2
width=4)
-> Index Scan using activity_follow_user_id on
activity_follow u0 (cost=0.00..4875.15 rows=4898 width=4)
Index Cond: (user_id = 1)
Filter: (content_type_id = 3)
-> Index Scan using activity_action_actor_id on
activity_action (cost=0.00..65.20 rows=111 width=51)
Index Cond: (activity_action.actor_id =
u0.object_id)
-> Hash (cost=278.00..278.00 rows=10000 width=104)
-> Seq Scan on auth_user (cost=0.00..278.00
rows=10000 width=104)

If I do a SET enable_seqscan TO 'off'; It uses the index but is also slow.

---

*TABLES:*

CREATE TABLE "auth_user" (
"id" serial NOT NULL PRIMARY KEY,
"username" varchar(30) NOT NULL UNIQUE,
"first_name" varchar(30) NOT NULL,
"last_name" varchar(30) NOT NULL,
"email" varchar(75) NOT NULL,
"password" varchar(128) NOT NULL,
"is_staff" boolean NOT NULL,
"is_active" boolean NOT NULL,
"is_superuser" boolean NOT NULL,
"last_login" timestamp with time zone NOT NULL,
"date_joined" timestamp with time zone NOT NULL
);

CREATE TABLE "activity_follow" (
"id" serial NOT NULL PRIMARY KEY,
"user_id" integer NOT NULL REFERENCES "auth_user" ("id") DEFERRABLE
INITIALLY DEFERRED,
"content_type_id" integer NOT NULL REFERENCES "django_content_type"
("id") DEFERRABLE INITIALLY DEFERRED,
"object_id" integer CHECK ("object_id" >= 0),
UNIQUE ("user_id", "content_type_id", "object_id")
);

CREATE TABLE "activity_action" (
"id" serial NOT NULL PRIMARY KEY,
"actor_id" integer NOT NULL REFERENCES "auth_user" ("id") DEFERRABLE
INITIALLY DEFERRED,
"verb" varchar(50) NOT NULL,
"action_content_type_id" integer REFERENCES "django_content_type"
("id") DEFERRABLE INITIALLY DEFERRED,
"action_object_id" integer CHECK ("action_object_id" >= 0),
"target_content_type_id" integer REFERENCES "django_content_type"
("id") DEFERRABLE INITIALLY DEFERRED,
"target_object_id" integer CHECK ("target_object_id" >= 0),
"public" boolean NOT NULL,
"created" timestamp with time zone NOT NULL
);

CREATE INDEX "activity_follow_user_id" ON "activity_follow" ("user_id");
CREATE INDEX "activity_follow_content_type_id" ON "activity_follow"
("content_type_id");
CREATE INDEX "activity_follow_object_id" ON "activity_follow" ("object_id");
CREATE INDEX "activity_action_actor_id" ON "activity_action" ("actor_id");
CREATE INDEX "activity_action_action_content_type_id" ON "activity_action"
("action_content_type_id");
CREATE INDEX "activity_action_action_object_id" ON "activity_action"
("action_object_id");
CREATE INDEX "activity_action_target_content_type_id" ON "activity_action"
("target_content_type_id");
CREATE INDEX "activity_action_target_object_id" ON "activity_action"
("target_object_id");

---

mydb=# SELECT COUNT(1) FROM activity_action;
count
---------
1104800
(1 row)

mydb=# SELECT COUNT(1) FROM activity_follow;
count
---------
1104800
(1 row)

mydb=# SELECT COUNT(1) FROM auth_user;
count
-------
10000
(1 row)

#2Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tyler Durden (#1)
Re: Problems with non use of indexes

Hi,

On 2 Březen 2012, 13:12, Tyler Durden wrote:

Hi,
I can't figure out why query planner doesn't use the proper index, anyone
can help me?

This query properly uses indexes:

mydb=# EXPLAIN SELECT U0."object_id" FROM "activity_follow" U0 WHERE
(U0."content_type_id" = 3 AND U0."user_id" = 1);

QUERY PLAN
--------------------------------------------------------------------------------------------------------
Index Scan using activity_follow_user_id on activity_follow u0
(cost=0.00..4875.15 rows=4898 width=4)
Index Cond: (user_id = 1)
Filter: (content_type_id = 3)
(3 rows)

But the same query on a "IN" statement doesn't. The query planner uses Seq
Scan on *U0."user_id" = 1*

mydb=# EXPLAIN SELECT "activity_action"."id",
"activity_action"."actor_id",
"activity_action"."verb", "activity_action"."action_content_type_id",
"activity_action"."action_object_id",
"activity_action"."target_content_type_id",
"activity_action"."target_object_id", "activity_action"."public",
"activity_action"."created", "auth_user"."id", "auth_user"."username",
"auth_user"."first_name", "auth_user"."last_name", "auth_user"."email",
"auth_user"."password", "auth_user"."is_staff", "auth_user"."is_active",
"auth_user"."is_superuser", "auth_user"."last_login",
"auth_user"."date_joined" FROM "activity_action" INNER JOIN "auth_user" ON
("activity_action"."actor_id" = "auth_user"."id") WHERE
"activity_action"."actor_id" IN (SELECT U0."object_id" FROM
"activity_follow" U0 WHERE (U0."content_type_id" = 3 AND *U0."user_id"
= 1*)) ORDER BY "activity_action"."created" DESC LIMIT 100;

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=9206.97..9207.22 rows=100 width=155)
-> Sort (cost=9206.97..9320.34 rows=45347 width=155)
Sort Key: activity_action.created
-> Hash Join (cost=5447.39..7473.84 rows=45347 width=155)
Hash Cond: (activity_action.actor_id = auth_user.id)
-> Nested Loop (cost=4887.39..5020.58 rows=45347
width=55)
-> HashAggregate (cost=4887.39..4887.41 rows=2
width=4)
-> Index Scan using activity_follow_user_id on
activity_follow u0 (cost=0.00..4875.15 rows=4898 width=4)
Index Cond: (user_id = 1)
Filter: (content_type_id = 3)
-> Index Scan using activity_action_actor_id on
activity_action (cost=0.00..65.20 rows=111 width=51)
Index Cond: (activity_action.actor_id =
u0.object_id)
-> Hash (cost=278.00..278.00 rows=10000 width=104)
-> Seq Scan on auth_user (cost=0.00..278.00
rows=10000 width=104)

If I do a SET enable_seqscan TO 'off'; It uses the index but is also slow.

Errr, what? The only sequential scan in that explain output is on
auth_user, not activity_follow which is the table referenced in the
original query. It actually uses index scan to read activity_follow

-> Index Scan using activity_follow_user_id on
activity_follow u0 (cost=0.00..4875.15 rows=4898
width=4)
Index Cond: (user_id = 1)
Filter: (content_type_id = 3)

kind regards
Tomas

#3Tyler Durden
tylersticky@gmail.com
In reply to: Tomas Vondra (#2)
Re: Problems with non use of indexes

Hi,

On Fri, Mar 2, 2012 at 12:23 PM, Tomas Vondra <tv@fuzzy.cz> wrote:

Hi,

On 2 Březen 2012, 13:12, Tyler Durden wrote:

Hi,
I can't figure out why query planner doesn't use the proper index, anyone
can help me?

This query properly uses indexes:

mydb=# EXPLAIN SELECT U0."object_id" FROM "activity_follow" U0 WHERE
(U0."content_type_id" = 3 AND U0."user_id" = 1);

QUERY PLAN

--------------------------------------------------------------------------------------------------------

Index Scan using activity_follow_user_id on activity_follow u0
(cost=0.00..4875.15 rows=4898 width=4)
Index Cond: (user_id = 1)
Filter: (content_type_id = 3)
(3 rows)

But the same query on a "IN" statement doesn't. The query planner uses

Seq

Scan on *U0."user_id" = 1*

mydb=# EXPLAIN SELECT "activity_action"."id",
"activity_action"."actor_id",
"activity_action"."verb", "activity_action"."action_content_type_id",
"activity_action"."action_object_id",
"activity_action"."target_content_type_id",
"activity_action"."target_object_id", "activity_action"."public",
"activity_action"."created", "auth_user"."id", "auth_user"."username",
"auth_user"."first_name", "auth_user"."last_name", "auth_user"."email",
"auth_user"."password", "auth_user"."is_staff", "auth_user"."is_active",
"auth_user"."is_superuser", "auth_user"."last_login",
"auth_user"."date_joined" FROM "activity_action" INNER JOIN "auth_user"

ON

("activity_action"."actor_id" = "auth_user"."id") WHERE
"activity_action"."actor_id" IN (SELECT U0."object_id" FROM
"activity_follow" U0 WHERE (U0."content_type_id" = 3 AND *U0."user_id"
= 1*)) ORDER BY "activity_action"."created" DESC LIMIT 100;

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------

Limit (cost=9206.97..9207.22 rows=100 width=155)
-> Sort (cost=9206.97..9320.34 rows=45347 width=155)
Sort Key: activity_action.created
-> Hash Join (cost=5447.39..7473.84 rows=45347 width=155)
Hash Cond: (activity_action.actor_id = auth_user.id)
-> Nested Loop (cost=4887.39..5020.58 rows=45347
width=55)
-> HashAggregate (cost=4887.39..4887.41 rows=2
width=4)
-> Index Scan using activity_follow_user_id

on

activity_follow u0 (cost=0.00..4875.15 rows=4898 width=4)
Index Cond: (user_id = 1)
Filter: (content_type_id = 3)
-> Index Scan using activity_action_actor_id on
activity_action (cost=0.00..65.20 rows=111 width=51)
Index Cond: (activity_action.actor_id =
u0.object_id)
-> Hash (cost=278.00..278.00 rows=10000 width=104)
-> Seq Scan on auth_user (cost=0.00..278.00
rows=10000 width=104)

If I do a SET enable_seqscan TO 'off'; It uses the index but is also

slow.

Errr, what? The only sequential scan in that explain output is on
auth_user, not activity_follow which is the table referenced in the
original query. It actually uses index scan to read activity_follow

-> Index Scan using activity_follow_user_id on
activity_follow u0 (cost=0.00..4875.15 rows=4898
width=4)
Index Cond: (user_id = 1)
Filter: (content_type_id = 3)

kind regards
Tomas

Yes, but if I remove *U0."user_id" = 1 *will use the index:

EXPLAIN SELECT "activity_action"."id", "activity_action"."actor_id",
"activity_action"."verb", "activity_action"."action_content_type_id",
"activity_action"."action_object_id",
"activity_action"."target_content_type_id",
"activity_action"."target_object_id", "activity_action"."public",
"activity_action"."created", "auth_user"."id", "auth_user"."username",
"auth_user"."first_name", "auth_user"."last_name", "auth_user"."email",
"auth_user"."password", "auth_user"."is_staff", "auth_user"."is_active",
"auth_user"."is_superuser", "auth_user"."last_login",
"auth_user"."date_joined" FROM "activity_action" INNER JOIN "auth_user" ON
("activity_action"."actor_id" = "auth_user"."id") WHERE
"activity_action"."actor_id" IN (SELECT U0."object_id" FROM
"activity_follow" U0 WHERE ( U0."content_type_id" = 3 )) ORDER BY
"activity_action"."created" DESC LIMIT 100;
QUERY
PLAN
----------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..4502.18 rows=100 width=155)
-> Nested Loop (cost=0.00..2041605.23 rows=45347 width=155)
-> Nested Loop Semi Join (cost=0.00..1907985.65 rows=45347
width=55)
-> Index Scan using activity_action_created on
activity_action (cost=0.00..40093.37 rows=1104800 width=51)
-> Index Scan using activity_follow_object_id on
activity_follow u0 (cost=0.00..5519.13 rows=3328 width=4)
Index Cond: (u0.object_id = activity_action.actor_id)
Filter: (u0.content_type_id = 3)
-> Index Scan using auth_user_pkey on auth_user (cost=0.00..2.93
rows=1 width=104)
Index Cond: (auth_user.id = activity_action.actor_id)

#4Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tyler Durden (#3)
Re: Problems with non use of indexes

On 2 Březen 2012, 13:45, Tyler Durden wrote:

Yes, but if I remove *U0."user_id" = 1 *will use the index:

Which PostgreSQL version is that? Post EXPLAIN ANALYZE output for all
three queries, not just EXPLAIN. And use explain.depesz.com if possible,
it's much more readable.

kind regards
Tomas

#5Tyler Durden
tylersticky@gmail.com
In reply to: Tomas Vondra (#4)
Re: Problems with non use of indexes

On Fri, Mar 2, 2012 at 12:55 PM, Tomas Vondra <tv@fuzzy.cz> wrote:

On 2 Březen 2012, 13:45, Tyler Durden wrote:

Yes, but if I remove *U0."user_id" = 1 *will use the index:

Which PostgreSQL version is that? Post EXPLAIN ANALYZE output for all
three queries, not just EXPLAIN. And use explain.depesz.com if possible,
it's much more readable.

kind regards
Tomas

I'm using PostgreSQL 8.4.2 and you can find the EXPLAIN ANALYSE VERBOSE in
http://explain.depesz.com/s/hk2

Thanks!

#6Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tyler Durden (#5)
Re: Problems with non use of indexes

On 2 Březen 2012, 14:34, Tyler Durden wrote:

On Fri, Mar 2, 2012 at 12:55 PM, Tomas Vondra <tv@fuzzy.cz> wrote:

On 2 Březen 2012, 13:45, Tyler Durden wrote:

Yes, but if I remove *U0."user_id" = 1 *will use the index:

Which PostgreSQL version is that? Post EXPLAIN ANALYZE output for all
three queries, not just EXPLAIN. And use explain.depesz.com if possible,
it's much more readable.

kind regards
Tomas

I'm using PostgreSQL 8.4.2 and you can find the EXPLAIN ANALYSE VERBOSE in
http://explain.depesz.com/s/hk2

For all three queries, please.

Tomas

#7Tyler Durden
tylersticky@gmail.com
In reply to: Tomas Vondra (#6)
Re: Problems with non use of indexes

http://explain.depesz.com/s/f92O

*EXPLAIN ANALYSE VERBOSE SELECT "activity_action"."id",
"activity_action"."actor_id", "activity_action"."verb",
"activity_action"."action_content_type_id",
"activity_action"."action_object_id",
"activity_action"."target_content_type_id",
"activity_action"."target_object_id", "activity_action"."public",
"activity_action"."created", "auth_user"."id", "auth_user"."username",
"auth_user"."first_name", "auth_user"."last_name", "auth_user"."email",
"auth_user"."password", "auth_user"."is_staff", "auth_user"."is_active",
"auth_user"."is_superuser", "auth_user"."last_login",
"auth_user"."date_joined" FROM "activity_action" INNER JOIN "auth_user" ON
("activity_action"."actor_id" = "auth_user"."id") WHERE
"activity_action"."actor_id" IN (SELECT U0."object_id" FROM
"activity_follow" U0 WHERE (U0."content_type_id" = 3 AND U0."user_id" = 1))
ORDER BY "activity_action"."created" DESC LIMIT 100;*

---

http://explain.depesz.com/s/o3w

EXPLAIN ANALYSE VERBOSE SELECT "activity_action"."id",
"activity_action"."actor_id", "activity_action"."verb",
"activity_action"."action_content_type_id",
"activity_action"."action_object_id",
"activity_action"."target_content_type_id",
"activity_action"."target_object_id", "activity_action"."public",
"activity_action"."created", "auth_user"."id", "auth_user"."username",
"auth_user"."first_name", "auth_user"."last_name", "auth_user"."email",
"auth_user"."password", "auth_user"."is_staff", "auth_user"."is_active",
"auth_user"."is_superuser", "auth_user"."last_login",
"auth_user"."date_joined" FROM "activity_action" INNER JOIN "auth_user" ON
("activity_action"."actor_id" = "auth_user"."id") WHERE
"activity_action"."actor_id" IN (SELECT U0."object_id" FROM
"activity_follow" U0 WHERE (U0."content_type_id" = 3)) ORDER BY
"activity_action"."created" DESC LIMIT 100;

---

http://explain.depesz.com/s/ccJ

EXPLAIN ANALYSE VERBOSE SELECT U0."object_id" FROM "activity_follow" U0
WHERE (U0."content_type_id" = 3 AND U0."user_id" = 1);

On Fri, Mar 2, 2012 at 2:23 PM, Tomas Vondra <tv@fuzzy.cz> wrote:

Show quoted text

On 2 Březen 2012, 14:34, Tyler Durden wrote:

On Fri, Mar 2, 2012 at 12:55 PM, Tomas Vondra <tv@fuzzy.cz> wrote:

On 2 Březen 2012, 13:45, Tyler Durden wrote:

Yes, but if I remove *U0."user_id" = 1 *will use the index:

Which PostgreSQL version is that? Post EXPLAIN ANALYZE output for all
three queries, not just EXPLAIN. And use explain.depesz.com if

possible,

it's much more readable.

kind regards
Tomas

I'm using PostgreSQL 8.4.2 and you can find the EXPLAIN ANALYSE VERBOSE

in

http://explain.depesz.com/s/hk2

For all three queries, please.

Tomas

#8Scott Marlowe
scott.marlowe@gmail.com
In reply to: Tyler Durden (#1)
Re: Problems with non use of indexes

On Fri, Mar 2, 2012 at 5:12 AM, Tyler Durden <tylersticky@gmail.com> wrote:

Hi,
I can't figure out why query planner doesn't use the proper index, anyone
can help me?

This query properly uses indexes:

mydb=# EXPLAIN SELECT U0."object_id" FROM "activity_follow" U0 WHERE
(U0."content_type_id" = 3 AND U0."user_id" = 1);

Query plan: http://explain.depesz.com/s/ccJ
No order by in the above. Order by in the below:

mydb=# EXPLAIN SELECT "activity_action"."id", "activity_action"."actor_id",
"activity_action"."verb", "activity_action"."action_content_type_id",
"activity_action"."action_object_id",
"activity_action"."target_content_type_id",
"activity_action"."target_object_id", "activity_action"."public",
"activity_action"."created", "auth_user"."id", "auth_user"."username",
"auth_user"."first_name", "auth_user"."last_name", "auth_user"."email",
"auth_user"."password", "auth_user"."is_staff", "auth_user"."is_active",
"auth_user"."is_superuser", "auth_user"."last_login",
"auth_user"."date_joined" FROM "activity_action" INNER JOIN "auth_user" ON
("activity_action"."actor_id" = "auth_user"."id") WHERE
"activity_action"."actor_id" IN (SELECT U0."object_id" FROM
"activity_follow" U0 WHERE (U0."content_type_id" = 3 AND U0."user_id" = 1 ))
ORDER BY "activity_action"."created" DESC LIMIT 100;

query plan: http://explain.depesz.com/s/f92O

What happens if you drop the order by on it? Just for comparison.
I'm guessing that needing to sort is where the cost is coming from.