planner question re index vs seqscan

Started by Ed Loehrover 25 years ago6 messages
#1Ed Loehr
eloehr@austin.rr.com

I have 2 tables with indices as follows:

Table "activity" (~4000 rows)
id serial
start_time timestamp not null
stop_time timestamp not null
...

CREATE INDEX activity_start_time ON activity (start_time)
CREATE INDEX activity_stop_time ON activity (stop_time)

Table "activity_hr_need" (~2000 rows)
id serial
activity_id integer not null
hr_type_id integer not null
hr_count integer not null
...

CREATE UNIQUE INDEX activity_hr_need_pkey
ON activity_hr_need (activity_id, hr_type_id)
CREATE INDEX activity_hr_need_hrtid
ON activity_hr_need (hr_type_id)
CREATE INDEX activity_hr_need_aid
ON activity_hr_need (activity_id int4_ops)

QUESTION: Why doesn't the planner, just after 'vacuum analyze', use the
provided indices for this query? How can I tweak it to use the indices?

sdb=# EXPLAIN SELECT ahrn.hr_type_id AS "Resource Type",
sdb-# SUM(ahrn.hr_count) AS "Planned Consulting Days"
sdb-# FROM activity a, activity_hr_need ahrn
sdb-# WHERE a.start_time::date >= '1-Jun-2000'::date
sdb-# AND a.stop_time::date <= '1-Jul-2000'::date
sdb-# AND ahrn.activity_id = a.id
sdb-# GROUP BY "Resource Type";
NOTICE: QUERY PLAN:

Aggregate (cost=243.74..244.58 rows=17 width=16)
-> Group (cost=243.74..244.16 rows=169 width=16)
-> Sort (cost=243.74..243.74 rows=169 width=16)
-> Hash Join (cost=142.65..237.50 rows=169 width=16)
-> Seq Scan on activity_hr_need ahrn
(cost=0.00..53.58 rows=2358 width=12)
-> Hash (cost=141.60..141.60 rows=420 width=4)
-> Seq Scan on activity a (cost=0.00..141.60
rows=420 width=4)

Regards,
Ed Loehr

#2Jeff Hoffmann
jeff@propertykey.com
In reply to: Ed Loehr (#1)
Re: planner question re index vs seqscan

Ed Loehr wrote:

QUESTION: Why doesn't the planner, just after 'vacuum analyze', use the
provided indices for this query? How can I tweak it to use the indices?

sdb=# EXPLAIN SELECT ahrn.hr_type_id AS "Resource Type",
sdb-# SUM(ahrn.hr_count) AS "Planned Consulting Days"
sdb-# FROM activity a, activity_hr_need ahrn
sdb-# WHERE a.start_time::date >= '1-Jun-2000'::date
sdb-# AND a.stop_time::date <= '1-Jul-2000'::date
sdb-# AND ahrn.activity_id = a.id
sdb-# GROUP BY "Resource Type";
NOTICE: QUERY PLAN:

dump the typecasting in the query and try again. not sure if it'll
work, but it's worth a try. typecasting has an annoying effect of
disabling index scans in some cases even when you'd swear logically that
they should be used. if that doesn't help, it's possible that it just
shouldn't be using the indexes based on cost estimates. try shutting
off the sequential scan with "set enable_seqscan=off" before the query
to check if that's the case.

--

Jeff Hoffmann
PropertyKey.com

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ed Loehr (#1)
Re: planner question re index vs seqscan

Ed Loehr <eloehr@austin.rr.com> writes:

QUESTION: Why doesn't the planner, just after 'vacuum analyze', use the
provided indices for this query? How can I tweak it to use the indices?

sdb=# EXPLAIN SELECT ahrn.hr_type_id AS "Resource Type",
sdb-# SUM(ahrn.hr_count) AS "Planned Consulting Days"
sdb-# FROM activity a, activity_hr_need ahrn
sdb-# WHERE a.start_time::date >= '1-Jun-2000'::date
sdb-# AND a.stop_time::date <= '1-Jul-2000'::date
sdb-# AND ahrn.activity_id = a.id
sdb-# GROUP BY "Resource Type";

At least part of the problem is that you have two separate one-sided
inequalities, neither one of which is very selective by itself ---
and of course the planner has no idea that there might be any semantic
connection between "start_time" and "stop_time". You could help it out
by providing something it can recognize as a range restriction on one
index or the other. For example:

WHERE a.start_time::date >= '1-Jun-2000'::date
AND a.start_time::date <= '1-Jul-2000'::date
AND a.stop_time::date <= '1-Jul-2000'::date
AND ahrn.activity_id = a.id

regards, tom lane

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Hoffmann (#2)
Re: planner question re index vs seqscan

Jeff Hoffmann <jeff@propertykey.com> writes:

QUESTION: Why doesn't the planner, just after 'vacuum analyze', use the
provided indices for this query? How can I tweak it to use the indices?

sdb=# EXPLAIN SELECT ahrn.hr_type_id AS "Resource Type",
sdb-# SUM(ahrn.hr_count) AS "Planned Consulting Days"
sdb-# FROM activity a, activity_hr_need ahrn
sdb-# WHERE a.start_time::date >= '1-Jun-2000'::date
sdb-# AND a.stop_time::date <= '1-Jul-2000'::date
sdb-# AND ahrn.activity_id = a.id
sdb-# GROUP BY "Resource Type";

dump the typecasting in the query and try again. not sure if it'll
work, but it's worth a try. typecasting has an annoying effect of
disabling index scans in some cases even when you'd swear logically that
they should be used.

Oh, that's a good point --- if the start_time and stop_time columns are
not of type date then the above is guaranteed not to be indexscanable,
because what you've really written is

WHERE date(a.start_time) >= '1-Jun-2000'::date
AND date(a.stop_time) <= '1-Jul-2000'::date

It might be able to use a functional index on date(start_time) or
date(stop_time), but not a straight index on the timestamp columns.

A good rule of thumb is not to use casts unless you have no choice...

regards, tom lane

#5Ed Loehr
eloehr@austin.rr.com
In reply to: Ed Loehr (#1)
Re: planner question re index vs seqscan

Tom Lane wrote:

Jeff Hoffmann <jeff@propertykey.com> writes:

QUESTION: Why doesn't the planner, just after 'vacuum analyze', use the
provided indices for this query? How can I tweak it to use the indices?

dump the typecasting in the query and try again. not sure if it'll
work, but it's worth a try. typecasting has an annoying effect of
disabling index scans in some cases even when you'd swear logically that
they should be used.

I dropped the typecasting, but that had no visible effect. Adding the
additional predicate to the where clause as Tom suggested had the desired
effect of replacing one seqscan with an index scan. But I'm still
wondering why it is still doing a seq scan on the "ahrn.activity_id =
a.id" part when both of those integer columns are indexed??

EXPLAIN SELECT ahrn.hr_type_id AS "Resource Type",
SUM(ahrn.hr_count) AS "Planned Consulting Days"
FROM activity a, activity_hr_need ahrn
WHERE a.start_time >= '1-Jun-2000'
AND a.stop_time <= '1-Jul-2000'
AND a.start_time <= '1-Jul-2000'
AND ahrn.activity_id = a.id
GROUP BY "Resource Type";
QUERY PLAN:

Aggregate (cost=137.12..137.16 rows=1 width=16)
-> Group (cost=137.12..137.14 rows=7 width=16)
-> Sort (cost=137.12..137.12 rows=7 width=16)
-> Hash Join (cost=47.86..137.04 rows=7 width=16)
-> Seq Scan on activity_hr_need ahrn
(cost=0.00..53.58 rows=2358 width=12)
-> Hash (cost=47.82..47.82 rows=16 width=4)
-> Index Scan using activity_start_time on
activity a (cost=0.00..47.82 rows=16 width=4)

Regards,
Ed Loehr

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ed Loehr (#5)
Re: planner question re index vs seqscan

Ed Loehr <eloehr@austin.rr.com> writes:

But I'm still wondering why it is still doing a seq scan on the
"ahrn.activity_id = a.id" part when both of those integer columns are
indexed??

Presumably because it thinks the hash join is cheaper than a nestloop
or merge join would be ... although that seems kinda surprising. What
plans do you get if you try various combinations of
set enable_hashjoin = off;
set enable_mergejoin = off;
set enable_nestloop = off;
How do the cost estimates compare against the actual runtimes for
doing the query each way?

regards, tom lane