Index not being used unless enable_seqscan=false

Started by Shaneover 20 years ago9 messagesgeneral

shane-pgsql@cm.nu

over 20 years ago

Hello all,

I am working with a simple table and query abut cannot seem
to get it to use the index I have created. However, if I
set enable_seqscan=false, the index is used and the query
is much faster. I have tried a vacuum analyze but to no
avail.

Table layout:
Table "public.seen"
Column | Type | Modifiers
----------+--------------------------------+-----------
group_id | integer | not null
msgid | text | not null
msgtime | timestamp(0) without time zone | not null
Indexes:
"seen_group_id_key" unique, btree (group_id, msgid)
"seen_msgtime" btree (msgtime)
Foreign-key constraints:
"$1" FOREIGN KEY (group_id) REFERENCES groups(id) ON UPDATE CASCADE ON DELETE CASCADE

explain analyze with enable_seqscan=true
explain analyze select msgid from seen where msgtime < cast(now() - interval '6 months' as timestamp(0) without time zone);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Seq Scan on seen (cost=0.00..107879.45 rows=1081044 width=46) (actual time=7597.387..27000.777 rows=28907 loops=1)
Filter: (msgtime < ((now() - '6 mons'::interval))::timestamp(0) without time zone)
Total runtime: 27096.337 ms
(3 rows)

Same query with enable_seqscan=false
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Index Scan using seen_msgtime on seen (cost=0.00..3818325.78 rows=1081044 width=46) (actual time=0.140..156.222 rows=28907 loops=1)
Index Cond: (msgtime < ((now() - '6 mons'::interval))::timestamp(0) without time zone)
Total runtime: 248.737 ms
(3 rows)

Any ideas on how I can fix this. I get this problem now
and again with other databases but a vacuum usually fixes
it.

Thanks,
Shane

Sven Willenberger

sven@dmv.com

over 20 years ago

In reply to: Shane (#1)

Re: Index not being used unless enable_seqscan=false

On Wed, 2005-08-10 at 12:01 -0700, Shane wrote:

Hello all,

I am working with a simple table and query abut cannot seem
to get it to use the index I have created. However, if I
set enable_seqscan=false, the index is used and the query
is much faster. I have tried a vacuum analyze but to no
avail.

Table layout:
Table "public.seen"
Column | Type | Modifiers
----------+--------------------------------+-----------
group_id | integer | not null
msgid | text | not null
msgtime | timestamp(0) without time zone | not null
Indexes:
"seen_group_id_key" unique, btree (group_id, msgid)
"seen_msgtime" btree (msgtime)
Foreign-key constraints:
"$1" FOREIGN KEY (group_id) REFERENCES groups(id) ON UPDATE CASCADE ON DELETE CASCADE

explain analyze with enable_seqscan=true
explain analyze select msgid from seen where msgtime < cast(now() - interval '6 months' as timestamp(0) without time zone);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Seq Scan on seen (cost=0.00..107879.45 rows=1081044 width=46) (actual time=7597.387..27000.777 rows=28907 loops=1)
Filter: (msgtime < ((now() - '6 mons'::interval))::timestamp(0) without time zone)
Total runtime: 27096.337 ms
(3 rows)

Same query with enable_seqscan=false
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Index Scan using seen_msgtime on seen (cost=0.00..3818325.78 rows=1081044 width=46) (actual time=0.140..156.222 rows=28907 loops=1)
Index Cond: (msgtime < ((now() - '6 mons'::interval))::timestamp(0) without time zone)
Total runtime: 248.737 ms
(3 rows)

Any ideas on how I can fix this. I get this problem now
and again with other databases but a vacuum usually fixes
it.

Right off the bat (if I am interpreting the results of your explain
analyze correctly) it looks like the planner is basing its decision to
seqscan as it thinks that it needs to filter over 1 million rows (versus
the 29,000 rows that actually are pulled). Perhaps increasing stats on
msgtime and then analyzing the table may help. Depending on your
hardware, decreasing random_page_cost in your postgresql.conf just a
touch may help too.

Sven

Ragnar Hafstað

gnari@simnet.is

over 20 years ago

In reply to: Shane (#1)

Re: Index not being used unless enable_seqscan=false

On Wed, 2005-08-10 at 12:01 -0700, Shane wrote:

Hello all,

I am working with a simple table and query abut cannot seem
to get it to use the index I have created. However, if I
set enable_seqscan=false, the index is used and the query
is much faster. I have tried a vacuum analyze but to no
avail.

[snip]

explain analyze select msgid from seen where msgtime < cast(now() - interval '6 months' as timestamp(0) without time zone);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Seq Scan on seen (cost=0.00..107879.45 rows=1081044 width=46) (actual time=7597.387..27000.777 rows=28907 loops=1)
Filter: (msgtime < ((now() - '6 mons'::interval))::timestamp(0) without time zone)
Total runtime: 27096.337 ms
(3 rows)

Same query with enable_seqscan=false

[snip faster plan]

Any ideas on how I can fix this. I get this problem now
and again with other databases but a vacuum usually fixes
it.

The planner is not very good at estimating selectivity of
single unequalities. If you can specify a range in the
where clause, you might possibly have better luck.
...WHERE msgtime < cast(now() - interval '6 months'
as timestamp(0) without time zone
AND msgtime >= '2000-01-01'

Also, you might want to try to increase the STATISTICS target
of msgtime.

Sometimes an ORDER BY clause can help the planner on choosing indexscan,
although in this case the difference in estimated cost is so high that
I doubt it.

gnari

Shane

shane-pgsql@cm.nu

over 20 years ago

In reply to: Sven Willenberger (#2)

Re: Index not being used unless enable_seqscan=false

On Wed, Aug 10, 2005 at 03:31:27PM -0400, Sven Willenberger wrote:

Right off the bat (if I am interpreting the results of your explain
analyze correctly) it looks like the planner is basing its decision to
seqscan as it thinks that it needs to filter over 1 million rows (versus
the 29,000 rows that actually are pulled). Perhaps increasing stats on
msgtime and then analyzing the table may help. Depending on your
hardware, decreasing random_page_cost in your postgresql.conf just a
touch may help too.

Thanks for the pointers.

I tried increasing the stats from the default of 10 to 25
with no change. How high would you bring it? Also, I've
never played with the various cost variables. The database
sits on a raid5 partition composed of 4 15k u320 SCSI
drives, dual xeon 2.8(ht enabled) 2gb ram. I suppose this
might actually increase the cost of fetching a random disk
page as it may well be on another physical disk and
wouldn't be in the readahead cache. Any idea as to what it
should be on this sort of system?

Sven Willenberger

sven@dmv.com

over 20 years ago

In reply to: Shane (#4)

Re: Index not being used unless enable_seqscan=false

On Wed, 2005-08-10 at 12:58 -0700, Shane wrote:

On Wed, Aug 10, 2005 at 03:31:27PM -0400, Sven Willenberger wrote:

Right off the bat (if I am interpreting the results of your explain
analyze correctly) it looks like the planner is basing its decision to
seqscan as it thinks that it needs to filter over 1 million rows (versus
the 29,000 rows that actually are pulled). Perhaps increasing stats on
msgtime and then analyzing the table may help. Depending on your
hardware, decreasing random_page_cost in your postgresql.conf just a
touch may help too.

Thanks for the pointers.

I tried increasing the stats from the default of 10 to 25
with no change. How high would you bring it? Also, I've
never played with the various cost variables. The database
sits on a raid5 partition composed of 4 15k u320 SCSI
drives, dual xeon 2.8(ht enabled) 2gb ram. I suppose this
might actually increase the cost of fetching a random disk
page as it may well be on another physical disk and
wouldn't be in the readahead cache. Any idea as to what it
should be on this sort of system?

---------------------------(end of broadcast)---------------------------

Try increasing stats to 100 on just the msgtime column, not the default
(changing the default will only have an effect on newly created columns
-- you may want to change the default back to 10):

ALTER TABLE seen ALTER msgtime SET STATISTICS 100;

After running that command, analyze the table again and see if that
helps.

I am assuming the culprit is this particular column as your index and
search criteria is based on that one. The default random_page_cost I
believe is 4.0; on your system you could probably easily drop it to 3,
possibly lower, and see how that performs.

Sven

Shane

shane-pgsql@cm.nu

over 20 years ago

In reply to: Sven Willenberger (#5)

Re: Index not being used unless enable_seqscan=false

On Wed, Aug 10, 2005 at 04:24:51PM -0400, Sven Willenberger wrote:

On Wed, 2005-08-10 at 12:58 -0700, Shane wrote:

On Wed, Aug 10, 2005 at 03:31:27PM -0400, Sven Willenberger wrote:

Right off the bat (if I am interpreting the results of your explain
analyze correctly) it looks like the planner is basing its decision to
seqscan as it thinks that it needs to filter over 1 million rows (versus
the 29,000 rows that actually are pulled). Perhaps increasing stats on
msgtime and then analyzing the table may help. Depending on your
hardware, decreasing random_page_cost in your postgresql.conf just a
touch may help too.

Try increasing stats to 100 on just the msgtime column, not the default
(changing the default will only have an effect on newly created columns
-- you may want to change the default back to 10):

Hi,

I brought the statistics on msgtime up to 100, vacuum
analyzed and brought random_page_cost down to 2.
Unfortunately, explain analyze still wants to seqscan and
estimates 1m returned rows.

Is there a way to simply force an index usage for this
particular query?

Sven Willenberger

sven@dmv.com

over 20 years ago

In reply to: Shane (#6)

Re: Index not being used unless enable_seqscan=false

On Wed, 2005-08-10 at 13:31 -0700, Shane wrote:

On Wed, Aug 10, 2005 at 04:24:51PM -0400, Sven Willenberger wrote:

On Wed, 2005-08-10 at 12:58 -0700, Shane wrote:

On Wed, Aug 10, 2005 at 03:31:27PM -0400, Sven Willenberger wrote:

Right off the bat (if I am interpreting the results of your explain
analyze correctly) it looks like the planner is basing its decision to
seqscan as it thinks that it needs to filter over 1 million rows (versus
the 29,000 rows that actually are pulled). Perhaps increasing stats on
msgtime and then analyzing the table may help. Depending on your
hardware, decreasing random_page_cost in your postgresql.conf just a
touch may help too.

Try increasing stats to 100 on just the msgtime column, not the default
(changing the default will only have an effect on newly created columns
-- you may want to change the default back to 10):

Hi,

I brought the statistics on msgtime up to 100, vacuum
analyzed and brought random_page_cost down to 2.
Unfortunately, explain analyze still wants to seqscan and
estimates 1m returned rows.

Is there a way to simply force an index usage for this
particular query?

S

What version of PostgreSQL are you running? Also, what happens if you
explain analyze choosing where msgtime > cast(now() - interval '6
months' as timestamp(0) without time zone); (instead of less than).
Depending on how you are connecting to run this query (script, webpage,
psql) you could always set enable_seq_scan=off; select ....; set
enable_seq_scan=on; scriptomagically.

Sven

Tom Lane

tgl@sss.pgh.pa.us

over 20 years ago

In reply to: Shane (#1)

Re: Index not being used unless enable_seqscan=false

Shane <shane-pgsql@cm.nu> writes:

I am working with a simple table and query abut cannot seem
to get it to use the index I have created.
...
explain analyze select msgid from seen where msgtime < cast(now() - interval '6 months' as timestamp(0) without time zone);

As some other people already pointed out, the problem is the horrible
misestimate of the number of matching rows. You did not say your
Postgres version, but I'm betting it's pre-8.0. Versions before 8.0
would not assume that they could get any useful statistical info from
an expression involving now() (or in general, any non-immutable
function). The default assumption in such cases is that a lot of
rows are retrieved --- too many for an indexscan.

If you cannot update to 8.0.* at the moment, a workaround is to do the
timestamp calculation on the client side so that you can send over
a query that's just a comparison to a constant:

... where msgtime < '2005-02-14 ...'::timestamp;

regards, tom lane

Shane

shane-pgsql@cm.nu

over 20 years ago

In reply to: Tom Lane (#8)

Re: Index not being used unless enable_seqscan=false

On Thu, Aug 11, 2005 at 12:10:33AM -0400, Tom Lane wrote:

Shane <shane-pgsql@cm.nu> writes:

I am working with a simple table and query abut cannot seem
to get it to use the index I have created.
...
explain analyze select msgid from seen where msgtime < cast(now() - interval '6 months' as timestamp(0) without time zone);

As some other people already pointed out, the problem is the horrible
misestimate of the number of matching rows. You did not say your
Postgres version, but I'm betting it's pre-8.0. Versions before 8.0
would not assume that they could get any useful statistical info from
an expression involving now() (or in general, any non-immutable
function). The default assumption in such cases is that a lot of
rows are retrieved --- too many for an indexscan.

Hi Tom,

The version being used is 7.4.7. However, as an
experiment, I ran pg8 on a different port and loaded the
dataset. It needed a vacuum analyze but after that pg8 was
using the index and got the row estimate correct.
Thanks for the pointer,
Shane