Doesn't use index, why?

Started by Marco Catundaover 25 years ago6 messagesgeneral
Jump to latest
#1Marco Catunda
catunda@pobox.com

Hello,

I have some doubt about index.
Look the follow example:

naslog=# \d desconexao
Table "desconexao"
Attribute | Type | Modifier
----------------+-------------+-------------------------------------------------
id | integer | not null default
nextval('id_desconexao'::text)
time | timestamp |
client_user | varchar(20) |
client | varchar(40) |
ip_nas | inet |
ip_client_user | inet |
disconnect | smallint |
user_time | interval |
data_rate | integer |
called_number | varchar(14) |
calling_number | varchar(14) |
filtrado | boolean | default 'f'::bool
Indices: client_desconexao_idx,
desconexao_pkey,
filtro_idx,
time_idx

naslog=# \d time_idx
Index "time_idx"
Attribute | Type
-----------+-----------
time | timestamp
btree

naslog=# explain select * from desconexao where time = '2000-12-01';
NOTICE: QUERY PLAN:

Index Scan using time_idx on desconexao (cost=0.00..20.18 rows=5
width=103)

EXPLAIN

Every thing is perfect, but when i execute de follow query:

naslog=# explain select * from desconexao where time > '2000-12-01';
NOTICE: QUERY PLAN:

Seq Scan on desconexao (cost=0.00..19547.71 rows=657958 width=103)

EXPLAIN

Why postgresql didn't use the time_idx index? I only changed the
operator
"=" to ">" in clause "where".

Thank you.
Regards
-- Marco Catunda

#2Dave Smith
dave@candata.com
In reply to: Marco Catunda (#1)
Re: Doesn't use index, why?

Marco Catunda wrote:

Hello,

I have some doubt about index.
Look the follow example:

naslog=# \d desconexao
Table "desconexao"
Attribute | Type | Modifier
----------------+-------------+-------------------------------------------------
id | integer | not null default
nextval('id_desconexao'::text)
time | timestamp |
client_user | varchar(20) |
client | varchar(40) |
ip_nas | inet |
ip_client_user | inet |
disconnect | smallint |
user_time | interval |
data_rate | integer |
called_number | varchar(14) |
calling_number | varchar(14) |
filtrado | boolean | default 'f'::bool
Indices: client_desconexao_idx,
desconexao_pkey,
filtro_idx,
time_idx

naslog=# \d time_idx
Index "time_idx"
Attribute | Type
-----------+-----------
time | timestamp
btree

naslog=# explain select * from desconexao where time = '2000-12-01';
NOTICE: QUERY PLAN:

Index Scan using time_idx on desconexao (cost=0.00..20.18 rows=5
width=103)

EXPLAIN

Every thing is perfect, but when i execute de follow query:

naslog=# explain select * from desconexao where time > '2000-12-01';
NOTICE: QUERY PLAN:

Seq Scan on desconexao (cost=0.00..19547.71 rows=657958 width=103)

EXPLAIN

Why postgresql didn't use the time_idx index? I only changed the
operator
"=" to ">" in clause "where".

Thank you.
Regards
-- Marco Catunda

Firstly you should include your version along with the report.

Two thoughts ... did you run vacuum?
Of the total number of records how many are greater than 2000-12-01? It
could be that postgresql figures that a large portion of the result set
is greater than this date and it is faster to do a sequential scan

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marco Catunda (#1)
Re: Doesn't use index, why?

Marco Catunda <catunda@pobox.com> writes:

naslog=# explain select * from desconexao where time = '2000-12-01';
NOTICE: QUERY PLAN:

Index Scan using time_idx on desconexao (cost=0.00..20.18 rows=5
width=103)

naslog=# explain select * from desconexao where time > '2000-12-01';
NOTICE: QUERY PLAN:

Seq Scan on desconexao (cost=0.00..19547.71 rows=657958 width=103)

Why postgresql didn't use the time_idx index? I only changed the
operator "=" to ">" in clause "where".

Because the '>' condition is much less restrictive (notice the
difference in estimated row counts: 5 versus 657958). Unless the
planner's row count estimates are way off, it very likely *is* faster
to do the second query by sequential scan.

regards, tom lane

#4Marco Catunda
catunda@pobox.com
In reply to: Dave Smith (#2)
Re: Doesn't use index, why?

On 04 Jan 2001 11:55:14 -0500, Dave Smith wrote:

Marco Catunda wrote:

Hello,

I have some doubt about index.
Look the follow example:

naslog=# \d desconexao
Table "desconexao"
Attribute | Type | Modifier
----------------+-------------+-------------------------------------------------
id | integer | not null default
nextval('id_desconexao'::text)
time | timestamp |
client_user | varchar(20) |
client | varchar(40) |
ip_nas | inet |
ip_client_user | inet |
disconnect | smallint |
user_time | interval |
data_rate | integer |
called_number | varchar(14) |
calling_number | varchar(14) |
filtrado | boolean | default 'f'::bool
Indices: client_desconexao_idx,
desconexao_pkey,
filtro_idx,
time_idx

naslog=# \d time_idx
Index "time_idx"
Attribute | Type
-----------+-----------
time | timestamp
btree

naslog=# explain select * from desconexao where time = '2000-12-01';
NOTICE: QUERY PLAN:

Index Scan using time_idx on desconexao (cost=0.00..20.18 rows=5
width=103)

EXPLAIN

Every thing is perfect, but when i execute de follow query:

naslog=# explain select * from desconexao where time > '2000-12-01';
NOTICE: QUERY PLAN:

Seq Scan on desconexao (cost=0.00..19547.71 rows=657958 width=103)

EXPLAIN

Why postgresql didn't use the time_idx index? I only changed the
operator
"=" to ">" in clause "where".

Thank you.
Regards
-- Marco Catunda

Firstly you should include your version along with the report.

Two thoughts ... did you run vacuum?
Of the total number of records how many are greater than 2000-12-01? It
could be that postgresql figures that a large portion of the result set
is greater than this date and it is faster to do a sequential scan

Sorry, the version of Postgresql is 7.0.3

This is the number of records in database:

naslog=# select count(*) from desconexao;
count
--------
658617
(1 row)

I changed the data of example because all records is greater than
'2000-12-01', sorry.
But when I try '2000-12-10' the samething happens.

naslog=# explain select * from desconexao where time < '2000-12-10';
NOTICE: QUERY PLAN:

Seq Scan on desconexao (cost=0.00..19547.71 rows=231489 width=103)

EXPLAIN
naslog=# explain select * from desconexao where time >=
'2000-12-10';
NOTICE: QUERY PLAN:

Seq Scan on desconexao (cost=0.00..19547.71 rows=427128 width=103)

EXPLAIN

The number of records are:

naslog=# select count(*) from desconexao where time >= '2000-12-10';
count
--------
585789
(1 row)

naslog=# select count(*) from desconexao where time < '2000-12-10';
count
-------
72828
(1 row)

I think the estimate rows (231489) in query < '2000-12-10' is far away
to real value (72828).
So I execute vacuum analyze:

naslog=# vacuum verbose analyze desconexao;

NOTICE: --Relation desconexao--
NOTICE: Pages 11318: Changed 0, reaped 2616, Empty 0, New 0; Tup
658617: Vac 200, Keep/VTL 0/0, Crash 0, UnUsed 8007, MinLen 105,
MaxLen 166; Re-using: Free/Avail. Space 150084/35492;
EndEmpty/Avail. Pages 0/105. CPU 3.08s/27.90u sec.
NOTICE: Index client_desconexao_idx: Pages 9786; Tuples 658617:
Deleted 200. CPU 2.50s/3.33u sec.
NOTICE: Index filtro_idx: Pages 2946; Tuples 658617: Deleted 200.
CPU 0.70s/3.20u sec.
NOTICE: Index time_idx: Pages 5503; Tuples 658617: Deleted 200. CPU
1.52s/3.04u sec.
NOTICE: Index desconexao_pkey: Pages 5893; Tuples 658617: Deleted
200. CPU 1.59s/3.12u sec.
NOTICE: Rel desconexao: Pages: 11318 --> 11315; Tuple(s) moved:
199. CPU 5.41s/1.03u sec.
NOTICE: Index client_desconexao_idx: Pages 9786; Tuples 658617:
Deleted 199. CPU 2.85s/1.79u sec.
NOTICE: Index filtro_idx: Pages 2946; Tuples 658617: Deleted 199.
CPU 0.71s/1.60u sec.
NOTICE: Index time_idx: Pages 5503; Tuples 658617: Deleted 199. CPU
1.51s/1.65u sec.
NOTICE: Index desconexao_pkey: Pages 5893; Tuples 658617: Deleted
199. CPU 1.59s/1.58u sec.
VACUUM

naslog=# explain select * from desconexao where time < '2000-12-10';
NOTICE: QUERY PLAN:

Seq Scan on desconexao (cost=0.00..19547.71 rows=231489 width=103)

EXPLAIN
naslog=# explain select * from desconexao where time >=
'2000-12-10';
NOTICE: QUERY PLAN:

Seq Scan on desconexao (cost=0.00..19547.71 rows=427128 width=103)

EXPLAIN

The same estimates values happen.
Is there a way to force index?

Thank you
-- Marco Catunda

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marco Catunda (#4)
Re: Doesn't use index, why?

Marco Catunda <catunda@pobox.com> writes:

naslog=# explain select * from desconexao where time < '2000-12-10';
NOTICE: QUERY PLAN:

Seq Scan on desconexao (cost=0.00..19547.71 rows=231489 width=103)

naslog=# explain select * from desconexao where time >=
'2000-12-10';
NOTICE: QUERY PLAN:

Seq Scan on desconexao (cost=0.00..19547.71 rows=427128 width=103)

The number of records are:

naslog=# select count(*) from desconexao where time >= '2000-12-10';
count
--------
585789
(1 row)

naslog=# select count(*) from desconexao where time < '2000-12-10';
count
-------
72828
(1 row)

In this case the planner is doing *exactly* the right thing; it is
smarter than you are. If you want to prove it, force the planner to
use an indexscan by doing SET ENABLE_SEQSCAN TO OFF. Then time the
query, and compare the runtime against the seqscan version.

The bottom line here is that a query that needs to touch more than a
few percent of the rows in a table is better off being done as a
seqscan.

regards, tom lane

#6Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#5)
Re: Doesn't use index, why?

In this case the planner is doing *exactly* the right thing; it is
smarter than you are. If you want to prove it, force the planner to
use an indexscan by doing SET ENABLE_SEQSCAN TO OFF. Then time the
query, and compare the runtime against the seqscan version.

The bottom line here is that a query that needs to touch more than a
few percent of the rows in a table is better off being done as a
seqscan.

The only other workaround is to CLUSTER the table on an index, then
force an index scan. That _may_ be faster.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026