*really* simple select doesn't use indices ...

Started by Marc G. Fournierover 24 years ago6 messages
#1Marc G. Fournier
scrappy@hub.org

First, this is still a v7.1 system ... its totally possible that this is
long fixed, and I'm way overdue to get it to v7.1.2, which I'll gladly
accept as a response ...

That said ... seems like a very painful way to arrive at 1 row ... :)

table structure:

globalmatch=# \d locations
Table "locations"
Attribute | Type | Modifier
-----------+---------+--------------------------------------------------------
gid | integer | not null default nextval('locationstmp_gid_seq'::text)
city | text |
state | text |
country | text |
zip | text |
location | point |
Indices: locations_zip,
locationstmp_gid_key

globalmatch=# \d locations_zip
Index "locations_zip"
Attribute | Type
-----------+------
zip | text
btree

globalmatch=# EXPLAIN SELECT count(location) from locations WHERE zip = '80012';
NOTICE: QUERY PLAN:

Aggregate (cost=2950.18..2950.18 rows=1 width=16)
-> Seq Scan on locations (cost=0.00..2939.64 rows=4217 width=16)

EXPLAIN

globalmatch=# SELECT count(location) from locations WHERE zip = '80012';
count
-------
1
(1 row)

globalmatch=# SELECT count(location) from locations;
count
--------
123571
(1 row)

#2Marc G. Fournier
scrappy@hub.org
In reply to: Marc G. Fournier (#1)
appendum: Re: *really* simple select doesn't use indices ...

Okay, just bit the bullet, upgraded to v7.1.2, and the problem still
persists:

globalmatch=# vacuum verbose analyze locations;
NOTICE: --Relation locations--
NOTICE: Pages 1395: Changed 0, reaped 0, Empty 0, New 0; Tup 123571: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 76, MaxLen 124; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.11s/0.00u sec.
NOTICE: Index locationstmp_gid_key: Pages 272; Tuples 123571. CPU 0.01s/0.15u sec.
NOTICE: Index locations_zip: Pages 320; Tuples 123571. CPU 0.02s/0.14u sec.
NOTICE: Index locations_country: Pages 342; Tuples 123571. CPU 0.03s/0.13u sec.
NOTICE: --Relation pg_toast_9373225--
NOTICE: Pages 0: Changed 0, reaped 0, Empty 0, New 0; Tup 0: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 0, MaxLen 0; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.00s/0.00u sec.
NOTICE: Index pg_toast_9373225_idx: Pages 1; Tuples 0. CPU 0.00s/0.00u sec.
NOTICE: Analyzing...
VACUUM
globalmatch=# explain SELECT location from locations WHERE zip = '80012';
NOTICE: QUERY PLAN:

Seq Scan on locations (cost=0.00..2939.64 rows=4217 width=16)

EXPLAIN
globalmatch=# select version();
version
---------------------------------------------------------------------
PostgreSQL 7.1.2 on i386-unknown-freebsd4.3, compiled by GCC 2.95.3
(1 row)

On Mon, 28 May 2001, Marc G. Fournier wrote:

First, this is still a v7.1 system ... its totally possible that this is
long fixed, and I'm way overdue to get it to v7.1.2, which I'll gladly
accept as a response ...

That said ... seems like a very painful way to arrive at 1 row ... :)

table structure:

globalmatch=# \d locations
Table "locations"
Attribute | Type | Modifier
-----------+---------+--------------------------------------------------------
gid | integer | not null default nextval('locationstmp_gid_seq'::text)
city | text |
state | text |
country | text |
zip | text |
location | point |
Indices: locations_zip,
locationstmp_gid_key

globalmatch=# \d locations_zip
Index "locations_zip"
Attribute | Type
-----------+------
zip | text
btree

globalmatch=# EXPLAIN SELECT count(location) from locations WHERE zip = '80012';
NOTICE: QUERY PLAN:

Aggregate (cost=2950.18..2950.18 rows=1 width=16)
-> Seq Scan on locations (cost=0.00..2939.64 rows=4217 width=16)

EXPLAIN

globalmatch=# SELECT count(location) from locations WHERE zip = '80012';
count
-------
1
(1 row)

globalmatch=# SELECT count(location) from locations;
count
--------
123571
(1 row)

Marc G. Fournier scrappy@hub.org
Systems Administrator @ hub.org
scrappy@{postgresql|isc}.org ICQ#7615664

#3Gavin Sherry
swm@linuxworld.com.au
In reply to: Marc G. Fournier (#2)
Re: appendum: Re: *really* simple select doesn't use indices ...

Marc,

The column 'zip' is of type text. As such, indices will not be used except
in the case when the where clause is WHERE zip ~ '^<text>' for btree
indices.

Gavin

On Tue, 29 May 2001, Marc G. Fournier wrote:

Show quoted text

Okay, just bit the bullet, upgraded to v7.1.2, and the problem still
persists:

globalmatch=# vacuum verbose analyze locations;
NOTICE: --Relation locations--
NOTICE: Pages 1395: Changed 0, reaped 0, Empty 0, New 0; Tup 123571: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 76, MaxLen 124; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.11s/0.00u sec.
NOTICE: Index locationstmp_gid_key: Pages 272; Tuples 123571. CPU 0.01s/0.15u sec.
NOTICE: Index locations_zip: Pages 320; Tuples 123571. CPU 0.02s/0.14u sec.
NOTICE: Index locations_country: Pages 342; Tuples 123571. CPU 0.03s/0.13u sec.
NOTICE: --Relation pg_toast_9373225--
NOTICE: Pages 0: Changed 0, reaped 0, Empty 0, New 0; Tup 0: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 0, MaxLen 0; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.00s/0.00u sec.
NOTICE: Index pg_toast_9373225_idx: Pages 1; Tuples 0. CPU 0.00s/0.00u sec.
NOTICE: Analyzing...
VACUUM
globalmatch=# explain SELECT location from locations WHERE zip = '80012';
NOTICE: QUERY PLAN:

Seq Scan on locations (cost=0.00..2939.64 rows=4217 width=16)

EXPLAIN
globalmatch=# select version();
version
---------------------------------------------------------------------
PostgreSQL 7.1.2 on i386-unknown-freebsd4.3, compiled by GCC 2.95.3
(1 row)

On Mon, 28 May 2001, Marc G. Fournier wrote:

First, this is still a v7.1 system ... its totally possible that this is
long fixed, and I'm way overdue to get it to v7.1.2, which I'll gladly
accept as a response ...

That said ... seems like a very painful way to arrive at 1 row ... :)

table structure:

globalmatch=# \d locations
Table "locations"
Attribute | Type | Modifier
-----------+---------+--------------------------------------------------------
gid | integer | not null default nextval('locationstmp_gid_seq'::text)
city | text |
state | text |
country | text |
zip | text |
location | point |
Indices: locations_zip,
locationstmp_gid_key

globalmatch=# \d locations_zip
Index "locations_zip"
Attribute | Type
-----------+------
zip | text
btree

globalmatch=# EXPLAIN SELECT count(location) from locations WHERE zip = '80012';
NOTICE: QUERY PLAN:

Aggregate (cost=2950.18..2950.18 rows=1 width=16)
-> Seq Scan on locations (cost=0.00..2939.64 rows=4217 width=16)

EXPLAIN

globalmatch=# SELECT count(location) from locations WHERE zip = '80012';
count
-------
1
(1 row)

globalmatch=# SELECT count(location) from locations;
count
--------
123571
(1 row)

Marc G. Fournier scrappy@hub.org
Systems Administrator @ hub.org
scrappy@{postgresql|isc}.org ICQ#7615664

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Gavin Sherry (#3)
Re: appendum: Re: *really* simple select doesn't use indices ...

Gavin Sherry <swm@linuxworld.com.au> writes:

The column 'zip' is of type text. As such, indices will not be used except
in the case when the where clause is WHERE zip ~ '^<text>' for btree
indices.

Uh ... nonsense.

On Tue, 29 May 2001, Marc G. Fournier wrote:

globalmatch=# vacuum verbose analyze locations;
NOTICE: --Relation locations--
NOTICE: Pages 1395: Changed 0, reaped 0, Empty 0, New 0; Tup 123571: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 76, MaxLen 124; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.11s/0.00u sec.

globalmatch=# explain SELECT location from locations WHERE zip = '80012';
NOTICE: QUERY PLAN:

Seq Scan on locations (cost=0.00..2939.64 rows=4217 width=16)

Okay, so it thinks that "zip = '80012'" will match 4217 out of 123571
rows, which is more than enough to drive it to a sequential scan
(with an average of more than three matched rows on every page of the
relation, there'd be no I/O savings at all from consulting the index).

Since the real number of matches is only 1, this estimate is obviously
way off. In 7.1 the estimate is being driven by the frequency of the
most common value in the column --- what is the most common value?
If you're lucky, the most common value is a dummy (empty string, maybe)
that you could replace by NULL with a few simple changes in application
logic. 7.1 is smart enough to distinguish NULL from real data values
in its estimates. If you're not lucky, there really are a few values
that are far more common than average, in which case you're stuck unless
you want to run development sources. Current sources should do a lot
better on that kind of data distribution.

regards, tom lane

#5The Hermit Hacker
scrappy@hub.org
In reply to: Tom Lane (#4)
Re: appendum: Re: *really* simple select doesn't use indices ...

On Tue, 29 May 2001, Tom Lane wrote:

Gavin Sherry <swm@linuxworld.com.au> writes:

The column 'zip' is of type text. As such, indices will not be used except
in the case when the where clause is WHERE zip ~ '^<text>' for btree
indices.

Uh ... nonsense.

Oh good, I was worried there for a sec ... :)

On Tue, 29 May 2001, Marc G. Fournier wrote:

globalmatch=# vacuum verbose analyze locations;
NOTICE: --Relation locations--
NOTICE: Pages 1395: Changed 0, reaped 0, Empty 0, New 0; Tup 123571: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 76, MaxLen 124; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.11s/0.00u sec.

globalmatch=# explain SELECT location from locations WHERE zip = '80012';
NOTICE: QUERY PLAN:

Seq Scan on locations (cost=0.00..2939.64 rows=4217 width=16)

Okay, so it thinks that "zip = '80012'" will match 4217 out of 123571
rows, which is more than enough to drive it to a sequential scan
(with an average of more than three matched rows on every page of the
relation, there'd be no I/O savings at all from consulting the index).

Since the real number of matches is only 1, this estimate is obviously
way off. In 7.1 the estimate is being driven by the frequency of the
most common value in the column --- what is the most common value? If
you're lucky, the most common value is a dummy (empty string, maybe)
that you could replace by NULL with a few simple changes in
application logic. 7.1 is smart enough to distinguish NULL from real
data values in its estimates. If you're not lucky, there really are a
few values that are far more common than average, in which case you're
stuck unless you want to run development sources. Current sources
should do a lot better on that kind of data distribution.

Hit it right on the mark:

zip | cnt
-------+-------
| 81403
00210 | 1
00211 | 1

Will look at the code and see what I can do abuot that NULL issue ...
thanks :)

#6mlw
markw@mohawksoft.com
In reply to: Marc G. Fournier (#1)
Re: *really* simple select doesn't use indices ...

This is one my top two problem with Postgres, the seemingly braindead index
selection mechanism.

First, of course try "VACUUM ANALYZE'
Then if the fails, try

set ENABLE_SEQSCAN = off;

Then try your query.

"Marc G. Fournier" wrote:

Show quoted text

First, this is still a v7.1 system ... its totally possible that this is
long fixed, and I'm way overdue to get it to v7.1.2, which I'll gladly
accept as a response ...

That said ... seems like a very painful way to arrive at 1 row ... :)

table structure:

globalmatch=# \d locations
Table "locations"
Attribute | Type | Modifier
-----------+---------+--------------------------------------------------------
gid | integer | not null default nextval('locationstmp_gid_seq'::text)
city | text |
state | text |
country | text |
zip | text |
location | point |
Indices: locations_zip,
locationstmp_gid_key

globalmatch=# \d locations_zip
Index "locations_zip"
Attribute | Type
-----------+------
zip | text
btree

globalmatch=# EXPLAIN SELECT count(location) from locations WHERE zip = '80012';
NOTICE: QUERY PLAN:

Aggregate (cost=2950.18..2950.18 rows=1 width=16)
-> Seq Scan on locations (cost=0.00..2939.64 rows=4217 width=16)

EXPLAIN

globalmatch=# SELECT count(location) from locations WHERE zip = '80012';
count
-------
1
(1 row)

globalmatch=# SELECT count(location) from locations;
count
--------
123571
(1 row)

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)