select distinct

Started by Timothy H. Keittover 25 years ago2 messages
#1Timothy H. Keitt
keitt@nceas.ucsb.edu

I've noticed that adding an index on a column has no effect on "select
distinct" (index or no index, the entire table is sequence scanned; see
example below). Couldn't you just traverse the index to get the
distinct values?

Tim

keitt=# vacuum analyze test;
VACUUM
keitt=# select distinct a from test;
a
---
a
b
c
d
f
s
z
(7 rows)

keitt=# explain select distinct a from test;
NOTICE: QUERY PLAN:

Unique (cost=1.67..1.72 rows=2 width=12)
-> Sort (cost=1.67..1.67 rows=21 width=12)
-> Seq Scan on test (cost=0.00..1.21 rows=21 width=12)

EXPLAIN
keitt=# create index test_index on test (a);
CREATE
keitt=# explain select distinct a from test;
NOTICE: QUERY PLAN:

Unique (cost=1.67..1.72 rows=2 width=12)
-> Sort (cost=1.67..1.67 rows=21 width=12)
-> Seq Scan on test (cost=0.00..1.21 rows=21 width=12)

EXPLAIN

--
Timothy H. Keitt
National Center for Ecological Analysis and Synthesis
735 State Street, Suite 300, Santa Barbara, CA 93101
Phone: 805-892-2519, FAX: 805-892-2510
http://www.nceas.ucsb.edu/~keitt/

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Timothy H. Keitt (#1)
Re: select distinct

"Timothy H. Keitt" <keitt@nceas.ucsb.edu> writes:

Couldn't you just traverse the index to get the distinct values?

No, because the index doesn't contain commit status; you can't tell
which values are actually valid without visiting the main table.

7.0 does consider both indexscan | unique and seqscan | sort | unique
plans for this problem, but the explicit sort is usually faster
(according to the system's cost models, anyway).

regards, tom lane