If an index is based on 3 columns will a query using two of the columns utilize the index?

Started by Reid Thompsonover 20 years ago12 messagesgeneral
Jump to latest
#1Reid Thompson
Reid.Thompson@ateb.com

Example:
assume a table of 10 columns, three of which are fname, lname, and dob.
If an index is created on (fname, lname, dob), will a query that
utilizes two of the columns ( select 'data' from table where fname = 'X'
and lname = 'Y') utilize the index?

thanks,
reid

#2Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Reid Thompson (#1)
Re: If an index is based on 3 columns will a query using two of the columns utilize the index?

On Mon, Sep 12, 2005 at 09:43:57AM -0400, Reid Thompson wrote:

Example:
assume a table of 10 columns, three of which are fname, lname, and dob.
If an index is created on (fname, lname, dob), will a query that
utilizes two of the columns ( select 'data' from table where fname = 'X'
and lname = 'Y') utilize the index?

Yes, if it is selective enough. (It _can_ use the index, which does not
mean that it _will_ use it.) Note that if your example query used the
columns (lname, dob), the answer would be "no."

--
Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com
Officer Krupke, what are we to do?
Gee, officer Krupke, Krup you! (West Side Story, "Gee, Officer Krupke")

#3Michael Fuhr
mike@fuhr.org
In reply to: Reid Thompson (#1)
Re: If an index is based on 3 columns will a query using two of the columns utilize the index?

On Mon, Sep 12, 2005 at 09:43:57AM -0400, Reid Thompson wrote:

assume a table of 10 columns, three of which are fname, lname, and dob.
If an index is created on (fname, lname, dob), will a query that
utilizes two of the columns ( select 'data' from table where fname = 'X'
and lname = 'Y') utilize the index?

See "Multicolumn Indexes" in the "Indexes" chapter of the documentation.

http://www.postgresql.org/docs/8.0/interactive/indexes-multicolumn.html

You can use EXPLAIN to see whether the planner will use an index for
a particular query.

http://www.postgresql.org/docs/8.0/interactive/performance-tips.html#USING-EXPLAIN

Note, however, that the planner will ignore an index and use a
sequential scan if it thinks the latter will be faster, so if you
want to see whether the query *can* use an index (as opposed to
*will* use it) then you could execute "SET enable_seqscan TO off"
and then run EXPLAIN (don't forget to RESET enable_seqscan or SET
it back to "on" when you're done testing).

--
Michael Fuhr

#4Reid Thompson
Reid.Thompson@ateb.com
In reply to: Michael Fuhr (#3)
Re: If an index is based on 3 columns will a query using two of the columns utilize the index?

Alvaro Herrera wrote:

On Mon, Sep 12, 2005 at 09:43:57AM -0400, Reid Thompson wrote:

Example:
assume a table of 10 columns, three of which are fname, lname, and
dob. If an index is created on (fname, lname, dob), will a query that
utilizes two of the columns ( select 'data' from table where fname =
'X' and lname = 'Y') utilize the index?

Yes, if it is selective enough. (It _can_ use the index,
which does not mean that it _will_ use it.) Note that if
your example query used the columns (lname, dob), the answer would be
"no."

Why is that? In order to use an index, does the query have to utilize
the 'first' element of the index?

reid

#5Michael Fuhr
mike@fuhr.org
In reply to: Reid Thompson (#4)
Re: If an index is based on 3 columns will a query using two of the columns utilize the index?

On Mon, Sep 12, 2005 at 10:05:36AM -0400, Reid Thompson wrote:

Alvaro Herrera wrote:

Note that if your example query used the columns (lname, dob),
the answer would be "no."

Why is that? In order to use an index, does the query have to utilize
the 'first' element of the index?

In released versions of PostgreSQL, yes. Version 8.1 will remove
that restriction.

--
Michael Fuhr

#6Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Reid Thompson (#4)
Re: If an index is based on 3 columns will a query using two of the columns utilize the index?

On Mon, Sep 12, 2005 at 10:05:36AM -0400, Reid Thompson wrote:

Alvaro Herrera wrote:

On Mon, Sep 12, 2005 at 09:43:57AM -0400, Reid Thompson wrote:

Example:
assume a table of 10 columns, three of which are fname, lname, and
dob. If an index is created on (fname, lname, dob), will a query that
utilizes two of the columns ( select 'data' from table where fname =
'X' and lname = 'Y') utilize the index?

Yes, if it is selective enough. (It _can_ use the index,
which does not mean that it _will_ use it.) Note that if
your example query used the columns (lname, dob), the answer would be
"no."

Why is that? In order to use an index, does the query have to utilize
the 'first' element of the index?

The "leftmost part." There's no way to scan an index if you don't know
the key. On a btree index, the key is ordered, and the columns at the
left are more significant than those at the right. If you don't provide
a value for the leftmost (first) column, there's no way to start
scanning the index because there's no starting point.

I don't think that was nearly clear enough, but OTOH I haven't had any
coffee today yet.

--
Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com
"If you have nothing to say, maybe you need just the right tool to help you
not say it." (New York Times, about Microsoft PowerPoint)

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#2)
Re: If an index is based on 3 columns will a query using two of the columns utilize the index?

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

On Mon, Sep 12, 2005 at 09:43:57AM -0400, Reid Thompson wrote:

Example:
assume a table of 10 columns, three of which are fname, lname, and dob.
If an index is created on (fname, lname, dob), will a query that
utilizes two of the columns ( select 'data' from table where fname = 'X'
and lname = 'Y') utilize the index?

Yes, if it is selective enough. (It _can_ use the index, which does not
mean that it _will_ use it.) Note that if your example query used the
columns (lname, dob), the answer would be "no."

Actually, that last point is not true anymore as of 8.1 --- see this
thread:
http://archives.postgresql.org/pgsql-hackers/2005-05/msg00939.php
which led to this patch:
http://archives.postgresql.org/pgsql-committers/2005-06/msg00156.php

I missed the fact that the documentation said it wouldn't work though.
Will fix...

regards, tom lane

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#6)
Re: If an index is based on 3 columns will a query using two of the columns utilize the index?

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

On Mon, Sep 12, 2005 at 10:05:36AM -0400, Reid Thompson wrote:

Why is that? In order to use an index, does the query have to utilize
the 'first' element of the index?

The "leftmost part." There's no way to scan an index if you don't know
the key. On a btree index, the key is ordered, and the columns at the
left are more significant than those at the right. If you don't provide
a value for the leftmost (first) column, there's no way to start
scanning the index because there's no starting point.

Actually, btree doesn't have any particular problem with that --- it
just starts the scan at the beginning of the index. However the other
index types do all require a constraint on the first index column;
for instance hash has to be able to determine a hash value.

Greg Stark suggests here:
http://archives.postgresql.org/pgsql-hackers/2005-05/msg00966.php
that GiST could also be fixed to work with any subset of the index
columns, but it hasn't been done yet, unless Teodor and Oleg snuck
something in during that last round of GiST work.

regards, tom lane

#9Teodor Sigaev
teodor@sigaev.ru
In reply to: Tom Lane (#8)
Re: If an index is based on 3 columns will a query using

Greg Stark suggests here:
http://archives.postgresql.org/pgsql-hackers/2005-05/msg00966.php
that GiST could also be fixed to work with any subset of the index
columns, but it hasn't been done yet, unless Teodor and Oleg snuck
something in during that last round of GiST work.

GiST may work with any subset of index columns too. Even in existing code I
don't see any problem except NULL in a first column. GiST doesn't store tuples
with leading NULL value (gist.c lines 174, 326), so index doesn't contained them.

After our work about WAL-lization GiST, it may work with "invalid" tuples
(possibly occured after crash recovery), so itsn't a big deal to add support
NULL in a first column. But freeze date is outdated... Should I add or leave it
to 8.2?

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

#10Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#7)
Re: If an index is based on 3 columns will a query using two of the columns utilize the index?

Tom Lane <tgl@sss.pgh.pa.us> writes:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Yes, if it is selective enough. (It _can_ use the index, which does not
mean that it _will_ use it.) Note that if your example query used the
columns (lname, dob), the answer would be "no."

Actually, that last point is not true anymore as of 8.1 --- see this
thread:
http://archives.postgresql.org/pgsql-hackers/2005-05/msg00939.php
which led to this patch:
http://archives.postgresql.org/pgsql-committers/2005-06/msg00156.php

Did that patch actually implement "skip scanning"?

The comment seems to only describe removing the restriction from the planner.
Which would make it theoretically possible but presumably the the cost
estimator should ensure it essentially never gets chosen for btree indexes.
The btree index would very very rarely help since it would require a complete
index scan.

I guess I could see some corner cases where it would help. Very wide tables
with an index on a few very selective relatively narrow columns. So the index
could be scanned in its entirety much faster than a full table scan. But the
index would have to be *much* narrower than the table and quite selective
to overcome the random access penalty.

Skip scanning would make it much more likely to be helpful.

Also, I think Oracle has another scan method called a "fast index scan" that
basically does a full sequential scan of the index. So the tuples come out
unordered but the access pattern is sequential. Would that be a good TODO for
Postgres? Is it feasible given the index disk structures in Postgres?

--
greg

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#10)
Re: If an index is based on 3 columns will a query using two of the columns utilize the index?

Greg Stark <gsstark@mit.edu> writes:

Tom Lane <tgl@sss.pgh.pa.us> writes:

http://archives.postgresql.org/pgsql-committers/2005-06/msg00156.php

Did that patch actually implement "skip scanning"?

No, it just removed the planner's arbitrary assumption that the index
methods wouldn't cope. Skip scanning is actually something rather
different anyway.

The comment seems to only describe removing the restriction from the planner.
Which would make it theoretically possible but presumably the the cost
estimator should ensure it essentially never gets chosen for btree indexes.

btcostestimate does understand this now.

I guess I could see some corner cases where it would help. Very wide tables
with an index on a few very selective relatively narrow columns. So the index
could be scanned in its entirety much faster than a full table scan. But the
index would have to be *much* narrower than the table and quite selective
to overcome the random access penalty.

With a bitmap index scan the penalty wouldn't be so high.

Also, I think Oracle has another scan method called a "fast index scan" that
basically does a full sequential scan of the index. So the tuples come out
unordered but the access pattern is sequential. Would that be a good TODO for
Postgres? Is it feasible given the index disk structures in Postgres?

I think this would probably fail under concurrent update conditions: you
couldn't guarantee not to miss or multiply return index entries. There
is interlocking in an index-order scan that prevents such problems, but
I don't see how it'd work for a physical-order scan.

You could probably make it work if you were willing to lock out writers
for the duration of the scan, but that'd severely restrict the
usefulness I would think. I'm also not sure how we'd express such a
constraint within the system...

regards, tom lane

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Teodor Sigaev (#9)
Re: If an index is based on 3 columns will a query using two of the columns utilize the index?

Teodor Sigaev <teodor@sigaev.ru> writes:

GiST may work with any subset of index columns too. Even in existing code I
don't see any problem except NULL in a first column. GiST doesn't store tuples
with leading NULL value (gist.c lines 174, 326), so index doesn't contained them.

Well, that's exactly the problem :-(. Or at least one of the problems;
the other being what you'd use as search key to find such tuples.

After our work about WAL-lization GiST, it may work with "invalid"
tuples (possibly occured after crash recovery), so itsn't a big deal
to add support NULL in a first column. But freeze date is
outdated... Should I add or leave it to 8.2?

Too late for 8.1 I'd say --- this definitely sounds like a new feature
rather than a bug fix.

regards, tom lane