Covering Indexes

Started by David E. Wheeleralmost 14 years ago36 messageshackers
Jump to latest
#1David E. Wheeler
david@kineticode.com

Hackers,

Very interesting design document for SQLite 4:

http://www.sqlite.org/src4/doc/trunk/www/design.wiki

I'm particularly intrigued by "covering indexes". For example:

CREATE INDEX cover1 ON table1(a,b) COVERING(c,d);

This allows the following query to do an index-only scan:

SELECT c, d FROM table1 WHERE a=? AND b=?;

Now that we have index-only scans in 9.2, I'm wondering if it would make sense to add covering index support, too, where additional, unindexed columns are stored alongside indexed columns.

And I wonder if it would work well with expressions, too?

David

#2Andreas Joseph Krogh
andreak@officenet.no
In reply to: David E. Wheeler (#1)
Re: Covering Indexes

On 06/28/2012 02:16 PM, David E. Wheeler wrote:

Hackers,

Very interesting design document for SQLite 4:

http://www.sqlite.org/src4/doc/trunk/www/design.wiki

I'm particularly intrigued by "covering indexes". For example:

CREATE INDEX cover1 ON table1(a,b) COVERING(c,d);

This allows the following query to do an index-only scan:

SELECT c, d FROM table1 WHERE a=? AND b=?;

Now that we have index-only scans in 9.2, I'm wondering if it would make sense to add covering index support, too, where additional, unindexed columns are stored alongside indexed columns.

And I wonder if it would work well with expressions, too?

David

This is analogous to SQL Server's "include" :

|CREATE NONCLUSTERED INDEX my_idx|
|ON my_table (status)|
|INCLUDE (someColumn, otherColumn)|

Which is useful, but bloats the index.

--
Andreas Joseph Krogh<andreak@officenet.no> - mob: +47 909 56 963
Senior Software Developer / CEO - OfficeNet AS - http://www.officenet.no
Public key: http://home.officenet.no/~andreak/public_key.asc

#3Rob Wultsch
wultsch@gmail.com
In reply to: David E. Wheeler (#1)
Re: Covering Indexes

On Thu, Jun 28, 2012 at 8:16 AM, David E. Wheeler <david@justatheory.com> wrote:

Hackers,

Very interesting design document for SQLite 4:

 http://www.sqlite.org/src4/doc/trunk/www/design.wiki

I'm particularly intrigued by "covering indexes". For example:

   CREATE INDEX cover1 ON table1(a,b) COVERING(c,d);

This allows the following query to do an index-only scan:

   SELECT c, d FROM table1 WHERE a=? AND b=?;

Now that we have index-only scans in 9.2, I'm wondering if it would make sense to add covering index support, too, where additional, unindexed columns are stored alongside indexed columns.

And I wonder if it would work well with expressions, too?

David

IRC MS SQL also allow unindexed columns in the index.

--
Rob Wultsch
wultsch@gmail.com

#4Jeff Janes
jeff.janes@gmail.com
In reply to: David E. Wheeler (#1)
Re: Covering Indexes

On Thu, Jun 28, 2012 at 5:16 AM, David E. Wheeler <david@justatheory.com> wrote:

Hackers,

Very interesting design document for SQLite 4:

 http://www.sqlite.org/src4/doc/trunk/www/design.wiki

I'm particularly intrigued by "covering indexes". For example:

   CREATE INDEX cover1 ON table1(a,b) COVERING(c,d);

I don't see the virtue of this in this case. Since the index is not
unique, why not just put the index on (a,b,c,d) and be done with it?
Is there some advantage to be had in inventing a way to store c and d
in the index without having them usable for indexing?

Cheers,

Jeff

#5Andrew Dunstan
andrew@dunslane.net
In reply to: Jeff Janes (#4)
Re: Covering Indexes

On 06/28/2012 11:37 AM, Jeff Janes wrote:

On Thu, Jun 28, 2012 at 5:16 AM, David E. Wheeler<david@justatheory.com> wrote:

Hackers,

Very interesting design document for SQLite 4:

http://www.sqlite.org/src4/doc/trunk/www/design.wiki

I'm particularly intrigued by "covering indexes". For example:

CREATE INDEX cover1 ON table1(a,b) COVERING(c,d);

I don't see the virtue of this in this case. Since the index is not
unique, why not just put the index on (a,b,c,d) and be done with it?
Is there some advantage to be had in inventing a way to store c and d
in the index without having them usable for indexing?

Presumably the comparison function will be faster the fewer attributes
it needs to compare.

cheers

andrew

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#5)
Re: Covering Indexes

Andrew Dunstan <andrew@dunslane.net> writes:

On 06/28/2012 11:37 AM, Jeff Janes wrote:

On Thu, Jun 28, 2012 at 5:16 AM, David E. Wheeler<david@justatheory.com> wrote:

I'm particularly intrigued by "covering indexes". For example:
CREATE INDEX cover1 ON table1(a,b) COVERING(c,d);

I don't see the virtue of this in this case. Since the index is not
unique, why not just put the index on (a,b,c,d) and be done with it?

Presumably the comparison function will be faster the fewer attributes
it needs to compare.

Well, no, because queries will only be comparing the columns used in
the query. Insertions would likely actually be faster with the extra
columns considered significant, since we've known for a long time that
btree doesn't especially like large numbers of identical index entries.

When this came up a couple weeks ago, the argument that was made for it
was that you could attach non-significant columns to an index that *is*
unique. That might or might not be a wide enough use-case to justify
adding such a horrid kludge.

regards, tom lane

#7Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#6)
Re: Covering Indexes

Excerpts from Tom Lane's message of jue jun 28 12:07:58 -0400 2012:

When this came up a couple weeks ago, the argument that was made for it
was that you could attach non-significant columns to an index that *is*
unique. That might or might not be a wide enough use-case to justify
adding such a horrid kludge.

The other question is whether such an index would prevent an update from
being HOT when the non-indexed values are touched. That could be a
significant difference.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#8Aidan Van Dyk
aidan@highrise.ca
In reply to: Alvaro Herrera (#7)
Re: Covering Indexes

On Thu, Jun 28, 2012 at 12:12 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

The other question is whether such an index would prevent an update from
being HOT when the non-indexed values are touched.  That could be a
significant difference.

I don't see Index-Only-Scans being something that will be used in
"high churn" tables.

So as long as the value of these "covering/included" fields is tied to
index-only scans, maybe it isn't a problem?

Of course, we have have a hard time convincing people that the "index
only" scans they want can't be "index only" because heap pages aren't
"all visible"...

a.

--
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#7)
Re: Covering Indexes

Alvaro Herrera <alvherre@commandprompt.com> writes:

The other question is whether such an index would prevent an update from
being HOT when the non-indexed values are touched.

Surely it would *have* to, whether the columns are significant or not
for uniqueness purposes. Else an index-only scan gives the wrong value
after the update.

regards, tom lane

#10Jeff Janes
jeff.janes@gmail.com
In reply to: Alvaro Herrera (#7)
Re: Covering Indexes

On Thu, Jun 28, 2012 at 9:12 AM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

Excerpts from Tom Lane's message of jue jun 28 12:07:58 -0400 2012:

When this came up a couple weeks ago, the argument that was made for it
was that you could attach non-significant columns to an index that *is*
unique.  That might or might not be a wide enough use-case to justify
adding such a horrid kludge.

The other question is whether such an index would prevent an update from
being HOT when the non-indexed values are touched.

That seems like an easy question to answer. How could it not disable
HOT and still work correctly?

 That could be a
significant difference.

True, adding the covering column would not always be a win. But
surely it more likely to be a win when it can be done without adding
yet another index that also needs to be maintained.

Cheers,

Jeff

#11Csaba Nagy
ncslists@googlemail.com
In reply to: Jeff Janes (#4)
Re: Covering Indexes

Hi all,

On Thu, Jun 28, 2012 at 5:16 AM, David E. Wheeler <david@justatheory.com> wrote:
I don't see the virtue of this in this case. Since the index is not
unique, why not just put the index on (a,b,c,d) and be done with it?
Is there some advantage to be had in inventing a way to store c and d
in the index without having them usable for indexing?

Why not restrict it to UNIQUE indexes ?

For not unique indexes it has no advantages (it could be in fact indexed
on all columns anyway as an implementation detail).

For the unique case the problem of many identical entries mentioned by
Tom is not relevant, so the additional data will only bloat the index
but not otherwise affect the index performance.

Would this get close enough to index-covered table ? _That_ would be
interesting - I have a really big table (table/index size: 370G/320G,
~8*10^9 rows) which is basically using double space because it's primary
key is covering all columns of the table.

Cheers,
Csaba.

#12Eric McKeeth
eldin00@gmail.com
In reply to: Rob Wultsch (#3)
Re: Covering Indexes

On Thu, Jun 28, 2012 at 7:02 AM, Rob Wultsch <wultsch@gmail.com> wrote:

On Thu, Jun 28, 2012 at 8:16 AM, David E. Wheeler <david@justatheory.com> wrote:

Hackers,

Very interesting design document for SQLite 4:

 http://www.sqlite.org/src4/doc/trunk/www/design.wiki

I'm particularly intrigued by "covering indexes". For example:

   CREATE INDEX cover1 ON table1(a,b) COVERING(c,d);

This allows the following query to do an index-only scan:

   SELECT c, d FROM table1 WHERE a=? AND b=?;

Now that we have index-only scans in 9.2, I'm wondering if it would make sense to add covering index support, too, where additional, unindexed columns are stored alongside indexed columns.

And I wonder if it would work well with expressions, too?

David

IRC MS SQL also allow unindexed columns in the index.

--
Rob Wultsch
wultsch@gmail.com

MS SQL Server does support including non-key columns in indexes,
allowing index only scans for queries returning these columns. Their
syntax is different from that proposed in the linked SQLite document.
To the best of my experience, the advantages in SQL Server to such an
index (as opposed to just adding the columns to the index normally)
are as follows:

1- You can include extra columns in a unique index which do not
participate in the uniqueness determination.
2- The non-key columns can be of types which normally cannot be
included in a b-tree index due to lack of proper sorting operators.
3- The resulting index is smaller, because the non-key columns are
only contained in leaf nodes, not in internal nodes.
4- The insert/update overhead is lower.

Of course, an implementation in a different database system, such as
Postgres, may or may not have the same set of benefits. Number 4 in
particular seems to be dependent on the details of the b-tree
implementation. In my mind numbers 1 and 2 are the most compelling
arguments in favor of this feature. Whether the it's worth the effort
of coding the feature would depend on how well the above benefits (or
any benefits I missed) hold true, and how useful such an index
actually proved for index only scans in Postgres.

#13Thomas Munro
thomas.munro@gmail.com
In reply to: Rob Wultsch (#3)
Re: Covering Indexes

On 28 June 2012 14:02, Rob Wultsch <wultsch@gmail.com> wrote:

On Thu, Jun 28, 2012 at 8:16 AM, David E. Wheeler <david@justatheory.com> wrote:

I'm particularly intrigued by "covering indexes". For example:

   CREATE INDEX cover1 ON table1(a,b) COVERING(c,d);

IRC MS SQL also allow unindexed columns in the index.

For what it's worth, DB2 also has this feature, written roughly the
same way as MS SQL Server: CREATE INDEX cover1 ON table1(a, b) INCLUDE
(c, d).

http://pic.dhe.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.sql.ref.doc/doc/r0000919.html

Oracle doesn't seem to have this feature (and the SQL standard doesn't
mention indexes at all).

#14Bruce Momjian
bruce@momjian.us
In reply to: Csaba Nagy (#11)
Re: Covering Indexes

On Fri, Jun 29, 2012 at 08:10:03AM +0200, Csaba Nagy wrote:

Hi all,

On Thu, Jun 28, 2012 at 5:16 AM, David E. Wheeler <david@justatheory.com> wrote:
I don't see the virtue of this in this case. Since the index is not
unique, why not just put the index on (a,b,c,d) and be done with it?
Is there some advantage to be had in inventing a way to store c and d
in the index without having them usable for indexing?

Why not restrict it to UNIQUE indexes ?

For not unique indexes it has no advantages (it could be in fact indexed
on all columns anyway as an implementation detail).

For the unique case the problem of many identical entries mentioned by
Tom is not relevant, so the additional data will only bloat the index
but not otherwise affect the index performance.

Would this get close enough to index-covered table ? _That_ would be
interesting - I have a really big table (table/index size: 370G/320G,
~8*10^9 rows) which is basically using double space because it's primary
key is covering all columns of the table.

I was wondering if there was some way to specify an expression index
that did a unique index check on some columns but included more columns
not part of the unique index.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#15Cédric Villemain
cedric@2ndquadrant.com
In reply to: Bruce Momjian (#14)
Re: Covering Indexes

Le vendredi 6 juillet 2012 15:41:01, Bruce Momjian a écrit :

On Fri, Jun 29, 2012 at 08:10:03AM +0200, Csaba Nagy wrote:

Hi all,

On Thu, Jun 28, 2012 at 5:16 AM, David E. Wheeler
<david@justatheory.com> wrote: I don't see the virtue of this in this
case. Since the index is not unique, why not just put the index on
(a,b,c,d) and be done with it? Is there some advantage to be had in
inventing a way to store c and d in the index without having them
usable for indexing?

Why not restrict it to UNIQUE indexes ?

For not unique indexes it has no advantages (it could be in fact indexed
on all columns anyway as an implementation detail).

For the unique case the problem of many identical entries mentioned by
Tom is not relevant, so the additional data will only bloat the index
but not otherwise affect the index performance.

Would this get close enough to index-covered table ? _That_ would be
interesting - I have a really big table (table/index size: 370G/320G,
~8*10^9 rows) which is basically using double space because it's primary
key is covering all columns of the table.

I was wondering if there was some way to specify an expression index
that did a unique index check on some columns but included more columns
not part of the unique index.

I haven't tryed it, but I suppose that Exclusion Constraint should allow that.

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Csaba Nagy (#11)
Re: Covering Indexes

Csaba Nagy <ncslists@googlemail.com> writes:

Why not restrict it to UNIQUE indexes ?

What benefit would such a restriction provide? AFAICS it doesn't make
implementation any easier.

regards, tom lane

#17Simon Riggs
simon@2ndQuadrant.com
In reply to: David E. Wheeler (#1)
Re: Covering Indexes

On 28 June 2012 13:16, David E. Wheeler <david@justatheory.com> wrote:

Very interesting design document for SQLite 4:

http://www.sqlite.org/src4/doc/trunk/www/design.wiki

I'm particularly intrigued by "covering indexes". For example:

CREATE INDEX cover1 ON table1(a,b) COVERING(c,d);

This allows the following query to do an index-only scan:

SELECT c, d FROM table1 WHERE a=? AND b=?;

Now that we have index-only scans in 9.2, I'm wondering if it would make sense to add covering index support, too, where additional, unindexed columns are stored alongside indexed columns.

Just to be clear, the ability to have covered indexes is already in
9.2, I updated the release notes to explain that a few months back.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#18David E. Wheeler
david@kineticode.com
In reply to: Simon Riggs (#17)
Re: Covering Indexes

On Jul 17, 2012, at 5:18 PM, Simon Riggs wrote:

Now that we have index-only scans in 9.2, I'm wondering if it would make sense to add covering index support, too, where additional, unindexed columns are stored alongside indexed columns.

Just to be clear, the ability to have covered indexes is already in
9.2, I updated the release notes to explain that a few months back.

You mean this?

Allow queries to retrieve data only from indexes, avoiding heap access (Robert Haas, Ibrar Ahmed, Heikki Linnakangas, Tom Lane)

This is often called "index-only scans" or "covering indexes". This is possible for heap pages with exclusively all-visible tuples, as reported by the visibility map. The visibility map was made crash-safe as a necessary part of implementing this feature.

That’s not how SQLite is using the term “covering index.” What they mean is the ability to have additional, unindexed columns in an index, so that they can *also* be returned in the event of an index-only scan.

Best,

David

#19Simon Riggs
simon@2ndQuadrant.com
In reply to: David E. Wheeler (#18)
Re: Covering Indexes

On 17 July 2012 16:21, David E. Wheeler <david@justatheory.com> wrote:

On Jul 17, 2012, at 5:18 PM, Simon Riggs wrote:

Now that we have index-only scans in 9.2, I'm wondering if it would make sense to add covering index support, too, where additional, unindexed columns are stored alongside indexed columns.

Just to be clear, the ability to have covered indexes is already in
9.2, I updated the release notes to explain that a few months back.

You mean this?

Allow queries to retrieve data only from indexes, avoiding heap access (Robert Haas, Ibrar Ahmed, Heikki Linnakangas, Tom Lane)

This is often called "index-only scans" or "covering indexes". This is possible for heap pages with exclusively all-visible tuples, as reported by the visibility map. The visibility map was made crash-safe as a necessary part of implementing this feature.

That’s not how SQLite is using the term “covering index.” What they mean is the ability to have additional, unindexed columns in an index, so that they can *also* be returned in the event of an index-only scan.

CREATE INDEX ON foo (a, b, c, d);

allows

SELECT c, d FROM foo WHERE a = ? AND b = ?

to use an index only scan.

The phrase "unindexed" seems misleading since the data is clearly in
the index from the description on the URL you gave. And since the
index is non-unique, I don't see any gap between Postgres and
SQLliite4.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#20David E. Wheeler
david@kineticode.com
In reply to: Simon Riggs (#19)
Re: Covering Indexes

On Jul 17, 2012, at 5:32 PM, Simon Riggs wrote:

CREATE INDEX ON foo (a, b, c, d);

allows

SELECT c, d FROM foo WHERE a = ? AND b = ?

to use an index only scan.

The phrase "unindexed" seems misleading since the data is clearly in
the index from the description on the URL you gave. And since the
index is non-unique, I don't see any gap between Postgres and
SQLliite4.

Yeah, but that index is unnecessarily big if one will never use c or d in the search. The nice thing about covering indexes as described for SQLite 4 and implemented in MSSQL is that you can specify additional columns that just come along for the ride, but are not part of the indexed data:

CREATE INDEX cover1 ON table1(a,b) COVERING(c,d);

Yes, you can do that by also indexing c and d as of 9.2, but it might be nice to be able to include them in the index as additional row data without actually indexing them.

Best,

David

#21Simon Riggs
simon@2ndQuadrant.com
In reply to: David E. Wheeler (#20)
#22Vik Fearing
vik@postgresfriends.org
In reply to: Simon Riggs (#21)
#23David G. Johnston
david.g.johnston@gmail.com
In reply to: David E. Wheeler (#20)
#24Simon Riggs
simon@2ndQuadrant.com
In reply to: David G. Johnston (#23)
#25Andrew Dunstan
andrew@dunslane.net
In reply to: David G. Johnston (#23)
#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: David E. Wheeler (#20)
#27Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#24)
#28Merlin Moncure
mmoncure@gmail.com
In reply to: Bruce Momjian (#27)
#29Robert Haas
robertmhaas@gmail.com
In reply to: Merlin Moncure (#28)
#30Merlin Moncure
mmoncure@gmail.com
In reply to: Robert Haas (#29)
#31Jeff Davis
pgsql@j-davis.com
In reply to: Bruce Momjian (#27)
#32Jeff Davis
pgsql@j-davis.com
In reply to: Bruce Momjian (#27)
#33Merlin Moncure
mmoncure@gmail.com
In reply to: Jeff Davis (#32)
#34Jeff Davis
pgsql@j-davis.com
In reply to: Merlin Moncure (#33)
#35Jeff Janes
jeff.janes@gmail.com
In reply to: Merlin Moncure (#33)
#36Florian Weimer
fw@deneb.enyo.de
In reply to: Jeff Janes (#4)