the big picture for index-only scans

Started by Robert Haasalmost 15 years ago88 messageshackers
Jump to latest
#1Robert Haas
robertmhaas@gmail.com

So, what do we need in order to find our way to index-only scans?

1. The visibility map needs to be crash-safe. The basic idea of
index-only scans is that, instead of checking the heap to find out
whether each tuple is visible, we first check the visibility map. If
the visibility map bit is set, then we know all tuples on the page are
visible to all transactions, and therefore the tuple of interest is
visible to our transaction. Assuming that a significant number of
visibility map bits are set, this should enable us to avoid a fair
amount of I/O, especially on large tables, because the visibility map
is roughly 8000 times smaller than the heap, and therefore far more
practical to keep in cache. However, before we can rely on the
visibility map for this purpose, we need to fix the problems that can
leave bits set inappropriately in the face of an inconveniently-timed
crash. I've been working on a patch for this on and off for a few
months now; my latest version is in need of review[1]http://archives.postgresql.org/pgsql-hackers/2011-05/msg00292.php.

2. Crash safe visibility map vs. pg_upgrade. Even if we make the
visibility map crash-safe in 9.2, people are going to want to use
pg_upgrade to migrate from older versions, bringing their
possibly-not-quite-correct visibility map forks along with them. How
should we handle that? We could (2A) arrange to have pg_upgrade nuke
all visibility forks when upgrading from a release where the
visibility map is not crash-safe to one where it is; (2B) keep a piece
of state somewhere indicating, for each relation, whether or not the
visibility map can be trusted, set it to false only if pg_upgrade
brings the relation over from and older version, and set it to true
after a successful vacuum that skips no intervening pages; or (2C)
advise the user to do a VACUUM FULL on each of their tables
pre-upgrade, and if they don't, treat wrong answers as their own
fault. (I doubt anyone will advocate for this option, but for the
sake of completeness...). (2A) seems like the simplest solution,
especially because it also avoids the overhead of checking the "is the
visibility map bit reliable?" flag every time we want to plan a query.

3. Statistics. I believe that in order to accurately estimate the
cost of an index-only scan, we're going to need to know the fraction
of tuples that are on pages whose visibility map bits are set. I
believe it should be fairly straightforward to have ANALYZE collect
this information; and I'm inclined to do that as a separate patch. It
seems like it would also be nice to know what fraction of tuples are
on pages that don't have the visibility map set but where, in fact,
all tuples on the page are visible to all transactions, so it would be
legal to set the bit. A large discrepancy between these two
percentages might be a good reason to auto-vacuum the table (perhaps
using a "really lazy vacuum"[2]http://archives.postgresql.org/pgsql-hackers/2011-03/msg00946.php). I'm not sure if this can be added
cheaply, though, and in any case, any change to the set of criteria
that will trigger an auto-vacuum is probably a can of worms.
Thoughts?

4. Planner and executor changes. In contrast to Heikki's original
implementation, I'm inclined to not to try to split the Index Scan
node into index scan and heap fetch. Since there are many choices for
where to put the heap fetch node (any level of the join tree between
the index scan and the root), this seems likely to result in a
combinatorial explosion of paths[3]http://archives.postgresql.org/pgsql-hackers/2009-09/msg01379.php, and I'm not real sure that the
payback will be adequate. Furthermore, the idea of allowing user code
to see tuples that will only later be determined not to have been
visible to that MVCC snapshot in the first place seems pretty scary
from a security perspective, though certainly there are possible
benefits[4]http://archives.postgresql.org/pgsql-hackers/2009-07/msg00675.php. Instead, I'm inclined to just have the planner evaluate
whether the necessary columns can be extracted purely from the index.
If not, we proceed as now. If so, we can use the "index only"
approach of using the visibility map to decide which heap fetches can
be skipped. It's not clear to me whether we need to compare the cost
of the standard approach with the cost of the "index only" approach:
in theory, if there aren't any visibility map bits anyway, the "index
only" approach could be slower. But I'm not sure whether that problem
is significant or common enough to be worth expending a lot of code
on. Either way, the number of actual paths doesn't need to increase,
because in this design, even if we apply a costing model, one approach
will dominate the other. Heikki also suggested considering index
scans in cases where we don't now[4, again] but I'm inclined to leave
this, too, for a later optimization, again because balancing the
increase in paths against the possible performance benefits of using
indexes in more situations seems finicky. In short, for a first cut
at this, I just want to look at this as a way to get cheaper index
scans, and leave everything else to future work.

Any thoughts welcome. Incidentally, if anyone else feels like working
on this, feel free to let me know and I'm happy to step away, from all
of it or from whatever part someone else wants to tackle. I'm mostly
working on this because it's something that I think we really need to
get done, more than having a burning desire to be the one who does it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1]: http://archives.postgresql.org/pgsql-hackers/2011-05/msg00292.php
[2]: http://archives.postgresql.org/pgsql-hackers/2011-03/msg00946.php
[3]: http://archives.postgresql.org/pgsql-hackers/2009-09/msg01379.php
[4]: http://archives.postgresql.org/pgsql-hackers/2009-07/msg00675.php

#2Merlin Moncure
mmoncure@gmail.com
In reply to: Robert Haas (#1)
Re: the big picture for index-only scans

On Mon, May 9, 2011 at 10:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:

So, what do we need in order to find our way to index-only scans?

1. The visibility map needs to be crash-safe.  The basic idea of
index-only scans is that, instead of checking the heap to find out
whether each tuple is visible, we first check the visibility map.  If
the visibility map bit is set, then we know all tuples on the page are
visible to all transactions, and therefore the tuple of interest is
visible to our transaction.  Assuming that a significant number of
visibility map bits are set, this should enable us to avoid a fair
amount of I/O, especially on large tables, because the visibility map
is roughly 8000 times smaller than the heap, and therefore far more
practical to keep in cache.

hm, what are the implications for tuple hint bits, short and long
term? I'm particularly interested if you think any hint bit i/o
mitigation strategies are worth pursuing.

2. Crash safe visibility map vs. pg_upgrade.  Even if we make the
visibility map crash-safe in 9.2, people are going to want to use
pg_upgrade to migrate from older versions, bringing their
possibly-not-quite-correct visibility map forks along with them.  How
should we handle that?  We could (2A) arrange to have pg_upgrade nuke
all visibility forks when upgrading from a release where the
visibility map is not crash-safe to one where it is;

+1 on 2A.

3. Statistics.  I believe that in order to accurately estimate the
cost of an index-only scan, we're going to need to know the fraction
of tuples that are on pages whose visibility map bits are set.

It would be helpful to know the performance benefit of index only
scans before knowing how much benefit to attribute here. Maybe a
system wide kludge would for starters anyway, like assuming 60% of
pages can be vis checked from the VM, or a single GUC, Then again,
maybe not.

merlin

#3Robert Haas
robertmhaas@gmail.com
In reply to: Merlin Moncure (#2)
Re: the big picture for index-only scans

On Mon, May 9, 2011 at 10:36 PM, Merlin Moncure <mmoncure@gmail.com> wrote:

1. The visibility map needs to be crash-safe.  The basic idea of
index-only scans is that, instead of checking the heap to find out
whether each tuple is visible, we first check the visibility map.  If
the visibility map bit is set, then we know all tuples on the page are
visible to all transactions, and therefore the tuple of interest is
visible to our transaction.  Assuming that a significant number of
visibility map bits are set, this should enable us to avoid a fair
amount of I/O, especially on large tables, because the visibility map
is roughly 8000 times smaller than the heap, and therefore far more
practical to keep in cache.

hm, what are the implications for tuple hint bits, short and long
term?  I'm particularly interested if you think any hint bit i/o
mitigation strategies are worth pursuing.

Well, I don't really want to let this thread on my project get
hijacked to talk about your project (not that I haven't been guilty of
that myself!) but, in brief, I think the main effect of index-only
scans is that the performance difference between a vacuumed table and
an unvacuumed table is going to increase. It's already the case that
sequential scanning a table which has been vacuumed (and, therefore,
all the pages are marked all-visible) is noticeably faster than
sequential scanning a table which is not vacuumed (even if all the
hint bits are set). Index-only scans are going to extend that by
making index scans run faster on a table with lots of all-visible
tables than on one where no pages are all-visible. So the importance
of vacuuming an insert-only table occasionally (which autovacuum won't
do, at present, until it's needed to prevent XID wraparound) is
already more than zero, and it's going to go up. But the all-visible
bits aren't quite the same as hint bits: I don't think there's any
impact on hint bits per se.

2. Crash safe visibility map vs. pg_upgrade.  Even if we make the
visibility map crash-safe in 9.2, people are going to want to use
pg_upgrade to migrate from older versions, bringing their
possibly-not-quite-correct visibility map forks along with them.  How
should we handle that?  We could (2A) arrange to have pg_upgrade nuke
all visibility forks when upgrading from a release where the
visibility map is not crash-safe to one where it is;

+1 on 2A.

OK. Anybody else?

3. Statistics.  I believe that in order to accurately estimate the
cost of an index-only scan, we're going to need to know the fraction
of tuples that are on pages whose visibility map bits are set.

It would be helpful to know the performance benefit of index only
scans before knowing how much benefit to attribute here.  Maybe a
system wide kludge would for starters anyway, like assuming 60% of
pages can be vis checked from the VM, or a single GUC, Then again,
maybe not.

Yeah, maybe I should try to beat the main patch into some kind of
shape before working too much on the statistics stuff. Then we could
actually benchmark it a bit, which would be good. I don't think that
a system-wide kludge or GUC is going to work for prime time, but it's
probably fine for initial performance testing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#4Merlin Moncure
mmoncure@gmail.com
In reply to: Robert Haas (#3)
Re: the big picture for index-only scans

On Tue, May 10, 2011 at 8:22 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, May 9, 2011 at 10:36 PM, Merlin Moncure <mmoncure@gmail.com> wrote:

1. The visibility map needs to be crash-safe.  The basic idea of
index-only scans is that, instead of checking the heap to find out
whether each tuple is visible, we first check the visibility map.  If
the visibility map bit is set, then we know all tuples on the page are
visible to all transactions, and therefore the tuple of interest is
visible to our transaction.  Assuming that a significant number of
visibility map bits are set, this should enable us to avoid a fair
amount of I/O, especially on large tables, because the visibility map
is roughly 8000 times smaller than the heap, and therefore far more
practical to keep in cache.

hm, what are the implications for tuple hint bits, short and long
term?  I'm particularly interested if you think any hint bit i/o
mitigation strategies are worth pursuing.

Well, I don't really want to let this thread on my project get
hijacked to talk about your project (not that I haven't been guilty of
that myself!)

no, that wasn't my intent at all, except in the sense of wondering if
a crash-safe visibility map provides a route of displacing a lot of
hint bit i/o and by extension, making alternative approaches of doing
that, including mine, a lot less useful. that's a good thing.

meaning: since the vis map approach is going to be a fairly large win
over the classic approach to checking visibility in so many scenarios,
maybe the real long term goal should be just being as aggressive as
possible in terms of making sure it's set properly, and just give up
and be a bit more brute forcey when it's not set. it's a fair
question. that's a pretty broad statement, but that's what I'm
thinking about.

merlin

#5Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Robert Haas (#1)
Re: the big picture for index-only scans

2011/5/10 Robert Haas <robertmhaas@gmail.com>:

So, what do we need in order to find our way to index-only scans?

3. Statistics.  I believe that in order to accurately estimate the
cost of an index-only scan, we're going to need to know the fraction
of tuples that are on pages whose visibility map bits are set.  I
believe it should be fairly straightforward to have ANALYZE collect
this information; and I'm inclined to do that as a separate patch.  It
seems like it would also be nice to know what fraction of tuples are
on pages that don't have the visibility map set but where, in fact,
all tuples on the page are visible to all transactions, so it would be
legal to set the bit.  A large discrepancy between these two
percentages might be a good reason to auto-vacuum the table (perhaps
using a "really lazy vacuum"[2]).  I'm not sure if this can be added
cheaply, though, and in any case, any change to the set of criteria
that will trigger an auto-vacuum is probably a can of worms.
Thoughts?

ANALYZE can do the stats job for 'free' on the pages it collects
anyway. So that looks like a good idea.
I believe the really lazy vacuum is another topic; even if it will
improve the performance of the index only scan to have tables already
vacuuumed, the stats should expose that and the function
cost_index(_only?)() taking care of that.

4. Planner and executor changes.  In contrast to Heikki's original
implementation, I'm inclined to not to try to split the Index Scan
node into index scan and heap fetch.  Since there are many choices for
where to put the heap fetch node (any level of the join tree between
the index scan and the root), this seems likely to result in a
combinatorial explosion of paths[3], and I'm not real sure that the
payback will be adequate.  Furthermore, the idea of allowing user code
to see tuples that will only later be determined not to have been
visible to that MVCC snapshot in the first place seems pretty scary
from a security perspective, though certainly there are possible
benefits[4].  Instead, I'm inclined to just have the planner evaluate
whether the necessary columns can be extracted purely from the index.

The temptation is high to estimate the cost of an "index_scan(only) +
ordered(by ctid) table pages fetch if heap required". (this is what I
understood from heikki suggestion 3-4. and it makes sense). It may be
easier to implement both at once but I didn't find the branch in the
Heikki's git repos. (probably removed since the long time)

If not, we proceed as now.  If so, we can use the "index only"
approach of using the visibility map to decide which heap fetches can
be skipped.  It's not clear to me whether we need to compare the cost
of the standard approach with the cost of the "index only" approach:
in theory, if there aren't any visibility map bits anyway, the "index
only" approach could be slower.  But I'm not sure whether that problem
is significant or common enough to be worth expending a lot of code
on.  Either way, the number of actual paths doesn't need to increase,
because in this design, even if we apply a costing model, one approach
will dominate the other.  Heikki also suggested considering index
scans in cases where we don't now[4, again] but I'm inclined to leave
this, too, for a later optimization, again because balancing the
increase in paths against the possible performance benefits of using
indexes in more situations seems finicky.  In short, for a first cut
at this, I just want to look at this as a way to get cheaper index
scans, and leave everything else to future work.

Based on ANALYZE stats for the visibility, I believe cost_index and
cost_index_only should be very similar functions (well, atm, I don't
see the point to split it in 2 functions).

Any thoughts welcome.  Incidentally, if anyone else feels like working
on this, feel free to let me know and I'm happy to step away, from all
of it or from whatever part someone else wants to tackle.  I'm mostly
working on this because it's something that I think we really need to
get done, more than having a burning desire to be the one who does it.

Indexonly scans are welcome!
I believe I can help on 3 and 4, but (really) not sure for 1 and 2.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] http://archives.postgresql.org/pgsql-hackers/2011-05/msg00292.php
[2] http://archives.postgresql.org/pgsql-hackers/2011-03/msg00946.php
[3] http://archives.postgresql.org/pgsql-hackers/2009-09/msg01379.php
[4] http://archives.postgresql.org/pgsql-hackers/2009-07/msg00675.php

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

#6Robert Haas
robertmhaas@gmail.com
In reply to: Cédric Villemain (#5)
Re: the big picture for index-only scans

On Tue, May 10, 2011 at 10:58 AM, Cédric Villemain
<cedric.villemain.debian@gmail.com> wrote:

ANALYZE can do the stats job for 'free' on the pages it collects
anyway. So that looks like a good idea.
I believe the really lazy vacuum is another topic; even if it will
improve the performance of the index only scan to have tables already
vacuuumed, the stats should expose that and the function
cost_index(_only?)() taking care of that.

I basically agree. The connection is that - as we use the all-visible
for more things, the performance penalty for failing to vacuum (say)
an insert-only table will continue to grow. Still, as you say,
clearly a separate topic.

The temptation is high to estimate the cost of an "index_scan(only) +
ordered(by ctid) table pages fetch if heap required". (this is what I
understood from heikki suggestion 3-4. and it makes sense). It may be
easier to implement both at once but I didn't find the branch in the
Heikki's git repos. (probably removed since the long time)

I was thinking about this as well, at least if I understand you
correctly. That would be similar to a bitmap index scan, and I think
it would be a great thing to have, not only because it would allow us
to get the advantages of index-only scans in situations that are
well-suited to our current bitmap scans, but also because it could be
batched. You could allocate a buffer of work_mem bytes and fill it up
with TIDs; then, when it's full, you sort the buffer and start doing
the necessary heap fetches in physical order. If you still need more
rows, you can clear the buffer and go around for another pass.

Based on ANALYZE stats for the visibility, I believe cost_index and
cost_index_only should be very similar functions (well, atm, I don't
see the point to split it in 2 functions).

Yeah, I would more imagine modifying the existing function.

Any thoughts welcome.  Incidentally, if anyone else feels like working
on this, feel free to let me know and I'm happy to step away, from all
of it or from whatever part someone else wants to tackle.  I'm mostly
working on this because it's something that I think we really need to
get done, more than having a burning desire to be the one who does it.

Indexonly scans are welcome!
I believe I can help on 3 and 4, but (really) not sure for 1 and 2.

Well, I have code for #1, and just need reviews, and #2 shouldn't be
that hard, and with luck I'll twist Bruce's arm into doing it (*waves
to Bruce*). So #3 and #4 are the next thing to tackle. Any thoughts
on what/how you'd like to contribute there?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#7Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Robert Haas (#6)
Re: the big picture for index-only scans

2011/5/10 Robert Haas <robertmhaas@gmail.com>:

On Tue, May 10, 2011 at 10:58 AM, Cédric Villemain
<cedric.villemain.debian@gmail.com> wrote:

ANALYZE can do the stats job for 'free' on the pages it collects
anyway. So that looks like a good idea.
I believe the really lazy vacuum is another topic; even if it will
improve the performance of the index only scan to have tables already
vacuuumed, the stats should expose that and the function
cost_index(_only?)() taking care of that.

I basically agree.  The connection is that - as we use the all-visible
for more things, the performance penalty for failing to vacuum (say)
an insert-only table will continue to grow.  Still, as you say,
clearly a separate topic.

The temptation is high to estimate the cost of an "index_scan(only) +
ordered(by ctid) table pages fetch if heap required". (this is what I
understood from heikki suggestion 3-4. and it makes sense). It may be
easier to implement both at once but I didn't find the branch in the
Heikki's git repos. (probably removed since the long time)

I was thinking about this as well, at least if I understand you
correctly.  That would be similar to a bitmap index scan, and I think
it would be a great thing to have, not only because it would allow us
to get the advantages of index-only scans in situations that are
well-suited to our current bitmap scans, but also because it could be
batched.  You could allocate a buffer of work_mem bytes and fill it up
with TIDs; then, when it's full, you sort the buffer and start doing
the necessary heap fetches in physical order.  If you still need more
rows, you can clear the buffer and go around for another pass.

Based on ANALYZE stats for the visibility, I believe cost_index and
cost_index_only should be very similar functions (well, atm, I don't
see the point to split it in 2 functions).

Yeah, I would more imagine modifying the existing function.

Any thoughts welcome.  Incidentally, if anyone else feels like working
on this, feel free to let me know and I'm happy to step away, from all
of it or from whatever part someone else wants to tackle.  I'm mostly
working on this because it's something that I think we really need to
get done, more than having a burning desire to be the one who does it.

Indexonly scans are welcome!
I believe I can help on 3 and 4, but (really) not sure for 1 and 2.

Well, I have code for #1, and just need reviews, and #2 shouldn't be
that hard, and with luck I'll twist Bruce's arm into doing it (*waves
to Bruce*).  So #3 and #4 are the next thing to tackle.  Any thoughts
on what/how you'd like to contribute there?

I can provide initial patchs for cost and analyze, at least.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

#8Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#1)
Re: the big picture for index-only scans

On Tue, May 10, 2011 at 3:25 AM, Robert Haas <robertmhaas@gmail.com> wrote:

So, what do we need in order to find our way to index-only scans?

1. The visibility map needs to be crash-safe.  The basic idea of
index-only scans is that, instead of checking the heap to find out
whether each tuple is visible, we first check the visibility map.  If

This topic has been discussed many times, yet I have never seen an
assessment that explains WHY we would want to do index-only scans.

This will be a complex addition to the codebase and one that could
introduce bugs into MVCC. It seems reasonable to look at what the
benefit of this would be, and what the use case/ benefit profile is
before we spend a long time adding this optimization.

I asked for this previously on earlier threads also.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

#9Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Robert Haas (#6)
Re: the big picture for index-only scans

The temptation is high to estimate the cost of an "index_scan(only) +
ordered(by ctid) table pages fetch if heap required". (this is what I
understood from heikki suggestion 3-4. and it makes sense). It may be
easier to implement both at once but I didn't find the branch in the
Heikki's git repos. (probably removed since the long time)

I was thinking about this as well, at least if I understand you

yes.

correctly.  That would be similar to a bitmap index scan, and I think
it would be a great thing to have, not only because it would allow us
to get the advantages of index-only scans in situations that are
well-suited to our current bitmap scans, but also because it could be
batched.  You could allocate a buffer of work_mem bytes and fill it up
with TIDs; then, when it's full, you sort the buffer and start doing
the necessary heap fetches in physical order.  If you still need more
rows, you can clear the buffer and go around for another pass.

Issue remaining here is that we don't have 'safe' Indexonly_scan, just
indexscan with probability on the 'only'.

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

#10Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Simon Riggs (#8)
Re: the big picture for index-only scans

Simon Riggs <simon@2ndQuadrant.com> wrote:

This topic has been discussed many times, yet I have never seen an
assessment that explains WHY we would want to do index-only scans.

In databases with this feature, it's not too unusual for a query
which uses just an index to run one or more orders of magnitude
faster than a query which has to randomly access the heap for each
index entry. That seems like enough evidence of its possible value
in PostgreSQL to proceed to the point where benchmarks become
possible. I'm assuming that, like all other features added as
performance optimizations, it won't be committed until there are
benchmarks showing the net benefit.

As a thought experiment, picture the relative costs of scanning a
portion of an index in index sequence, and being done, versus
scanning a portion of an index in index sequence and jumping to a
random heap access for each index entry as you go.

-Kevin

#11Robert Haas
robertmhaas@gmail.com
In reply to: Cédric Villemain (#7)
Re: the big picture for index-only scans

On Tue, May 10, 2011 at 11:27 AM, Cédric Villemain
<cedric.villemain.debian@gmail.com> wrote:

2011/5/10 Robert Haas <robertmhaas@gmail.com>:

On Tue, May 10, 2011 at 10:58 AM, Cédric Villemain
<cedric.villemain.debian@gmail.com> wrote:

ANALYZE can do the stats job for 'free' on the pages it collects
anyway. So that looks like a good idea.
I believe the really lazy vacuum is another topic; even if it will
improve the performance of the index only scan to have tables already
vacuuumed, the stats should expose that and the function
cost_index(_only?)() taking care of that.

I basically agree.  The connection is that - as we use the all-visible
for more things, the performance penalty for failing to vacuum (say)
an insert-only table will continue to grow.  Still, as you say,
clearly a separate topic.

The temptation is high to estimate the cost of an "index_scan(only) +
ordered(by ctid) table pages fetch if heap required". (this is what I
understood from heikki suggestion 3-4. and it makes sense). It may be
easier to implement both at once but I didn't find the branch in the
Heikki's git repos. (probably removed since the long time)

I was thinking about this as well, at least if I understand you
correctly.  That would be similar to a bitmap index scan, and I think
it would be a great thing to have, not only because it would allow us
to get the advantages of index-only scans in situations that are
well-suited to our current bitmap scans, but also because it could be
batched.  You could allocate a buffer of work_mem bytes and fill it up
with TIDs; then, when it's full, you sort the buffer and start doing
the necessary heap fetches in physical order.  If you still need more
rows, you can clear the buffer and go around for another pass.

Based on ANALYZE stats for the visibility, I believe cost_index and
cost_index_only should be very similar functions (well, atm, I don't
see the point to split it in 2 functions).

Yeah, I would more imagine modifying the existing function.

Any thoughts welcome.  Incidentally, if anyone else feels like working
on this, feel free to let me know and I'm happy to step away, from all
of it or from whatever part someone else wants to tackle.  I'm mostly
working on this because it's something that I think we really need to
get done, more than having a burning desire to be the one who does it.

Indexonly scans are welcome!
I believe I can help on 3 and 4, but (really) not sure for 1 and 2.

Well, I have code for #1, and just need reviews, and #2 shouldn't be
that hard, and with luck I'll twist Bruce's arm into doing it (*waves
to Bruce*).  So #3 and #4 are the next thing to tackle.  Any thoughts
on what/how you'd like to contribute there?

I can provide initial patchs for cost and analyze, at least.

OK, cool.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#10)
Re: the big picture for index-only scans

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

Simon Riggs <simon@2ndQuadrant.com> wrote:

This topic has been discussed many times, yet I have never seen an
assessment that explains WHY we would want to do index-only scans.

In databases with this feature, it's not too unusual for a query
which uses just an index to run one or more orders of magnitude
faster than a query which has to randomly access the heap for each
index entry. That seems like enough evidence of its possible value
in PostgreSQL to proceed to the point where benchmarks become
possible. I'm assuming that, like all other features added as
performance optimizations, it won't be committed until there are
benchmarks showing the net benefit.

As a thought experiment, picture the relative costs of scanning a
portion of an index in index sequence, and being done, versus
scanning a portion of an index in index sequence and jumping to a
random heap access for each index entry as you go.

It's already the case that we'll flip over to a bitmap indexscan,
and thus get rid of most/all of the "random" page accesses, in
situations where this is likely to be a big win. Pointing to the
performance difference in databases that don't do that is therefore
not too convincing.

I'm inclined to agree that index-only scans might be worth the amount
of work that's involved ... but I share Simon's desire to see some proof
before anything gets committed.

regards, tom lane

#13Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#12)
Re: the big picture for index-only scans

On Tue, May 10, 2011 at 12:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

Simon Riggs <simon@2ndQuadrant.com> wrote:

This topic has been discussed many times, yet I have never seen an
assessment that explains WHY we would want to do index-only scans.

In databases with this feature, it's not too unusual for a query
which uses just an index to run one or more orders of magnitude
faster than a query which has to randomly access the heap for each
index entry.  That seems like enough evidence of its possible value
in PostgreSQL to proceed to the point where benchmarks become
possible.  I'm assuming that, like all other features added as
performance optimizations, it won't be committed until there are
benchmarks showing the net benefit.

As a thought experiment, picture the relative costs of scanning a
portion of an index in index sequence, and being done, versus
scanning a portion of an index in index sequence and jumping to a
random heap access for each index entry as you go.

It's already the case that we'll flip over to a bitmap indexscan,
and thus get rid of most/all of the "random" page accesses, in
situations where this is likely to be a big win.  Pointing to the
performance difference in databases that don't do that is therefore
not too convincing.

I'm inclined to agree that index-only scans might be worth the amount
of work that's involved ... but I share Simon's desire to see some proof
before anything gets committed.

Well, we're not in the habit of committing performance patches without
performance numbers, so I doubt we'll reverse that trend now, and
certainly I had no intention of doing so.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#14Simon Riggs
simon@2ndQuadrant.com
In reply to: Kevin Grittner (#10)
Re: the big picture for index-only scans

On Tue, May 10, 2011 at 5:17 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

Simon Riggs <simon@2ndQuadrant.com> wrote:

This topic has been discussed many times, yet I have never seen an
assessment that explains WHY we would want to do index-only scans.

In databases with this feature, it's not too unusual for a query
which uses just an index to run one or more orders of magnitude
faster than a query which has to randomly access the heap for each
index entry.  That seems like enough evidence of its possible value
in PostgreSQL to proceed to the point where benchmarks become
possible.  I'm assuming that, like all other features added as
performance optimizations, it won't be committed until there are
benchmarks showing the net benefit.

As a thought experiment, picture the relative costs of scanning a
portion of an index in index sequence, and being done, versus
scanning a portion of an index in index sequence and jumping to a
random heap access for each index entry as you go.

I can picture that. Regrettably, I can also picture the accesses to
the visibility map, the maintenance operations on the VM that are
needed for this and the contention that both of those will cause.

ISTM quite likely that we'll slow down writes to some extent in order
to improve this use case.

So I'm interested in knowing how broad the use case is and what the
overheads are, rather than have an "aw crap!" moment in the future
where we finish the code and only then realise its benefit footprint
is not useful. Best to start out with a clear benefit analysis other
than "other DBMS do it".

For example, will this be an index-specific tuning option
(manual/automatic), per table or an always-on feature?

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

#15Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#12)
Re: the big picture for index-only scans

Tom Lane <tgl@sss.pgh.pa.us> wrote:

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

Simon Riggs <simon@2ndQuadrant.com> wrote:

This topic has been discussed many times, yet I have never seen
an assessment that explains WHY we would want to do index-only
scans.

In databases with this feature, it's not too unusual for a query
which uses just an index to run one or more orders of magnitude
faster than a query which has to randomly access the heap for
each index entry. That seems like enough evidence of its
possible value in PostgreSQL to proceed to the point where
benchmarks become possible. I'm assuming that, like all other
features added as performance optimizations, it won't be
committed until there are benchmarks showing the net benefit.

As a thought experiment, picture the relative costs of scanning a
portion of an index in index sequence, and being done, versus
scanning a portion of an index in index sequence and jumping to a
random heap access for each index entry as you go.

It's already the case that we'll flip over to a bitmap indexscan,
and thus get rid of most/all of the "random" page accesses, in
situations where this is likely to be a big win. Pointing to the
performance difference in databases that don't do that is
therefore not too convincing.

Sure. Of course, if you're only accessing twenty thousand rows from
a table containing fifty million rows, bitmap index scans could come
out pretty close to random access times; but on the whole I agree
that the scale of benefit in PostgreSQL won't tend to match what
people see in other products. Note that my words were "enough
evidence of its possible value in PostgreSQL to proceed to the point
where benchmarks become possible."

In particular, we might want to somehow try to make clear to people
that the very wide indexes they are accustomed to creating to allow
this optimization in other products might be inefficient compared to
creating several one-column indexes which would enable bitmap
logical operations.

I'm inclined to agree that index-only scans might be worth the
amount of work that's involved

So we agree there.

... but I share Simon's desire to see some proof before anything
gets committed.

And we agree there. In fact, I can't think of anyone in the
community who doesn't want to see that for *any* purported
performance enhancement.

My overall gut feel is that there will be some circumstances where
the "covering index" optmization is much faster, and some where
people expect it to be, but it isn't. The trickiest part of this
might be developing a costing model which allows us to make the
right choice most of the time. And even if we get it perfect, we
can expect questions about why the covering index wasn't used, and
requests for hints so they can force it. :-(

-Kevin

#16Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#12)
Re: the big picture for index-only scans

On Tue, May 10, 2011 at 5:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

It's already the case that we'll flip over to a bitmap indexscan,
and thus get rid of most/all of the "random" page accesses, in
situations where this is likely to be a big win.  Pointing to the
performance difference in databases that don't do that is therefore
not too convincing.

The other major effect is row size. Many databases have very wide
rows, perhaps on the order of 1kB. So the table with a million rows
might be 8GB but the index on a few key columns might only be a few
megabytes. Even if you have to read the entire index in random order
it'll likely all be cached and scan faster than the table itself.

One problem with hanging on benchmarks is that database schema design
can actually change based on what performs well. People get in the
habit of creating indexes in Oracle that are only logical when you
realize they allow the database to do an index-only scan because they
contain extra columns that aren't actually used in where clauses but
are typically in the select list.

--
greg

#17Simon Riggs
simon@2ndQuadrant.com
In reply to: Kevin Grittner (#15)
Re: the big picture for index-only scans

On Tue, May 10, 2011 at 6:25 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

... but I share Simon's desire to see some proof before anything
gets committed.

And we agree there.  In fact, I can't think of anyone in the
community who doesn't want to see that for *any* purported
performance enhancement.

I'm not talking about eventual commit, I'm talking about the whole
process of development.

We should be focusing on improving a measurable performance issue, not
on implementing one exact design that someone thought might help.
How will we review the patch except by measuring it against the
declared performance goal? Otherwise all the various options along the
way will just be matters of opinion, instead of measurement.

From what has been said so far, the use case for this is related to
the practice of using "covered indexes", which makes me nervous
because that is an expert level tuning task on other DBMS, limiting
the range of people who get benefit. The typical speed up for
non-covered indexes will come when we access a very large table (not
in cache) via an index scan that is smaller than a bitmapindex scan.
Will we be able to gauge selectivities sufficiently accurately to be
able to pinpoint that during optimization? How will we know that the
table is not in cache? Or is this an optimisation in the executor for
a bitmapheap scan?

I'm not being negative, I'm trying to avoid arguments, blind alleys
and much wasted development if we focus on the wrong things or go to
design too early..

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

#18Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Simon Riggs (#17)
Re: the big picture for index-only scans

Simon Riggs <simon@2ndQuadrant.com> wrote:

Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:

... but I share Simon's desire to see some proof before anything
gets committed.

And we agree there. In fact, I can't think of anyone in the
community who doesn't want to see that for *any* purported
performance enhancement.

I'm not talking about eventual commit, I'm talking about the whole
process of development.

I'm confused -- you want to see proof that the concept works well in
PostgreSQL before development effort on it begins? Or there is some
alternative you would like to see pursued instead? Something else?

From what has been said so far, the use case for this is related
to the practice of using "covered indexes", which makes me nervous
because that is an expert level tuning task on other DBMS

What? On the versions of MS SQL Server and Sybase ASE I've used it
costs covered index plans against all the other plans automatically,
and picks this type of plan if the cost looks lower. Sure, DBAs
sometimes add indexes, or add columns to indexes, in hopes that such
a plan will be chosen -- but what's new and different there?

The typical speed up for non-covered indexes will come when we
access a very large table (not in cache) via an index scan that is
smaller than a bitmapindex scan. Will we be able to gauge
selectivities sufficiently accurately to be able to pinpoint that
during optimization? How will we know that the table is not in
cache? Or is this an optimisation in the executor for a bitmapheap
scan?

I would continue to object to using current cache contents for plan
choice because of plan instability and the fact that an odd initial
cache load could skew plans in a bad direction indefinitely. I do
agree (and have already posted) that I think the hardest part of
this might be developing a good cost model. I doubt that's an
insoluble problem, especially since it is something we can refine
over time as we gain experience with the edge cases.

-Kevin

#19Simon Riggs
simon@2ndQuadrant.com
In reply to: Kevin Grittner (#18)
Re: the big picture for index-only scans

On Tue, May 10, 2011 at 8:35 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:

Simon Riggs <simon@2ndQuadrant.com> wrote:

Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:

... but I share Simon's desire to see some proof before anything
gets committed.

And we agree there.  In fact, I can't think of anyone in the
community who doesn't want to see that for *any* purported
performance enhancement.

I'm not talking about eventual commit, I'm talking about the whole
process of development.

I'm confused -- you want to see proof that the concept works well in
PostgreSQL before development effort on it begins?  Or there is some
alternative you would like to see pursued instead?  Something else?

Well, I didn't ask for that and agree it would be foolish to demand
proof ahead of development.

I know this technique is effective in other DBMS, I just want to be
certain it will be effective for us before too much work is done. We
have the additional requirement for a crash safe vacuum map that needs
to be consulted, with possible contention effects. Sybase etc can
simply avoid the logical I/O, which is always a win, in or out of
cache. So the problem is quite different for us.

What I suggested was a assessment and benefit case because we normally
start with a problem and then work out how to solve it.

Normally, others come forward with the why? when? questions and it
feels like there's a bit of groupthink going on here. This looks to me
like its being approached like it was a feature, but it looks to me
like a possible optimisation, so suggest we treat it that way.

Out of concern, I don't want you to waste time on work that *may* not
be that useful in practice, and I don't want to miss improvements or
alternatives either.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

#20Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Simon Riggs (#19)
Re: the big picture for index-only scans

Simon Riggs <simon@2ndQuadrant.com> wrote:

Normally, others come forward with the why? when? questions and it
feels like there's a bit of groupthink going on here. This looks
to me like its being approached like it was a feature, but it
looks to me like a possible optimisation, so suggest we treat it
that way.

This issue has come up a great many times over the years, and there
has been much discussion around it. The Wiki page is here:

http://wiki.postgresql.org/wiki/Index-only_scans

This currently references 11 threads on the topic. I'd bet that by
spending a couple hours at it I could quadruple that number of
threads. (I'd really rather not, though.)

The problem is that there are regular and fairly frequent complaints
on the list about queries which run slower than people expect
because the heap must be checked for visibility information when
matching index entries are found. It has become enough of a
conspicuous issue that a lot of people are interested in seeing if
something can be done about it. After much discussion, people are
trying to advance a plan to find an answer. I'm sure nobody
involved would ignore any suggestion on how it might be done better;
but at this point, I don't think it's fair to suggest that this is
not being pursued in response to a real problem, or that no serious
thought has been given to direction before people started moving.

-Kevin

#21Bruce Momjian
bruce@momjian.us
In reply to: Kevin Grittner (#20)
#22Bruce Momjian
bruce@momjian.us
In reply to: Robert Haas (#6)
#23Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#21)
#24Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#23)
#25Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#24)
#26Bruce Momjian
bruce@momjian.us
In reply to: Robert Haas (#1)
#27Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#24)
#28Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#27)
#29Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#28)
#30Jesper Krogh
jesper@krogh.cc
In reply to: Bruce Momjian (#21)
#31Simon Riggs
simon@2ndQuadrant.com
In reply to: Bruce Momjian (#21)
#32Simon Riggs
simon@2ndQuadrant.com
In reply to: Bruce Momjian (#23)
#33Simon Riggs
simon@2ndQuadrant.com
In reply to: Bruce Momjian (#26)
#34Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Simon Riggs (#14)
#35Nicolas Barbier
nicolas.barbier@gmail.com
In reply to: Bruce Momjian (#26)
#36Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Heikki Linnakangas (#34)
#37Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Kevin Grittner (#18)
#38Robert Haas
robertmhaas@gmail.com
In reply to: Bruce Momjian (#26)
#39Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#28)
#40Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#31)
#41Bruce Momjian
bruce@momjian.us
In reply to: Nicolas Barbier (#35)
#42Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#32)
#43Bruce Momjian
bruce@momjian.us
In reply to: Cédric Villemain (#37)
#44Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Bruce Momjian (#42)
#45Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#34)
#46Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#45)
#47Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#45)
#48Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Robert Haas (#40)
#49Robert Haas
robertmhaas@gmail.com
In reply to: Cédric Villemain (#48)
#50Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Robert Haas (#49)
#51Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Bruce Momjian (#43)
#52Robert Haas
robertmhaas@gmail.com
In reply to: Cédric Villemain (#50)
#53Robert Haas
robertmhaas@gmail.com
In reply to: Cédric Villemain (#51)
#54Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Robert Haas (#53)
#55Florian Pflug
fgp@phlo.org
In reply to: Robert Haas (#53)
#56Robert Haas
robertmhaas@gmail.com
In reply to: Florian Pflug (#55)
#57Florian Pflug
fgp@phlo.org
In reply to: Robert Haas (#56)
#58Robert Haas
robertmhaas@gmail.com
In reply to: Florian Pflug (#57)
#59Robert Haas
robertmhaas@gmail.com
In reply to: Bruce Momjian (#22)
In reply to: Heikki Linnakangas (#34)
#61Robert Haas
robertmhaas@gmail.com
In reply to: Gokulakannan Somasundaram (#60)
#62Bruce Momjian
bruce@momjian.us
In reply to: Robert Haas (#59)
#63Robert Haas
robertmhaas@gmail.com
In reply to: Bruce Momjian (#62)
In reply to: Robert Haas (#61)
In reply to: Gokulakannan Somasundaram (#64)
#66Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Gokulakannan Somasundaram (#65)
In reply to: Heikki Linnakangas (#66)
In reply to: Gokulakannan Somasundaram (#67)
#69Robert Haas
robertmhaas@gmail.com
In reply to: Gokulakannan Somasundaram (#67)
#70Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#69)
#71Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Robert Haas (#69)
#72Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Robert Haas (#70)
In reply to: Heikki Linnakangas (#71)
In reply to: Robert Haas (#69)
In reply to: Gokulakannan Somasundaram (#73)
#76Robert Haas
robertmhaas@gmail.com
In reply to: Gokulakannan Somasundaram (#73)
#77Robert Haas
robertmhaas@gmail.com
In reply to: Gokulakannan Somasundaram (#74)
#78Robert Haas
robertmhaas@gmail.com
In reply to: Gokulakannan Somasundaram (#75)
In reply to: Robert Haas (#76)
In reply to: Robert Haas (#77)
In reply to: Robert Haas (#78)
#82Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Gokulakannan Somasundaram (#79)
#83Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Gokulakannan Somasundaram (#80)
In reply to: Heikki Linnakangas (#83)
In reply to: Gokulakannan Somasundaram (#84)
#86Robert Haas
robertmhaas@gmail.com
In reply to: Gokulakannan Somasundaram (#79)
In reply to: Robert Haas (#86)
#88Robert Haas
robertmhaas@gmail.com
In reply to: Heikki Linnakangas (#83)