index-only scans
Please find attached a patch implementing a basic version of
index-only scans. This patch is the work of my colleague Ibrar Ahmed
and myself, and also incorporates some code from previous patches
posted by Heikki Linnakanagas.
I'm able to demonstrate a significant performance benefit from this
patch, even in a situation where it doesn't save any disk I/O, due to
reduced thrashing of shared_buffers. Configuration settings:
max_connections = 100
shared_buffers = 400MB
maintenance_work_mem = 256MB
synchronous_commit = off
checkpoint_segments = 100
checkpoint_timeout = 30min
checkpoint_completion_target = 0.9
checkpoint_warning = 90s
seq_page_cost = 1.0
random_page_cost = 1.0
effective_cache_size = 3GB
Test setup:
pgbench -i -s 50
create table sample_data as select (random()*5000000)::int as aid,
repeat('a', 1000) as filler from generate_series(1,100000);
Test queries:
select sum(aid) from sample_data a1 where exists (select * from
pgbench_accounts a where a.aid = a1.aid and a.aid != 1234567);
select sum(aid) from sample_data a1 where exists (select * from
pgbench_accounts a where a.aid = a1.aid and a.bid != 1234567);
On my laptop, the first query executes in about 555 ms, while the
second one takes about 1125 ms. Inspection via pg_buffercache reveals
that the second one thrashes shared_buffers something fierce, while
the first one does not. You can actually get the time for the first
query down to about 450 ms if you can persuade PostgreSQL to cache the
entire sample_data table - which is difficult, due the
BufferAccessStrategy stuff - and as soon as you run the second query,
it blows the table out of cache, so practically speaking you're not
going to get that faster time very often. I expect that you could get
an even larger benefit from this type of query if you could avoid
actual disk I/O, rather than just buffer cache thrashing, but I
haven't come up with a suitable test cases for that yet (volunteers?).
There are several things about this patch that probably need further
thought and work, or at least some discussion.
1. The way that nodeIndexscan.c builds up the faux heap tuple is
perhaps susceptible to improvement. I thought about building a
virtual tuple, but then what do I do with an OID column, if I have
one? Or maybe this should be done some other way altogether.
2. Suppose we scan one tuple on a not-all-visible page followed by 99
tuples on all-visible pages. The code as written will hold the pin on
the first heap page for the entire scan. As soon as we hit the end of
the scan or another tuple where we have to actually visit the page,
the old pin will be released, but until then we hold onto it. This
isn't totally stupid: the next tuple that's on a not-all-visible page
could easily be on the same not-all-visible page we already have
pinned. And in 99% cases holding the pin for slightly longer is
probably completely harmless. On the flip side, it could increase the
chances of interfering with VACUUM. Then again, a non-index-only scan
would still interfere with the same VACUUM, so maybe we don't care.
3. The code in create_index_path() builds up a bitmapset of heap
attributes that get used for any purpose anywhere in the query, and
hangs it on the RelOptInfo so it doesn't need to be rebuilt for every
index under consideration. However, if it were somehow possible to
have the rel involved without using any attributes at all, we'd
rebuild the cache over and over, since it would never become non-NULL.
I don't think that can happen right now, but future planner changes
might make it possible.
4. There are a couple of cases that use index-only scans even though
the EXPLAIN output sort of makes it look like they shouldn't. For
example, in the above queries, an index-only scan is chosen even
though the query does "SELECT *" from the table being scanned. Even
though the EXPLAIN (VERBOSE) output makes it look otherwise, it seems
that the target list of an EXISTS query is in fact discarded, e.g.:
create or replace function error() returns int as $$begin select 1/0;
end$$ language plpgsql;
select * from pgbench_accounts a where exists (select error() from
pgbench_branches b where b.bid = a.aid); -- doesn't error out!
Along similar lines, COUNT(*) does not preclude an index-only scan,
because the * is apparently just window dressing. You'll still get
just a seq-scan unless you have an indexable qual in the query
somewhere, because...
5. We haven't made any planner changes at all, not even for cost
estimation. It is not clear to me what the right way to do cost
estimation here is. It seems that it would be useful to know what
percentage of the table is all-visible at planning time, but even if
we did, there could (and probably often will) be correlation between
the pages the query is interested in and which visibility map bits are
set. So I'm concerned about being overly optimistic about the cost
savings. Also, there's the problem of figuring out how to keep the
percent-of-table-that-has-the-vm-bits-set statistic up to date. Maybe
that's a job for ANALYZE but I haven't thought about it much yet.
6. I'm sure there are probably some statements in the documentation
that need updating, but I haven't tracked 'em down yet.
Comments, testing, review appreciated...
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
index-only-scans-v1.patchapplication/octet-stream; name=index-only-scans-v1.patchDownload+515-138
-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160
1. The way that nodeIndexscan.c builds up the faux heap tuple is
perhaps susceptible to improvement. I thought about building a
virtual tuple, but then what do I do with an OID column, if I have
one? Or maybe this should be done some other way altogether.
Maybe it's time to finally remove the been-deprecated-for-a-while OIDs?
- --
Greg Sabino Mullane greg@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201108111654
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----
iEYEAREDAAYFAk5EQiEACgkQvJuQZxSWSsglcQCeKsLRvd958M5QJ8YC8aNqr/Ku
11QAn1Iwaz9GuGVOB28orAITCsSX4MOo
=JMag
-----END PGP SIGNATURE-----
On 08/11/2011 01:57 PM, Greg Sabino Mullane wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD1601. The way that nodeIndexscan.c builds up the faux heap tuple is
perhaps susceptible to improvement. I thought about building a
virtual tuple, but then what do I do with an OID column, if I have
one? Or maybe this should be done some other way altogether.Maybe it's time to finally remove the been-deprecated-for-a-while OIDs?
As a user space option certainly. However they are also used to track
system objects, I don't know that we need to get rid of them for that
purpose. We just need to remove WITH/WITHOUT OIDS for user tables.
JD
--
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
The PostgreSQL Conference - http://www.postgresqlconference.org/
@cmdpromptinc - @postgresconf - 509-416-6579
On Thu, Aug 11, 2011 at 4:57 PM, Greg Sabino Mullane <greg@turnstep.com> wrote:
1. The way that nodeIndexscan.c builds up the faux heap tuple is
perhaps susceptible to improvement. I thought about building a
virtual tuple, but then what do I do with an OID column, if I have
one? Or maybe this should be done some other way altogether.Maybe it's time to finally remove the been-deprecated-for-a-while OIDs?
I thought about just not supporting that for index-only scans, but
system catalogs use them pretty extensively, and it doesn't seem out
of the question that that could matter to people who have lots of SQL
objects floating around. I am *definitely* not volunteering to
reengineer OIDs out of the system catalogs. For that amount of work,
I could implement probably half a dozen major features any single one
of which would be more valuable than the tiny amount of notational
convenience such a change would buy.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
2011/8/11 Robert Haas <robertmhaas@gmail.com>:
Please find attached a patch implementing a basic version of
index-only scans. This patch is the work of my colleague Ibrar Ahmed
and myself, and also incorporates some code from previous patches
posted by Heikki Linnakanagas.
Great!.
I'm able to demonstrate a significant performance benefit from this
patch, even in a situation where it doesn't save any disk I/O, due to
reduced thrashing of shared_buffers. Configuration settings:max_connections = 100
shared_buffers = 400MB
maintenance_work_mem = 256MB
synchronous_commit = off
checkpoint_segments = 100
checkpoint_timeout = 30min
checkpoint_completion_target = 0.9
checkpoint_warning = 90s
seq_page_cost = 1.0
random_page_cost = 1.0
effective_cache_size = 3GBTest setup:
pgbench -i -s 50
create table sample_data as select (random()*5000000)::int as aid,
repeat('a', 1000) as filler from generate_series(1,100000);Test queries:
select sum(aid) from sample_data a1 where exists (select * from
pgbench_accounts a where a.aid = a1.aid and a.aid != 1234567);
select sum(aid) from sample_data a1 where exists (select * from
pgbench_accounts a where a.aid = a1.aid and a.bid != 1234567);On my laptop, the first query executes in about 555 ms, while the
second one takes about 1125 ms. Inspection via pg_buffercache reveals
that the second one thrashes shared_buffers something fierce, while
the first one does not. You can actually get the time for the first
query down to about 450 ms if you can persuade PostgreSQL to cache the
entire sample_data table - which is difficult, due the
BufferAccessStrategy stuff - and as soon as you run the second query,
it blows the table out of cache, so practically speaking you're not
going to get that faster time very often. I expect that you could get
an even larger benefit from this type of query if you could avoid
actual disk I/O, rather than just buffer cache thrashing, but I
haven't come up with a suitable test cases for that yet (volunteers?).There are several things about this patch that probably need further
thought and work, or at least some discussion.1. The way that nodeIndexscan.c builds up the faux heap tuple is
perhaps susceptible to improvement. I thought about building a
virtual tuple, but then what do I do with an OID column, if I have
one? Or maybe this should be done some other way altogether.
Can this faux heap tuple be appended by the data from another index
once it has been created ? ( if the query involves those 2 index)
--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation
-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160
Maybe it's time to finally remove the been-deprecated-for-a-while OIDs?
I thought about just not supporting that for index-only scans, but
system catalogs use them pretty extensively, and it doesn't seem out
of the question that that could matter to people who have lots of SQL
objects floating around.
Right - when I said remove, I meant for all but system catalogs. I
would think those are generally small enough that for most people
the lack of index-only scans on those would not matter. Heck, the
system catalogs are already "special" in lots of ways other than
having OIDs (most anyway), so it's not as though we'd be breaking
sacred ground with an index-only exception. :)
I guess the question that should be asked is "we are going to finally
remove OIDs someday, right?". If so, and if it's potentially blocking a
major new feature, why not now?
- --
Greg Sabino Mullane greg@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201108112140
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----
iEYEAREDAAYFAk5EhYEACgkQvJuQZxSWSsjnYQCgne81uKjiABVU3X3X+5cM/oFx
74YAoNX97hsOIxBx4Y1hcQHf/bWR813U
=hEl2
-----END PGP SIGNATURE-----
On 08/11/2011 09:44 PM, Greg Sabino Mullane wrote:
I guess the question that should be asked is "we are going to finally
remove OIDs someday, right?". If so, and if it's potentially blocking a
major new feature, why not now?
It seems a bit odd then that we added "ALTER TABLE SET WITH OIDS" just a
couple of years ago.
cheers
andrew
On Thu, Aug 11, 2011 at 5:39 PM, Cédric Villemain
<cedric.villemain.debian@gmail.com> wrote:
2011/8/11 Robert Haas <robertmhaas@gmail.com>:
Please find attached a patch implementing a basic version of
index-only scans. This patch is the work of my colleague Ibrar Ahmed
and myself, and also incorporates some code from previous patches
posted by Heikki Linnakanagas.Great!
Glad you like it...!
Can this faux heap tuple be appended by the data from another index
once it has been created ? ( if the query involves those 2 index)
I don't see how to make that work. In general, a query like "SELECT
a, b FROM foo WHERE a = 1 AND b = 1" can only use both indexes if we
use a bitmap index scan on each followed by a bitmapand and then a
bitmap heap scan. However, this patch only touches the index-scan
path, which only knows how to use one index for any given query.
Actually, I can see a possible way to allow an index-only type
optimization to be used for bitmap scans. As you scan the index, any
tuples that can be handled index-only get returned immediately; the
rest are thrown into a bitmap. Once you're done examining the index,
you then do a bitmap heap scan to get the tuples that couldn't be
handled index-only. This seems like it might be our best hope for a
"fast count(*)" type optimization, especially if you could combine it
with some method of scanning the index in physical order rather than
logical order.
But even that trick would only work with a single index. I'm not sure
there's any good way to assemble pieces of tuples from two different
indexes, at least not efficiently.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Aug 11, 2011 at 9:44 PM, Greg Sabino Mullane <greg@turnstep.com> wrote:
Maybe it's time to finally remove the been-deprecated-for-a-while OIDs?
I thought about just not supporting that for index-only scans, but
system catalogs use them pretty extensively, and it doesn't seem out
of the question that that could matter to people who have lots of SQL
objects floating around.Right - when I said remove, I meant for all but system catalogs. I
would think those are generally small enough that for most people
the lack of index-only scans on those would not matter. Heck, the
system catalogs are already "special" in lots of ways other than
having OIDs (most anyway), so it's not as though we'd be breaking
sacred ground with an index-only exception. :)
Yeah, maybe. But since there's no evidence that we actually need that
exception for performance, I'm not in favor of adding it at this
point.
I guess the question that should be asked is "we are going to finally
remove OIDs someday, right?".
I don't necessarily see a reason to do that. I wouldn't object to
turning the system OID columns in the system catalogs into normal OID
columns, but that would be a lot of work and it doesn't seem to me to
be the most important problem we need to solve (or even in the top
25).
If so, and if it's potentially blocking a
major new feature, why not now?
It's not blocking a major new feature, except to the extent that we're
having a conversation about it, instead of talking about the major new
feature.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
2011/8/12 Robert Haas <robertmhaas@gmail.com>:
On Thu, Aug 11, 2011 at 5:39 PM, Cédric Villemain
<cedric.villemain.debian@gmail.com> wrote:2011/8/11 Robert Haas <robertmhaas@gmail.com>:
Please find attached a patch implementing a basic version of
index-only scans. This patch is the work of my colleague Ibrar Ahmed
and myself, and also incorporates some code from previous patches
posted by Heikki Linnakanagas.Great!
Glad you like it...!
Can this faux heap tuple be appended by the data from another index
once it has been created ? ( if the query involves those 2 index)I don't see how to make that work. In general, a query like "SELECT
a, b FROM foo WHERE a = 1 AND b = 1" can only use both indexes if we
use a bitmap index scan on each followed by a bitmapand and then a
bitmap heap scan. However, this patch only touches the index-scan
path, which only knows how to use one index for any given query.
I thought of something like that: 'select a,b from foo where a=1
order by b limit 100' (or: where a=1 and b< now() )
Actually, I can see a possible way to allow an index-only type
optimization to be used for bitmap scans. As you scan the index, any
tuples that can be handled index-only get returned immediately; the
rest are thrown into a bitmap. Once you're done examining the index,
you then do a bitmap heap scan to get the tuples that couldn't be
handled index-only. This seems like it might be our best hope for a
"fast count(*)" type optimization, especially if you could combine it
with some method of scanning the index in physical order rather than
logical order.
IIRC we expose some ideas around that, yes. (optimizing bitmap)
Maybe a question that will explain me more about the feature
limitation (if any):
Does an index-only scan used when the table has no vismap set will
cost (in duration, IO, ...) more than a normal Index scan ?
But even that trick would only work with a single index. I'm not sure
there's any good way to assemble pieces of tuples from two different
indexes, at least not efficiently.
okay.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation
Robert,
I imagine we store positional information in gin index and return
tuples in relevant order - instant full-text search !
Great work, guys !
Oleg
On Thu, 11 Aug 2011, Robert Haas wrote:
Please find attached a patch implementing a basic version of
index-only scans. This patch is the work of my colleague Ibrar Ahmed
and myself, and also incorporates some code from previous patches
posted by Heikki Linnakanagas.I'm able to demonstrate a significant performance benefit from this
patch, even in a situation where it doesn't save any disk I/O, due to
reduced thrashing of shared_buffers. Configuration settings:max_connections = 100
shared_buffers = 400MB
maintenance_work_mem = 256MB
synchronous_commit = off
checkpoint_segments = 100
checkpoint_timeout = 30min
checkpoint_completion_target = 0.9
checkpoint_warning = 90s
seq_page_cost = 1.0
random_page_cost = 1.0
effective_cache_size = 3GBTest setup:
pgbench -i -s 50
create table sample_data as select (random()*5000000)::int as aid,
repeat('a', 1000) as filler from generate_series(1,100000);Test queries:
select sum(aid) from sample_data a1 where exists (select * from
pgbench_accounts a where a.aid = a1.aid and a.aid != 1234567);
select sum(aid) from sample_data a1 where exists (select * from
pgbench_accounts a where a.aid = a1.aid and a.bid != 1234567);On my laptop, the first query executes in about 555 ms, while the
second one takes about 1125 ms. Inspection via pg_buffercache reveals
that the second one thrashes shared_buffers something fierce, while
the first one does not. You can actually get the time for the first
query down to about 450 ms if you can persuade PostgreSQL to cache the
entire sample_data table - which is difficult, due the
BufferAccessStrategy stuff - and as soon as you run the second query,
it blows the table out of cache, so practically speaking you're not
going to get that faster time very often. I expect that you could get
an even larger benefit from this type of query if you could avoid
actual disk I/O, rather than just buffer cache thrashing, but I
haven't come up with a suitable test cases for that yet (volunteers?).There are several things about this patch that probably need further
thought and work, or at least some discussion.1. The way that nodeIndexscan.c builds up the faux heap tuple is
perhaps susceptible to improvement. I thought about building a
virtual tuple, but then what do I do with an OID column, if I have
one? Or maybe this should be done some other way altogether.2. Suppose we scan one tuple on a not-all-visible page followed by 99
tuples on all-visible pages. The code as written will hold the pin on
the first heap page for the entire scan. As soon as we hit the end of
the scan or another tuple where we have to actually visit the page,
the old pin will be released, but until then we hold onto it. This
isn't totally stupid: the next tuple that's on a not-all-visible page
could easily be on the same not-all-visible page we already have
pinned. And in 99% cases holding the pin for slightly longer is
probably completely harmless. On the flip side, it could increase the
chances of interfering with VACUUM. Then again, a non-index-only scan
would still interfere with the same VACUUM, so maybe we don't care.3. The code in create_index_path() builds up a bitmapset of heap
attributes that get used for any purpose anywhere in the query, and
hangs it on the RelOptInfo so it doesn't need to be rebuilt for every
index under consideration. However, if it were somehow possible to
have the rel involved without using any attributes at all, we'd
rebuild the cache over and over, since it would never become non-NULL.
I don't think that can happen right now, but future planner changes
might make it possible.4. There are a couple of cases that use index-only scans even though
the EXPLAIN output sort of makes it look like they shouldn't. For
example, in the above queries, an index-only scan is chosen even
though the query does "SELECT *" from the table being scanned. Even
though the EXPLAIN (VERBOSE) output makes it look otherwise, it seems
that the target list of an EXISTS query is in fact discarded, e.g.:create or replace function error() returns int as $$begin select 1/0;
end$$ language plpgsql;
select * from pgbench_accounts a where exists (select error() from
pgbench_branches b where b.bid = a.aid); -- doesn't error out!Along similar lines, COUNT(*) does not preclude an index-only scan,
because the * is apparently just window dressing. You'll still get
just a seq-scan unless you have an indexable qual in the query
somewhere, because...5. We haven't made any planner changes at all, not even for cost
estimation. It is not clear to me what the right way to do cost
estimation here is. It seems that it would be useful to know what
percentage of the table is all-visible at planning time, but even if
we did, there could (and probably often will) be correlation between
the pages the query is interested in and which visibility map bits are
set. So I'm concerned about being overly optimistic about the cost
savings. Also, there's the problem of figuring out how to keep the
percent-of-table-that-has-the-vm-bits-set statistic up to date. Maybe
that's a job for ANALYZE but I haven't thought about it much yet.6. I'm sure there are probably some statements in the documentation
that need updating, but I haven't tracked 'em down yet.Comments, testing, review appreciated...
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
On Fri, Aug 12, 2011 at 6:20 AM, Cédric Villemain
<cedric.villemain.debian@gmail.com> wrote:
Can this faux heap tuple be appended by the data from another index
once it has been created ? ( if the query involves those 2 index)I don't see how to make that work. In general, a query like "SELECT
a, b FROM foo WHERE a = 1 AND b = 1" can only use both indexes if we
use a bitmap index scan on each followed by a bitmapand and then a
bitmap heap scan. However, this patch only touches the index-scan
path, which only knows how to use one index for any given query.I thought of something like that: 'select a,b from foo where a=1
order by b limit 100' (or: where a=1 and b< now() )
Well... PostgreSQL can only use the index on a or the index on b, not
both. This patch doesn't change that. I'm not trying to use indexes
in some completely new way; I'm just trying to make them faster by
optimizing away the heap access.
Actually, I can see a possible way to allow an index-only type
optimization to be used for bitmap scans. As you scan the index, any
tuples that can be handled index-only get returned immediately; the
rest are thrown into a bitmap. Once you're done examining the index,
you then do a bitmap heap scan to get the tuples that couldn't be
handled index-only. This seems like it might be our best hope for a
"fast count(*)" type optimization, especially if you could combine it
with some method of scanning the index in physical order rather than
logical order.IIRC we expose some ideas around that, yes. (optimizing bitmap)
Maybe a question that will explain me more about the feature
limitation (if any):
Does an index-only scan used when the table has no vismap set will
cost (in duration, IO, ...) more than a normal Index scan ?
Yeah, it'll do a bit of extra work - the btree AM will cough up the
tuple uselessly, and we'll check the visibility map, also uselessly.
Then we'll end up doing it the regular way anyhow. I haven't measured
that effect yet; hopefully it's fairly small.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
2011/8/12 Robert Haas <robertmhaas@gmail.com>:
On Fri, Aug 12, 2011 at 6:20 AM, Cédric Villemain
<cedric.villemain.debian@gmail.com> wrote:Can this faux heap tuple be appended by the data from another index
once it has been created ? ( if the query involves those 2 index)I don't see how to make that work. In general, a query like "SELECT
a, b FROM foo WHERE a = 1 AND b = 1" can only use both indexes if we
use a bitmap index scan on each followed by a bitmapand and then a
bitmap heap scan. However, this patch only touches the index-scan
path, which only knows how to use one index for any given query.I thought of something like that: 'select a,b from foo where a=1
order by b limit 100' (or: where a=1 and b< now() )Well... PostgreSQL can only use the index on a or the index on b, not
both. This patch doesn't change that. I'm not trying to use indexes
in some completely new way; I'm just trying to make them faster by
optimizing away the heap access.
For this kind of plan :
Bitmap Heap Scan
Recheck Cond
BitmapAnd
Bitmap Index Scan
Bitmap Index Scan
It may prevent useless Heap Fetch during "Bitmap Heap Scan", isn't it ?
Actually, I can see a possible way to allow an index-only type
optimization to be used for bitmap scans. As you scan the index, any
tuples that can be handled index-only get returned immediately; the
rest are thrown into a bitmap. Once you're done examining the index,
you then do a bitmap heap scan to get the tuples that couldn't be
handled index-only. This seems like it might be our best hope for a
"fast count(*)" type optimization, especially if you could combine it
with some method of scanning the index in physical order rather than
logical order.IIRC we expose some ideas around that, yes. (optimizing bitmap)
Maybe a question that will explain me more about the feature
limitation (if any):
Does an index-only scan used when the table has no vismap set will
cost (in duration, IO, ...) more than a normal Index scan ?Yeah, it'll do a bit of extra work - the btree AM will cough up the
tuple uselessly, and we'll check the visibility map, also uselessly.
Then we'll end up doing it the regular way anyhow. I haven't measured
that effect yet; hopefully it's fairly small.
If it is small, or if we can reduce it to be near absent.
Then... why do we need to distinguish Index Scan at all ? (or
Index*-Only* scan, which aren't 100% 'Only' btw)
It is then just an internal optimisation on how we can access/use an
index. (really good and promising one)
--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation
On Fri, Aug 12, 2011 at 9:31 AM, Cédric Villemain
<cedric.villemain.debian@gmail.com> wrote:
Well... PostgreSQL can only use the index on a or the index on b, not
both. This patch doesn't change that. I'm not trying to use indexes
in some completely new way; I'm just trying to make them faster by
optimizing away the heap access.For this kind of plan :
Bitmap Heap Scan
Recheck Cond
BitmapAnd
Bitmap Index Scan
Bitmap Index ScanIt may prevent useless Heap Fetch during "Bitmap Heap Scan", isn't it ?
The input to the bitmap heap scan is just a TID bitmap. It's too late
to decide we want the index tuples at that point; we've forgotten
where they are, and even if we remembered it, it would necessarily be
any cheaper than checking the heap. We could optimize this to avoid a
heap fetch in the case where we don't need any of the tuple data at
all, but that's going to be somewhat rare, I think.
Actually, I can see a possible way to allow an index-only type
optimization to be used for bitmap scans. As you scan the index, any
tuples that can be handled index-only get returned immediately; the
rest are thrown into a bitmap. Once you're done examining the index,
you then do a bitmap heap scan to get the tuples that couldn't be
handled index-only. This seems like it might be our best hope for a
"fast count(*)" type optimization, especially if you could combine it
with some method of scanning the index in physical order rather than
logical order.IIRC we expose some ideas around that, yes. (optimizing bitmap)
Maybe a question that will explain me more about the feature
limitation (if any):
Does an index-only scan used when the table has no vismap set will
cost (in duration, IO, ...) more than a normal Index scan ?Yeah, it'll do a bit of extra work - the btree AM will cough up the
tuple uselessly, and we'll check the visibility map, also uselessly.
Then we'll end up doing it the regular way anyhow. I haven't measured
that effect yet; hopefully it's fairly small.If it is small, or if we can reduce it to be near absent.
Then... why do we need to distinguish Index Scan at all ? (or
Index*-Only* scan, which aren't 100% 'Only' btw)
It is then just an internal optimisation on how we can access/use an
index. (really good and promising one)
Right, that's kind of what I was going for. But the overhead isn't
going to be exactly zero, so I think it makes sense to disable it in
the cases where it clearly can't work. The question is really more
whether we need to get more fine-grained than that, and actually do a
cost-based comparison of an index-only scan vs. a regular index scan.
I hope not, but I haven't tested it yet.
One other thing to think about is that the choice to use an index-scan
is not free of externalities. The index-only scan is hopefully faster
than touching all the heap pages, but even if it weren't, not touching
all of the heap pages potentially means avoiding evicting things from
shared_buffers that some *other* query might need.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 11.08.2011 23:06, Robert Haas wrote:
Comments, testing, review appreciated...
I would've expected this to use an index-only scan:
postgres=# CREATE TABLE foo AS SELECT generate_series(1,100000) AS id;
SELECT 100000
postgres=# CREATE INDEX i_foo ON foo (id) WHERE id = 10;
CREATE INDEX
postgres=# VACUUM ANALYZE foo;
VACUUM
postgres=# EXPLAIN SELECT id FROM foo WHERE id = 10;
QUERY PLAN
-----------------------------------------------------------------
Index Scan using i_foo on foo (cost=0.00..8.27 rows=1 width=4)
Index Cond: (id = 10)
(2 rows)
If it's not a predicate index, then it works:
postgres=# DROP INDEX i_foo;
DROP INDEX
postgres=# EXPLAIN SELECT id FROM foo WHERE id = 10;
QUERY PLAN
-----------------------------------------------------------------------
Index Only Scan using i_foo2 on foo (cost=0.00..8.28 rows=1 width=4)
Index Cond: (id = 10)
(2 rows)
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Fri, Aug 12, 2011 at 4:03 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
On 11.08.2011 23:06, Robert Haas wrote:
Comments, testing, review appreciated...
I would've expected this to use an index-only scan:
postgres=# CREATE TABLE foo AS SELECT generate_series(1,100000) AS id;
Ugh. I think there's something wrong with this test:
+ if (index->amcanreturn && !index->predOK && !index->indexprs)
I think that should probably say something like (ind->indpred == NIL
|| ind->predOK).
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Aug 12, 2011, at 10:03 PM, Heikki Linnakangas wrote:
On 11.08.2011 23:06, Robert Haas wrote:
Comments, testing, review appreciated...
I would've expected this to use an index-only scan:
postgres=# CREATE TABLE foo AS SELECT generate_series(1,100000) AS id;
SELECT 100000
postgres=# CREATE INDEX i_foo ON foo (id) WHERE id = 10;
CREATE INDEX
postgres=# VACUUM ANALYZE foo;
VACUUM
postgres=# EXPLAIN SELECT id FROM foo WHERE id = 10;
QUERY PLAN
-----------------------------------------------------------------
Index Scan using i_foo on foo (cost=0.00..8.27 rows=1 width=4)
Index Cond: (id = 10)
(2 rows)If it's not a predicate index, then it works:
postgres=# DROP INDEX i_foo;
DROP INDEX
postgres=# EXPLAIN SELECT id FROM foo WHERE id = 10;
QUERY PLAN
-----------------------------------------------------------------------
Index Only Scan using i_foo2 on foo (cost=0.00..8.28 rows=1 width=4)
Index Cond: (id = 10)
(2 rows)
is there any plan to revise the cost for index only scans compared to what it is now?
many thanks,
hans
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
2011/8/12 PostgreSQL - Hans-Jürgen Schönig <postgres@cybertec.at>:
is there any plan to revise the cost for index only scans compared to what it is now?
That's one of the points I asked for feedback on in my original email.
"How should the costing be done?"
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> wrote:
That's one of the points I asked for feedback on in my original
email. "How should the costing be done?"
It seems pretty clear that there should be some cost adjustment. If
you can't get good numbers somehow on what fraction of the heap
accesses will be needed, I would suggest using a magic number based
on the assumption that half the heap access otherwise necessary will
be done. It wouldn't be the worst magic number in the planner. Of
course, real numbers are always better if you can get them.
-Kevin
On Fri, Aug 12, 2011 at 5:39 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
Robert Haas <robertmhaas@gmail.com> wrote:
That's one of the points I asked for feedback on in my original
email. "How should the costing be done?"It seems pretty clear that there should be some cost adjustment. If
you can't get good numbers somehow on what fraction of the heap
accesses will be needed, I would suggest using a magic number based
on the assumption that half the heap access otherwise necessary will
be done. It wouldn't be the worst magic number in the planner. Of
course, real numbers are always better if you can get them.
It wouldn't be that difficult (I think) to make VACUUM and/or ANALYZE
gather some statistics; what I'm worried about is that we'd have
correlation problems. Consider a wide table with an index on (id,
name), and the query:
SELECT name FROM tab WHERE id = 12345
Now, suppose that we know that 50% of the heap pages have their
visibility map bits set. What's the chance that this query won't need
a heap fetch? Well, the chance is 50% *if* you assume that a row
which has been quiescent for a long time is just as likely to be
queried as one that has been recently inserted or updated. However,
in many real-world use cases, nothing could be farther from the truth.
What do we do about that?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company