COUNT(*) and index-only scans
I talked to Robert Haas and he said that index-only scans do not
optimize COUNT(*). Is this something we can do for PG 9.2? Is anyone
working on this?
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
On 10 October 2011 18:23, Bruce Momjian <bruce@momjian.us> wrote:
I talked to Robert Haas and he said that index-only scans do not
optimize COUNT(*). Is this something we can do for PG 9.2? Is anyone
working on this?
Yes it does, provided that there is an appropriate WHERE clause. But
yes, I think we definitely want this if it's relatively easy. In
addition to this, it's not always easy to get it to use an index-only
scan even if it's going to be significantly faster. I'm assuming some
supporting planner work needs to be done too.
--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935
EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Oct 10, 2011 at 6:23 PM, Bruce Momjian <bruce@momjian.us> wrote:
I talked to Robert Haas and he said that index-only scans do not
optimize COUNT(*). Is this something we can do for PG 9.2? Is anyone
working on this?
People usually conflate multiple problems when they talk about
"count(*). The usual case people are concerned about would require
materialized views, not index-only scans.
--
greg
Bruce Momjian <bruce@momjian.us> wrote:
I talked to Robert Haas and he said that index-only scans do not
optimize COUNT(*). Is this something we can do for PG 9.2? Is
anyone working on this?
Well, it's not that it doesn't optimize COUNT(*) -- it's that it
doesn't yet cost the index scan as cheaper than a table scan when
you're accessing every row.
create table t (id int not null primary key);
insert into t select generate_series(1, 1000000);
vacuum freeze analyze;
explain analyze select count(*) from t
where id between 500000 and 500010;
That gives you an index-only scan; but without the WHERE clause it
uses a seq scan. I think it's mainly a matter of doing enough
benchmarks to figure out how best to model the costs of the index
scan so that it can be picked for that case.
-Kevin
On Mon, Oct 10, 2011 at 1:36 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
Bruce Momjian <bruce@momjian.us> wrote:
I talked to Robert Haas and he said that index-only scans do not
optimize COUNT(*). Is this something we can do for PG 9.2? Is
anyone working on this?
Well, it's not that it doesn't optimize COUNT(*) -- it's that it
doesn't yet cost the index scan as cheaper than a table scan when
you're accessing every row.
create table t (id int not null primary key);
insert into t select generate_series(1, 1000000);
vacuum freeze analyze;
explain analyze select count(*) from t
where id between 500000 and 500010;
That gives you an index-only scan; but without the WHERE clause it
uses a seq scan. I think it's mainly a matter of doing enough
benchmarks to figure out how best to model the costs of the index
scan so that it can be picked for that case.
Right now, our costing model for index-only scans is pretty dumb. It
assumes that using an index-only scan will avoid 10% of the heap
fetches. That could easily be low, and on an insert-only table or one
where only the recently-updated rows are routinely accessed, it could
also be high. To use an index-only scan for a full-table COUNT(*),
we're going to have to be significantly smarter, because odds are good
that skipping 10% of the heap fetches won't be sufficient inducement
to the planner to go that route; we are going to need a real number.
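The 10% figure is a hard-coded guess, so one way to see the real number for a given table is to read it off a plan. As a sketch (assuming the 9.2 EXPLAIN ANALYZE output, which prints a "Heap Fetches" counter for index-only scans):

```sql
-- Compare the heap fetches an index-only scan actually performs
-- against the rows it returns; the gap is exactly the number the
-- planner would need to know in advance.
explain (analyze) select count(*) from t where id between 1 and 100000;
-- The "Heap Fetches: N" line counts visibility checks that could
-- not be answered from the visibility map alone.
```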
This isn't just an exercise in costing, though: right now, we don't
even generate a plan to use an index for a full-table scan, because we
assume that it can never be cheaper. This is actually not quite true
even in previous releases (suppose the table is severely bloated but
the index is not) and it's going to be less true now that we have
index-only scans. So that's going to need some adjustment, too.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
Bruce Momjian <bruce@momjian.us> wrote:
I talked to Robert Haas and he said that index-only scans do not
optimize COUNT(*). Is this something we can do for PG 9.2? Is
anyone working on this?
Well, it's not that it doesn't optimize COUNT(*) -- it's that it
doesn't yet cost the index scan as cheaper than a table scan when
you're accessing every row.
I think what Robert is complaining about is that we won't currently
consider an index that matches neither any WHERE clauses nor ORDER BY,
ie, count(*) over the whole table won't get considered for an index-only
scan, regardless of cost estimates.
regards, tom lane
Robert Haas <robertmhaas@gmail.com> wrote:
Right now, our costing model for index-only scans is pretty dumb.
It assumes that using an index-only scan will avoid 10% of the
heap fetches. That could easily be low, and on an insert-only
table or one where only the recently-updated rows are routinely
accessed, it could also be high.
As a reality check, I just ran this query on a table in a statewide
copy of our data:
select count(*),
sum(case when xmin = '2'::xid then 0 else 1 end) as read_heap
from "CaseHist";
and got:
count | read_heap
-----------+-----------
205765311 | 3934924
So on our real-world database, it would skip something on the order
of 98% of the heap reads, right?
This isn't just an exercise in costing, though: right now, we
don't even generate a plan to use an index for a full-table scan,
because we assume that it can never be cheaper. This is actually
not quite true even in previous releases (suppose the table is
severely bloated but the index is not) and it's going to be less
true now that we have index-only scans. So that's going to need
some adjustment, too.
OK. Thanks for clarifying.
-Kevin
Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think what Robert is complaining about is that we won't
currently consider an index that matches neither any WHERE clauses
nor ORDER BY, ie, count(*) over the whole table won't get
considered for an index-only scan, regardless of cost estimates.
I guess the trick would be to get it to consider such plans only
under some conditions, to avoid explosive growth in planning time
for some types of queries. Some statistics bucket for the number of
non-frozen tuples in the relation, maybe?
-Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think what Robert is complaining about is that we won't
currently consider an index that matches neither any WHERE clauses
nor ORDER BY, ie, count(*) over the whole table won't get
considered for an index-only scan, regardless of cost estimates.
I guess the trick would be to get it to consider such plans only
under some conditions, to avoid explosive growth in planning time
for some types of queries. Some statistics bucket for the number of
non-frozen tuples in the relation, maybe?
My intention was to allow it to consider any covering index. You're
thinking about the cost estimate, which is really entirely different.
regards, tom lane
On Mon, Oct 10, 2011 at 10:36 AM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
Bruce Momjian <bruce@momjian.us> wrote:
I talked to Robert Haas and he said that index-only scans do not
optimize COUNT(*). Is this something we can do for PG 9.2? Is
anyone working on this?
Well, it's not that it doesn't optimize COUNT(*) -- it's that it
doesn't yet cost the index scan as cheaper than a table scan when
you're accessing every row.
create table t (id int not null primary key);
insert into t select generate_series(1, 1000000);
vacuum freeze analyze;
explain analyze select count(*) from t
where id between 500000 and 500010;
That gives you an index-only scan; but without the WHERE clause it
uses a seq scan.
If you convert the where clause to "where id is not null" it uses the
index only scan again, but only if you nudge it too with
enable_seqscan=off.
I'm not sure why it needs the nudge in one case but not the other.
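Jeff's workaround, spelled out as a sketch (the setting is session-local; without the nudge the index-only scan still loses the cost comparison):

```sql
-- Give the planner an indexable qual that matches every row, and
-- disable seqscans for this session so the index-only path wins.
set enable_seqscan = off;
explain analyze select count(*) from t where id is not null;
reset enable_seqscan;
```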
Cheers,
Jeff
On Mon, Oct 10, 2011 at 9:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
My intention was to allow it to consider any covering index. You're
thinking about the cost estimate, which is really entirely different.
Is there any reason to consider more than one? I would have expected
the narrowest one to be the best choice. There's something to be said
for using the same index consistently but we already have that problem
and make no attempt to do that. And partial indexes might be better
but then we would already be considering them if their constraints are
satisfied.
--
greg
Jeff Janes wrote:
Kevin Grittner wrote:
create table t (id int not null primary key);
insert into t select generate_series(1, 1000000);
vacuum freeze analyze;
explain analyze select count(*) from t
where id between 500000 and 500010;
That gives you an index-only scan; but without the WHERE clause it
uses a seq scan.
If you convert the where clause to "where id is not null" it uses
the index only scan again, but only if you nudge it too with
enable_seqscan=off.
Clever way to get a full-table test.
It turns out that for the above, with your trick to use the index
only scan, it comes out 12% faster to do a seqscan, even when the
table and index are fully cached (based on the average time of ten
runs each way). There's very little overlap, so the difference looks
real. But that's on a very narrow record, having just the one column
used in the index. I added one wide column like this:
alter table t add column x text;
update t set x = (repeat(random()::text, (random() * 100)::int));
cluster t USING t_pkey;
vacuum freeze analyze;
With that change the index-only scan time remained unchanged, while
the seqscan time grew to about 2.6 times the index only scan time.
That was mildly surprising for me, considering it was all still
cached.
-Kevin
Jeff Janes wrote:
Kevin Grittner wrote:
create table t (id int not null primary key);
insert into t select generate_series(1, 1000000);
vacuum freeze analyze;
explain analyze select count(*) from t
where id between 500000 and 500010;
That gives you an index-only scan; but without the WHERE clause it
uses a seq scan.
If you convert the where clause to "where id is not null" it uses
the index only scan again, but only if you nudge it too with
enable_seqscan=off.
Clever way to get a full-table test.
It turns out that for the above, with your trick to use the index
only scan, it comes out 12% faster to do a seqscan, even when the
table and index are fully cached (based on the average time of ten
runs each way). There's very little overlap, so the difference looks
real. But that's on a very narrow record, having just the one column
used in the index. I added one wide column like this:
alter table t add column x text;
update t set x = (repeat(random()::text, (random() * 100)::int));
cluster t USING t_pkey;
vacuum freeze analyze;
With that change the index-only scan time remained unchanged, while
the seqscan time grew to about 2.6 times the index only scan time.
That was mildly surprising for me, considering it was all still
cached.
Moving data around in memory isn't that cheap, and that change has
probably made the row size about 5 times larger. However, if you
added 4000 bytes instead of 100, you should be back to the previous
results, since the data then ends up (most likely; otherwise add a
bit more) in a toast table.
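Jesper's suggestion as a sketch; note that a highly compressible value can be compressed back inline, so forcing EXTERNAL storage (an addition here, not part of the original suggestion) makes the out-of-line behaviour deterministic:

```sql
-- Push the wide column out to the toast table so the main heap
-- stays narrow; 'external' disables compression, guaranteeing
-- out-of-line storage above the ~2kB toast threshold.
alter table t add column y text;
alter table t alter column y set storage external;
update t set y = repeat(random()::text, 250);  -- roughly 4000 bytes
vacuum full analyze t;
```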
Jesper
On Mon, Oct 10, 2011 at 3:15 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
Robert Haas <robertmhaas@gmail.com> wrote:
Right now, our costing model for index-only scans is pretty dumb.
It assumes that using an index-only scan will avoid 10% of the
heap fetches. That could easily be low, and on an insert-only
table or one where only the recently-updated rows are routinely
accessed, it could also be high.
As a reality check, I just ran this query on a table in a statewide
copy of our data:
select count(*),
sum(case when xmin = '2'::xid then 0 else 1 end) as read_heap
from "CaseHist";and got:
count | read_heap
-----------+-----------
205765311 | 3934924
So on our real-world database, it would skip something on the order
of 98% of the heap reads, right?
Yeah, if it's scanning the whole table.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Greg Stark wrote:
On Mon, Oct 10, 2011 at 9:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
My intention was to allow it to consider any covering index. You're
thinking about the cost estimate, which is really entirely different.
Is there any reason to consider more than one? I would have expected
the narrowest one to be the best choice. There's something to be said
for using the same index consistently but we already have that problem
and make no attempt to do that. And partial indexes might be better
but then we would already be considering them if their constraints are
satisfied.
Actually, I think the smallest non-partial one on disk might be the best
--- that is very easy to find out.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
On Tue, Oct 11, 2011 at 12:43 PM, Bruce Momjian <bruce@momjian.us> wrote:
Greg Stark wrote:
On Mon, Oct 10, 2011 at 9:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
My intention was to allow it to consider any covering index. You're
thinking about the cost estimate, which is really entirely different.
Is there any reason to consider more than one? I would have expected
the narrowest one to be the best choice. There's something to be said
for using the same index consistently but we already have that problem
and make no attempt to do that. And partial indexes might be better
but then we would already be considering them if their constraints are
satisfied.
Actually, I think the smallest non-partial one on disk might be the
best --- that is very easy to find out.
I doubt there is any need to write special-purpose code to decide
which index ought to be used for a full table scan. We can just throw
all of the otherwise-useless indexes into the costing machinery with
empty pathkeys, and let them duke it out. All but the best one will
be instantly discarded, and the best one will either beat or lose to a
sequential scan. All of this will happen before we start trying to
build join paths, so there's no combinatorial explosion in planning
time - it'll just be a straightforward cost comparison between plans
with identical pathkeys, and should be quite fast.
The real issue is that the costing estimates need to be accurate, and
that's where the rubber hits the road. Otherwise, even if we pick the
right way to scan the table, we may do silly things up the line when
we go to start constructing the join order. I think we need to beef
up ANALYZE to gather statistics on the fraction of the pages that are
marked all-visible, or maybe VACUUM should gather that information.
The trouble is that if we VACUUM and then ANALYZE, we'll often get
back a value very close to 100%, but then the real value may diminish
quite a bit before the next auto-analyze fires. I think if we can
figure out what to do about that problem we'll be well on our way...
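A sketch of what such a statistic might look like once it exists (the pg_class.relallvisible column here is hypothetical at the time of this thread):

```sql
-- Hypothetical: if VACUUM/ANALYZE stored a count of all-visible
-- pages in the catalog, the planner's heap-fetch fraction would be
-- a simple ratio rather than a hard-coded 10%.
select relallvisible::float8 / greatest(relpages, 1)
       as all_visible_fraction
from pg_class
where relname = 't';
```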
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
The trouble is that if we VACUUM and then ANALYZE, we'll often get
back a value very close to 100%, but then the real value may diminish
quite a bit before the next auto-analyze fires. I think if we can
figure out what to do about that problem we'll be well on our way...
It's not so much an issue of when the last auto-analyze was as an issue
of the number of rows in write transactions against that table in the
last X minutes. This is where it really hurts us that
pg_stat_user_tables is not time-based.
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On Tue, Oct 11, 2011 at 5:45 AM, Greg Stark <stark@mit.edu> wrote:
On Mon, Oct 10, 2011 at 9:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
My intention was to allow it to consider any covering index. You're
thinking about the cost estimate, which is really entirely different.
Is there any reason to consider more than one? I would have expected
the narrowest one to be the best choice. There's something to be said
for using the same index consistently but we already have that problem
and make no attempt to do that. And partial indexes might be better
but then we would already be considering them if their constraints are
satisfied.
You raise a fantastic idea: use an index's frequency of use as a
factor in its cost when optimising a query.
We have previously discussed the idea of using the RAM residency of an
index to control the cost. That is difficult to judge.
Using the long term prevalence of usage as a weighting factor makes a
great deal of sense for queries that could potentially utilise
multiple indexes. That information is readily available and directly
applicable. The prevalence of use directly drives RAM residency, so it
makes sense to use the causal factor as input to the cost.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Oct 11, 2011 at 3:18 PM, Josh Berkus <josh@agliodbs.com> wrote:
The trouble is that if we VACUUM and then ANALYZE, we'll often get
back a value very close to 100%, but then the real value may diminish
quite a bit before the next auto-analyze fires. I think if we can
figure out what to do about that problem we'll be well on our way...
It's not so much an issue of when the last auto-analyze was as an issue
of the number of rows in write transactions against that table in the
last X minutes. This is where it really hurts us that
pg_stat_user_tables is not time-based.
The number of write transactions in the last X minutes seems pretty
much irrelevant.
What matters is the number of previously-all-visible pages written
since the last vacuum.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, 2011-10-11 at 13:22 -0400, Robert Haas wrote:
The real issue is that the costing estimates need to be accurate, and
that's where the rubber hits the road. Otherwise, even if we pick the
right way to scan the table, we may do silly things up the line when
we go to start constructing the join order. I think we need to beef
up ANALYZE to gather statistics on the fraction of the pages that are
marked all-visible, or maybe VACUUM should gather that information.
The trouble is that if we VACUUM and then ANALYZE, we'll often get
back a value very close to 100%, but then the real value may diminish
quite a bit before the next auto-analyze fires. I think if we can
figure out what to do about that problem we'll be well on our way...
Can you send stats messages to keep track when you unset a bit in the
VM? That might allow it to be more up-to-date.
Regards,
Jeff Davis