So, is COUNT(*) fast now?

Started by Robert Haas · over 14 years ago · 49 messages · pgsql-hackers
#1 Robert Haas
robertmhaas@gmail.com

Laments at:

http://wiki.postgresql.org/wiki/FAQ#Why_is_.22SELECT_count.28.2A.29_FROM_bigtable.3B.22_slow.3F
http://wiki.postgresql.org/wiki/Slow_Counting

I tried this on my MacBook Pro this morning, using pgbench -i -s 500
to create a database about 7.5GB in size, and then using "SELECT
sum(1) FROM pgbench_accounts" as a test query, on a build WITHOUT
--enable-cassert. This machine has 4GB of memory, and I set
shared_buffers = 400MB. (No, I'm not sure whether that's the optimal
setting for shared_buffers for this machine.)

With enable_indexonlyscan = false, times for this query are: 96799.747
ms, 89108.987 ms, 114433.664 ms.
With enable_indexonlyscan = true, times were: 16779.325 ms, 16537.726
ms, 16703.229 ms.

That's about six times faster. It's worth noting that the
pgbench_accounts table has relatively narrow rows. On a table with
wider rows (but not so wide that they get toasted and become narrow
again), the benefit might be more. On the other hand, this is also a
table that's just been vacuumed, and you've got the happy case where
the table (6404 MB) does not fit in memory but the index (1071 MB)
does. What happens on a smaller test case? Here are the results at
scale factor = 100:

enable_indexonlyscan = false times: 1774.945 ms, 1784.985 ms, 1836.099 ms
enable_indexonlyscan = true times: 1450.445 ms, 1452.407 ms, 1452.426 ms

That's about a 23% speedup. At this scale factor, everything fits
into memory, but the index by itself (214 MB) fits into shared_buffers
while the table (1281 MB) does not. Let's crank the scale factor down some
more. Here are the results at scale_factor = 20:

enable_indexonlyscan = false times: 352.213 ms, 353.988 ms, 350.859 ms
enable_indexonlyscan = true times: 300.623 ms, 301.355 ms, 302.346 ms

Now the entire database fits into shared_buffers, but we're still
seeing a 17% speedup. But this turns out to be misleading. The
ring-buffer logic is actually preventing shared_buffers from getting
all of the heap blocks in cache quickly. If I run the query over and
over again until the whole table actually makes it into shared
buffers, the sequential scan gets much faster:

enable_indexonlyscan = false times after lots and lots of prewarming:
215.487 ms, 219.006 ms, 218.490 ms

That's a bit disappointing - it's now more than a third faster to do
the sequential scan, even though the sequential scan has to touch six
times as many blocks (at scale factor 20, index is 43 MB, table is 256
MB) all of which are in cache. Of course, touching that many fewer
blocks does have some advantages if there is concurrent activity on
the system, but it still seems unfortunate that the ratio of runtime
to blocks touched is more than 8x higher for the index-only case.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#1)
Re: So, is COUNT(*) fast now?

Robert Haas <robertmhaas@gmail.com> writes:

That's a bit disappointing - it's now more than a third faster to do
the sequential scan, even though the sequential scan has to touch six
times as many blocks (at scale factor 20, index is 43 MB, table is 256
MB) all of which are in cache. Of course, touching that many fewer
blocks does have some advantages if there is concurrent activity on
the system, but it still seems unfortunate that the ratio of runtime
to blocks touched is more than 8x higher for the index-only case.

I don't know why you'd imagine that touching an index is free, or even
cheap, CPU-wise. The whole point of the index-only optimization is to
avoid I/O. When you try it on a case where there's no I/O to be saved,
*and* no shared-buffers contention to be avoided, there's no way it's
going to be a win.

regards, tom lane

#3 Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#2)
Re: So, is COUNT(*) fast now?

On Fri, Oct 21, 2011 at 1:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

That's a bit disappointing - it's now more than a third faster to do
the sequential scan, even though the sequential scan has to touch six
times as many blocks (at scale factor 20, index is 43 MB, table is 256
MB) all of which are in cache.  Of course, touching that many fewer
blocks does have some advantages if there is concurrent activity on
the system, but it still seems unfortunate that the ratio of runtime
to blocks touched is more than 8x higher for the index-only case.

I don't know why you'd imagine that touching an index is free, or even
cheap, CPU-wise.  The whole point of the index-only optimization is to
avoid I/O.  When you try it on a case where there's no I/O to be saved,
*and* no shared-buffers contention to be avoided, there's no way it's
going to be a win.

Well, call me naive, but I would have thought touching six times less
data would make the operation run faster, not slower.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#3)
Re: So, is COUNT(*) fast now?

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Oct 21, 2011 at 1:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I don't know why you'd imagine that touching an index is free, or even
cheap, CPU-wise. The whole point of the index-only optimization is to
avoid I/O. When you try it on a case where there's no I/O to be saved,
*and* no shared-buffers contention to be avoided, there's no way it's
going to be a win.

Well, call me naive, but I would have thought touching six times less
data would make the operation run faster, not slower.

It's not "touching six times less data". It's touching the exact same
number of tuples either way, just index tuples in one case and heap
tuples in the other. You've arranged the test case so that all these
tuples are in shared buffers already, so there's no data movement to be
avoided. What this test case proves is that btree's overhead per index
tuple touched is significantly more than the cost of the fastest path
through HeapTupleSatisfiesMVCC, which I don't find surprising
considering how much sweat has been expended on that code path over the
years.

(It may well be that it's not even btree at fault but the per-tuple
visits to the visibility map ... did you do any oprofiling yet?)

regards, tom lane

#5 Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#4)
Re: So, is COUNT(*) fast now?

On Fri, Oct 21, 2011 at 2:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Oct 21, 2011 at 1:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I don't know why you'd imagine that touching an index is free, or even
cheap, CPU-wise.  The whole point of the index-only optimization is to
avoid I/O.  When you try it on a case where there's no I/O to be saved,
*and* no shared-buffers contention to be avoided, there's no way it's
going to be a win.

Well, call me naive, but I would have thought touching six times less
data would make the operation run faster, not slower.

It's not "touching six times less data".  It's touching the exact same
number of tuples either way, just index tuples in one case and heap
tuples in the other.

Yeah, but it works out to fewer pages.

You've arranged the test case so that all these
tuples are in shared buffers already, so there's no data movement to be
avoided.  What this test case proves is that btree's overhead per index
tuple touched is significantly more than the cost of the fastest path
through HeapTupleSatisfiesMVCC, which I don't find surprising
considering how much sweat has been expended on that code path over the
years.

I think HeapTupleSatisfiesMVCC is probably being skipped anyway in
this case, since all the heap pages should be PD_ALL_VISIBLE.

(It may well be that it's not even btree at fault but the per-tuple
visits to the visibility map ... did you do any oprofiling yet?)

No, but I think that might be a good idea. Maybe I'll go do that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#6 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#5)
Re: So, is COUNT(*) fast now?

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Oct 21, 2011 at 2:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

What this test case proves is that btree's overhead per index
tuple touched is significantly more than the cost of the fastest path
through HeapTupleSatisfiesMVCC, which I don't find surprising
considering how much sweat has been expended on that code path over the
years.

I think HeapTupleSatisfiesMVCC is probably being skipped anyway in
this case, since all the heap pages should be PD_ALL_VISIBLE.

Proves my point ;-) ... you're comparing a code path that's been beat on
for *years* with one that just got written.

regards, tom lane

#7 Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#6)
Re: So, is COUNT(*) fast now?

On Fri, Oct 21, 2011 at 2:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I think HeapTupleSatisfiesMVCC is probably being skipped anyway in
this case, since all the heap pages should be PD_ALL_VISIBLE.

Proves my point ;-) ... you're comparing a code path that's been beat on
for *years* with one that just got written.

I know. I wrote a chunk of it. :-) My point is just that it'd be
nice to make it better.

Anyhow, here's the scoop. On my desktop machine running F14, running
SELECT sum(1) FROM pgbench_accounts in a tight loop, 60 s worth of
oprofile data:

176830 13.0801 postgres         postgres         ExecProject
170028 12.5770 postgres         postgres         IndexOnlyNext
 96631  7.1478 postgres         postgres         visibilitymap_test
 86019  6.3628 postgres         postgres         advance_aggregates
 74366  5.5009 postgres         postgres         ExecScan
 72428  5.3575 postgres         postgres         ExecClearTuple
 68483  5.0657 postgres         postgres         btgettuple
 60614  4.4836 postgres         postgres         advance_transition_function
 59680  4.4145 postgres         postgres         ExecProcNode
 52295  3.8683 postgres         postgres         _bt_checkkeys
 52078  3.8522 libc-2.12.90.so  libc-2.12.90.so  __memcpy_sse2
 49548  3.6651 postgres         postgres         index_getnext_tid
 48265  3.5702 postgres         postgres         ExecEvalConst
 42989  3.1799 postgres         postgres         _bt_next
 40544  2.9990 postgres         postgres         _bt_readpage
 35162  2.6009 no-vmlinux       no-vmlinux       /no-vmlinux
 33639  2.4883 postgres         postgres         MemoryContextReset

And without index-only scans, but everything in shared_buffers:

169515 18.4261 postgres         postgres         ExecProject
 94827 10.3076 postgres         postgres         heapgettup_pagemode
 84850  9.2231 postgres         postgres         advance_aggregates
 57998  6.3043 postgres         postgres         advance_transition_function
 55638  6.0478 postgres         postgres         ExecEvalConst
 53684  5.8354 postgres         postgres         heapgetpage
 51411  5.5883 postgres         postgres         ExecScan
 48387  5.2596 postgres         postgres         ExecProcNode
 44129  4.7968 postgres         postgres         ExecStoreTuple
 30759  3.3435 postgres         postgres         heap_getnext
 25923  2.8178 postgres         postgres         SeqNext
 24145  2.6245 postgres         postgres         CheckForSerializableConflictOut
 23155  2.5169 postgres         postgres         ExecAgg
 18864  2.0505 postgres         postgres         heap_page_prune_opt
 18784  2.0418 no-vmlinux       no-vmlinux       /no-vmlinux

The index-only scan takes about 385 ms, while the non-index-only
version takes about 284 ms.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#8 Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#7)
Re: So, is COUNT(*) fast now?

On Fri, Oct 21, 2011 at 3:07 PM, Robert Haas <robertmhaas@gmail.com> wrote:

[ oprofile results ]

*grovels through the line-by-line results*

Hmm, I guess there is a bit of a hotspot in StoreIndexTuple, which is
probably being folded into IndexOnlyNext in the per-function timings:

    ExecClearTuple(slot);
    for (i = 0; i < nindexatts; i++)
        values[i] = index_getattr(itup, i + 1, itupdesc, &isnull[i]);
    ExecStoreVirtualTuple(slot);

If I'm reading these results right, that section is about 3% of the
total number of samples.

Also, this line is kind of expensive:

    if (!visibilitymap_test(scandesc->heapRelation,
                            ItemPointerGetBlockNumber(tid),
                            &node->ioss_VMBuffer))

Around 2%. But I don't see any way to avoid that, or even make it cheaper.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#9 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#7)
Re: So, is COUNT(*) fast now?

Robert Haas <robertmhaas@gmail.com> writes:

Anyhow, here's the scoop. On my desktop machine running F14, running
SELECT sum(1) FROM pgbench_accounts in a tight loop, 60 s worth of
oprofile data:

176830 13.0801 postgres postgres ExecProject

Hm, that's weird. In both these cases, I'd have expected that
ExecProject would get optimized away thanks to selection of a physical
tlist for the scan node. Wonder if that got broken ...

regards, tom lane

#10 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#8)
Re: So, is COUNT(*) fast now?

Robert Haas <robertmhaas@gmail.com> writes:

Hmm, I guess there is a bit of a hotspot in StoreIndexTuple, which is
probably being folded into IndexOnlyNext in the per-function timings:

    ExecClearTuple(slot);
    for (i = 0; i < nindexatts; i++)
        values[i] = index_getattr(itup, i + 1, itupdesc, &isnull[i]);
    ExecStoreVirtualTuple(slot);

I had wondered whether it'd be worth optimizing that along the lines of
slot_getallattrs(). But most indexes probably have only one column,
or anyway not enough to make for a useful savings.

regards, tom lane

#11 Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#9)
Re: So, is COUNT(*) fast now?

On Fri, Oct 21, 2011 at 3:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Anyhow, here's the scoop.  On my desktop machine running F14, running
SELECT sum(1) FROM pgbench_accounts in a tight loop, 60 s worth of
oprofile data:

176830   13.0801  postgres                 postgres                 ExecProject

Hm, that's weird.  In both these cases, I'd have expected that
ExecProject would get optimized away thanks to selection of a physical
tlist for the scan node.  Wonder if that got broken ...

If it did, it looks like it wasn't recent. I set up the same test
case on my MacBook using REL9_1_STABLE and REL9_0_STABLE and set a
breakpoint on ExecProject(). Both back-branches appear to also call
ExecProject() for every tuple.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#12 Jeff Janes
jeff.janes@gmail.com
In reply to: Robert Haas (#5)
Re: So, is COUNT(*) fast now?

On Fri, Oct 21, 2011 at 11:14 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Oct 21, 2011 at 2:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Oct 21, 2011 at 1:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I don't know why you'd imagine that touching an index is free, or even
cheap, CPU-wise.  The whole point of the index-only optimization is to
avoid I/O.  When you try it on a case where there's no I/O to be saved,
*and* no shared-buffers contention to be avoided, there's no way it's
going to be a win.

Well, call me naive, but I would have thought touching six times less
data would make the operation run faster, not slower.

It's not "touching six times less data".  It's touching the exact same
number of tuples either way, just index tuples in one case and heap
tuples in the other.

Yeah, but it works out to fewer pages.

But since those pages are already in RAM, why would it matter all that
much? (Other than in the case of highly concurrent access, which you
don't seem to be testing?)

One of Tom's commits that made it not lock the same index page over
and over again (once for each tuple on it) made me think it should be
much faster than the seq scan, but a bit of random flailing about
convinced me that any savings from this were compensated for by the
high overhead of FunctionCall2Coll and all of the hokey-pokey that
that call entails, which a seqscan can skip entirely.

If count(*) could cause the index-only scan to happen in physical
order of the index, rather than logical order, that might be a big
win. Both for all in memory and for not-all-in-memory.

Cheers,

Jeff

#13 Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#5)
Re: So, is COUNT(*) fast now?

On Friday, October 21, 2011 08:14:12 PM Robert Haas wrote:

On Fri, Oct 21, 2011 at 2:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Oct 21, 2011 at 1:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I don't know why you'd imagine that touching an index is free, or even
cheap, CPU-wise. The whole point of the index-only optimization is to
avoid I/O. When you try it on a case where there's no I/O to be saved,
and no shared-buffers contention to be avoided, there's no way it's
going to be a win.

Well, call me naive, but I would have thought touching six times less
data would make the operation run faster, not slower.

It's not "touching six times less data". It's touching the exact same
number of tuples either way, just index tuples in one case and heap
tuples in the other.

Yeah, but it works out to fewer pages.

But access to those is not sequential. I guess if you measure cache hit ratios
the index scan will come out significantly worse.

Andres

#14 desmodemone
desmodemone@gmail.com
In reply to: Andres Freund (#13)
Re: So, is COUNT(*) fast now?

2011/10/22 Andres Freund <andres@anarazel.de>

On Friday, October 21, 2011 08:14:12 PM Robert Haas wrote:

On Fri, Oct 21, 2011 at 2:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Oct 21, 2011 at 1:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I don't know why you'd imagine that touching an index is free, or even
cheap, CPU-wise. The whole point of the index-only optimization is to
avoid I/O. When you try it on a case where there's no I/O to be saved,
and no shared-buffers contention to be avoided, there's no way it's
going to be a win.

Well, call me naive, but I would have thought touching six times less
data would make the operation run faster, not slower.

It's not "touching six times less data". It's touching the exact same
number of tuples either way, just index tuples in one case and heap
tuples in the other.

Yeah, but it works out to fewer pages.

But access to those is not sequential. I guess if you measure cache hit
ratios the index scan will come out significantly worse.

Andres


"But access to those is not sequential" - yes, I agree.
In my opinion the problem is this: if the query needs to scan the whole
b-tree index without accessing the table rows, the better way to read the
index is sequentially. In fact, queries like count(*), and others that do
not need the data in "order", could read all the blocks (better: only the
leaf blocks) without touching the branch blocks much.

For example, for a query like this:

select column_a from table;

it is better to read the data from the index sequentially.

For a query like this:

select column_a from table order by column_a;

it is better to read the data from the index as a range scan, from the
root block down to the first branch blocks and their leaf blocks, so we
can "save" the sort.

Mat

#15 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#13)
Re: So, is COUNT(*) fast now?

Andres Freund <andres@anarazel.de> writes:

On Friday, October 21, 2011 08:14:12 PM Robert Haas wrote:

On Fri, Oct 21, 2011 at 2:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

It's not "touching six times less data". It's touching the exact same
number of tuples either way, just index tuples in one case and heap
tuples in the other.

Yeah, but it works out to fewer pages.

But access to those is not sequential. I guess if you measure cache hit ratios
the index scan will come out significantly worse.

Huh? In the case he's complaining about, the index is all in RAM.
Sequentiality of access is not an issue (at least not at the page
level --- within a page I suppose there could be cache-line effects).

regards, tom lane

#16 Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#15)
Re: So, is COUNT(*) fast now?

On Saturday, October 22, 2011 05:20:26 PM Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On Friday, October 21, 2011 08:14:12 PM Robert Haas wrote:

On Fri, Oct 21, 2011 at 2:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

It's not "touching six times less data". It's touching the exact same
number of tuples either way, just index tuples in one case and heap
tuples in the other.

Yeah, but it works out to fewer pages.

But access to those is not sequential. I guess if you measure cache hit
ratios the index scan will come out significantly worse.

Huh? In the case he's complaining about, the index is all in RAM.
Sequentiality of access is not an issue (at least not at the page
level --- within a page I suppose there could be cache-line effects).

I was talking about L2/L3 caches...

Andres

#17 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#16)
Re: So, is COUNT(*) fast now?

Andres Freund <andres@anarazel.de> writes:

On Saturday, October 22, 2011 05:20:26 PM Tom Lane wrote:

Huh? In the case he's complaining about, the index is all in RAM.
Sequentiality of access is not an issue (at least not at the page
level --- within a page I suppose there could be cache-line effects).

I was talking about L2/L3 caches...

Yeah, but unless you think cache lines cross page boundaries (and we do
take pains to align the buffers on 8K addresses), there's not going to
be any sequentiality effect. Even if there were, it would only apply
if the pages chanced to be adjacent in the buffer array, and there is no
reason to expect that to be the case, for either seqscans or indexscans.

regards, tom lane

#18 Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Janes (#12)
Re: So, is COUNT(*) fast now?

On Fri, Oct 21, 2011 at 10:57 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

Yeah, but it works out to fewer pages.

But since those pages are already in RAM, why would it matter all that
much?  (Other than in the case of highly concurrent access, which you
don't seem to be testing?)

Well, because memory access takes time, and accessing more memory
takes more time. In the testing that I've done recently, performance
on in-memory workloads seems to be extremely sensitive to memory
speed, so you'd think that cutting down on memory access would be a
win.

One of Tom's commits that made it not lock the same index page over
and over again (once for each tuple on it) made me think it should be
much faster than the seq scan, but a bit of random flailing about
convinced me that any savings from this were compensated for by the
high overhead of FunctionCall2Coll and all of the hokey-pokey that
that call entails, which a seqscan can skip entirely.

Huh. Not sure whether that's significant or not.

If count(*) could cause the index-only scan to happen in physical
order of the index, rather than logical order, that might be a big
win.  Both for all in memory and for not-all-in-memory.

That's an interesting point. I sort of assumed that would only help
for not-all-in-memory, but maybe not. The trouble is that I think
there are some problematic concurrency issues there.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#19 Luben Karavelov
In reply to: Robert Haas (#18)
Re: So, is COUNT(*) fast now?

----- Quote from Tom Lane (tgl@sss.pgh.pa.us), on 22.10.2011 at 19:19 -----

Andres Freund <andres@anarazel.de> writes:

On Saturday, October 22, 2011 05:20:26 PM Tom Lane wrote:

Huh? In the case he's complaining about, the index is all in RAM.
Sequentiality of access is not an issue (at least not at the page
level --- within a page I suppose there could be cache-line effects).

I was talking about L2/L3 caches...

Yeah, but unless you think cache lines cross page boundaries (and we do
take pains to align the buffers on 8K addresses), there's not going to
be any sequentiality effect. Even if there were, it would only apply
if the pages chanced to be adjacent in the buffer array, and there is no
reason to expect that to be the case, for either seqscans or indexscans.

regards, tom lane

I worked on the in-memory hash tables of the parrot project. It is not the
same as btrees, but the structure and memory layout are not that different -
tuples go into pages, etc.

I have benchmarked iterating over such hash tables - a sequential scan of
the same table comes out 20-30% faster than a scan ordered by the hash value
of the key. And this is overhead purely from CPU cache lines - the numbers
of instructions executed on the processor are pretty much the same (counted
by valgrind).

So I do think that having sequential (physical-order) scans of indexes
would help even when all the data is in the buffer cache.

Best regards

--
Luben Karavelov

#20 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#18)
Re: So, is COUNT(*) fast now?

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Oct 21, 2011 at 10:57 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

If count(*) could cause the index-only scan to happen in physical
order of the index, rather than logical order, that might be a big
win. �Both for all in memory and for not-all-in-memory.

That's an interesting point. I sort of assumed that would only help
for not-all-in-memory, but maybe not. The trouble is that I think
there are some problematic concurrency issues there.

Yeah. We managed to make physical-order scanning work for VACUUM
because it's okay if VACUUM sometimes sees the same index tuple twice;
it'll just make the same decision about (not) deleting it. That will
not fly for regular querying.

regards, tom lane

#21 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#11)
#22 Jeff Janes
jeff.janes@gmail.com
In reply to: Robert Haas (#7)
#23 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Janes (#22)
#24 Jeff Janes
jeff.janes@gmail.com
In reply to: Robert Haas (#8)
#25 Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Janes (#24)
#26 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#10)
#27 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#26)
#28 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#27)
#29 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Kevin Grittner (#28)
#30 Robert Haas
robertmhaas@gmail.com
In reply to: Kevin Grittner (#28)
#31 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#30)
#32 Josh Berkus
josh@agliodbs.com
In reply to: Tom Lane (#31)
#33 Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#31)
#34 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Josh Berkus (#32)
#35 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#33)
#36 Wolfgang Wilhelm
wolfgang20121964@yahoo.de
In reply to: Tom Lane (#31)
#37 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#31)
#38 Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#35)
#39 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#38)
#40 Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#39)
#41 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#40)
#42 Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#41)
#43 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#42)
#44 Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#43)
#45 Anssi Kääriäinen
anssi.kaariainen@thl.fi
In reply to: Robert Haas (#1)
#46 Anssi Kääriäinen
anssi.kaariainen@thl.fi
In reply to: Anssi Kääriäinen (#45)
#47 Robert Haas
robertmhaas@gmail.com
In reply to: Anssi Kääriäinen (#45)
#48 Anssi Kääriäinen
anssi.kaariainen@thl.fi
In reply to: Robert Haas (#47)
#49 Robert Haas
robertmhaas@gmail.com
In reply to: Anssi Kääriäinen (#48)