Improving count(*)

Started by Simon Riggs over 20 years ago, 49 messages, hackers
#1 Simon Riggs
simon@2ndQuadrant.com

One of the major complaints is always "Select count(*) is slow".

I have a somewhat broad-brush idea as to how we might do this (for larger
tables).

Previously, we've said that maintaining a running count of the tuples in
a table is hard, or at least costly. However, maintaining a partial
count may be somewhat easier and offer the performance required.

Bearing in mind that other RDBMSs' approach is to count the number of rows
in an index, their cost is probably about the same as scanning a tenth of
the table's blocks, very roughly, so the cost is far from zero for them.
To get similar performance we would need a technique that involved
scanning a small percentage of the data blocks in a table, typically
less than 10%.

Prelims: As tuples are inserted they go into blocks in a varied
sequence. However, since only VACUUM ever reduces the size of the data
in a block, blocks eventually fill up; any full block on the FSM (free
space map) is removed from it. At that point, we could begin keeping
track of the number of live tuples in the block.

So we could imagine an on-disk data structure that is an array of this
struct:
- blockid: the block in question
- count: the number of live tuples in that block
- txnid: the txnid of the last tuple written

When we run a SELECT COUNT(*) we would read this data structure and use
it to build a bitmap of blocks that still need to be scanned to get the
count(*) result. So this would give us a partially cached result for
count(*), which then would require scanning only the last few blocks of
a table to get at the correct answer.
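A minimal Python sketch of how such a partially cached count(*) might be computed. `BlockEntry`, `partial_cached_count` and `scan_block` are hypothetical names, and the rule for trusting a cached entry (its txnid is older than the oldest running transaction) is an assumption added for illustration; the proposal itself leaves the visibility question open.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set


@dataclass
class BlockEntry:
    """One element of the per-block array described above (hypothetical)."""
    count: int  # live tuples recorded for the block
    txnid: int  # txnid of the last tuple written into the block


def partial_cached_count(
    nblocks: int,
    entries: Dict[int, BlockEntry],
    oldest_active_txnid: int,
    scan_block: Callable[[int], int],
) -> int:
    """Add up trusted cached counts, then scan every remaining block.

    Assumption (not part of the proposal): an entry is trusted only when its
    last writer is older than every running transaction, so all snapshots
    would agree on that block's count.
    """
    total = 0
    to_scan: Set[int] = set()  # stands in for the bitmap of blocks to read
    for blk in range(nblocks):
        entry = entries.get(blk)
        if entry is not None and entry.txnid < oldest_active_txnid:
            total += entry.count
        else:
            to_scan.add(blk)
    for blk in sorted(to_scan):
        total += scan_block(blk)  # normal per-tuple visibility checks
    return total


entries = {0: BlockEntry(50, 10), 1: BlockEntry(60, 20), 2: BlockEntry(55, 95)}
print(partial_cached_count(4, entries, oldest_active_txnid=90, scan_block=lambda b: 7))  # 124
```

With this sample data, blocks 0 and 1 come from the cache, while block 2 (recent writer) and block 3 (no entry yet) are scanned.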

This is beginning to sound very much like the same sort of solution as
the one recently proposed for VACUUM: we keep track of the status of
each block, then use it as a way of speeding things up.

For VACUUM we want to keep a data structure that notes which blocks
require vacuuming. For count(*) we want to keep track of which blocks do
not require vacuuming, or nearly so. True, there are some blocks in the
middle that wouldn't go on either list.

So, why not combine the two purposes into a single solution?

We would be able to manage at least 800 data blocks for each block of
this structure, which would mean 1.25 MB per GB of data. For tables with
a reasonably non-random block write pattern, the key parts of that could
reside in memory with reasonable ease, even for busy tables. In those
cases, it also seems like this technique might need to scan only a small
percentage of heap blocks, so it would give roughly equivalent
performance to other RDBMSs for SELECT count(*). If it's a data
structure that we were thinking of maintaining for VACUUM anyway, this
improves the return on the cost we would have to pay to maintain a
cache data structure.
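A quick check of the sizing arithmetic, assuming 8 KB blocks and a 10-byte entry (4-byte blockid, 2-byte count, 4-byte txnid); the field widths and helper names are assumptions. Because each map block covers a fixed number of heap blocks of the same size, the map is simply 1/800 of the heap: 1.25 MB per GB when a GB is taken as 1000 MB, about 1.28 MB with 1024.

```python
BLOCK_SIZE = 8192  # default PostgreSQL block size in bytes

# Assumed entry layout: 4-byte blockid + 2-byte count + 4-byte txnid.
ENTRY_BYTES = 4 + 2 + 4


def entries_per_block(block_size: int = BLOCK_SIZE, entry_bytes: int = ENTRY_BYTES) -> int:
    """Map entries that fit in one block (pass a smaller size to allow for a page header)."""
    return block_size // entry_bytes


def map_mb_per_gb(entries: int = 800, mb_per_gb: int = 1024) -> float:
    """Megabytes of map per gigabyte of heap.

    One map block covers `entries` heap blocks of the same size, so the map
    is 1/entries of the heap regardless of the block size.
    """
    return mb_per_gb / entries


print(entries_per_block())             # 819
print(map_mb_per_gb())                 # 1.28
print(map_mb_per_gb(mb_per_gb=1000))   # 1.25
```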

This is thinking only; I'm not proposing this as a fully worked proposal,
nor am I pursuing this myself. I can see immediately that there are some
major obstacles in the way, like MVCC, autovacuum, overheads etc., but it
seems worth pointing out a possible angle of attack, since this looks
like it might kill two birds with one stone.

I'm not aiming to start "open season" on crazy ideas for this; this idea
may take months to ruminate into a solution.

Best Regards, Simon Riggs

#2 Martijn van Oosterhout
kleptog@svana.org
In reply to: Simon Riggs (#1)
Re: Improving count(*)

On Thu, Nov 17, 2005 at 07:28:10PM +0000, Simon Riggs wrote:

One of the major complaints is always "Select count(*) is slow".

I have a somewhat broad-brush idea as to how we might do this (for larger
tables).

It's an interesting idea, but you still run into the issue of
visibility. If two people start a transaction, one of them inserts a
row and then both run a select count(*), they should get different
answers. I just don't see a way that your suggestion could possibly
lead to that result...

There is no unique answer to count(*), it all depends on who is looking
(sounds like relativity :) ). If you can sort that, you're well over
90% of the way.
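A toy illustration of why there is no single answer: two snapshots evaluate the same rows and get different counts. The visibility rule here is a deliberately simplified assumption (see committed work plus your own, nothing else), not PostgreSQL's actual rule, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Row:
    xmin: int                   # transaction that inserted the row
    xmax: Optional[int] = None  # transaction that deleted it, if any


@dataclass
class Snapshot:
    my_xid: int
    committed: Set[int]  # transactions already committed when the snapshot was taken


def visible(row: Row, snap: Snapshot) -> bool:
    """Toy MVCC rule: see committed work plus our own, nothing else."""
    def effective(xid: Optional[int]) -> bool:
        return xid is not None and (xid == snap.my_xid or xid in snap.committed)
    return effective(row.xmin) and not effective(row.xmax)


def count_star(rows: List[Row], snap: Snapshot) -> int:
    return sum(1 for r in rows if visible(r, snap))


rows = [Row(xmin=1) for _ in range(3)]  # committed by transaction 1
t2 = Snapshot(my_xid=2, committed={1})
t3 = Snapshot(my_xid=3, committed={1})
rows.append(Row(xmin=2))                # transaction 2 inserts, not yet committed
print(count_star(rows, t2), count_star(rows, t3))  # 4 3
```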

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/


Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
tool for doing 5% of the work and then sitting around waiting for someone
else to do the other 95% so you can sue them.

#3 Jonah H. Harris
jonah.harris@gmail.com
In reply to: Martijn van Oosterhout (#2)
Re: Improving count(*)

Simon,
Nice suggestion; I think it's workable but (like all other methods) has
some technical/pseudo-political challenges.
I'm still voting for my old "Much Ado About COUNT(*)" topic: adding
visibility to the indexes and counting them like the other RDBMS vendors.
True, it would add storage overhead that several people don't want, but such
is the life of the COUNT(*) discussion for PostgreSQL...
-Jonah


#4 Rod Taylor
rbt@rbt.ca
In reply to: Martijn van Oosterhout (#2)
Re: Improving count(*)

On Thu, 2005-11-17 at 20:38 +0100, Martijn van Oosterhout wrote:

On Thu, Nov 17, 2005 at 07:28:10PM +0000, Simon Riggs wrote:

One of the major complaints is always "Select count(*) is slow".

I have a somewhat broad-brush idea as to how we might do this (for larger
tables).

It's an interesting idea, but you still run into the issue of
visibility. If two people start a transaction, one of them inserts a
row and then both run a select count(*), they should get different
answers. I just don't see a way that your suggestion could possibly
lead to that result...

The instant someone touches a block it would no longer be marked as
frozen (vacuum or analyze or other is not required) and count(*) would
visit the tuples in the block making the correct decision at that time.

--

#5 Martijn van Oosterhout
kleptog@svana.org
In reply to: Rod Taylor (#4)
Re: Improving count(*)

On Thu, Nov 17, 2005 at 02:55:09PM -0500, Rod Taylor wrote:

On Thu, 2005-11-17 at 20:38 +0100, Martijn van Oosterhout wrote:

It's an interesting idea, but you still run into the issue of
visibility. If two people start a transaction, one of them inserts a
row and then both run a select count(*), they should get different
answers. I just don't see a way that your suggestion could possibly
lead to that result...

The instant someone touches a block it would no longer be marked as
frozen (vacuum or analyze or other is not required) and count(*) would
visit the tuples in the block making the correct decision at that time.

Hmm, so the idea would be that if a block no longer contained any
tuples hidden from any active transaction, you could store the count
and skip reading that page. Of course, as soon as someone UPDATEs a
tuple, that block comes into play again because it would be visible
from some but not other transactions. Then again, for count(*) UPDATEs
are irrelevant.

The other way, storing visibility in the index seems awfully expensive,
since any changes to the tuple would require updating the index. Still,
people have thought about this already, I'm sure the issues are
known...

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/


#6 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#1)
Re: Improving count(*)

Simon Riggs <simon@2ndquadrant.com> writes:

Bearing in mind that other RDBMSs' approach is to count the number of rows
in an index, their cost is probably about the same as scanning a tenth of
the table's blocks, very roughly, so the cost is far from zero for them.

Really? The impression I get is that people who ask for this expect the
answer to be instantaneous, i.e. they think the system will maintain a
running net total for each table. (In a non-MVCC system that isn't
necessarily an unreasonable thing to do.)

I really can't get excited about adding this level of complexity and
overhead to the system just to support COUNT(*)-with-no-WHERE slightly
better than we do now.

The triggers-and-deltas approach previously proposed seems considerably
more attractive to me, because (1) it's not invasive and (2) you only
have to pay the overhead on tables where you want it.
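A rough in-memory model of a triggers-and-deltas counter, assuming that insert and delete triggers append +1/-1 rows to a side table which a periodic job folds into a single row. The earlier proposal's exact details are not in this thread, and `DeltaCounter` is a hypothetical name. In a real table each delta row would itself be subject to MVCC visibility, which this model does not simulate.

```python
from typing import List


class DeltaCounter:
    """In-memory model of a trigger-maintained row count (hypothetical).

    In the database version, AFTER INSERT / AFTER DELETE triggers would
    append +1 / -1 rows to a side table; appending rather than updating a
    single counter row keeps writers from all waiting on the same row.
    """

    def __init__(self) -> None:
        self.deltas: List[int] = []

    def on_insert(self, n: int = 1) -> None:
        self.deltas.append(+n)

    def on_delete(self, n: int = 1) -> None:
        self.deltas.append(-n)

    def count(self) -> int:
        # count(*) becomes a sum over the (hopefully short) delta table
        return sum(self.deltas)

    def consolidate(self) -> None:
        # A periodic job folds all deltas into a single row
        self.deltas = [self.count()]
```

The overhead is paid only on tables that install the triggers, which is the property Tom highlights.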

regards, tom lane

#7 Simon Riggs
simon@2ndQuadrant.com
In reply to: Jonah H. Harris (#3)
Re: Improving count(*)

On Thu, 2005-11-17 at 14:46 -0500, Jonah H. Harris wrote:

Nice suggestion; I think it's workable but (like all other methods)
has some technical/pseudo-political challenges.

I'm still voting for my old "Much Ado About COUNT(*)" topic: adding
visibility to the indexes and counting them like the other RDBMS
vendors. True, it would add storage overhead that several people
don't want, but such is the life of the COUNT(*) discussion for
PostgreSQL...

[When no idea is good, we take the least-worst path. When we only have
one bad idea, nobody does anything. So we need a couple of ugly but
workable alternatives to flush out the one we will pick to resolve
things.]

As Martijn points out *any* solution must take account of visibility
rules. So abstracting from both these ideas gives shape to the solution,
which must be:
- a data structure smaller than the table itself
- including visibility data, explicitly/implicitly, exactly/lossily
- must serve multiple purposes to ensure the overhead of maintaining the
structure is amortised across many potential benefits, since various
needs share similar solution requirements

Would having visibility on an index be of use to a VACUUM?
Yes, I guess it could be. If we knew when the table was last vacuumed,
we could build a bitmap of changed blocks by scanning the index. We
would only need visibility on *one* of the indexes on a table, so
perhaps index visibility could be an option rather than a
one-size-fits-all.

Adding visibility to an index would add substantial bulk to any index.
If we could do this at the same time as adding leading key, full field
compression (*not* prefix compression), then it might be worth doing.

I would also note that DELETE would need to touch all visible index
rows, which currently is not required for btree indexes. (But as we
already noted, any solution must include visibility data and so any
solution must update some data structure on delete).

Index-only plans could help with various GROUP BY and join queries also,
so it certainly is attractive, though costly.

Best Regards, Simon Riggs

#8 Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#6)
Re: Improving count(*)

On Thu, 2005-11-17 at 16:34 -0500, Tom Lane wrote:

Simon Riggs <simon@2ndquadrant.com> writes:

Bearing in mind that other RDBMSs' approach is to count the number of rows
in an index, their cost is probably about the same as scanning a tenth of
the table's blocks, very roughly, so the cost is far from zero for them.

Really? The impression I get is that people who ask for this expect the
answer to be instantaneous, i.e. they think the system will maintain a
running net total for each table. (In a non-MVCC system that isn't
necessarily an unreasonable thing to do.)

Yeah. I think Informix keeps a running total, IIRC, but certainly Oracle
and various others do not.

People probably have given the impression that count(*) is
instantaneous, but that doesn't mean it actually is - they're just
talking up the problems of pg.

I really can't get excited about adding this level of complexity and
overhead to the system just to support COUNT(*)-with-no-WHERE slightly
better than we do now.

Well, I was pointing out the cross-over with the requirements for a
faster VACUUM also. Taken together, it might be a winner.

The triggers-and-deltas approach previously proposed seems considerably
more attractive to me, because (1) it's not invasive and (2) you only
have to pay the overhead on tables where you want it.

This would need to be optional whichever way we did it, just as with
the creation of an index. I also think that taking the Oracle path of
adding new features in functions/packages would be a good thing, rather
than constantly over-burdening the parser with new syntax.

Did the triggers-and-deltas approach cope with MVCC correctly?

Best Regards, Simon Riggs

#9 Martijn van Oosterhout
kleptog@svana.org
In reply to: Simon Riggs (#7)
Re: Improving count(*)

On Thu, Nov 17, 2005 at 09:34:08PM +0000, Simon Riggs wrote:

Adding visibility to an index would add substantial bulk to any index.
If we could do this at the same time as adding leading key, full field
compression (*not* prefix compression), then it might be worth doing.

I think the single biggest problem with visibility-in-index is that
there is no link from the tuple to the index. So if you update a tuple,
the only way to update the index is to start from the top and go down
until you find it. If your table/index is of any size, you can imagine
the overhead will kill you.

Now, let's say you add a field to the tuple which stores the position of
the index entry. You can only reasonably do this for one index, say the
primary key. Now you have a two-way link, so updating becomes much
quicker, at the cost of even more overhead.

Doing it only for one index per table may be sensible anyway since you
don't really want to store visibility any more times than necessary.

I would also note that DELETE would need to touch all visible index
rows, which currently is not required for btree indexes. (But as we
already noted, any solution must include visibility data and so any
solution must update some data structure on delete).

Remember, UPDATE = DELETE + INSERT, so you have to handle all updates
too. Inserts are the only easy case (well, except for the fact that they
have to point to each other; locking nastiness).
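A toy model of why updates are the hard case once the index carries visibility: the old version must be expired in both heap and index (the DELETE half) before the new version is added to both (the INSERT half). All names are hypothetical, and the linear search over index entries stands in for the top-down descent described above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HeapTuple:
    key: int
    value: str
    xmin: int                   # creating transaction
    xmax: Optional[int] = None  # expiring transaction


@dataclass
class IndexEntry:
    key: int
    heap_pos: int               # position of the tuple in `heap`
    xmin: int                   # visibility copied into the index (hypothetical)
    xmax: Optional[int] = None


def mvcc_update(heap: List[HeapTuple], index: List[IndexEntry],
                key: int, new_value: str, xid: int) -> None:
    """UPDATE modeled as DELETE of the old version plus INSERT of a new one."""
    pos = next((i for i, t in enumerate(heap) if t.key == key and t.xmax is None), None)
    if pos is None:
        raise KeyError(key)
    heap[pos].xmax = xid                      # DELETE half: expire the old version
    for entry in index:                       # stands in for a top-down index search
        if entry.heap_pos == pos:
            entry.xmax = xid                  # index visibility must change as well
    heap.append(HeapTuple(key, new_value, xmin=xid))        # INSERT half
    index.append(IndexEntry(key, len(heap) - 1, xmin=xid))
```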

Index-only plans could help with various GROUP BY and join queries also,
so it certainly is attractive, though costly.

Only in cases where you don't need the data (ie EXISTS), otherwise you
still need the tuple.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/


#10 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Martijn van Oosterhout (#9)
Re: Improving count(*)

In Sybase ASE (and I'm pretty sure the same is true in Microsoft SQL
Server) the leaf level of the narrowest index on the table is scanned,
following a linked list of leaf pages. Leaf pages can be pretty dense
under Sybase, because they do use prefix compression. A count(*)
on a table with 100 million rows is going to take a few minutes, but it
is going to be at least an order of magnitude faster than a data page
scan -- maybe two orders of magnitude faster.

What I don't understand is why people need to do such things so
frequently that it's a major issue, rather than an occasional
annoyance. A solution which not only helped the count(*) issue
but also allowed index scans to skip the trip to the data page to
see if it's an active version seems like it would boost performance
overall. As pointed out elsewhere, it could also allow new
techniques for vacuum which could be beneficial.

My view is that when tables get so big that a count(*) takes that
much time, you don't typically need an EXACT count anyway --
you could normally check the statistics from your nightly analyze.

-Kevin


#11 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Martijn van Oosterhout (#9)
Re: Improving count(*)

Martijn van Oosterhout <kleptog@svana.org> writes:

Now, let's say you add a field to the tuple which stores the position of
the index entry. You can only reasonably do this for one index, say the
primary key. Now you have a two-way link, so updating becomes much
quicker, at the cost of even more overhead.

I think this is fairly infeasible --- consider what it does to the cost
and (lack of) atomicity of an index page split, for instance.

regards, tom lane

#12 Simon Riggs
simon@2ndQuadrant.com
In reply to: Kevin Grittner (#10)
Re: Improving count(*)

On Thu, 2005-11-17 at 16:30 -0600, Kevin Grittner wrote:

In Sybase ASE (and I'm pretty sure the same is true in Microsoft SQL
Server) the leaf level of the narrowest index on the table is scanned,
following a linked list of leaf pages. Leaf pages can be pretty dense
under Sybase, because they do use prefix compression. A count(*)
on a table with 100 million rows is going to take a few minutes, but it
is going to be at least an order of magnitude faster than a data page
scan -- maybe two orders of magnitude faster.

What I don't understand is why people need to do such things so
frequently that it's a major issue, rather than an occasional
annoyance.

Agreed, completely. (And it galls me to agree with multiple, potentially
opposed opinions on my own thread).

The trouble is, people moan, and constantly. Perhaps we should stick to
our guns and say, why do you care? From here, I think we should say,
"show me an application package that needs this so badly we'll change
PostgreSQL just for them". Prove it and we'll do it. Kinda polite in the
TODO, but I think we should put something in there that says "things we
haven't yet had any good reason to improve".

A solution which not only helped the count(*) issue
but also allowed index scans to skip the trip to the data page to
see if it's an active version seems like it would boost performance
overall. As pointed out elsewhere, it could also allow new
techniques for vacuum which could be beneficial.

My view is that when tables get so big that a count(*) takes that
much time, you don't typically need an EXACT count anyway --
you could normally check the statistics from your nightly analyze.

Amen.

From here, another proposal. We have a GUC called count_uses_estimate
that is set to off by default. If set to true, then a count(*) will use
the planner logic to estimate the number of rows in the table and return
that as the answer, rather than actually count the rows. Unless analyze
statistics are not available, in which case it does the real count.

Best Regards, Simon Riggs

#13 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#12)
Re: Improving count(*)

Simon Riggs <simon@2ndquadrant.com> writes:

From here, another proposal. We have a GUC called count_uses_estimate
that is set to off by default. If set to true, then a count(*) will use
the planner logic to estimate the number of rows in the table and return
that as the answer, rather than actually count the rows.

Ugh. Why not just provide a function to retrieve the planner estimate,
but *not* call it count(*)? It would fit nicely with the contrib/dbsize
stuff (or I should say, the stuff formerly in dbsize...)
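PostgreSQL already stores the inputs for such an estimate in pg_class (reltuples and relpages, updated by VACUUM and ANALYZE). A sketch of a separately named estimate function in the spirit of this suggestion, assuming a planner-style rule that scales the recorded tuple density by the table's current size in pages; `estimated_row_count` is a hypothetical name, not an existing function.

```python
from typing import Optional


def estimated_row_count(reltuples: float, relpages: int, current_pages: int) -> Optional[int]:
    """Hypothetical helper: planner-style row estimate from saved statistics.

    reltuples/relpages are what the last VACUUM or ANALYZE recorded;
    current_pages is the table's size now. Returns None when no
    statistics exist, leaving the caller to run a real count(*).
    """
    if relpages <= 0:
        return None
    density = reltuples / relpages           # tuples per page at last ANALYZE
    return round(density * current_pages)    # assume the density hasn't changed
```

Keeping the estimate under its own name means count(*) never silently returns an approximation.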

regards, tom lane

#14 Dann Corbit
DCorbit@connx.com
In reply to: Tom Lane (#13)
Re: Improving count(*)

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Tom Lane
Sent: Thursday, November 17, 2005 4:17 PM
To: Simon Riggs
Cc: Kevin Grittner; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Improving count(*)

Simon Riggs <simon@2ndquadrant.com> writes:

From here, another proposal. We have a GUC called count_uses_estimate
that is set to off by default. If set to true, then a count(*) will use
the planner logic to estimate the number of rows in the table and return
that as the answer, rather than actually count the rows.

Ugh. Why not just provide a function to retrieve the planner estimate,
but *not* call it count(*)? It would fit nicely with the contrib/dbsize
stuff (or I should say, the stuff formerly in dbsize...)

An estimate of the number of rows would be nice to have.
A function called cardinality_estimate() or something of that nature
seems more natural than count(*)

#15 Mark Mielke
mark@mark.mielke.cc
In reply to: Martijn van Oosterhout (#5)
Re: Improving count(*)

Probably obvious, and already mentioned, count(*) isn't the only query
that would benefit from visibility information in the index. It's
rather unfortunate that MVCC requires table lookups, when all values
queried or matched are found in the index key itself. The idea of an
all index table is appealing to me for some applications (Oracle
supports this, I believe?). In effect, a sorted, and searchable table,
that doesn't double in size, just because it is indexed.

So what's the real cost here? Larger index size to include the
visibility information (optional?) and UPDATE/DELETE need to
set expirations on the index rows as well as the table rows,
for only the indexes that have visibility information? A flag
in the table structure in memory to know whether the table has
any indexes with visibility information that require updating?

It doesn't sound that bad to me. Perhaps I just don't know better? :-)

The per-block counts idea seems useful to me. A database that
frequently modifies every page of an index would seem
inefficient. What if the per-block counts were kept, but associated
with index blocks instead of table blocks, for indexes that maintain
visibility information? The per-block counts only need to provide
enough information for the reader to know whether the count is
valid or invalid, perhaps updated at vacuum time?

The idea of a partial index, that keeps this information (visibility
information + per-block live row count cache) seems fascinating to
me. Many new optimization opportunities to hang myself with... :-)

Maybe PostgreSQL could be FASTER than other databases?

Or are we just dreaming?

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/

#16 Gavin Sherry
swm@linuxworld.com.au
In reply to: Simon Riggs (#12)
Re: Improving count(*)

On Fri, 18 Nov 2005, Simon Riggs wrote:

From here, another proposal. We have a GUC called count_uses_estimate
that is set to off by default. If set to true, then a count(*) will use
the planner logic to estimate the number of rows in the table and return
that as the answer, rather than actually count the rows. Unless analyze
statistics are not available, in which case it does the real count.

I'm finishing off a tablesample patch a grad student on #postgresql was
working on.

template1=# select count(*)*100 from a tablesample system(1) repeatable (2);
?column?
----------
8371100
(1 row)

Time: 6366.757 ms
template1=# select count(*)*50 from a tablesample system(2) repeatable (11);
?column?
----------
8453550
(1 row)

Time: 10521.871 ms
template1=# select count(*)*10 from a tablesample system(10) repeatable (3);
?column?
----------
8314350
(1 row)

Time: 28744.498 ms
template1=# select count(*) from a;
count
---------
8388608
(1 row)

Time: 33897.857 ms
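The arithmetic behind these queries: a p% sample count is scaled back up by 100/p, as count(*)*100, *50 and *10 do. A small Python check recovers the sample count implied by each result and compares each estimate with the exact count of 8388608; `scale_up` is an illustrative helper. Three runs with different REPEATABLE seeds are too few to say how the error depends on sample size.

```python
TRUE_COUNT = 8388608  # from the plain count(*) above


def scale_up(sample_count: int, percent: float) -> float:
    """Scale a count taken over a `percent`% sample back to the whole table."""
    return sample_count * 100 / percent


# (sample count implied by each query result, sampling percentage)
runs = [(83711, 1), (169071, 2), (831435, 10)]
for sample_count, pct in runs:
    est = scale_up(sample_count, pct)
    err = abs(est - TRUE_COUNT) / TRUE_COUNT * 100
    print(f"{pct:>2}% sample: estimate {est:,.0f}, error {err:.2f}%")
```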

Seems like a better solution. I can finish the patch pretty soon. I need
to contact the original author, who has disappeared, but I'll send it over
to you.

Gavin

#17 Bruce Momjian
bruce@momjian.us
In reply to: Simon Riggs (#1)
Re: Improving count(*)

I think the important thing to keep track of is a single bit:

Which of the following applies?

a) all of the tuples in this block are visible

b) at least one tuple in this block is in-doubt and may need to be vacuumed

That isn't enough to calculate count(*) on its own but it means you could scan
an index like any other database and avoid checking every single tuple. If the
tuple lies in a block that is known not to contain any in-doubt records then
the tuple can be counted immediately.

This has the advantage that it helps with a lot more cases than a simple
"select count(*) from tab". As Tom pointed out, that case can be tackled more
directly with an O(1) solution anyway. More complex cases are where fast index
scans are really important.

So you could do "SELECT count(*) FROM tab WHERE a > ?" and have it scan an
index on <a> without having to check the visibility of every single tuple. It
only has to check the visibility of tuples that lie on blocks that contain at
least one in-doubt tuple.
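A sketch of that counting loop, assuming the index yields heap tuple locations as (block, line pointer) pairs and that the per-block bits are available as a set of all-visible blocks. `index_count` and `heap_visible` are hypothetical names; `heap_visible` stands in for fetching the tuple and applying the MVCC check.

```python
from typing import Callable, Iterable, Set, Tuple

TID = Tuple[int, int]  # (block number, line pointer) of a heap tuple


def index_count(
    index_tids: Iterable[TID],
    all_visible_blocks: Set[int],
    heap_visible: Callable[[TID], bool],
) -> int:
    """Count index matches, visiting the heap only for in-doubt blocks."""
    total = 0
    for tid in index_tids:
        block, _ = tid
        if block in all_visible_blocks:
            total += 1          # bit set: count without a heap visit
        elif heap_visible(tid):
            total += 1          # in-doubt block: check the tuple itself
    return total
```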

You could even imagine using the new bitmap index scan machinery to combine
these bits of information too.

And this is exactly the same information that vacuum needs. Once vacuum has
run and cleaned out a block it knows whether there are any records that are
still in-doubt or whether every record it left is universally visible. It can
note that and allow future vacuums to skip that block if no deletes or inserts
have changed that bit since.
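The VACUUM side of the same bit, sketched the same way: any write clears a block's bit, and VACUUM skips blocks whose bit is still set. `VisibilityBits` and `vacuum_block` are hypothetical; in this model `vacuum_block` reports whether everything left in the block is visible to all transactions.

```python
from typing import Callable, List, Set


class VisibilityBits:
    """Hypothetical per-block bit shared by count(*) and VACUUM (a sketch)."""

    def __init__(self) -> None:
        self.clean: Set[int] = set()  # blocks known to hold no in-doubt tuples

    def on_write(self, block: int) -> None:
        # An insert, update or delete may leave in-doubt tuples behind
        self.clean.discard(block)

    def vacuum(self, nblocks: int, vacuum_block: Callable[[int], bool]) -> List[int]:
        """Visit only blocks whose bit is clear; return the blocks visited."""
        visited = []
        for blk in range(nblocks):
            if blk in self.clean:
                continue                # unchanged since it was last cleaned
            visited.append(blk)
            if vacuum_block(blk):       # everything left is visible to all
                self.clean.add(blk)
        return visited
```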

--
greg

#18 Jeff Davis
pgsql@j-davis.com
In reply to: Mark Mielke (#15)
Re: Improving count(*)

mark@mark.mielke.cc wrote:

Probably obvious, and already mentioned, count(*) isn't the only query
that would benefit from visibility information in the index. It's
rather unfortunate that MVCC requires table lookups, when all values
queried or matched are found in the index key itself. The idea of an
all index table is appealing to me for some applications (Oracle
supports this, I believe?). In effect, a sorted, and searchable table,
that doesn't double in size, just because it is indexed.

I've been thinking about that lately also. It seems like it would be
useful to have the entire table in a Btree in some situations, but there
are some drawbacks:
(1) probably hard to implement
(2) only works with one key
(3) since tuples would not be at a fixed location on disk, you can't
just use a normal secondary index. The secondary index would have to
point to the key of the tuple in the Btree table, and then do another
lookup in the actual table.
(4) of course, insert performance goes down due to btree maintenance
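A toy model of the whole-table B-tree and of drawback (3). A sorted Python list searched with `bisect` stands in for the B-tree; rows shift position as keys are inserted, so the hypothetical secondary index on `email` stores primary keys, and every lookup through it costs a second search. Range scans, by contrast, read one contiguous slice. `BtreeTable` and its methods are illustrative names only.

```python
import bisect
from typing import Dict, List


class BtreeTable:
    """Toy index-organized table: rows kept in primary-key order."""

    def __init__(self) -> None:
        self.keys: List[int] = []
        self.rows: List[dict] = []
        self.by_email: Dict[str, int] = {}  # hypothetical secondary index -> primary key

    def insert(self, pk: int, row: dict) -> None:
        i = bisect.bisect_left(self.keys, pk)
        self.keys.insert(i, pk)             # later rows shift: no fixed location
        self.rows.insert(i, row)
        self.by_email[row["email"]] = pk

    def get(self, pk: int) -> dict:
        i = bisect.bisect_left(self.keys, pk)
        if i == len(self.keys) or self.keys[i] != pk:
            raise KeyError(pk)
        return self.rows[i]

    def get_by_email(self, email: str) -> dict:
        # Two lookups: secondary index -> primary key -> B-tree search
        return self.get(self.by_email[email])

    def range(self, lo: int, hi: int) -> List[dict]:
        # A range scan reads one contiguous run of the tree
        i = bisect.bisect_left(self.keys, lo)
        j = bisect.bisect_right(self.keys, hi)
        return self.rows[i:j]
```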

Range queries (or queries on equality when there are many duplicates)
might benefit a lot. But I would think that in many situations, the fact
that you could only have one key indexed on the table would counteract
those benefits.

I haven't noticed any recent comments by the hackers on this subject.
Maybe they have some more details? I think MS SQL has something similar
to that also.

Regards,
Jeff Davis

#19 Simon Riggs
simon@2ndQuadrant.com
In reply to: Gavin Sherry (#16)
Re: Improving count(*)

On Fri, 2005-11-18 at 11:51 +1100, Gavin Sherry wrote:

Seems like a better solution. I can finish the patch pretty soon. I need
to contact the original author, who has disappeared, but I'll send it over
to you.

Sounds good. I wondered where he'd gone to.

Sampling would be useful for 8.2

Best Regards, Simon Riggs

#20 Varun Kacholia
kacholia@gmail.com
In reply to: Simon Riggs (#19)
Re: Improving count(*)

Seems like a better solution. I can finish the patch pretty soon. I need
to contact the original author, who has disappeared, but I'll send it over
to you.

Sounds good. I wondered where he'd gone to.

Still here :-)
Just got swamped with so much work that the tablesample
patch got paged out. Gavin has a working version of the patch.
I can give a hand if need be.

Thanks

#21 Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Varun Kacholia (#20)

#22 Tino Wildenhain
tino@wildenhain.de
In reply to: Zeugswetter Andreas SB SD (#21)

#23 Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Tino Wildenhain (#22)

#24 Tino Wildenhain
tino@wildenhain.de
In reply to: Zeugswetter Andreas SB SD (#23)

#25 Steve Wampler
swampler@noao.edu
In reply to: Tom Lane (#13)

#26 Richard Huxton
dev@archonet.com
In reply to: Simon Riggs (#1)

#27 Merlin Moncure
merlin.moncure@rcsonline.com
In reply to: Richard Huxton (#26)

#28 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Richard Huxton (#26)

#29 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Merlin Moncure (#27)

#30 Gregory Maxwell
gmaxwell@gmail.com
In reply to: Merlin Moncure (#27)

#31 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#28)

#32 Mark Mielke
mark@mark.mielke.cc
In reply to: Richard Huxton (#26)

#33 Josh Berkus
josh@agliodbs.com
In reply to: Alvaro Herrera (#31)

#34 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Josh Berkus (#33)

#35 Nicolas Barbier
nicolas.barbier@gmail.com
In reply to: Heikki Linnakangas (#34)

#36 Josh Berkus
josh@agliodbs.com
In reply to: Heikki Linnakangas (#34)

#37 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Josh Berkus (#36)

#38 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Nicolas Barbier (#35)

#39 Nicolas Barbier
nicolas.barbier@gmail.com
In reply to: Heikki Linnakangas (#38)

#40 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Simon Riggs (#12)

#41 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Gregory Maxwell (#30)

#42 Gregory Maxwell
gmaxwell@gmail.com
In reply to: Jim Nasby (#41)

#43 Bruce Momjian
bruce@momjian.us
In reply to: Gregory Maxwell (#42)

#44 Bruce Momjian
bruce@momjian.us
In reply to: Mark Mielke (#32)

#45 Bruce Momjian
bruce@momjian.us
In reply to: Jim Nasby (#41)

#46 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Bruce Momjian (#44)

#47 Mark Mielke
mark@mark.mielke.cc
In reply to: Bruce Momjian (#44)

#48 Josh Berkus
josh@agliodbs.com
In reply to: Nicolas Barbier (#39)

#49 Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#44)