Quick estimate of num of rows & table size

Started by Thalis Kalfigkopoulosover 13 years ago6 messagesgeneral
Jump to latest
#1Thalis Kalfigkopoulos
tkalfigo@gmail.com

Hi all,

I read somewhere that the following query gives a quick estimate of the #
of rows in a table regardless of the table's size (which would matter in a
simple SELECT count(*)?):

SELECT (CASE WHEN reltuples > 0 THEN
pg_relation_size('mytable')/(8192*relpages/reltuples)
ELSE 0
END)::bigint AS estimated_row_count
FROM pg_class
WHERE oid = 'mytable'::regclass;

If relpages & reltuples are recorded accurately each time VACUUM is run,
wouldn't it be the same to just grab directly the value of reltuples like:

SELECT reltuples FROM pg_class WHERE oid='mytable'::regclass;

In the same manner, are pg_relation_size('mytable') and 8192*relpages the
same?

I run both assumptions against a freshly VACUUMed table and they seem
correct.

TIA,
Thalis K.

#2Lonni J Friedman
netllama@gmail.com
In reply to: Thalis Kalfigkopoulos (#1)
Re: Quick estimate of num of rows & table size

On Mon, Nov 5, 2012 at 2:02 PM, Thalis Kalfigkopoulos
<tkalfigo@gmail.com> wrote:

Hi all,

I read somewhere that the following query gives a quick estimate of the # of
rows in a table regardless of the table's size (which would matter in a
simple SELECT count(*)?):

SELECT (CASE WHEN reltuples > 0 THEN
pg_relation_size('mytable')/(8192*relpages/reltuples)
ELSE 0
END)::bigint AS estimated_row_count
FROM pg_class
WHERE oid = 'mytable'::regclass;

If relpages & reltuples are recorded accurately each time VACUUM is run,
wouldn't it be the same to just grab directly the value of reltuples like:

SELECT reltuples FROM pg_class WHERE oid='mytable'::regclass;

In the same manner, are pg_relation_size('mytable') and 8192*relpages the
same?

I run both assumptions against a freshly VACUUMed table and they seem
correct.

This doesn't seem to work for me. I get an estimated row_count of 0
on a table that I know has millions of rows.

#3Thalis Kalfigkopoulos
tkalfigo@gmail.com
In reply to: Lonni J Friedman (#2)
Re: Quick estimate of num of rows & table size

On Mon, Nov 5, 2012 at 7:14 PM, Lonni J Friedman <netllama@gmail.com> wrote:

On Mon, Nov 5, 2012 at 2:02 PM, Thalis Kalfigkopoulos
<tkalfigo@gmail.com> wrote:

Hi all,

I read somewhere that the following query gives a quick estimate of the

# of

rows in a table regardless of the table's size (which would matter in a
simple SELECT count(*)?):

SELECT (CASE WHEN reltuples > 0 THEN
pg_relation_size('mytable')/(8192*relpages/reltuples)
ELSE 0
END)::bigint AS estimated_row_count
FROM pg_class
WHERE oid = 'mytable'::regclass;

If relpages & reltuples are recorded accurately each time VACUUM is run,
wouldn't it be the same to just grab directly the value of reltuples

like:

SELECT reltuples FROM pg_class WHERE oid='mytable'::regclass;

In the same manner, are pg_relation_size('mytable') and 8192*relpages the
same?

I run both assumptions against a freshly VACUUMed table and they seem
correct.

This doesn't seem to work for me. I get an estimated row_count of 0
on a table that I know has millions of rows.

Which one doesn't work exactly? The larger query? Are you on a 9.x?

regards,
thalis k.

#4Lonni J Friedman
netllama@gmail.com
In reply to: Thalis Kalfigkopoulos (#3)
Re: Quick estimate of num of rows & table size

On Mon, Nov 5, 2012 at 3:56 PM, Thalis Kalfigkopoulos
<tkalfigo@gmail.com> wrote:

On Mon, Nov 5, 2012 at 7:14 PM, Lonni J Friedman <netllama@gmail.com> wrote:

On Mon, Nov 5, 2012 at 2:02 PM, Thalis Kalfigkopoulos
<tkalfigo@gmail.com> wrote:

Hi all,

I read somewhere that the following query gives a quick estimate of the
# of
rows in a table regardless of the table's size (which would matter in a
simple SELECT count(*)?):

SELECT (CASE WHEN reltuples > 0 THEN
pg_relation_size('mytable')/(8192*relpages/reltuples)
ELSE 0
END)::bigint AS estimated_row_count
FROM pg_class
WHERE oid = 'mytable'::regclass;

If relpages & reltuples are recorded accurately each time VACUUM is run,
wouldn't it be the same to just grab directly the value of reltuples
like:

SELECT reltuples FROM pg_class WHERE oid='mytable'::regclass;

In the same manner, are pg_relation_size('mytable') and 8192*relpages
the
same?

I run both assumptions against a freshly VACUUMed table and they seem
correct.

This doesn't seem to work for me. I get an estimated row_count of 0
on a table that I know has millions of rows.

Which one doesn't work exactly? The larger query? Are you on a 9.x?

doh, sorry. The first/larger doesn't work. As it turns out the 2nd
actually does work well. I'm on 9.1.x.

#5Thalis Kalfigkopoulos
tkalfigo@gmail.com
In reply to: Lonni J Friedman (#4)
Re: Quick estimate of num of rows & table size

On Mon, Nov 5, 2012 at 9:04 PM, Lonni J Friedman <netllama@gmail.com> wrote:

On Mon, Nov 5, 2012 at 3:56 PM, Thalis Kalfigkopoulos
<tkalfigo@gmail.com> wrote:

On Mon, Nov 5, 2012 at 7:14 PM, Lonni J Friedman <netllama@gmail.com>

wrote:

On Mon, Nov 5, 2012 at 2:02 PM, Thalis Kalfigkopoulos
<tkalfigo@gmail.com> wrote:

Hi all,

I read somewhere that the following query gives a quick estimate of

the

# of
rows in a table regardless of the table's size (which would matter in

a

simple SELECT count(*)?):

SELECT (CASE WHEN reltuples > 0 THEN
pg_relation_size('mytable')/(8192*relpages/reltuples)
ELSE 0
END)::bigint AS estimated_row_count
FROM pg_class
WHERE oid = 'mytable'::regclass;

If relpages & reltuples are recorded accurately each time VACUUM is

run,

wouldn't it be the same to just grab directly the value of reltuples
like:

SELECT reltuples FROM pg_class WHERE oid='mytable'::regclass;

In the same manner, are pg_relation_size('mytable') and 8192*relpages
the
same?

I run both assumptions against a freshly VACUUMed table and they seem
correct.

This doesn't seem to work for me. I get an estimated row_count of 0
on a table that I know has millions of rows.

Which one doesn't work exactly? The larger query? Are you on a 9.x?

doh, sorry. The first/larger doesn't work. As it turns out the 2nd
actually does work well. I'm on 9.1.x.

That's weird. I'm on a 9.1.5 as well and it works fine (!?)
Still the question remains: are they equivalent?

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thalis Kalfigkopoulos (#1)
Re: Quick estimate of num of rows & table size

Thalis Kalfigkopoulos <tkalfigo@gmail.com> writes:

I read somewhere that the following query gives a quick estimate of the #
of rows in a table regardless of the table's size (which would matter in a
simple SELECT count(*)?):

SELECT (CASE WHEN reltuples > 0 THEN
pg_relation_size('mytable')/(8192*relpages/reltuples)
ELSE 0
END)::bigint AS estimated_row_count
FROM pg_class
WHERE oid = 'mytable'::regclass;

This seems a bit dubious, as it's protecting against only one of two
possible zero-divide conditions, and ignoring the risk of integer
overflow of 8192*relpages. It also seems rather inconvenient to have to
specify 'mytable' twice. I'd try something like

select
case when relpages > 0 then
(pg_relation_size(oid)::float8 * reltuples / relpages / 8192)::bigint
else 0::bigint end
from pg_class
where oid = 'mytable'::regclass;

If relpages & reltuples are recorded accurately each time VACUUM is run,
wouldn't it be the same to just grab directly the value of reltuples like:
SELECT reltuples FROM pg_class WHERE oid='mytable'::regclass;

Uh, no. The whole point of the more complicated query is to scale the
reltuples/relpages tuple-density ratio up to the table's current
physical size, so that you get an estimate that is not too far off even
if vacuum hasn't been run lately. It's also worth noting that in recent
PG versions, reltuples and relpages are themselves only moving-average
estimates, and so your premise is faulty anyway: they are *not*
necessarily exact even immediately after a vacuum.

regards, tom lane