PL/PgSQL for counting all rows in all tables.

Started by David Fetterover 21 years ago10 messages
#1David Fetter
david@fetter.org

Folks,

I've noticed that when coming into an organization, I need to do some
counting on what's in all the tables in a db. This shortens that
process considerably, with the usual caveat that count(*) is a heavy
operation.

By the way, the 3 lines following "godawful hack" point to something
PL/PgSQL ought (imho) to be able to do, namely something like

EXECUTE INTO [ record | rowtype | type ] [sql TEXT string returning a single row];

Here 'tis: version 0.01...

CREATE TYPE table_count AS (table_name TEXT, num_rows INTEGER);

CREATE OR REPLACE FUNCTION count_em_all () RETURNS SETOF table_count
AS '
DECLARE
the_count RECORD;
t_name RECORD;
r table_count%ROWTYPE;
BEGIN
FOR t_name IN
SELECT c.relname
FROM
pg_catalog.pg_class c
LEFT JOIN
pg_namespace n
ON
n.oid = c.relnamespace
WHERE
c.relkind = ''r''
AND
n.nspname = ''public''
ORDER BY 1
LOOP
-- The next 3 lines are a godawful hack. :P
FOR the_count IN EXECUTE ''SELECT COUNT(*) AS "count" FROM '' || t_name.relname
LOOP
END LOOP;
r.table_name := t_name.relname;
r.num_rows := the_count.count;
RETURN NEXT r;
END LOOP;
RETURN;
END;
' LANGUAGE plpgsql;

COMMENT ON FUNCTION count_em_all () IS 'Spits out all tables in the public schema and the exact row counts for each.';

--
David Fetter david@fetter.org http://fetter.org/
phone: +1 510 893 6100 mobile: +1 415 235 3778

Remember to vote!

#2Greg Sabino Mullane
greg@turnstep.com
In reply to: David Fetter (#1)
Re: PL/PgSQL for counting all rows in all tables.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

My version:

ANALYZE;

SELECT n.nspname, relname, reltuples
FROM pg_class c, pg_namespace n
WHERE c.relnamespace=n.oid
AND relkind='r'
AND NOT n.nspname ~ '^pg_'
ORDER BY 1,2;

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200409282138

-----BEGIN PGP SIGNATURE-----

iD8DBQFBWhIivJuQZxSWSsgRAm3cAJwLySzR5mpOHDT06LN6vj9M5zyyOQCgt4j1
gWDPcXJoUZbTlZvxxrE7fcY=
=MOMs
-----END PGP SIGNATURE-----

#3Ian FREISLICH
if@hetzner.co.za
In reply to: Greg Sabino Mullane (#2)
Re: PL/PgSQL for counting all rows in all tables.

"Greg Sabino Mullane" wrote:

ANALYZE;

SELECT n.nspname, relname, reltuples
FROM pg_class c, pg_namespace n
WHERE c.relnamespace=n.oid
AND relkind='r'
AND NOT n.nspname ~ '^pg_'
ORDER BY 1,2;

Maybe this gem should be passed onto the pgadmin folks. When you
click on a table name in the interface it does what I can only
presume is a count(*) from relation, which takes forever on enormous
relations. It then claims this to be a row estimate anyway, so
they could probably drop the analyze.

Ian

--
Ian Freislich

#4Dave Page
dpage@vale-housing.co.uk
In reply to: Ian FREISLICH (#3)
Re: PL/PgSQL for counting all rows in all tables.

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Ian FREISLICH
Sent: 05 October 2004 09:57
To: Greg Sabino Mullane
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] PL/PgSQL for counting all rows in all tables.

"Greg Sabino Mullane" wrote:

ANALYZE;

SELECT n.nspname, relname, reltuples
FROM pg_class c, pg_namespace n
WHERE c.relnamespace=n.oid
AND relkind='r'
AND NOT n.nspname ~ '^pg_'
ORDER BY 1,2;

Maybe this gem should be passed onto the pgadmin folks. When
you click on a table name in the interface it does what I can
only presume is a count(*) from relation, which takes forever
on enormous relations. It then claims this to be a row
estimate anyway, so they could probably drop the analyze.

The 'Rows (counted)' value is taken from a count(*), but only if the
'Rows (estimated)' value (which comes from pg_class.reltuples, as above,
but without the costly analyze) is less than the cut-off value set in
the options dialogue. So, if you never want to wait for the exact row
count, just set the appropriate option to zero.

Regards Dave

#5Robert Treat
xzilla@users.sourceforge.net
In reply to: Dave Page (#4)
Re: PL/PgSQL for counting all rows in all tables.

On Tue, 2004-10-05 at 05:09, Dave Page wrote:

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Ian FREISLICH
Sent: 05 October 2004 09:57
To: Greg Sabino Mullane
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] PL/PgSQL for counting all rows in all tables.

"Greg Sabino Mullane" wrote:

ANALYZE;

SELECT n.nspname, relname, reltuples
FROM pg_class c, pg_namespace n
WHERE c.relnamespace=n.oid
AND relkind='r'
AND NOT n.nspname ~ '^pg_'
ORDER BY 1,2;

Maybe this gem should be passed onto the pgadmin folks. When
you click on a table name in the interface it does what I can
only presume is a count(*) from relation, which takes forever
on enormous relations. It then claims this to be a row
estimate anyway, so they could probably drop the analyze.

The 'Rows (counted)' value is taken from a count(*), but only if the
'Rows (estimated)' value (which comes from pg_class.reltuples, as above,
but without the costly analyze) is less than the cut-off value set in
the options dialogue. So, if you never want to wait for the exact row
count, just set the appropriate option to zero.

How do you handle table growth that makes the reltuples value out of
whack since the last analyze? Seems unfortunate that people may not
realize that the numbers they are looking at are incorrect but I don't
see much way to avoid it.

Seems new tables would have that problem too since they would default to
1000... do you analyze after table creation?

Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL

#6Dave Page
dpage@vale-housing.co.uk
In reply to: Robert Treat (#5)
Re: PL/PgSQL for counting all rows in all tables.

-----Original Message-----
From: Robert Treat [mailto:xzilla@users.sourceforge.net]
Sent: 11 October 2004 22:30
To: Dave Page
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] PL/PgSQL for counting all rows in all tables.

How do you handle table growth that makes the reltuples value
out of whack since the last analyze? Seems unfortunate that
people may not realize that the numbers they are looking at
are incorrect but I don't see much way to avoid it.

Right-click the table object and select 'Count' on the current versions.
Previously, iirc it showed the message 'Refresh table to count' in the
actual count field, so you did a right-click -> Refresh.

Seems new tables would have that problem too since they would
default to 1000... do you analyze after table creation?

No. 1000 is below the default threshold, so it'll get counted. If you
lower the threshold, it will say 'not counted' in the actual count
field.

Regards, Dave.

#7Robert Treat
xzilla@users.sourceforge.net
In reply to: Dave Page (#6)
Re: PL/PgSQL for counting all rows in all tables.

On Tuesday 12 October 2004 03:22, Dave Page wrote:

-----Original Message-----
From: Robert Treat [mailto:xzilla@users.sourceforge.net]
Sent: 11 October 2004 22:30
To: Dave Page
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] PL/PgSQL for counting all rows in all tables.

How do you handle table growth that makes the reltuples value
out of whack since the last analyze? Seems unfortunate that
people may not realize that the numbers they are looking at
are incorrect but I don't see much way to avoid it.

Right-click the table object and select 'Count' on the current versions.
Previously, iirc it showed the message 'Refresh table to count' in the
actual count field, so you did a right-click -> Refresh.

Maybe I didn't phrase that quite right. How would a user know that he needs to
do a real count? For example, if I have a table with est 1 million rows, and
I load another 1 million rows into it, wont pgadmin show me 1 million rows
until I run an analyze? Even if I run a manual count, wont it show 1 million
next time I come into the application, and that time I may not realize that
the table is off by 1 million rows so I take the estimated count at face
value.

BTW The reason I'm asking about this is we're trying to come up with a good
scheme for phppgadmin to show estimated counts without showing incorrect
numbers to users... or at least giving them a clue that the numbers might be
really off.

--
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL

#8Rod Taylor
pg@rbt.ca
In reply to: Robert Treat (#7)
Re: PL/PgSQL for counting all rows in all tables.

BTW The reason I'm asking about this is we're trying to come up with a good
scheme for phppgadmin to show estimated counts without showing incorrect
numbers to users... or at least giving them a clue that the numbers might be
really off.

The vacuum daemon should know how far off the count might be. You can
display 100 +- 15%, where the 100 is collected by ANALYZE and the 15% is
determined based on the vacuum daemon estimation.

Of course, AVD needs to be implemented and the interface created (a
pg_count view?)...

#9Andreas Pflug
pgadmin@pse-consulting.de
In reply to: Robert Treat (#7)
Re: PL/PgSQL for counting all rows in all tables.

Robert Treat wrote:

Right-click the table object and select 'Count' on the current versions.
Previously, iirc it showed the message 'Refresh table to count' in the
actual count field, so you did a right-click -> Refresh.

Maybe I didn't phrase that quite right. How would a user know that he needs to
do a real count? For example, if I have a table with est 1 million rows, and
I load another 1 million rows into it, wont pgadmin show me 1 million rows
until I run an analyze? Even if I run a manual count, wont it show 1 million
next time I come into the application, and that time I may not realize that
the table is off by 1 million rows so I take the estimated count at face
value.

pgAdmin will either show the true row count from a SELECT COUNT(*), or
"not counted". The last rowcount from vacuum is named "estimated row
count", so it's clearly distinguishable which number is counted and
which is estimated.

Regards,
Andreas

#10Dave Page
dpage@vale-housing.co.uk
In reply to: Andreas Pflug (#9)
Re: PL/PgSQL for counting all rows in all tables.

-----Original Message-----
From: Robert Treat [mailto:xzilla@users.sourceforge.net]
Sent: 12 October 2004 21:21
To: Dave Page
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] PL/PgSQL for counting all rows in all tables.

Maybe I didn't phrase that quite right. How would a user know
that he needs to do a real count? For example, if I have a
table with est 1 million rows, and I load another 1 million
rows into it, wont pgadmin show me 1 million rows until I run
an analyze? Even if I run a manual count, wont it show 1
million next time I come into the application, and that time
I may not realize that the table is off by 1 million rows so
I take the estimated count at face value.

Yeah, that's exactly what can happen. If it makes you feel any better
about doing it in phppgadmin, then we never got any complaints about it!

We simply rely on the fact that it's labelled as an estimate. I can't
see that there's any other way around it that doesn't require running a
potentially very expensive ANALYZE to update the stats.

Regards, Dave