Tupple statistics function

Started by Tatsuo Ishiiover 24 years ago10 messages
#1Tatsuo Ishii
t-ishii@sra.co.jp

Hi,

I have written a small function that show how many tuples are dead
etc. in a specified table. Example output is:

test=# select pgstattuple('tellers');
NOTICE: physical length: 0.02MB live tuples: 200 (0.01MB, 58.59%) dead tuples: 100 (0.00MB, 29.30%) overhead: 12.11%
pgstattuple
-------------
29.296875
(1 row)

Shall I add this function into contrib directory?
--
Tatsuo Ishii

#2Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tatsuo Ishii (#1)
Re: Tupple statistics function

Hi,

I have written a small function that show how many tuples are dead
etc. in a specified table. Example output is:

test=# select pgstattuple('tellers');
NOTICE: physical length: 0.02MB live tuples: 200 (0.01MB, 58.59%) dead tuples: 100 (0.00MB, 29.30%) overhead: 12.11%
pgstattuple
-------------
29.296875
(1 row)

Shall I add this function into contrib directory?

I have been wanting this for a long time. In fact, I wanted it linked
to VACUUM so you could vacuum a table only if it had >X% dead tuples.
Seems we can find a place for this in the existing commands. Not sure
where, though. Ideas?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tatsuo Ishii (#1)
Re: Tupple statistics function

Tatsuo Ishii <t-ishii@sra.co.jp> writes:

I have written a small function that show how many tuples are dead
etc. in a specified table.

Dead according to whose viewpoint? Under MVCC this seems to be
in the eye of the beholder...

Shall I add this function into contrib directory?

No real objection, but you should carefully document exactly what
the results mean.

BTW, I'd suggest accounting for free, reusable space separately from
"overhead".

regards, tom lane

#4Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#3)
Re: Tupple statistics function

Tatsuo Ishii <t-ishii@sra.co.jp> writes:

I have written a small function that show how many tuples are dead
etc. in a specified table.

Dead according to whose viewpoint? Under MVCC this seems to be
in the eye of the beholder...

You can know if the tuple is visible to other backends, or at least take
a good guess like VACUUM does. Maybe he has coded that in there.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#4)
Re: Tupple statistics function

Bruce Momjian <pgman@candle.pha.pa.us> writes:

... Maybe he has coded that in there.

Maybe so, but he didn't say. That's why I was asking for exact
documentation.

regards, tom lane

#6Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Tom Lane (#3)
Re: Tupple statistics function

Tatsuo Ishii <t-ishii@sra.co.jp> writes:

I have written a small function that show how many tuples are dead
etc. in a specified table.

Dead according to whose viewpoint? Under MVCC this seems to be
in the eye of the beholder...

Shall I add this function into contrib directory?

No real objection, but you should carefully document exactly what
the results mean.

BTW, I'd suggest accounting for free, reusable space separately from
"overhead".

regards, tom lane

Ok, here are the source code...

/*
* $Header: /home/t-ishii/repository/pgstattuple/pgstattuple.c,v 1.2 2001/08/30 06:21:48 t-ishii Exp $
*
* Copyright (c) 2001 Tatsuo Ishii
*
* Permission to use, copy, modify, and distribute this software and
* its documentation for any purpose, without fee, and without a
* written agreement is hereby granted, provided that the above
* copyright notice and this paragraph and the following two
* paragraphs appear in all copies.
*/

#include "postgres.h"
#include "fmgr.h"
#include "access/heapam.h"
#include "access/transam.h"

PG_FUNCTION_INFO_V1(pgstattuple);

extern Datum pgstattuple(PG_FUNCTION_ARGS);

/* ----------
* pgstattuple:
* returns the percentage of dead tuples
*
* C FUNCTION definition
* pgstattuple(NAME) returns FLOAT8
* ----------
*/
Datum
pgstattuple(PG_FUNCTION_ARGS)
{
Name p = PG_GETARG_NAME(0);

Relation rel;
HeapScanDesc scan;
HeapTuple tuple;
BlockNumber nblocks;
double table_len;
uint64 tuple_len = 0;
uint64 dead_tuple_len = 0;
uint32 tuple_count = 0;
uint32 dead_tuple_count = 0;
double tuple_percent;
double dead_tuple_percent;

rel = heap_openr(NameStr(*p), NoLock);
nblocks = RelationGetNumberOfBlocks(rel);
scan = heap_beginscan(rel, false, SnapshotAny, 0, NULL);

while ((tuple = heap_getnext(scan,0)))
{
if (HeapTupleSatisfiesNow(tuple->t_data))
{
tuple_len += tuple->t_len;
tuple_count++;
}
else
{
dead_tuple_len += tuple->t_len;
dead_tuple_count++;
}
}
heap_endscan(scan);
heap_close(rel, NoLock);

table_len = (double)nblocks*BLCKSZ;

if (nblocks == 0)
{
tuple_percent = 0.0;
dead_tuple_percent = 0.0;
}
else
{
tuple_percent = (double)tuple_len*100.0/table_len;
dead_tuple_percent = (double)dead_tuple_len*100.0/table_len;
}

elog(NOTICE,"physical length: %.2fMB live tuples: %u (%.2fMB, %.2f%%) dead tuples: %u (%.2fMB, %.2f%%) overhead: %.2f%%",

table_len/1024/1024,

tuple_count,
(double)tuple_len/1024/1024,
tuple_percent,

dead_tuple_count,
(double)dead_tuple_len/1024/1024,
dead_tuple_percent,

(nblocks == 0)?0.0: 100.0 - tuple_percent - dead_tuple_percent);

PG_RETURN_FLOAT8(dead_tuple_percent);
}

#7Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Tom Lane (#5)
Re: Tupple statistics function

Bruce Momjian <pgman@candle.pha.pa.us> writes:

... Maybe he has coded that in there.

Maybe so, but he didn't say. That's why I was asking for exact
documentation.

As you can see from the source code, it just use
HeapTupleSatisfiesNow(). I wrote this function for the admin
use. He/she should know if active transactions are touching the table,
I think. But more precise guess might be interesting for some cases.
--
Tatsuo Ishii

#8Thurstan R. McDougle
trmcdougle@my-deja.com
In reply to: Bruce Momjian (#2)
Re: [HACKERS] Tupple statistics function

Bruce Momjian wrote:

Hi,

I have written a small function that show how many tuples are dead
etc. in a specified table. Example output is:

test=# select pgstattuple('tellers');
NOTICE: physical length: 0.02MB live tuples: 200 (0.01MB, 58.59%) dead tuples: 100 (0.00MB, 29.30%) overhead: 12.11%
pgstattuple
-------------
29.296875
(1 row)

Shall I add this function into contrib directory?

I have been wanting this for a long time. In fact, I wanted it linked
to VACUUM so you could vacuum a table only if it had >X% dead tuples.
Seems we can find a place for this in the existing commands. Not sure
where, though. Ideas?

If you mean the reporting of stats how about EXPLAIN VACUMN (with other
info as well?) or EXPLAIN [VERBOSE] TABLE (see below).

In general EXPLAIN could be expanded to be a command to return an
explanation and stats of many items.
There could also be EXPLAIN that only shows fields and EXPLAIN VERBOSE
that also shows more detail such as stats (as that tends to take more
time to collect).

Examples:
EXPLAIN TABLE ttt show table fields and indexes/rules
VERBOSE:stats (inc tuple stats)
EXPLAIN INDEX iii show index description and stats
EXPLAIN USER/GROUP uuu show user name (and the users groups)
VERBOSE:list GRANTs
EXPLAIN FUNCTION/AGGREGATE/OPERATOR fff
show arguments of user functions
VERBOSE:show source code

These might be useful, easier to remember, unchanging between versions
alternatives to the SELECT * from pg_ttt methods used at present.
It it probably worth checking the security options for these (not every
user should have function source code access in some business apps).

--
Bruce Momjian                        |  http://candle.pha.pa.us
pgman@candle.pha.pa.us               |  (610) 853-3000
+  If your life is a hard drive,     |  830 Blythe Avenue
+  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

--
This is the identity that I use for NewsGroups. Email to
this will just sit there. If you wish to email me replace
the domain with knightpiesold . co . uk (no spaces).

#9Noname
merlyn@stonehenge.com
In reply to: Thurstan R. McDougle (#8)
Re: [HACKERS] Tupple statistics function

"Thurstan" == Thurstan R McDougle <trmcdougle@my-deja.com> writes:

Thurstan> In general EXPLAIN could be expanded to be a command to
Thurstan> return an explanation and stats of many items. There could
Thurstan> also be EXPLAIN that only shows fields and EXPLAIN VERBOSE
Thurstan> that also shows more detail such as stats (as that tends to
Thurstan> take more time to collect).

It would also be interesting to take everything that psql does and put
each of them into a view so that it could be queried directly.
There's no reason that the magic should reside in client-side code.
It'd also make psql much simpler. :) I mean, why is "\d" anything
other than "select * from pg_table_view;", with all the logic to
compute that table in the view code?

Unless having a view on the server is expensive. Is a server view
expensive if nobody calls it? I mean, it's not maintained like an
index, is it?

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/&gt;
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!

#10Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Noname (#9)
Re: [HACKERS] Tupple statistics function

"Thurstan" == Thurstan R McDougle <trmcdougle@my-deja.com> writes:

Thurstan> In general EXPLAIN could be expanded to be a command to
Thurstan> return an explanation and stats of many items. There could
Thurstan> also be EXPLAIN that only shows fields and EXPLAIN VERBOSE
Thurstan> that also shows more detail such as stats (as that tends to
Thurstan> take more time to collect).

It would also be interesting to take everything that psql does and put
each of them into a view so that it could be queried directly.
There's no reason that the magic should reside in client-side code.
It'd also make psql much simpler. :) I mean, why is "\d" anything
other than "select * from pg_table_view;", with all the logic to
compute that table in the view code?

Unless having a view on the server is expensive. Is a server view
expensive if nobody calls it? I mean, it's not maintained like an
index, is it?

Added to TODO:

* Move psql backslash information into views

Makes sense.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026