pgstattuple extension for indexes
This is an extension of pgstattuple to query information from indexes.
It supports btree, hash and gist. Gin is not supported.
It scans only index pages and does not read corresponding heap tuples.
Therefore, 'dead_tuple' means the number of tuples with LP_DELETE flag.
Also, I added an experimental feature for btree indexes. It checks
fragmentation factor of indexes. If an leaf has the right link on the next
adjacent page in the file, it is assumed to be continuous (not fragmented).
It will help us to decide when to REINDEX.
Suggestions welcome.
----
$ pgbench -i
$ pgbench -n -t 100 -c 10
# select * from pgstattuple('accounts_pkey');
NOTICE: 0.36% fragmented
HINT: continuous=273, forward=1, backward=0
-[ RECORD 1 ]------+--------
table_len | 2260992
tuple_count | 100996 -- 996 tuples are dead practically,
tuple_len | 1615936 but no LP_DELETE yet.
tuple_percent | 71.47
dead_tuple_count | 4
dead_tuple_len | 64 -- 64 tuples are marked as LP_DELETE.
dead_tuple_percent | 0
free_space | 208188
free_percent | 9.21
----
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
Attachments:
pgstattuple.patchapplication/octet-stream; name=pgstattuple.patchDownload
diff -cpr pgsql-orig/contrib/pgstattuple/README.pgstattuple pgsql/contrib/pgstattuple/README.pgstattuple
*** pgsql-orig/contrib/pgstattuple/README.pgstattuple Tue Feb 28 01:09:49 2006
--- pgsql/contrib/pgstattuple/README.pgstattuple Wed Jun 28 15:08:07 2006
*************** pgstattuple README 2002/08/29 Tatsuo I
*** 2,9 ****
1. What is pgstattuple?
! pgstattuple returns the table length, percentage of the "dead"
! tuples of a table and other info. This may help users to determine
whether vacuum is necessary or not. Here is an example session:
test=# \x
--- 2,9 ----
1. What is pgstattuple?
! pgstattuple returns the relation length, percentage of the "dead"
! tuples of a relation and other info. This may help users to determine
whether vacuum is necessary or not. Here is an example session:
test=# \x
*************** free_percent | 1.95
*** 23,29 ****
Here are explanations for each column:
! table_len -- physical table length in bytes
tuple_count -- number of live tuples
tuple_len -- total tuples length in bytes
tuple_percent -- live tuples in %
--- 23,29 ----
Here are explanations for each column:
! table_len -- physical relation length in bytes
tuple_count -- number of live tuples
tuple_len -- total tuples length in bytes
tuple_percent -- live tuples in %
*************** free_percent -- free space in %
*** 40,46 ****
3. Using pgstattuple
! pgstattuple may be called as a table function and is
defined as follows:
CREATE OR REPLACE FUNCTION pgstattuple(text) RETURNS pgstattuple_type
--- 40,46 ----
3. Using pgstattuple
! pgstattuple may be called as a relation function and is
defined as follows:
CREATE OR REPLACE FUNCTION pgstattuple(text) RETURNS pgstattuple_type
*************** free_percent -- free space in %
*** 51,64 ****
AS 'MODULE_PATHNAME', 'pgstattuplebyid'
LANGUAGE C STRICT;
! The argument is the table name (optionally it may be qualified)
! or the OID of the table. Note that pgstattuple only returns
one row.
4. Notes
! pgstattuple acquires only a read lock on the table. So concurrent
update may affect the result.
pgstattuple judges a tuple is "dead" if HeapTupleSatisfiesNow()
returns false.
--- 51,70 ----
AS 'MODULE_PATHNAME', 'pgstattuplebyid'
LANGUAGE C STRICT;
! The argument is the relation name (optionally it may be qualified)
! or the OID of the relation. Note that pgstattuple only returns
one row.
4. Notes
! pgstattuple acquires only a read lock on the relation. So concurrent
update may affect the result.
pgstattuple judges a tuple is "dead" if HeapTupleSatisfiesNow()
returns false.
+
+ 5. History
+
+ 2006/06/28
+
+ Extended to work against indexes.
diff -cpr pgsql-orig/contrib/pgstattuple/README.pgstattuple.euc_jp pgsql/contrib/pgstattuple/README.pgstattuple.euc_jp
*** pgsql-orig/contrib/pgstattuple/README.pgstattuple.euc_jp Tue Feb 28 01:09:49 2006
--- pgsql/contrib/pgstattuple/README.pgstattuple.euc_jp Wed Jun 28 15:08:08 2006
*************** pgstattuple README 2002/08/22 ��������
*** 4,11 ****
1. pgstattuple����
! pgstattuple����UPDATE��DELETE����������������������������������������
! ������������������������������������������������������������������
��������������������������������������������������vacuum����������
������������������������������������������������������������������
������������
--- 4,11 ----
1. pgstattuple����
! pgstattuple����UPDATE��DELETE��������������������������������������������
! ����������������������������������������������������������������������
��������������������������������������������������vacuum����������
������������������������������������������������������������������
������������
*************** free_percent | 1.95
*** 26,32 ****
������������������
! table_len -- ������������������������(������)
tuple_count -- ��������
tuple_len -- ��������������(������)
tuple_percent -- ��������������table_len��������tuple_len��������
--- 26,32 ----
������������������
! table_len -- ����������������������������(������)
tuple_count -- ��������
tuple_len -- ��������������(������)
tuple_percent -- ��������������table_len��������tuple_len��������
*************** free_percent -- ������������������table
*** 56,67 ****
AS 'MODULE_PATHNAME', 'pgstattuple'
LANGUAGE C STRICT;
! ��������: ����������
������������pgstattuple_type��������
! pgstattuple������������AccessShareLock������������������
! pgstattuple ������������������������������������������������������
��������������������������������
pgstattuple������������������������������������
--- 56,67 ----
AS 'MODULE_PATHNAME', 'pgstattuple'
LANGUAGE C STRICT;
! ��������: ��������������
������������pgstattuple_type��������
! pgstattuple����������������AccessShareLock������������������
! pgstattuple ����������������������������������������������������������
��������������������������������
pgstattuple������������������������������������
*************** free_percent -- ������������������table
*** 74,79 ****
--- 74,83 ----
��������������������
5. ��������
+
+ 2006/06/28
+
+ ������������������������������������������
2002/09/04
diff -cpr pgsql-orig/contrib/pgstattuple/pgstattuple.c pgsql/contrib/pgstattuple/pgstattuple.c
*** pgsql-orig/contrib/pgstattuple/pgstattuple.c Fri Jun 2 07:36:35 2006
--- pgsql/contrib/pgstattuple/pgstattuple.c Tue Jun 27 13:17:13 2006
***************
*** 27,32 ****
--- 27,35 ----
#include "fmgr.h"
#include "funcapi.h"
#include "access/heapam.h"
+ #include "access/nbtree.h"
+ #include "access/gist_private.h"
+ #include "access/hash.h"
#include "access/transam.h"
#include "catalog/namespace.h"
#include "utils/builtins.h"
*************** PG_FUNCTION_INFO_V1(pgstattuplebyid);
*** 40,47 ****
extern Datum pgstattuple(PG_FUNCTION_ARGS);
extern Datum pgstattuplebyid(PG_FUNCTION_ARGS);
! static Datum pgstattuple_real(Relation rel, FunctionCallInfo fcinfo);
/* ----------
* pgstattuple:
--- 43,166 ----
extern Datum pgstattuple(PG_FUNCTION_ARGS);
extern Datum pgstattuplebyid(PG_FUNCTION_ARGS);
! /*
! * struct pgstattuple_type
! *
! * tuple_percent, dead_tuple_percent and free_percent are computable,
! * so not defined here.
! */
! typedef struct pgstattuple_type
! {
! uint64 table_len;
! uint64 tuple_count;
! uint64 tuple_len;
! uint64 dead_tuple_count;
! uint64 dead_tuple_len;
! uint64 free_space; /* free/reusable space in bytes */
! } pgstattuple_type;
!
! /*
! * struct pgstat_btree_type
! */
! typedef struct pgstat_btree_type
! {
! pgstattuple_type base; /* inherits pgstattuple_type */
!
! uint64 continuous;
! uint64 forward;
! uint64 backward;
! } pgstat_btree_type;
!
! typedef void (*pgstat_page)(pgstattuple_type *, Relation, BlockNumber);
!
! static Datum build_pgstattuple_type(pgstattuple_type *stat,
! FunctionCallInfo fcinfo);
! static Datum pgstat_relation(Relation rel, FunctionCallInfo fcinfo);
! static Datum pgstat_heap(Relation rel, FunctionCallInfo fcinfo);
! static Datum pgstat_btree(Relation rel, FunctionCallInfo fcinfo);
! static void pgstat_btree_page(pgstattuple_type *stat,
! Relation rel, BlockNumber blkno);
! static Datum pgstat_hash(Relation rel, FunctionCallInfo fcinfo);
! static void pgstat_hash_page(pgstattuple_type *stat,
! Relation rel, BlockNumber blkno);
! static Datum pgstat_gist(Relation rel, FunctionCallInfo fcinfo);
! static void pgstat_gist_page(pgstattuple_type *stat,
! Relation rel, BlockNumber blkno);
! static Datum pgstat_index(pgstattuple_type *stat,
! Relation rel, BlockNumber start,
! pgstat_page pagefn, FunctionCallInfo fcinfo);
! static void pgstat_index_page(pgstattuple_type *stat, Page page,
! OffsetNumber minoff, OffsetNumber maxoff);
!
! /*
! * build_pgstattuple_type -- build a pgstattuple_type tuple
! */
! static Datum
! build_pgstattuple_type(pgstattuple_type *stat, FunctionCallInfo fcinfo)
! {
! #define NCOLUMNS 9
! #define NCHARS 32
!
! HeapTuple tuple;
! char *values[NCOLUMNS];
! char values_buf[NCOLUMNS][NCHARS];
! int i;
! double tuple_percent;
! double dead_tuple_percent;
! double free_percent; /* free/reusable space in % */
! TupleDesc tupdesc;
! AttInMetadata *attinmeta;
!
! /* Build a tuple descriptor for our result type */
! if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
! elog(ERROR, "return type must be a row type");
!
! /* make sure we have a persistent copy of the tupdesc */
! tupdesc = CreateTupleDescCopy(tupdesc);
!
! /*
! * Generate attribute metadata needed later to produce tuples from raw C
! * strings
! */
! attinmeta = TupleDescGetAttInMetadata(tupdesc);
!
! if (stat->table_len == 0)
! {
! tuple_percent = 0.0;
! dead_tuple_percent = 0.0;
! free_percent = 0.0;
! }
! else
! {
! tuple_percent = 100.0 * stat->tuple_len / stat->table_len;
! dead_tuple_percent = 100.0 * stat->dead_tuple_len / stat->table_len;
! free_percent = 100.0 * stat->free_space / stat->table_len;
! }
!
! /*
! * Prepare a values array for constructing the tuple. This should be an
! * array of C strings which will be processed later by the appropriate
! * "in" functions.
! */
! for (i = 0; i < NCOLUMNS; i++)
! values[i] = values_buf[i];
! i = 0;
! snprintf(values[i++], NCHARS, INT64_FORMAT, stat->table_len);
! snprintf(values[i++], NCHARS, INT64_FORMAT, stat->tuple_count);
! snprintf(values[i++], NCHARS, INT64_FORMAT, stat->tuple_len);
! snprintf(values[i++], NCHARS, "%.2f", tuple_percent);
! snprintf(values[i++], NCHARS, INT64_FORMAT, stat->dead_tuple_count);
! snprintf(values[i++], NCHARS, INT64_FORMAT, stat->dead_tuple_len);
! snprintf(values[i++], NCHARS, "%.2f", dead_tuple_percent);
! snprintf(values[i++], NCHARS, INT64_FORMAT, stat->free_space);
! snprintf(values[i++], NCHARS, "%.2f", free_percent);
+ /* build a tuple */
+ tuple = BuildTupleFromCStrings(attinmeta, values);
+
+ /* make the tuple into a datum */
+ return HeapTupleGetDatum(tuple);
+ }
/* ----------
* pgstattuple:
*************** static Datum pgstattuple_real(Relation r
*** 53,76 ****
* ----------
*/
- #define NCOLUMNS 9
- #define NCHARS 32
-
Datum
pgstattuple(PG_FUNCTION_ARGS)
{
text *relname = PG_GETARG_TEXT_P(0);
RangeVar *relrv;
Relation rel;
- Datum result;
/* open relation */
relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
! rel = heap_openrv(relrv, AccessShareLock);
!
! result = pgstattuple_real(rel, fcinfo);
! PG_RETURN_DATUM(result);
}
Datum
--- 172,189 ----
* ----------
*/
Datum
pgstattuple(PG_FUNCTION_ARGS)
{
text *relname = PG_GETARG_TEXT_P(0);
RangeVar *relrv;
Relation rel;
/* open relation */
relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
! rel = relation_openrv(relrv, AccessShareLock);
! PG_RETURN_DATUM(pgstat_relation(rel, fcinfo));
}
Datum
*************** pgstattuplebyid(PG_FUNCTION_ARGS)
*** 78,134 ****
{
Oid relid = PG_GETARG_OID(0);
Relation rel;
- Datum result;
/* open relation */
! rel = heap_open(relid, AccessShareLock);
!
! result = pgstattuple_real(rel, fcinfo);
! PG_RETURN_DATUM(result);
}
/*
! * pgstattuple_real
! *
! * The real work occurs here
*/
static Datum
! pgstattuple_real(Relation rel, FunctionCallInfo fcinfo)
{
! HeapScanDesc scan;
! HeapTuple tuple;
! BlockNumber nblocks;
! BlockNumber block = 0; /* next block to count free space in */
! BlockNumber tupblock;
! Buffer buffer;
! uint64 table_len;
! uint64 tuple_len = 0;
! uint64 dead_tuple_len = 0;
! uint64 tuple_count = 0;
! uint64 dead_tuple_count = 0;
! double tuple_percent;
! double dead_tuple_percent;
! uint64 free_space = 0; /* free/reusable space in bytes */
! double free_percent; /* free/reusable space in % */
! TupleDesc tupdesc;
! AttInMetadata *attinmeta;
! char **values;
! int i;
! Datum result;
! /* Build a tuple descriptor for our result type */
! if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
! elog(ERROR, "return type must be a row type");
! /* make sure we have a persistent copy of the tupdesc */
! tupdesc = CreateTupleDescCopy(tupdesc);
! /*
! * Generate attribute metadata needed later to produce tuples from raw C
! * strings
! */
! attinmeta = TupleDescGetAttInMetadata(tupdesc);
scan = heap_beginscan(rel, SnapshotAny, 0, NULL);
--- 191,266 ----
{
Oid relid = PG_GETARG_OID(0);
Relation rel;
/* open relation */
! rel = relation_open(relid, AccessShareLock);
! PG_RETURN_DATUM(pgstat_relation(rel, fcinfo));
}
/*
! * pgstat_relation
*/
static Datum
! pgstat_relation(Relation rel, FunctionCallInfo fcinfo)
{
! const char *err;
! switch(rel->rd_rel->relkind)
! {
! case RELKIND_RELATION:
! case RELKIND_TOASTVALUE:
! case RELKIND_UNCATALOGED:
! case RELKIND_SEQUENCE:
! return pgstat_heap(rel, fcinfo);
! case RELKIND_INDEX:
! switch(rel->rd_rel->relam)
! {
! case BTREE_AM_OID:
! return pgstat_btree(rel, fcinfo);
! case HASH_AM_OID:
! return pgstat_hash(rel, fcinfo);
! case GIST_AM_OID:
! return pgstat_gist(rel, fcinfo);
! case GIN_AM_OID:
! err = "gin index";
! break;
! default:
! err = "unknown index";
! break;
! }
! break;
! case RELKIND_VIEW:
! err = "view";
! break;
! case RELKIND_COMPOSITE_TYPE:
! err = "composite type";
! break;
! default:
! err = "unknown";
! break;
! }
! ereport(ERROR,
! (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
! errmsg("\"%s\" (%s) is not supported",
! RelationGetRelationName(rel), err)));
! return 0; /* should not happen */
! }
! /*
! * pgstat_heap -- returns live/dead tuples info in a heap
! */
! static Datum
! pgstat_heap(Relation rel, FunctionCallInfo fcinfo)
! {
! HeapScanDesc scan;
! HeapTuple tuple;
! BlockNumber nblocks;
! BlockNumber block = 0; /* next block to count free space in */
! BlockNumber tupblock;
! Buffer buffer;
! pgstattuple_type stat = { 0 };
scan = heap_beginscan(rel, SnapshotAny, 0, NULL);
*************** pgstattuple_real(Relation rel, FunctionC
*** 142,154 ****
if (HeapTupleSatisfiesNow(tuple->t_data, scan->rs_cbuf))
{
! tuple_len += tuple->t_len;
! tuple_count++;
}
else
{
! dead_tuple_len += tuple->t_len;
! dead_tuple_count++;
}
LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
--- 274,286 ----
if (HeapTupleSatisfiesNow(tuple->t_data, scan->rs_cbuf))
{
! stat.tuple_len += tuple->t_len;
! stat.tuple_count++;
}
else
{
! stat.dead_tuple_len += tuple->t_len;
! stat.dead_tuple_count++;
}
LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK);
*************** pgstattuple_real(Relation rel, FunctionC
*** 165,171 ****
{
buffer = ReadBuffer(rel, block);
LockBuffer(buffer, BUFFER_LOCK_SHARE);
! free_space += PageGetFreeSpace((Page) BufferGetPage(buffer));
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
ReleaseBuffer(buffer);
block++;
--- 297,303 ----
{
buffer = ReadBuffer(rel, block);
LockBuffer(buffer, BUFFER_LOCK_SHARE);
! stat.free_space += PageGetFreeSpace((Page) BufferGetPage(buffer));
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
ReleaseBuffer(buffer);
block++;
*************** pgstattuple_real(Relation rel, FunctionC
*** 176,232 ****
while (block < nblocks)
{
buffer = ReadBuffer(rel, block);
! free_space += PageGetFreeSpace((Page) BufferGetPage(buffer));
ReleaseBuffer(buffer);
block++;
}
! heap_close(rel, AccessShareLock);
! table_len = (uint64) nblocks *BLCKSZ;
! if (nblocks == 0)
{
! tuple_percent = 0.0;
! dead_tuple_percent = 0.0;
! free_percent = 0.0;
}
else
{
! tuple_percent = (double) tuple_len *100.0 / table_len;
! dead_tuple_percent = (double) dead_tuple_len *100.0 / table_len;
! free_percent = (double) free_space *100.0 / table_len;
}
! /*
! * Prepare a values array for constructing the tuple. This should be an
! * array of C strings which will be processed later by the appropriate
! * "in" functions.
! */
! values = (char **) palloc(NCOLUMNS * sizeof(char *));
! for (i = 0; i < NCOLUMNS; i++)
! values[i] = (char *) palloc(NCHARS * sizeof(char));
! i = 0;
! snprintf(values[i++], NCHARS, INT64_FORMAT, table_len);
! snprintf(values[i++], NCHARS, INT64_FORMAT, tuple_count);
! snprintf(values[i++], NCHARS, INT64_FORMAT, tuple_len);
! snprintf(values[i++], NCHARS, "%.2f", tuple_percent);
! snprintf(values[i++], NCHARS, INT64_FORMAT, dead_tuple_count);
! snprintf(values[i++], NCHARS, INT64_FORMAT, dead_tuple_len);
! snprintf(values[i++], NCHARS, "%.2f", dead_tuple_percent);
! snprintf(values[i++], NCHARS, INT64_FORMAT, free_space);
! snprintf(values[i++], NCHARS, "%.2f", free_percent);
! /* build a tuple */
! tuple = BuildTupleFromCStrings(attinmeta, values);
! /* make the tuple into a datum */
! result = HeapTupleGetDatum(tuple);
! /* Clean up */
! for (i = 0; i < NCOLUMNS; i++)
! pfree(values[i]);
! pfree(values);
! return (result);
}
--- 308,547 ----
while (block < nblocks)
{
buffer = ReadBuffer(rel, block);
! stat.free_space += PageGetFreeSpace((Page) BufferGetPage(buffer));
ReleaseBuffer(buffer);
block++;
}
! relation_close(rel, AccessShareLock);
!
! stat.table_len = (uint64) nblocks * BLCKSZ;
!
! return build_pgstattuple_type(&stat, fcinfo);
! }
!
! /*
! * pgstat_btree -- returns live/dead tuples info in a btree index
! */
! static Datum
! pgstat_btree(Relation rel, FunctionCallInfo fcinfo)
! {
! pgstat_btree_type stat = { { 0 } };
! Datum datum;
!
! datum = pgstat_index((pgstattuple_type *) &stat, rel,
! BTREE_METAPAGE + 1, pgstat_btree_page, fcinfo);
!
! ereport(NOTICE,
! (errmsg("%.2f%% fragmented",
! 100.0 * (stat.forward + stat.backward) /
! (stat.continuous + stat.forward + stat.backward)),
! errhint("continuous=%llu, forward=%llu, backward=%llu",
! stat.continuous, stat.forward, stat.backward)));
!
! return datum;
! }
!
! /*
! * pgstat_btree_page
! */
! static void
! pgstat_btree_page(pgstattuple_type *stat, Relation rel, BlockNumber blkno)
! {
! Buffer buf;
! Page page;
! pgstat_btree_type *btstat = (pgstat_btree_type *)stat;
!
! buf = ReadBuffer(rel, blkno);
! LockBuffer(buf, BT_READ);
! page = BufferGetPage(buf);
!
! /* Page is valid, see what to do with it */
! if (PageIsNew(page))
! {
! /* fully empty page */
! stat->free_space += BLCKSZ;
! }
! else
! {
! BTPageOpaque opaque;
! opaque = (BTPageOpaque) PageGetSpecialPointer(page);
! if (opaque->btpo_flags & (BTP_DELETED | BTP_HALF_DEAD))
! {
! /* recyclable page */
! stat->free_space += BLCKSZ;
! }
! else if (P_ISLEAF(opaque))
! {
! /* check fragmentation */
! if (P_RIGHTMOST(opaque))
! btstat->continuous++;
! else if (opaque->btpo_next < blkno)
! btstat->backward++;
! else if (opaque->btpo_next > blkno + 1)
! btstat->forward++;
! else
! btstat->continuous++;
!
! pgstat_index_page(stat, page, P_FIRSTDATAKEY(opaque),
! PageGetMaxOffsetNumber(page));
! }
! else
! {
! /* root or node */
! }
! }
! _bt_relbuf(rel, buf);
! }
! /*
! * pgstat_hash -- returns live/dead tuples info in a hash index
! */
! static Datum
! pgstat_hash(Relation rel, FunctionCallInfo fcinfo)
! {
! pgstattuple_type stat = { 0 };
! return pgstat_index(&stat, rel, HASH_METAPAGE + 1, pgstat_hash_page, fcinfo);
! }
!
! /*
! * pgstat_hash_page
! */
! static void
! pgstat_hash_page(pgstattuple_type *stat, Relation rel, BlockNumber blkno)
! {
! Buffer buf;
! Page page;
!
! _hash_getlock(rel, blkno, HASH_SHARE);
! buf = _hash_getbuf(rel, blkno, HASH_READ);
! page = BufferGetPage(buf);
!
! if (PageGetSpecialSize(page) == MAXALIGN(sizeof(HashPageOpaqueData)))
{
! HashPageOpaque opaque;
! opaque = (HashPageOpaque) PageGetSpecialPointer(page);
! switch (opaque->hasho_flag)
! {
! case LH_UNUSED_PAGE:
! stat->free_space += BLCKSZ;
! break;
! case LH_BUCKET_PAGE:
! case LH_OVERFLOW_PAGE:
! pgstat_index_page(stat, page, FirstOffsetNumber,
! PageGetMaxOffsetNumber(page));
! break;
! case LH_BITMAP_PAGE:
! case LH_META_PAGE:
! default:
! break;
! }
}
else
{
! /* maybe corrupted */
}
! _hash_relbuf(rel, buf);
! _hash_droplock(rel, blkno, HASH_SHARE);
! }
! /*
! * pgstat_gist -- returns live/dead tuples info in a gist index
! */
! static Datum
! pgstat_gist(Relation rel, FunctionCallInfo fcinfo)
! {
! pgstattuple_type stat = { 0 };
! return pgstat_index(&stat, rel, GIST_ROOT_BLKNO + 1, pgstat_gist_page, fcinfo);
! }
! /*
! * pgstat_gist_page
! */
! static void
! pgstat_gist_page(pgstattuple_type *stat, Relation rel, BlockNumber blkno)
! {
! Buffer buf;
! Page page;
! buf = ReadBuffer(rel, blkno);
! LockBuffer(buf, GIST_SHARE);
! gistcheckpage(rel, buf);
! page = BufferGetPage(buf);
! if (GistPageIsLeaf(page))
! {
! pgstat_index_page(stat, page, FirstOffsetNumber,
! PageGetMaxOffsetNumber(page));
! }
! else
! {
! /* root or node */
! }
!
! UnlockReleaseBuffer(buf);
! }
!
! /*
! * pgstat_index -- returns live/dead tuples info in a generic index
! */
! static Datum
! pgstat_index(pgstattuple_type *stat, Relation rel, BlockNumber start,
! pgstat_page pagefn, FunctionCallInfo fcinfo)
! {
! BlockNumber nblocks;
! BlockNumber blkno;
!
! blkno = start;
! for (;;)
! {
! /* Get the current relation length */
! LockRelationForExtension(rel, ExclusiveLock);
! nblocks = RelationGetNumberOfBlocks(rel);
! UnlockRelationForExtension(rel, ExclusiveLock);
!
! /* Quit if we've scanned the whole relation */
! if (blkno >= nblocks)
! {
! stat->table_len = (uint64) nblocks * BLCKSZ;
! break;
! }
!
! for (; blkno < nblocks; blkno++)
! pagefn(stat, rel, blkno);
! }
!
! relation_close(rel, AccessShareLock);
!
! return build_pgstattuple_type(stat, fcinfo);
! }
!
! /*
! * pgstat_index_page -- for generic index page
! */
! static void
! pgstat_index_page(pgstattuple_type *stat, Page page,
! OffsetNumber minoff, OffsetNumber maxoff)
! {
! OffsetNumber i;
!
! stat->free_space += PageGetFreeSpace(page);
!
! for (i = minoff; i <= maxoff; i = OffsetNumberNext(i))
! {
! ItemId itemid = PageGetItemId(page, i);
!
! if (ItemIdDeleted(itemid))
! {
! stat->dead_tuple_count++;
! stat->dead_tuple_len += ItemIdGetLength(itemid);
! }
! else
! {
! stat->tuple_count++;
! stat->tuple_len += ItemIdGetLength(itemid);
! }
! }
}
Patch applied. Thanks.
---------------------------------------------------------------------------
ITAGAKI Takahiro wrote:
This is an extension of pgstattuple to query information from indexes.
It supports btree, hash and gist. Gin is not supported.
It scans only index pages and does not read corresponding heap tuples.
Therefore, 'dead_tuple' means the number of tuples with LP_DELETE flag.Also, I added an experimental feature for btree indexes. It checks
fragmentation factor of indexes. If an leaf has the right link on the next
adjacent page in the file, it is assumed to be continuous (not fragmented).
It will help us to decide when to REINDEX.Suggestions welcome.
----
$ pgbench -i
$ pgbench -n -t 100 -c 10
# select * from pgstattuple('accounts_pkey');
NOTICE: 0.36% fragmented
HINT: continuous=273, forward=1, backward=0
-[ RECORD 1 ]------+--------
table_len | 2260992
tuple_count | 100996 -- 996 tuples are dead practically,
tuple_len | 1615936 but no LP_DELETE yet.
tuple_percent | 71.47
dead_tuple_count | 4
dead_tuple_len | 64 -- 64 tuples are marked as LP_DELETE.
dead_tuple_percent | 0
free_space | 208188
free_percent | 9.21
----Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
[ Attachment, skipping... ]
---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster
--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes:
Patch applied. Thanks.
Also, I added an experimental feature for btree indexes. It checks
fragmentation factor of indexes. If an leaf has the right link on the next
adjacent page in the file, it is assumed to be continuous (not fragmented).
It will help us to decide when to REINDEX.
This was done in an entirely unacceptable way, to wit
ereport(NOTICE,
(errmsg("%.2f%% fragmented",
100.0 * (stat.forward + stat.backward) /
(stat.continuous + stat.forward + stat.backward)),
errhint("continuous=%llu, forward=%llu, backward=%llu",
stat.continuous, stat.forward, stat.backward)));
The really serious problem with reporting this info via NOTICE is that
there's no way for a program to get its hands on the info. The output
tuple format needs to be extended instead.
The lesser problem that drew my attention is that %llu is unportable
(and in fact draws gcc warnings for me; did you ignore that?). But
using UINT64_FORMAT instead would create a headache for translation
because the string would vary across platforms.
I won't bother correcting the violations of message style guidelines,
because this code simply has to go away.
regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote:
Also, I added an experimental feature for btree indexes. It checks
fragmentation factor of indexes.
The really serious problem with reporting this info via NOTICE is that
there's no way for a program to get its hands on the info. The output
tuple format needs to be extended instead.
Ok, I added 'fragmented_percent' field to the output tuple. This
information will help us to decide when to do REINDEX.
However, it is only avaliable for btree index presently. Other indexes
should have equivalent information, but I don't know how to determine it.
BTW, should we change VACUUM VERBOSE in the same way? If we do so,
autovacuum can handle the reports of VACUUM VERBOSE and plan when to
do VACUUM FULL, REINDEX and/or CLUSTER using the information.
Is this worth doing?
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
Attachments:
pgstattuple-0724.patchapplication/octet-stream; name=pgstattuple-0724.patchDownload
diff -cpr pgsql-orig/contrib/pgstattuple/README.pgstattuple pgsql/contrib/pgstattuple/README.pgstattuple
*** pgsql-orig/contrib/pgstattuple/README.pgstattuple Wed Jul 12 08:27:09 2006
--- pgsql/contrib/pgstattuple/README.pgstattuple Mon Jul 24 15:36:21 2006
*************** dead_tuple_len | 3157
*** 19,24 ****
--- 19,25 ----
dead_tuple_percent | 0.69
free_space | 8932
free_percent | 1.95
+ fragmented_percent |
Here are explanations for each column:
*************** dead_tuple_len -- total dead tuples len
*** 31,36 ****
--- 32,38 ----
dead_tuple_percent -- dead tuples in %
free_space -- free space in bytes
free_percent -- free space in %
+ fragmented_percent -- fragmented factor in % (only available for btree)
2. Installing pgstattuple
diff -cpr pgsql-orig/contrib/pgstattuple/README.pgstattuple.euc_jp pgsql/contrib/pgstattuple/README.pgstattuple.euc_jp
*** pgsql-orig/contrib/pgstattuple/README.pgstattuple.euc_jp Wed Jul 12 08:27:09 2006
--- pgsql/contrib/pgstattuple/README.pgstattuple.euc_jp Mon Jul 24 15:36:36 2006
*************** dead_tuple_len | 3157
*** 23,28 ****
--- 23,29 ----
dead_tuple_percent | 0.69
free_space | 8932
free_percent | 1.95
+ fragmented_percent |
������������������
*************** dead_tuple_len -- ��������������
*** 34,39 ****
--- 35,41 ----
dead_tuple_percent -- ��������������������table_len��������tuple_len��������
free_space -- ����������������(������)
free_percent -- ������������������table_len��������free_space��������
+ fragmented_percent -- ��������(btree������������)
2. pgstattuple��������������
diff -cpr pgsql-orig/contrib/pgstattuple/pgstattuple.c pgsql/contrib/pgstattuple/pgstattuple.c
*** pgsql-orig/contrib/pgstattuple/pgstattuple.c Wed Jul 12 08:27:09 2006
--- pgsql/contrib/pgstattuple/pgstattuple.c Mon Jul 24 14:02:50 2006
*************** typedef struct pgstattuple_type
*** 57,62 ****
--- 57,63 ----
uint64 dead_tuple_count;
uint64 dead_tuple_len;
uint64 free_space; /* free/reusable space in bytes */
+ double fragmented_percent;
} pgstattuple_type;
/*
*************** typedef struct pgstattuple_type
*** 65,74 ****
typedef struct pgstat_btree_type
{
pgstattuple_type base; /* inherits pgstattuple_type */
-
uint64 continuous;
! uint64 forward;
! uint64 backward;
} pgstat_btree_type;
typedef void (*pgstat_page)(pgstattuple_type *, Relation, BlockNumber);
--- 66,73 ----
typedef struct pgstat_btree_type
{
pgstattuple_type base; /* inherits pgstattuple_type */
uint64 continuous;
! uint64 discontinuous;
} pgstat_btree_type;
typedef void (*pgstat_page)(pgstattuple_type *, Relation, BlockNumber);
*************** static void pgstat_hash_page(pgstattuple
*** 86,94 ****
static Datum pgstat_gist(Relation rel, FunctionCallInfo fcinfo);
static void pgstat_gist_page(pgstattuple_type *stat,
Relation rel, BlockNumber blkno);
! static Datum pgstat_index(pgstattuple_type *stat,
! Relation rel, BlockNumber start,
! pgstat_page pagefn, FunctionCallInfo fcinfo);
static void pgstat_index_page(pgstattuple_type *stat, Page page,
OffsetNumber minoff, OffsetNumber maxoff);
--- 85,92 ----
static Datum pgstat_gist(Relation rel, FunctionCallInfo fcinfo);
static void pgstat_gist_page(pgstattuple_type *stat,
Relation rel, BlockNumber blkno);
! static void pgstat_index(pgstattuple_type *stat,
! Relation rel, BlockNumber start, pgstat_page pagefn);
static void pgstat_index_page(pgstattuple_type *stat, Page page,
OffsetNumber minoff, OffsetNumber maxoff);
*************** static void pgstat_index_page(pgstattupl
*** 98,115 ****
static Datum
build_pgstattuple_type(pgstattuple_type *stat, FunctionCallInfo fcinfo)
{
! #define NCOLUMNS 9
! #define NCHARS 32
HeapTuple tuple;
! char *values[NCOLUMNS];
! char values_buf[NCOLUMNS][NCHARS];
int i;
double tuple_percent;
double dead_tuple_percent;
double free_percent; /* free/reusable space in % */
TupleDesc tupdesc;
- AttInMetadata *attinmeta;
/* Build a tuple descriptor for our result type */
if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
--- 96,111 ----
static Datum
build_pgstattuple_type(pgstattuple_type *stat, FunctionCallInfo fcinfo)
{
! #define NCOLUMNS 10
HeapTuple tuple;
! char nulls[NCOLUMNS];
! Datum values[NCOLUMNS];
int i;
double tuple_percent;
double dead_tuple_percent;
double free_percent; /* free/reusable space in % */
TupleDesc tupdesc;
/* Build a tuple descriptor for our result type */
if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
*************** build_pgstattuple_type(pgstattuple_type
*** 118,129 ****
/* make sure we have a persistent copy of the tupdesc */
tupdesc = CreateTupleDescCopy(tupdesc);
- /*
- * Generate attribute metadata needed later to produce tuples from raw C
- * strings
- */
- attinmeta = TupleDescGetAttInMetadata(tupdesc);
-
if (stat->table_len == 0)
{
tuple_percent = 0.0;
--- 114,119 ----
*************** build_pgstattuple_type(pgstattuple_type
*** 137,162 ****
free_percent = 100.0 * stat->free_space / stat->table_len;
}
! /*
! * Prepare a values array for constructing the tuple. This should be an
! * array of C strings which will be processed later by the appropriate
! * "in" functions.
! */
! for (i = 0; i < NCOLUMNS; i++)
! values[i] = values_buf[i];
i = 0;
! snprintf(values[i++], NCHARS, INT64_FORMAT, stat->table_len);
! snprintf(values[i++], NCHARS, INT64_FORMAT, stat->tuple_count);
! snprintf(values[i++], NCHARS, INT64_FORMAT, stat->tuple_len);
! snprintf(values[i++], NCHARS, "%.2f", tuple_percent);
! snprintf(values[i++], NCHARS, INT64_FORMAT, stat->dead_tuple_count);
! snprintf(values[i++], NCHARS, INT64_FORMAT, stat->dead_tuple_len);
! snprintf(values[i++], NCHARS, "%.2f", dead_tuple_percent);
! snprintf(values[i++], NCHARS, INT64_FORMAT, stat->free_space);
! snprintf(values[i++], NCHARS, "%.2f", free_percent);
! /* build a tuple */
! tuple = BuildTupleFromCStrings(attinmeta, values);
/* make the tuple into a datum */
return HeapTupleGetDatum(tuple);
--- 127,152 ----
free_percent = 100.0 * stat->free_space / stat->table_len;
}
! /* build a tuple */
! MemSet(values, 0, sizeof(values));
! MemSet(nulls, ' ', sizeof(nulls));
!
i = 0;
! values[i++] = Int64GetDatum(stat->table_len);
! values[i++] = Int64GetDatum(stat->tuple_count);
! values[i++] = Int64GetDatum(stat->tuple_len);
! values[i++] = Float8GetDatum(tuple_percent);
! values[i++] = Int64GetDatum(stat->dead_tuple_count);
! values[i++] = Int64GetDatum(stat->dead_tuple_len);
! values[i++] = Float8GetDatum(dead_tuple_percent);
! values[i++] = Int64GetDatum(stat->free_space);
! values[i++] = Float8GetDatum(free_percent);
! if (stat->fragmented_percent >= 0.0)
! values[i++] = Float8GetDatum(stat->fragmented_percent);
! else
! nulls[i++] = 'n';
! tuple = heap_formtuple(tupdesc, values, nulls);
/* make the tuple into a datum */
return HeapTupleGetDatum(tuple);
*************** pgstat_heap(Relation rel, FunctionCallIn
*** 316,321 ****
--- 306,312 ----
relation_close(rel, AccessShareLock);
stat.table_len = (uint64) nblocks * BLCKSZ;
+ stat.fragmented_percent = -1; /* not supported */
return build_pgstattuple_type(&stat, fcinfo);
}
*************** static Datum
*** 327,345 ****
pgstat_btree(Relation rel, FunctionCallInfo fcinfo)
{
pgstat_btree_type stat = { { 0 } };
- Datum datum;
! datum = pgstat_index((pgstattuple_type *) &stat, rel,
! BTREE_METAPAGE + 1, pgstat_btree_page, fcinfo);
! ereport(NOTICE,
! (errmsg("%.2f%% fragmented",
! 100.0 * (stat.forward + stat.backward) /
! (stat.continuous + stat.forward + stat.backward)),
! errhint("continuous=%llu, forward=%llu, backward=%llu",
! stat.continuous, stat.forward, stat.backward)));
! return datum;
}
/*
--- 318,331 ----
pgstat_btree(Relation rel, FunctionCallInfo fcinfo)
{
pgstat_btree_type stat = { { 0 } };
! pgstat_index((pgstattuple_type *) &stat, rel,
! BTREE_METAPAGE + 1, pgstat_btree_page);
! stat.base.fragmented_percent = 100.0 * stat.discontinuous /
! (stat.continuous + stat.discontinuous);
! return build_pgstattuple_type((pgstattuple_type *) &stat, fcinfo);
}
/*
*************** pgstat_btree_page(pgstattuple_type *stat
*** 374,387 ****
else if (P_ISLEAF(opaque))
{
/* check fragmentation */
! if (P_RIGHTMOST(opaque))
btstat->continuous++;
- else if (opaque->btpo_next < blkno)
- btstat->backward++;
- else if (opaque->btpo_next > blkno + 1)
- btstat->forward++;
else
! btstat->continuous++;
pgstat_index_page(stat, page, P_FIRSTDATAKEY(opaque),
PageGetMaxOffsetNumber(page));
--- 360,369 ----
else if (P_ISLEAF(opaque))
{
/* check fragmentation */
! if (P_RIGHTMOST(opaque) || opaque->btpo_next == blkno + 1)
btstat->continuous++;
else
! btstat->discontinuous++;
pgstat_index_page(stat, page, P_FIRSTDATAKEY(opaque),
PageGetMaxOffsetNumber(page));
*************** static Datum
*** 402,408 ****
pgstat_hash(Relation rel, FunctionCallInfo fcinfo)
{
pgstattuple_type stat = { 0 };
! return pgstat_index(&stat, rel, HASH_METAPAGE + 1, pgstat_hash_page, fcinfo);
}
/*
--- 384,392 ----
pgstat_hash(Relation rel, FunctionCallInfo fcinfo)
{
pgstattuple_type stat = { 0 };
! pgstat_index(&stat, rel, HASH_METAPAGE + 1, pgstat_hash_page);
! stat.fragmented_percent = -1; /* not supported */
! return build_pgstattuple_type(&stat, fcinfo);
}
/*
*************** static Datum
*** 454,460 ****
pgstat_gist(Relation rel, FunctionCallInfo fcinfo)
{
pgstattuple_type stat = { 0 };
! return pgstat_index(&stat, rel, GIST_ROOT_BLKNO + 1, pgstat_gist_page, fcinfo);
}
/*
--- 438,446 ----
pgstat_gist(Relation rel, FunctionCallInfo fcinfo)
{
pgstattuple_type stat = { 0 };
! pgstat_index(&stat, rel, GIST_ROOT_BLKNO + 1, pgstat_gist_page);
! stat.fragmented_percent = -1; /* not supported */
! return build_pgstattuple_type(&stat, fcinfo);
}
/*
*************** pgstat_gist_page(pgstattuple_type *stat,
*** 487,495 ****
/*
* pgstat_index -- returns live/dead tuples info in a generic index
*/
! static Datum
pgstat_index(pgstattuple_type *stat, Relation rel, BlockNumber start,
! pgstat_page pagefn, FunctionCallInfo fcinfo)
{
BlockNumber nblocks;
BlockNumber blkno;
--- 473,481 ----
/*
* pgstat_index -- returns live/dead tuples info in a generic index
*/
! static void
pgstat_index(pgstattuple_type *stat, Relation rel, BlockNumber start,
! pgstat_page pagefn)
{
BlockNumber nblocks;
BlockNumber blkno;
*************** pgstat_index(pgstattuple_type *stat, Rel
*** 514,521 ****
}
relation_close(rel, AccessShareLock);
-
- return build_pgstattuple_type(stat, fcinfo);
}
/*
--- 500,505 ----
diff -cpr pgsql-orig/contrib/pgstattuple/pgstattuple.sql.in pgsql/contrib/pgstattuple/pgstattuple.sql.in
*** pgsql-orig/contrib/pgstattuple/pgstattuple.sql.in Tue Feb 28 01:09:49 2006
--- pgsql/contrib/pgstattuple/pgstattuple.sql.in Mon Jul 24 14:03:06 2006
*************** CREATE TYPE pgstattuple_type AS (
*** 10,16 ****
dead_tuple_len BIGINT, -- total dead tuples length in bytes
dead_tuple_percent FLOAT, -- dead tuples in %
free_space BIGINT, -- free space in bytes
! free_percent FLOAT -- free space in %
);
CREATE OR REPLACE FUNCTION pgstattuple(text)
--- 10,17 ----
dead_tuple_len BIGINT, -- total dead tuples length in bytes
dead_tuple_percent FLOAT, -- dead tuples in %
free_space BIGINT, -- free space in bytes
! free_percent FLOAT, -- free space in %
! fragmented_percent FLOAT -- fragmented factor in %
);
CREATE OR REPLACE FUNCTION pgstattuple(text)
ITAGAKI Takahiro wrote:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
Also, I added an experimental feature for btree indexes. It checks
fragmentation factor of indexes.The really serious problem with reporting this info via NOTICE is that
there's no way for a program to get its hands on the info. The output
tuple format needs to be extended instead.Ok, I added 'fragmented_percent' field to the output tuple. This
information will help us to decide when to do REINDEX.
However, it is only avaliable for btree index presently. Other indexes
should have equivalent information, but I don't know how to determine it.
BTW while you're handling this, why not change the function to use OUT
parameters instead of having to CREATE TYPE to handle the return type?
I think it is easier to handle ...
One question I had was: in the percentage of fragmentation, is higher
better or lower better? (I'm guessing lower is better, but this should
be mentioned in the docs)
BTW, should we change VACUUM VERBOSE in the same way? If we do so,
autovacuum can handle the reports of VACUUM VERBOSE and plan when to
do VACUUM FULL, REINDEX and/or CLUSTER using the information.
Is this worth doing?
You mean having VACUUM VERBOSE return a result set?
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes:
ITAGAKI Takahiro wrote:
BTW, should we change VACUUM VERBOSE in the same way? If we do so,
autovacuum can handle the reports of VACUUM VERBOSE and plan when to
do VACUUM FULL, REINDEX and/or CLUSTER using the information.
Is this worth doing?
You mean having VACUUM VERBOSE return a result set?
To me, the point of VACUUM VERBOSE is mostly to give you some
reassurance that it's making progress. If it were returning rows
instead of notice messages, you'd lose that functionality (at least
in libpq-based clients). In any case, autovacuum has other ways
of getting the information without needing a change in user-visible
behavior.
regards, tom lane
Hi,
I'm working on an utility for b-tree index, called `pgstatindex`.
It reports b-tree index statistics like a pgstattuple as below.
----------------------------------------------------------------
pgbench=# \x
Expanded display is on.
pgbench=# SELECT * FROM pgstatindex('accounts_pkey1');
-[ RECORD 1 ]------+---------
version | 2
tree_level | 2
index_size | 17956864
root_block_no | 361
internal_pages | 8
leaf_pages | 2184
empty_pages | 0
deleted_pages | 0
avg_leaf_density | 90.07
leaf_fragmentation | 0
pgbench=#
----------------------------------------------------------------
I want to make this to contrib module like a pgstattuple,
and to make this open to public in a few days.
Do you think this is useful?
2006/7/24, ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp>:
Show quoted text
Tom Lane <tgl@sss.pgh.pa.us> wrote:
Also, I added an experimental feature for btree indexes. It checks
fragmentation factor of indexes.The really serious problem with reporting this info via NOTICE is that
there's no way for a program to get its hands on the info. The output
tuple format needs to be extended instead.Ok, I added 'fragmented_percent' field to the output tuple. This
information will help us to decide when to do REINDEX.
However, it is only avaliable for btree index presently. Other indexes
should have equivalent information, but I don't know how to determine it.BTW, should we change VACUUM VERBOSE in the same way? If we do so,
autovacuum can handle the reports of VACUUM VERBOSE and plan when to
do VACUUM FULL, REINDEX and/or CLUSTER using the information.
Is this worth doing?Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
Satoshi Nagayasu wrote:
Hi,
I'm working on an utility for b-tree index, called `pgstatindex`.
It reports b-tree index statistics like a pgstattuple as below.
----------------------------------------------------------------
pgbench=# \x
Expanded display is on.
pgbench=# SELECT * FROM pgstatindex('accounts_pkey1');
-[ RECORD 1 ]------+---------
version | 2
tree_level | 2
index_size | 17956864
root_block_no | 361
internal_pages | 8
leaf_pages | 2184
empty_pages | 0
deleted_pages | 0
avg_leaf_density | 90.07
leaf_fragmentation | 0pgbench=#
----------------------------------------------------------------I want to make this to contrib module like a pgstattuple,
and to make this open to public in a few days.Do you think this is useful?
Yes, for performance debugging, I think.
--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Satoshi Nagayasu wrote:
Hi,
I'm working on an utility for b-tree index, called `pgstatindex`.
Does it make sense to merge the pgstatindex stuff with pgstattuple, and
have the fragmentation report into pgstatindex instead of pgstattuple
itself?
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> wrote:
Satoshi Nagayasu wrote:
I'm working on an utility for b-tree index, called `pgstatindex`.
Does it make sense to merge the pgstatindex stuff with pgstattuple, and
have the fragmentation report into pgstatindex instead of pgstattuple
itself?
It sounds good. We will have two separate commands:
- pgstattuple: returns tuple-level information
- pgstatindex: returns page-level information
We can use tuple-level info to check LP_DELETE flags on index tuples,
and use page-level info to check needs for REINDEX.
Do we add pgstatindex as a new contrib module,
or merge it into contrib/pgstattuple?
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:
Do we add pgstatindex as a new contrib module,
or merge it into contrib/pgstattuple?
I believe Alvaro was suggesting that you should add it as an additional
SQL function within contrib/pgstattuple. That'd be my advice too ---
I don't see a reason to break this out as a separate contrib module.
regards, tom lane
Hi folks,
As I said on -PATCHES, I've been working on an utility to get
a b-tree index information. I'm happy to introduce
my new functions to you.
pgstattuple module provides a `pgstatindex()`, and other small
functions, which allow you to get b-tree internal information.
I believe this module will be helpful to know b-tree index deeply.
So please try it, send comment to me, and have fun.
Thanks,
--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
-----------------------------------------------------
pgbench=# \x
Expanded display is on.
pgbench=# SELECT * FROM pgstatindex('accounts_pkey');
-[ RECORD 1 ]------+--------
version | 2
tree_level | 1
index_size | 3588096
root_block_no | 3
internal_pages | 0
leaf_pages | 437
empty_pages | 0
deleted_pages | 0
avg_leaf_density | 59.5
leaf_fragmentation | 49.89
-----------------------------------------------------
Attachments:
I thought these new functions were going to be merged into
/contrib/pgstattuple.
---------------------------------------------------------------------------
satoshi nagayasu wrote:
Hi folks,
As I said on -PATCHES, I've been working on an utility to get
a b-tree index information. I'm happy to introduce
my new functions to you.pgstattuple module provides a `pgstatindex()`, and other small
functions, which allow you to get b-tree internal information.
I believe this module will be helpful to know b-tree index deeply.So please try it, send comment to me, and have fun.
Thanks,
--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>-----------------------------------------------------
pgbench=# \x
Expanded display is on.
pgbench=# SELECT * FROM pgstatindex('accounts_pkey');
-[ RECORD 1 ]------+--------
version | 2
tree_level | 1
index_size | 3588096
root_block_no | 3
internal_pages | 0
leaf_pages | 437
empty_pages | 0
deleted_pages | 0
avg_leaf_density | 59.5
leaf_fragmentation | 49.89
-----------------------------------------------------
[ application/x-gzip is not supported, skipping... ]
---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?
--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote:
I thought these new functions were going to be merged into
/contrib/pgstattuple.
Well, that's exactly what this patch seems to do ...
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera wrote:
Bruce Momjian wrote:
I thought these new functions were going to be merged into
/contrib/pgstattuple.Well, that's exactly what this patch seems to do ...
Well, looking at the tarball it puts everything in pgstatindex, and the
Makefile is:
#-------------------------------------------------------------------------
#
# pgstatindex Makefile
#
# $PostgreSQL$
#
#-------------------------------------------------------------------------
SRCS = pgstatindex.c
MODULE_big = pgstatindex
OBJS = $(SRCS:.c=.o)
DOCS =
DATA_built = pgstatindex.sql
ifdef USE_PGXS
PGXS := $(shell pg_config --pgxs)
include $(PGXS)
else
subdir = contrib/pgstatindex
top_builddir = /home/snaga/pgsql/sources/postgresql-8.1.3
include $(top_builddir)/src/Makefile.global
include $(top_srcdir)/contrib/contrib-global.mk
endif
--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
nagayasu-san,
This looks good, but we would like the code added to
/contrib/pgstattuple, rather than it being its own /contrib module. Can
you make that adjustment? Thanks.
---------------------------------------------------------------------------
satoshi nagayasu wrote:
Hi folks,
As I said on -PATCHES, I've been working on an utility to get
a b-tree index information. I'm happy to introduce
my new functions to you.pgstattuple module provides a `pgstatindex()`, and other small
functions, which allow you to get b-tree internal information.
I believe this module will be helpful to know b-tree index deeply.So please try it, send comment to me, and have fun.
Thanks,
--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>-----------------------------------------------------
pgbench=# \x
Expanded display is on.
pgbench=# SELECT * FROM pgstatindex('accounts_pkey');
-[ RECORD 1 ]------+--------
version | 2
tree_level | 1
index_size | 3588096
root_block_no | 3
internal_pages | 0
leaf_pages | 437
empty_pages | 0
deleted_pages | 0
avg_leaf_density | 59.5
leaf_fragmentation | 49.89
-----------------------------------------------------
[ application/x-gzip is not supported, skipping... ]
---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?
--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Bruce,
I'll fix it in this week. Please wait a few days.
Thanks.
Bruce Momjian wrote:
nagayasu-san,
This looks good, but we would like the code added to
/contrib/pgstattuple, rather than it being its own /contrib module. Can
you make that adjustment? Thanks.---------------------------------------------------------------------------
satoshi nagayasu wrote:
Hi folks,
As I said on -PATCHES, I've been working on an utility to get
a b-tree index information. I'm happy to introduce
my new functions to you.pgstattuple module provides a `pgstatindex()`, and other small
functions, which allow you to get b-tree internal information.
I believe this module will be helpful to know b-tree index deeply.So please try it, send comment to me, and have fun.
Thanks,
--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>-----------------------------------------------------
pgbench=# \x
Expanded display is on.
pgbench=# SELECT * FROM pgstatindex('accounts_pkey');
-[ RECORD 1 ]------+--------
version | 2
tree_level | 1
index_size | 3588096
root_block_no | 3
internal_pages | 0
leaf_pages | 437
empty_pages | 0
deleted_pages | 0
avg_leaf_density | 59.5
leaf_fragmentation | 49.89
-----------------------------------------------------[ application/x-gzip is not supported, skipping... ]
---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?
--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
Phone: +81-3-3523-8122
Hi all,
Here is a patch to add pgstatindex functions to the pgstattuple module,
which can work with 8.1.4. Please review and try it. Thanks.
Satoshi Nagayasu wrote:
Bruce,
I'll fix it in this week. Please wait a few days.
Thanks.Bruce Momjian wrote:
nagayasu-san,
This looks good, but we would like the code added to
/contrib/pgstattuple, rather than it being its own /contrib module. Can
you make that adjustment? Thanks.---------------------------------------------------------------------------
satoshi nagayasu wrote:
Hi folks,
As I said on -PATCHES, I've been working on an utility to get
a b-tree index information. I'm happy to introduce
my new functions to you.pgstattuple module provides a `pgstatindex()`, and other small
functions, which allow you to get b-tree internal information.
I believe this module will be helpful to know b-tree index deeply.So please try it, send comment to me, and have fun.
Thanks,
--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>-----------------------------------------------------
pgbench=# \x
Expanded display is on.
pgbench=# SELECT * FROM pgstatindex('accounts_pkey');
-[ RECORD 1 ]------+--------
version | 2
tree_level | 1
index_size | 3588096
root_block_no | 3
internal_pages | 0
leaf_pages | 437
empty_pages | 0
deleted_pages | 0
avg_leaf_density | 59.5
leaf_fragmentation | 49.89
-----------------------------------------------------[ application/x-gzip is not supported, skipping... ]
---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?
--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
Phone: +81-3-3523-8122
Attachments:
pgstatindex.difftext/plain; name=pgstatindex.diffDownload
diff -ruN pgstattuple.orig/Makefile pgstattuple/Makefile
--- pgstattuple.orig/Makefile 2006-08-10 19:22:47.000000000 +0900
+++ pgstattuple/Makefile 2006-08-10 19:24:05.000000000 +0900
@@ -6,7 +6,7 @@
#
#-------------------------------------------------------------------------
-SRCS = pgstattuple.c
+SRCS = pgstattuple.c pgstatindex.c
MODULE_big = pgstattuple
OBJS = $(SRCS:.c=.o)
diff -ruN pgstattuple.orig/pgstatindex.c pgstattuple/pgstatindex.c
--- pgstattuple.orig/pgstatindex.c 1970-01-01 09:00:00.000000000 +0900
+++ pgstattuple/pgstatindex.c 2006-08-11 17:51:26.000000000 +0900
@@ -0,0 +1,714 @@
+/*
+ * pgstatindex
+ *
+ * Copyright (c) 2006 Satoshi Nagayasu <nagayasus@nttdata.co.jp>
+ *
+ * Permission to use, copy, modify, and distribute this software and
+ * its documentation for any purpose, without fee, and without a
+ * written agreement is hereby granted, provided that the above
+ * copyright notice and this paragraph and the following two
+ * paragraphs appear in all copies.
+ *
+ * IN NO EVENT SHALL THE AUTHOR BE LIABLE TO ANY PARTY FOR DIRECT,
+ * INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
+ * LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
+ * DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED
+ * OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * THE AUTHOR SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS
+ * IS" BASIS, AND THE AUTHOR HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE,
+ * SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+#include "funcapi.h"
+#include "access/heapam.h"
+#include "access/itup.h"
+#include "access/nbtree.h"
+#include "access/transam.h"
+#include "catalog/namespace.h"
+#include "catalog/pg_type.h"
+#include "utils/builtins.h"
+#include "utils/inval.h"
+
+PG_FUNCTION_INFO_V1(pgstatindex);
+PG_FUNCTION_INFO_V1(bt_metap);
+PG_FUNCTION_INFO_V1(bt_page_items);
+PG_FUNCTION_INFO_V1(bt_page_stats);
+PG_FUNCTION_INFO_V1(pg_relpages);
+
+extern Datum pgstatindex(PG_FUNCTION_ARGS);
+extern Datum bt_metap(PG_FUNCTION_ARGS);
+extern Datum bt_page_items(PG_FUNCTION_ARGS);
+extern Datum bt_page_stats(PG_FUNCTION_ARGS);
+extern Datum pg_relpages(PG_FUNCTION_ARGS);
+
+#define PGSTATINDEX_TYPE "public.pgstatindex_type"
+#define PGSTATINDEX_NCOLUMNS 10
+
+#define BTMETAP_TYPE "public.bt_metap_type"
+#define BTMETAP_NCOLUMNS 6
+
+#define BTPAGEITEMS_TYPE "public.bt_page_items_type"
+#define BTPAGEITEMS_NCOLUMNS 6
+
+#define BTPAGESTATS_TYPE "public.bt_page_stats_type"
+#define BTPAGESTATS_NCOLUMNS 12
+
+
+#define IS_INDEX(r) ((r)->rd_rel->relkind == 'i')
+#define IS_BTREE(r) ((r)->rd_rel->relam == BTREE_AM_OID)
+
+#define CHECK_PAGE_OFFSET_RANGE(page, offset) { \
+ if ( !(FirstOffsetNumber<=(offset) && \
+ (offset)<=PageGetMaxOffsetNumber(page)) ) \
+ elog(ERROR, "Page offset number out of range."); }
+
+#define CHECK_RELATION_BLOCK_RANGE(rel, blkno) { \
+ if ( (blkno)<0 && RelationGetNumberOfBlocks((rel))<=(blkno) ) \
+ elog(ERROR, "Block number out of range."); }
+
+/* ------------------------------------------------
+ * structure for single btree page statistics
+ * ------------------------------------------------
+ */
+typedef struct BTPageStat {
+ uint32 blkno;
+ uint32 live_items;
+ uint32 dead_items;
+ uint32 page_size;
+ uint32 max_avail;
+ uint32 free_size;
+ uint32 avg_item_size;
+ uint32 fragments;
+ bool is_root;
+ bool is_internal;
+ bool is_leaf;
+ bool is_deleted;
+ bool is_empty;
+} BTPageStat;
+
+/* ------------------------------------------------
+ * A structure for a whole btree index statistics
+ * used by pgstatindex().
+ * ------------------------------------------------
+ */
+typedef struct BTIndexStat {
+ uint32 version;
+ BlockNumber root_blkno;
+ uint32 level;
+
+ uint32 live_items;
+ uint32 dead_items;
+
+ uint32 root_pages;
+ uint32 internal_pages;
+ uint32 leaf_pages;
+ uint32 empty_pages;
+ uint32 deleted_pages;
+
+ uint32 page_size;
+ uint32 avg_item_size;
+
+ uint32 max_avail;
+ uint32 free_space;
+
+ uint32 fragments;
+} BTIndexStat;
+
+/* -------------------------------------------------
+ * GetBTPageStatistics()
+ *
+ * Collect statistics of single b-tree leaf page
+ * -------------------------------------------------
+ */
+static bool
+GetBTPageStatistics(BlockNumber blkno, Buffer buffer, BTPageStat *stat)
+{
+ Page page = BufferGetPage(buffer);
+ PageHeader phdr = (PageHeader) page;
+ OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
+ BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+ int item_size = 0;
+ int off;
+
+ stat->blkno = blkno;
+
+ stat->max_avail = BLCKSZ - ( BLCKSZ - phdr->pd_special + SizeOfPageHeaderData );
+
+ stat->dead_items = stat->live_items = 0;
+
+ stat->page_size = PageGetPageSize(page);
+
+ /* page type */
+ stat->is_root = false;
+ stat->is_leaf = false;
+ stat->is_deleted = false;
+ stat->is_empty = false;
+
+ if ( P_ISDELETED(opaque) )
+ {
+ stat->is_deleted = true;
+ return true;
+ }
+ else if ( P_IGNORE(opaque) )
+ stat->is_empty = true;
+ else if ( P_ISLEAF(opaque) )
+ stat->is_leaf = true;
+ else if ( P_ISROOT(opaque) )
+ stat->is_root = true;
+ else
+ stat->is_internal = true;
+
+ /*----------------------------------------------
+ * If a next leaf is on the previous block,
+ * it means a fragmentation.
+ *----------------------------------------------
+ */
+ stat->fragments = 0;
+ if ( stat->is_leaf )
+ {
+ if ( opaque->btpo_next != P_NONE && opaque->btpo_next < blkno )
+ stat->fragments++;
+ }
+
+ /* count live and dead tuples, and free space */
+ for (off=FirstOffsetNumber ; off<=maxoff ; off++)
+ {
+ BTItem btitem;
+ IndexTuple itup;
+
+ ItemId id = PageGetItemId(page, off);
+
+ btitem = (BTItem)PageGetItem(page, id);
+
+ itup = &(btitem->bti_itup);
+
+ item_size += IndexTupleSize(itup);
+
+ if ( !ItemIdDeleted(id) )
+ stat->live_items++;
+ else
+ stat->dead_items++;
+ }
+ stat->free_size = PageGetFreeSpace(page);
+
+#ifdef NOT_USED
+ elog(NOTICE, "%d/%d (%.2f%%)", stat->free_size, stat->max_avail,
+ (float)stat->free_size/(float)stat->max_avail*100.0);
+#endif
+
+ if ( (stat->live_items + stat->dead_items) > 0 )
+ stat->avg_item_size = item_size / ( stat->live_items + stat->dead_items );
+ else
+ stat->avg_item_size = 0;
+
+ return true;
+}
+
+
+/* ------------------------------------------------------
+ * pgstatindex()
+ *
+ * Usage: SELECT * FROM pgstatindex('t1_pkey');
+ * ------------------------------------------------------
+ */
+Datum
+pgstatindex(PG_FUNCTION_ARGS)
+{
+ text *relname = PG_GETARG_TEXT_P(0);
+ Relation rel;
+ RangeVar *relrv;
+ Datum result;
+ uint32 nblocks;
+ uint32 blkno;
+ BTIndexStat indexStat;
+
+ relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
+ rel = relation_openrv(relrv, AccessShareLock);
+
+ if ( !IS_INDEX(rel) || !IS_BTREE(rel) )
+ elog(ERROR, "pgstatindex() can be used only on b-tree index.");
+
+ /*-------------------
+ * Read a metapage
+ *-------------------
+ */
+ {
+ Buffer buffer = ReadBuffer(rel, 0);
+ Page page = BufferGetPage(buffer);
+ BTMetaPageData *metad = BTPageGetMeta(page);
+
+// snprintf(values[j++], 32, "%d", metad->btm_magic);
+ indexStat.version = metad->btm_version;
+ indexStat.root_blkno = metad->btm_root;
+ indexStat.level = metad->btm_level;
+// snprintf(values[j++], 32, "%d", metad->btm_fastroot);
+// snprintf(values[j++], 32, "%d", metad->btm_fastlevel);
+
+ ReleaseBuffer(buffer);
+ }
+
+ nblocks = RelationGetNumberOfBlocks(rel);
+
+ /* -- init stat -- */
+ indexStat.fragments = 0;
+
+ indexStat.root_pages = 0;
+ indexStat.leaf_pages = 0;
+ indexStat.internal_pages = 0;
+ indexStat.empty_pages = 0;
+ indexStat.deleted_pages = 0;
+
+ indexStat.max_avail = 0;
+ indexStat.free_space = 0;
+
+ /*-----------------------
+ * Scan all blocks
+ *-----------------------
+ */
+ for (blkno=1 ; blkno<nblocks ; blkno++)
+ {
+ Buffer buffer = ReadBuffer(rel, blkno);
+ BTPageStat stat;
+
+ /* scan one page */
+ stat.blkno = blkno;
+ GetBTPageStatistics(blkno, buffer, &stat);
+
+ /*---------------------
+ * page status (type)
+ *---------------------
+ */
+ if ( stat.is_deleted )
+ indexStat.deleted_pages++;
+ else if ( stat.is_leaf )
+ indexStat.leaf_pages++;
+ else if ( stat.is_internal )
+ indexStat.internal_pages++;
+ else if ( stat.is_empty )
+ indexStat.empty_pages++;
+ else if ( stat.is_root )
+ indexStat.root_pages++;
+ else
+ elog(ERROR, "unknown page status.");
+
+ /* -- leaf fragmentation -- */
+ indexStat.fragments += stat.fragments;
+
+ if ( stat.is_leaf )
+ {
+ indexStat.max_avail += stat.max_avail;
+ indexStat.free_space += stat.free_size;
+ }
+
+ ReleaseBuffer(buffer);
+ }
+
+ relation_close(rel, AccessShareLock);
+
+#ifdef NOT_USED
+ elog(NOTICE, "[index information]");
+ elog(NOTICE, "version.................: %d", indexStat.version);
+ elog(NOTICE, "tree level..............: %d", indexStat.level);
+ elog(NOTICE, "index size..............: %d", (indexStat.root_pages +
+ indexStat.leaf_pages +
+ indexStat.internal_pages +
+ indexStat.deleted_pages +
+ indexStat.empty_pages) * BLCKSZ);
+
+ elog(NOTICE, "");
+ elog(NOTICE, "[page statistics]");
+ elog(NOTICE, "root block number.......: %d", indexStat.root_blkno);
+ elog(NOTICE, "internal pages..........: %d", indexStat.internal_pages);
+ elog(NOTICE, "leaf pages..............: %d", indexStat.leaf_pages);
+ elog(NOTICE, "empty pages.............: %d", indexStat.empty_pages);
+ elog(NOTICE, "deleted pages...........: %d", indexStat.deleted_pages);
+
+ elog(NOTICE, "");
+ elog(NOTICE, "[leaf statistics]");
+ elog(NOTICE, "avg. leaf density.......: %.2f%%", 100.0-(float)indexStat.free_space / (float)indexStat.max_avail * 100.0 );
+ elog(NOTICE, "leaf fragmentation......: %.2f%%", (float)indexStat.fragments / 2.0 / (float)indexStat.leaf_pages * 100.0 );
+#endif
+
+ /*----------------------------
+ * Build a result tuple
+ *----------------------------
+ */
+ {
+ TupleDesc tupleDesc;
+ int j;
+ char *values[PGSTATINDEX_NCOLUMNS];
+
+ HeapTupleData tupleData;
+ HeapTuple tuple = &tupleData;
+
+ tupleDesc = RelationNameGetTupleDesc(PGSTATINDEX_TYPE);
+
+ j = 0;
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", indexStat.version);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", indexStat.level);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", (indexStat.root_pages +
+ indexStat.leaf_pages +
+ indexStat.internal_pages +
+ indexStat.deleted_pages +
+ indexStat.empty_pages) * BLCKSZ );
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", indexStat.root_blkno);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", indexStat.internal_pages);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", indexStat.leaf_pages);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", indexStat.empty_pages);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", indexStat.deleted_pages);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%.2f", 100.0-(float)indexStat.free_space / (float)indexStat.max_avail * 100.0 );
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%.2f", (float)indexStat.fragments / (float)indexStat.leaf_pages * 100.0 );
+
+ tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
+ values);
+
+ result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
+ }
+
+ PG_RETURN_DATUM( result );
+}
+
+/* -----------------------------------------------
+ * bt_page()
+ *
+ * Usage: SELECT * FROM bt_page('t1_pkey', 0);
+ * -----------------------------------------------
+ */
+Datum
+bt_page_stats(PG_FUNCTION_ARGS)
+{
+ text *relname = PG_GETARG_TEXT_P(0);
+ uint32 blkno = PG_GETARG_UINT32(1);
+ Buffer buffer;
+
+ Relation rel;
+ RangeVar *relrv;
+ Datum result;
+
+ relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
+ rel = relation_openrv(relrv, AccessShareLock);
+
+ CHECK_RELATION_BLOCK_RANGE(rel, blkno);
+
+ buffer = ReadBuffer(rel, blkno);
+
+ if ( !IS_INDEX(rel) || !IS_BTREE(rel) )
+ elog(ERROR, "bt_page_stats() can be used only on b-tree index.");
+
+ if ( blkno==0 )
+ elog(ERROR, "Block 0 is a meta page.");
+
+ {
+ HeapTuple tuple;
+ TupleDesc tupleDesc;
+ int j;
+ char *values[BTPAGESTATS_NCOLUMNS];
+
+ BTPageStat stat;
+
+ GetBTPageStatistics(blkno, buffer, &stat);
+
+ tupleDesc = RelationNameGetTupleDesc(BTPAGESTATS_TYPE);
+
+ j = 0;
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", stat.blkno);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", stat.live_items);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", stat.dead_items);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", stat.live_items + stat.dead_items);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", stat.avg_item_size);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", stat.page_size);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", stat.free_size);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", stat.is_deleted);
+ values[j] = palloc(32);
+// snprintf(values[j++], 32, "%d", opaque->btpo_prev);
+ snprintf(values[j++], 32, "%d", 0); /*-- not implemented yet --*/
+ values[j] = palloc(32);
+// snprintf(values[j++], 32, "%d", opaque->btpo_next);
+ snprintf(values[j++], 32, "%d", 0); /*-- not implemented yet --*/
+ values[j] = palloc(32);
+// snprintf(values[j++], 32, "%d", opaque->btpo.level);
+ snprintf(values[j++], 32, "%d", 0); /*-- not implemented yet --*/
+ values[j] = palloc(32);
+// snprintf(values[j++], 32, "%d", opaque->btpo_flags);
+ snprintf(values[j++], 32, "%d", 0); /*-- not implemented yet --*/
+
+ tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
+ values);
+
+ result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
+ }
+
+ ReleaseBuffer(buffer);
+
+ relation_close(rel, AccessShareLock);
+
+ PG_RETURN_DATUM(result);
+}
+
+/*-------------------------------------------------------
+ * bt_page_items()
+ *
+ * Get IndexTupleData set in a leaf page
+ *
+ * Usage: SELECT * FROM bt_page_items('t1_pkey', 0);
+ *-------------------------------------------------------
+ */
+/* ---------------------------------------------------
+ * data structure for SRF to hold a scan information
+ * ---------------------------------------------------
+ */
+struct user_args
+{
+ TupleDesc tupd;
+ Relation rel;
+ Buffer buffer;
+ Page page;
+ uint16 offset;
+};
+
+Datum
+bt_page_items(PG_FUNCTION_ARGS)
+{
+ text *relname = PG_GETARG_TEXT_P(0);
+ uint32 blkno = PG_GETARG_UINT32(1);
+
+ RangeVar *relrv;
+ Datum result;
+ char *values[BTPAGEITEMS_NCOLUMNS];
+ BTPageOpaque opaque;
+ HeapTuple tuple;
+ ItemId id;
+
+ FuncCallContext *fctx;
+ MemoryContext mctx;
+ struct user_args *uargs = NULL;
+
+ if ( blkno==0 )
+ elog(ERROR, "Block 0 is a meta page.");
+
+ if ( SRF_IS_FIRSTCALL() )
+ {
+ fctx = SRF_FIRSTCALL_INIT();
+ mctx = MemoryContextSwitchTo(fctx->multi_call_memory_ctx);
+
+ uargs = palloc(sizeof(struct user_args));
+
+ uargs->tupd = RelationNameGetTupleDesc(BTPAGEITEMS_TYPE);
+ uargs->offset = FirstOffsetNumber;
+
+ relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
+ uargs->rel = relation_openrv(relrv, AccessShareLock);
+
+ CHECK_RELATION_BLOCK_RANGE(uargs->rel, blkno);
+
+ uargs->buffer = ReadBuffer(uargs->rel, blkno);
+
+ if ( !IS_INDEX(uargs->rel) || !IS_BTREE(uargs->rel) )
+ elog(ERROR, "bt_page_items() can be used only on b-tree index.");
+
+ uargs->page = BufferGetPage(uargs->buffer);
+
+ opaque = (BTPageOpaque) PageGetSpecialPointer(uargs->page);
+
+ if ( P_ISDELETED(opaque) )
+ elog(NOTICE, "bt_page_items(): this page is deleted.");
+
+ fctx->max_calls = PageGetMaxOffsetNumber(uargs->page);
+ fctx->user_fctx = uargs;
+
+ MemoryContextSwitchTo(mctx);
+ }
+
+ fctx = SRF_PERCALL_SETUP();
+ uargs = fctx->user_fctx;
+
+ if ( fctx->call_cntr < fctx->max_calls )
+ {
+ BTItem btitem;
+ IndexTuple itup;
+
+ id = PageGetItemId(uargs->page, uargs->offset);
+
+ if ( !ItemIdIsValid(id) )
+ elog(ERROR, "Invalid ItemId.");
+
+ btitem = (BTItem)PageGetItem(uargs->page, id);
+ itup = &(btitem->bti_itup);
+
+ {
+ int j = 0;
+
+ BlockNumber blkno = BlockIdGetBlockNumber(&(itup->t_tid.ip_blkid));
+
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", uargs->offset);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "(%u,%u)", blkno, itup->t_tid.ip_posid);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", IndexTupleSize(itup));
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%c", IndexTupleHasNulls(itup) ? 't' : 'f');
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%c", IndexTupleHasVarwidths(itup) ? 't' : 'f');
+
+ {
+ int off;
+ char *dump;
+ char *ptr = (char *)itup + IndexInfoFindDataOffset(itup->t_info);
+
+ dump = palloc(IndexTupleSize(itup)*3);
+ memset(dump, 0, IndexTupleSize(itup)*3);
+
+ for (off=0 ;
+ off < IndexTupleSize(itup)-IndexInfoFindDataOffset(itup->t_info) ;
+ off++)
+ {
+ if ( dump[0]=='\0' )
+ sprintf(dump, "%02x", *(ptr+off) & 0xff);
+ else
+ {
+ char buf[4];
+ sprintf(buf, " %02x", *(ptr+off) & 0xff);
+ strcat(dump, buf);
+ }
+ }
+ values[j] = dump;
+ }
+
+ tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(uargs->tupd), values);
+ result = TupleGetDatum(TupleDescGetSlot(uargs->tupd), tuple);
+ }
+
+ uargs->offset = uargs->offset + 1;
+
+ SRF_RETURN_NEXT(fctx, result);
+ }
+ else
+ {
+ ReleaseBuffer(uargs->buffer);
+ relation_close(uargs->rel, AccessShareLock);
+
+ SRF_RETURN_DONE(fctx);
+ }
+}
+
+
+/* ------------------------------------------------
+ * bt_metap()
+ *
+ * Get a btree meta-page information
+ *
+ * Usage: SELECT * FROM bt_metap('t1_pkey')
+ * ------------------------------------------------
+ */
+Datum
+bt_metap(PG_FUNCTION_ARGS)
+{
+ text *relname = PG_GETARG_TEXT_P(0);
+ Buffer buffer;
+
+ Relation rel;
+ RangeVar *relrv;
+ Datum result;
+
+ relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
+ rel = relation_openrv(relrv, AccessShareLock);
+
+ if ( !IS_INDEX(rel) || !IS_BTREE(rel) )
+ elog(ERROR, "bt_metap() can be used only on b-tree index.");
+
+ buffer = ReadBuffer(rel, 0);
+
+ {
+ BTMetaPageData *metad;
+
+ TupleDesc tupleDesc;
+ int j;
+ char *values[BTMETAP_NCOLUMNS];
+ HeapTuple tuple;
+
+ Page page = BufferGetPage(buffer);
+
+ metad = BTPageGetMeta(page);
+
+ tupleDesc = RelationNameGetTupleDesc(BTMETAP_TYPE);
+
+ j = 0;
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", metad->btm_magic);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", metad->btm_version);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", metad->btm_root);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", metad->btm_level);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", metad->btm_fastroot);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", metad->btm_fastlevel);
+
+ tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
+ values);
+
+ result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
+ }
+
+ ReleaseBuffer(buffer);
+
+ relation_close(rel, AccessShareLock);
+
+ PG_RETURN_DATUM(result);
+}
+
+/* --------------------------------------------------------
+ * pg_relpages()
+ *
+ * Get a number of pages of the table/index.
+ *
+ * Usage: SELECT pg_relpages('t1');
+ * SELECT pg_relpages('t1_pkey');
+ * --------------------------------------------------------
+ */
+Datum
+pg_relpages(PG_FUNCTION_ARGS)
+{
+ text *relname = PG_GETARG_TEXT_P(0);
+
+ Relation rel;
+ RangeVar *relrv;
+ int4 relpages;
+
+ relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
+ rel = relation_openrv(relrv, AccessShareLock);
+
+ relpages = RelationGetNumberOfBlocks(rel);
+
+ relation_close(rel, AccessShareLock);
+
+ PG_RETURN_INT32(relpages);
+}
+
diff -ruN pgstattuple.orig/pgstattuple.sql.in pgstattuple/pgstattuple.sql.in
--- pgstattuple.orig/pgstattuple.sql.in 2006-08-10 19:22:47.000000000 +0900
+++ pgstattuple/pgstattuple.sql.in 2006-08-11 17:46:19.000000000 +0900
@@ -22,3 +22,97 @@
RETURNS pgstattuple_type
AS 'MODULE_PATHNAME', 'pgstattuplebyid'
LANGUAGE 'C' STRICT;
+
+--
+-- pgstatindex
+--
+DROP TYPE pgstatindex_type CASCADE;
+CREATE TYPE pgstatindex_type AS (
+ version int4,
+ tree_level int4,
+ index_size int4,
+ root_block_no int4,
+ internal_pages int4,
+ leaf_pages int4,
+ empty_pages int4,
+ deleted_pages int4,
+ avg_leaf_density float8,
+ leaf_fragmentation float8
+);
+
+CREATE OR REPLACE FUNCTION pgstatindex(text)
+RETURNS pgstatindex_type
+AS 'MODULE_PATHNAME', 'pgstatindex'
+LANGUAGE 'C' STRICT;
+
+--
+-- bt_metap()
+--
+DROP TYPE bt_metap_type CASCADE;
+CREATE TYPE bt_metap_type AS (
+ magic int4,
+ version int4,
+ root int4,
+ level int4,
+ fastroot int4,
+ fastlevel int4
+);
+
+CREATE OR REPLACE FUNCTION bt_metap(text)
+RETURNS bt_metap_type
+AS 'MODULE_PATHNAME', 'bt_metap'
+LANGUAGE 'C' STRICT;
+
+--
+-- bt_page_items()
+--
+DROP TYPE bt_page_items_type CASCADE;
+CREATE TYPE bt_page_items_type AS (
+ itemoffset int4,
+ ctid tid,
+ itemlen int4,
+ nulls bool,
+ vars bool,
+ data text
+);
+
+DROP FUNCTION bt_page_items(text, int4);
+
+CREATE OR REPLACE FUNCTION bt_page_items(text, int4)
+RETURNS SETOF bt_page_items_type
+AS 'MODULE_PATHNAME', 'bt_page_items'
+LANGUAGE 'C' STRICT;
+
+--
+-- bt_page_stats()
+--
+DROP TYPE bt_page_stats_type CASCADE;
+CREATE TYPE bt_page_stats_type AS (
+ blkno int4,
+ live_items int4,
+ dead_items int4,
+ total_items int4,
+ avg_item_size float,
+ page_size int4,
+ free_size int4,
+ is_deleted int4,
+ btpo_prev int4,
+ btpo_next int4,
+ btpo_level int4,
+ btpo_flags int4
+);
+
+DROP FUNCTION bt_page_stats(text, int4);
+
+CREATE OR REPLACE FUNCTION bt_page_stats(text, int4)
+RETURNS bt_page_stats_type
+AS 'MODULE_PATHNAME', 'bt_page_stats'
+LANGUAGE 'C' STRICT;
+
+--
+-- pg_relpages()
+--
+CREATE OR REPLACE FUNCTION pg_relpages(text)
+RETURNS int
+AS 'MODULE_PATHNAME', 'pg_relpages'
+LANGUAGE 'C' STRICT;
diff -ruN pgstattuple.orig/test.sh pgstattuple/test.sh
--- pgstattuple.orig/test.sh 1970-01-01 09:00:00.000000000 +0900
+++ pgstattuple/test.sh 2006-08-11 17:47:05.000000000 +0900
@@ -0,0 +1,22 @@
+#!/bin/sh
+
+export PATH=/usr/local/pgsql814/bin:$PATH
+
+psql pgbench<<EOF
+DROP FUNCTION pgstatindex(text);
+EOF
+
+psql -f /usr/local/pgsql814/share/contrib/pgstattuple.sql pgbench
+
+psql pgbench<<EOF
+SELECT * FROM pg_relpages('accounts_pkey');
+\x
+SELECT * FROM pgstatindex('accounts_pkey');
+SELECT * FROM bt_metap('accounts_pkey');
+\x
+SELECT * FROM bt_page_items('accounts_pkey', 0);
+SELECT * FROM bt_page_items('accounts_pkey', 1) limit 20;
+
+\x
+SELECT * FROM bt_page_stats('accounts_pkey', 1);
+EOF
I don't see any documentation, so I assume you want me to add something
to README.pgstattuple.
Your patch has been added to the PostgreSQL unapplied patches list at:
http://momjian.postgresql.org/cgi-bin/pgpatches
It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.
---------------------------------------------------------------------------
Satoshi Nagayasu wrote:
Hi all,
Here is a patch to add pgstatindex functions to the pgstattuple module,
which can work with 8.1.4. Please review and try it. Thanks.Satoshi Nagayasu wrote:
Bruce,
I'll fix it in this week. Please wait a few days.
Thanks.Bruce Momjian wrote:
nagayasu-san,
This looks good, but we would like the code added to
/contrib/pgstattuple, rather than it being its own /contrib module. Can
you make that adjustment? Thanks.---------------------------------------------------------------------------
satoshi nagayasu wrote:
Hi folks,
As I said on -PATCHES, I've been working on an utility to get
a b-tree index information. I'm happy to introduce
my new functions to you.pgstattuple module provides a `pgstatindex()`, and other small
functions, which allow you to get b-tree internal information.
I believe this module will be helpful to know b-tree index deeply.So please try it, send comment to me, and have fun.
Thanks,
--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>-----------------------------------------------------
pgbench=# \x
Expanded display is on.
pgbench=# SELECT * FROM pgstatindex('accounts_pkey');
-[ RECORD 1 ]------+--------
version | 2
tree_level | 1
index_size | 3588096
root_block_no | 3
internal_pages | 0
leaf_pages | 437
empty_pages | 0
deleted_pages | 0
avg_leaf_density | 59.5
leaf_fragmentation | 49.89
-----------------------------------------------------[ application/x-gzip is not supported, skipping... ]
---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
Phone: +81-3-3523-8122
diff -ruN pgstattuple.orig/Makefile pgstattuple/Makefile --- pgstattuple.orig/Makefile 2006-08-10 19:22:47.000000000 +0900 +++ pgstattuple/Makefile 2006-08-10 19:24:05.000000000 +0900 @@ -6,7 +6,7 @@ # #--------------------------------------------------------------------------SRCS = pgstattuple.c +SRCS = pgstattuple.c pgstatindex.cMODULE_big = pgstattuple OBJS = $(SRCS:.c=.o) diff -ruN pgstattuple.orig/pgstatindex.c pgstattuple/pgstatindex.c --- pgstattuple.orig/pgstatindex.c 1970-01-01 09:00:00.000000000 +0900 +++ pgstattuple/pgstatindex.c 2006-08-11 17:51:26.000000000 +0900 @@ -0,0 +1,714 @@ +/* + * pgstatindex + * + * Copyright (c) 2006 Satoshi Nagayasu <nagayasus@nttdata.co.jp> + * + * Permission to use, copy, modify, and distribute this software and + * its documentation for any purpose, without fee, and without a + * written agreement is hereby granted, provided that the above + * copyright notice and this paragraph and the following two + * paragraphs appear in all copies. + * + * IN NO EVENT SHALL THE AUTHOR BE LIABLE TO ANY PARTY FOR DIRECT, + * INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING + * LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS + * DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED + * OF THE POSSIBILITY OF SUCH DAMAGE. + * + * THE AUTHOR SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS + * IS" BASIS, AND THE AUTHOR HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE, + * SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS. + */ + +#include "postgres.h" + +#include "fmgr.h" +#include "funcapi.h" +#include "access/heapam.h" +#include "access/itup.h" +#include "access/nbtree.h" +#include "access/transam.h" +#include "catalog/namespace.h" +#include "catalog/pg_type.h" +#include "utils/builtins.h" +#include "utils/inval.h" + +PG_FUNCTION_INFO_V1(pgstatindex); +PG_FUNCTION_INFO_V1(bt_metap); +PG_FUNCTION_INFO_V1(bt_page_items); +PG_FUNCTION_INFO_V1(bt_page_stats); +PG_FUNCTION_INFO_V1(pg_relpages); + +extern Datum pgstatindex(PG_FUNCTION_ARGS); +extern Datum bt_metap(PG_FUNCTION_ARGS); +extern Datum bt_page_items(PG_FUNCTION_ARGS); +extern Datum bt_page_stats(PG_FUNCTION_ARGS); +extern Datum pg_relpages(PG_FUNCTION_ARGS); + +#define PGSTATINDEX_TYPE "public.pgstatindex_type" +#define PGSTATINDEX_NCOLUMNS 10 + +#define BTMETAP_TYPE "public.bt_metap_type" +#define BTMETAP_NCOLUMNS 6 + +#define BTPAGEITEMS_TYPE "public.bt_page_items_type" +#define BTPAGEITEMS_NCOLUMNS 6 + +#define BTPAGESTATS_TYPE "public.bt_page_stats_type" +#define BTPAGESTATS_NCOLUMNS 12 + + +#define IS_INDEX(r) ((r)->rd_rel->relkind == 'i') +#define IS_BTREE(r) ((r)->rd_rel->relam == BTREE_AM_OID) + +#define CHECK_PAGE_OFFSET_RANGE(page, offset) { \ + if ( !(FirstOffsetNumber<=(offset) && \ + (offset)<=PageGetMaxOffsetNumber(page)) ) \ + elog(ERROR, "Page offset number out of range."); } + +#define CHECK_RELATION_BLOCK_RANGE(rel, blkno) { \ + if ( (blkno)<0 && RelationGetNumberOfBlocks((rel))<=(blkno) ) \ + elog(ERROR, "Block number out of range."); } + +/* ------------------------------------------------ + * structure for single btree page statistics + * ------------------------------------------------ + */ +typedef struct BTPageStat { + uint32 blkno; + uint32 live_items; + uint32 dead_items; + uint32 page_size; + uint32 max_avail; + uint32 free_size; + uint32 avg_item_size; + uint32 fragments; + bool is_root; + bool is_internal; + bool is_leaf; + bool is_deleted; + bool is_empty; +} BTPageStat; + +/* ------------------------------------------------ + * A structure for a whole btree index statistics + * used by pgstatindex(). + * ------------------------------------------------ + */ +typedef struct BTIndexStat { + uint32 version; + BlockNumber root_blkno; + uint32 level; + + uint32 live_items; + uint32 dead_items; + + uint32 root_pages; + uint32 internal_pages; + uint32 leaf_pages; + uint32 empty_pages; + uint32 deleted_pages; + + uint32 page_size; + uint32 avg_item_size; + + uint32 max_avail; + uint32 free_space; + + uint32 fragments; +} BTIndexStat; + +/* ------------------------------------------------- + * GetBTPageStatistics() + * + * Collect statistics of single b-tree leaf page + * ------------------------------------------------- + */ +static bool +GetBTPageStatistics(BlockNumber blkno, Buffer buffer, BTPageStat *stat) +{ + Page page = BufferGetPage(buffer); + PageHeader phdr = (PageHeader) page; + OffsetNumber maxoff = PageGetMaxOffsetNumber(page); + BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page); + int item_size = 0; + int off; + + stat->blkno = blkno; + + stat->max_avail = BLCKSZ - ( BLCKSZ - phdr->pd_special + SizeOfPageHeaderData ); + + stat->dead_items = stat->live_items = 0; + + stat->page_size = PageGetPageSize(page); + + /* page type */ + stat->is_root = false; + stat->is_leaf = false; + stat->is_deleted = false; + stat->is_empty = false; + + if ( P_ISDELETED(opaque) ) + { + stat->is_deleted = true; + return true; + } + else if ( P_IGNORE(opaque) ) + stat->is_empty = true; + else if ( P_ISLEAF(opaque) ) + stat->is_leaf = true; + else if ( P_ISROOT(opaque) ) + stat->is_root = true; + else + stat->is_internal = true; + + /*---------------------------------------------- + * If a next leaf is on the previous block, + * it means a fragmentation. + *---------------------------------------------- + */ + stat->fragments = 0; + if ( stat->is_leaf ) + { + if ( opaque->btpo_next != P_NONE && opaque->btpo_next < blkno ) + stat->fragments++; + } + + /* count live and dead tuples, and free space */ + for (off=FirstOffsetNumber ; off<=maxoff ; off++) + { + BTItem btitem; + IndexTuple itup; + + ItemId id = PageGetItemId(page, off); + + btitem = (BTItem)PageGetItem(page, id); + + itup = &(btitem->bti_itup); + + item_size += IndexTupleSize(itup); + + if ( !ItemIdDeleted(id) ) + stat->live_items++; + else + stat->dead_items++; + } + stat->free_size = PageGetFreeSpace(page); + +#ifdef NOT_USED + elog(NOTICE, "%d/%d (%.2f%%)", stat->free_size, stat->max_avail, + (float)stat->free_size/(float)stat->max_avail*100.0); +#endif + + if ( (stat->live_items + stat->dead_items) > 0 ) + stat->avg_item_size = item_size / ( stat->live_items + stat->dead_items ); + else + stat->avg_item_size = 0; + + return true; +} + + +/* ------------------------------------------------------ + * pgstatindex() + * + * Usage: SELECT * FROM pgstatindex('t1_pkey'); + * ------------------------------------------------------ + */ +Datum +pgstatindex(PG_FUNCTION_ARGS) +{ + text *relname = PG_GETARG_TEXT_P(0); + Relation rel; + RangeVar *relrv; + Datum result; + uint32 nblocks; + uint32 blkno; + BTIndexStat indexStat; + + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname)); + rel = relation_openrv(relrv, AccessShareLock); + + if ( !IS_INDEX(rel) || !IS_BTREE(rel) ) + elog(ERROR, "pgstatindex() can be used only on b-tree index."); + + /*------------------- + * Read a metapage + *------------------- + */ + { + Buffer buffer = ReadBuffer(rel, 0); + Page page = BufferGetPage(buffer); + BTMetaPageData *metad = BTPageGetMeta(page); + +// snprintf(values[j++], 32, "%d", metad->btm_magic); + indexStat.version = metad->btm_version; + indexStat.root_blkno = metad->btm_root; + indexStat.level = metad->btm_level; +// snprintf(values[j++], 32, "%d", metad->btm_fastroot); +// snprintf(values[j++], 32, "%d", metad->btm_fastlevel); + + ReleaseBuffer(buffer); + } + + nblocks = RelationGetNumberOfBlocks(rel); + + /* -- init stat -- */ + indexStat.fragments = 0; + + indexStat.root_pages = 0; + indexStat.leaf_pages = 0; + indexStat.internal_pages = 0; + indexStat.empty_pages = 0; + indexStat.deleted_pages = 0; + + indexStat.max_avail = 0; + indexStat.free_space = 0; + + /*----------------------- + * Scan all blocks + *----------------------- + */ + for (blkno=1 ; blkno<nblocks ; blkno++) + { + Buffer buffer = ReadBuffer(rel, blkno); + BTPageStat stat; + + /* scan one page */ + stat.blkno = blkno; + GetBTPageStatistics(blkno, buffer, &stat); + + /*--------------------- + * page status (type) + *--------------------- + */ + if ( stat.is_deleted ) + indexStat.deleted_pages++; + else if ( stat.is_leaf ) + indexStat.leaf_pages++; + else if ( stat.is_internal ) + indexStat.internal_pages++; + else if ( stat.is_empty ) + indexStat.empty_pages++; + else if ( stat.is_root ) + indexStat.root_pages++; + else + elog(ERROR, "unknown page status."); + + /* -- leaf fragmentation -- */ + indexStat.fragments += stat.fragments; + + if ( stat.is_leaf ) + { + indexStat.max_avail += stat.max_avail; + indexStat.free_space += stat.free_size; + } + + ReleaseBuffer(buffer); + } + + relation_close(rel, AccessShareLock); + +#ifdef NOT_USED + elog(NOTICE, "[index information]"); + elog(NOTICE, "version.................: %d", indexStat.version); + elog(NOTICE, "tree level..............: %d", indexStat.level); + elog(NOTICE, "index size..............: %d", (indexStat.root_pages + + indexStat.leaf_pages + + indexStat.internal_pages + + indexStat.deleted_pages + + indexStat.empty_pages) * BLCKSZ); + + elog(NOTICE, ""); + elog(NOTICE, "[page statistics]"); + elog(NOTICE, "root block number.......: %d", indexStat.root_blkno); + elog(NOTICE, "internal pages..........: %d", indexStat.internal_pages); + elog(NOTICE, "leaf pages..............: %d", indexStat.leaf_pages); + elog(NOTICE, "empty pages.............: %d", indexStat.empty_pages); + elog(NOTICE, "deleted pages...........: %d", indexStat.deleted_pages); + + elog(NOTICE, ""); + elog(NOTICE, "[leaf statistics]"); + elog(NOTICE, "avg. leaf density.......: %.2f%%", 100.0-(float)indexStat.free_space / (float)indexStat.max_avail * 100.0 ); + elog(NOTICE, "leaf fragmentation......: %.2f%%", (float)indexStat.fragments / 2.0 / (float)indexStat.leaf_pages * 100.0 ); +#endif + + /*---------------------------- + * Build a result tuple + *---------------------------- + */ + { + TupleDesc tupleDesc; + int j; + char *values[PGSTATINDEX_NCOLUMNS]; + + HeapTupleData tupleData; + HeapTuple tuple = &tupleData; + + tupleDesc = RelationNameGetTupleDesc(PGSTATINDEX_TYPE); + + j = 0; + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", indexStat.version); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", indexStat.level); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", (indexStat.root_pages + + indexStat.leaf_pages + + indexStat.internal_pages + + indexStat.deleted_pages + + indexStat.empty_pages) * BLCKSZ ); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", indexStat.root_blkno); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", indexStat.internal_pages); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", indexStat.leaf_pages); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", indexStat.empty_pages); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", indexStat.deleted_pages); + values[j] = palloc(32); + snprintf(values[j++], 32, "%.2f", 100.0-(float)indexStat.free_space / (float)indexStat.max_avail * 100.0 ); + values[j] = palloc(32); + snprintf(values[j++], 32, "%.2f", (float)indexStat.fragments / (float)indexStat.leaf_pages * 100.0 ); + + tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc), + values); + + result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple); + } + + PG_RETURN_DATUM( result ); +} + +/* ----------------------------------------------- + * bt_page() + * + * Usage: SELECT * FROM bt_page('t1_pkey', 0); + * ----------------------------------------------- + */ +Datum +bt_page_stats(PG_FUNCTION_ARGS) +{ + text *relname = PG_GETARG_TEXT_P(0); + uint32 blkno = PG_GETARG_UINT32(1); + Buffer buffer; + + Relation rel; + RangeVar *relrv; + Datum result; + + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname)); + rel = relation_openrv(relrv, AccessShareLock); + + CHECK_RELATION_BLOCK_RANGE(rel, blkno); + + buffer = ReadBuffer(rel, blkno); + + if ( !IS_INDEX(rel) || !IS_BTREE(rel) ) + elog(ERROR, "bt_page_stats() can be used only on b-tree index."); + + if ( blkno==0 ) + elog(ERROR, "Block 0 is a meta page."); + + { + HeapTuple tuple; + TupleDesc tupleDesc; + int j; + char *values[BTPAGESTATS_NCOLUMNS]; + + BTPageStat stat; + + GetBTPageStatistics(blkno, buffer, &stat); + + tupleDesc = RelationNameGetTupleDesc(BTPAGESTATS_TYPE); + + j = 0; + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.blkno); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.live_items); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.dead_items); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.live_items + stat.dead_items); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.avg_item_size); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.page_size); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.free_size); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.is_deleted); + values[j] = palloc(32); +// snprintf(values[j++], 32, "%d", opaque->btpo_prev); + snprintf(values[j++], 32, "%d", 0); /*-- not implemented yet --*/ + values[j] = palloc(32); +// snprintf(values[j++], 32, "%d", opaque->btpo_next); + snprintf(values[j++], 32, "%d", 0); /*-- not implemented yet --*/ + values[j] = palloc(32); +// snprintf(values[j++], 32, "%d", opaque->btpo.level); + snprintf(values[j++], 32, "%d", 0); /*-- not implemented yet --*/ + values[j] = palloc(32); +// snprintf(values[j++], 32, "%d", opaque->btpo_flags); + snprintf(values[j++], 32, "%d", 0); /*-- not implemented yet --*/ + + tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc), + values); + + result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple); + } + + ReleaseBuffer(buffer); + + relation_close(rel, AccessShareLock); + + PG_RETURN_DATUM(result); +} + +/*------------------------------------------------------- + * bt_page_items() + * + * Get IndexTupleData set in a leaf page + * + * Usage: SELECT * FROM bt_page_items('t1_pkey', 0); + *------------------------------------------------------- + */ +/* --------------------------------------------------- + * data structure for SRF to hold a scan information + * --------------------------------------------------- + */ +struct user_args +{ + TupleDesc tupd; + Relation rel; + Buffer buffer; + Page page; + uint16 offset; +}; + +Datum +bt_page_items(PG_FUNCTION_ARGS) +{ + text *relname = PG_GETARG_TEXT_P(0); + uint32 blkno = PG_GETARG_UINT32(1); + + RangeVar *relrv; + Datum result; + char *values[BTPAGEITEMS_NCOLUMNS]; + BTPageOpaque opaque; + HeapTuple tuple; + ItemId id; + + FuncCallContext *fctx; + MemoryContext mctx; + struct user_args *uargs = NULL; + + if ( blkno==0 ) + elog(ERROR, "Block 0 is a meta page."); + + if ( SRF_IS_FIRSTCALL() ) + { + fctx = SRF_FIRSTCALL_INIT(); + mctx = MemoryContextSwitchTo(fctx->multi_call_memory_ctx); + + uargs = palloc(sizeof(struct user_args)); + + uargs->tupd = RelationNameGetTupleDesc(BTPAGEITEMS_TYPE); + uargs->offset = FirstOffsetNumber; + + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname)); + uargs->rel = relation_openrv(relrv, AccessShareLock); + + CHECK_RELATION_BLOCK_RANGE(uargs->rel, blkno); + + uargs->buffer = ReadBuffer(uargs->rel, blkno); + + if ( !IS_INDEX(uargs->rel) || !IS_BTREE(uargs->rel) ) + elog(ERROR, "bt_page_items() can be used only on b-tree index."); + + uargs->page = BufferGetPage(uargs->buffer); + + opaque = (BTPageOpaque) PageGetSpecialPointer(uargs->page); + + if ( P_ISDELETED(opaque) ) + elog(NOTICE, "bt_page_items(): this page is deleted."); + + fctx->max_calls = PageGetMaxOffsetNumber(uargs->page); + fctx->user_fctx = uargs; + + MemoryContextSwitchTo(mctx); + } + + fctx = SRF_PERCALL_SETUP(); + uargs = fctx->user_fctx; + + if ( fctx->call_cntr < fctx->max_calls ) + { + BTItem btitem; + IndexTuple itup; + + id = PageGetItemId(uargs->page, uargs->offset); + + if ( !ItemIdIsValid(id) ) + elog(ERROR, "Invalid ItemId."); + + btitem = (BTItem)PageGetItem(uargs->page, id); + itup = &(btitem->bti_itup); + + { + int j = 0; + + BlockNumber blkno = BlockIdGetBlockNumber(&(itup->t_tid.ip_blkid)); + + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", uargs->offset); + values[j] = palloc(32); + snprintf(values[j++], 32, "(%u,%u)", blkno, itup->t_tid.ip_posid); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", IndexTupleSize(itup)); + values[j] = palloc(32); + snprintf(values[j++], 32, "%c", IndexTupleHasNulls(itup) ? 't' : 'f'); + values[j] = palloc(32); + snprintf(values[j++], 32, "%c", IndexTupleHasVarwidths(itup) ? 't' : 'f'); + + { + int off; + char *dump; + char *ptr = (char *)itup + IndexInfoFindDataOffset(itup->t_info); + + dump = palloc(IndexTupleSize(itup)*3); + memset(dump, 0, IndexTupleSize(itup)*3); + + for (off=0 ; + off < IndexTupleSize(itup)-IndexInfoFindDataOffset(itup->t_info) ; + off++) + { + if ( dump[0]=='\0' ) + sprintf(dump, "%02x", *(ptr+off) & 0xff); + else + { + char buf[4]; + sprintf(buf, " %02x", *(ptr+off) & 0xff); + strcat(dump, buf); + } + } + values[j] = dump; + } + + tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(uargs->tupd), values); + result = TupleGetDatum(TupleDescGetSlot(uargs->tupd), tuple); + } + + uargs->offset = uargs->offset + 1; + + SRF_RETURN_NEXT(fctx, result); + } + else + { + ReleaseBuffer(uargs->buffer); + relation_close(uargs->rel, AccessShareLock); + + SRF_RETURN_DONE(fctx); + } +} + + +/* ------------------------------------------------ + * bt_metap() + * + * Get a btree meta-page information + * + * Usage: SELECT * FROM bt_metap('t1_pkey') + * ------------------------------------------------ + */ +Datum +bt_metap(PG_FUNCTION_ARGS) +{ + text *relname = PG_GETARG_TEXT_P(0); + Buffer buffer; + + Relation rel; + RangeVar *relrv; + Datum result; + + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname)); + rel = relation_openrv(relrv, AccessShareLock); + + if ( !IS_INDEX(rel) || !IS_BTREE(rel) ) + elog(ERROR, "bt_metap() can be used only on b-tree index."); + + buffer = ReadBuffer(rel, 0); + + { + BTMetaPageData *metad; + + TupleDesc tupleDesc; + int j; + char *values[BTMETAP_NCOLUMNS]; + HeapTuple tuple; + + Page page = BufferGetPage(buffer); + + metad = BTPageGetMeta(page); + + tupleDesc = RelationNameGetTupleDesc(BTMETAP_TYPE); + + j = 0; + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", metad->btm_magic); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", metad->btm_version); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", metad->btm_root); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", metad->btm_level); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", metad->btm_fastroot); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", metad->btm_fastlevel); + + tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc), + values); + + result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple); + } + + ReleaseBuffer(buffer); + + relation_close(rel, AccessShareLock); + + PG_RETURN_DATUM(result); +} + +/* -------------------------------------------------------- + * pg_relpages() + * + * Get a number of pages of the table/index. + * + * Usage: SELECT pg_relpages('t1'); + * SELECT pg_relpages('t1_pkey'); + * -------------------------------------------------------- + */ +Datum +pg_relpages(PG_FUNCTION_ARGS) +{ + text *relname = PG_GETARG_TEXT_P(0); + + Relation rel; + RangeVar *relrv; + int4 relpages; + + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname)); + rel = relation_openrv(relrv, AccessShareLock); + + relpages = RelationGetNumberOfBlocks(rel); + + relation_close(rel, AccessShareLock); + + PG_RETURN_INT32(relpages); +} + diff -ruN pgstattuple.orig/pgstattuple.sql.in pgstattuple/pgstattuple.sql.in --- pgstattuple.orig/pgstattuple.sql.in 2006-08-10 19:22:47.000000000 +0900 +++ pgstattuple/pgstattuple.sql.in 2006-08-11 17:46:19.000000000 +0900 @@ -22,3 +22,97 @@ RETURNS pgstattuple_type AS 'MODULE_PATHNAME', 'pgstattuplebyid' LANGUAGE 'C' STRICT; + +-- +-- pgstatindex +-- +DROP TYPE pgstatindex_type CASCADE; +CREATE TYPE pgstatindex_type AS ( + version int4, + tree_level int4, + index_size int4, + root_block_no int4, + internal_pages int4, + leaf_pages int4, + empty_pages int4, + deleted_pages int4, + avg_leaf_density float8, + leaf_fragmentation float8 +); + +CREATE OR REPLACE FUNCTION pgstatindex(text) +RETURNS pgstatindex_type +AS 'MODULE_PATHNAME', 'pgstatindex' +LANGUAGE 'C' STRICT; + +-- +-- bt_metap() +-- +DROP TYPE bt_metap_type CASCADE; +CREATE TYPE bt_metap_type AS ( + magic int4, + version int4, + root int4, + level int4, + fastroot int4, + fastlevel int4 +); + +CREATE OR REPLACE FUNCTION bt_metap(text) +RETURNS bt_metap_type +AS 'MODULE_PATHNAME', 'bt_metap' +LANGUAGE 'C' STRICT; + +-- +-- bt_page_items() +-- +DROP TYPE bt_page_items_type CASCADE; +CREATE TYPE bt_page_items_type AS ( + itemoffset int4, + ctid tid, + itemlen int4, + nulls bool, + vars bool, + data text +); + +DROP FUNCTION bt_page_items(text, int4); + +CREATE OR REPLACE FUNCTION bt_page_items(text, int4) +RETURNS SETOF bt_page_items_type +AS 'MODULE_PATHNAME', 'bt_page_items' +LANGUAGE 'C' STRICT; + +-- +-- bt_page_stats() +-- +DROP TYPE bt_page_stats_type CASCADE; +CREATE TYPE bt_page_stats_type AS ( + blkno int4, + live_items int4, + dead_items int4, + total_items int4, + avg_item_size float, + page_size int4, + free_size int4, + is_deleted int4, + btpo_prev int4, + btpo_next int4, + btpo_level int4, + btpo_flags int4 +); + +DROP FUNCTION bt_page_stats(text, int4); + +CREATE OR REPLACE FUNCTION bt_page_stats(text, int4) +RETURNS bt_page_stats_type +AS 'MODULE_PATHNAME', 'bt_page_stats' +LANGUAGE 'C' STRICT; + +-- +-- pg_relpages() +-- +CREATE OR REPLACE FUNCTION pg_relpages(text) +RETURNS int +AS 'MODULE_PATHNAME', 'pg_relpages' +LANGUAGE 'C' STRICT; diff -ruN pgstattuple.orig/test.sh pgstattuple/test.sh --- pgstattuple.orig/test.sh 1970-01-01 09:00:00.000000000 +0900 +++ pgstattuple/test.sh 2006-08-11 17:47:05.000000000 +0900 @@ -0,0 +1,22 @@ +#!/bin/sh + +export PATH=/usr/local/pgsql814/bin:$PATH + +psql pgbench<<EOF +DROP FUNCTION pgstatindex(text); +EOF + +psql -f /usr/local/pgsql814/share/contrib/pgstattuple.sql pgbench + +psql pgbench<<EOF +SELECT * FROM pg_relpages('accounts_pkey'); +\x +SELECT * FROM pgstatindex('accounts_pkey'); +SELECT * FROM bt_metap('accounts_pkey'); +\x +SELECT * FROM bt_page_items('accounts_pkey', 0); +SELECT * FROM bt_page_items('accounts_pkey', 1) limit 20; + +\x +SELECT * FROM bt_page_stats('accounts_pkey', 1); +EOF---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster
--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Satoshi Nagayasu wrote:
Hi all,
Here is a patch to add pgstatindex functions to the pgstattuple module,
which can work with 8.1.4. Please review and try it. Thanks.
Huh, I bet it works with 8.1.4, but it doesn't work on CVS HEAD:
/pgsql/source/00orig/contrib/pgstattuple/pgstatindex.c: In function 'GetBTPageStatistics':
/pgsql/source/00orig/contrib/pgstattuple/pgstatindex.c:182: error: 'BTItem' undeclared (first use in this function)
While you're at it, please consider removing C++ style comments and
unused code.
Formatting is way off as well, but I guess that is easily fixed with
pgindent.
Regarding the pg_relpages function, why do you think it's necessary?
(It returns the true number of blocks of a given relation). It may
belong into core given a reasonable use case, but otherwise it doesn't
seem to belong into pgstatindex (or pgstattuple for that matter).
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Alvaro,
Alvaro Herrera wrote:
Huh, I bet it works with 8.1.4, but it doesn't work on CVS HEAD:
/pgsql/source/00orig/contrib/pgstattuple/pgstatindex.c: In function 'GetBTPageStatistics':
/pgsql/source/00orig/contrib/pgstattuple/pgstatindex.c:182: error: 'BTItem' undeclared (first use in this function)While you're at it, please consider removing C++ style comments and
unused code.Formatting is way off as well, but I guess that is easily fixed with
pgindent.
Thanks for comments. I'm going to fix my patch from now.
Regarding the pg_relpages function, why do you think it's necessary?
(It returns the true number of blocks of a given relation). It may
belong into core given a reasonable use case, but otherwise it doesn't
seem to belong into pgstatindex (or pgstattuple for that matter).
I wanted to sample some pages from the table/index, and get their statistics
to know table/index conditions. I know pgstattuple() reports table
statistics, however, pgstattuple() generates heavy CPU and I/O load.
When we need to sample some pages from table/index, we need to know
true number of blocks.
I have another function, called pgstatpage(), to get information inside
a single block/page statistics of the table. pg_relpages() will be used
with this.
Sorry for not mentioned in previous post about pgstatpage(),
but I've remembered about it just now.
Many memories in my brain have already `paged-out` (too busy in last few months),
and some of them got `out-of-memory`. :^)
Thanks.
--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
Phone: +81-3-3523-8122
Bruce,
Attached patch has been cleaned up,
and modified to be able to work with CVS HEAD.
Thanks.
Satoshi Nagayasu wrote:
Alvaro,
Alvaro Herrera wrote:
Huh, I bet it works with 8.1.4, but it doesn't work on CVS HEAD:
/pgsql/source/00orig/contrib/pgstattuple/pgstatindex.c: In function 'GetBTPageStatistics':
/pgsql/source/00orig/contrib/pgstattuple/pgstatindex.c:182: error: 'BTItem' undeclared (first use in this function)While you're at it, please consider removing C++ style comments and
unused code.Formatting is way off as well, but I guess that is easily fixed with
pgindent.Thanks for comments. I'm going to fix my patch from now.
Regarding the pg_relpages function, why do you think it's necessary?
(It returns the true number of blocks of a given relation). It may
belong into core given a reasonable use case, but otherwise it doesn't
seem to belong into pgstatindex (or pgstattuple for that matter).I wanted to sample some pages from the table/index, and get their statistics
to know table/index conditions. I know pgstattuple() reports table
statistics, however, pgstattuple() generates heavy CPU and I/O load.When we need to sample some pages from table/index, we need to know
true number of blocks.I have another function, called pgstatpage(), to get information inside
a single block/page statistics of the table. pg_relpages() will be used
with this.Sorry for not mentioned in previous post about pgstatpage(),
but I've remembered about it just now.Many memories in my brain have already `paged-out` (too busy in last few months),
and some of them got `out-of-memory`. :^)Thanks.
--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
Phone: +81-3-3523-8122
Attachments:
pgstattuple.difftext/plain; name=pgstattuple.diffDownload
diff -ruN pgstattuple.orig/Makefile pgstattuple/Makefile
--- pgstattuple.orig/Makefile 2006-02-27 21:54:40.000000000 +0900
+++ pgstattuple/Makefile 2006-08-14 09:28:58.000000000 +0900
@@ -6,7 +6,7 @@
#
#-------------------------------------------------------------------------
-SRCS = pgstattuple.c
+SRCS = pgstattuple.c pgstatindex.c
MODULE_big = pgstattuple
OBJS = $(SRCS:.c=.o)
diff -ruN pgstattuple.orig/pgstatindex.c pgstattuple/pgstatindex.c
--- pgstattuple.orig/pgstatindex.c 1970-01-01 09:00:00.000000000 +0900
+++ pgstattuple/pgstatindex.c 2006-08-14 11:24:23.000000000 +0900
@@ -0,0 +1,706 @@
+/*
+ * pgstatindex
+ *
+ * Copyright (c) 2006 Satoshi Nagayasu <nagayasus@nttdata.co.jp>
+ *
+ * Permission to use, copy, modify, and distribute this software and
+ * its documentation for any purpose, without fee, and without a
+ * written agreement is hereby granted, provided that the above
+ * copyright notice and this paragraph and the following two
+ * paragraphs appear in all copies.
+ *
+ * IN NO EVENT SHALL THE AUTHOR BE LIABLE TO ANY PARTY FOR DIRECT,
+ * INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
+ * LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
+ * DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED
+ * OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * THE AUTHOR SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS
+ * IS" BASIS, AND THE AUTHOR HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE,
+ * SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+#include "funcapi.h"
+#include "access/heapam.h"
+#include "access/itup.h"
+#include "access/nbtree.h"
+#include "access/transam.h"
+#include "catalog/namespace.h"
+#include "catalog/pg_type.h"
+#include "utils/builtins.h"
+#include "utils/inval.h"
+
+PG_FUNCTION_INFO_V1(pgstatindex);
+PG_FUNCTION_INFO_V1(bt_metap);
+PG_FUNCTION_INFO_V1(bt_page_items);
+PG_FUNCTION_INFO_V1(bt_page_stats);
+PG_FUNCTION_INFO_V1(pg_relpages);
+
+extern Datum pgstatindex(PG_FUNCTION_ARGS);
+extern Datum bt_metap(PG_FUNCTION_ARGS);
+extern Datum bt_page_items(PG_FUNCTION_ARGS);
+extern Datum bt_page_stats(PG_FUNCTION_ARGS);
+extern Datum pg_relpages(PG_FUNCTION_ARGS);
+
+#define PGSTATINDEX_TYPE "public.pgstatindex_type"
+#define PGSTATINDEX_NCOLUMNS 10
+
+#define BTMETAP_TYPE "public.bt_metap_type"
+#define BTMETAP_NCOLUMNS 6
+
+#define BTPAGEITEMS_TYPE "public.bt_page_items_type"
+#define BTPAGEITEMS_NCOLUMNS 6
+
+#define BTPAGESTATS_TYPE "public.bt_page_stats_type"
+#define BTPAGESTATS_NCOLUMNS 11
+
+
+#define IS_INDEX(r) ((r)->rd_rel->relkind == 'i')
+#define IS_BTREE(r) ((r)->rd_rel->relam == BTREE_AM_OID)
+
+#define CHECK_PAGE_OFFSET_RANGE(page, offset) { \
+ if ( !(FirstOffsetNumber<=(offset) && \
+ (offset)<=PageGetMaxOffsetNumber(page)) ) \
+ elog(ERROR, "Page offset number out of range."); }
+
+#define CHECK_RELATION_BLOCK_RANGE(rel, blkno) { \
+ if ( (blkno)<0 && RelationGetNumberOfBlocks((rel))<=(blkno) ) \
+ elog(ERROR, "Block number out of range."); }
+
+/* ------------------------------------------------
+ * structure for single btree page statistics
+ * ------------------------------------------------
+ */
+typedef struct BTPageStat
+{
+ uint32 blkno;
+ uint32 live_items;
+ uint32 dead_items;
+ uint32 page_size;
+ uint32 max_avail;
+ uint32 free_size;
+ uint32 avg_item_size;
+ uint32 fragments;
+ char type;
+
+ /* opaque data */
+ BlockNumber btpo_prev;
+ BlockNumber btpo_next;
+ union
+ {
+ uint32 level;
+ TransactionId xact;
+ } btpo;
+ uint16 btpo_flags;
+ BTCycleId btpo_cycleid;
+} BTPageStat;
+
+/* ------------------------------------------------
+ * A structure for a whole btree index statistics
+ * used by pgstatindex().
+ * ------------------------------------------------
+ */
+typedef struct BTIndexStat
+{
+ uint32 magic;
+ uint32 version;
+ BlockNumber root_blkno;
+ uint32 level;
+
+ BlockNumber fastroot;
+ uint32 fastlevel;
+
+ uint32 live_items;
+ uint32 dead_items;
+
+ uint32 root_pages;
+ uint32 internal_pages;
+ uint32 leaf_pages;
+ uint32 empty_pages;
+ uint32 deleted_pages;
+
+ uint32 page_size;
+ uint32 avg_item_size;
+
+ uint32 max_avail;
+ uint32 free_space;
+
+ uint32 fragments;
+} BTIndexStat;
+
+/* -------------------------------------------------
+ * GetBTPageStatistics()
+ *
+ * Collect statistics of single b-tree leaf page
+ * -------------------------------------------------
+ */
+static bool
+GetBTPageStatistics(BlockNumber blkno, Buffer buffer, BTPageStat * stat)
+{
+ Page page = BufferGetPage(buffer);
+ PageHeader phdr = (PageHeader) page;
+ OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
+ BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+ int item_size = 0;
+ int off;
+
+ stat->blkno = blkno;
+
+ stat->max_avail = BLCKSZ - (BLCKSZ - phdr->pd_special + SizeOfPageHeaderData);
+
+ stat->dead_items = stat->live_items = 0;
+
+ stat->page_size = PageGetPageSize(page);
+
+ /* page type (flags) */
+ if (P_ISDELETED(opaque))
+ {
+ stat->type = 'd';
+ return true;
+ }
+ else if (P_IGNORE(opaque))
+ stat->type = 'e';
+ else if (P_ISLEAF(opaque))
+ stat->type = 'l';
+ else if (P_ISROOT(opaque))
+ stat->type = 'r';
+ else
+ stat->type = 'i';
+
+ /* btpage opaque data */
+ stat->btpo_prev = opaque->btpo_prev;
+ stat->btpo_next = opaque->btpo_next;
+ if (P_ISDELETED(opaque))
+ stat->btpo.xact = opaque->btpo.xact;
+ else
+ stat->btpo.level = opaque->btpo.level;
+ stat->btpo_flags = opaque->btpo_flags;
+ stat->btpo_cycleid = opaque->btpo_cycleid;
+
+ /*----------------------------------------------
+ * If a next leaf is on the previous block,
+ * it means a fragmentation.
+ *----------------------------------------------
+ */
+ stat->fragments = 0;
+ if (stat->type == 'l')
+ {
+ if (opaque->btpo_next != P_NONE && opaque->btpo_next < blkno)
+ stat->fragments++;
+ }
+
+ /* count live and dead tuples, and free space */
+ for (off = FirstOffsetNumber; off <= maxoff; off++)
+ {
+ IndexTuple itup;
+
+ ItemId id = PageGetItemId(page, off);
+
+ itup = (IndexTuple) PageGetItem(page, id);
+
+ item_size += IndexTupleSize(itup);
+
+ if (!ItemIdDeleted(id))
+ stat->live_items++;
+ else
+ stat->dead_items++;
+ }
+ stat->free_size = PageGetFreeSpace(page);
+
+ if ((stat->live_items + stat->dead_items) > 0)
+ stat->avg_item_size = item_size / (stat->live_items + stat->dead_items);
+ else
+ stat->avg_item_size = 0;
+
+ return true;
+}
+
+
+/* ------------------------------------------------------
+ * pgstatindex()
+ *
+ * Usage: SELECT * FROM pgstatindex('t1_pkey');
+ * ------------------------------------------------------
+ */
+Datum
+pgstatindex(PG_FUNCTION_ARGS)
+{
+ text *relname = PG_GETARG_TEXT_P(0);
+ Relation rel;
+ RangeVar *relrv;
+ Datum result;
+ uint32 nblocks;
+ uint32 blkno;
+ BTIndexStat indexStat;
+
+ relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
+ rel = relation_openrv(relrv, AccessShareLock);
+
+ if (!IS_INDEX(rel) || !IS_BTREE(rel))
+ elog(ERROR, "pgstatindex() can be used only on b-tree index.");
+
+ /*-------------------
+ * Read a metapage
+ *-------------------
+ */
+ {
+ Buffer buffer = ReadBuffer(rel, 0);
+ Page page = BufferGetPage(buffer);
+ BTMetaPageData *metad = BTPageGetMeta(page);
+
+ indexStat.magic = metad->btm_magic;
+ indexStat.version = metad->btm_version;
+ indexStat.root_blkno = metad->btm_root;
+ indexStat.level = metad->btm_level;
+ indexStat.fastroot = metad->btm_fastroot;
+ indexStat.fastlevel = metad->btm_fastlevel;
+
+ ReleaseBuffer(buffer);
+ }
+
+ nblocks = RelationGetNumberOfBlocks(rel);
+
+ /* -- init stat -- */
+ indexStat.fragments = 0;
+
+ indexStat.root_pages = 0;
+ indexStat.leaf_pages = 0;
+ indexStat.internal_pages = 0;
+ indexStat.empty_pages = 0;
+ indexStat.deleted_pages = 0;
+
+ indexStat.max_avail = 0;
+ indexStat.free_space = 0;
+
+ /*-----------------------
+ * Scan all blocks
+ *-----------------------
+ */
+ for (blkno = 1; blkno < nblocks; blkno++)
+ {
+ Buffer buffer = ReadBuffer(rel, blkno);
+ BTPageStat stat;
+
+ /* scan one page */
+ stat.blkno = blkno;
+ GetBTPageStatistics(blkno, buffer, &stat);
+
+ /*---------------------
+ * page status (type)
+ *---------------------
+ */
+ switch (stat.type)
+ {
+ case 'd':
+ indexStat.deleted_pages++;
+ break;
+ case 'l':
+ indexStat.leaf_pages++;
+ break;
+ case 'i':
+ indexStat.internal_pages++;
+ break;
+ case 'e':
+ indexStat.empty_pages++;
+ break;
+ case 'r':
+ indexStat.root_pages++;
+ break;
+ default:
+ elog(ERROR, "unknown page status.");
+ }
+
+ /* -- leaf fragmentation -- */
+ indexStat.fragments += stat.fragments;
+
+ if (stat.type == 'l')
+ {
+ indexStat.max_avail += stat.max_avail;
+ indexStat.free_space += stat.free_size;
+ }
+
+ ReleaseBuffer(buffer);
+ }
+
+ relation_close(rel, AccessShareLock);
+
+ /*----------------------------
+ * Build a result tuple
+ *----------------------------
+ */
+ {
+ TupleDesc tupleDesc;
+ int j;
+ char *values[PGSTATINDEX_NCOLUMNS];
+
+ HeapTupleData tupleData;
+ HeapTuple tuple = &tupleData;
+
+ tupleDesc = RelationNameGetTupleDesc(PGSTATINDEX_TYPE);
+
+ j = 0;
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", indexStat.version);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", indexStat.level);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", (indexStat.root_pages +
+ indexStat.leaf_pages +
+ indexStat.internal_pages +
+ indexStat.deleted_pages +
+ indexStat.empty_pages) * BLCKSZ);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", indexStat.root_blkno);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", indexStat.internal_pages);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", indexStat.leaf_pages);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", indexStat.empty_pages);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", indexStat.deleted_pages);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%.2f", 100.0 - (float) indexStat.free_space / (float) indexStat.max_avail * 100.0);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%.2f", (float) indexStat.fragments / (float) indexStat.leaf_pages * 100.0);
+
+ tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
+ values);
+
+ result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
+ }
+
+ PG_RETURN_DATUM(result);
+}
+
+/* -----------------------------------------------
+ * bt_page()
+ *
+ * Usage: SELECT * FROM bt_page('t1_pkey', 0);
+ * -----------------------------------------------
+ */
+Datum
+bt_page_stats(PG_FUNCTION_ARGS)
+{
+ text *relname = PG_GETARG_TEXT_P(0);
+ uint32 blkno = PG_GETARG_UINT32(1);
+ Buffer buffer;
+
+ Relation rel;
+ RangeVar *relrv;
+ Datum result;
+
+ relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
+ rel = relation_openrv(relrv, AccessShareLock);
+
+ CHECK_RELATION_BLOCK_RANGE(rel, blkno);
+
+ buffer = ReadBuffer(rel, blkno);
+
+ if (!IS_INDEX(rel) || !IS_BTREE(rel))
+ elog(ERROR, "bt_page_stats() can be used only on b-tree index.");
+
+ if (blkno == 0)
+ elog(ERROR, "Block 0 is a meta page.");
+
+ {
+ HeapTuple tuple;
+ TupleDesc tupleDesc;
+ int j;
+ char *values[BTPAGESTATS_NCOLUMNS];
+
+ BTPageStat stat;
+
+ GetBTPageStatistics(blkno, buffer, &stat);
+
+ tupleDesc = RelationNameGetTupleDesc(BTPAGESTATS_TYPE);
+
+ j = 0;
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", stat.blkno);
+
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%c", stat.type);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", stat.live_items);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", stat.dead_items);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", stat.avg_item_size);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", stat.page_size);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", stat.free_size);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", stat.btpo_prev);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", stat.btpo_next);
+
+ values[j] = palloc(32);
+ if (stat.type == 'd')
+ snprintf(values[j++], 32, "%d", stat.btpo.xact);
+ else
+ snprintf(values[j++], 32, "%d", stat.btpo.level);
+
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", stat.btpo_flags);
+
+ tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
+ values);
+
+ result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
+ }
+
+ ReleaseBuffer(buffer);
+
+ relation_close(rel, AccessShareLock);
+
+ PG_RETURN_DATUM(result);
+}
+
+/*-------------------------------------------------------
+ * bt_page_items()
+ *
+ * Get IndexTupleData set in a leaf page
+ *
+ * Usage: SELECT * FROM bt_page_items('t1_pkey', 0);
+ *-------------------------------------------------------
+ */
+/* ---------------------------------------------------
+ * data structure for SRF to hold a scan information
+ * ---------------------------------------------------
+ */
+struct user_args
+{
+ TupleDesc tupd;
+ Relation rel;
+ Buffer buffer;
+ Page page;
+ uint16 offset;
+};
+
+Datum
+bt_page_items(PG_FUNCTION_ARGS)
+{
+ text *relname = PG_GETARG_TEXT_P(0);
+ uint32 blkno = PG_GETARG_UINT32(1);
+
+ RangeVar *relrv;
+ Datum result;
+ char *values[BTPAGEITEMS_NCOLUMNS];
+ BTPageOpaque opaque;
+ HeapTuple tuple;
+ ItemId id;
+
+ FuncCallContext *fctx;
+ MemoryContext mctx;
+ struct user_args *uargs = NULL;
+
+ if (blkno == 0)
+ elog(ERROR, "Block 0 is a meta page.");
+
+ if (SRF_IS_FIRSTCALL())
+ {
+ fctx = SRF_FIRSTCALL_INIT();
+ mctx = MemoryContextSwitchTo(fctx->multi_call_memory_ctx);
+
+ uargs = palloc(sizeof(struct user_args));
+
+ uargs->tupd = RelationNameGetTupleDesc(BTPAGEITEMS_TYPE);
+ uargs->offset = FirstOffsetNumber;
+
+ relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
+ uargs->rel = relation_openrv(relrv, AccessShareLock);
+
+ CHECK_RELATION_BLOCK_RANGE(uargs->rel, blkno);
+
+ uargs->buffer = ReadBuffer(uargs->rel, blkno);
+
+ if (!IS_INDEX(uargs->rel) || !IS_BTREE(uargs->rel))
+ elog(ERROR, "bt_page_items() can be used only on b-tree index.");
+
+ uargs->page = BufferGetPage(uargs->buffer);
+
+ opaque = (BTPageOpaque) PageGetSpecialPointer(uargs->page);
+
+ if (P_ISDELETED(opaque))
+ elog(NOTICE, "bt_page_items(): this page is deleted.");
+
+ fctx->max_calls = PageGetMaxOffsetNumber(uargs->page);
+ fctx->user_fctx = uargs;
+
+ MemoryContextSwitchTo(mctx);
+ }
+
+ fctx = SRF_PERCALL_SETUP();
+ uargs = fctx->user_fctx;
+
+ if (fctx->call_cntr < fctx->max_calls)
+ {
+ IndexTuple itup;
+
+ id = PageGetItemId(uargs->page, uargs->offset);
+
+ if (!ItemIdIsValid(id))
+ elog(ERROR, "Invalid ItemId.");
+
+ itup = (IndexTuple) PageGetItem(uargs->page, id);
+
+ {
+ int j = 0;
+
+ BlockNumber blkno = BlockIdGetBlockNumber(&(itup->t_tid.ip_blkid));
+
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", uargs->offset);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "(%u,%u)", blkno, itup->t_tid.ip_posid);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", IndexTupleSize(itup));
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%c", IndexTupleHasNulls(itup) ? 't' : 'f');
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%c", IndexTupleHasVarwidths(itup) ? 't' : 'f');
+
+ {
+ int off;
+ char *dump;
+ char *ptr = (char *) itup + IndexInfoFindDataOffset(itup->t_info);
+
+ dump = palloc(IndexTupleSize(itup) * 3);
+ memset(dump, 0, IndexTupleSize(itup) * 3);
+
+ for (off = 0;
+ off < IndexTupleSize(itup) - IndexInfoFindDataOffset(itup->t_info);
+ off++)
+ {
+ if (dump[0] == '\0')
+ sprintf(dump, "%02x", *(ptr + off) & 0xff);
+ else
+ {
+ char buf[4];
+
+ sprintf(buf, " %02x", *(ptr + off) & 0xff);
+ strcat(dump, buf);
+ }
+ }
+ values[j] = dump;
+ }
+
+ tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(uargs->tupd), values);
+ result = TupleGetDatum(TupleDescGetSlot(uargs->tupd), tuple);
+ }
+
+ uargs->offset = uargs->offset + 1;
+
+ SRF_RETURN_NEXT(fctx, result);
+ }
+ else
+ {
+ ReleaseBuffer(uargs->buffer);
+ relation_close(uargs->rel, AccessShareLock);
+
+ SRF_RETURN_DONE(fctx);
+ }
+}
+
+
+/* ------------------------------------------------
+ * bt_metap()
+ *
+ * Get a btree meta-page information
+ *
+ * Usage: SELECT * FROM bt_metap('t1_pkey')
+ * ------------------------------------------------
+ */
+Datum
+bt_metap(PG_FUNCTION_ARGS)
+{
+ text *relname = PG_GETARG_TEXT_P(0);
+ Buffer buffer;
+
+ Relation rel;
+ RangeVar *relrv;
+ Datum result;
+
+ relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
+ rel = relation_openrv(relrv, AccessShareLock);
+
+ if (!IS_INDEX(rel) || !IS_BTREE(rel))
+ elog(ERROR, "bt_metap() can be used only on b-tree index.");
+
+ buffer = ReadBuffer(rel, 0);
+
+ {
+ BTMetaPageData *metad;
+
+ TupleDesc tupleDesc;
+ int j;
+ char *values[BTMETAP_NCOLUMNS];
+ HeapTuple tuple;
+
+ Page page = BufferGetPage(buffer);
+
+ metad = BTPageGetMeta(page);
+
+ tupleDesc = RelationNameGetTupleDesc(BTMETAP_TYPE);
+
+ j = 0;
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", metad->btm_magic);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", metad->btm_version);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", metad->btm_root);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", metad->btm_level);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", metad->btm_fastroot);
+ values[j] = palloc(32);
+ snprintf(values[j++], 32, "%d", metad->btm_fastlevel);
+
+ tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
+ values);
+
+ result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
+ }
+
+ ReleaseBuffer(buffer);
+
+ relation_close(rel, AccessShareLock);
+
+ PG_RETURN_DATUM(result);
+}
+
+/* --------------------------------------------------------
+ * pg_relpages()
+ *
+ * Get a number of pages of the table/index.
+ *
+ * Usage: SELECT pg_relpages('t1');
+ * SELECT pg_relpages('t1_pkey');
+ * --------------------------------------------------------
+ */
+Datum
+pg_relpages(PG_FUNCTION_ARGS)
+{
+ text *relname = PG_GETARG_TEXT_P(0);
+
+ Relation rel;
+ RangeVar *relrv;
+ int4 relpages;
+
+ relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
+ rel = relation_openrv(relrv, AccessShareLock);
+
+ relpages = RelationGetNumberOfBlocks(rel);
+
+ relation_close(rel, AccessShareLock);
+
+ PG_RETURN_INT32(relpages);
+}
diff -ruN pgstattuple.orig/pgstattuple.sql.in pgstattuple/pgstattuple.sql.in
--- pgstattuple.orig/pgstattuple.sql.in 2006-02-28 01:09:50.000000000 +0900
+++ pgstattuple/pgstattuple.sql.in 2006-08-14 10:37:32.000000000 +0900
@@ -22,3 +22,96 @@
RETURNS pgstattuple_type
AS 'MODULE_PATHNAME', 'pgstattuplebyid'
LANGUAGE C STRICT;
+
+--
+-- pgstatindex
+--
+DROP TYPE pgstatindex_type CASCADE;
+CREATE TYPE pgstatindex_type AS (
+ version int4,
+ tree_level int4,
+ index_size int4,
+ root_block_no int4,
+ internal_pages int4,
+ leaf_pages int4,
+ empty_pages int4,
+ deleted_pages int4,
+ avg_leaf_density float8,
+ leaf_fragmentation float8
+);
+
+CREATE OR REPLACE FUNCTION pgstatindex(text)
+RETURNS pgstatindex_type
+AS 'MODULE_PATHNAME', 'pgstatindex'
+LANGUAGE 'C' STRICT;
+
+--
+-- bt_metap()
+--
+DROP TYPE bt_metap_type CASCADE;
+CREATE TYPE bt_metap_type AS (
+ magic int4,
+ version int4,
+ root int4,
+ level int4,
+ fastroot int4,
+ fastlevel int4
+);
+
+CREATE OR REPLACE FUNCTION bt_metap(text)
+RETURNS bt_metap_type
+AS 'MODULE_PATHNAME', 'bt_metap'
+LANGUAGE 'C' STRICT;
+
+--
+-- bt_page_items()
+--
+DROP TYPE bt_page_items_type CASCADE;
+CREATE TYPE bt_page_items_type AS (
+ itemoffset int4,
+ ctid tid,
+ itemlen int4,
+ nulls bool,
+ vars bool,
+ data text
+);
+
+DROP FUNCTION bt_page_items(text, int4);
+
+CREATE OR REPLACE FUNCTION bt_page_items(text, int4)
+RETURNS SETOF bt_page_items_type
+AS 'MODULE_PATHNAME', 'bt_page_items'
+LANGUAGE 'C' STRICT;
+
+--
+-- bt_page_stats()
+--
+DROP TYPE bt_page_stats_type CASCADE;
+CREATE TYPE bt_page_stats_type AS (
+ blkno int4,
+ type char,
+ live_items int4,
+ dead_items int4,
+ avg_item_size float,
+ page_size int4,
+ free_size int4,
+ btpo_prev int4,
+ btpo_next int4,
+ btpo int4,
+ btpo_flags int4
+);
+
+DROP FUNCTION bt_page_stats(text, int4);
+
+CREATE OR REPLACE FUNCTION bt_page_stats(text, int4)
+RETURNS bt_page_stats_type
+AS 'MODULE_PATHNAME', 'bt_page_stats'
+LANGUAGE 'C' STRICT;
+
+--
+-- pg_relpages()
+--
+CREATE OR REPLACE FUNCTION pg_relpages(text)
+RETURNS int
+AS 'MODULE_PATHNAME', 'pg_relpages'
+LANGUAGE 'C' STRICT;
diff -ruN pgstattuple.orig/test.sh pgstattuple/test.sh
--- pgstattuple.orig/test.sh 1970-01-01 09:00:00.000000000 +0900
+++ pgstattuple/test.sh 2006-08-14 10:23:08.000000000 +0900
@@ -0,0 +1,27 @@
+#!/bin/sh
+
+PGHOME=/home/snaga/pgsql20060814
+export PATH=${PGHOME}/bin:$PATH
+
+psql -p 9999 pgbench<<EOF
+DROP FUNCTION pgstatindex(text);
+EOF
+
+psql -p 9999 -f ${PGHOME}/share/contrib/pgstattuple.sql pgbench
+
+psql -p 9999 pgbench<<EOF
+SELECT * FROM pg_relpages('accounts_pkey');
+\x
+SELECT * FROM pgstatindex('accounts_pkey');
+SELECT * FROM bt_metap('accounts_pkey');
+\x
+SELECT * FROM bt_page_items('accounts_pkey', 0);
+SELECT * FROM bt_page_items('accounts_pkey', 1) limit 20;
+
+SELECT * FROM bt_page_stats('accounts_pkey', 1);
+\x
+SELECT * FROM bt_page_stats('accounts_pkey', 1);
+SELECT * FROM bt_page_stats('accounts_pkey', 361);
+\x
+SELECT * FROM bt_page_items('accounts_pkey', 361);
+EOF
Satoshi Nagayasu wrote:
Bruce,
Attached patch has been cleaned up,
and modified to be able to work with CVS HEAD.
I was thinking, isn't it a lot cleaner to define the functions to use
OUT parameters instead of having to define a custom type for each?
Also, in 8.2 there is a uninstall SQL script -- ISTM you should put the
DROP commands there, not in the install script.
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes:
I was thinking, isn't it a lot cleaner to define the functions to use
OUT parameters instead of having to define a custom type for each?
Not really --- it's a bit less notation maybe, but if he's got it
written like that already, I see no need to change it.
Also, in 8.2 there is a uninstall SQL script -- ISTM you should put the
DROP commands there, not in the install script.
Agreed.
regards, tom lane
Hi Nagayasu san and folks,
I have a question about pgstatindex.
Satoshi Nagayasu <nagayasus@nttdata.co.jp> wrote:
Attached patch has been cleaned up,
and modified to be able to work with CVS HEAD.
Index leaf pages are ordered just after REINDEX.
[1]: [3] [5] [2] [4] [6] because new pages are allocated at the end of the index file.
After full-split, they will be the following:
[1]: [3] [5] [2] [4] [6] because new pages are allocated at the end of the index file.
because new pages are allocated at the end of the index file.
I think this condition should be regarded as full-fragmented,
but pgstatindex reports that the leaf_fragmentation is 50%.
Presently, fragmentation factor is computed as the code below:
if (opaque->btpo_next != P_NONE && opaque->btpo_next < blkno)
stat->fragments++;
But the method has the above problem. So I suggest to use whether
the right link points to the next adjacent page or not.
if (opaque->btpo_next != P_NONE && opaque->btpo_next != blkno + 1)
stat->fragments++;
Do you think which method is better? Or do you have other ideas?
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
On Thu, Aug 17, 2006 at 12:55:28PM +0900, ITAGAKI Takahiro wrote:
I think this condition should be regarded as full-fragmented,
but pgstatindex reports that the leaf_fragmentation is 50%.Presently, fragmentation factor is computed as the code below:
if (opaque->btpo_next != P_NONE && opaque->btpo_next < blkno)
stat->fragments++;But the method has the above problem. So I suggest to use whether
the right link points to the next adjacent page or not.if (opaque->btpo_next != P_NONE && opaque->btpo_next != blkno + 1)
stat->fragments++;Do you think which method is better? Or do you have other ideas?
If we do it your way, then every index will probably get a
fragmentation of nearly 100%. That's not very useful. I don't agree
that your example is fully fragmented, since on the first read the OS
will read the next four (or more) blocks so all the others are
zero-cost.
A more useful definition of fragmentation would be: if you're scanning
forward through an index, how often do you have to jump backwards
again. The current calculation seems to get that fairly right...
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Show quoted text
From each according to his ability. To each according to his ability to litigate.
On Thu, Aug 17, 2006 at 02:23:48PM +0200, Martijn van Oosterhout wrote:
On Thu, Aug 17, 2006 at 12:55:28PM +0900, ITAGAKI Takahiro wrote:
I think this condition should be regarded as full-fragmented,
but pgstatindex reports that the leaf_fragmentation is 50%.Presently, fragmentation factor is computed as the code below:
if (opaque->btpo_next != P_NONE && opaque->btpo_next < blkno)
stat->fragments++;But the method has the above problem. So I suggest to use whether
the right link points to the next adjacent page or not.if (opaque->btpo_next != P_NONE && opaque->btpo_next != blkno + 1)
stat->fragments++;Do you think which method is better? Or do you have other ideas?
If we do it your way, then every index will probably get a
fragmentation of nearly 100%. That's not very useful. I don't agree
that your example is fully fragmented, since on the first read the OS
will read the next four (or more) blocks so all the others are
zero-cost.
Ok, fine... expand the example out to an index that's not trivial in
size. Even with read-ahead, once you get to a few megs (which is
obviously not that big), you're seeking.
A more useful definition of fragmentation would be: if you're scanning
forward through an index, how often do you have to jump backwards
again. The current calculation seems to get that fairly right...
Well, what really matters is how often you'll need to seek (either
intra-track or inter-track). Any time you need to seek you're hit with a
pretty serious penalty. And until we have an asyncronous prefetch
process, a forward seek will most likely be just as expensive as a
backwards seek, because by the time the data winds it's way from the
drive back to PostgreSQL and the next read request winds it's way back
down to the drive the data you wanted has probably flown past the head.
Granted, I'm ignoring OS read-ahead here, but in a heavily fragmented
index that you're actually reading off disk (ie: it's not trivially
small), that read-ahead isn't likely to help you too much.
Given all that, I'd argue that it's best to consider any page that isn't
in-order as another fragment.
On another note, now that scans happen at a per-page level, does that
make some kind of online index clustering command a possibility? Another
thought that comes to mind is putting enough brains in the indexes or
the FSM to request free pages that are in a specific region of the index
file. That would allow things to stay less fragmented. Of course a
similar method could be used to try and maintain a table heap in cluster
order, and I suspect that method would probably be a lot easier to
impliment.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On Thu, Aug 17, 2006 at 02:54:20PM -0500, Jim C. Nasby wrote:
On Thu, Aug 17, 2006 at 02:23:48PM +0200, Martijn van Oosterhout wrote:
On Thu, Aug 17, 2006 at 12:55:28PM +0900, ITAGAKI Takahiro wrote:
But the method has the above problem. So I suggest to use whether
the right link points to the next adjacent page or not.if (opaque->btpo_next != P_NONE && opaque->btpo_next != blkno + 1)
stat->fragments++;Do you think which method is better? Or do you have other ideas?
Ok, fine... expand the example out to an index that's not trivial in
size. Even with read-ahead, once you get to a few megs (which is
obviously not that big), you're seeking.
Well, mostly I'm just saying that only matching on the next block
number is going to give unrealistically low numbers. We can't ignore OS
level caching, the way Postgres works relies on it in many ways.
I'd suggest something like: btpo_next between blkno and blkno + X, to
try to take into account caching. But I'm not sure that will give
numbers significantly different from what is already generated.
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Show quoted text
From each according to his ability. To each according to his ability to litigate.
ITAGAKI Takahiro wrote:
But the method has the above problem. So I suggest to use whether
the right link points to the next adjacent page or not.if (opaque->btpo_next != P_NONE && opaque->btpo_next != blkno + 1)
stat->fragments++;
Well, in that way, following two conditions,
[1]: [x] [y] [z] [2] and
and
[3]: [x] [2] [y] [1] will be calculated as same fragmentation ratio(100%), I can't agree with that, because both will generate different costs while index scan in the real world (I don't care about page splitting algorithm now).
will be calculated as same fragmentation ratio(100%), I can't agree
with that, because both will generate different costs while index scan
in the real world (I don't care about page splitting algorithm now).
If we think 'fragmentation' more strictly, the fragmentation ratio
should be calculated with 'distance' and 'direction' of the block
ordering and positions, because
[1]: [x] [y] [z] [2] and
and
[2]: [x] [y] [1] [z] have different costs each.
have different costs each.
However, in such way, if I get '57.6%' as a fragmentation radio,
what does it mean? What can I do next? Two cases (forward ordered blocks
with some gaps, and backward ordered blocks with some gaps) are clearly
different, but will result same radios.
Understanding and estimating real cost of the index scan is difficult.
So I want to think 'fragmentation radio' simply,
"How many backward seeks will occur while your index scan?".
I guess, in some cases, people will want to know more detailed information,
but most people need a tool which is easy to use and easy to understand.
And I believe present calculation is good enough.
--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
Phone: +81-3-3523-8122
Satoshi Nagayasu <nagayasus@nttdata.co.jp> wrote:
Well, in that way, following two conditions,
[1] [x] [2] [y] [3]
and
[3] [x] [2] [y] [1]
will be calculated as same fragmentation ratio(100%), I can't agree
with that, because both will generate different costs while index scan
in the real world (I don't care about page splitting algorithm now).
I think the calculations (100%) are appropriate, because we should do
REINDEX in both case. Supposing to the sizes of [x], [y] are mega or giga
bytes, the order is not important; we have to do large seeks in both case.
In addition, the latter case rarely happens in real world, isn't it?
However, in such way, if I get '57.6%' as a fragmentation radio,
what does it mean? What can I do next?
I think the information of fragmentations are probably not
the most important; the information users want to know are
"When to do REINDEX?" and "How to set the fillfactor?".
I hope you to write how to interpret the framgentation (and other) info
in README. In my understanding, I'll write "You'd better do REINDEX when
you see the fragmentation is greater than 50%" under the present
calculation method.
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
ITAGAKI Takahiro wrote:
Satoshi Nagayasu <nagayasus@nttdata.co.jp> wrote:
Well, in that way, following two conditions,
[1] [x] [2] [y] [3]
and
[3] [x] [2] [y] [1]
will be calculated as same fragmentation ratio(100%), I can't agree
with that, because both will generate different costs while index scan
in the real world (I don't care about page splitting algorithm now).I think the calculations (100%) are appropriate, because we should do
REINDEX in both case. Supposing to the sizes of [x], [y] are mega or giga
bytes, the order is not important; we have to do large seeks in both case.
I don't think so. A few blocks forward skip while scan can be reasonable
and acceptable (of course, it's case by case).
BTW, What does 'large seeks' mean? Seeking a few blocks, hundred of blocks
and millions of blocks are not same, I think. Are they same for you?
However, in such way, if I get '57.6%' as a fragmentation radio,
what does it mean? What can I do next?I think the information of fragmentations are probably not
the most important; the information users want to know are
"When to do REINDEX?" and "How to set the fillfactor?".
Agreed.
So, I'm just counting backward seeks simply for the fragmentation ratio.
It means 'the mismatch radio between logical order and physical order
of the blocks'.
I hope you to write how to interpret the framgentation (and other) info
in README. In my understanding, I'll write "You'd better do REINDEX when
you see the fragmentation is greater than 50%" under the present
calculation method.
I can't understand why you want to make such decision, because you're
thinking the fragmentation information is not the most important for
the users, aren't you?
--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
Phone: +81-3-3523-8122
Satoshi Nagayasu <nagayasus@nttdata.co.jp> wrote:
I hope you to write how to interpret the framgentation (and other) info
in README. In my understanding, I'll write "You'd better do REINDEX when
you see the fragmentation is greater than 50%" under the present
calculation method.I can't understand why you want to make such decision, because you're
thinking the fragmentation information is not the most important for
the users, aren't you?
Suppose a simple update case, for example, the accounts table in pgbench.
The default fillfactor of btree indexes is 90%, so the leaf pages are
fully split after we update 10-20% of tuples. But pgstatindex reports
the fragmentation is 50% in such condition, but I think we should do
REINDEX then. My decision came from this.
The setting fillfactor=50% is better than the case with high fillfactor
but all pages have split once, even if sizes of the indexes are same.
I worry that users will misunderstand the 50% of fragmentation -- if the
report says 100%, they'll consider to do REINDEX. But 50%, the necessity
is unclear.
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
ITAGAKI Takahiro wrote:
Suppose a simple update case, for example, the accounts table in pgbench.
The default fillfactor of btree indexes is 90%, so the leaf pages are
fully split after we update 10-20% of tuples. But pgstatindex reports
the fragmentation is 50% in such condition, but I think we should do
REINDEX then. My decision came from this.The setting fillfactor=50% is better than the case with high fillfactor
but all pages have split once, even if sizes of the indexes are same.
I worry that users will misunderstand the 50% of fragmentation -- if the
report says 100%, they'll consider to do REINDEX. But 50%, the necessity
is unclear.
I think you should use 'average of page density' and 'number of leaf pages'
in such case. It is more useful to know filling condition of the leaves.
I've observed both while running pgbench, and the result is coming with
the WEB+DB PRESS magazine in next Wednesday. :)
Thanks.
--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
Phone: +81-3-3523-8122
On Aug 17, 2006, at 4:10 PM, Martijn van Oosterhout wrote:
On Thu, Aug 17, 2006 at 02:54:20PM -0500, Jim C. Nasby wrote:
On Thu, Aug 17, 2006 at 02:23:48PM +0200, Martijn van Oosterhout
wrote:On Thu, Aug 17, 2006 at 12:55:28PM +0900, ITAGAKI Takahiro wrote:
But the method has the above problem. So I suggest to use whether
the right link points to the next adjacent page or not.if (opaque->btpo_next != P_NONE && opaque->btpo_next !=
blkno + 1)
stat->fragments++;Do you think which method is better? Or do you have other ideas?
Ok, fine... expand the example out to an index that's not trivial in
size. Even with read-ahead, once you get to a few megs (which is
obviously not that big), you're seeking.Well, mostly I'm just saying that only matching on the next block
number is going to give unrealistically low numbers. We can't
ignore OS
level caching, the way Postgres works relies on it in many ways.
While I agree that *users* must take caching into account, I don't
think we should be fudging fragmentation numbers. For starters, we
have absolutely no idea how much caching is actually happening.
We should just report the raw numbers and let users draw their own
conclusions. Doing otherwise makes the stat far less useful.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
BTIem is no longer in CVS HEAD, though it was in 8.1.X. Please update
your patch for CVS HEAD. Thanks.
---------------------------------------------------------------------------
Satoshi Nagayasu wrote:
Hi all,
Here is a patch to add pgstatindex functions to the pgstattuple module,
which can work with 8.1.4. Please review and try it. Thanks.Satoshi Nagayasu wrote:
Bruce,
I'll fix it in this week. Please wait a few days.
Thanks.Bruce Momjian wrote:
nagayasu-san,
This looks good, but we would like the code added to
/contrib/pgstattuple, rather than it being its own /contrib module. Can
you make that adjustment? Thanks.---------------------------------------------------------------------------
satoshi nagayasu wrote:
Hi folks,
As I said on -PATCHES, I've been working on an utility to get
a b-tree index information. I'm happy to introduce
my new functions to you.pgstattuple module provides a `pgstatindex()`, and other small
functions, which allow you to get b-tree internal information.
I believe this module will be helpful to know b-tree index deeply.So please try it, send comment to me, and have fun.
Thanks,
--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>-----------------------------------------------------
pgbench=# \x
Expanded display is on.
pgbench=# SELECT * FROM pgstatindex('accounts_pkey');
-[ RECORD 1 ]------+--------
version | 2
tree_level | 1
index_size | 3588096
root_block_no | 3
internal_pages | 0
leaf_pages | 437
empty_pages | 0
deleted_pages | 0
avg_leaf_density | 59.5
leaf_fragmentation | 49.89
-----------------------------------------------------[ application/x-gzip is not supported, skipping... ]
---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
Phone: +81-3-3523-8122
diff -ruN pgstattuple.orig/Makefile pgstattuple/Makefile --- pgstattuple.orig/Makefile 2006-08-10 19:22:47.000000000 +0900 +++ pgstattuple/Makefile 2006-08-10 19:24:05.000000000 +0900 @@ -6,7 +6,7 @@ # #--------------------------------------------------------------------------SRCS = pgstattuple.c +SRCS = pgstattuple.c pgstatindex.cMODULE_big = pgstattuple OBJS = $(SRCS:.c=.o) diff -ruN pgstattuple.orig/pgstatindex.c pgstattuple/pgstatindex.c --- pgstattuple.orig/pgstatindex.c 1970-01-01 09:00:00.000000000 +0900 +++ pgstattuple/pgstatindex.c 2006-08-11 17:51:26.000000000 +0900 @@ -0,0 +1,714 @@ +/* + * pgstatindex + * + * Copyright (c) 2006 Satoshi Nagayasu <nagayasus@nttdata.co.jp> + * + * Permission to use, copy, modify, and distribute this software and + * its documentation for any purpose, without fee, and without a + * written agreement is hereby granted, provided that the above + * copyright notice and this paragraph and the following two + * paragraphs appear in all copies. + * + * IN NO EVENT SHALL THE AUTHOR BE LIABLE TO ANY PARTY FOR DIRECT, + * INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING + * LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS + * DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED + * OF THE POSSIBILITY OF SUCH DAMAGE. + * + * THE AUTHOR SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS + * IS" BASIS, AND THE AUTHOR HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE, + * SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS. + */ + +#include "postgres.h" + +#include "fmgr.h" +#include "funcapi.h" +#include "access/heapam.h" +#include "access/itup.h" +#include "access/nbtree.h" +#include "access/transam.h" +#include "catalog/namespace.h" +#include "catalog/pg_type.h" +#include "utils/builtins.h" +#include "utils/inval.h" + +PG_FUNCTION_INFO_V1(pgstatindex); +PG_FUNCTION_INFO_V1(bt_metap); +PG_FUNCTION_INFO_V1(bt_page_items); +PG_FUNCTION_INFO_V1(bt_page_stats); +PG_FUNCTION_INFO_V1(pg_relpages); + +extern Datum pgstatindex(PG_FUNCTION_ARGS); +extern Datum bt_metap(PG_FUNCTION_ARGS); +extern Datum bt_page_items(PG_FUNCTION_ARGS); +extern Datum bt_page_stats(PG_FUNCTION_ARGS); +extern Datum pg_relpages(PG_FUNCTION_ARGS); + +#define PGSTATINDEX_TYPE "public.pgstatindex_type" +#define PGSTATINDEX_NCOLUMNS 10 + +#define BTMETAP_TYPE "public.bt_metap_type" +#define BTMETAP_NCOLUMNS 6 + +#define BTPAGEITEMS_TYPE "public.bt_page_items_type" +#define BTPAGEITEMS_NCOLUMNS 6 + +#define BTPAGESTATS_TYPE "public.bt_page_stats_type" +#define BTPAGESTATS_NCOLUMNS 12 + + +#define IS_INDEX(r) ((r)->rd_rel->relkind == 'i') +#define IS_BTREE(r) ((r)->rd_rel->relam == BTREE_AM_OID) + +#define CHECK_PAGE_OFFSET_RANGE(page, offset) { \ + if ( !(FirstOffsetNumber<=(offset) && \ + (offset)<=PageGetMaxOffsetNumber(page)) ) \ + elog(ERROR, "Page offset number out of range."); } + +#define CHECK_RELATION_BLOCK_RANGE(rel, blkno) { \ + if ( (blkno)<0 && RelationGetNumberOfBlocks((rel))<=(blkno) ) \ + elog(ERROR, "Block number out of range."); } + +/* ------------------------------------------------ + * structure for single btree page statistics + * ------------------------------------------------ + */ +typedef struct BTPageStat { + uint32 blkno; + uint32 live_items; + uint32 dead_items; + uint32 page_size; + uint32 max_avail; + uint32 free_size; + uint32 avg_item_size; + uint32 fragments; + bool is_root; + bool is_internal; + bool is_leaf; + bool is_deleted; + bool is_empty; +} BTPageStat; + +/* ------------------------------------------------ + * A structure for a whole btree index statistics + * used by pgstatindex(). + * ------------------------------------------------ + */ +typedef struct BTIndexStat { + uint32 version; + BlockNumber root_blkno; + uint32 level; + + uint32 live_items; + uint32 dead_items; + + uint32 root_pages; + uint32 internal_pages; + uint32 leaf_pages; + uint32 empty_pages; + uint32 deleted_pages; + + uint32 page_size; + uint32 avg_item_size; + + uint32 max_avail; + uint32 free_space; + + uint32 fragments; +} BTIndexStat; + +/* ------------------------------------------------- + * GetBTPageStatistics() + * + * Collect statistics of single b-tree leaf page + * ------------------------------------------------- + */ +static bool +GetBTPageStatistics(BlockNumber blkno, Buffer buffer, BTPageStat *stat) +{ + Page page = BufferGetPage(buffer); + PageHeader phdr = (PageHeader) page; + OffsetNumber maxoff = PageGetMaxOffsetNumber(page); + BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page); + int item_size = 0; + int off; + + stat->blkno = blkno; + + stat->max_avail = BLCKSZ - ( BLCKSZ - phdr->pd_special + SizeOfPageHeaderData ); + + stat->dead_items = stat->live_items = 0; + + stat->page_size = PageGetPageSize(page); + + /* page type */ + stat->is_root = false; + stat->is_leaf = false; + stat->is_deleted = false; + stat->is_empty = false; + + if ( P_ISDELETED(opaque) ) + { + stat->is_deleted = true; + return true; + } + else if ( P_IGNORE(opaque) ) + stat->is_empty = true; + else if ( P_ISLEAF(opaque) ) + stat->is_leaf = true; + else if ( P_ISROOT(opaque) ) + stat->is_root = true; + else + stat->is_internal = true; + + /*---------------------------------------------- + * If a next leaf is on the previous block, + * it means a fragmentation. + *---------------------------------------------- + */ + stat->fragments = 0; + if ( stat->is_leaf ) + { + if ( opaque->btpo_next != P_NONE && opaque->btpo_next < blkno ) + stat->fragments++; + } + + /* count live and dead tuples, and free space */ + for (off=FirstOffsetNumber ; off<=maxoff ; off++) + { + BTItem btitem; + IndexTuple itup; + + ItemId id = PageGetItemId(page, off); + + btitem = (BTItem)PageGetItem(page, id); + + itup = &(btitem->bti_itup); + + item_size += IndexTupleSize(itup); + + if ( !ItemIdDeleted(id) ) + stat->live_items++; + else + stat->dead_items++; + } + stat->free_size = PageGetFreeSpace(page); + +#ifdef NOT_USED + elog(NOTICE, "%d/%d (%.2f%%)", stat->free_size, stat->max_avail, + (float)stat->free_size/(float)stat->max_avail*100.0); +#endif + + if ( (stat->live_items + stat->dead_items) > 0 ) + stat->avg_item_size = item_size / ( stat->live_items + stat->dead_items ); + else + stat->avg_item_size = 0; + + return true; +} + + +/* ------------------------------------------------------ + * pgstatindex() + * + * Usage: SELECT * FROM pgstatindex('t1_pkey'); + * ------------------------------------------------------ + */ +Datum +pgstatindex(PG_FUNCTION_ARGS) +{ + text *relname = PG_GETARG_TEXT_P(0); + Relation rel; + RangeVar *relrv; + Datum result; + uint32 nblocks; + uint32 blkno; + BTIndexStat indexStat; + + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname)); + rel = relation_openrv(relrv, AccessShareLock); + + if ( !IS_INDEX(rel) || !IS_BTREE(rel) ) + elog(ERROR, "pgstatindex() can be used only on b-tree index."); + + /*------------------- + * Read a metapage + *------------------- + */ + { + Buffer buffer = ReadBuffer(rel, 0); + Page page = BufferGetPage(buffer); + BTMetaPageData *metad = BTPageGetMeta(page); + +// snprintf(values[j++], 32, "%d", metad->btm_magic); + indexStat.version = metad->btm_version; + indexStat.root_blkno = metad->btm_root; + indexStat.level = metad->btm_level; +// snprintf(values[j++], 32, "%d", metad->btm_fastroot); +// snprintf(values[j++], 32, "%d", metad->btm_fastlevel); + + ReleaseBuffer(buffer); + } + + nblocks = RelationGetNumberOfBlocks(rel); + + /* -- init stat -- */ + indexStat.fragments = 0; + + indexStat.root_pages = 0; + indexStat.leaf_pages = 0; + indexStat.internal_pages = 0; + indexStat.empty_pages = 0; + indexStat.deleted_pages = 0; + + indexStat.max_avail = 0; + indexStat.free_space = 0; + + /*----------------------- + * Scan all blocks + *----------------------- + */ + for (blkno=1 ; blkno<nblocks ; blkno++) + { + Buffer buffer = ReadBuffer(rel, blkno); + BTPageStat stat; + + /* scan one page */ + stat.blkno = blkno; + GetBTPageStatistics(blkno, buffer, &stat); + + /*--------------------- + * page status (type) + *--------------------- + */ + if ( stat.is_deleted ) + indexStat.deleted_pages++; + else if ( stat.is_leaf ) + indexStat.leaf_pages++; + else if ( stat.is_internal ) + indexStat.internal_pages++; + else if ( stat.is_empty ) + indexStat.empty_pages++; + else if ( stat.is_root ) + indexStat.root_pages++; + else + elog(ERROR, "unknown page status."); + + /* -- leaf fragmentation -- */ + indexStat.fragments += stat.fragments; + + if ( stat.is_leaf ) + { + indexStat.max_avail += stat.max_avail; + indexStat.free_space += stat.free_size; + } + + ReleaseBuffer(buffer); + } + + relation_close(rel, AccessShareLock); + +#ifdef NOT_USED + elog(NOTICE, "[index information]"); + elog(NOTICE, "version.................: %d", indexStat.version); + elog(NOTICE, "tree level..............: %d", indexStat.level); + elog(NOTICE, "index size..............: %d", (indexStat.root_pages + + indexStat.leaf_pages + + indexStat.internal_pages + + indexStat.deleted_pages + + indexStat.empty_pages) * BLCKSZ); + + elog(NOTICE, ""); + elog(NOTICE, "[page statistics]"); + elog(NOTICE, "root block number.......: %d", indexStat.root_blkno); + elog(NOTICE, "internal pages..........: %d", indexStat.internal_pages); + elog(NOTICE, "leaf pages..............: %d", indexStat.leaf_pages); + elog(NOTICE, "empty pages.............: %d", indexStat.empty_pages); + elog(NOTICE, "deleted pages...........: %d", indexStat.deleted_pages); + + elog(NOTICE, ""); + elog(NOTICE, "[leaf statistics]"); + elog(NOTICE, "avg. leaf density.......: %.2f%%", 100.0-(float)indexStat.free_space / (float)indexStat.max_avail * 100.0 ); + elog(NOTICE, "leaf fragmentation......: %.2f%%", (float)indexStat.fragments / 2.0 / (float)indexStat.leaf_pages * 100.0 ); +#endif + + /*---------------------------- + * Build a result tuple + *---------------------------- + */ + { + TupleDesc tupleDesc; + int j; + char *values[PGSTATINDEX_NCOLUMNS]; + + HeapTupleData tupleData; + HeapTuple tuple = &tupleData; + + tupleDesc = RelationNameGetTupleDesc(PGSTATINDEX_TYPE); + + j = 0; + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", indexStat.version); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", indexStat.level); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", (indexStat.root_pages + + indexStat.leaf_pages + + indexStat.internal_pages + + indexStat.deleted_pages + + indexStat.empty_pages) * BLCKSZ ); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", indexStat.root_blkno); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", indexStat.internal_pages); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", indexStat.leaf_pages); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", indexStat.empty_pages); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", indexStat.deleted_pages); + values[j] = palloc(32); + snprintf(values[j++], 32, "%.2f", 100.0-(float)indexStat.free_space / (float)indexStat.max_avail * 100.0 ); + values[j] = palloc(32); + snprintf(values[j++], 32, "%.2f", (float)indexStat.fragments / (float)indexStat.leaf_pages * 100.0 ); + + tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc), + values); + + result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple); + } + + PG_RETURN_DATUM( result ); +} + +/* ----------------------------------------------- + * bt_page() + * + * Usage: SELECT * FROM bt_page('t1_pkey', 0); + * ----------------------------------------------- + */ +Datum +bt_page_stats(PG_FUNCTION_ARGS) +{ + text *relname = PG_GETARG_TEXT_P(0); + uint32 blkno = PG_GETARG_UINT32(1); + Buffer buffer; + + Relation rel; + RangeVar *relrv; + Datum result; + + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname)); + rel = relation_openrv(relrv, AccessShareLock); + + CHECK_RELATION_BLOCK_RANGE(rel, blkno); + + buffer = ReadBuffer(rel, blkno); + + if ( !IS_INDEX(rel) || !IS_BTREE(rel) ) + elog(ERROR, "bt_page_stats() can be used only on b-tree index."); + + if ( blkno==0 ) + elog(ERROR, "Block 0 is a meta page."); + + { + HeapTuple tuple; + TupleDesc tupleDesc; + int j; + char *values[BTPAGESTATS_NCOLUMNS]; + + BTPageStat stat; + + GetBTPageStatistics(blkno, buffer, &stat); + + tupleDesc = RelationNameGetTupleDesc(BTPAGESTATS_TYPE); + + j = 0; + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.blkno); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.live_items); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.dead_items); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.live_items + stat.dead_items); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.avg_item_size); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.page_size); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.free_size); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.is_deleted); + values[j] = palloc(32); +// snprintf(values[j++], 32, "%d", opaque->btpo_prev); + snprintf(values[j++], 32, "%d", 0); /*-- not implemented yet --*/ + values[j] = palloc(32); +// snprintf(values[j++], 32, "%d", opaque->btpo_next); + snprintf(values[j++], 32, "%d", 0); /*-- not implemented yet --*/ + values[j] = palloc(32); +// snprintf(values[j++], 32, "%d", opaque->btpo.level); + snprintf(values[j++], 32, "%d", 0); /*-- not implemented yet --*/ + values[j] = palloc(32); +// snprintf(values[j++], 32, "%d", opaque->btpo_flags); + snprintf(values[j++], 32, "%d", 0); /*-- not implemented yet --*/ + + tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc), + values); + + result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple); + } + + ReleaseBuffer(buffer); + + relation_close(rel, AccessShareLock); + + PG_RETURN_DATUM(result); +} + +/*------------------------------------------------------- + * bt_page_items() + * + * Get IndexTupleData set in a leaf page + * + * Usage: SELECT * FROM bt_page_items('t1_pkey', 0); + *------------------------------------------------------- + */ +/* --------------------------------------------------- + * data structure for SRF to hold a scan information + * --------------------------------------------------- + */ +struct user_args +{ + TupleDesc tupd; + Relation rel; + Buffer buffer; + Page page; + uint16 offset; +}; + +Datum +bt_page_items(PG_FUNCTION_ARGS) +{ + text *relname = PG_GETARG_TEXT_P(0); + uint32 blkno = PG_GETARG_UINT32(1); + + RangeVar *relrv; + Datum result; + char *values[BTPAGEITEMS_NCOLUMNS]; + BTPageOpaque opaque; + HeapTuple tuple; + ItemId id; + + FuncCallContext *fctx; + MemoryContext mctx; + struct user_args *uargs = NULL; + + if ( blkno==0 ) + elog(ERROR, "Block 0 is a meta page."); + + if ( SRF_IS_FIRSTCALL() ) + { + fctx = SRF_FIRSTCALL_INIT(); + mctx = MemoryContextSwitchTo(fctx->multi_call_memory_ctx); + + uargs = palloc(sizeof(struct user_args)); + + uargs->tupd = RelationNameGetTupleDesc(BTPAGEITEMS_TYPE); + uargs->offset = FirstOffsetNumber; + + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname)); + uargs->rel = relation_openrv(relrv, AccessShareLock); + + CHECK_RELATION_BLOCK_RANGE(uargs->rel, blkno); + + uargs->buffer = ReadBuffer(uargs->rel, blkno); + + if ( !IS_INDEX(uargs->rel) || !IS_BTREE(uargs->rel) ) + elog(ERROR, "bt_page_items() can be used only on b-tree index."); + + uargs->page = BufferGetPage(uargs->buffer); + + opaque = (BTPageOpaque) PageGetSpecialPointer(uargs->page); + + if ( P_ISDELETED(opaque) ) + elog(NOTICE, "bt_page_items(): this page is deleted."); + + fctx->max_calls = PageGetMaxOffsetNumber(uargs->page); + fctx->user_fctx = uargs; + + MemoryContextSwitchTo(mctx); + } + + fctx = SRF_PERCALL_SETUP(); + uargs = fctx->user_fctx; + + if ( fctx->call_cntr < fctx->max_calls ) + { + BTItem btitem; + IndexTuple itup; + + id = PageGetItemId(uargs->page, uargs->offset); + + if ( !ItemIdIsValid(id) ) + elog(ERROR, "Invalid ItemId."); + + btitem = (BTItem)PageGetItem(uargs->page, id); + itup = &(btitem->bti_itup); + + { + int j = 0; + + BlockNumber blkno = BlockIdGetBlockNumber(&(itup->t_tid.ip_blkid)); + + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", uargs->offset); + values[j] = palloc(32); + snprintf(values[j++], 32, "(%u,%u)", blkno, itup->t_tid.ip_posid); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", IndexTupleSize(itup)); + values[j] = palloc(32); + snprintf(values[j++], 32, "%c", IndexTupleHasNulls(itup) ? 't' : 'f'); + values[j] = palloc(32); + snprintf(values[j++], 32, "%c", IndexTupleHasVarwidths(itup) ? 't' : 'f'); + + { + int off; + char *dump; + char *ptr = (char *)itup + IndexInfoFindDataOffset(itup->t_info); + + dump = palloc(IndexTupleSize(itup)*3); + memset(dump, 0, IndexTupleSize(itup)*3); + + for (off=0 ; + off < IndexTupleSize(itup)-IndexInfoFindDataOffset(itup->t_info) ; + off++) + { + if ( dump[0]=='\0' ) + sprintf(dump, "%02x", *(ptr+off) & 0xff); + else + { + char buf[4]; + sprintf(buf, " %02x", *(ptr+off) & 0xff); + strcat(dump, buf); + } + } + values[j] = dump; + } + + tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(uargs->tupd), values); + result = TupleGetDatum(TupleDescGetSlot(uargs->tupd), tuple); + } + + uargs->offset = uargs->offset + 1; + + SRF_RETURN_NEXT(fctx, result); + } + else + { + ReleaseBuffer(uargs->buffer); + relation_close(uargs->rel, AccessShareLock); + + SRF_RETURN_DONE(fctx); + } +} + + +/* ------------------------------------------------ + * bt_metap() + * + * Get a btree meta-page information + * + * Usage: SELECT * FROM bt_metap('t1_pkey') + * ------------------------------------------------ + */ +Datum +bt_metap(PG_FUNCTION_ARGS) +{ + text *relname = PG_GETARG_TEXT_P(0); + Buffer buffer; + + Relation rel; + RangeVar *relrv; + Datum result; + + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname)); + rel = relation_openrv(relrv, AccessShareLock); + + if ( !IS_INDEX(rel) || !IS_BTREE(rel) ) + elog(ERROR, "bt_metap() can be used only on b-tree index."); + + buffer = ReadBuffer(rel, 0); + + { + BTMetaPageData *metad; + + TupleDesc tupleDesc; + int j; + char *values[BTMETAP_NCOLUMNS]; + HeapTuple tuple; + + Page page = BufferGetPage(buffer); + + metad = BTPageGetMeta(page); + + tupleDesc = RelationNameGetTupleDesc(BTMETAP_TYPE); + + j = 0; + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", metad->btm_magic); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", metad->btm_version); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", metad->btm_root); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", metad->btm_level); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", metad->btm_fastroot); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", metad->btm_fastlevel); + + tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc), + values); + + result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple); + } + + ReleaseBuffer(buffer); + + relation_close(rel, AccessShareLock); + + PG_RETURN_DATUM(result); +} + +/* -------------------------------------------------------- + * pg_relpages() + * + * Get a number of pages of the table/index. + * + * Usage: SELECT pg_relpages('t1'); + * SELECT pg_relpages('t1_pkey'); + * -------------------------------------------------------- + */ +Datum +pg_relpages(PG_FUNCTION_ARGS) +{ + text *relname = PG_GETARG_TEXT_P(0); + + Relation rel; + RangeVar *relrv; + int4 relpages; + + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname)); + rel = relation_openrv(relrv, AccessShareLock); + + relpages = RelationGetNumberOfBlocks(rel); + + relation_close(rel, AccessShareLock); + + PG_RETURN_INT32(relpages); +} + diff -ruN pgstattuple.orig/pgstattuple.sql.in pgstattuple/pgstattuple.sql.in --- pgstattuple.orig/pgstattuple.sql.in 2006-08-10 19:22:47.000000000 +0900 +++ pgstattuple/pgstattuple.sql.in 2006-08-11 17:46:19.000000000 +0900 @@ -22,3 +22,97 @@ RETURNS pgstattuple_type AS 'MODULE_PATHNAME', 'pgstattuplebyid' LANGUAGE 'C' STRICT; + +-- +-- pgstatindex +-- +DROP TYPE pgstatindex_type CASCADE; +CREATE TYPE pgstatindex_type AS ( + version int4, + tree_level int4, + index_size int4, + root_block_no int4, + internal_pages int4, + leaf_pages int4, + empty_pages int4, + deleted_pages int4, + avg_leaf_density float8, + leaf_fragmentation float8 +); + +CREATE OR REPLACE FUNCTION pgstatindex(text) +RETURNS pgstatindex_type +AS 'MODULE_PATHNAME', 'pgstatindex' +LANGUAGE 'C' STRICT; + +-- +-- bt_metap() +-- +DROP TYPE bt_metap_type CASCADE; +CREATE TYPE bt_metap_type AS ( + magic int4, + version int4, + root int4, + level int4, + fastroot int4, + fastlevel int4 +); + +CREATE OR REPLACE FUNCTION bt_metap(text) +RETURNS bt_metap_type +AS 'MODULE_PATHNAME', 'bt_metap' +LANGUAGE 'C' STRICT; + +-- +-- bt_page_items() +-- +DROP TYPE bt_page_items_type CASCADE; +CREATE TYPE bt_page_items_type AS ( + itemoffset int4, + ctid tid, + itemlen int4, + nulls bool, + vars bool, + data text +); + +DROP FUNCTION bt_page_items(text, int4); + +CREATE OR REPLACE FUNCTION bt_page_items(text, int4) +RETURNS SETOF bt_page_items_type +AS 'MODULE_PATHNAME', 'bt_page_items' +LANGUAGE 'C' STRICT; + +-- +-- bt_page_stats() +-- +DROP TYPE bt_page_stats_type CASCADE; +CREATE TYPE bt_page_stats_type AS ( + blkno int4, + live_items int4, + dead_items int4, + total_items int4, + avg_item_size float, + page_size int4, + free_size int4, + is_deleted int4, + btpo_prev int4, + btpo_next int4, + btpo_level int4, + btpo_flags int4 +); + +DROP FUNCTION bt_page_stats(text, int4); + +CREATE OR REPLACE FUNCTION bt_page_stats(text, int4) +RETURNS bt_page_stats_type +AS 'MODULE_PATHNAME', 'bt_page_stats' +LANGUAGE 'C' STRICT; + +-- +-- pg_relpages() +-- +CREATE OR REPLACE FUNCTION pg_relpages(text) +RETURNS int +AS 'MODULE_PATHNAME', 'pg_relpages' +LANGUAGE 'C' STRICT; diff -ruN pgstattuple.orig/test.sh pgstattuple/test.sh --- pgstattuple.orig/test.sh 1970-01-01 09:00:00.000000000 +0900 +++ pgstattuple/test.sh 2006-08-11 17:47:05.000000000 +0900 @@ -0,0 +1,22 @@ +#!/bin/sh + +export PATH=/usr/local/pgsql814/bin:$PATH + +psql pgbench<<EOF +DROP FUNCTION pgstatindex(text); +EOF + +psql -f /usr/local/pgsql814/share/contrib/pgstattuple.sql pgbench + +psql pgbench<<EOF +SELECT * FROM pg_relpages('accounts_pkey'); +\x +SELECT * FROM pgstatindex('accounts_pkey'); +SELECT * FROM bt_metap('accounts_pkey'); +\x +SELECT * FROM bt_page_items('accounts_pkey', 0); +SELECT * FROM bt_page_items('accounts_pkey', 1) limit 20; + +\x +SELECT * FROM bt_page_stats('accounts_pkey', 1); +EOF---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster
--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Bruce,
Bruce Momjian wrote:
BTIem is no longer in CVS HEAD, though it was in 8.1.X. Please update
your patch for CVS HEAD. Thanks.
I've posted CVS HEAD workable version on Aug.14.
Please check it out. Thanks.
--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
Phone: +81-3-3523-8122
Satoshi Nagayasu wrote:
Bruce,
Bruce Momjian wrote:
BTIem is no longer in CVS HEAD, though it was in 8.1.X. Please update
your patch for CVS HEAD. Thanks.I've posted CVS HEAD workable version on Aug.14.
Please check it out. Thanks.
OK, I found it, but it has no updates to README.pgstattuple to describe
the new functionality. Should I write it?
--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Sorry, I'll write README (and uninstall.sql?) by tomorrow.
Bruce Momjian wrote:
Satoshi Nagayasu wrote:
Bruce,
Bruce Momjian wrote:
BTIem is no longer in CVS HEAD, though it was in 8.1.X. Please update
your patch for CVS HEAD. Thanks.I've posted CVS HEAD workable version on Aug.14.
Please check it out. Thanks.OK, I found it, but it has no updates to README.pgstattuple to describe
the new functionality. Should I write it?
--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
Phone: +81-3-3523-8122
Satoshi Nagayasu wrote:
Sorry, I'll write README (and uninstall.sql?) by tomorrow.
Thanks. Yea, you need to update the uninstall too.
---------------------------------------------------------------------------
Bruce Momjian wrote:
Satoshi Nagayasu wrote:
Bruce,
Bruce Momjian wrote:
BTIem is no longer in CVS HEAD, though it was in 8.1.X. Please update
your patch for CVS HEAD. Thanks.I've posted CVS HEAD workable version on Aug.14.
Please check it out. Thanks.OK, I found it, but it has no updates to README.pgstattuple to describe
the new functionality. Should I write it?--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
Phone: +81-3-3523-8122
--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
On Fri, Aug 18, 2006 at 09:15:59AM +0900, Satoshi Nagayasu wrote:
ITAGAKI Takahiro wrote:
But the method has the above problem. So I suggest to use whether
the right link points to the next adjacent page or not.if (opaque->btpo_next != P_NONE && opaque->btpo_next != blkno + 1)
stat->fragments++;Well, in that way, following two conditions,
[1] [x] [2] [y] [3]
and
[3] [x] [2] [y] [1]
will be calculated as same fragmentation ratio(100%), I can't agree
with that, because both will generate different costs while index scan
in the real world (I don't care about page splitting algorithm now).
What about just reporting the correlation of pages in the index, as well
as fragmentation?
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Patch applied. Thanks.
I updated the README documentation for the new functions, attached. I
could not update the Japanese version of the README.
---------------------------------------------------------------------------
Satoshi Nagayasu wrote:
Bruce,
Attached patch has been cleaned up,
and modified to be able to work with CVS HEAD.Thanks.
Satoshi Nagayasu wrote:
Alvaro,
Alvaro Herrera wrote:
Huh, I bet it works with 8.1.4, but it doesn't work on CVS HEAD:
/pgsql/source/00orig/contrib/pgstattuple/pgstatindex.c: In function 'GetBTPageStatistics':
/pgsql/source/00orig/contrib/pgstattuple/pgstatindex.c:182: error: 'BTItem' undeclared (first use in this function)While you're at it, please consider removing C++ style comments and
unused code.Formatting is way off as well, but I guess that is easily fixed with
pgindent.Thanks for comments. I'm going to fix my patch from now.
Regarding the pg_relpages function, why do you think it's necessary?
(It returns the true number of blocks of a given relation). It may
belong into core given a reasonable use case, but otherwise it doesn't
seem to belong into pgstatindex (or pgstattuple for that matter).I wanted to sample some pages from the table/index, and get their statistics
to know table/index conditions. I know pgstattuple() reports table
statistics, however, pgstattuple() generates heavy CPU and I/O load.When we need to sample some pages from table/index, we need to know
true number of blocks.I have another function, called pgstatpage(), to get information inside
a single block/page statistics of the table. pg_relpages() will be used
with this.Sorry for not mentioned in previous post about pgstatpage(),
but I've remembered about it just now.Many memories in my brain have already `paged-out` (too busy in last few months),
and some of them got `out-of-memory`. :^)Thanks.
--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
Phone: +81-3-3523-8122
diff -ruN pgstattuple.orig/Makefile pgstattuple/Makefile --- pgstattuple.orig/Makefile 2006-02-27 21:54:40.000000000 +0900 +++ pgstattuple/Makefile 2006-08-14 09:28:58.000000000 +0900 @@ -6,7 +6,7 @@ # #--------------------------------------------------------------------------SRCS = pgstattuple.c +SRCS = pgstattuple.c pgstatindex.cMODULE_big = pgstattuple OBJS = $(SRCS:.c=.o) diff -ruN pgstattuple.orig/pgstatindex.c pgstattuple/pgstatindex.c --- pgstattuple.orig/pgstatindex.c 1970-01-01 09:00:00.000000000 +0900 +++ pgstattuple/pgstatindex.c 2006-08-14 11:24:23.000000000 +0900 @@ -0,0 +1,706 @@ +/* + * pgstatindex + * + * Copyright (c) 2006 Satoshi Nagayasu <nagayasus@nttdata.co.jp> + * + * Permission to use, copy, modify, and distribute this software and + * its documentation for any purpose, without fee, and without a + * written agreement is hereby granted, provided that the above + * copyright notice and this paragraph and the following two + * paragraphs appear in all copies. + * + * IN NO EVENT SHALL THE AUTHOR BE LIABLE TO ANY PARTY FOR DIRECT, + * INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING + * LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS + * DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED + * OF THE POSSIBILITY OF SUCH DAMAGE. + * + * THE AUTHOR SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS + * IS" BASIS, AND THE AUTHOR HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE, + * SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS. + */ + +#include "postgres.h" + +#include "fmgr.h" +#include "funcapi.h" +#include "access/heapam.h" +#include "access/itup.h" +#include "access/nbtree.h" +#include "access/transam.h" +#include "catalog/namespace.h" +#include "catalog/pg_type.h" +#include "utils/builtins.h" +#include "utils/inval.h" + +PG_FUNCTION_INFO_V1(pgstatindex); +PG_FUNCTION_INFO_V1(bt_metap); +PG_FUNCTION_INFO_V1(bt_page_items); +PG_FUNCTION_INFO_V1(bt_page_stats); +PG_FUNCTION_INFO_V1(pg_relpages); + +extern Datum pgstatindex(PG_FUNCTION_ARGS); +extern Datum bt_metap(PG_FUNCTION_ARGS); +extern Datum bt_page_items(PG_FUNCTION_ARGS); +extern Datum bt_page_stats(PG_FUNCTION_ARGS); +extern Datum pg_relpages(PG_FUNCTION_ARGS); + +#define PGSTATINDEX_TYPE "public.pgstatindex_type" +#define PGSTATINDEX_NCOLUMNS 10 + +#define BTMETAP_TYPE "public.bt_metap_type" +#define BTMETAP_NCOLUMNS 6 + +#define BTPAGEITEMS_TYPE "public.bt_page_items_type" +#define BTPAGEITEMS_NCOLUMNS 6 + +#define BTPAGESTATS_TYPE "public.bt_page_stats_type" +#define BTPAGESTATS_NCOLUMNS 11 + + +#define IS_INDEX(r) ((r)->rd_rel->relkind == 'i') +#define IS_BTREE(r) ((r)->rd_rel->relam == BTREE_AM_OID) + +#define CHECK_PAGE_OFFSET_RANGE(page, offset) { \ + if ( !(FirstOffsetNumber<=(offset) && \ + (offset)<=PageGetMaxOffsetNumber(page)) ) \ + elog(ERROR, "Page offset number out of range."); } + +#define CHECK_RELATION_BLOCK_RANGE(rel, blkno) { \ + if ( (blkno)<0 && RelationGetNumberOfBlocks((rel))<=(blkno) ) \ + elog(ERROR, "Block number out of range."); } + +/* ------------------------------------------------ + * structure for single btree page statistics + * ------------------------------------------------ + */ +typedef struct BTPageStat +{ + uint32 blkno; + uint32 live_items; + uint32 dead_items; + uint32 page_size; + uint32 max_avail; + uint32 free_size; + uint32 avg_item_size; + uint32 fragments; + char type; + + /* opaque data */ + BlockNumber btpo_prev; + BlockNumber btpo_next; + union + { + uint32 level; + TransactionId xact; + } btpo; + uint16 btpo_flags; + BTCycleId btpo_cycleid; +} BTPageStat; + +/* ------------------------------------------------ + * A structure for a whole btree index statistics + * used by pgstatindex(). + * ------------------------------------------------ + */ +typedef struct BTIndexStat +{ + uint32 magic; + uint32 version; + BlockNumber root_blkno; + uint32 level; + + BlockNumber fastroot; + uint32 fastlevel; + + uint32 live_items; + uint32 dead_items; + + uint32 root_pages; + uint32 internal_pages; + uint32 leaf_pages; + uint32 empty_pages; + uint32 deleted_pages; + + uint32 page_size; + uint32 avg_item_size; + + uint32 max_avail; + uint32 free_space; + + uint32 fragments; +} BTIndexStat; + +/* ------------------------------------------------- + * GetBTPageStatistics() + * + * Collect statistics of single b-tree leaf page + * ------------------------------------------------- + */ +static bool +GetBTPageStatistics(BlockNumber blkno, Buffer buffer, BTPageStat * stat) +{ + Page page = BufferGetPage(buffer); + PageHeader phdr = (PageHeader) page; + OffsetNumber maxoff = PageGetMaxOffsetNumber(page); + BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page); + int item_size = 0; + int off; + + stat->blkno = blkno; + + stat->max_avail = BLCKSZ - (BLCKSZ - phdr->pd_special + SizeOfPageHeaderData); + + stat->dead_items = stat->live_items = 0; + + stat->page_size = PageGetPageSize(page); + + /* page type (flags) */ + if (P_ISDELETED(opaque)) + { + stat->type = 'd'; + return true; + } + else if (P_IGNORE(opaque)) + stat->type = 'e'; + else if (P_ISLEAF(opaque)) + stat->type = 'l'; + else if (P_ISROOT(opaque)) + stat->type = 'r'; + else + stat->type = 'i'; + + /* btpage opaque data */ + stat->btpo_prev = opaque->btpo_prev; + stat->btpo_next = opaque->btpo_next; + if (P_ISDELETED(opaque)) + stat->btpo.xact = opaque->btpo.xact; + else + stat->btpo.level = opaque->btpo.level; + stat->btpo_flags = opaque->btpo_flags; + stat->btpo_cycleid = opaque->btpo_cycleid; + + /*---------------------------------------------- + * If a next leaf is on the previous block, + * it means a fragmentation. + *---------------------------------------------- + */ + stat->fragments = 0; + if (stat->type == 'l') + { + if (opaque->btpo_next != P_NONE && opaque->btpo_next < blkno) + stat->fragments++; + } + + /* count live and dead tuples, and free space */ + for (off = FirstOffsetNumber; off <= maxoff; off++) + { + IndexTuple itup; + + ItemId id = PageGetItemId(page, off); + + itup = (IndexTuple) PageGetItem(page, id); + + item_size += IndexTupleSize(itup); + + if (!ItemIdDeleted(id)) + stat->live_items++; + else + stat->dead_items++; + } + stat->free_size = PageGetFreeSpace(page); + + if ((stat->live_items + stat->dead_items) > 0) + stat->avg_item_size = item_size / (stat->live_items + stat->dead_items); + else + stat->avg_item_size = 0; + + return true; +} + + +/* ------------------------------------------------------ + * pgstatindex() + * + * Usage: SELECT * FROM pgstatindex('t1_pkey'); + * ------------------------------------------------------ + */ +Datum +pgstatindex(PG_FUNCTION_ARGS) +{ + text *relname = PG_GETARG_TEXT_P(0); + Relation rel; + RangeVar *relrv; + Datum result; + uint32 nblocks; + uint32 blkno; + BTIndexStat indexStat; + + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname)); + rel = relation_openrv(relrv, AccessShareLock); + + if (!IS_INDEX(rel) || !IS_BTREE(rel)) + elog(ERROR, "pgstatindex() can be used only on b-tree index."); + + /*------------------- + * Read a metapage + *------------------- + */ + { + Buffer buffer = ReadBuffer(rel, 0); + Page page = BufferGetPage(buffer); + BTMetaPageData *metad = BTPageGetMeta(page); + + indexStat.magic = metad->btm_magic; + indexStat.version = metad->btm_version; + indexStat.root_blkno = metad->btm_root; + indexStat.level = metad->btm_level; + indexStat.fastroot = metad->btm_fastroot; + indexStat.fastlevel = metad->btm_fastlevel; + + ReleaseBuffer(buffer); + } + + nblocks = RelationGetNumberOfBlocks(rel); + + /* -- init stat -- */ + indexStat.fragments = 0; + + indexStat.root_pages = 0; + indexStat.leaf_pages = 0; + indexStat.internal_pages = 0; + indexStat.empty_pages = 0; + indexStat.deleted_pages = 0; + + indexStat.max_avail = 0; + indexStat.free_space = 0; + + /*----------------------- + * Scan all blocks + *----------------------- + */ + for (blkno = 1; blkno < nblocks; blkno++) + { + Buffer buffer = ReadBuffer(rel, blkno); + BTPageStat stat; + + /* scan one page */ + stat.blkno = blkno; + GetBTPageStatistics(blkno, buffer, &stat); + + /*--------------------- + * page status (type) + *--------------------- + */ + switch (stat.type) + { + case 'd': + indexStat.deleted_pages++; + break; + case 'l': + indexStat.leaf_pages++; + break; + case 'i': + indexStat.internal_pages++; + break; + case 'e': + indexStat.empty_pages++; + break; + case 'r': + indexStat.root_pages++; + break; + default: + elog(ERROR, "unknown page status."); + } + + /* -- leaf fragmentation -- */ + indexStat.fragments += stat.fragments; + + if (stat.type == 'l') + { + indexStat.max_avail += stat.max_avail; + indexStat.free_space += stat.free_size; + } + + ReleaseBuffer(buffer); + } + + relation_close(rel, AccessShareLock); + + /*---------------------------- + * Build a result tuple + *---------------------------- + */ + { + TupleDesc tupleDesc; + int j; + char *values[PGSTATINDEX_NCOLUMNS]; + + HeapTupleData tupleData; + HeapTuple tuple = &tupleData; + + tupleDesc = RelationNameGetTupleDesc(PGSTATINDEX_TYPE); + + j = 0; + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", indexStat.version); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", indexStat.level); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", (indexStat.root_pages + + indexStat.leaf_pages + + indexStat.internal_pages + + indexStat.deleted_pages + + indexStat.empty_pages) * BLCKSZ); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", indexStat.root_blkno); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", indexStat.internal_pages); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", indexStat.leaf_pages); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", indexStat.empty_pages); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", indexStat.deleted_pages); + values[j] = palloc(32); + snprintf(values[j++], 32, "%.2f", 100.0 - (float) indexStat.free_space / (float) indexStat.max_avail * 100.0); + values[j] = palloc(32); + snprintf(values[j++], 32, "%.2f", (float) indexStat.fragments / (float) indexStat.leaf_pages * 100.0); + + tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc), + values); + + result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple); + } + + PG_RETURN_DATUM(result); +} + +/* ----------------------------------------------- + * bt_page() + * + * Usage: SELECT * FROM bt_page('t1_pkey', 0); + * ----------------------------------------------- + */ +Datum +bt_page_stats(PG_FUNCTION_ARGS) +{ + text *relname = PG_GETARG_TEXT_P(0); + uint32 blkno = PG_GETARG_UINT32(1); + Buffer buffer; + + Relation rel; + RangeVar *relrv; + Datum result; + + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname)); + rel = relation_openrv(relrv, AccessShareLock); + + CHECK_RELATION_BLOCK_RANGE(rel, blkno); + + buffer = ReadBuffer(rel, blkno); + + if (!IS_INDEX(rel) || !IS_BTREE(rel)) + elog(ERROR, "bt_page_stats() can be used only on b-tree index."); + + if (blkno == 0) + elog(ERROR, "Block 0 is a meta page."); + + { + HeapTuple tuple; + TupleDesc tupleDesc; + int j; + char *values[BTPAGESTATS_NCOLUMNS]; + + BTPageStat stat; + + GetBTPageStatistics(blkno, buffer, &stat); + + tupleDesc = RelationNameGetTupleDesc(BTPAGESTATS_TYPE); + + j = 0; + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.blkno); + + values[j] = palloc(32); + snprintf(values[j++], 32, "%c", stat.type); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.live_items); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.dead_items); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.avg_item_size); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.page_size); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.free_size); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.btpo_prev); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.btpo_next); + + values[j] = palloc(32); + if (stat.type == 'd') + snprintf(values[j++], 32, "%d", stat.btpo.xact); + else + snprintf(values[j++], 32, "%d", stat.btpo.level); + + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.btpo_flags); + + tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc), + values); + + result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple); + } + + ReleaseBuffer(buffer); + + relation_close(rel, AccessShareLock); + + PG_RETURN_DATUM(result); +} + +/*------------------------------------------------------- + * bt_page_items() + * + * Get IndexTupleData set in a leaf page + * + * Usage: SELECT * FROM bt_page_items('t1_pkey', 0); + *------------------------------------------------------- + */ +/* --------------------------------------------------- + * data structure for SRF to hold a scan information + * --------------------------------------------------- + */ +struct user_args +{ + TupleDesc tupd; + Relation rel; + Buffer buffer; + Page page; + uint16 offset; +}; + +Datum +bt_page_items(PG_FUNCTION_ARGS) +{ + text *relname = PG_GETARG_TEXT_P(0); + uint32 blkno = PG_GETARG_UINT32(1); + + RangeVar *relrv; + Datum result; + char *values[BTPAGEITEMS_NCOLUMNS]; + BTPageOpaque opaque; + HeapTuple tuple; + ItemId id; + + FuncCallContext *fctx; + MemoryContext mctx; + struct user_args *uargs = NULL; + + if (blkno == 0) + elog(ERROR, "Block 0 is a meta page."); + + if (SRF_IS_FIRSTCALL()) + { + fctx = SRF_FIRSTCALL_INIT(); + mctx = MemoryContextSwitchTo(fctx->multi_call_memory_ctx); + + uargs = palloc(sizeof(struct user_args)); + + uargs->tupd = RelationNameGetTupleDesc(BTPAGEITEMS_TYPE); + uargs->offset = FirstOffsetNumber; + + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname)); + uargs->rel = relation_openrv(relrv, AccessShareLock); + + CHECK_RELATION_BLOCK_RANGE(uargs->rel, blkno); + + uargs->buffer = ReadBuffer(uargs->rel, blkno); + + if (!IS_INDEX(uargs->rel) || !IS_BTREE(uargs->rel)) + elog(ERROR, "bt_page_items() can be used only on b-tree index."); + + uargs->page = BufferGetPage(uargs->buffer); + + opaque = (BTPageOpaque) PageGetSpecialPointer(uargs->page); + + if (P_ISDELETED(opaque)) + elog(NOTICE, "bt_page_items(): this page is deleted."); + + fctx->max_calls = PageGetMaxOffsetNumber(uargs->page); + fctx->user_fctx = uargs; + + MemoryContextSwitchTo(mctx); + } + + fctx = SRF_PERCALL_SETUP(); + uargs = fctx->user_fctx; + + if (fctx->call_cntr < fctx->max_calls) + { + IndexTuple itup; + + id = PageGetItemId(uargs->page, uargs->offset); + + if (!ItemIdIsValid(id)) + elog(ERROR, "Invalid ItemId."); + + itup = (IndexTuple) PageGetItem(uargs->page, id); + + { + int j = 0; + + BlockNumber blkno = BlockIdGetBlockNumber(&(itup->t_tid.ip_blkid)); + + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", uargs->offset); + values[j] = palloc(32); + snprintf(values[j++], 32, "(%u,%u)", blkno, itup->t_tid.ip_posid); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", IndexTupleSize(itup)); + values[j] = palloc(32); + snprintf(values[j++], 32, "%c", IndexTupleHasNulls(itup) ? 't' : 'f'); + values[j] = palloc(32); + snprintf(values[j++], 32, "%c", IndexTupleHasVarwidths(itup) ? 't' : 'f'); + + { + int off; + char *dump; + char *ptr = (char *) itup + IndexInfoFindDataOffset(itup->t_info); + + dump = palloc(IndexTupleSize(itup) * 3); + memset(dump, 0, IndexTupleSize(itup) * 3); + + for (off = 0; + off < IndexTupleSize(itup) - IndexInfoFindDataOffset(itup->t_info); + off++) + { + if (dump[0] == '\0') + sprintf(dump, "%02x", *(ptr + off) & 0xff); + else + { + char buf[4]; + + sprintf(buf, " %02x", *(ptr + off) & 0xff); + strcat(dump, buf); + } + } + values[j] = dump; + } + + tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(uargs->tupd), values); + result = TupleGetDatum(TupleDescGetSlot(uargs->tupd), tuple); + } + + uargs->offset = uargs->offset + 1; + + SRF_RETURN_NEXT(fctx, result); + } + else + { + ReleaseBuffer(uargs->buffer); + relation_close(uargs->rel, AccessShareLock); + + SRF_RETURN_DONE(fctx); + } +} + + +/* ------------------------------------------------ + * bt_metap() + * + * Get a btree meta-page information + * + * Usage: SELECT * FROM bt_metap('t1_pkey') + * ------------------------------------------------ + */ +Datum +bt_metap(PG_FUNCTION_ARGS) +{ + text *relname = PG_GETARG_TEXT_P(0); + Buffer buffer; + + Relation rel; + RangeVar *relrv; + Datum result; + + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname)); + rel = relation_openrv(relrv, AccessShareLock); + + if (!IS_INDEX(rel) || !IS_BTREE(rel)) + elog(ERROR, "bt_metap() can be used only on b-tree index."); + + buffer = ReadBuffer(rel, 0); + + { + BTMetaPageData *metad; + + TupleDesc tupleDesc; + int j; + char *values[BTMETAP_NCOLUMNS]; + HeapTuple tuple; + + Page page = BufferGetPage(buffer); + + metad = BTPageGetMeta(page); + + tupleDesc = RelationNameGetTupleDesc(BTMETAP_TYPE); + + j = 0; + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", metad->btm_magic); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", metad->btm_version); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", metad->btm_root); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", metad->btm_level); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", metad->btm_fastroot); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", metad->btm_fastlevel); + + tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc), + values); + + result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple); + } + + ReleaseBuffer(buffer); + + relation_close(rel, AccessShareLock); + + PG_RETURN_DATUM(result); +} + +/* -------------------------------------------------------- + * pg_relpages() + * + * Get a number of pages of the table/index. + * + * Usage: SELECT pg_relpages('t1'); + * SELECT pg_relpages('t1_pkey'); + * -------------------------------------------------------- + */ +Datum +pg_relpages(PG_FUNCTION_ARGS) +{ + text *relname = PG_GETARG_TEXT_P(0); + + Relation rel; + RangeVar *relrv; + int4 relpages; + + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname)); + rel = relation_openrv(relrv, AccessShareLock); + + relpages = RelationGetNumberOfBlocks(rel); + + relation_close(rel, AccessShareLock); + + PG_RETURN_INT32(relpages); +} diff -ruN pgstattuple.orig/pgstattuple.sql.in pgstattuple/pgstattuple.sql.in --- pgstattuple.orig/pgstattuple.sql.in 2006-02-28 01:09:50.000000000 +0900 +++ pgstattuple/pgstattuple.sql.in 2006-08-14 10:37:32.000000000 +0900 @@ -22,3 +22,96 @@ RETURNS pgstattuple_type AS 'MODULE_PATHNAME', 'pgstattuplebyid' LANGUAGE C STRICT; + +-- +-- pgstatindex +-- +DROP TYPE pgstatindex_type CASCADE; +CREATE TYPE pgstatindex_type AS ( + version int4, + tree_level int4, + index_size int4, + root_block_no int4, + internal_pages int4, + leaf_pages int4, + empty_pages int4, + deleted_pages int4, + avg_leaf_density float8, + leaf_fragmentation float8 +); + +CREATE OR REPLACE FUNCTION pgstatindex(text) +RETURNS pgstatindex_type +AS 'MODULE_PATHNAME', 'pgstatindex' +LANGUAGE 'C' STRICT; + +-- +-- bt_metap() +-- +DROP TYPE bt_metap_type CASCADE; +CREATE TYPE bt_metap_type AS ( + magic int4, + version int4, + root int4, + level int4, + fastroot int4, + fastlevel int4 +); + +CREATE OR REPLACE FUNCTION bt_metap(text) +RETURNS bt_metap_type +AS 'MODULE_PATHNAME', 'bt_metap' +LANGUAGE 'C' STRICT; + +-- +-- bt_page_items() +-- +DROP TYPE bt_page_items_type CASCADE; +CREATE TYPE bt_page_items_type AS ( + itemoffset int4, + ctid tid, + itemlen int4, + nulls bool, + vars bool, + data text +); + +DROP FUNCTION bt_page_items(text, int4); + +CREATE OR REPLACE FUNCTION bt_page_items(text, int4) +RETURNS SETOF bt_page_items_type +AS 'MODULE_PATHNAME', 'bt_page_items' +LANGUAGE 'C' STRICT; + +-- +-- bt_page_stats() +-- +DROP TYPE bt_page_stats_type CASCADE; +CREATE TYPE bt_page_stats_type AS ( + blkno int4, + type char, + live_items int4, + dead_items int4, + avg_item_size float, + page_size int4, + free_size int4, + btpo_prev int4, + btpo_next int4, + btpo int4, + btpo_flags int4 +); + +DROP FUNCTION bt_page_stats(text, int4); + +CREATE OR REPLACE FUNCTION bt_page_stats(text, int4) +RETURNS bt_page_stats_type +AS 'MODULE_PATHNAME', 'bt_page_stats' +LANGUAGE 'C' STRICT; + +-- +-- pg_relpages() +-- +CREATE OR REPLACE FUNCTION pg_relpages(text) +RETURNS int +AS 'MODULE_PATHNAME', 'pg_relpages' +LANGUAGE 'C' STRICT; diff -ruN pgstattuple.orig/test.sh pgstattuple/test.sh --- pgstattuple.orig/test.sh 1970-01-01 09:00:00.000000000 +0900 +++ pgstattuple/test.sh 2006-08-14 10:23:08.000000000 +0900 @@ -0,0 +1,27 @@ +#!/bin/sh + +PGHOME=/home/snaga/pgsql20060814 +export PATH=${PGHOME}/bin:$PATH + +psql -p 9999 pgbench<<EOF +DROP FUNCTION pgstatindex(text); +EOF + +psql -p 9999 -f ${PGHOME}/share/contrib/pgstattuple.sql pgbench + +psql -p 9999 pgbench<<EOF +SELECT * FROM pg_relpages('accounts_pkey'); +\x +SELECT * FROM pgstatindex('accounts_pkey'); +SELECT * FROM bt_metap('accounts_pkey'); +\x +SELECT * FROM bt_page_items('accounts_pkey', 0); +SELECT * FROM bt_page_items('accounts_pkey', 1) limit 20; + +SELECT * FROM bt_page_stats('accounts_pkey', 1); +\x +SELECT * FROM bt_page_stats('accounts_pkey', 1); +SELECT * FROM bt_page_stats('accounts_pkey', 361); +\x +SELECT * FROM bt_page_items('accounts_pkey', 361); +EOF
--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Attachments:
Bruce Momjian <bruce@momjian.us> writes:
Patch applied. Thanks.
For some reason I expected this patch to correct the portability errors
and design problems identified here:
http://archives.postgresql.org/pgsql-patches/2006-07/msg00100.php
Not only has it not fixed anything, it's made things worse:
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -g -fpic -I. -I../../src/include -D_GNU_SOURCE -c -o pgstattuple.o pgstattuple.c
pgstattuple.c: In function 'pgstat_btree':
pgstattuple.c:335: warning: format '%llu' expects type 'long long unsigned int', but argument 2 has type 'uint64'
pgstattuple.c:335: warning: format '%llu' expects type 'long long unsigned int', but argument 3 has type 'uint64'
pgstattuple.c:335: warning: format '%llu' expects type 'long long unsigned int', but argument 4 has type 'uint64'
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -g -fpic -I. -I../../src/include -D_GNU_SOURCE -c -o pgstatindex.o pgstatindex.c
pgstatindex.c: In function 'bt_page_items':
pgstatindex.c:564: warning: format '%d' expects type 'int', but argument 4 has type 'long unsigned int'
pgstatindex.c:564: warning: format '%d' expects type 'int', but argument 4 has type 'long unsigned int'
ar crs libpgstattuple.a pgstattuple.o pgstatindex.o
The only reason the buildfarm isn't crashing on this contrib module is
that it lacks any regression test to crash on.
regards, tom lane
Thanks Bruce,
Here are updated Japanese README, and uninstall_pgstattuple.sql.
Bruce Momjian wrote:
Patch applied. Thanks.
I updated the README documentation for the new functions, attached. I
could not update the Japanese version of the README.---------------------------------------------------------------------------
Satoshi Nagayasu wrote:
Bruce,
Attached patch has been cleaned up,
and modified to be able to work with CVS HEAD.Thanks.
Satoshi Nagayasu wrote:
Alvaro,
Alvaro Herrera wrote:
Huh, I bet it works with 8.1.4, but it doesn't work on CVS HEAD:
/pgsql/source/00orig/contrib/pgstattuple/pgstatindex.c: In function 'GetBTPageStatistics':
/pgsql/source/00orig/contrib/pgstattuple/pgstatindex.c:182: error: 'BTItem' undeclared (first use in this function)While you're at it, please consider removing C++ style comments and
unused code.Formatting is way off as well, but I guess that is easily fixed with
pgindent.Thanks for comments. I'm going to fix my patch from now.
Regarding the pg_relpages function, why do you think it's necessary?
(It returns the true number of blocks of a given relation). It may
belong into core given a reasonable use case, but otherwise it doesn't
seem to belong into pgstatindex (or pgstattuple for that matter).I wanted to sample some pages from the table/index, and get their statistics
to know table/index conditions. I know pgstattuple() reports table
statistics, however, pgstattuple() generates heavy CPU and I/O load.When we need to sample some pages from table/index, we need to know
true number of blocks.I have another function, called pgstatpage(), to get information inside
a single block/page statistics of the table. pg_relpages() will be used
with this.Sorry for not mentioned in previous post about pgstatpage(),
but I've remembered about it just now.Many memories in my brain have already `paged-out` (too busy in last few months),
and some of them got `out-of-memory`. :^)Thanks.
--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
Phone: +81-3-3523-8122diff -ruN pgstattuple.orig/Makefile pgstattuple/Makefile --- pgstattuple.orig/Makefile 2006-02-27 21:54:40.000000000 +0900 +++ pgstattuple/Makefile 2006-08-14 09:28:58.000000000 +0900 @@ -6,7 +6,7 @@ # #--------------------------------------------------------------------------SRCS = pgstattuple.c +SRCS = pgstattuple.c pgstatindex.cMODULE_big = pgstattuple OBJS = $(SRCS:.c=.o) diff -ruN pgstattuple.orig/pgstatindex.c pgstattuple/pgstatindex.c --- pgstattuple.orig/pgstatindex.c 1970-01-01 09:00:00.000000000 +0900 +++ pgstattuple/pgstatindex.c 2006-08-14 11:24:23.000000000 +0900 @@ -0,0 +1,706 @@ +/* + * pgstatindex + * + * Copyright (c) 2006 Satoshi Nagayasu <nagayasus@nttdata.co.jp> + * + * Permission to use, copy, modify, and distribute this software and + * its documentation for any purpose, without fee, and without a + * written agreement is hereby granted, provided that the above + * copyright notice and this paragraph and the following two + * paragraphs appear in all copies. + * + * IN NO EVENT SHALL THE AUTHOR BE LIABLE TO ANY PARTY FOR DIRECT, + * INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING + * LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS + * DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED + * OF THE POSSIBILITY OF SUCH DAMAGE. + * + * THE AUTHOR SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS + * IS" BASIS, AND THE AUTHOR HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE, + * SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS. + */ + +#include "postgres.h" + +#include "fmgr.h" +#include "funcapi.h" +#include "access/heapam.h" +#include "access/itup.h" +#include "access/nbtree.h" +#include "access/transam.h" +#include "catalog/namespace.h" +#include "catalog/pg_type.h" +#include "utils/builtins.h" +#include "utils/inval.h" + +PG_FUNCTION_INFO_V1(pgstatindex); +PG_FUNCTION_INFO_V1(bt_metap); +PG_FUNCTION_INFO_V1(bt_page_items); +PG_FUNCTION_INFO_V1(bt_page_stats); +PG_FUNCTION_INFO_V1(pg_relpages); + +extern Datum pgstatindex(PG_FUNCTION_ARGS); +extern Datum bt_metap(PG_FUNCTION_ARGS); +extern Datum bt_page_items(PG_FUNCTION_ARGS); +extern Datum bt_page_stats(PG_FUNCTION_ARGS); +extern Datum pg_relpages(PG_FUNCTION_ARGS); + +#define PGSTATINDEX_TYPE "public.pgstatindex_type" +#define PGSTATINDEX_NCOLUMNS 10 + +#define BTMETAP_TYPE "public.bt_metap_type" +#define BTMETAP_NCOLUMNS 6 + +#define BTPAGEITEMS_TYPE "public.bt_page_items_type" +#define BTPAGEITEMS_NCOLUMNS 6 + +#define BTPAGESTATS_TYPE "public.bt_page_stats_type" +#define BTPAGESTATS_NCOLUMNS 11 + + +#define IS_INDEX(r) ((r)->rd_rel->relkind == 'i') +#define IS_BTREE(r) ((r)->rd_rel->relam == BTREE_AM_OID) + +#define CHECK_PAGE_OFFSET_RANGE(page, offset) { \ + if ( !(FirstOffsetNumber<=(offset) && \ + (offset)<=PageGetMaxOffsetNumber(page)) ) \ + elog(ERROR, "Page offset number out of range."); } + +#define CHECK_RELATION_BLOCK_RANGE(rel, blkno) { \ + if ( (blkno)<0 && RelationGetNumberOfBlocks((rel))<=(blkno) ) \ + elog(ERROR, "Block number out of range."); } + +/* ------------------------------------------------ + * structure for single btree page statistics + * ------------------------------------------------ + */ +typedef struct BTPageStat +{ + uint32 blkno; + uint32 live_items; + uint32 dead_items; + uint32 page_size; + uint32 max_avail; + uint32 free_size; + uint32 avg_item_size; + uint32 fragments; + char type; + + /* opaque data */ + BlockNumber btpo_prev; + BlockNumber btpo_next; + union + { + uint32 level; + TransactionId xact; + } btpo; + uint16 btpo_flags; + BTCycleId btpo_cycleid; +} BTPageStat; + +/* ------------------------------------------------ + * A structure for a whole btree index statistics + * used by pgstatindex(). + * ------------------------------------------------ + */ +typedef struct BTIndexStat +{ + uint32 magic; + uint32 version; + BlockNumber root_blkno; + uint32 level; + + BlockNumber fastroot; + uint32 fastlevel; + + uint32 live_items; + uint32 dead_items; + + uint32 root_pages; + uint32 internal_pages; + uint32 leaf_pages; + uint32 empty_pages; + uint32 deleted_pages; + + uint32 page_size; + uint32 avg_item_size; + + uint32 max_avail; + uint32 free_space; + + uint32 fragments; +} BTIndexStat; + +/* ------------------------------------------------- + * GetBTPageStatistics() + * + * Collect statistics of single b-tree leaf page + * ------------------------------------------------- + */ +static bool +GetBTPageStatistics(BlockNumber blkno, Buffer buffer, BTPageStat * stat) +{ + Page page = BufferGetPage(buffer); + PageHeader phdr = (PageHeader) page; + OffsetNumber maxoff = PageGetMaxOffsetNumber(page); + BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page); + int item_size = 0; + int off; + + stat->blkno = blkno; + + stat->max_avail = BLCKSZ - (BLCKSZ - phdr->pd_special + SizeOfPageHeaderData); + + stat->dead_items = stat->live_items = 0; + + stat->page_size = PageGetPageSize(page); + + /* page type (flags) */ + if (P_ISDELETED(opaque)) + { + stat->type = 'd'; + return true; + } + else if (P_IGNORE(opaque)) + stat->type = 'e'; + else if (P_ISLEAF(opaque)) + stat->type = 'l'; + else if (P_ISROOT(opaque)) + stat->type = 'r'; + else + stat->type = 'i'; + + /* btpage opaque data */ + stat->btpo_prev = opaque->btpo_prev; + stat->btpo_next = opaque->btpo_next; + if (P_ISDELETED(opaque)) + stat->btpo.xact = opaque->btpo.xact; + else + stat->btpo.level = opaque->btpo.level; + stat->btpo_flags = opaque->btpo_flags; + stat->btpo_cycleid = opaque->btpo_cycleid; + + /*---------------------------------------------- + * If a next leaf is on the previous block, + * it means a fragmentation. + *---------------------------------------------- + */ + stat->fragments = 0; + if (stat->type == 'l') + { + if (opaque->btpo_next != P_NONE && opaque->btpo_next < blkno) + stat->fragments++; + } + + /* count live and dead tuples, and free space */ + for (off = FirstOffsetNumber; off <= maxoff; off++) + { + IndexTuple itup; + + ItemId id = PageGetItemId(page, off); + + itup = (IndexTuple) PageGetItem(page, id); + + item_size += IndexTupleSize(itup); + + if (!ItemIdDeleted(id)) + stat->live_items++; + else + stat->dead_items++; + } + stat->free_size = PageGetFreeSpace(page); + + if ((stat->live_items + stat->dead_items) > 0) + stat->avg_item_size = item_size / (stat->live_items + stat->dead_items); + else + stat->avg_item_size = 0; + + return true; +} + + +/* ------------------------------------------------------ + * pgstatindex() + * + * Usage: SELECT * FROM pgstatindex('t1_pkey'); + * ------------------------------------------------------ + */ +Datum +pgstatindex(PG_FUNCTION_ARGS) +{ + text *relname = PG_GETARG_TEXT_P(0); + Relation rel; + RangeVar *relrv; + Datum result; + uint32 nblocks; + uint32 blkno; + BTIndexStat indexStat; + + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname)); + rel = relation_openrv(relrv, AccessShareLock); + + if (!IS_INDEX(rel) || !IS_BTREE(rel)) + elog(ERROR, "pgstatindex() can be used only on b-tree index."); + + /*------------------- + * Read a metapage + *------------------- + */ + { + Buffer buffer = ReadBuffer(rel, 0); + Page page = BufferGetPage(buffer); + BTMetaPageData *metad = BTPageGetMeta(page); + + indexStat.magic = metad->btm_magic; + indexStat.version = metad->btm_version; + indexStat.root_blkno = metad->btm_root; + indexStat.level = metad->btm_level; + indexStat.fastroot = metad->btm_fastroot; + indexStat.fastlevel = metad->btm_fastlevel; + + ReleaseBuffer(buffer); + } + + nblocks = RelationGetNumberOfBlocks(rel); + + /* -- init stat -- */ + indexStat.fragments = 0; + + indexStat.root_pages = 0; + indexStat.leaf_pages = 0; + indexStat.internal_pages = 0; + indexStat.empty_pages = 0; + indexStat.deleted_pages = 0; + + indexStat.max_avail = 0; + indexStat.free_space = 0; + + /*----------------------- + * Scan all blocks + *----------------------- + */ + for (blkno = 1; blkno < nblocks; blkno++) + { + Buffer buffer = ReadBuffer(rel, blkno); + BTPageStat stat; + + /* scan one page */ + stat.blkno = blkno; + GetBTPageStatistics(blkno, buffer, &stat); + + /*--------------------- + * page status (type) + *--------------------- + */ + switch (stat.type) + { + case 'd': + indexStat.deleted_pages++; + break; + case 'l': + indexStat.leaf_pages++; + break; + case 'i': + indexStat.internal_pages++; + break; + case 'e': + indexStat.empty_pages++; + break; + case 'r': + indexStat.root_pages++; + break; + default: + elog(ERROR, "unknown page status."); + } + + /* -- leaf fragmentation -- */ + indexStat.fragments += stat.fragments; + + if (stat.type == 'l') + { + indexStat.max_avail += stat.max_avail; + indexStat.free_space += stat.free_size; + } + + ReleaseBuffer(buffer); + } + + relation_close(rel, AccessShareLock); + + /*---------------------------- + * Build a result tuple + *---------------------------- + */ + { + TupleDesc tupleDesc; + int j; + char *values[PGSTATINDEX_NCOLUMNS]; + + HeapTupleData tupleData; + HeapTuple tuple = &tupleData; + + tupleDesc = RelationNameGetTupleDesc(PGSTATINDEX_TYPE); + + j = 0; + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", indexStat.version); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", indexStat.level); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", (indexStat.root_pages + + indexStat.leaf_pages + + indexStat.internal_pages + + indexStat.deleted_pages + + indexStat.empty_pages) * BLCKSZ); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", indexStat.root_blkno); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", indexStat.internal_pages); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", indexStat.leaf_pages); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", indexStat.empty_pages); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", indexStat.deleted_pages); + values[j] = palloc(32); + snprintf(values[j++], 32, "%.2f", 100.0 - (float) indexStat.free_space / (float) indexStat.max_avail * 100.0); + values[j] = palloc(32); + snprintf(values[j++], 32, "%.2f", (float) indexStat.fragments / (float) indexStat.leaf_pages * 100.0); + + tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc), + values); + + result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple); + } + + PG_RETURN_DATUM(result); +} + +/* ----------------------------------------------- + * bt_page() + * + * Usage: SELECT * FROM bt_page('t1_pkey', 0); + * ----------------------------------------------- + */ +Datum +bt_page_stats(PG_FUNCTION_ARGS) +{ + text *relname = PG_GETARG_TEXT_P(0); + uint32 blkno = PG_GETARG_UINT32(1); + Buffer buffer; + + Relation rel; + RangeVar *relrv; + Datum result; + + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname)); + rel = relation_openrv(relrv, AccessShareLock); + + CHECK_RELATION_BLOCK_RANGE(rel, blkno); + + buffer = ReadBuffer(rel, blkno); + + if (!IS_INDEX(rel) || !IS_BTREE(rel)) + elog(ERROR, "bt_page_stats() can be used only on b-tree index."); + + if (blkno == 0) + elog(ERROR, "Block 0 is a meta page."); + + { + HeapTuple tuple; + TupleDesc tupleDesc; + int j; + char *values[BTPAGESTATS_NCOLUMNS]; + + BTPageStat stat; + + GetBTPageStatistics(blkno, buffer, &stat); + + tupleDesc = RelationNameGetTupleDesc(BTPAGESTATS_TYPE); + + j = 0; + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.blkno); + + values[j] = palloc(32); + snprintf(values[j++], 32, "%c", stat.type); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.live_items); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.dead_items); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.avg_item_size); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.page_size); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.free_size); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.btpo_prev); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.btpo_next); + + values[j] = palloc(32); + if (stat.type == 'd') + snprintf(values[j++], 32, "%d", stat.btpo.xact); + else + snprintf(values[j++], 32, "%d", stat.btpo.level); + + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.btpo_flags); + + tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc), + values); + + result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple); + } + + ReleaseBuffer(buffer); + + relation_close(rel, AccessShareLock); + + PG_RETURN_DATUM(result); +} + +/*------------------------------------------------------- + * bt_page_items() + * + * Get IndexTupleData set in a leaf page + * + * Usage: SELECT * FROM bt_page_items('t1_pkey', 0); + *------------------------------------------------------- + */ +/* --------------------------------------------------- + * data structure for SRF to hold a scan information + * --------------------------------------------------- + */ +struct user_args +{ + TupleDesc tupd; + Relation rel; + Buffer buffer; + Page page; + uint16 offset; +}; + +Datum +bt_page_items(PG_FUNCTION_ARGS) +{ + text *relname = PG_GETARG_TEXT_P(0); + uint32 blkno = PG_GETARG_UINT32(1); + + RangeVar *relrv; + Datum result; + char *values[BTPAGEITEMS_NCOLUMNS]; + BTPageOpaque opaque; + HeapTuple tuple; + ItemId id; + + FuncCallContext *fctx; + MemoryContext mctx; + struct user_args *uargs = NULL; + + if (blkno == 0) + elog(ERROR, "Block 0 is a meta page."); + + if (SRF_IS_FIRSTCALL()) + { + fctx = SRF_FIRSTCALL_INIT(); + mctx = MemoryContextSwitchTo(fctx->multi_call_memory_ctx); + + uargs = palloc(sizeof(struct user_args)); + + uargs->tupd = RelationNameGetTupleDesc(BTPAGEITEMS_TYPE); + uargs->offset = FirstOffsetNumber; + + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname)); + uargs->rel = relation_openrv(relrv, AccessShareLock); + + CHECK_RELATION_BLOCK_RANGE(uargs->rel, blkno); + + uargs->buffer = ReadBuffer(uargs->rel, blkno); + + if (!IS_INDEX(uargs->rel) || !IS_BTREE(uargs->rel)) + elog(ERROR, "bt_page_items() can be used only on b-tree index."); + + uargs->page = BufferGetPage(uargs->buffer); + + opaque = (BTPageOpaque) PageGetSpecialPointer(uargs->page); + + if (P_ISDELETED(opaque)) + elog(NOTICE, "bt_page_items(): this page is deleted."); + + fctx->max_calls = PageGetMaxOffsetNumber(uargs->page); + fctx->user_fctx = uargs; + + MemoryContextSwitchTo(mctx); + } + + fctx = SRF_PERCALL_SETUP(); + uargs = fctx->user_fctx; + + if (fctx->call_cntr < fctx->max_calls) + { + IndexTuple itup; + + id = PageGetItemId(uargs->page, uargs->offset); + + if (!ItemIdIsValid(id)) + elog(ERROR, "Invalid ItemId."); + + itup = (IndexTuple) PageGetItem(uargs->page, id); + + { + int j = 0; + + BlockNumber blkno = BlockIdGetBlockNumber(&(itup->t_tid.ip_blkid)); + + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", uargs->offset); + values[j] = palloc(32); + snprintf(values[j++], 32, "(%u,%u)", blkno, itup->t_tid.ip_posid); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", IndexTupleSize(itup)); + values[j] = palloc(32); + snprintf(values[j++], 32, "%c", IndexTupleHasNulls(itup) ? 't' : 'f'); + values[j] = palloc(32); + snprintf(values[j++], 32, "%c", IndexTupleHasVarwidths(itup) ? 't' : 'f'); + + { + int off; + char *dump; + char *ptr = (char *) itup + IndexInfoFindDataOffset(itup->t_info); + + dump = palloc(IndexTupleSize(itup) * 3); + memset(dump, 0, IndexTupleSize(itup) * 3); + + for (off = 0; + off < IndexTupleSize(itup) - IndexInfoFindDataOffset(itup->t_info); + off++) + { + if (dump[0] == '\0') + sprintf(dump, "%02x", *(ptr + off) & 0xff); + else + { + char buf[4]; + + sprintf(buf, " %02x", *(ptr + off) & 0xff); + strcat(dump, buf); + } + } + values[j] = dump; + } + + tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(uargs->tupd), values); + result = TupleGetDatum(TupleDescGetSlot(uargs->tupd), tuple); + } + + uargs->offset = uargs->offset + 1; + + SRF_RETURN_NEXT(fctx, result); + } + else + { + ReleaseBuffer(uargs->buffer); + relation_close(uargs->rel, AccessShareLock); + + SRF_RETURN_DONE(fctx); + } +} + + +/* ------------------------------------------------ + * bt_metap() + * + * Get a btree meta-page information + * + * Usage: SELECT * FROM bt_metap('t1_pkey') + * ------------------------------------------------ + */ +Datum +bt_metap(PG_FUNCTION_ARGS) +{ + text *relname = PG_GETARG_TEXT_P(0); + Buffer buffer; + + Relation rel; + RangeVar *relrv; + Datum result; + + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname)); + rel = relation_openrv(relrv, AccessShareLock); + + if (!IS_INDEX(rel) || !IS_BTREE(rel)) + elog(ERROR, "bt_metap() can be used only on b-tree index."); + + buffer = ReadBuffer(rel, 0); + + { + BTMetaPageData *metad; + + TupleDesc tupleDesc; + int j; + char *values[BTMETAP_NCOLUMNS]; + HeapTuple tuple; + + Page page = BufferGetPage(buffer); + + metad = BTPageGetMeta(page); + + tupleDesc = RelationNameGetTupleDesc(BTMETAP_TYPE); + + j = 0; + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", metad->btm_magic); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", metad->btm_version); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", metad->btm_root); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", metad->btm_level); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", metad->btm_fastroot); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", metad->btm_fastlevel); + + tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc), + values); + + result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple); + } + + ReleaseBuffer(buffer); + + relation_close(rel, AccessShareLock); + + PG_RETURN_DATUM(result); +} + +/* -------------------------------------------------------- + * pg_relpages() + * + * Get a number of pages of the table/index. + * + * Usage: SELECT pg_relpages('t1'); + * SELECT pg_relpages('t1_pkey'); + * -------------------------------------------------------- + */ +Datum +pg_relpages(PG_FUNCTION_ARGS) +{ + text *relname = PG_GETARG_TEXT_P(0); + + Relation rel; + RangeVar *relrv; + int4 relpages; + + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname)); + rel = relation_openrv(relrv, AccessShareLock); + + relpages = RelationGetNumberOfBlocks(rel); + + relation_close(rel, AccessShareLock); + + PG_RETURN_INT32(relpages); +} diff -ruN pgstattuple.orig/pgstattuple.sql.in pgstattuple/pgstattuple.sql.in --- pgstattuple.orig/pgstattuple.sql.in 2006-02-28 01:09:50.000000000 +0900 +++ pgstattuple/pgstattuple.sql.in 2006-08-14 10:37:32.000000000 +0900 @@ -22,3 +22,96 @@ RETURNS pgstattuple_type AS 'MODULE_PATHNAME', 'pgstattuplebyid' LANGUAGE C STRICT; + +-- +-- pgstatindex +-- +DROP TYPE pgstatindex_type CASCADE; +CREATE TYPE pgstatindex_type AS ( + version int4, + tree_level int4, + index_size int4, + root_block_no int4, + internal_pages int4, + leaf_pages int4, + empty_pages int4, + deleted_pages int4, + avg_leaf_density float8, + leaf_fragmentation float8 +); + +CREATE OR REPLACE FUNCTION pgstatindex(text) +RETURNS pgstatindex_type +AS 'MODULE_PATHNAME', 'pgstatindex' +LANGUAGE 'C' STRICT; + +-- +-- bt_metap() +-- +DROP TYPE bt_metap_type CASCADE; +CREATE TYPE bt_metap_type AS ( + magic int4, + version int4, + root int4, + level int4, + fastroot int4, + fastlevel int4 +); + +CREATE OR REPLACE FUNCTION bt_metap(text) +RETURNS bt_metap_type +AS 'MODULE_PATHNAME', 'bt_metap' +LANGUAGE 'C' STRICT; + +-- +-- bt_page_items() +-- +DROP TYPE bt_page_items_type CASCADE; +CREATE TYPE bt_page_items_type AS ( + itemoffset int4, + ctid tid, + itemlen int4, + nulls bool, + vars bool, + data text +); + +DROP FUNCTION bt_page_items(text, int4); + +CREATE OR REPLACE FUNCTION bt_page_items(text, int4) +RETURNS SETOF bt_page_items_type +AS 'MODULE_PATHNAME', 'bt_page_items' +LANGUAGE 'C' STRICT; + +-- +-- bt_page_stats() +-- +DROP TYPE bt_page_stats_type CASCADE; +CREATE TYPE bt_page_stats_type AS ( + blkno int4, + type char, + live_items int4, + dead_items int4, + avg_item_size float, + page_size int4, + free_size int4, + btpo_prev int4, + btpo_next int4, + btpo int4, + btpo_flags int4 +); + +DROP FUNCTION bt_page_stats(text, int4); + +CREATE OR REPLACE FUNCTION bt_page_stats(text, int4) +RETURNS bt_page_stats_type +AS 'MODULE_PATHNAME', 'bt_page_stats' +LANGUAGE 'C' STRICT; + +-- +-- pg_relpages() +-- +CREATE OR REPLACE FUNCTION pg_relpages(text) +RETURNS int +AS 'MODULE_PATHNAME', 'pg_relpages' +LANGUAGE 'C' STRICT; diff -ruN pgstattuple.orig/test.sh pgstattuple/test.sh --- pgstattuple.orig/test.sh 1970-01-01 09:00:00.000000000 +0900 +++ pgstattuple/test.sh 2006-08-14 10:23:08.000000000 +0900 @@ -0,0 +1,27 @@ +#!/bin/sh + +PGHOME=/home/snaga/pgsql20060814 +export PATH=${PGHOME}/bin:$PATH + +psql -p 9999 pgbench<<EOF +DROP FUNCTION pgstatindex(text); +EOF + +psql -p 9999 -f ${PGHOME}/share/contrib/pgstattuple.sql pgbench + +psql -p 9999 pgbench<<EOF +SELECT * FROM pg_relpages('accounts_pkey'); +\x +SELECT * FROM pgstatindex('accounts_pkey'); +SELECT * FROM bt_metap('accounts_pkey'); +\x +SELECT * FROM bt_page_items('accounts_pkey', 0); +SELECT * FROM bt_page_items('accounts_pkey', 1) limit 20; + +SELECT * FROM bt_page_stats('accounts_pkey', 1); +\x +SELECT * FROM bt_page_stats('accounts_pkey', 1); +SELECT * FROM bt_page_stats('accounts_pkey', 361); +\x +SELECT * FROM bt_page_items('accounts_pkey', 361); +EOF------------------------------------------------------------------------
pgstattuple README 2002/08/29 Tatsuo Ishii
1. Functions supported:
pgstattuple
-----------
pgstattuple() returns the relation length, percentage of the "dead"
tuples of a relation and other info. This may help users to determine
whether vacuum is necessary or not. Here is an example session:test=> \x
Expanded display is on.
test=> SELECT * FROM pgstattuple('pg_catalog.pg_proc');
-[ RECORD 1 ]------+-------
table_len | 458752
tuple_count | 1470
tuple_len | 438896
tuple_percent | 95.67
dead_tuple_count | 11
dead_tuple_len | 3157
dead_tuple_percent | 0.69
free_space | 8932
free_percent | 1.95Here are explanations for each column:
table_len -- physical relation length in bytes
tuple_count -- number of live tuples
tuple_len -- total tuples length in bytes
tuple_percent -- live tuples in %
dead_tuple_len -- total dead tuples length in bytes
dead_tuple_percent -- dead tuples in %
free_space -- free space in bytes
free_percent -- free space in %pg_relpages
-----------
pg_relpages() returns the number of pages in the relation.pgstatindex
-----------
pgstatindex() returns an array showing the information about an index:test=> \x
Expanded display is on.
test=> SELECT * FROM pgstatindex('pg_cast_oid_index');
-[ RECORD 1 ]------+------
version | 2
tree_level | 0
index_size | 8192
root_block_no | 1
internal_pages | 0
leaf_pages | 1
empty_pages | 0
deleted_pages | 0
avg_leaf_density | 50.27
leaf_fragmentation | 0bt_metap
--------
bt_metap() returns information about the btree index metapage:test=> SELECT * FROM bt_metap('pg_cast_oid_index');
-[ RECORD 1 ]-----
magic | 340322
version | 2
root | 1
level | 0
fastroot | 1
fastlevel | 0bt_page_stats
-------------
bt_page_stats() shows information about single btree pages:test=> SELECT * FROM bt_page_stats('pg_cast_oid_index', 1);
-[ RECORD 1 ]-+-----
blkno | 1
type | l
live_items | 256
dead_items | 0
avg_item_size | 12
page_size | 8192
free_size | 4056
btpo_prev | 0
btpo_next | 0
btpo | 0
btpo_flags | 3bt_page_items
-------------
bt_page_items() returns information about specific items on btree pages:test=> SELECT * FROM bt_page_items('pg_cast_oid_index', 1);
itemoffset | ctid | itemlen | nulls | vars | data
------------+---------+---------+-------+------+-------------
1 | (0,1) | 12 | f | f | 23 27 00 00
2 | (0,2) | 12 | f | f | 24 27 00 00
3 | (0,3) | 12 | f | f | 25 27 00 00
4 | (0,4) | 12 | f | f | 26 27 00 00
5 | (0,5) | 12 | f | f | 27 27 00 00
6 | (0,6) | 12 | f | f | 28 27 00 00
7 | (0,7) | 12 | f | f | 29 27 00 00
8 | (0,8) | 12 | f | f | 2a 27 00 002. Installing pgstattuple
$ make
$ make install
$ psql -e -f /usr/local/pgsql/share/contrib/pgstattuple.sql test3. Using pgstattuple
pgstattuple may be called as a relation function and is
defined as follows:CREATE OR REPLACE FUNCTION pgstattuple(text) RETURNS pgstattuple_type
AS 'MODULE_PATHNAME', 'pgstattuple'
LANGUAGE C STRICT;CREATE OR REPLACE FUNCTION pgstattuple(oid) RETURNS pgstattuple_type
AS 'MODULE_PATHNAME', 'pgstattuplebyid'
LANGUAGE C STRICT;The argument is the relation name (optionally it may be qualified)
or the OID of the relation. Note that pgstattuple only returns
one row.4. Notes
pgstattuple acquires only a read lock on the relation. So concurrent
update may affect the result.pgstattuple judges a tuple is "dead" if HeapTupleSatisfiesNow()
returns false.5. History
2006/06/28
Extended to work against indexes.
--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
Phone: +81-3-3523-8122
Tom Lane wrote:
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Winline -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -g -fpic -I. -I../../src/include -D_GNU_SOURCE -c -o pgstatindex.o pgstatindex.c
pgstatindex.c: In function 'bt_page_items':
pgstatindex.c:564: warning: format '%d' expects type 'int', but argument 4 has type 'long unsigned int'
pgstatindex.c:564: warning: format '%d' expects type 'int', but argument 4 has type 'long unsigned int'
I guess my '%d' should be '%zd', right?
--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
Phone: +81-3-3523-8122
Attachments:
pgstatindex.c.difftext/plain; name=pgstatindex.c.diffDownload
*** pgstatindex.c 2006-09-03 02:05:29.000000000 +0900
--- pgstatindex.c.new 2006-09-04 08:22:42.000000000 +0900
***************
*** 561,567 ****
values[j] = palloc(32);
snprintf(values[j++], 32, "(%u,%u)", blkno, itup->t_tid.ip_posid);
values[j] = palloc(32);
! snprintf(values[j++], 32, "%d", IndexTupleSize(itup));
values[j] = palloc(32);
snprintf(values[j++], 32, "%c", IndexTupleHasNulls(itup) ? 't' : 'f');
values[j] = palloc(32);
--- 561,567 ----
values[j] = palloc(32);
snprintf(values[j++], 32, "(%u,%u)", blkno, itup->t_tid.ip_posid);
values[j] = palloc(32);
! snprintf(values[j++], 32, "%zd", IndexTupleSize(itup));
values[j] = palloc(32);
snprintf(values[j++], 32, "%c", IndexTupleHasNulls(itup) ? 't' : 'f');
values[j] = palloc(32);
Bruce Momjian <bruce@momjian.us> wrote:
Patch applied. Thanks.
The two attached patches fix contrib/pgstattuple.
pgstattuple.c.diff removes the fragmemtation reporting in pgstattuple().
It is no longer needed, because pgstatindex() has upward functionality now.
Also, the report using elog was judged as improper in earlier discusses.
pgstattuple.sql.in.diff removes DROP statements in the installer. The DROP
statements make some unpleasant ERROR logs during install. According to
other contrib modules, DROPs should be in the uninstaller and should not
be in the installer.
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
Attachments:
pgstattuple.sql.in.diffapplication/octet-stream; name=pgstattuple.sql.in.diffDownload
*** pgstattuple.orig.sql.in Mon Sep 4 09:04:10 2006
--- pgstattuple.sql.in Mon Sep 4 10:22:00 2006
***************
*** 26,32 ****
--
-- pgstatindex
--
- DROP TYPE pgstatindex_type CASCADE;
CREATE TYPE pgstatindex_type AS (
version int4,
tree_level int4,
--- 26,31 ----
***************
*** 48,54 ****
--
-- bt_metap()
--
- DROP TYPE bt_metap_type CASCADE;
CREATE TYPE bt_metap_type AS (
magic int4,
version int4,
--- 47,52 ----
***************
*** 66,72 ****
--
-- bt_page_stats()
--
- DROP TYPE bt_page_stats_type CASCADE;
CREATE TYPE bt_page_stats_type AS (
blkno int4,
type char,
--- 64,69 ----
***************
*** 81,88 ****
btpo_flags int4
);
- DROP FUNCTION bt_page_stats(text, int4);
-
CREATE OR REPLACE FUNCTION bt_page_stats(text, int4)
RETURNS bt_page_stats_type
AS 'MODULE_PATHNAME', 'bt_page_stats'
--- 78,83 ----
***************
*** 91,97 ****
--
-- bt_page_items()
--
- DROP TYPE bt_page_items_type CASCADE;
CREATE TYPE bt_page_items_type AS (
itemoffset int4,
ctid tid,
--- 86,91 ----
***************
*** 100,107 ****
vars bool,
data text
);
-
- DROP FUNCTION bt_page_items(text, int4);
CREATE OR REPLACE FUNCTION bt_page_items(text, int4)
RETURNS SETOF bt_page_items_type
--- 94,99 ----
pgstattuple.c.diffapplication/octet-stream; name=pgstattuple.c.diffDownload
*** pgstattuple.orig.c Wed Jul 12 08:27:09 2006
--- pgstattuple.c Mon Sep 4 10:02:25 2006
***************
*** 59,93 ****
uint64 free_space; /* free/reusable space in bytes */
} pgstattuple_type;
- /*
- * struct pgstat_btree_type
- */
- typedef struct pgstat_btree_type
- {
- pgstattuple_type base; /* inherits pgstattuple_type */
-
- uint64 continuous;
- uint64 forward;
- uint64 backward;
- } pgstat_btree_type;
-
typedef void (*pgstat_page)(pgstattuple_type *, Relation, BlockNumber);
static Datum build_pgstattuple_type(pgstattuple_type *stat,
FunctionCallInfo fcinfo);
static Datum pgstat_relation(Relation rel, FunctionCallInfo fcinfo);
static Datum pgstat_heap(Relation rel, FunctionCallInfo fcinfo);
- static Datum pgstat_btree(Relation rel, FunctionCallInfo fcinfo);
static void pgstat_btree_page(pgstattuple_type *stat,
Relation rel, BlockNumber blkno);
- static Datum pgstat_hash(Relation rel, FunctionCallInfo fcinfo);
static void pgstat_hash_page(pgstattuple_type *stat,
Relation rel, BlockNumber blkno);
- static Datum pgstat_gist(Relation rel, FunctionCallInfo fcinfo);
static void pgstat_gist_page(pgstattuple_type *stat,
Relation rel, BlockNumber blkno);
! static Datum pgstat_index(pgstattuple_type *stat,
! Relation rel, BlockNumber start,
pgstat_page pagefn, FunctionCallInfo fcinfo);
static void pgstat_index_page(pgstattuple_type *stat, Page page,
OffsetNumber minoff, OffsetNumber maxoff);
--- 59,77 ----
uint64 free_space; /* free/reusable space in bytes */
} pgstattuple_type;
typedef void (*pgstat_page)(pgstattuple_type *, Relation, BlockNumber);
static Datum build_pgstattuple_type(pgstattuple_type *stat,
FunctionCallInfo fcinfo);
static Datum pgstat_relation(Relation rel, FunctionCallInfo fcinfo);
static Datum pgstat_heap(Relation rel, FunctionCallInfo fcinfo);
static void pgstat_btree_page(pgstattuple_type *stat,
Relation rel, BlockNumber blkno);
static void pgstat_hash_page(pgstattuple_type *stat,
Relation rel, BlockNumber blkno);
static void pgstat_gist_page(pgstattuple_type *stat,
Relation rel, BlockNumber blkno);
! static Datum pgstat_index(Relation rel, BlockNumber start,
pgstat_page pagefn, FunctionCallInfo fcinfo);
static void pgstat_index_page(pgstattuple_type *stat, Page page,
OffsetNumber minoff, OffsetNumber maxoff);
***************
*** 217,227 ****
switch(rel->rd_rel->relam)
{
case BTREE_AM_OID:
! return pgstat_btree(rel, fcinfo);
case HASH_AM_OID:
! return pgstat_hash(rel, fcinfo);
case GIST_AM_OID:
! return pgstat_gist(rel, fcinfo);
case GIN_AM_OID:
err = "gin index";
break;
--- 201,214 ----
switch(rel->rd_rel->relam)
{
case BTREE_AM_OID:
! return pgstat_index(rel, BTREE_METAPAGE + 1,
! pgstat_btree_page, fcinfo);
case HASH_AM_OID:
! return pgstat_index(rel, HASH_METAPAGE + 1,
! pgstat_hash_page, fcinfo);
case GIST_AM_OID:
! return pgstat_index(rel, GIST_ROOT_BLKNO + 1,
! pgstat_gist_page, fcinfo);
case GIN_AM_OID:
err = "gin index";
break;
***************
*** 321,356 ****
}
/*
! * pgstat_btree -- returns live/dead tuples info in a btree index
! */
! static Datum
! pgstat_btree(Relation rel, FunctionCallInfo fcinfo)
! {
! pgstat_btree_type stat = { { 0 } };
! Datum datum;
!
! datum = pgstat_index((pgstattuple_type *) &stat, rel,
! BTREE_METAPAGE + 1, pgstat_btree_page, fcinfo);
!
! ereport(NOTICE,
! (errmsg("%.2f%% fragmented",
! 100.0 * (stat.forward + stat.backward) /
! (stat.continuous + stat.forward + stat.backward)),
! errhint("continuous=%llu, forward=%llu, backward=%llu",
! stat.continuous, stat.forward, stat.backward)));
!
! return datum;
! }
!
! /*
! * pgstat_btree_page
*/
static void
pgstat_btree_page(pgstattuple_type *stat, Relation rel, BlockNumber blkno)
{
Buffer buf;
Page page;
- pgstat_btree_type *btstat = (pgstat_btree_type *)stat;
buf = ReadBuffer(rel, blkno);
LockBuffer(buf, BT_READ);
--- 308,320 ----
}
/*
! * pgstat_btree_page -- check tuples in a btree page
*/
static void
pgstat_btree_page(pgstattuple_type *stat, Relation rel, BlockNumber blkno)
{
Buffer buf;
Page page;
buf = ReadBuffer(rel, blkno);
LockBuffer(buf, BT_READ);
***************
*** 373,388 ****
}
else if (P_ISLEAF(opaque))
{
- /* check fragmentation */
- if (P_RIGHTMOST(opaque))
- btstat->continuous++;
- else if (opaque->btpo_next < blkno)
- btstat->backward++;
- else if (opaque->btpo_next > blkno + 1)
- btstat->forward++;
- else
- btstat->continuous++;
-
pgstat_index_page(stat, page, P_FIRSTDATAKEY(opaque),
PageGetMaxOffsetNumber(page));
}
--- 337,342 ----
***************
*** 396,412 ****
}
/*
! * pgstat_hash -- returns live/dead tuples info in a hash index
! */
! static Datum
! pgstat_hash(Relation rel, FunctionCallInfo fcinfo)
! {
! pgstattuple_type stat = { 0 };
! return pgstat_index(&stat, rel, HASH_METAPAGE + 1, pgstat_hash_page, fcinfo);
! }
!
! /*
! * pgstat_hash_page
*/
static void
pgstat_hash_page(pgstattuple_type *stat, Relation rel, BlockNumber blkno)
--- 350,356 ----
}
/*
! * pgstat_hash_page -- check tuples in a hash page
*/
static void
pgstat_hash_page(pgstattuple_type *stat, Relation rel, BlockNumber blkno)
***************
*** 448,464 ****
}
/*
! * pgstat_gist -- returns live/dead tuples info in a gist index
! */
! static Datum
! pgstat_gist(Relation rel, FunctionCallInfo fcinfo)
! {
! pgstattuple_type stat = { 0 };
! return pgstat_index(&stat, rel, GIST_ROOT_BLKNO + 1, pgstat_gist_page, fcinfo);
! }
!
! /*
! * pgstat_gist_page
*/
static void
pgstat_gist_page(pgstattuple_type *stat, Relation rel, BlockNumber blkno)
--- 392,398 ----
}
/*
! * pgstat_gist_page -- check tuples in a gist page
*/
static void
pgstat_gist_page(pgstattuple_type *stat, Relation rel, BlockNumber blkno)
***************
*** 488,498 ****
* pgstat_index -- returns live/dead tuples info in a generic index
*/
static Datum
! pgstat_index(pgstattuple_type *stat, Relation rel, BlockNumber start,
! pgstat_page pagefn, FunctionCallInfo fcinfo)
{
BlockNumber nblocks;
BlockNumber blkno;
blkno = start;
for (;;)
--- 422,433 ----
* pgstat_index -- returns live/dead tuples info in a generic index
*/
static Datum
! pgstat_index(Relation rel, BlockNumber start, pgstat_page pagefn,
! FunctionCallInfo fcinfo)
{
BlockNumber nblocks;
BlockNumber blkno;
+ pgstattuple_type stat = { 0 };
blkno = start;
for (;;)
***************
*** 505,521 ****
/* Quit if we've scanned the whole relation */
if (blkno >= nblocks)
{
! stat->table_len = (uint64) nblocks * BLCKSZ;
break;
}
for (; blkno < nblocks; blkno++)
! pagefn(stat, rel, blkno);
}
relation_close(rel, AccessShareLock);
! return build_pgstattuple_type(stat, fcinfo);
}
/*
--- 440,456 ----
/* Quit if we've scanned the whole relation */
if (blkno >= nblocks)
{
! stat.table_len = (uint64) nblocks * BLCKSZ;
break;
}
for (; blkno < nblocks; blkno++)
! pagefn(&stat, rel, blkno);
}
relation_close(rel, AccessShareLock);
! return build_pgstattuple_type(&stat, fcinfo);
}
/*
ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:
The two attached patches fix contrib/pgstattuple.
Good, applied. I made some additional changes to get install/uninstall/
reinstall to work cleanly after the latest additions, and to get it to
compile without warnings on a 64-bit Fedora machine. (It seems to
actually work there, too.)
I notice that the original pgstattuple() function comes in two flavors,
one with OID input and one with text-relation-name input. Shouldn't all
the others be likewise?
regards, tom lane
Satoshi Nagayasu <nagayasus@nttdata.co.jp> writes:
Tom Lane wrote:
pgstatindex.c: In function 'bt_page_items':
pgstatindex.c:564: warning: format '%d' expects type 'int', but argument 4 has type 'long unsigned int'
I guess my '%d' should be '%zd', right?
No, that sounds even less portable :-(
Given the expected range of IndexTupleSize(), it seems sufficient to
cast its result to int and then use %d formatting. I've done that
in the latest commit.
regards, tom lane
Applied.
---------------------------------------------------------------------------
Satoshi Nagayasu wrote:
Thanks Bruce,
Here are updated Japanese README, and uninstall_pgstattuple.sql.
Bruce Momjian wrote:
Patch applied. Thanks.
I updated the README documentation for the new functions, attached. I
could not update the Japanese version of the README.---------------------------------------------------------------------------
Satoshi Nagayasu wrote:
Bruce,
Attached patch has been cleaned up,
and modified to be able to work with CVS HEAD.Thanks.
Satoshi Nagayasu wrote:
Alvaro,
Alvaro Herrera wrote:
Huh, I bet it works with 8.1.4, but it doesn't work on CVS HEAD:
/pgsql/source/00orig/contrib/pgstattuple/pgstatindex.c: In function 'GetBTPageStatistics':
/pgsql/source/00orig/contrib/pgstattuple/pgstatindex.c:182: error: 'BTItem' undeclared (first use in this function)While you're at it, please consider removing C++ style comments and
unused code.Formatting is way off as well, but I guess that is easily fixed with
pgindent.Thanks for comments. I'm going to fix my patch from now.
Regarding the pg_relpages function, why do you think it's necessary?
(It returns the true number of blocks of a given relation). It may
belong into core given a reasonable use case, but otherwise it doesn't
seem to belong into pgstatindex (or pgstattuple for that matter).I wanted to sample some pages from the table/index, and get their statistics
to know table/index conditions. I know pgstattuple() reports table
statistics, however, pgstattuple() generates heavy CPU and I/O load.When we need to sample some pages from table/index, we need to know
true number of blocks.I have another function, called pgstatpage(), to get information inside
a single block/page statistics of the table. pg_relpages() will be used
with this.Sorry for not mentioned in previous post about pgstatpage(),
but I've remembered about it just now.Many memories in my brain have already `paged-out` (too busy in last few months),
and some of them got `out-of-memory`. :^)Thanks.
--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
Phone: +81-3-3523-8122diff -ruN pgstattuple.orig/Makefile pgstattuple/Makefile --- pgstattuple.orig/Makefile 2006-02-27 21:54:40.000000000 +0900 +++ pgstattuple/Makefile 2006-08-14 09:28:58.000000000 +0900 @@ -6,7 +6,7 @@ # #--------------------------------------------------------------------------SRCS = pgstattuple.c +SRCS = pgstattuple.c pgstatindex.cMODULE_big = pgstattuple OBJS = $(SRCS:.c=.o) diff -ruN pgstattuple.orig/pgstatindex.c pgstattuple/pgstatindex.c --- pgstattuple.orig/pgstatindex.c 1970-01-01 09:00:00.000000000 +0900 +++ pgstattuple/pgstatindex.c 2006-08-14 11:24:23.000000000 +0900 @@ -0,0 +1,706 @@ +/* + * pgstatindex + * + * Copyright (c) 2006 Satoshi Nagayasu <nagayasus@nttdata.co.jp> + * + * Permission to use, copy, modify, and distribute this software and + * its documentation for any purpose, without fee, and without a + * written agreement is hereby granted, provided that the above + * copyright notice and this paragraph and the following two + * paragraphs appear in all copies. + * + * IN NO EVENT SHALL THE AUTHOR BE LIABLE TO ANY PARTY FOR DIRECT, + * INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING + * LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS + * DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED + * OF THE POSSIBILITY OF SUCH DAMAGE. + * + * THE AUTHOR SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS + * IS" BASIS, AND THE AUTHOR HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE, + * SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS. + */ + +#include "postgres.h" + +#include "fmgr.h" +#include "funcapi.h" +#include "access/heapam.h" +#include "access/itup.h" +#include "access/nbtree.h" +#include "access/transam.h" +#include "catalog/namespace.h" +#include "catalog/pg_type.h" +#include "utils/builtins.h" +#include "utils/inval.h" + +PG_FUNCTION_INFO_V1(pgstatindex); +PG_FUNCTION_INFO_V1(bt_metap); +PG_FUNCTION_INFO_V1(bt_page_items); +PG_FUNCTION_INFO_V1(bt_page_stats); +PG_FUNCTION_INFO_V1(pg_relpages); + +extern Datum pgstatindex(PG_FUNCTION_ARGS); +extern Datum bt_metap(PG_FUNCTION_ARGS); +extern Datum bt_page_items(PG_FUNCTION_ARGS); +extern Datum bt_page_stats(PG_FUNCTION_ARGS); +extern Datum pg_relpages(PG_FUNCTION_ARGS); + +#define PGSTATINDEX_TYPE "public.pgstatindex_type" +#define PGSTATINDEX_NCOLUMNS 10 + +#define BTMETAP_TYPE "public.bt_metap_type" +#define BTMETAP_NCOLUMNS 6 + +#define BTPAGEITEMS_TYPE "public.bt_page_items_type" +#define BTPAGEITEMS_NCOLUMNS 6 + +#define BTPAGESTATS_TYPE "public.bt_page_stats_type" +#define BTPAGESTATS_NCOLUMNS 11 + + +#define IS_INDEX(r) ((r)->rd_rel->relkind == 'i') +#define IS_BTREE(r) ((r)->rd_rel->relam == BTREE_AM_OID) + +#define CHECK_PAGE_OFFSET_RANGE(page, offset) { \ + if ( !(FirstOffsetNumber<=(offset) && \ + (offset)<=PageGetMaxOffsetNumber(page)) ) \ + elog(ERROR, "Page offset number out of range."); } + +#define CHECK_RELATION_BLOCK_RANGE(rel, blkno) { \ + if ( (blkno)<0 && RelationGetNumberOfBlocks((rel))<=(blkno) ) \ + elog(ERROR, "Block number out of range."); } + +/* ------------------------------------------------ + * structure for single btree page statistics + * ------------------------------------------------ + */ +typedef struct BTPageStat +{ + uint32 blkno; + uint32 live_items; + uint32 dead_items; + uint32 page_size; + uint32 max_avail; + uint32 free_size; + uint32 avg_item_size; + uint32 fragments; + char type; + + /* opaque data */ + BlockNumber btpo_prev; + BlockNumber btpo_next; + union + { + uint32 level; + TransactionId xact; + } btpo; + uint16 btpo_flags; + BTCycleId btpo_cycleid; +} BTPageStat; + +/* ------------------------------------------------ + * A structure for a whole btree index statistics + * used by pgstatindex(). + * ------------------------------------------------ + */ +typedef struct BTIndexStat +{ + uint32 magic; + uint32 version; + BlockNumber root_blkno; + uint32 level; + + BlockNumber fastroot; + uint32 fastlevel; + + uint32 live_items; + uint32 dead_items; + + uint32 root_pages; + uint32 internal_pages; + uint32 leaf_pages; + uint32 empty_pages; + uint32 deleted_pages; + + uint32 page_size; + uint32 avg_item_size; + + uint32 max_avail; + uint32 free_space; + + uint32 fragments; +} BTIndexStat; + +/* ------------------------------------------------- + * GetBTPageStatistics() + * + * Collect statistics of single b-tree leaf page + * ------------------------------------------------- + */ +static bool +GetBTPageStatistics(BlockNumber blkno, Buffer buffer, BTPageStat * stat) +{ + Page page = BufferGetPage(buffer); + PageHeader phdr = (PageHeader) page; + OffsetNumber maxoff = PageGetMaxOffsetNumber(page); + BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page); + int item_size = 0; + int off; + + stat->blkno = blkno; + + stat->max_avail = BLCKSZ - (BLCKSZ - phdr->pd_special + SizeOfPageHeaderData); + + stat->dead_items = stat->live_items = 0; + + stat->page_size = PageGetPageSize(page); + + /* page type (flags) */ + if (P_ISDELETED(opaque)) + { + stat->type = 'd'; + return true; + } + else if (P_IGNORE(opaque)) + stat->type = 'e'; + else if (P_ISLEAF(opaque)) + stat->type = 'l'; + else if (P_ISROOT(opaque)) + stat->type = 'r'; + else + stat->type = 'i'; + + /* btpage opaque data */ + stat->btpo_prev = opaque->btpo_prev; + stat->btpo_next = opaque->btpo_next; + if (P_ISDELETED(opaque)) + stat->btpo.xact = opaque->btpo.xact; + else + stat->btpo.level = opaque->btpo.level; + stat->btpo_flags = opaque->btpo_flags; + stat->btpo_cycleid = opaque->btpo_cycleid; + + /*---------------------------------------------- + * If a next leaf is on the previous block, + * it means a fragmentation. + *---------------------------------------------- + */ + stat->fragments = 0; + if (stat->type == 'l') + { + if (opaque->btpo_next != P_NONE && opaque->btpo_next < blkno) + stat->fragments++; + } + + /* count live and dead tuples, and free space */ + for (off = FirstOffsetNumber; off <= maxoff; off++) + { + IndexTuple itup; + + ItemId id = PageGetItemId(page, off); + + itup = (IndexTuple) PageGetItem(page, id); + + item_size += IndexTupleSize(itup); + + if (!ItemIdDeleted(id)) + stat->live_items++; + else + stat->dead_items++; + } + stat->free_size = PageGetFreeSpace(page); + + if ((stat->live_items + stat->dead_items) > 0) + stat->avg_item_size = item_size / (stat->live_items + stat->dead_items); + else + stat->avg_item_size = 0; + + return true; +} + + +/* ------------------------------------------------------ + * pgstatindex() + * + * Usage: SELECT * FROM pgstatindex('t1_pkey'); + * ------------------------------------------------------ + */ +Datum +pgstatindex(PG_FUNCTION_ARGS) +{ + text *relname = PG_GETARG_TEXT_P(0); + Relation rel; + RangeVar *relrv; + Datum result; + uint32 nblocks; + uint32 blkno; + BTIndexStat indexStat; + + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname)); + rel = relation_openrv(relrv, AccessShareLock); + + if (!IS_INDEX(rel) || !IS_BTREE(rel)) + elog(ERROR, "pgstatindex() can be used only on b-tree index."); + + /*------------------- + * Read a metapage + *------------------- + */ + { + Buffer buffer = ReadBuffer(rel, 0); + Page page = BufferGetPage(buffer); + BTMetaPageData *metad = BTPageGetMeta(page); + + indexStat.magic = metad->btm_magic; + indexStat.version = metad->btm_version; + indexStat.root_blkno = metad->btm_root; + indexStat.level = metad->btm_level; + indexStat.fastroot = metad->btm_fastroot; + indexStat.fastlevel = metad->btm_fastlevel; + + ReleaseBuffer(buffer); + } + + nblocks = RelationGetNumberOfBlocks(rel); + + /* -- init stat -- */ + indexStat.fragments = 0; + + indexStat.root_pages = 0; + indexStat.leaf_pages = 0; + indexStat.internal_pages = 0; + indexStat.empty_pages = 0; + indexStat.deleted_pages = 0; + + indexStat.max_avail = 0; + indexStat.free_space = 0; + + /*----------------------- + * Scan all blocks + *----------------------- + */ + for (blkno = 1; blkno < nblocks; blkno++) + { + Buffer buffer = ReadBuffer(rel, blkno); + BTPageStat stat; + + /* scan one page */ + stat.blkno = blkno; + GetBTPageStatistics(blkno, buffer, &stat); + + /*--------------------- + * page status (type) + *--------------------- + */ + switch (stat.type) + { + case 'd': + indexStat.deleted_pages++; + break; + case 'l': + indexStat.leaf_pages++; + break; + case 'i': + indexStat.internal_pages++; + break; + case 'e': + indexStat.empty_pages++; + break; + case 'r': + indexStat.root_pages++; + break; + default: + elog(ERROR, "unknown page status."); + } + + /* -- leaf fragmentation -- */ + indexStat.fragments += stat.fragments; + + if (stat.type == 'l') + { + indexStat.max_avail += stat.max_avail; + indexStat.free_space += stat.free_size; + } + + ReleaseBuffer(buffer); + } + + relation_close(rel, AccessShareLock); + + /*---------------------------- + * Build a result tuple + *---------------------------- + */ + { + TupleDesc tupleDesc; + int j; + char *values[PGSTATINDEX_NCOLUMNS]; + + HeapTupleData tupleData; + HeapTuple tuple = &tupleData; + + tupleDesc = RelationNameGetTupleDesc(PGSTATINDEX_TYPE); + + j = 0; + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", indexStat.version); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", indexStat.level); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", (indexStat.root_pages + + indexStat.leaf_pages + + indexStat.internal_pages + + indexStat.deleted_pages + + indexStat.empty_pages) * BLCKSZ); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", indexStat.root_blkno); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", indexStat.internal_pages); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", indexStat.leaf_pages); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", indexStat.empty_pages); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", indexStat.deleted_pages); + values[j] = palloc(32); + snprintf(values[j++], 32, "%.2f", 100.0 - (float) indexStat.free_space / (float) indexStat.max_avail * 100.0); + values[j] = palloc(32); + snprintf(values[j++], 32, "%.2f", (float) indexStat.fragments / (float) indexStat.leaf_pages * 100.0); + + tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc), + values); + + result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple); + } + + PG_RETURN_DATUM(result); +} + +/* ----------------------------------------------- + * bt_page() + * + * Usage: SELECT * FROM bt_page('t1_pkey', 0); + * ----------------------------------------------- + */ +Datum +bt_page_stats(PG_FUNCTION_ARGS) +{ + text *relname = PG_GETARG_TEXT_P(0); + uint32 blkno = PG_GETARG_UINT32(1); + Buffer buffer; + + Relation rel; + RangeVar *relrv; + Datum result; + + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname)); + rel = relation_openrv(relrv, AccessShareLock); + + CHECK_RELATION_BLOCK_RANGE(rel, blkno); + + buffer = ReadBuffer(rel, blkno); + + if (!IS_INDEX(rel) || !IS_BTREE(rel)) + elog(ERROR, "bt_page_stats() can be used only on b-tree index."); + + if (blkno == 0) + elog(ERROR, "Block 0 is a meta page."); + + { + HeapTuple tuple; + TupleDesc tupleDesc; + int j; + char *values[BTPAGESTATS_NCOLUMNS]; + + BTPageStat stat; + + GetBTPageStatistics(blkno, buffer, &stat); + + tupleDesc = RelationNameGetTupleDesc(BTPAGESTATS_TYPE); + + j = 0; + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.blkno); + + values[j] = palloc(32); + snprintf(values[j++], 32, "%c", stat.type); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.live_items); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.dead_items); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.avg_item_size); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.page_size); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.free_size); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.btpo_prev); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.btpo_next); + + values[j] = palloc(32); + if (stat.type == 'd') + snprintf(values[j++], 32, "%d", stat.btpo.xact); + else + snprintf(values[j++], 32, "%d", stat.btpo.level); + + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", stat.btpo_flags); + + tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc), + values); + + result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple); + } + + ReleaseBuffer(buffer); + + relation_close(rel, AccessShareLock); + + PG_RETURN_DATUM(result); +} + +/*------------------------------------------------------- + * bt_page_items() + * + * Get IndexTupleData set in a leaf page + * + * Usage: SELECT * FROM bt_page_items('t1_pkey', 0); + *------------------------------------------------------- + */ +/* --------------------------------------------------- + * data structure for SRF to hold a scan information + * --------------------------------------------------- + */ +struct user_args +{ + TupleDesc tupd; + Relation rel; + Buffer buffer; + Page page; + uint16 offset; +}; + +Datum +bt_page_items(PG_FUNCTION_ARGS) +{ + text *relname = PG_GETARG_TEXT_P(0); + uint32 blkno = PG_GETARG_UINT32(1); + + RangeVar *relrv; + Datum result; + char *values[BTPAGEITEMS_NCOLUMNS]; + BTPageOpaque opaque; + HeapTuple tuple; + ItemId id; + + FuncCallContext *fctx; + MemoryContext mctx; + struct user_args *uargs = NULL; + + if (blkno == 0) + elog(ERROR, "Block 0 is a meta page."); + + if (SRF_IS_FIRSTCALL()) + { + fctx = SRF_FIRSTCALL_INIT(); + mctx = MemoryContextSwitchTo(fctx->multi_call_memory_ctx); + + uargs = palloc(sizeof(struct user_args)); + + uargs->tupd = RelationNameGetTupleDesc(BTPAGEITEMS_TYPE); + uargs->offset = FirstOffsetNumber; + + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname)); + uargs->rel = relation_openrv(relrv, AccessShareLock); + + CHECK_RELATION_BLOCK_RANGE(uargs->rel, blkno); + + uargs->buffer = ReadBuffer(uargs->rel, blkno); + + if (!IS_INDEX(uargs->rel) || !IS_BTREE(uargs->rel)) + elog(ERROR, "bt_page_items() can be used only on b-tree index."); + + uargs->page = BufferGetPage(uargs->buffer); + + opaque = (BTPageOpaque) PageGetSpecialPointer(uargs->page); + + if (P_ISDELETED(opaque)) + elog(NOTICE, "bt_page_items(): this page is deleted."); + + fctx->max_calls = PageGetMaxOffsetNumber(uargs->page); + fctx->user_fctx = uargs; + + MemoryContextSwitchTo(mctx); + } + + fctx = SRF_PERCALL_SETUP(); + uargs = fctx->user_fctx; + + if (fctx->call_cntr < fctx->max_calls) + { + IndexTuple itup; + + id = PageGetItemId(uargs->page, uargs->offset); + + if (!ItemIdIsValid(id)) + elog(ERROR, "Invalid ItemId."); + + itup = (IndexTuple) PageGetItem(uargs->page, id); + + { + int j = 0; + + BlockNumber blkno = BlockIdGetBlockNumber(&(itup->t_tid.ip_blkid)); + + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", uargs->offset); + values[j] = palloc(32); + snprintf(values[j++], 32, "(%u,%u)", blkno, itup->t_tid.ip_posid); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", IndexTupleSize(itup)); + values[j] = palloc(32); + snprintf(values[j++], 32, "%c", IndexTupleHasNulls(itup) ? 't' : 'f'); + values[j] = palloc(32); + snprintf(values[j++], 32, "%c", IndexTupleHasVarwidths(itup) ? 't' : 'f'); + + { + int off; + char *dump; + char *ptr = (char *) itup + IndexInfoFindDataOffset(itup->t_info); + + dump = palloc(IndexTupleSize(itup) * 3); + memset(dump, 0, IndexTupleSize(itup) * 3); + + for (off = 0; + off < IndexTupleSize(itup) - IndexInfoFindDataOffset(itup->t_info); + off++) + { + if (dump[0] == '\0') + sprintf(dump, "%02x", *(ptr + off) & 0xff); + else + { + char buf[4]; + + sprintf(buf, " %02x", *(ptr + off) & 0xff); + strcat(dump, buf); + } + } + values[j] = dump; + } + + tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(uargs->tupd), values); + result = TupleGetDatum(TupleDescGetSlot(uargs->tupd), tuple); + } + + uargs->offset = uargs->offset + 1; + + SRF_RETURN_NEXT(fctx, result); + } + else + { + ReleaseBuffer(uargs->buffer); + relation_close(uargs->rel, AccessShareLock); + + SRF_RETURN_DONE(fctx); + } +} + + +/* ------------------------------------------------ + * bt_metap() + * + * Get a btree meta-page information + * + * Usage: SELECT * FROM bt_metap('t1_pkey') + * ------------------------------------------------ + */ +Datum +bt_metap(PG_FUNCTION_ARGS) +{ + text *relname = PG_GETARG_TEXT_P(0); + Buffer buffer; + + Relation rel; + RangeVar *relrv; + Datum result; + + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname)); + rel = relation_openrv(relrv, AccessShareLock); + + if (!IS_INDEX(rel) || !IS_BTREE(rel)) + elog(ERROR, "bt_metap() can be used only on b-tree index."); + + buffer = ReadBuffer(rel, 0); + + { + BTMetaPageData *metad; + + TupleDesc tupleDesc; + int j; + char *values[BTMETAP_NCOLUMNS]; + HeapTuple tuple; + + Page page = BufferGetPage(buffer); + + metad = BTPageGetMeta(page); + + tupleDesc = RelationNameGetTupleDesc(BTMETAP_TYPE); + + j = 0; + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", metad->btm_magic); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", metad->btm_version); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", metad->btm_root); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", metad->btm_level); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", metad->btm_fastroot); + values[j] = palloc(32); + snprintf(values[j++], 32, "%d", metad->btm_fastlevel); + + tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc), + values); + + result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple); + } + + ReleaseBuffer(buffer); + + relation_close(rel, AccessShareLock); + + PG_RETURN_DATUM(result); +} + +/* -------------------------------------------------------- + * pg_relpages() + * + * Get a number of pages of the table/index. + * + * Usage: SELECT pg_relpages('t1'); + * SELECT pg_relpages('t1_pkey'); + * -------------------------------------------------------- + */ +Datum +pg_relpages(PG_FUNCTION_ARGS) +{ + text *relname = PG_GETARG_TEXT_P(0); + + Relation rel; + RangeVar *relrv; + int4 relpages; + + relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname)); + rel = relation_openrv(relrv, AccessShareLock); + + relpages = RelationGetNumberOfBlocks(rel); + + relation_close(rel, AccessShareLock); + + PG_RETURN_INT32(relpages); +} diff -ruN pgstattuple.orig/pgstattuple.sql.in pgstattuple/pgstattuple.sql.in --- pgstattuple.orig/pgstattuple.sql.in 2006-02-28 01:09:50.000000000 +0900 +++ pgstattuple/pgstattuple.sql.in 2006-08-14 10:37:32.000000000 +0900 @@ -22,3 +22,96 @@ RETURNS pgstattuple_type AS 'MODULE_PATHNAME', 'pgstattuplebyid' LANGUAGE C STRICT; + +-- +-- pgstatindex +-- +DROP TYPE pgstatindex_type CASCADE; +CREATE TYPE pgstatindex_type AS ( + version int4, + tree_level int4, + index_size int4, + root_block_no int4, + internal_pages int4, + leaf_pages int4, + empty_pages int4, + deleted_pages int4, + avg_leaf_density float8, + leaf_fragmentation float8 +); + +CREATE OR REPLACE FUNCTION pgstatindex(text) +RETURNS pgstatindex_type +AS 'MODULE_PATHNAME', 'pgstatindex' +LANGUAGE 'C' STRICT; + +-- +-- bt_metap() +-- +DROP TYPE bt_metap_type CASCADE; +CREATE TYPE bt_metap_type AS ( + magic int4, + version int4, + root int4, + level int4, + fastroot int4, + fastlevel int4 +); + +CREATE OR REPLACE FUNCTION bt_metap(text) +RETURNS bt_metap_type +AS 'MODULE_PATHNAME', 'bt_metap' +LANGUAGE 'C' STRICT; + +-- +-- bt_page_items() +-- +DROP TYPE bt_page_items_type CASCADE; +CREATE TYPE bt_page_items_type AS ( + itemoffset int4, + ctid tid, + itemlen int4, + nulls bool, + vars bool, + data text +); + +DROP FUNCTION bt_page_items(text, int4); + +CREATE OR REPLACE FUNCTION bt_page_items(text, int4) +RETURNS SETOF bt_page_items_type +AS 'MODULE_PATHNAME', 'bt_page_items' +LANGUAGE 'C' STRICT; + +-- +-- bt_page_stats() +-- +DROP TYPE bt_page_stats_type CASCADE; +CREATE TYPE bt_page_stats_type AS ( + blkno int4, + type char, + live_items int4, + dead_items int4, + avg_item_size float, + page_size int4, + free_size int4, + btpo_prev int4, + btpo_next int4, + btpo int4, + btpo_flags int4 +); + +DROP FUNCTION bt_page_stats(text, int4); + +CREATE OR REPLACE FUNCTION bt_page_stats(text, int4) +RETURNS bt_page_stats_type +AS 'MODULE_PATHNAME', 'bt_page_stats' +LANGUAGE 'C' STRICT; + +-- +-- pg_relpages() +-- +CREATE OR REPLACE FUNCTION pg_relpages(text) +RETURNS int +AS 'MODULE_PATHNAME', 'pg_relpages' +LANGUAGE 'C' STRICT; diff -ruN pgstattuple.orig/test.sh pgstattuple/test.sh --- pgstattuple.orig/test.sh 1970-01-01 09:00:00.000000000 +0900 +++ pgstattuple/test.sh 2006-08-14 10:23:08.000000000 +0900 @@ -0,0 +1,27 @@ +#!/bin/sh + +PGHOME=/home/snaga/pgsql20060814 +export PATH=${PGHOME}/bin:$PATH + +psql -p 9999 pgbench<<EOF +DROP FUNCTION pgstatindex(text); +EOF + +psql -p 9999 -f ${PGHOME}/share/contrib/pgstattuple.sql pgbench + +psql -p 9999 pgbench<<EOF +SELECT * FROM pg_relpages('accounts_pkey'); +\x +SELECT * FROM pgstatindex('accounts_pkey'); +SELECT * FROM bt_metap('accounts_pkey'); +\x +SELECT * FROM bt_page_items('accounts_pkey', 0); +SELECT * FROM bt_page_items('accounts_pkey', 1) limit 20; + +SELECT * FROM bt_page_stats('accounts_pkey', 1); +\x +SELECT * FROM bt_page_stats('accounts_pkey', 1); +SELECT * FROM bt_page_stats('accounts_pkey', 361); +\x +SELECT * FROM bt_page_items('accounts_pkey', 361); +EOF------------------------------------------------------------------------
pgstattuple README 2002/08/29 Tatsuo Ishii
1. Functions supported:
pgstattuple
-----------
pgstattuple() returns the relation length, percentage of the "dead"
tuples of a relation and other info. This may help users to determine
whether vacuum is necessary or not. Here is an example session:test=> \x
Expanded display is on.
test=> SELECT * FROM pgstattuple('pg_catalog.pg_proc');
-[ RECORD 1 ]------+-------
table_len | 458752
tuple_count | 1470
tuple_len | 438896
tuple_percent | 95.67
dead_tuple_count | 11
dead_tuple_len | 3157
dead_tuple_percent | 0.69
free_space | 8932
free_percent | 1.95Here are explanations for each column:
table_len -- physical relation length in bytes
tuple_count -- number of live tuples
tuple_len -- total tuples length in bytes
tuple_percent -- live tuples in %
dead_tuple_len -- total dead tuples length in bytes
dead_tuple_percent -- dead tuples in %
free_space -- free space in bytes
free_percent -- free space in %pg_relpages
-----------
pg_relpages() returns the number of pages in the relation.pgstatindex
-----------
pgstatindex() returns an array showing the information about an index:test=> \x
Expanded display is on.
test=> SELECT * FROM pgstatindex('pg_cast_oid_index');
-[ RECORD 1 ]------+------
version | 2
tree_level | 0
index_size | 8192
root_block_no | 1
internal_pages | 0
leaf_pages | 1
empty_pages | 0
deleted_pages | 0
avg_leaf_density | 50.27
leaf_fragmentation | 0bt_metap
--------
bt_metap() returns information about the btree index metapage:test=> SELECT * FROM bt_metap('pg_cast_oid_index');
-[ RECORD 1 ]-----
magic | 340322
version | 2
root | 1
level | 0
fastroot | 1
fastlevel | 0bt_page_stats
-------------
bt_page_stats() shows information about single btree pages:test=> SELECT * FROM bt_page_stats('pg_cast_oid_index', 1);
-[ RECORD 1 ]-+-----
blkno | 1
type | l
live_items | 256
dead_items | 0
avg_item_size | 12
page_size | 8192
free_size | 4056
btpo_prev | 0
btpo_next | 0
btpo | 0
btpo_flags | 3bt_page_items
-------------
bt_page_items() returns information about specific items on btree pages:test=> SELECT * FROM bt_page_items('pg_cast_oid_index', 1);
itemoffset | ctid | itemlen | nulls | vars | data
------------+---------+---------+-------+------+-------------
1 | (0,1) | 12 | f | f | 23 27 00 00
2 | (0,2) | 12 | f | f | 24 27 00 00
3 | (0,3) | 12 | f | f | 25 27 00 00
4 | (0,4) | 12 | f | f | 26 27 00 00
5 | (0,5) | 12 | f | f | 27 27 00 00
6 | (0,6) | 12 | f | f | 28 27 00 00
7 | (0,7) | 12 | f | f | 29 27 00 00
8 | (0,8) | 12 | f | f | 2a 27 00 002. Installing pgstattuple
$ make
$ make install
$ psql -e -f /usr/local/pgsql/share/contrib/pgstattuple.sql test3. Using pgstattuple
pgstattuple may be called as a relation function and is
defined as follows:CREATE OR REPLACE FUNCTION pgstattuple(text) RETURNS pgstattuple_type
AS 'MODULE_PATHNAME', 'pgstattuple'
LANGUAGE C STRICT;CREATE OR REPLACE FUNCTION pgstattuple(oid) RETURNS pgstattuple_type
AS 'MODULE_PATHNAME', 'pgstattuplebyid'
LANGUAGE C STRICT;The argument is the relation name (optionally it may be qualified)
or the OID of the relation. Note that pgstattuple only returns
one row.4. Notes
pgstattuple acquires only a read lock on the relation. So concurrent
update may affect the result.pgstattuple judges a tuple is "dead" if HeapTupleSatisfiesNow()
returns false.5. History
2006/06/28
Extended to work against indexes.
--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
Phone: +81-3-3523-8122
$PostgreSQL: pgsql/contrib/pgstattuple/README.pgstattuple.euc_jp,v 1.7 2006/07/06 02:18:07 momjian Exp $
pgstattuple README 2002/08/22 ����������������
1. ��������������������������������������������
pgstattuple
-----------
pgstattuple��������UPDATE������DELETE������������������������������������������������������������������������������
������������������������������������������������������������������������������������������������������������������������������������������������
����������������������������������������������������������������������������������������������������vacuum������������������
�������������������������������������������������������������������������������������������������������������������������������������
�������������������������
pgstattuple() ������������������������������������������������������������������"��������������������"������������������
����������������������������������������������������������������������vacuum ��������������������������������������������������������������
����������������������������������������������������������������������������������������������������test=> \x
Expanded display is on.
test=> SELECT * FROM pgstattuple('pg_catalog.pg_proc');
-[ RECORD 1 ]------+-------
table_len | 458752
tuple_count | 1470
tuple_len | 438896
tuple_percent | 95.67
dead_tuple_count | 11
dead_tuple_len | 3157
dead_tuple_percent | 0.69
free_space | 8932
free_percent | 1.95����������������������������������������������������
table_len -- �������������������������������������������������������(��������������)
tuple_count -- �����������������
tuple_len -- ���������������������������(��������������)
tuple_percent -- ����������������������table_len��������������������tuple_len��������������������
dead_tuple_len -- �����������������������
dead_tuple_percent -- ����������������������������table_len��������������������tuple_len��������������������
free_space -- ������������������������������������(��������������)
free_percent -- ������������������������������������table_len��������������������free_space��������������������pg_relpages
-----------
pg_relpages() ���������������������������������������������������������������������������pgstatindex
-----------
pgstatindex() �������������������������������������������������������������������������������test=> \x
Expanded display is on.
test=> SELECT * FROM pgstatindex('pg_cast_oid_index');
-[ RECORD 1 ]------+------
version | 2
tree_level | 0
index_size | 8192
root_block_no | 1
internal_pages | 0
leaf_pages | 1
empty_pages | 0
deleted_pages | 0
avg_leaf_density | 50.27
leaf_fragmentation | 0bt_metap
--------
bt_metap() ������btree���������������������������������������������������������������������������������������������������test=> SELECT * FROM bt_metap('pg_cast_oid_index');
-[ RECORD 1 ]-----
magic | 340322
version | 2
root | 1
level | 0
fastroot | 1
fastlevel | 0bt_page_stats
-------------
bt_page_stats() ��������btree�����������������������������������������������������������������������������������test=> SELECT * FROM bt_page_stats('pg_cast_oid_index', 1);
-[ RECORD 1 ]-+-----
blkno | 1
type | l
live_items | 256
dead_items | 0
avg_item_size | 12
page_size | 8192
free_size | 4056
btpo_prev | 0
btpo_next | 0
btpo | 0
btpo_flags | 3bt_page_items
-------------
bt_page_items() ��������������btree���������������������������������������������������������������������������������������������������test=> SELECT * FROM bt_page_items('pg_cast_oid_index', 1);
itemoffset | ctid | itemlen | nulls | vars | data
------------+---------+---------+-------+------+-------------
1 | (0,1) | 12 | f | f | 23 27 00 00
2 | (0,2) | 12 | f | f | 24 27 00 00
3 | (0,3) | 12 | f | f | 25 27 00 00
4 | (0,4) | 12 | f | f | 26 27 00 00
5 | (0,5) | 12 | f | f | 27 27 00 00
6 | (0,6) | 12 | f | f | 28 27 00 00
7 | (0,7) | 12 | f | f | 29 27 00 00
8 | (0,8) | 12 | f | f | 2a 27 00 002. pgstattuple��������������������������
PostgreSQL������/usr/local/pgsql���������������������������������������test����������������������
������������pgstattuple���������������������������������������������������������������$ make
$ make install���������������������������������������������������������������
$ psql -e -f /usr/local/pgsql/share/contrib/pgstattuple.sql test
3. pgstattuple����������������
pgstattuple��������������������������������������������������
CREATE OR REPLACE FUNCTION pgstattuple(text) RETURNS pgstattuple_type
AS 'MODULE_PATHNAME', 'pgstattuple'
LANGUAGE C STRICT;������������������: ��������������������������
��������������������������pgstattuple_type��������������������
pgstattuple�����������������������������AccessShareLock������������������������������������������
pgstattuple ����������������������������������������������������������������������������������������������������������������������������
����������������������������������������������������������������������pgstattuple�����������������������������������������������������������������
HeapTupleSatisfiesNow()������������������������������������������������������4. pgstattuple����������������������������������������
pgstattuple.c�������������������������������������������������������������������pgstattuple ����������������������������
����������������pgstattuple ����������������������������������������������������������������������������������������������
����������������������������������������5. ������������������������
2006/06/28
��������������������������������������������������������������������������������
2002/09/04
SRF����������������������������Tom Lane ���������������������������������������������������������������������������������������
��������������������������README������������������2002/08/23
SRF(Set Returning Function)�������������7.3��������������������������
2001/12/20 Tom Lane�����������������
Fix pgstattuple to acquire a read lock on the target table. This
prevents embarassments such as having the table dropped or truncated
partway through the scan. Also, fix free space calculation to include
pages that currently contain no tuples.2001/10/01 PostgreSQL 7.2 ������contrib module��������������
2001/08/30 pgstattuple ����������������������� 0.1���������������
-- Adjust this setting to control where the objects get created.
SET search_path = public;DROP FUNCTION pgstattuple(oid);
DROP FUNCTION pgstattuple(text);
DROP TYPE pgstattuple_type;
DROP FUNCTION pgstatindex(text);
DROP FUNCTION bt_metap(text);
DROP FUNCTION bt_page_items(text, int4);
DROP FUNCTION bt_page_stats(text, int4);
DROP FUNCTION pg_relpages(text);DROP TYPE pgstatindex_type;
DROP TYPE bt_metap_type;
DROP TYPE bt_page_items_type;
DROP TYPE bt_page_stats_type;
--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +