pgstattuple extension for indexes

Started by ITAGAKI Takahiroabout 20 years ago48 messageshackerspatches

itagaki.takahiro@oss.ntt.co.jp

about 20 years ago

hackerspatches

This is an extension of pgstattuple to query information from indexes.
It supports btree, hash and gist. Gin is not supported.
It scans only index pages and does not read corresponding heap tuples.
Therefore, 'dead_tuple' means the number of tuples with LP_DELETE flag.

Also, I added an experimental feature for btree indexes. It checks
fragmentation factor of indexes. If an leaf has the right link on the next
adjacent page in the file, it is assumed to be continuous (not fragmented).
It will help us to decide when to REINDEX.

Suggestions welcome.

----
$ pgbench -i
$ pgbench -n -t 100 -c 10
# select * from pgstattuple('accounts_pkey');
NOTICE: 0.36% fragmented
HINT: continuous=273, forward=1, backward=0
-[ RECORD 1 ]------+--------
table_len | 2260992
tuple_count | 100996 -- 996 tuples are dead practically,
tuple_len | 1615936 but no LP_DELETE yet.
tuple_percent | 71.47
dead_tuple_count | 4
dead_tuple_len | 64 -- 64 tuples are marked as LP_DELETE.
dead_tuple_percent | 0
free_space | 208188
free_percent | 9.21
----

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

Bruce Momjian

bruce@momjian.us

about 20 years ago

In reply to: ITAGAKI Takahiro (#1)

hackerspatches

Re: pgstattuple extension for indexes

Patch applied. Thanks.

---------------------------------------------------------------------------

ITAGAKI Takahiro wrote:

This is an extension of pgstattuple to query information from indexes.
It supports btree, hash and gist. Gin is not supported.
It scans only index pages and does not read corresponding heap tuples.
Therefore, 'dead_tuple' means the number of tuples with LP_DELETE flag.

Also, I added an experimental feature for btree indexes. It checks
fragmentation factor of indexes. If an leaf has the right link on the next
adjacent page in the file, it is assumed to be continuous (not fragmented).
It will help us to decide when to REINDEX.

Suggestions welcome.

----
$ pgbench -i
$ pgbench -n -t 100 -c 10
# select * from pgstattuple('accounts_pkey');
NOTICE: 0.36% fragmented
HINT: continuous=273, forward=1, backward=0
-[ RECORD 1 ]------+--------
table_len | 2260992
tuple_count | 100996 -- 996 tuples are dead practically,
tuple_len | 1615936 but no LP_DELETE yet.
tuple_percent | 71.47
dead_tuple_count | 4
dead_tuple_len | 64 -- 64 tuples are marked as LP_DELETE.
dead_tuple_percent | 0
free_space | 208188
free_percent | 9.21
----

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

[ Attachment, skipping... ]

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Tom Lane

tgl@sss.pgh.pa.us

about 20 years ago

In reply to: Bruce Momjian (#2)

hackerspatches

Re: pgstattuple extension for indexes

Bruce Momjian <bruce@momjian.us> writes:

Patch applied. Thanks.

Also, I added an experimental feature for btree indexes. It checks
fragmentation factor of indexes. If an leaf has the right link on the next
adjacent page in the file, it is assumed to be continuous (not fragmented).
It will help us to decide when to REINDEX.

This was done in an entirely unacceptable way, to wit

ereport(NOTICE,
(errmsg("%.2f%% fragmented",
100.0 * (stat.forward + stat.backward) /
(stat.continuous + stat.forward + stat.backward)),
errhint("continuous=%llu, forward=%llu, backward=%llu",
stat.continuous, stat.forward, stat.backward)));

The really serious problem with reporting this info via NOTICE is that
there's no way for a program to get its hands on the info. The output
tuple format needs to be extended instead.

The lesser problem that drew my attention is that %llu is unportable
(and in fact draws gcc warnings for me; did you ignore that?). But
using UINT64_FORMAT instead would create a headache for translation
because the string would vary across platforms.

I won't bother correcting the violations of message style guidelines,
because this code simply has to go away.

regards, tom lane

ITAGAKI Takahiro

itagaki.takahiro@oss.ntt.co.jp

about 20 years ago

In reply to: Tom Lane (#3)

hackerspatches

Re: pgstattuple extension for indexes

Tom Lane <tgl@sss.pgh.pa.us> wrote:

Also, I added an experimental feature for btree indexes. It checks
fragmentation factor of indexes.

The really serious problem with reporting this info via NOTICE is that
there's no way for a program to get its hands on the info. The output
tuple format needs to be extended instead.

Ok, I added 'fragmented_percent' field to the output tuple. This
information will help us to decide when to do REINDEX.
However, it is only avaliable for btree index presently. Other indexes
should have equivalent information, but I don't know how to determine it.

BTW, should we change VACUUM VERBOSE in the same way? If we do so,
autovacuum can handle the reports of VACUUM VERBOSE and plan when to
do VACUUM FULL, REINDEX and/or CLUSTER using the information.
Is this worth doing?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

Alvaro Herrera

alvherre@2ndquadrant.com

about 20 years ago

In reply to: ITAGAKI Takahiro (#4)

hackerspatches

Re: pgstattuple extension for indexes

ITAGAKI Takahiro wrote:

Tom Lane <tgl@sss.pgh.pa.us> wrote:

Also, I added an experimental feature for btree indexes. It checks
fragmentation factor of indexes.

The really serious problem with reporting this info via NOTICE is that
there's no way for a program to get its hands on the info. The output
tuple format needs to be extended instead.

Ok, I added 'fragmented_percent' field to the output tuple. This
information will help us to decide when to do REINDEX.
However, it is only avaliable for btree index presently. Other indexes
should have equivalent information, but I don't know how to determine it.

BTW while you're handling this, why not change the function to use OUT
parameters instead of having to CREATE TYPE to handle the return type?
I think it is easier to handle ...

One question I had was: in the percentage of fragmentation, is higher
better or lower better? (I'm guessing lower is better, but this should
be mentioned in the docs)

BTW, should we change VACUUM VERBOSE in the same way? If we do so,
autovacuum can handle the reports of VACUUM VERBOSE and plan when to
do VACUUM FULL, REINDEX and/or CLUSTER using the information.
Is this worth doing?

You mean having VACUUM VERBOSE return a result set?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Tom Lane

tgl@sss.pgh.pa.us

about 20 years ago

In reply to: Alvaro Herrera (#5)

hackerspatches

Re: pgstattuple extension for indexes

Alvaro Herrera <alvherre@commandprompt.com> writes:

ITAGAKI Takahiro wrote:

BTW, should we change VACUUM VERBOSE in the same way? If we do so,
autovacuum can handle the reports of VACUUM VERBOSE and plan when to
do VACUUM FULL, REINDEX and/or CLUSTER using the information.
Is this worth doing?

You mean having VACUUM VERBOSE return a result set?

To me, the point of VACUUM VERBOSE is mostly to give you some
reassurance that it's making progress. If it were returning rows
instead of notice messages, you'd lose that functionality (at least
in libpq-based clients). In any case, autovacuum has other ways
of getting the information without needing a change in user-visible
behavior.

regards, tom lane

Satoshi Nagayasu

nagayasus@nttdata.co.jp

about 20 years ago

In reply to: ITAGAKI Takahiro (#4)

hackerspatches

Re: pgstattuple extension for indexes

Hi,

I'm working on an utility for b-tree index, called `pgstatindex`.

pgbench=#
----------------------------------------------------------------

I want to make this to contrib module like a pgstattuple,
and to make this open to public in a few days.

Do you think this is useful?

2006/7/24, ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp>:

Show quoted text

Tom Lane <tgl@sss.pgh.pa.us> wrote:

Also, I added an experimental feature for btree indexes. It checks
fragmentation factor of indexes.

The really serious problem with reporting this info via NOTICE is that
there's no way for a program to get its hands on the info. The output
tuple format needs to be extended instead.

Ok, I added 'fragmented_percent' field to the output tuple. This
information will help us to decide when to do REINDEX.
However, it is only avaliable for btree index presently. Other indexes
should have equivalent information, but I don't know how to determine it.

BTW, should we change VACUUM VERBOSE in the same way? If we do so,
autovacuum can handle the reports of VACUUM VERBOSE and plan when to
do VACUUM FULL, REINDEX and/or CLUSTER using the information.
Is this worth doing?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Bruce Momjian

bruce@momjian.us

about 20 years ago

In reply to: Satoshi Nagayasu (#7)

hackerspatches

Re: pgstattuple extension for indexes

Satoshi Nagayasu wrote:

Hi,

I'm working on an utility for b-tree index, called `pgstatindex`.

It reports b-tree index statistics like a pgstattuple as below.
----------------------------------------------------------------
pgbench=# \x
Expanded display is on.
pgbench=# SELECT * FROM pgstatindex('accounts_pkey1');
-[ RECORD 1 ]------+---------
version | 2
tree_level | 2
index_size | 17956864
root_block_no | 361
internal_pages | 8
leaf_pages | 2184
empty_pages | 0
deleted_pages | 0
avg_leaf_density | 90.07
leaf_fragmentation | 0

pgbench=#
----------------------------------------------------------------

I want to make this to contrib module like a pgstattuple,
and to make this open to public in a few days.

Do you think this is useful?

Yes, for performance debugging, I think.

--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Alvaro Herrera

alvherre@2ndquadrant.com

about 20 years ago

In reply to: Satoshi Nagayasu (#7)

hackerspatches

Re: pgstattuple extension for indexes

Satoshi Nagayasu wrote:

Hi,

I'm working on an utility for b-tree index, called `pgstatindex`.

Does it make sense to merge the pgstatindex stuff with pgstattuple, and
have the fragmentation report into pgstatindex instead of pgstattuple
itself?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#10

ITAGAKI Takahiro

itagaki.takahiro@oss.ntt.co.jp

about 20 years ago

In reply to: Alvaro Herrera (#9)

hackerspatches

Re: pgstattuple extension for indexes

Alvaro Herrera <alvherre@commandprompt.com> wrote:

Satoshi Nagayasu wrote:

I'm working on an utility for b-tree index, called `pgstatindex`.

Does it make sense to merge the pgstatindex stuff with pgstattuple, and
have the fragmentation report into pgstatindex instead of pgstattuple
itself?

It sounds good. We will have two separate commands:
- pgstattuple: returns tuple-level information
- pgstatindex: returns page-level information

We can use tuple-level info to check LP_DELETE flags on index tuples,
and use page-level info to check needs for REINDEX.

Do we add pgstatindex as a new contrib module,
or merge it into contrib/pgstattuple?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

#11

Tom Lane

tgl@sss.pgh.pa.us

about 20 years ago

In reply to: ITAGAKI Takahiro (#10)

hackerspatches

Re: pgstattuple extension for indexes

ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:

Do we add pgstatindex as a new contrib module,
or merge it into contrib/pgstattuple?

I believe Alvaro was suggesting that you should add it as an additional
SQL function within contrib/pgstattuple. That'd be my advice too ---
I don't see a reason to break this out as a separate contrib module.

regards, tom lane

#12

Satoshi Nagayasu

nagayasus@nttdata.co.jp

almost 20 years ago

In reply to: Tom Lane (#11)

hackerspatches

Re: pgstattuple extension for indexes

Hi folks,

As I said on -PATCHES, I've been working on an utility to get
a b-tree index information. I'm happy to introduce
my new functions to you.

pgstattuple module provides a `pgstatindex()`, and other small
functions, which allow you to get b-tree internal information.
I believe this module will be helpful to know b-tree index deeply.

So please try it, send comment to me, and have fun.

Thanks,
--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>

#13

Bruce Momjian

bruce@momjian.us

almost 20 years ago

In reply to: Satoshi Nagayasu (#12)

hackerspatches

Re: pgstattuple extension for indexes

I thought these new functions were going to be merged into
/contrib/pgstattuple.

---------------------------------------------------------------------------

satoshi nagayasu wrote:

Hi folks,

As I said on -PATCHES, I've been working on an utility to get
a b-tree index information. I'm happy to introduce
my new functions to you.

pgstattuple module provides a `pgstatindex()`, and other small
functions, which allow you to get b-tree internal information.
I believe this module will be helpful to know b-tree index deeply.

So please try it, send comment to me, and have fun.

Thanks,
--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>

-----------------------------------------------------
pgbench=# \x
Expanded display is on.
pgbench=# SELECT * FROM pgstatindex('accounts_pkey');
-[ RECORD 1 ]------+--------
version | 2
tree_level | 1
index_size | 3588096
root_block_no | 3
internal_pages | 0
leaf_pages | 437
empty_pages | 0
deleted_pages | 0
avg_leaf_density | 59.5
leaf_fragmentation | 49.89
-----------------------------------------------------

[ application/x-gzip is not supported, skipping... ]

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#14

Alvaro Herrera

alvherre@2ndquadrant.com

almost 20 years ago

In reply to: Bruce Momjian (#13)

hackerspatches

Re: [HACKERS] pgstattuple extension for indexes

Bruce Momjian wrote:

I thought these new functions were going to be merged into
/contrib/pgstattuple.

Well, that's exactly what this patch seems to do ...

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#15

Bruce Momjian

bruce@momjian.us

almost 20 years ago

In reply to: Alvaro Herrera (#14)

hackerspatches

Re: [HACKERS] pgstattuple extension for indexes

Alvaro Herrera wrote:

Bruce Momjian wrote:

I thought these new functions were going to be merged into
/contrib/pgstattuple.

Well, that's exactly what this patch seems to do ...

Well, looking at the tarball it puts everything in pgstatindex, and the
Makefile is:

#-------------------------------------------------------------------------
#
# pgstatindex Makefile
#
# $PostgreSQL$
#
#-------------------------------------------------------------------------

SRCS = pgstatindex.c

MODULE_big = pgstatindex
OBJS = $(SRCS:.c=.o)
DOCS =
DATA_built = pgstatindex.sql

ifdef USE_PGXS
PGXS := $(shell pg_config --pgxs)
include $(PGXS)
else
subdir = contrib/pgstatindex
top_builddir = /home/snaga/pgsql/sources/postgresql-8.1.3
include $(top_builddir)/src/Makefile.global
include $(top_srcdir)/contrib/contrib-global.mk
endif

--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#16

Bruce Momjian

bruce@momjian.us

almost 20 years ago

In reply to: Satoshi Nagayasu (#12)

hackerspatches

Re: pgstattuple extension for indexes

nagayasu-san,

This looks good, but we would like the code added to
/contrib/pgstattuple, rather than it being its own /contrib module. Can
you make that adjustment? Thanks.

---------------------------------------------------------------------------

satoshi nagayasu wrote:

Hi folks,

As I said on -PATCHES, I've been working on an utility to get
a b-tree index information. I'm happy to introduce
my new functions to you.

pgstattuple module provides a `pgstatindex()`, and other small
functions, which allow you to get b-tree internal information.
I believe this module will be helpful to know b-tree index deeply.

So please try it, send comment to me, and have fun.

Thanks,
--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>

-----------------------------------------------------
pgbench=# \x
Expanded display is on.
pgbench=# SELECT * FROM pgstatindex('accounts_pkey');
-[ RECORD 1 ]------+--------
version | 2
tree_level | 1
index_size | 3588096
root_block_no | 3
internal_pages | 0
leaf_pages | 437
empty_pages | 0
deleted_pages | 0
avg_leaf_density | 59.5
leaf_fragmentation | 49.89
-----------------------------------------------------

[ application/x-gzip is not supported, skipping... ]

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#17

Satoshi Nagayasu

nagayasus@nttdata.co.jp

almost 20 years ago

In reply to: Bruce Momjian (#16)

hackerspatches

Re: pgstattuple extension for indexes

Bruce,

I'll fix it in this week. Please wait a few days.
Thanks.

Bruce Momjian wrote:

nagayasu-san,

This looks good, but we would like the code added to
/contrib/pgstattuple, rather than it being its own /contrib module. Can
you make that adjustment? Thanks.

---------------------------------------------------------------------------

satoshi nagayasu wrote:

Hi folks,

As I said on -PATCHES, I've been working on an utility to get
a b-tree index information. I'm happy to introduce
my new functions to you.

pgstattuple module provides a `pgstatindex()`, and other small
functions, which allow you to get b-tree internal information.
I believe this module will be helpful to know b-tree index deeply.

So please try it, send comment to me, and have fun.

Thanks,
--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>

-----------------------------------------------------
pgbench=# \x
Expanded display is on.
pgbench=# SELECT * FROM pgstatindex('accounts_pkey');
-[ RECORD 1 ]------+--------
version | 2
tree_level | 1
index_size | 3588096
root_block_no | 3
internal_pages | 0
leaf_pages | 437
empty_pages | 0
deleted_pages | 0
avg_leaf_density | 59.5
leaf_fragmentation | 49.89
-----------------------------------------------------

[ application/x-gzip is not supported, skipping... ]

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
Phone: +81-3-3523-8122

#18

Satoshi Nagayasu

nagayasus@nttdata.co.jp

almost 20 years ago

In reply to: Satoshi Nagayasu (#17)

hackerspatches

Re: pgstattuple extension for indexes

Hi all,

Here is a patch to add pgstatindex functions to the pgstattuple module,
which can work with 8.1.4. Please review and try it. Thanks.

Satoshi Nagayasu wrote:

Bruce,

I'll fix it in this week. Please wait a few days.
Thanks.

Bruce Momjian wrote:

nagayasu-san,

This looks good, but we would like the code added to
/contrib/pgstattuple, rather than it being its own /contrib module. Can
you make that adjustment? Thanks.

---------------------------------------------------------------------------

satoshi nagayasu wrote:

Hi folks,

As I said on -PATCHES, I've been working on an utility to get
a b-tree index information. I'm happy to introduce
my new functions to you.

pgstattuple module provides a `pgstatindex()`, and other small
functions, which allow you to get b-tree internal information.
I believe this module will be helpful to know b-tree index deeply.

So please try it, send comment to me, and have fun.

Thanks,
--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>

-----------------------------------------------------
pgbench=# \x
Expanded display is on.
pgbench=# SELECT * FROM pgstatindex('accounts_pkey');
-[ RECORD 1 ]------+--------
version | 2
tree_level | 1
index_size | 3588096
root_block_no | 3
internal_pages | 0
leaf_pages | 437
empty_pages | 0
deleted_pages | 0
avg_leaf_density | 59.5
leaf_fragmentation | 49.89
-----------------------------------------------------

[ application/x-gzip is not supported, skipping... ]

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
Phone: +81-3-3523-8122

#19

Bruce Momjian

bruce@momjian.us

almost 20 years ago

In reply to: Satoshi Nagayasu (#18)

hackerspatches

Re: pgstattuple extension for indexes

I don't see any documentation, so I assume you want me to add something
to README.pgstattuple.

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------

Satoshi Nagayasu wrote:

Hi all,

Here is a patch to add pgstatindex functions to the pgstattuple module,
which can work with 8.1.4. Please review and try it. Thanks.

Satoshi Nagayasu wrote:

Bruce,

I'll fix it in this week. Please wait a few days.
Thanks.

Bruce Momjian wrote:

nagayasu-san,

This looks good, but we would like the code added to
/contrib/pgstattuple, rather than it being its own /contrib module. Can
you make that adjustment? Thanks.

---------------------------------------------------------------------------

satoshi nagayasu wrote:

Hi folks,

As I said on -PATCHES, I've been working on an utility to get
a b-tree index information. I'm happy to introduce
my new functions to you.

pgstattuple module provides a `pgstatindex()`, and other small
functions, which allow you to get b-tree internal information.
I believe this module will be helpful to know b-tree index deeply.

So please try it, send comment to me, and have fun.

Thanks,
--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>

-----------------------------------------------------
pgbench=# \x
Expanded display is on.
pgbench=# SELECT * FROM pgstatindex('accounts_pkey');
-[ RECORD 1 ]------+--------
version | 2
tree_level | 1
index_size | 3588096
root_block_no | 3
internal_pages | 0
leaf_pages | 437
empty_pages | 0
deleted_pages | 0
avg_leaf_density | 59.5
leaf_fragmentation | 49.89
-----------------------------------------------------

[ application/x-gzip is not supported, skipping... ]

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
Phone: +81-3-3523-8122

diff -ruN pgstattuple.orig/Makefile pgstattuple/Makefile
--- pgstattuple.orig/Makefile	2006-08-10 19:22:47.000000000 +0900
+++ pgstattuple/Makefile	2006-08-10 19:24:05.000000000 +0900
@@ -6,7 +6,7 @@
#
#-------------------------------------------------------------------------

-SRCS		= pgstattuple.c
+SRCS		= pgstattuple.c pgstatindex.c

MODULE_big	= pgstattuple
OBJS		= $(SRCS:.c=.o)
diff -ruN pgstattuple.orig/pgstatindex.c pgstattuple/pgstatindex.c
--- pgstattuple.orig/pgstatindex.c	1970-01-01 09:00:00.000000000 +0900
+++ pgstattuple/pgstatindex.c	2006-08-11 17:51:26.000000000 +0900
@@ -0,0 +1,714 @@
+/*
+ * pgstatindex
+ *
+ * Copyright (c) 2006 Satoshi Nagayasu <nagayasus@nttdata.co.jp>
+ *
+ * Permission to use, copy, modify, and distribute this software and
+ * its documentation for any purpose, without fee, and without a
+ * written agreement is hereby granted, provided that the above
+ * copyright notice and this paragraph and the following two
+ * paragraphs appear in all copies.
+ *
+ * IN NO EVENT SHALL THE AUTHOR BE LIABLE TO ANY PARTY FOR DIRECT,
+ * INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
+ * LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
+ * DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED
+ * OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * THE AUTHOR SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE.  THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS
+ * IS" BASIS, AND THE AUTHOR HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE,
+ * SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
+ */
+
+#include "postgres.h"
+
+#include "fmgr.h"
+#include "funcapi.h"
+#include "access/heapam.h"
+#include "access/itup.h"
+#include "access/nbtree.h"
+#include "access/transam.h"
+#include "catalog/namespace.h"
+#include "catalog/pg_type.h"
+#include "utils/builtins.h"
+#include "utils/inval.h"
+
+PG_FUNCTION_INFO_V1(pgstatindex);
+PG_FUNCTION_INFO_V1(bt_metap);
+PG_FUNCTION_INFO_V1(bt_page_items);
+PG_FUNCTION_INFO_V1(bt_page_stats);
+PG_FUNCTION_INFO_V1(pg_relpages);
+
+extern Datum pgstatindex(PG_FUNCTION_ARGS);
+extern Datum bt_metap(PG_FUNCTION_ARGS);
+extern Datum bt_page_items(PG_FUNCTION_ARGS);
+extern Datum bt_page_stats(PG_FUNCTION_ARGS);
+extern Datum pg_relpages(PG_FUNCTION_ARGS);
+
+#define PGSTATINDEX_TYPE "public.pgstatindex_type"
+#define PGSTATINDEX_NCOLUMNS 10
+
+#define BTMETAP_TYPE "public.bt_metap_type"
+#define BTMETAP_NCOLUMNS 6
+
+#define BTPAGEITEMS_TYPE "public.bt_page_items_type"
+#define BTPAGEITEMS_NCOLUMNS 6
+
+#define BTPAGESTATS_TYPE "public.bt_page_stats_type"
+#define BTPAGESTATS_NCOLUMNS 12
+
+
+#define IS_INDEX(r) ((r)->rd_rel->relkind == 'i')
+#define IS_BTREE(r) ((r)->rd_rel->relam == BTREE_AM_OID)
+
+#define CHECK_PAGE_OFFSET_RANGE(page, offset) { \
+        if ( !(FirstOffsetNumber<=(offset) && \
+                        (offset)<=PageGetMaxOffsetNumber(page)) ) \
+             elog(ERROR, "Page offset number out of range."); }
+
+#define CHECK_RELATION_BLOCK_RANGE(rel, blkno) { \
+        if ( (blkno)<0 && RelationGetNumberOfBlocks((rel))<=(blkno) ) \
+             elog(ERROR, "Block number out of range."); }
+
+/* ------------------------------------------------
+ * structure for single btree page statistics
+ * ------------------------------------------------
+ */
+typedef struct BTPageStat {
+	uint32 blkno;
+	uint32 live_items;
+	uint32 dead_items;
+	uint32 page_size;
+	uint32 max_avail;
+	uint32 free_size;
+	uint32 avg_item_size;
+	uint32 fragments;
+	bool is_root;
+	bool is_internal;
+	bool is_leaf;
+	bool is_deleted;
+	bool is_empty;
+} BTPageStat;
+
+/* ------------------------------------------------
+ * A structure for a whole btree index statistics
+ * used by pgstatindex().
+ * ------------------------------------------------
+ */
+typedef struct BTIndexStat {
+	uint32 version;
+	BlockNumber root_blkno;
+	uint32 level;
+
+	uint32 live_items;
+	uint32 dead_items;
+
+	uint32 root_pages;
+	uint32 internal_pages;
+	uint32 leaf_pages;
+	uint32 empty_pages;
+	uint32 deleted_pages;
+
+	uint32 page_size;
+	uint32 avg_item_size;
+
+	uint32 max_avail;
+	uint32 free_space;
+
+	uint32 fragments;
+} BTIndexStat;
+
+/* -------------------------------------------------
+ * GetBTPageStatistics()
+ *
+ * Collect statistics of single b-tree leaf page
+ * -------------------------------------------------
+ */
+static bool
+GetBTPageStatistics(BlockNumber blkno, Buffer buffer, BTPageStat *stat)
+{
+	Page page           = BufferGetPage(buffer);
+	PageHeader phdr     = (PageHeader) page;
+	OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
+	BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+	int item_size = 0;
+	int off;
+
+	stat->blkno = blkno;
+
+	stat->max_avail = BLCKSZ - ( BLCKSZ - phdr->pd_special + SizeOfPageHeaderData );
+
+	stat->dead_items = stat->live_items = 0;
+
+	stat->page_size = PageGetPageSize(page);
+
+	/* page type */
+	stat->is_root    = false;
+	stat->is_leaf    = false;
+	stat->is_deleted = false;
+	stat->is_empty   = false;
+
+	if ( P_ISDELETED(opaque) )
+	{
+		stat->is_deleted = true;
+		return true;
+	}
+	else if ( P_IGNORE(opaque) )
+		stat->is_empty = true;
+	else if ( P_ISLEAF(opaque) )
+		stat->is_leaf = true;
+	else if ( P_ISROOT(opaque) )
+		stat->is_root = true;
+	else
+		stat->is_internal = true;
+
+	/*----------------------------------------------
+	 * If a next leaf is on the previous block,
+	 * it means a fragmentation.
+	 *----------------------------------------------
+	 */
+	stat->fragments = 0;
+	if ( stat->is_leaf )
+	{
+		if ( opaque->btpo_next != P_NONE && opaque->btpo_next < blkno )
+			stat->fragments++;
+	}
+
+	/* count live and dead tuples, and free space */
+	for (off=FirstOffsetNumber ; off<=maxoff ; off++)
+	{
+		BTItem btitem;
+		IndexTuple itup;
+
+		ItemId id = PageGetItemId(page, off);
+
+		btitem = (BTItem)PageGetItem(page, id);
+
+		itup = &(btitem->bti_itup);
+
+		item_size += IndexTupleSize(itup);
+
+		if ( !ItemIdDeleted(id) )
+			stat->live_items++;
+		else
+			stat->dead_items++;
+	}
+	stat->free_size = PageGetFreeSpace(page);
+
+#ifdef NOT_USED
+	elog(NOTICE, "%d/%d (%.2f%%)", stat->free_size, stat->max_avail,
+		 (float)stat->free_size/(float)stat->max_avail*100.0);
+#endif
+		
+	if ( (stat->live_items + stat->dead_items) > 0 )
+		stat->avg_item_size = item_size / ( stat->live_items + stat->dead_items );
+	else
+		stat->avg_item_size = 0;
+
+	return true;
+}
+
+
+/* ------------------------------------------------------
+ * pgstatindex()
+ *
+ * Usage: SELECT * FROM pgstatindex('t1_pkey');
+ * ------------------------------------------------------ 
+ */
+Datum
+pgstatindex(PG_FUNCTION_ARGS)
+{
+	text	   *relname = PG_GETARG_TEXT_P(0);
+	Relation	rel;
+	RangeVar    *relrv;
+	Datum		result;
+	uint32		nblocks;
+	uint32		blkno;
+	BTIndexStat indexStat;
+
+	relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
+	rel = relation_openrv(relrv, AccessShareLock);
+
+	if ( !IS_INDEX(rel) || !IS_BTREE(rel) )
+		elog(ERROR, "pgstatindex() can be used only on b-tree index.");
+
+	/*-------------------
+	 * Read a metapage
+	 *-------------------
+	 */
+	{
+		Buffer buffer = ReadBuffer(rel, 0);
+		Page page = BufferGetPage(buffer);
+		BTMetaPageData *metad = BTPageGetMeta(page);
+
+//		snprintf(values[j++], 32, "%d", metad->btm_magic);
+		indexStat.version    = metad->btm_version;
+		indexStat.root_blkno = metad->btm_root;
+		indexStat.level      = metad->btm_level;
+//		snprintf(values[j++], 32, "%d", metad->btm_fastroot);
+//		snprintf(values[j++], 32, "%d", metad->btm_fastlevel);
+
+		ReleaseBuffer(buffer);
+	}
+
+	nblocks = RelationGetNumberOfBlocks(rel);
+
+	/* -- init stat -- */
+	indexStat.fragments      = 0;
+
+	indexStat.root_pages     = 0;
+	indexStat.leaf_pages     = 0;
+	indexStat.internal_pages = 0;
+	indexStat.empty_pages    = 0;
+	indexStat.deleted_pages  = 0;
+
+	indexStat.max_avail  = 0;
+	indexStat.free_space = 0;
+
+	/*-----------------------
+	 * Scan all blocks
+	 *-----------------------
+	 */
+	for (blkno=1 ; blkno<nblocks ; blkno++)
+	{
+		Buffer buffer = ReadBuffer(rel, blkno);
+		BTPageStat stat;
+		
+		/* scan one page */
+		stat.blkno = blkno;
+		GetBTPageStatistics(blkno, buffer, &stat);
+
+		/*---------------------
+		 * page status (type)
+		 *---------------------
+		 */
+		if ( stat.is_deleted )
+			indexStat.deleted_pages++;
+		else if ( stat.is_leaf )
+			indexStat.leaf_pages++;
+		else if ( stat.is_internal )
+			indexStat.internal_pages++;
+		else if ( stat.is_empty )
+			indexStat.empty_pages++;
+		else if ( stat.is_root )
+			indexStat.root_pages++;
+		else
+			elog(ERROR, "unknown page status.");
+
+		/* -- leaf fragmentation -- */
+		indexStat.fragments += stat.fragments;
+
+		if ( stat.is_leaf )
+		{
+			indexStat.max_avail  += stat.max_avail;
+			indexStat.free_space += stat.free_size;
+		}
+
+		ReleaseBuffer(buffer);
+	}
+
+	relation_close(rel, AccessShareLock);
+
+#ifdef NOT_USED
+	elog(NOTICE, "[index information]");
+	elog(NOTICE, "version.................: %d", indexStat.version);
+	elog(NOTICE, "tree level..............: %d", indexStat.level);
+	elog(NOTICE, "index size..............: %d", (indexStat.root_pages +
+									indexStat.leaf_pages +
+									indexStat.internal_pages +
+									indexStat.deleted_pages +
+									indexStat.empty_pages) * BLCKSZ);
+
+	elog(NOTICE, "");
+	elog(NOTICE, "[page statistics]");
+	elog(NOTICE, "root block number.......: %d", indexStat.root_blkno);
+	elog(NOTICE, "internal pages..........: %d", indexStat.internal_pages);
+	elog(NOTICE, "leaf pages..............: %d", indexStat.leaf_pages);
+	elog(NOTICE, "empty pages.............: %d", indexStat.empty_pages);
+	elog(NOTICE, "deleted pages...........: %d", indexStat.deleted_pages);
+
+	elog(NOTICE, "");
+	elog(NOTICE, "[leaf statistics]");
+	elog(NOTICE, "avg. leaf density.......: %.2f%%", 100.0-(float)indexStat.free_space / (float)indexStat.max_avail * 100.0 );
+	elog(NOTICE, "leaf fragmentation......: %.2f%%", (float)indexStat.fragments / 2.0 / (float)indexStat.leaf_pages * 100.0 );
+#endif
+
+	/*----------------------------
+	 * Build a result tuple
+	 *----------------------------
+	 */
+	{
+		TupleDesc tupleDesc;
+		int j;
+		char *values[PGSTATINDEX_NCOLUMNS];
+
+		HeapTupleData tupleData;
+		HeapTuple tuple = &tupleData;
+
+		tupleDesc = RelationNameGetTupleDesc(PGSTATINDEX_TYPE);
+
+		j = 0;
+		values[j] = palloc(32);
+		snprintf(values[j++], 32, "%d", indexStat.version);
+		values[j] = palloc(32);
+		snprintf(values[j++], 32, "%d", indexStat.level);
+		values[j] = palloc(32);
+		snprintf(values[j++], 32, "%d", (indexStat.root_pages +
+										 indexStat.leaf_pages +
+										 indexStat.internal_pages +
+										 indexStat.deleted_pages +
+										 indexStat.empty_pages) * BLCKSZ );
+		values[j] = palloc(32);
+		snprintf(values[j++], 32, "%d", indexStat.root_blkno);
+		values[j] = palloc(32);
+		snprintf(values[j++], 32, "%d", indexStat.internal_pages);
+		values[j] = palloc(32);
+		snprintf(values[j++], 32, "%d", indexStat.leaf_pages);
+		values[j] = palloc(32);
+		snprintf(values[j++], 32, "%d", indexStat.empty_pages);
+		values[j] = palloc(32);
+		snprintf(values[j++], 32, "%d", indexStat.deleted_pages);
+		values[j] = palloc(32);
+		snprintf(values[j++], 32, "%.2f", 100.0-(float)indexStat.free_space / (float)indexStat.max_avail * 100.0 );
+		values[j] = palloc(32);
+		snprintf(values[j++], 32, "%.2f", (float)indexStat.fragments / (float)indexStat.leaf_pages * 100.0 );
+
+		tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
+									   values);
+		
+		result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
+	}
+
+	PG_RETURN_DATUM( result );
+}
+
+/* -----------------------------------------------
+ * bt_page()
+ *
+ * Usage: SELECT * FROM bt_page('t1_pkey', 0);
+ * -----------------------------------------------
+ */
+Datum
+bt_page_stats(PG_FUNCTION_ARGS)
+{
+	text	   *relname = PG_GETARG_TEXT_P(0);
+	uint32	   blkno   = PG_GETARG_UINT32(1);
+	Buffer	   buffer;
+
+	Relation	rel;
+	RangeVar    *relrv;
+	Datum		result;
+
+	relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
+	rel = relation_openrv(relrv, AccessShareLock);
+
+	CHECK_RELATION_BLOCK_RANGE(rel, blkno);
+
+	buffer = ReadBuffer(rel, blkno);
+
+	if ( !IS_INDEX(rel) || !IS_BTREE(rel) )
+		elog(ERROR, "bt_page_stats() can be used only on b-tree index.");
+
+	if ( blkno==0 )
+		elog(ERROR, "Block 0 is a meta page.");
+
+	{
+		HeapTuple tuple;
+		TupleDesc tupleDesc;
+		int j;
+		char *values[BTPAGESTATS_NCOLUMNS];
+
+		BTPageStat stat;
+
+		GetBTPageStatistics(blkno, buffer, &stat);
+
+		tupleDesc = RelationNameGetTupleDesc(BTPAGESTATS_TYPE);
+
+		j = 0;
+		values[j] = palloc(32);
+		snprintf(values[j++], 32, "%d", stat.blkno);
+		values[j] = palloc(32);
+		snprintf(values[j++], 32, "%d", stat.live_items);
+		values[j] = palloc(32);
+		snprintf(values[j++], 32, "%d", stat.dead_items);
+		values[j] = palloc(32);
+		snprintf(values[j++], 32, "%d", stat.live_items + stat.dead_items);
+		values[j] = palloc(32);
+		snprintf(values[j++], 32, "%d", stat.avg_item_size);
+		values[j] = palloc(32);
+		snprintf(values[j++], 32, "%d", stat.page_size);
+		values[j] = palloc(32);
+		snprintf(values[j++], 32, "%d", stat.free_size);
+		values[j] = palloc(32);
+		snprintf(values[j++], 32, "%d", stat.is_deleted);
+		values[j] = palloc(32);
+//		snprintf(values[j++], 32, "%d", opaque->btpo_prev);
+		snprintf(values[j++], 32, "%d", 0); /*-- not implemented yet --*/
+		values[j] = palloc(32);
+//		snprintf(values[j++], 32, "%d", opaque->btpo_next);
+		snprintf(values[j++], 32, "%d", 0); /*-- not implemented yet --*/
+		values[j] = palloc(32);
+//		snprintf(values[j++], 32, "%d", opaque->btpo.level);
+		snprintf(values[j++], 32, "%d", 0); /*-- not implemented yet --*/
+		values[j] = palloc(32);
+//		snprintf(values[j++], 32, "%d", opaque->btpo_flags);
+		snprintf(values[j++], 32, "%d", 0); /*-- not implemented yet --*/
+
+		tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
+									   values);
+
+		result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
+	}
+
+	ReleaseBuffer(buffer);
+
+	relation_close(rel, AccessShareLock);
+
+	PG_RETURN_DATUM(result);
+}
+
+/*-------------------------------------------------------
+ * bt_page_items()
+ *
+ * Get IndexTupleData set in a leaf page
+ *
+ * Usage: SELECT * FROM bt_page_items('t1_pkey', 0);
+ *-------------------------------------------------------
+ */
+/* ---------------------------------------------------
+ * data structure for SRF to hold a scan information
+ * ---------------------------------------------------
+ */
+struct user_args
+{
+	TupleDesc tupd;
+	Relation rel;
+	Buffer buffer;
+	Page page;
+	uint16 offset;
+};
+
+Datum
+bt_page_items(PG_FUNCTION_ARGS)
+{
+	text	   *relname = PG_GETARG_TEXT_P(0);
+	uint32	   blkno   = PG_GETARG_UINT32(1);
+
+	RangeVar    *relrv;
+	Datum result;
+	char *values[BTPAGEITEMS_NCOLUMNS];
+	BTPageOpaque opaque;
+	HeapTuple tuple;
+	ItemId id;
+
+	FuncCallContext *fctx;
+	MemoryContext mctx;
+	struct user_args *uargs = NULL;
+
+	if ( blkno==0 )
+		elog(ERROR, "Block 0 is a meta page.");
+
+	if ( SRF_IS_FIRSTCALL() )
+	{
+		fctx = SRF_FIRSTCALL_INIT();
+		mctx = MemoryContextSwitchTo(fctx->multi_call_memory_ctx);
+
+		uargs = palloc(sizeof(struct user_args));
+
+        uargs->tupd = RelationNameGetTupleDesc(BTPAGEITEMS_TYPE);
+		uargs->offset = FirstOffsetNumber;
+
+		relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
+		uargs->rel = relation_openrv(relrv, AccessShareLock);
+
+		CHECK_RELATION_BLOCK_RANGE(uargs->rel,  blkno);
+
+		uargs->buffer = ReadBuffer(uargs->rel, blkno);
+
+		if ( !IS_INDEX(uargs->rel) || !IS_BTREE(uargs->rel) )
+			elog(ERROR, "bt_page_items() can be used only on b-tree index.");
+
+		uargs->page = BufferGetPage(uargs->buffer);
+	
+		opaque = (BTPageOpaque) PageGetSpecialPointer(uargs->page);
+		
+		if ( P_ISDELETED(opaque) )
+			elog(NOTICE, "bt_page_items(): this page is deleted.");
+
+		fctx->max_calls = PageGetMaxOffsetNumber(uargs->page);
+		fctx->user_fctx = uargs;
+
+		MemoryContextSwitchTo(mctx);
+	}
+	
+	fctx = SRF_PERCALL_SETUP();
+	uargs = fctx->user_fctx;
+
+	if ( fctx->call_cntr < fctx->max_calls )
+	{
+		BTItem btitem;
+		IndexTuple itup;
+
+		id = PageGetItemId(uargs->page, uargs->offset);
+
+		if ( !ItemIdIsValid(id) )
+			elog(ERROR, "Invalid ItemId.");
+
+		btitem = (BTItem)PageGetItem(uargs->page, id);
+		itup = &(btitem->bti_itup);
+		
+		{
+			int j = 0;
+
+			BlockNumber blkno = BlockIdGetBlockNumber(&(itup->t_tid.ip_blkid));
+
+			values[j] = palloc(32);
+			snprintf(values[j++], 32, "%d", uargs->offset);
+			values[j] = palloc(32);
+			snprintf(values[j++], 32, "(%u,%u)", blkno, itup->t_tid.ip_posid);
+			values[j] = palloc(32);
+			snprintf(values[j++], 32, "%d", IndexTupleSize(itup));
+			values[j] = palloc(32);
+			snprintf(values[j++], 32, "%c", IndexTupleHasNulls(itup) ? 't' : 'f');
+			values[j] = palloc(32);
+			snprintf(values[j++], 32, "%c", IndexTupleHasVarwidths(itup) ? 't' : 'f');
+
+			{
+				int off;
+				char *dump;
+				char *ptr = (char *)itup + IndexInfoFindDataOffset(itup->t_info);
+
+				dump = palloc(IndexTupleSize(itup)*3);
+				memset(dump, 0, IndexTupleSize(itup)*3);
+
+				for (off=0 ;
+					 off < IndexTupleSize(itup)-IndexInfoFindDataOffset(itup->t_info) ;
+					 off++)
+				{
+					if ( dump[0]=='\0' )
+						sprintf(dump, "%02x", *(ptr+off) & 0xff);
+					else
+					{
+						char buf[4];
+						sprintf(buf, " %02x", *(ptr+off) & 0xff);
+						strcat(dump, buf);
+					}
+				}
+				values[j] = dump;
+			}
+
+			tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(uargs->tupd), values);
+			result = TupleGetDatum(TupleDescGetSlot(uargs->tupd), tuple);
+		}
+
+		uargs->offset = uargs->offset + 1;
+
+		SRF_RETURN_NEXT(fctx, result);
+	}
+	else
+	{
+		ReleaseBuffer(uargs->buffer);
+		relation_close(uargs->rel, AccessShareLock);
+
+		SRF_RETURN_DONE(fctx);
+	}
+}
+
+
+/* ------------------------------------------------
+ * bt_metap()
+ *
+ * Get a btree meta-page information
+ *
+ * Usage: SELECT * FROM bt_metap('t1_pkey')
+ * ------------------------------------------------
+ */
+Datum
+bt_metap(PG_FUNCTION_ARGS)
+{
+	text	   *relname = PG_GETARG_TEXT_P(0);
+	Buffer	   buffer;
+
+	Relation	rel;
+	RangeVar    *relrv;
+	Datum		result;
+
+	relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
+	rel = relation_openrv(relrv, AccessShareLock);
+
+	if ( !IS_INDEX(rel) || !IS_BTREE(rel) )
+		elog(ERROR, "bt_metap() can be used only on b-tree index.");
+
+	buffer = ReadBuffer(rel, 0);
+
+	{
+		BTMetaPageData *metad;
+
+		TupleDesc tupleDesc;
+		int j;
+		char *values[BTMETAP_NCOLUMNS];
+		HeapTuple tuple;
+
+		Page page = BufferGetPage(buffer);
+
+		metad = BTPageGetMeta(page);
+
+		tupleDesc = RelationNameGetTupleDesc(BTMETAP_TYPE);
+
+		j = 0;
+		values[j] = palloc(32);
+		snprintf(values[j++], 32, "%d", metad->btm_magic);
+		values[j] = palloc(32);
+		snprintf(values[j++], 32, "%d", metad->btm_version);
+		values[j] = palloc(32);
+		snprintf(values[j++], 32, "%d", metad->btm_root);
+		values[j] = palloc(32);
+		snprintf(values[j++], 32, "%d", metad->btm_level);
+		values[j] = palloc(32);
+		snprintf(values[j++], 32, "%d", metad->btm_fastroot);
+		values[j] = palloc(32);
+		snprintf(values[j++], 32, "%d", metad->btm_fastlevel);
+
+		tuple = BuildTupleFromCStrings(TupleDescGetAttInMetadata(tupleDesc),
+									   values);
+
+		result = TupleGetDatum(TupleDescGetSlot(tupleDesc), tuple);
+	}
+
+	ReleaseBuffer(buffer);
+
+	relation_close(rel, AccessShareLock);
+
+	PG_RETURN_DATUM(result);
+}
+
+/* --------------------------------------------------------
+ * pg_relpages()
+ *
+ * Get a number of pages of the table/index.
+ *
+ * Usage: SELECT pg_relpages('t1');
+ *        SELECT pg_relpages('t1_pkey');
+ * --------------------------------------------------------
+ */
+Datum
+pg_relpages(PG_FUNCTION_ARGS)
+{
+	text	   *relname = PG_GETARG_TEXT_P(0);
+
+	Relation	rel;
+	RangeVar    *relrv;
+	int4  relpages;
+
+	relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
+	rel = relation_openrv(relrv, AccessShareLock);
+
+	relpages = RelationGetNumberOfBlocks(rel);
+
+	relation_close(rel, AccessShareLock);
+
+	PG_RETURN_INT32(relpages);
+}
+
diff -ruN pgstattuple.orig/pgstattuple.sql.in pgstattuple/pgstattuple.sql.in
--- pgstattuple.orig/pgstattuple.sql.in	2006-08-10 19:22:47.000000000 +0900
+++ pgstattuple/pgstattuple.sql.in	2006-08-11 17:46:19.000000000 +0900
@@ -22,3 +22,97 @@
RETURNS pgstattuple_type
AS 'MODULE_PATHNAME', 'pgstattuplebyid'
LANGUAGE 'C' STRICT;
+
+--
+-- pgstatindex
+--
+DROP TYPE pgstatindex_type CASCADE;
+CREATE TYPE pgstatindex_type AS (
+  version int4,
+  tree_level int4,
+  index_size int4,
+  root_block_no int4,
+  internal_pages int4,
+  leaf_pages int4,
+  empty_pages int4,
+  deleted_pages int4,
+  avg_leaf_density float8,
+  leaf_fragmentation float8
+);
+
+CREATE OR REPLACE FUNCTION pgstatindex(text)
+RETURNS pgstatindex_type
+AS 'MODULE_PATHNAME', 'pgstatindex'
+LANGUAGE 'C' STRICT;
+
+--
+-- bt_metap()
+--
+DROP TYPE bt_metap_type CASCADE;
+CREATE TYPE bt_metap_type AS (
+  magic int4,
+  version int4,
+  root int4,
+  level int4,
+  fastroot int4,
+  fastlevel int4
+);
+
+CREATE OR REPLACE FUNCTION bt_metap(text)
+RETURNS bt_metap_type
+AS 'MODULE_PATHNAME', 'bt_metap'
+LANGUAGE 'C' STRICT;
+
+--
+-- bt_page_items()
+--
+DROP TYPE bt_page_items_type CASCADE;
+CREATE TYPE bt_page_items_type AS (
+  itemoffset int4,
+  ctid tid,
+  itemlen int4,
+  nulls bool,
+  vars bool,
+  data text
+);
+
+DROP FUNCTION bt_page_items(text, int4);
+
+CREATE OR REPLACE FUNCTION bt_page_items(text, int4)
+RETURNS SETOF bt_page_items_type
+AS 'MODULE_PATHNAME', 'bt_page_items'
+LANGUAGE 'C' STRICT;
+
+--
+-- bt_page_stats()
+--
+DROP TYPE bt_page_stats_type CASCADE;
+CREATE TYPE bt_page_stats_type AS (
+  blkno int4,
+  live_items int4,
+  dead_items int4,
+  total_items int4,
+  avg_item_size float,
+  page_size int4,
+  free_size int4,
+  is_deleted int4,
+  btpo_prev int4,
+  btpo_next int4,
+  btpo_level int4,
+  btpo_flags int4
+);
+
+DROP FUNCTION bt_page_stats(text, int4);
+
+CREATE OR REPLACE FUNCTION bt_page_stats(text, int4)
+RETURNS bt_page_stats_type
+AS 'MODULE_PATHNAME', 'bt_page_stats'
+LANGUAGE 'C' STRICT;
+
+--
+-- pg_relpages()
+--
+CREATE OR REPLACE FUNCTION pg_relpages(text)
+RETURNS int
+AS 'MODULE_PATHNAME', 'pg_relpages'
+LANGUAGE 'C' STRICT;
diff -ruN pgstattuple.orig/test.sh pgstattuple/test.sh
--- pgstattuple.orig/test.sh	1970-01-01 09:00:00.000000000 +0900
+++ pgstattuple/test.sh	2006-08-11 17:47:05.000000000 +0900
@@ -0,0 +1,22 @@
+#!/bin/sh
+
+export PATH=/usr/local/pgsql814/bin:$PATH
+
+psql pgbench<<EOF
+DROP FUNCTION pgstatindex(text);
+EOF
+
+psql -f /usr/local/pgsql814/share/contrib/pgstattuple.sql pgbench
+
+psql pgbench<<EOF
+SELECT * FROM pg_relpages('accounts_pkey');
+\x
+SELECT * FROM pgstatindex('accounts_pkey');
+SELECT * FROM bt_metap('accounts_pkey');
+\x
+SELECT * FROM bt_page_items('accounts_pkey', 0);
+SELECT * FROM bt_page_items('accounts_pkey', 1) limit 20;
+
+\x
+SELECT * FROM bt_page_stats('accounts_pkey', 1);
+EOF

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#20

Alvaro Herrera

alvherre@2ndquadrant.com

almost 20 years ago

In reply to: Satoshi Nagayasu (#18)

hackerspatches

Re: pgstattuple extension for indexes

Satoshi Nagayasu wrote:

Hi all,

Here is a patch to add pgstatindex functions to the pgstattuple module,
which can work with 8.1.4. Please review and try it. Thanks.

Huh, I bet it works with 8.1.4, but it doesn't work on CVS HEAD:

/pgsql/source/00orig/contrib/pgstattuple/pgstatindex.c: In function 'GetBTPageStatistics':
/pgsql/source/00orig/contrib/pgstattuple/pgstatindex.c:182: error: 'BTItem' undeclared (first use in this function)

While you're at it, please consider removing C++ style comments and
unused code.

Formatting is way off as well, but I guess that is easily fixed with
pgindent.

Regarding the pg_relpages function, why do you think it's necessary?
(It returns the true number of blocks of a given relation). It may
belong into core given a reasonable use case, but otherwise it doesn't
seem to belong into pgstatindex (or pgstattuple for that matter).

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#21

Satoshi Nagayasu

nagayasus@nttdata.co.jp

almost 20 years ago

In reply to: Alvaro Herrera (#20)

hackerspatches

#22

Satoshi Nagayasu

nagayasus@nttdata.co.jp

almost 20 years ago

In reply to: Satoshi Nagayasu (#21)

hackerspatches

#23

Alvaro Herrera

alvherre@2ndquadrant.com

almost 20 years ago

In reply to: Satoshi Nagayasu (#22)

hackerspatches

#24

Tom Lane

tgl@sss.pgh.pa.us

almost 20 years ago

In reply to: Alvaro Herrera (#23)

hackerspatches

#25

ITAGAKI Takahiro

itagaki.takahiro@oss.ntt.co.jp

almost 20 years ago

In reply to: Satoshi Nagayasu (#22)

hackerspatches

#26

Martijn van Oosterhout

kleptog@svana.org

almost 20 years ago

In reply to: ITAGAKI Takahiro (#25)

hackerspatches

#27

Jim Nasby

Jim.Nasby@BlueTreble.com

almost 20 years ago

In reply to: Martijn van Oosterhout (#26)

hackerspatches

#28

Martijn van Oosterhout

kleptog@svana.org

almost 20 years ago

In reply to: Jim Nasby (#27)

hackerspatches

#29

Satoshi Nagayasu

nagayasus@nttdata.co.jp

almost 20 years ago

In reply to: ITAGAKI Takahiro (#25)

hackerspatches

#30

ITAGAKI Takahiro

itagaki.takahiro@oss.ntt.co.jp

almost 20 years ago

In reply to: Satoshi Nagayasu (#29)

hackerspatches

#31

Satoshi Nagayasu

nagayasus@nttdata.co.jp

almost 20 years ago

In reply to: ITAGAKI Takahiro (#30)

hackerspatches

#32

ITAGAKI Takahiro

itagaki.takahiro@oss.ntt.co.jp

almost 20 years ago

In reply to: Satoshi Nagayasu (#31)

hackerspatches

#33

Satoshi Nagayasu

nagayasus@nttdata.co.jp

almost 20 years ago

In reply to: ITAGAKI Takahiro (#32)

hackerspatches

#34

Jim Nasby

Jim.Nasby@BlueTreble.com

almost 20 years ago

In reply to: Martijn van Oosterhout (#28)

hackerspatches

#35

Bruce Momjian

bruce@momjian.us

almost 20 years ago

In reply to: Satoshi Nagayasu (#18)

hackerspatches

#36

Satoshi Nagayasu

nagayasus@nttdata.co.jp

almost 20 years ago

In reply to: Bruce Momjian (#35)

hackerspatches

#37

Bruce Momjian

bruce@momjian.us

almost 20 years ago

In reply to: Satoshi Nagayasu (#36)

hackerspatches

#38

Satoshi Nagayasu

nagayasus@nttdata.co.jp

almost 20 years ago

In reply to: Bruce Momjian (#37)

hackerspatches

#39

Bruce Momjian

bruce@momjian.us

almost 20 years ago

In reply to: Satoshi Nagayasu (#38)

hackerspatches

#40

Jim Nasby

Jim.Nasby@BlueTreble.com

almost 20 years ago

In reply to: Satoshi Nagayasu (#29)

hackerspatches

#41

Bruce Momjian

bruce@momjian.us

almost 20 years ago

In reply to: Satoshi Nagayasu (#22)

hackerspatches

#42

Tom Lane

tgl@sss.pgh.pa.us

almost 20 years ago

In reply to: Bruce Momjian (#41)

hackerspatches

#43

Satoshi Nagayasu

nagayasus@nttdata.co.jp

almost 20 years ago

In reply to: Bruce Momjian (#41)

hackerspatches

#44

Satoshi Nagayasu

nagayasus@nttdata.co.jp

almost 20 years ago

In reply to: Tom Lane (#42)

hackerspatches

#45

ITAGAKI Takahiro

itagaki.takahiro@oss.ntt.co.jp

almost 20 years ago

In reply to: Bruce Momjian (#41)

hackerspatches

#46

Tom Lane

tgl@sss.pgh.pa.us

almost 20 years ago

In reply to: ITAGAKI Takahiro (#45)

hackerspatches

#47

Tom Lane

tgl@sss.pgh.pa.us

almost 20 years ago

In reply to: Satoshi Nagayasu (#44)

hackerspatches

#48

Bruce Momjian

bruce@momjian.us

almost 20 years ago

In reply to: Satoshi Nagayasu (#43)

hackerspatches

pgstattuple extension for indexes

Attachments:

Attachments:

Attachments:

Attachments: