Metadata and record block access stats for indexes
Hi,
For the purpose of writing a blog post I was checking the index stats
recorded for a workload, but found them rather confusing. Following
along the code with the debugger it eventually made sense, and I could
eventually understand what's counted. Looking around a bit, I
discovered an older discussion [1]/messages/by-id/CAH2-WzmdZqxCS1widYzjDAM+Z-Jz=ejJoaWXDVw9Qy1UsK0tLA@mail.gmail.com in the mailing lists and learned that
the issue is known. The proposal in that thread is to start counting
separate metadata and record stats depending on what type of index block
is retrieved.
I realized those would have helped me better understand the collected
index stats, so I started working on a patch to add these in the system
views. Attached is a WIP patch file with partial coverage of the B-Tree
index code. The implementation follows the existing stats collection
approach and the naming convention proposed in [1]/messages/by-id/CAH2-WzmdZqxCS1widYzjDAM+Z-Jz=ejJoaWXDVw9Qy1UsK0tLA@mail.gmail.com. Let me know if what
I'm doing is feasible and if there's any concerns I could address. Next
steps would be to replace all places where I currently pass in NULL with
proper counting, as well as update tests and docs.
Looking forward to your feedback! Thanks!
Cheers,
Mircea
[1]: /messages/by-id/CAH2-WzmdZqxCS1widYzjDAM+Z-Jz=ejJoaWXDVw9Qy1UsK0tLA@mail.gmail.com
/messages/by-id/CAH2-WzmdZqxCS1widYzjDAM+Z-Jz=ejJoaWXDVw9Qy1UsK0tLA@mail.gmail.com
Attachments:
v1-0001-Preliminary-work-to-capture-and-expose-separate-r.patchtext/plain; charset=UTF-8; name=v1-0001-Preliminary-work-to-capture-and-expose-separate-r.patchDownload
From a540a042ffb0b348254afdfdce39199900f9c7ec Mon Sep 17 00:00:00 2001
From: Mircea Cadariu <cadariu.mircea@gmail.com>
Date: Thu, 20 Feb 2025 13:45:12 +0000
Subject: [PATCH v1] Preliminary work to capture and expose separate record
(leaf page) and metadata (non-leaf page) index access statistics in the
system views, with partial coverage of B-Trees.
---
contrib/amcheck/verify_heapam.c | 2 +-
contrib/amcheck/verify_nbtree.c | 6 +-
contrib/bloom/blinsert.c | 6 +-
contrib/bloom/blscan.c | 2 +-
contrib/bloom/blutils.c | 6 +-
contrib/bloom/blvacuum.c | 6 +-
contrib/pageinspect/btreefuncs.c | 8 +-
contrib/pageinspect/rawpage.c | 2 +-
contrib/pg_prewarm/autoprewarm.c | 2 +-
contrib/pg_surgery/heap_surgery.c | 2 +-
contrib/pg_visibility/pg_visibility.c | 2 +-
contrib/pgstattuple/pgstatapprox.c | 2 +-
contrib/pgstattuple/pgstatindex.c | 10 +-
contrib/pgstattuple/pgstattuple.c | 10 +-
doc/src/sgml/monitoring.sgml | 110 +++++++++++++++++++
src/backend/access/brin/brin.c | 4 +-
src/backend/access/brin/brin_pageops.c | 4 +-
src/backend/access/brin/brin_revmap.c | 12 +-
src/backend/access/gin/ginbtree.c | 11 +-
src/backend/access/gin/ginfast.c | 14 +--
src/backend/access/gin/ginget.c | 6 +-
src/backend/access/gin/ginutil.c | 6 +-
src/backend/access/gin/ginvacuum.c | 22 ++--
src/backend/access/gist/gist.c | 10 +-
src/backend/access/gist/gistbuild.c | 10 +-
src/backend/access/gist/gistget.c | 4 +-
src/backend/access/gist/gistutil.c | 2 +-
src/backend/access/gist/gistvacuum.c | 6 +-
src/backend/access/hash/hash.c | 3 +-
src/backend/access/hash/hashpage.c | 10 +-
src/backend/access/heap/heapam.c | 16 +--
src/backend/access/heap/heapam_handler.c | 9 +-
src/backend/access/heap/hio.c | 8 +-
src/backend/access/heap/vacuumlazy.c | 2 +-
src/backend/access/heap/visibilitymap.c | 2 +-
src/backend/access/nbtree/nbtinsert.c | 32 ++++--
src/backend/access/nbtree/nbtpage.c | 65 +++++++----
src/backend/access/nbtree/nbtree.c | 2 +-
src/backend/access/nbtree/nbtsearch.c | 34 ++++--
src/backend/access/nbtree/nbtutils.c | 5 +-
src/backend/access/spgist/spgdoinsert.c | 4 +-
src/backend/access/spgist/spgscan.c | 4 +-
src/backend/access/spgist/spgutils.c | 8 +-
src/backend/access/spgist/spgvacuum.c | 4 +-
src/backend/access/transam/xloginsert.c | 2 +-
src/backend/catalog/system_views.sql | 30 ++++-
src/backend/commands/sequence.c | 2 +-
src/backend/storage/aio/read_stream.c | 6 +-
src/backend/storage/buffer/bufmgr.c | 52 +++++----
src/backend/storage/freespace/freespace.c | 2 +-
src/backend/utils/activity/pgstat_database.c | 4 +
src/backend/utils/activity/pgstat_relation.c | 8 ++
src/backend/utils/adt/pgstatfuncs.c | 24 ++++
src/include/access/nbtree.h | 4 +-
src/include/catalog/pg_proc.dat | 32 ++++++
src/include/pgstat.h | 45 ++++++++
src/include/storage/bufmgr.h | 12 +-
src/test/regress/expected/rules.out | 40 ++++++-
58 files changed, 557 insertions(+), 201 deletions(-)
diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
index 827312306f..5b611b15e8 100644
--- a/contrib/amcheck/verify_heapam.c
+++ b/contrib/amcheck/verify_heapam.c
@@ -439,7 +439,7 @@ verify_heapam(PG_FUNCTION_ARGS)
/* Read and lock the next page. */
ctx.buffer = ReadBufferExtended(ctx.rel, MAIN_FORKNUM, ctx.blkno,
- RBM_NORMAL, ctx.bstrategy);
+ RBM_NORMAL, ctx.bstrategy, NULL);
LockBuffer(ctx.buffer, BUFFER_LOCK_SHARE);
ctx.page = BufferGetPage(ctx.buffer);
diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index aac8c74f54..5a045b1d88 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -1249,7 +1249,7 @@ bt_recheck_sibling_links(BtreeCheckState *state,
/* Couple locks in the usual order for nbtree: Left to right */
lbuf = ReadBufferExtended(state->rel, MAIN_FORKNUM, leftcurrent,
- RBM_NORMAL, state->checkstrategy);
+ RBM_NORMAL, state->checkstrategy, NULL);
LockBuffer(lbuf, BT_READ);
_bt_checkpage(state->rel, lbuf);
page = BufferGetPage(lbuf);
@@ -1273,7 +1273,7 @@ bt_recheck_sibling_links(BtreeCheckState *state,
{
newtargetbuf = ReadBufferExtended(state->rel, MAIN_FORKNUM,
newtargetblock, RBM_NORMAL,
- state->checkstrategy);
+ state->checkstrategy, NULL);
LockBuffer(newtargetbuf, BT_READ);
_bt_checkpage(state->rel, newtargetbuf);
page = BufferGetPage(newtargetbuf);
@@ -3440,7 +3440,7 @@ palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum)
* longer than we must.
*/
buffer = ReadBufferExtended(state->rel, MAIN_FORKNUM, blocknum, RBM_NORMAL,
- state->checkstrategy);
+ state->checkstrategy, NULL);
LockBuffer(buffer, BT_READ);
/*
diff --git a/contrib/bloom/blinsert.c b/contrib/bloom/blinsert.c
index ee8ebaf3ca..16bf039d7b 100644
--- a/contrib/bloom/blinsert.c
+++ b/contrib/bloom/blinsert.c
@@ -201,7 +201,7 @@ blinsert(Relation index, Datum *values, bool *isnull,
* At first, try to insert new tuple to the first page in notFullPage
* array. If successful, we don't need to modify the meta page.
*/
- metaBuffer = ReadBuffer(index, BLOOM_METAPAGE_BLKNO);
+ metaBuffer = ReadBuffer(index, BLOOM_METAPAGE_BLKNO, NULL);
LockBuffer(metaBuffer, BUFFER_LOCK_SHARE);
metaData = BloomPageGetMeta(BufferGetPage(metaBuffer));
@@ -213,7 +213,7 @@ blinsert(Relation index, Datum *values, bool *isnull,
/* Don't hold metabuffer lock while doing insert */
LockBuffer(metaBuffer, BUFFER_LOCK_UNLOCK);
- buffer = ReadBuffer(index, blkno);
+ buffer = ReadBuffer(index, blkno, NULL);
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
state = GenericXLogStart(index);
@@ -280,7 +280,7 @@ blinsert(Relation index, Datum *values, bool *isnull,
blkno = metaData->notFullPage[nStart];
Assert(blkno != InvalidBlockNumber);
- buffer = ReadBuffer(index, blkno);
+ buffer = ReadBuffer(index, blkno, NULL);
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
page = GenericXLogRegisterBuffer(state, buffer, 0);
diff --git a/contrib/bloom/blscan.c b/contrib/bloom/blscan.c
index bf801fe78f..30714944ac 100644
--- a/contrib/bloom/blscan.c
+++ b/contrib/bloom/blscan.c
@@ -123,7 +123,7 @@ blgetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
Page page;
buffer = ReadBufferExtended(scan->indexRelation, MAIN_FORKNUM,
- blkno, RBM_NORMAL, bas);
+ blkno, RBM_NORMAL, bas, NULL);
LockBuffer(buffer, BUFFER_LOCK_SHARE);
page = BufferGetPage(buffer);
diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c
index 04b61042a5..e73fe580ce 100644
--- a/contrib/bloom/blutils.c
+++ b/contrib/bloom/blutils.c
@@ -185,7 +185,7 @@ initBloomState(BloomState *state, Relation index)
opts = MemoryContextAlloc(index->rd_indexcxt, sizeof(BloomOptions));
- buffer = ReadBuffer(index, BLOOM_METAPAGE_BLKNO);
+ buffer = ReadBuffer(index, BLOOM_METAPAGE_BLKNO, NULL);
LockBuffer(buffer, BUFFER_LOCK_SHARE);
page = BufferGetPage(buffer);
@@ -364,7 +364,7 @@ BloomNewBuffer(Relation index)
if (blkno == InvalidBlockNumber)
break;
- buffer = ReadBuffer(index, blkno);
+ buffer = ReadBuffer(index, blkno, NULL);
/*
* We have to guard against the possibility that someone else already
@@ -456,7 +456,7 @@ BloomInitMetapage(Relation index, ForkNumber forknum)
* block number 0 (BLOOM_METAPAGE_BLKNO). No need to hold the extension
* lock because there cannot be concurrent inserters yet.
*/
- metaBuffer = ReadBufferExtended(index, forknum, P_NEW, RBM_NORMAL, NULL);
+ metaBuffer = ReadBufferExtended(index, forknum, P_NEW, RBM_NORMAL, NULL, NULL);
LockBuffer(metaBuffer, BUFFER_LOCK_EXCLUSIVE);
Assert(BufferGetBlockNumber(metaBuffer) == BLOOM_METAPAGE_BLKNO);
diff --git a/contrib/bloom/blvacuum.c b/contrib/bloom/blvacuum.c
index 86b15a75f6..ebe769d375 100644
--- a/contrib/bloom/blvacuum.c
+++ b/contrib/bloom/blvacuum.c
@@ -60,7 +60,7 @@ blbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
vacuum_delay_point(false);
buffer = ReadBufferExtended(index, MAIN_FORKNUM, blkno,
- RBM_NORMAL, info->strategy);
+ RBM_NORMAL, info->strategy, NULL);
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
gxlogState = GenericXLogStart(index);
@@ -139,7 +139,7 @@ blbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
* info could already be out of date at this point, but blinsert() will
* cope if so.
*/
- buffer = ReadBuffer(index, BLOOM_METAPAGE_BLKNO);
+ buffer = ReadBuffer(index, BLOOM_METAPAGE_BLKNO, NULL);
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
gxlogState = GenericXLogStart(index);
@@ -190,7 +190,7 @@ blvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
vacuum_delay_point(false);
buffer = ReadBufferExtended(index, MAIN_FORKNUM, blkno,
- RBM_NORMAL, info->strategy);
+ RBM_NORMAL, info->strategy, NULL);
LockBuffer(buffer, BUFFER_LOCK_SHARE);
page = (Page) BufferGetPage(buffer);
diff --git a/contrib/pageinspect/btreefuncs.c b/contrib/pageinspect/btreefuncs.c
index 9cdc8e182b..e7c3bf2f78 100644
--- a/contrib/pageinspect/btreefuncs.c
+++ b/contrib/pageinspect/btreefuncs.c
@@ -282,7 +282,7 @@ bt_page_stats_internal(PG_FUNCTION_ARGS, enum pageinspect_version ext_version)
bt_index_block_validate(rel, blkno);
- buffer = ReadBuffer(rel, blkno);
+ buffer = ReadBuffer(rel, blkno, NULL);
LockBuffer(buffer, BUFFER_LOCK_SHARE);
/* keep compiler quiet */
@@ -422,7 +422,7 @@ bt_multi_page_stats(PG_FUNCTION_ARGS)
BTPageStat stat;
TupleDesc tupleDesc;
- buffer = ReadBuffer(rel, uargs->blkno);
+ buffer = ReadBuffer(rel, uargs->blkno, NULL);
LockBuffer(buffer, BUFFER_LOCK_SHARE);
/* keep compiler quiet */
@@ -651,7 +651,7 @@ bt_page_items_internal(PG_FUNCTION_ARGS, enum pageinspect_version ext_version)
bt_index_block_validate(rel, blkno);
- buffer = ReadBuffer(rel, blkno);
+ buffer = ReadBuffer(rel, blkno, NULL);
LockBuffer(buffer, BUFFER_LOCK_SHARE);
/*
@@ -875,7 +875,7 @@ bt_metap(PG_FUNCTION_ARGS)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot access temporary tables of other sessions")));
- buffer = ReadBuffer(rel, 0);
+ buffer = ReadBuffer(rel, 0, NULL);
LockBuffer(buffer, BUFFER_LOCK_SHARE);
page = BufferGetPage(buffer);
diff --git a/contrib/pageinspect/rawpage.c b/contrib/pageinspect/rawpage.c
index 617dff821a..7605326c66 100644
--- a/contrib/pageinspect/rawpage.c
+++ b/contrib/pageinspect/rawpage.c
@@ -185,7 +185,7 @@ get_raw_page_internal(text *relname, ForkNumber forknum, BlockNumber blkno)
/* Take a verbatim copy of the page */
- buf = ReadBufferExtended(rel, forknum, blkno, RBM_NORMAL, NULL);
+ buf = ReadBufferExtended(rel, forknum, blkno, RBM_NORMAL, NULL, NULL);
LockBuffer(buf, BUFFER_LOCK_SHARE);
memcpy(raw_page_data, BufferGetPage(buf), BLCKSZ);
diff --git a/contrib/pg_prewarm/autoprewarm.c b/contrib/pg_prewarm/autoprewarm.c
index b45755b334..df335a054a 100644
--- a/contrib/pg_prewarm/autoprewarm.c
+++ b/contrib/pg_prewarm/autoprewarm.c
@@ -533,7 +533,7 @@ autoprewarm_database_main(Datum main_arg)
/* Prewarm buffer. */
buf = ReadBufferExtended(rel, blk->forknum, blk->blocknum, RBM_NORMAL,
- NULL);
+ NULL, NULL);
if (BufferIsValid(buf))
{
apw_state->prewarmed_blocks++;
diff --git a/contrib/pg_surgery/heap_surgery.c b/contrib/pg_surgery/heap_surgery.c
index 5b94b3d523..62388c6175 100644
--- a/contrib/pg_surgery/heap_surgery.c
+++ b/contrib/pg_surgery/heap_surgery.c
@@ -172,7 +172,7 @@ heap_force_common(FunctionCallInfo fcinfo, HeapTupleForceOption heap_force_opt)
continue;
}
- buf = ReadBuffer(rel, blkno);
+ buf = ReadBuffer(rel, blkno, NULL);
LockBufferForCleanup(buf);
page = BufferGetPage(buf);
diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index 7f268a18a7..a11469660d 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -151,7 +151,7 @@ pg_visibility(PG_FUNCTION_ARGS)
/* Here we have to explicitly check rel size ... */
if (blkno < RelationGetNumberOfBlocks(rel))
{
- buffer = ReadBuffer(rel, blkno);
+ buffer = ReadBuffer(rel, blkno, NULL);
LockBuffer(buffer, BUFFER_LOCK_SHARE);
page = BufferGetPage(buffer);
diff --git a/contrib/pgstattuple/pgstatapprox.c b/contrib/pgstattuple/pgstatapprox.c
index a59ff4e9d4..46efc49522 100644
--- a/contrib/pgstattuple/pgstatapprox.c
+++ b/contrib/pgstattuple/pgstatapprox.c
@@ -94,7 +94,7 @@ statapprox_heap(Relation rel, output_type *stat)
}
buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno,
- RBM_NORMAL, bstrategy);
+ RBM_NORMAL, bstrategy, NULL);
LockBuffer(buf, BUFFER_LOCK_SHARE);
diff --git a/contrib/pgstattuple/pgstatindex.c b/contrib/pgstattuple/pgstatindex.c
index 4b9d76ec4e..a0a924acbf 100644
--- a/contrib/pgstattuple/pgstatindex.c
+++ b/contrib/pgstattuple/pgstatindex.c
@@ -250,7 +250,8 @@ pgstatindex_impl(Relation rel, FunctionCallInfo fcinfo)
* Read metapage
*/
{
- Buffer buffer = ReadBufferExtended(rel, MAIN_FORKNUM, 0, RBM_NORMAL, bstrategy);
+ Buffer buffer = ReadBufferExtended(rel, MAIN_FORKNUM, 0, RBM_NORMAL,
+ bstrategy, NULL);
Page page = BufferGetPage(buffer);
BTMetaPageData *metad = BTPageGetMeta(page);
@@ -286,7 +287,8 @@ pgstatindex_impl(Relation rel, FunctionCallInfo fcinfo)
CHECK_FOR_INTERRUPTS();
/* Read and lock buffer */
- buffer = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL, bstrategy);
+ buffer = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
+ bstrategy, NULL);
LockBuffer(buffer, BUFFER_LOCK_SHARE);
page = BufferGetPage(buffer);
@@ -542,7 +544,7 @@ pgstatginindex_internal(Oid relid, FunctionCallInfo fcinfo)
/*
* Read metapage
*/
- buffer = ReadBuffer(rel, GIN_METAPAGE_BLKNO);
+ buffer = ReadBuffer(rel, GIN_METAPAGE_BLKNO, NULL);
LockBuffer(buffer, GIN_SHARE);
page = BufferGetPage(buffer);
metadata = GinPageGetMeta(page);
@@ -645,7 +647,7 @@ pgstathashindex(PG_FUNCTION_ARGS)
CHECK_FOR_INTERRUPTS();
buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
- bstrategy);
+ bstrategy, NULL);
LockBuffer(buf, BUFFER_LOCK_SHARE);
page = (Page) BufferGetPage(buf);
diff --git a/contrib/pgstattuple/pgstattuple.c b/contrib/pgstattuple/pgstattuple.c
index 48cb8f59c4..a15668dc11 100644
--- a/contrib/pgstattuple/pgstattuple.c
+++ b/contrib/pgstattuple/pgstattuple.c
@@ -373,7 +373,7 @@ pgstat_heap(Relation rel, FunctionCallInfo fcinfo)
CHECK_FOR_INTERRUPTS();
buffer = ReadBufferExtended(rel, MAIN_FORKNUM, block,
- RBM_NORMAL, hscan->rs_strategy);
+ RBM_NORMAL, hscan->rs_strategy, NULL);
LockBuffer(buffer, BUFFER_LOCK_SHARE);
stat.free_space += PageGetExactFreeSpace((Page) BufferGetPage(buffer));
UnlockReleaseBuffer(buffer);
@@ -386,7 +386,7 @@ pgstat_heap(Relation rel, FunctionCallInfo fcinfo)
CHECK_FOR_INTERRUPTS();
buffer = ReadBufferExtended(rel, MAIN_FORKNUM, block,
- RBM_NORMAL, hscan->rs_strategy);
+ RBM_NORMAL, hscan->rs_strategy, NULL);
LockBuffer(buffer, BUFFER_LOCK_SHARE);
stat.free_space += PageGetExactFreeSpace((Page) BufferGetPage(buffer));
UnlockReleaseBuffer(buffer);
@@ -411,7 +411,8 @@ pgstat_btree_page(pgstattuple_type *stat, Relation rel, BlockNumber blkno,
Buffer buf;
Page page;
- buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL, bstrategy);
+ buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL, bstrategy,
+ NULL);
LockBuffer(buf, BT_READ);
page = BufferGetPage(buf);
@@ -497,7 +498,8 @@ pgstat_gist_page(pgstattuple_type *stat, Relation rel, BlockNumber blkno,
Buffer buf;
Page page;
- buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL, bstrategy);
+ buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL, bstrategy,
+ NULL);
LockBuffer(buf, GIST_SHARE);
gistcheckpage(rel, buf);
page = BufferGetPage(buf);
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 71c4f96d05..e3d1477ef0 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3425,6 +3425,44 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>metadata_blks_read</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of metadata disk blocks read in this database
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>metadata_blks_hit</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of times metadata disk blocks were found already in the buffer
+ cache
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>record_blks_read</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of record disk blocks read in this database
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>record_blks_hit</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of times record disk blocks were found already in the buffer
+ cache
+ </para></entry>
+ </row>
+
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>blks_hit</structfield> <type>bigint</type>
@@ -4368,6 +4406,42 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>idx_metadata_blks_read</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of metadata disk blocks read from all indexes on this table
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>idx_metadata_blks_hit</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of metadata block hits in all indexes on this table
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>idx_record_blks_read</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of record disk blocks read from all indexes on this table
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>idx_record_blks_hit</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of record block hits in all indexes on this table
+ </para></entry>
+ </row>
+
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>idx_blks_hit</structfield> <type>bigint</type>
@@ -4504,6 +4578,42 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>idx_metadata_blks_read</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of metadata disk blocks read from this index
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>idx_metadata_blks_hit</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of metadata buffer hits in this index
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>idx_record_blks_read</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of record disk blocks read from this index
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>idx_record_blks_hit</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of record buffer hits in this index
+ </para></entry>
+ </row>
+
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>idx_blks_hit</structfield> <type>bigint</type>
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 60320440fc..5c0e3febe5 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -1647,7 +1647,7 @@ brinGetStats(Relation index, BrinStatsData *stats)
Page metapage;
BrinMetaPageData *metadata;
- metabuffer = ReadBuffer(index, BRIN_METAPAGE_BLKNO);
+ metabuffer = ReadBuffer(index, BRIN_METAPAGE_BLKNO, NULL);
LockBuffer(metabuffer, BUFFER_LOCK_SHARE);
metapage = BufferGetPage(metabuffer);
metadata = (BrinMetaPageData *) PageGetContents(metapage);
@@ -2182,7 +2182,7 @@ brin_vacuum_scan(Relation idxrel, BufferAccessStrategy strategy)
CHECK_FOR_INTERRUPTS();
buf = ReadBufferExtended(idxrel, MAIN_FORKNUM, blkno,
- RBM_NORMAL, strategy);
+ RBM_NORMAL, strategy, NULL);
brin_page_cleanup(idxrel, buf);
diff --git a/src/backend/access/brin/brin_pageops.c b/src/backend/access/brin/brin_pageops.c
index 6d8dd1512d..c78f0cc4e7 100644
--- a/src/backend/access/brin/brin_pageops.c
+++ b/src/backend/access/brin/brin_pageops.c
@@ -739,7 +739,7 @@ brin_getinsertbuffer(Relation irel, Buffer oldbuf, Size itemsz,
LockRelationForExtension(irel, ExclusiveLock);
extensionLockHeld = true;
}
- buf = ReadBuffer(irel, P_NEW);
+ buf = ReadBuffer(irel, P_NEW, NULL);
newblk = BufferGetBlockNumber(buf);
*extended = true;
@@ -756,7 +756,7 @@ brin_getinsertbuffer(Relation irel, Buffer oldbuf, Size itemsz,
}
else
{
- buf = ReadBuffer(irel, newblk);
+ buf = ReadBuffer(irel, newblk, NULL);
}
/*
diff --git a/src/backend/access/brin/brin_revmap.c b/src/backend/access/brin/brin_revmap.c
index 4e380ecc71..4a0256d96a 100644
--- a/src/backend/access/brin/brin_revmap.c
+++ b/src/backend/access/brin/brin_revmap.c
@@ -74,7 +74,7 @@ brinRevmapInitialize(Relation idxrel, BlockNumber *pagesPerRange)
BrinMetaPageData *metadata;
Page page;
- meta = ReadBuffer(idxrel, BRIN_METAPAGE_BLKNO);
+ meta = ReadBuffer(idxrel, BRIN_METAPAGE_BLKNO, NULL);
LockBuffer(meta, BUFFER_LOCK_SHARE);
page = BufferGetPage(meta);
metadata = (BrinMetaPageData *) PageGetContents(page);
@@ -231,7 +231,7 @@ brinGetTupleForHeapBlock(BrinRevmap *revmap, BlockNumber heapBlk,
ReleaseBuffer(revmap->rm_currBuf);
Assert(mapBlk != InvalidBlockNumber);
- revmap->rm_currBuf = ReadBuffer(revmap->rm_irel, mapBlk);
+ revmap->rm_currBuf = ReadBuffer(revmap->rm_irel, mapBlk, NULL);
}
LockBuffer(revmap->rm_currBuf, BUFFER_LOCK_SHARE);
@@ -269,7 +269,7 @@ brinGetTupleForHeapBlock(BrinRevmap *revmap, BlockNumber heapBlk,
{
if (BufferIsValid(*buf))
ReleaseBuffer(*buf);
- *buf = ReadBuffer(idxRel, blk);
+ *buf = ReadBuffer(idxRel, blk, NULL);
}
LockBuffer(*buf, mode);
page = BufferGetPage(*buf);
@@ -363,7 +363,7 @@ brinRevmapDesummarizeRange(Relation idxrel, BlockNumber heapBlk)
return true;
}
- regBuf = ReadBuffer(idxrel, ItemPointerGetBlockNumber(iptr));
+ regBuf = ReadBuffer(idxrel, ItemPointerGetBlockNumber(iptr), NULL);
LockBuffer(regBuf, BUFFER_LOCK_EXCLUSIVE);
regPg = BufferGetPage(regBuf);
@@ -485,7 +485,7 @@ revmap_get_buffer(BrinRevmap *revmap, BlockNumber heapBlk)
if (revmap->rm_currBuf != InvalidBuffer)
ReleaseBuffer(revmap->rm_currBuf);
- revmap->rm_currBuf = ReadBuffer(revmap->rm_irel, mapBlk);
+ revmap->rm_currBuf = ReadBuffer(revmap->rm_irel, mapBlk, NULL);
}
return revmap->rm_currBuf;
@@ -553,7 +553,7 @@ revmap_physical_extend(BrinRevmap *revmap)
nblocks = RelationGetNumberOfBlocks(irel);
if (mapBlk < nblocks)
{
- buf = ReadBuffer(irel, mapBlk);
+ buf = ReadBuffer(irel, mapBlk, NULL);
LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
page = BufferGetPage(buf);
}
diff --git a/src/backend/access/gin/ginbtree.c b/src/backend/access/gin/ginbtree.c
index 26a0bdc206..ee04a27528 100644
--- a/src/backend/access/gin/ginbtree.c
+++ b/src/backend/access/gin/ginbtree.c
@@ -87,7 +87,7 @@ ginFindLeafPage(GinBtree btree, bool searchMode,
stack = (GinBtreeStack *) palloc(sizeof(GinBtreeStack));
stack->blkno = btree->rootBlkno;
- stack->buffer = ReadBuffer(btree->index, btree->rootBlkno);
+ stack->buffer = ReadBuffer(btree->index, btree->rootBlkno, NULL);
stack->parent = NULL;
stack->predictNumber = 1;
@@ -148,7 +148,8 @@ ginFindLeafPage(GinBtree btree, bool searchMode,
{
/* in search mode we may forget path to leaf */
stack->blkno = child;
- stack->buffer = ReleaseAndReadBuffer(stack->buffer, btree->index, stack->blkno);
+ stack->buffer = ReleaseAndReadBuffer(stack->buffer, btree->index, stack->blkno,
+ NULL);
}
else
{
@@ -157,7 +158,7 @@ ginFindLeafPage(GinBtree btree, bool searchMode,
ptr->parent = stack;
stack = ptr;
stack->blkno = child;
- stack->buffer = ReadBuffer(btree->index, stack->blkno);
+ stack->buffer = ReadBuffer(btree->index, stack->blkno, NULL);
stack->predictNumber = 1;
}
}
@@ -182,7 +183,7 @@ ginStepRight(Buffer buffer, Relation index, int lockmode)
bool isData = GinPageIsData(page);
BlockNumber blkno = GinPageGetOpaque(page)->rightlink;
- nextbuffer = ReadBuffer(index, blkno);
+ nextbuffer = ReadBuffer(index, blkno, NULL);
LockBuffer(nextbuffer, lockmode);
UnlockReleaseBuffer(buffer);
@@ -314,7 +315,7 @@ ginFindParents(GinBtree btree, GinBtreeStack *stack)
/* Descend down to next level */
blkno = leftmostBlkno;
- buffer = ReadBuffer(btree->index, blkno);
+ buffer = ReadBuffer(btree->index, blkno, NULL);
}
}
diff --git a/src/backend/access/gin/ginfast.c b/src/backend/access/gin/ginfast.c
index a6d88572cc..68e5112732 100644
--- a/src/backend/access/gin/ginfast.c
+++ b/src/backend/access/gin/ginfast.c
@@ -239,7 +239,7 @@ ginHeapTupleFastInsert(GinState *ginstate, GinTupleCollector *collector)
data.ntuples = 0;
data.newRightlink = data.prevTail = InvalidBlockNumber;
- metabuffer = ReadBuffer(index, GIN_METAPAGE_BLKNO);
+ metabuffer = ReadBuffer(index, GIN_METAPAGE_BLKNO, NULL);
metapage = BufferGetPage(metabuffer);
/*
@@ -319,7 +319,7 @@ ginHeapTupleFastInsert(GinState *ginstate, GinTupleCollector *collector)
data.prevTail = metadata->tail;
data.newRightlink = sublist.head;
- buffer = ReadBuffer(index, metadata->tail);
+ buffer = ReadBuffer(index, metadata->tail, NULL);
LockBuffer(buffer, GIN_EXCLUSIVE);
page = BufferGetPage(buffer);
@@ -358,7 +358,7 @@ ginHeapTupleFastInsert(GinState *ginstate, GinTupleCollector *collector)
CheckForSerializableConflictIn(index, NULL, GIN_METAPAGE_BLKNO);
- buffer = ReadBuffer(index, metadata->tail);
+ buffer = ReadBuffer(index, metadata->tail, NULL);
LockBuffer(buffer, GIN_EXCLUSIVE);
page = BufferGetPage(buffer);
@@ -575,7 +575,7 @@ shiftList(Relation index, Buffer metabuffer, BlockNumber newHead,
while (data.ndeleted < GIN_NDELETE_AT_ONCE && blknoToDelete != newHead)
{
freespace[data.ndeleted] = blknoToDelete;
- buffers[data.ndeleted] = ReadBuffer(index, blknoToDelete);
+ buffers[data.ndeleted] = ReadBuffer(index, blknoToDelete, NULL);
LockBuffer(buffers[data.ndeleted], GIN_EXCLUSIVE);
page = BufferGetPage(buffers[data.ndeleted]);
@@ -827,7 +827,7 @@ ginInsertCleanup(GinState *ginstate, bool full_clean,
workMemory = work_mem;
}
- metabuffer = ReadBuffer(index, GIN_METAPAGE_BLKNO);
+ metabuffer = ReadBuffer(index, GIN_METAPAGE_BLKNO, NULL);
LockBuffer(metabuffer, GIN_SHARE);
metapage = BufferGetPage(metabuffer);
metadata = GinPageGetMeta(metapage);
@@ -850,7 +850,7 @@ ginInsertCleanup(GinState *ginstate, bool full_clean,
* Read and lock head of pending list
*/
blkno = metadata->head;
- buffer = ReadBuffer(index, blkno);
+ buffer = ReadBuffer(index, blkno, NULL);
LockBuffer(buffer, GIN_SHARE);
page = BufferGetPage(buffer);
@@ -1003,7 +1003,7 @@ ginInsertCleanup(GinState *ginstate, bool full_clean,
* Read next page in pending list
*/
vacuum_delay_point(false);
- buffer = ReadBuffer(index, blkno);
+ buffer = ReadBuffer(index, blkno, NULL);
LockBuffer(buffer, GIN_SHARE);
page = BufferGetPage(buffer);
}
diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c
index 63dd1f3679..ebcab38baf 100644
--- a/src/backend/access/gin/ginget.c
+++ b/src/backend/access/gin/ginget.c
@@ -1480,7 +1480,7 @@ scanGetCandidate(IndexScanDesc scan, pendingPosition *pos)
* current page. So, we lock next page before releasing the
* current one
*/
- Buffer tmpbuf = ReadBuffer(scan->indexRelation, blkno);
+ Buffer tmpbuf = ReadBuffer(scan->indexRelation, blkno, NULL);
LockBuffer(tmpbuf, GIN_SHARE);
UnlockReleaseBuffer(pos->pendingBuffer);
@@ -1827,7 +1827,7 @@ scanPendingInsert(IndexScanDesc scan, TIDBitmap *tbm, int64 *ntids)
match;
int i;
pendingPosition pos;
- Buffer metabuffer = ReadBuffer(scan->indexRelation, GIN_METAPAGE_BLKNO);
+ Buffer metabuffer = ReadBuffer(scan->indexRelation, GIN_METAPAGE_BLKNO, NULL);
Page page;
BlockNumber blkno;
@@ -1854,7 +1854,7 @@ scanPendingInsert(IndexScanDesc scan, TIDBitmap *tbm, int64 *ntids)
return;
}
- pos.pendingBuffer = ReadBuffer(scan->indexRelation, blkno);
+ pos.pendingBuffer = ReadBuffer(scan->indexRelation, blkno, NULL);
LockBuffer(pos.pendingBuffer, GIN_SHARE);
pos.firstOffset = FirstOffsetNumber;
UnlockReleaseBuffer(metabuffer);
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index 1f9e58c4f1..c12b44eaca 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -310,7 +310,7 @@ GinNewBuffer(Relation index)
if (blkno == InvalidBlockNumber)
break;
- buffer = ReadBuffer(index, blkno);
+ buffer = ReadBuffer(index, blkno, NULL);
/*
* We have to guard against the possibility that someone else already
@@ -627,7 +627,7 @@ ginGetStats(Relation index, GinStatsData *stats)
Page metapage;
GinMetaPageData *metadata;
- metabuffer = ReadBuffer(index, GIN_METAPAGE_BLKNO);
+ metabuffer = ReadBuffer(index, GIN_METAPAGE_BLKNO, NULL);
LockBuffer(metabuffer, GIN_SHARE);
metapage = BufferGetPage(metabuffer);
metadata = GinPageGetMeta(metapage);
@@ -654,7 +654,7 @@ ginUpdateStats(Relation index, const GinStatsData *stats, bool is_build)
Page metapage;
GinMetaPageData *metadata;
- metabuffer = ReadBuffer(index, GIN_METAPAGE_BLKNO);
+ metabuffer = ReadBuffer(index, GIN_METAPAGE_BLKNO, NULL);
LockBuffer(metabuffer, GIN_EXCLUSIVE);
metapage = BufferGetPage(metabuffer);
metadata = GinPageGetMeta(metapage);
diff --git a/src/backend/access/gin/ginvacuum.c b/src/backend/access/gin/ginvacuum.c
index fbbe3a6dd7..250bffa49a 100644
--- a/src/backend/access/gin/ginvacuum.c
+++ b/src/backend/access/gin/ginvacuum.c
@@ -143,11 +143,11 @@ ginDeletePage(GinVacuumState *gvs, BlockNumber deleteBlkno, BlockNumber leftBlkn
* deletable, parent and left pages.
*/
lBuffer = ReadBufferExtended(gvs->index, MAIN_FORKNUM, leftBlkno,
- RBM_NORMAL, gvs->strategy);
+ RBM_NORMAL, gvs->strategy, NULL);
dBuffer = ReadBufferExtended(gvs->index, MAIN_FORKNUM, deleteBlkno,
- RBM_NORMAL, gvs->strategy);
+ RBM_NORMAL, gvs->strategy, NULL);
pBuffer = ReadBufferExtended(gvs->index, MAIN_FORKNUM, parentBlkno,
- RBM_NORMAL, gvs->strategy);
+ RBM_NORMAL, gvs->strategy, NULL);
page = BufferGetPage(dBuffer);
rightlink = GinPageGetOpaque(page)->rightlink;
@@ -270,7 +270,7 @@ ginScanToDelete(GinVacuumState *gvs, BlockNumber blkno, bool isRoot,
}
buffer = ReadBufferExtended(gvs->index, MAIN_FORKNUM, blkno,
- RBM_NORMAL, gvs->strategy);
+ RBM_NORMAL, gvs->strategy, NULL);
if (!isRoot)
LockBuffer(buffer, GIN_EXCLUSIVE);
@@ -355,7 +355,7 @@ ginVacuumPostingTreeLeaves(GinVacuumState *gvs, BlockNumber blkno)
PostingItem *pitem;
buffer = ReadBufferExtended(gvs->index, MAIN_FORKNUM, blkno,
- RBM_NORMAL, gvs->strategy);
+ RBM_NORMAL, gvs->strategy, NULL);
LockBuffer(buffer, GIN_SHARE);
page = BufferGetPage(buffer);
@@ -396,7 +396,7 @@ ginVacuumPostingTreeLeaves(GinVacuumState *gvs, BlockNumber blkno)
break;
buffer = ReadBufferExtended(gvs->index, MAIN_FORKNUM, blkno,
- RBM_NORMAL, gvs->strategy);
+ RBM_NORMAL, gvs->strategy, NULL);
LockBuffer(buffer, GIN_EXCLUSIVE);
page = BufferGetPage(buffer);
}
@@ -419,7 +419,7 @@ ginVacuumPostingTree(GinVacuumState *gvs, BlockNumber rootBlkno)
*tmp;
buffer = ReadBufferExtended(gvs->index, MAIN_FORKNUM, rootBlkno,
- RBM_NORMAL, gvs->strategy);
+ RBM_NORMAL, gvs->strategy, NULL);
/*
* Lock posting tree root for cleanup to ensure there are no
@@ -598,7 +598,7 @@ ginbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
gvs.result = stats;
buffer = ReadBufferExtended(index, MAIN_FORKNUM, blkno,
- RBM_NORMAL, info->strategy);
+ RBM_NORMAL, info->strategy, NULL);
/* find leaf page */
for (;;)
@@ -631,7 +631,7 @@ ginbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
UnlockReleaseBuffer(buffer);
buffer = ReadBufferExtended(index, MAIN_FORKNUM, blkno,
- RBM_NORMAL, info->strategy);
+ RBM_NORMAL, info->strategy, NULL);
}
/* right now we found leftmost page in entry's BTree */
@@ -674,7 +674,7 @@ ginbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
break;
buffer = ReadBufferExtended(index, MAIN_FORKNUM, blkno,
- RBM_NORMAL, info->strategy);
+ RBM_NORMAL, info->strategy, NULL);
LockBuffer(buffer, GIN_EXCLUSIVE);
}
@@ -751,7 +751,7 @@ ginvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
vacuum_delay_point(false);
buffer = ReadBufferExtended(index, MAIN_FORKNUM, blkno,
- RBM_NORMAL, info->strategy);
+ RBM_NORMAL, info->strategy, NULL);
LockBuffer(buffer, GIN_SHARE);
page = (Page) BufferGetPage(buffer);
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index 4d858b65e1..d393f0c731 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -681,7 +681,7 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace,
}
if (XLogRecPtrIsInvalid(stack->lsn))
- stack->buffer = ReadBuffer(state.r, stack->blkno);
+ stack->buffer = ReadBuffer(state.r, stack->blkno, NULL);
/*
* Be optimistic and grab shared lock first. Swap it for an exclusive
@@ -932,7 +932,7 @@ gistFindPath(Relation r, BlockNumber child, OffsetNumber *downlinkoffnum)
top = linitial(fifo);
fifo = list_delete_first(fifo);
- buffer = ReadBuffer(r, top->blkno);
+ buffer = ReadBuffer(r, top->blkno, NULL);
LockBuffer(buffer, GIST_SHARE);
gistcheckpage(r, buffer);
page = (Page) BufferGetPage(buffer);
@@ -1085,7 +1085,7 @@ gistFindCorrectParent(Relation r, GISTInsertStack *child, bool is_build)
*/
break;
}
- parent->buffer = ReadBuffer(r, parent->blkno);
+ parent->buffer = ReadBuffer(r, parent->blkno, NULL);
LockBuffer(parent->buffer, GIST_EXCLUSIVE);
gistcheckpage(r, parent->buffer);
parent->page = (Page) BufferGetPage(parent->buffer);
@@ -1110,7 +1110,7 @@ gistFindCorrectParent(Relation r, GISTInsertStack *child, bool is_build)
/* note we don't lock them or gistcheckpage them here! */
while (ptr)
{
- ptr->buffer = ReadBuffer(r, ptr->blkno);
+ ptr->buffer = ReadBuffer(r, ptr->blkno, NULL);
ptr->page = (Page) BufferGetPage(ptr->buffer);
ptr = ptr->parent;
}
@@ -1225,7 +1225,7 @@ gistfixsplit(GISTInsertState *state, GISTSTATE *giststate)
if (GistFollowRight(page))
{
/* lock next page */
- buf = ReadBuffer(state->r, GistPageGetOpaque(page)->rightlink);
+ buf = ReadBuffer(state->r, GistPageGetOpaque(page)->rightlink, NULL);
LockBuffer(buf, GIST_EXCLUSIVE);
}
else
diff --git a/src/backend/access/gist/gistbuild.c b/src/backend/access/gist/gistbuild.c
index 9e707167d9..20a2310dfc 100644
--- a/src/backend/access/gist/gistbuild.c
+++ b/src/backend/access/gist/gistbuild.c
@@ -966,7 +966,7 @@ gistProcessItup(GISTBuildState *buildstate, IndexTuple itup,
* descend down to.
*/
- buffer = ReadBuffer(indexrel, blkno);
+ buffer = ReadBuffer(indexrel, blkno, NULL);
LockBuffer(buffer, GIST_EXCLUSIVE);
page = (Page) BufferGetPage(buffer);
@@ -1029,7 +1029,7 @@ gistProcessItup(GISTBuildState *buildstate, IndexTuple itup,
* We've reached a leaf page. Place the tuple here.
*/
Assert(level == 0);
- buffer = ReadBuffer(indexrel, blkno);
+ buffer = ReadBuffer(indexrel, blkno, NULL);
LockBuffer(buffer, GIST_EXCLUSIVE);
gistbufferinginserttuples(buildstate, buffer, level,
&itup, 1, InvalidOffsetNumber,
@@ -1102,7 +1102,7 @@ gistbufferinginserttuples(GISTBuildState *buildstate, Buffer buffer, int level,
ItemId iid = PageGetItemId(page, off);
IndexTuple idxtuple = (IndexTuple) PageGetItem(page, iid);
BlockNumber childblkno = ItemPointerGetBlockNumber(&(idxtuple->t_tid));
- Buffer childbuf = ReadBuffer(buildstate->indexrel, childblkno);
+ Buffer childbuf = ReadBuffer(buildstate->indexrel, childblkno, NULL);
LockBuffer(childbuf, GIST_SHARE);
gistMemorizeAllDownlinks(buildstate, childbuf);
@@ -1246,7 +1246,7 @@ gistBufferingFindCorrectParent(GISTBuildState *buildstate,
parent = *parentblkno;
}
- buffer = ReadBuffer(buildstate->indexrel, parent);
+ buffer = ReadBuffer(buildstate->indexrel, parent, NULL);
page = BufferGetPage(buffer);
LockBuffer(buffer, GIST_EXCLUSIVE);
gistcheckpage(buildstate->indexrel, buffer);
@@ -1441,7 +1441,7 @@ gistGetMaxLevel(Relation index)
Page page;
IndexTuple itup;
- buffer = ReadBuffer(index, blkno);
+ buffer = ReadBuffer(index, blkno, NULL);
/*
* There's no concurrent access during index build, so locking is just
diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c
index cc40e928e0..98c9356b4b 100644
--- a/src/backend/access/gist/gistget.c
+++ b/src/backend/access/gist/gistget.c
@@ -49,7 +49,7 @@ gistkillitems(IndexScanDesc scan)
Assert(!XLogRecPtrIsInvalid(so->curPageLSN));
Assert(so->killedItems != NULL);
- buffer = ReadBuffer(scan->indexRelation, so->curBlkno);
+ buffer = ReadBuffer(scan->indexRelation, so->curBlkno, NULL);
if (!BufferIsValid(buffer))
return;
@@ -340,7 +340,7 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem,
Assert(!GISTSearchItemIsHeap(*pageItem));
- buffer = ReadBuffer(scan->indexRelation, pageItem->blkno);
+ buffer = ReadBuffer(scan->indexRelation, pageItem->blkno, NULL);
LockBuffer(buffer, GIST_SHARE);
PredicateLockPage(r, BufferGetBlockNumber(buffer), scan->xs_snapshot);
gistcheckpage(scan->indexRelation, buffer);
diff --git a/src/backend/access/gist/gistutil.c b/src/backend/access/gist/gistutil.c
index dbc4ac639a..b4dcc1056f 100644
--- a/src/backend/access/gist/gistutil.c
+++ b/src/backend/access/gist/gistutil.c
@@ -833,7 +833,7 @@ gistNewBuffer(Relation r, Relation heaprel)
if (blkno == InvalidBlockNumber)
break; /* nothing left in FSM */
- buffer = ReadBuffer(r, blkno);
+ buffer = ReadBuffer(r, blkno, NULL);
/*
* We have to guard against the possibility that someone else already
diff --git a/src/backend/access/gist/gistvacuum.c b/src/backend/access/gist/gistvacuum.c
index dd0d9d5006..f95e42f40a 100644
--- a/src/backend/access/gist/gistvacuum.c
+++ b/src/backend/access/gist/gistvacuum.c
@@ -286,7 +286,7 @@ restart:
vacuum_delay_point(false);
buffer = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
- info->strategy);
+ info->strategy, NULL);
/*
* We are not going to stay here for a long time, aggressively grab an
@@ -482,7 +482,7 @@ gistvacuum_delete_empty_pages(IndexVacuumInfo *info, GistVacState *vstate)
int deleted;
buffer = ReadBufferExtended(rel, MAIN_FORKNUM, (BlockNumber) blkno,
- RBM_NORMAL, info->strategy);
+ RBM_NORMAL, info->strategy, NULL);
LockBuffer(buffer, GIST_SHARE);
page = (Page) BufferGetPage(buffer);
@@ -548,7 +548,7 @@ gistvacuum_delete_empty_pages(IndexVacuumInfo *info, GistVacState *vstate)
break;
leafbuf = ReadBufferExtended(rel, MAIN_FORKNUM, leafs_to_delete[i],
- RBM_NORMAL, info->strategy);
+ RBM_NORMAL, info->strategy, NULL);
LockBuffer(leafbuf, GIST_EXCLUSIVE);
gistcheckpage(rel, leafbuf);
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 02ec1126a4..c806df1eb9 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -512,7 +512,8 @@ loop_top:
* We need to acquire a cleanup lock on the primary bucket page to out
* wait concurrent scans before deleting the dead tuples.
*/
- buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL, info->strategy);
+ buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL, info->strategy,
+ NULL);
LockBufferForCleanup(buf);
_hash_checkpage(rel, buf, LH_BUCKET_PAGE);
diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index b8e5bd005e..33dbcfd2f1 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -74,7 +74,7 @@ _hash_getbuf(Relation rel, BlockNumber blkno, int access, int flags)
if (blkno == P_NEW)
elog(ERROR, "hash AM does not use P_NEW");
- buf = ReadBuffer(rel, blkno);
+ buf = ReadBuffer(rel, blkno, NULL);
if (access != HASH_NOLOCK)
LockBuffer(buf, access);
@@ -100,7 +100,7 @@ _hash_getbuf_with_condlock_cleanup(Relation rel, BlockNumber blkno, int flags)
if (blkno == P_NEW)
elog(ERROR, "hash AM does not use P_NEW");
- buf = ReadBuffer(rel, blkno);
+ buf = ReadBuffer(rel, blkno, NULL);
if (!ConditionalLockBufferForCleanup(buf))
{
@@ -140,7 +140,7 @@ _hash_getinitbuf(Relation rel, BlockNumber blkno)
elog(ERROR, "hash AM does not use P_NEW");
buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_ZERO_AND_LOCK,
- NULL);
+ NULL, NULL);
/* ref count and lock type are correct */
@@ -218,7 +218,7 @@ _hash_getnewbuf(Relation rel, BlockNumber blkno, ForkNumber forkNum)
else
{
buf = ReadBufferExtended(rel, forkNum, blkno, RBM_ZERO_AND_LOCK,
- NULL);
+ NULL, NULL);
}
/* ref count and lock type are correct */
@@ -245,7 +245,7 @@ _hash_getbuf_with_strategy(Relation rel, BlockNumber blkno,
if (blkno == P_NEW)
elog(ERROR, "hash AM does not use P_NEW");
- buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL, bstrategy);
+ buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL, bstrategy, NULL);
if (access != HASH_NOLOCK)
LockBuffer(buf, access);
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index fa7935a0ed..5a0aa9998c 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1538,7 +1538,7 @@ heap_fetch(Relation relation,
/*
* Fetch and pin the appropriate page of the relation.
*/
- buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(tid));
+ buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(tid), NULL);
/*
* Need share lock on buffer to examine tuple commit status.
@@ -1832,7 +1832,7 @@ heap_get_latest_tid(TableScanDesc sscan,
/*
* Read, pin, and lock the page.
*/
- buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(&ctid));
+ buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(&ctid), NULL);
LockBuffer(buffer, BUFFER_LOCK_SHARE);
page = BufferGetPage(buffer);
@@ -2728,7 +2728,7 @@ heap_delete(Relation relation, ItemPointer tid,
errmsg("cannot delete tuples during a parallel operation")));
block = ItemPointerGetBlockNumber(tid);
- buffer = ReadBuffer(relation, block);
+ buffer = ReadBuffer(relation, block, NULL);
page = BufferGetPage(buffer);
/*
@@ -3257,7 +3257,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
block = ItemPointerGetBlockNumber(otid);
INJECTION_POINT("heap_update-before-pin");
- buffer = ReadBuffer(relation, block);
+ buffer = ReadBuffer(relation, block, NULL);
page = BufferGetPage(buffer);
/*
@@ -4513,7 +4513,7 @@ heap_lock_tuple(Relation relation, HeapTuple tuple,
bool have_tuple_lock = false;
bool cleared_all_frozen = false;
- *buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(tid));
+ *buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(tid), NULL);
block = ItemPointerGetBlockNumber(tid);
/*
@@ -6009,7 +6009,7 @@ heap_finish_speculative(Relation relation, ItemPointer tid)
ItemId lp = NULL;
HeapTupleHeader htup;
- buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(tid));
+ buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(tid), NULL);
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
page = (Page) BufferGetPage(buffer);
@@ -6100,7 +6100,7 @@ heap_abort_speculative(Relation relation, ItemPointer tid)
Assert(ItemPointerIsValid(tid));
block = ItemPointerGetBlockNumber(tid);
- buffer = ReadBuffer(relation, block);
+ buffer = ReadBuffer(relation, block, NULL);
page = BufferGetPage(buffer);
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
@@ -8181,7 +8181,7 @@ heap_index_delete_tuples(Relation rel, TM_IndexDeleteOp *delstate)
UnlockReleaseBuffer(buf);
blkno = ItemPointerGetBlockNumber(htid);
- buf = ReadBuffer(rel, blkno);
+ buf = ReadBuffer(rel, blkno, NULL);
nblocksaccessed++;
Assert(!delstate->bottomup ||
nblocksaccessed <= BOTTOMUP_MAX_NBLOCKS);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index c0bec01415..12167d9bb5 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -128,7 +128,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
hscan->xs_cbuf = ReleaseAndReadBuffer(hscan->xs_cbuf,
hscan->xs_base.rel,
- ItemPointerGetBlockNumber(tid));
+ ItemPointerGetBlockNumber(tid),
+ NULL);
/*
* Prune page, but only if we weren't already on this page
@@ -2185,7 +2186,8 @@ heapam_scan_bitmap_next_block(TableScanDesc scan,
*/
hscan->rs_cbuf = ReleaseAndReadBuffer(hscan->rs_cbuf,
scan->rs_rd,
- block);
+ block,
+ NULL);
hscan->rs_cblock = block;
buffer = hscan->rs_cbuf;
snapshot = scan->rs_snapshot;
@@ -2414,7 +2416,8 @@ heapam_scan_sample_next_block(TableScanDesc scan, SampleScanState *scanstate)
/* Read page using selected strategy */
hscan->rs_cbuf = ReadBufferExtended(hscan->rs_base.rs_rd, MAIN_FORKNUM,
- blockno, RBM_NORMAL, hscan->rs_strategy);
+ blockno, RBM_NORMAL, hscan->rs_strategy,
+ NULL);
/* in pagemode, prune the page and determine visible tuple offsets */
if (hscan->rs_base.rs_flags & SO_ALLOW_PAGEMODE)
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index c482c9d61b..7d5afcc6bc 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -93,7 +93,7 @@ ReadBufferBI(Relation relation, BlockNumber targetBlock,
/* If not bulk-insert, exactly like ReadBuffer */
if (!bistate)
return ReadBufferExtended(relation, MAIN_FORKNUM, targetBlock,
- mode, NULL);
+ mode, NULL, NULL);
/* If we have the desired block already pinned, re-pin and return it */
if (bistate->current_buf != InvalidBuffer)
@@ -117,7 +117,7 @@ ReadBufferBI(Relation relation, BlockNumber targetBlock,
/* Perform a read using the buffer strategy */
buffer = ReadBufferExtended(relation, MAIN_FORKNUM, targetBlock,
- mode, bistate->strategy);
+ mode, bistate->strategy, NULL);
/* Save the selected block as target for future inserts */
IncrBufferRefCount(buffer);
@@ -640,7 +640,7 @@ loop:
else if (otherBlock < targetBlock)
{
/* lock other buffer first */
- buffer = ReadBuffer(relation, targetBlock);
+ buffer = ReadBuffer(relation, targetBlock, NULL);
if (PageIsAllVisible(BufferGetPage(buffer)))
visibilitymap_pin(relation, targetBlock, vmbuffer);
LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE);
@@ -649,7 +649,7 @@ loop:
else
{
/* lock target buffer first */
- buffer = ReadBuffer(relation, targetBlock);
+ buffer = ReadBuffer(relation, targetBlock, NULL);
if (PageIsAllVisible(BufferGetPage(buffer)))
visibilitymap_pin(relation, targetBlock, vmbuffer);
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 1af18a78a2..4e63579b61 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -3394,7 +3394,7 @@ count_nondeletable_pages(LVRelState *vacrel, bool *lock_waiter_detected)
}
buf = ReadBufferExtended(vacrel->rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
- vacrel->bstrategy);
+ vacrel->bstrategy, NULL);
/* In this phase we only need shared access to the buffer */
LockBuffer(buf, BUFFER_LOCK_SHARE);
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 745a04ef26..a31f3098de 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -582,7 +582,7 @@ vm_readbuf(Relation rel, BlockNumber blkno, bool extend)
}
else
buf = ReadBufferExtended(rel, VISIBILITYMAP_FORKNUM, blkno,
- RBM_ZERO_ON_ERROR, NULL);
+ RBM_ZERO_ON_ERROR, NULL, NULL);
/*
* Initializing the page when needed is trickier than it looks, because of
diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index 31fe1c3ade..c76f1fd2d5 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -21,6 +21,7 @@
#include "access/xloginsert.h"
#include "common/int.h"
#include "common/pg_prng.h"
+#include "pgstat.h"
#include "lib/qunique.h"
#include "miscadmin.h"
#include "storage/lmgr.h"
@@ -323,7 +324,7 @@ _bt_search_insert(Relation rel, Relation heaprel, BTInsertState insertstate)
if (RelationGetTargetBlock(rel) != InvalidBlockNumber)
{
/* Simulate a _bt_getbuf() call with conditional locking */
- insertstate->buf = ReadBuffer(rel, RelationGetTargetBlock(rel));
+ insertstate->buf = ReadBuffer(rel, RelationGetTargetBlock(rel), NULL);
if (_bt_conditionallockbuf(rel, insertstate->buf))
{
Page page;
@@ -733,7 +734,7 @@ _bt_check_unique(Relation rel, BTInsertState insertstate, Relation heapRel,
{
BlockNumber nblkno = opaque->btpo_next;
- nbuf = _bt_relandgetbuf(rel, nbuf, nblkno, BT_READ);
+ nbuf = _bt_relandgetbuf(rel, nbuf, nblkno, BT_READ, NULL);
page = BufferGetPage(nbuf);
opaque = BTPageGetOpaque(page);
if (!P_IGNORE(opaque))
@@ -1040,7 +1041,9 @@ _bt_stepright(Relation rel, Relation heaprel, BTInsertState insertstate,
rblkno = opaque->btpo_next;
for (;;)
{
- rbuf = _bt_relandgetbuf(rel, rbuf, rblkno, BT_WRITE);
+ bool hit;
+ rbuf = _bt_relandgetbuf(rel, rbuf, rblkno, BT_WRITE, &hit);
+ pgstat_count_buffer(rel, P_ISLEAF(opaque), hit);
page = BufferGetPage(rbuf);
opaque = BTPageGetOpaque(page);
@@ -1256,10 +1259,14 @@ _bt_insertonpg(Relation rel,
*/
if (unlikely(split_only_page))
{
+ bool hit;
+
Assert(!isleaf);
Assert(BufferIsValid(cbuf));
- metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_WRITE);
+ metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_WRITE, &hit);
+ pgstat_count_buffer(rel, true, hit);
+
metapg = BufferGetPage(metabuf);
metad = BTPageGetMeta(metapg);
@@ -1890,7 +1897,9 @@ _bt_split(Relation rel, Relation heaprel, BTScanInsert itup_key, Buffer buf,
*/
if (!isrightmost)
{
- sbuf = _bt_getbuf(rel, oopaque->btpo_next, BT_WRITE);
+ bool hit;
+ sbuf = _bt_getbuf(rel, oopaque->btpo_next, BT_WRITE, &hit);
+ pgstat_count_buffer(rel, P_ISLEAF(oopaque), hit);
spage = BufferGetPage(sbuf);
sopaque = BTPageGetOpaque(spage);
if (sopaque->btpo_prev != origpagenumber)
@@ -2247,12 +2256,14 @@ _bt_finish_split(Relation rel, Relation heaprel, Buffer lbuf, BTStack stack)
BTPageOpaque rpageop;
bool wasroot;
bool wasonly;
+ bool hit;
Assert(P_INCOMPLETE_SPLIT(lpageop));
Assert(heaprel != NULL);
/* Lock right sibling, the one missing the downlink */
- rbuf = _bt_getbuf(rel, lpageop->btpo_next, BT_WRITE);
+ rbuf = _bt_getbuf(rel, lpageop->btpo_next, BT_WRITE, &hit);
+ pgstat_count_buffer(rel, P_ISLEAF(lpageop), hit);
rpage = BufferGetPage(rbuf);
rpageop = BTPageGetOpaque(rpage);
@@ -2264,7 +2275,8 @@ _bt_finish_split(Relation rel, Relation heaprel, Buffer lbuf, BTStack stack)
BTMetaPageData *metad;
/* acquire lock on the metapage */
- metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_WRITE);
+ metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_WRITE, &hit);
+ pgstat_count_buffer(rel, true, hit);
metapg = BufferGetPage(metabuf);
metad = BTPageGetMeta(metapg);
@@ -2330,7 +2342,7 @@ _bt_getstackbuf(Relation rel, Relation heaprel, BTStack stack, BlockNumber child
Page page;
BTPageOpaque opaque;
- buf = _bt_getbuf(rel, blkno, BT_WRITE);
+ buf = _bt_getbuf(rel, blkno, BT_WRITE, NULL);
page = BufferGetPage(buf);
opaque = BTPageGetOpaque(page);
@@ -2460,6 +2472,7 @@ _bt_newlevel(Relation rel, Relation heaprel, Buffer lbuf, Buffer rbuf)
Buffer metabuf;
Page metapg;
BTMetaPageData *metad;
+ bool hit;
lbkno = BufferGetBlockNumber(lbuf);
rbkno = BufferGetBlockNumber(rbuf);
@@ -2472,7 +2485,8 @@ _bt_newlevel(Relation rel, Relation heaprel, Buffer lbuf, Buffer rbuf)
rootblknum = BufferGetBlockNumber(rootbuf);
/* acquire lock on the metapage */
- metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_WRITE);
+ metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_WRITE, &hit);
+ pgstat_count_buffer(rel, true, hit);
metapg = BufferGetPage(metabuf);
metad = BTPageGetMeta(metapg);
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index c79dd38ee1..9bcb341470 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -30,6 +30,7 @@
#include "access/xloginsert.h"
#include "common/int.h"
#include "miscadmin.h"
+#include "pgstat.h"
#include "storage/indexfsm.h"
#include "storage/predicate.h"
#include "storage/procarray.h"
@@ -183,13 +184,15 @@ _bt_vacuum_needs_cleanup(Relation rel)
BTMetaPageData *metad;
uint32 btm_version;
BlockNumber prev_num_delpages;
+ bool hit;
/*
* Copy details from metapage to local variables quickly.
*
* Note that we deliberately avoid using cached version of metapage here.
*/
- metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ);
+ metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ, &hit);
+ pgstat_count_metadata_buffer(rel, hit);
metapg = BufferGetPage(metabuf);
metad = BTPageGetMeta(metapg);
btm_version = metad->btm_version;
@@ -234,6 +237,7 @@ _bt_set_cleanup_info(Relation rel, BlockNumber num_delpages)
Buffer metabuf;
Page metapg;
BTMetaPageData *metad;
+ bool hit;
/*
* On-disk compatibility note: The btm_last_cleanup_num_delpages metapage
@@ -253,7 +257,8 @@ _bt_set_cleanup_info(Relation rel, BlockNumber num_delpages)
* no longer used as of PostgreSQL 14. We set it to -1.0 on rewrite, just
* to be consistent.
*/
- metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ);
+ metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ, &hit);
+ pgstat_count_metadata_buffer(rel, hit);
metapg = BufferGetPage(metabuf);
metad = BTPageGetMeta(metapg);
@@ -350,6 +355,7 @@ _bt_getroot(Relation rel, Relation heaprel, int access)
BlockNumber rootblkno;
uint32 rootlevel;
BTMetaPageData *metad;
+ bool hit;
Assert(access == BT_READ || heaprel != NULL);
@@ -373,7 +379,8 @@ _bt_getroot(Relation rel, Relation heaprel, int access)
Assert(rootblkno != P_NONE);
rootlevel = metad->btm_fastlevel;
- rootbuf = _bt_getbuf(rel, rootblkno, BT_READ);
+ rootbuf = _bt_getbuf(rel, rootblkno, BT_READ, &hit);
+ pgstat_count_metadata_buffer(rel, hit);
rootpage = BufferGetPage(rootbuf);
rootopaque = BTPageGetOpaque(rootpage);
@@ -399,7 +406,8 @@ _bt_getroot(Relation rel, Relation heaprel, int access)
rel->rd_amcache = NULL;
}
- metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ);
+ metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ, &hit);
+ pgstat_count_metadata_buffer(rel, hit);
metad = _bt_getmeta(rel, metabuf);
/* if no root page initialized yet, do it */
@@ -535,7 +543,8 @@ _bt_getroot(Relation rel, Relation heaprel, int access)
for (;;)
{
- rootbuf = _bt_relandgetbuf(rel, rootbuf, rootblkno, BT_READ);
+ rootbuf = _bt_relandgetbuf(rel, rootbuf, rootblkno, BT_READ, &hit);
+ pgstat_count_metadata_buffer(rel, hit);
rootpage = BufferGetPage(rootbuf);
rootopaque = BTPageGetOpaque(rootpage);
@@ -588,6 +597,7 @@ _bt_gettrueroot(Relation rel)
BlockNumber rootblkno;
uint32 rootlevel;
BTMetaPageData *metad;
+ bool hit;
/*
* We don't try to use cached metapage data here, since (a) this path is
@@ -599,7 +609,8 @@ _bt_gettrueroot(Relation rel)
pfree(rel->rd_amcache);
rel->rd_amcache = NULL;
- metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ);
+ metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ, &hit);
+ pgstat_count_metadata_buffer(rel, hit);
metapg = BufferGetPage(metabuf);
metaopaque = BTPageGetOpaque(metapg);
metad = BTPageGetMeta(metapg);
@@ -638,7 +649,8 @@ _bt_gettrueroot(Relation rel)
for (;;)
{
- rootbuf = _bt_relandgetbuf(rel, rootbuf, rootblkno, BT_READ);
+ rootbuf = _bt_relandgetbuf(rel, rootbuf, rootblkno, BT_READ, &hit);
+ pgstat_count_metadata_buffer(rel, hit);
rootpage = BufferGetPage(rootbuf);
rootopaque = BTPageGetOpaque(rootpage);
@@ -679,8 +691,10 @@ _bt_getrootheight(Relation rel)
if (rel->rd_amcache == NULL)
{
Buffer metabuf;
+ bool hit;
- metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ);
+ metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ, &hit);
+ pgstat_count_metadata_buffer(rel, hit);
metad = _bt_getmeta(rel, metabuf);
/*
@@ -743,8 +757,10 @@ _bt_metaversion(Relation rel, bool *heapkeyspace, bool *allequalimage)
if (rel->rd_amcache == NULL)
{
Buffer metabuf;
+ bool hit;
- metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ);
+ metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ, &hit);
+ pgstat_count_metadata_buffer(rel, hit);
metad = _bt_getmeta(rel, metabuf);
/*
@@ -842,14 +858,14 @@ _bt_checkpage(Relation rel, Buffer buf)
* as _bt_lockbuf().
*/
Buffer
-_bt_getbuf(Relation rel, BlockNumber blkno, int access)
+_bt_getbuf(Relation rel, BlockNumber blkno, int access, bool *hit)
{
Buffer buf;
Assert(BlockNumberIsValid(blkno));
/* Read an existing block of the relation */
- buf = ReadBuffer(rel, blkno);
+ buf = ReadBuffer(rel, blkno, hit);
_bt_lockbuf(rel, buf, access);
_bt_checkpage(rel, buf);
@@ -903,7 +919,7 @@ _bt_allocbuf(Relation rel, Relation heaprel)
blkno = GetFreeIndexPage(rel);
if (blkno == InvalidBlockNumber)
break;
- buf = ReadBuffer(rel, blkno);
+ buf = ReadBuffer(rel, blkno, NULL);
if (_bt_conditionallockbuf(rel, buf))
{
page = BufferGetPage(buf);
@@ -1000,14 +1016,14 @@ _bt_allocbuf(Relation rel, Relation heaprel)
* is when the target page is the same one already in the buffer.
*/
Buffer
-_bt_relandgetbuf(Relation rel, Buffer obuf, BlockNumber blkno, int access)
+_bt_relandgetbuf(Relation rel, Buffer obuf, BlockNumber blkno, int access, bool *hit)
{
Buffer buf;
Assert(BlockNumberIsValid(blkno));
if (BufferIsValid(obuf))
_bt_unlockbuf(rel, obuf);
- buf = ReleaseAndReadBuffer(obuf, rel, blkno);
+ buf = ReleaseAndReadBuffer(obuf, rel, blkno, hit);
_bt_lockbuf(rel, buf, access);
_bt_checkpage(rel, buf);
@@ -1703,7 +1719,7 @@ _bt_leftsib_splitflag(Relation rel, BlockNumber leftsib, BlockNumber target)
if (leftsib == P_NONE)
return false;
- buf = _bt_getbuf(rel, leftsib, BT_READ);
+ buf = _bt_getbuf(rel, leftsib, BT_READ, NULL);
page = BufferGetPage(buf);
opaque = BTPageGetOpaque(page);
@@ -1758,7 +1774,7 @@ _bt_rightsib_halfdeadflag(Relation rel, BlockNumber leafrightsib)
Assert(leafrightsib != P_NONE);
- buf = _bt_getbuf(rel, leafrightsib, BT_READ);
+ buf = _bt_getbuf(rel, leafrightsib, BT_READ, NULL);
page = BufferGetPage(buf);
opaque = BTPageGetOpaque(page);
@@ -2062,7 +2078,7 @@ _bt_pagedel(Relation rel, Buffer leafbuf, BTVacState *vstate)
if (!rightsib_empty)
break;
- leafbuf = _bt_getbuf(rel, rightsib, BT_WRITE);
+ leafbuf = _bt_getbuf(rel, rightsib, BT_WRITE, NULL);
}
}
@@ -2335,6 +2351,7 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, BlockNumber scanblkno,
uint32 targetlevel;
IndexTuple leafhikey;
BlockNumber leaftopparent;
+ bool hit;
page = BufferGetPage(leafbuf);
opaque = BTPageGetOpaque(page);
@@ -2374,7 +2391,8 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, BlockNumber scanblkno,
Assert(target != leafblkno);
/* Fetch the block number of the target's left sibling */
- buf = _bt_getbuf(rel, target, BT_READ);
+ buf = _bt_getbuf(rel, target, BT_READ, &hit);
+ pgstat_count_metadata_buffer(rel, hit);
page = BufferGetPage(buf);
opaque = BTPageGetOpaque(page);
leftsib = opaque->btpo_prev;
@@ -2401,7 +2419,7 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, BlockNumber scanblkno,
_bt_lockbuf(rel, leafbuf, BT_WRITE);
if (leftsib != P_NONE)
{
- lbuf = _bt_getbuf(rel, leftsib, BT_WRITE);
+ lbuf = _bt_getbuf(rel, leftsib, BT_WRITE, NULL);
page = BufferGetPage(lbuf);
opaque = BTPageGetOpaque(page);
while (P_ISDELETED(opaque) || opaque->btpo_next != target)
@@ -2449,7 +2467,8 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, BlockNumber scanblkno,
CHECK_FOR_INTERRUPTS();
/* step right one page */
- lbuf = _bt_getbuf(rel, leftsib, BT_WRITE);
+ lbuf = _bt_getbuf(rel, leftsib, BT_WRITE, &hit);
+ pgstat_count_buffer(rel, !P_ISLEAF(opaque), hit);
page = BufferGetPage(lbuf);
opaque = BTPageGetOpaque(page);
}
@@ -2513,7 +2532,8 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, BlockNumber scanblkno,
* And next write-lock the (current) right sibling.
*/
rightsib = opaque->btpo_next;
- rbuf = _bt_getbuf(rel, rightsib, BT_WRITE);
+ rbuf = _bt_getbuf(rel, rightsib, BT_WRITE, &hit);
+ pgstat_count_buffer(rel, !P_ISLEAF(opaque), hit);
page = BufferGetPage(rbuf);
opaque = BTPageGetOpaque(page);
@@ -2569,7 +2589,8 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, BlockNumber scanblkno,
if (P_RIGHTMOST(opaque))
{
/* rightsib will be the only one left on the level */
- metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_WRITE);
+ metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_WRITE, &hit);
+ pgstat_count_metadata_buffer(rel, hit);
metapg = BufferGetPage(metabuf);
metad = BTPageGetMeta(metapg);
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index dc244ae24c..7aa24e1f12 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -1146,7 +1146,7 @@ backtrack:
* buffer access strategy.
*/
buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
- info->strategy);
+ info->strategy, NULL);
_bt_lockbuf(rel, buf, BT_READ);
page = BufferGetPage(buf);
opaque = NULL;
diff --git a/src/backend/access/nbtree/nbtsearch.c b/src/backend/access/nbtree/nbtsearch.c
index 472ce06f19..203816c418 100644
--- a/src/backend/access/nbtree/nbtsearch.c
+++ b/src/backend/access/nbtree/nbtsearch.c
@@ -126,6 +126,7 @@ _bt_search(Relation rel, Relation heaprel, BTScanInsert key, Buffer *bufP,
IndexTuple itup;
BlockNumber child;
BTStack new_stack;
+ bool hit;
/*
* Race -- the page we just grabbed may have split since we read its
@@ -178,7 +179,8 @@ _bt_search(Relation rel, Relation heaprel, BTScanInsert key, Buffer *bufP,
page_access = BT_WRITE;
/* drop the read lock on the page, then acquire one on its child */
- *bufP = _bt_relandgetbuf(rel, *bufP, child, page_access);
+ *bufP = _bt_relandgetbuf(rel, *bufP, child, page_access, &hit);
+ pgstat_count_buffer(rel, opaque->btpo_level != 1, hit);
/* okay, all set to move down a level */
stack_in = new_stack;
@@ -249,6 +251,7 @@ _bt_moveright(Relation rel,
Page page;
BTPageOpaque opaque;
int32 cmpval;
+ bool hit;
Assert(!forupdate || heaprel != NULL);
@@ -299,14 +302,16 @@ _bt_moveright(Relation rel,
_bt_relbuf(rel, buf);
/* re-acquire the lock in the right mode, and re-check */
- buf = _bt_getbuf(rel, blkno, access);
+ buf = _bt_getbuf(rel, blkno, access, &hit);
+ pgstat_count_buffer(rel, !P_ISLEAF(opaque), hit);
continue;
}
if (P_IGNORE(opaque) || _bt_compare(rel, key, page, P_HIKEY) >= cmpval)
{
/* step right one page */
- buf = _bt_relandgetbuf(rel, buf, opaque->btpo_next, access);
+ buf = _bt_relandgetbuf(rel, buf, opaque->btpo_next, access, &hit);
+ pgstat_count_buffer(rel, !P_ISLEAF(opaque), hit);
continue;
}
else
@@ -2200,6 +2205,8 @@ static bool
_bt_readnextpage(IndexScanDesc scan, BlockNumber blkno,
BlockNumber lastcurrblkno, ScanDirection dir, bool seized)
{
+ bool hit;
+
Relation rel = scan->indexRelation;
BTScanOpaque so = (BTScanOpaque) scan->opaque;
@@ -2246,7 +2253,8 @@ _bt_readnextpage(IndexScanDesc scan, BlockNumber blkno,
{
/* read blkno, but check for interrupts first */
CHECK_FOR_INTERRUPTS();
- so->currPos.buf = _bt_getbuf(rel, blkno, BT_READ);
+ so->currPos.buf = _bt_getbuf(rel, blkno, BT_READ, &hit);
+ pgstat_count_record_buffer(rel, hit);
}
else
{
@@ -2342,10 +2350,11 @@ _bt_lock_and_validate_left(Relation rel, BlockNumber *blkno,
Page page;
BTPageOpaque opaque;
int tries;
+ bool hit;
/* check for interrupts while we're not holding any buffer lock */
CHECK_FOR_INTERRUPTS();
- buf = _bt_getbuf(rel, *blkno, BT_READ);
+ buf = _bt_getbuf(rel, *blkno, BT_READ, NULL);
page = BufferGetPage(buf);
opaque = BTPageGetOpaque(page);
@@ -2372,7 +2381,8 @@ _bt_lock_and_validate_left(Relation rel, BlockNumber *blkno,
break;
/* step right */
*blkno = opaque->btpo_next;
- buf = _bt_relandgetbuf(rel, buf, *blkno, BT_READ);
+ buf = _bt_relandgetbuf(rel, buf, *blkno, BT_READ, &hit);
+ pgstat_count_buffer(rel, P_ISLEAF(opaque), hit);
page = BufferGetPage(buf);
opaque = BTPageGetOpaque(page);
}
@@ -2382,7 +2392,7 @@ _bt_lock_and_validate_left(Relation rel, BlockNumber *blkno,
* _bt_readpage, which is passed by caller as lastcurrblkno) to see
* what's up with its prev sibling link
*/
- buf = _bt_relandgetbuf(rel, buf, lastcurrblkno, BT_READ);
+ buf = _bt_relandgetbuf(rel, buf, lastcurrblkno, BT_READ, NULL);
page = BufferGetPage(buf);
opaque = BTPageGetOpaque(page);
if (P_ISDELETED(opaque))
@@ -2399,7 +2409,8 @@ _bt_lock_and_validate_left(Relation rel, BlockNumber *blkno,
elog(ERROR, "fell off the end of index \"%s\"",
RelationGetRelationName(rel));
lastcurrblkno = opaque->btpo_next;
- buf = _bt_relandgetbuf(rel, buf, lastcurrblkno, BT_READ);
+ buf = _bt_relandgetbuf(rel, buf, lastcurrblkno, BT_READ, &hit);
+ pgstat_count_buffer(rel, !P_ISLEAF(opaque), hit);
page = BufferGetPage(buf);
opaque = BTPageGetOpaque(page);
if (!P_ISDELETED(opaque))
@@ -2456,6 +2467,7 @@ _bt_get_endpoint(Relation rel, uint32 level, bool rightmost)
OffsetNumber offnum;
BlockNumber blkno;
IndexTuple itup;
+ bool hit;
/*
* If we are looking for a leaf page, okay to descend from fast root;
@@ -2488,7 +2500,8 @@ _bt_get_endpoint(Relation rel, uint32 level, bool rightmost)
if (blkno == P_NONE)
elog(ERROR, "fell off the end of index \"%s\"",
RelationGetRelationName(rel));
- buf = _bt_relandgetbuf(rel, buf, blkno, BT_READ);
+ buf = _bt_relandgetbuf(rel, buf, blkno, BT_READ, &hit);
+ pgstat_count_record_buffer(rel, hit);
page = BufferGetPage(buf);
opaque = BTPageGetOpaque(page);
}
@@ -2511,7 +2524,8 @@ _bt_get_endpoint(Relation rel, uint32 level, bool rightmost)
itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offnum));
blkno = BTreeTupleGetDownLink(itup);
- buf = _bt_relandgetbuf(rel, buf, blkno, BT_READ);
+ buf = _bt_relandgetbuf(rel, buf, blkno, BT_READ, &hit);
+ pgstat_count_record_buffer(rel, hit);
page = BufferGetPage(buf);
opaque = BTPageGetOpaque(page);
}
diff --git a/src/backend/access/nbtree/nbtutils.c b/src/backend/access/nbtree/nbtutils.c
index 693e43c674..a550faa84e 100644
--- a/src/backend/access/nbtree/nbtutils.c
+++ b/src/backend/access/nbtree/nbtutils.c
@@ -2368,8 +2368,9 @@ _bt_killitems(IndexScanDesc scan)
Buffer buf;
droppedpin = true;
- /* Attempt to re-read the buffer, getting pin and lock. */
- buf = _bt_getbuf(scan->indexRelation, so->currPos.currPage, BT_READ);
+ /* Attempt to re-read the buffer, getting pin andlock. */
+ buf = _bt_getbuf(scan->indexRelation, so->currPos.currPage, BT_READ,
+ NULL);
page = BufferGetPage(buf);
if (BufferGetLSNAtomic(buf) == so->currPos.lsn)
diff --git a/src/backend/access/spgist/spgdoinsert.c b/src/backend/access/spgist/spgdoinsert.c
index af6b27b213..96ca5a91a2 100644
--- a/src/backend/access/spgist/spgdoinsert.c
+++ b/src/backend/access/spgist/spgdoinsert.c
@@ -2065,13 +2065,13 @@ spgdoinsert(Relation index, SpGistState *state,
else if (parent.buffer == InvalidBuffer)
{
/* we hold no parent-page lock, so no deadlock is possible */
- current.buffer = ReadBuffer(index, current.blkno);
+ current.buffer = ReadBuffer(index, current.blkno, NULL);
LockBuffer(current.buffer, BUFFER_LOCK_EXCLUSIVE);
}
else if (current.blkno != parent.blkno)
{
/* descend to a new child page */
- current.buffer = ReadBuffer(index, current.blkno);
+ current.buffer = ReadBuffer(index, current.blkno, NULL);
/*
* Attempt to acquire lock on child page. We must beware of
diff --git a/src/backend/access/spgist/spgscan.c b/src/backend/access/spgist/spgscan.c
index 53f910e9d8..85f6e91af0 100644
--- a/src/backend/access/spgist/spgscan.c
+++ b/src/backend/access/spgist/spgscan.c
@@ -847,13 +847,13 @@ redirect:
if (buffer == InvalidBuffer)
{
- buffer = ReadBuffer(index, blkno);
+ buffer = ReadBuffer(index, blkno, NULL);
LockBuffer(buffer, BUFFER_LOCK_SHARE);
}
else if (blkno != BufferGetBlockNumber(buffer))
{
UnlockReleaseBuffer(buffer);
- buffer = ReadBuffer(index, blkno);
+ buffer = ReadBuffer(index, blkno, NULL);
LockBuffer(buffer, BUFFER_LOCK_SHARE);
}
diff --git a/src/backend/access/spgist/spgutils.c b/src/backend/access/spgist/spgutils.c
index 367c36ef9a..ec0c455e6d 100644
--- a/src/backend/access/spgist/spgutils.c
+++ b/src/backend/access/spgist/spgutils.c
@@ -267,7 +267,7 @@ spgGetCache(Relation index)
Buffer metabuffer;
SpGistMetaPageData *metadata;
- metabuffer = ReadBuffer(index, SPGIST_METAPAGE_BLKNO);
+ metabuffer = ReadBuffer(index, SPGIST_METAPAGE_BLKNO, NULL);
LockBuffer(metabuffer, BUFFER_LOCK_SHARE);
metadata = SpGistPageGetMeta(BufferGetPage(metabuffer));
@@ -407,7 +407,7 @@ SpGistNewBuffer(Relation index)
if (SpGistBlockIsFixed(blkno))
continue;
- buffer = ReadBuffer(index, blkno);
+ buffer = ReadBuffer(index, blkno, NULL);
/*
* We have to guard against the possibility that someone else already
@@ -452,7 +452,7 @@ SpGistUpdateMetaPage(Relation index)
{
Buffer metabuffer;
- metabuffer = ReadBuffer(index, SPGIST_METAPAGE_BLKNO);
+ metabuffer = ReadBuffer(index, SPGIST_METAPAGE_BLKNO, NULL);
if (ConditionalLockBuffer(metabuffer))
{
@@ -601,7 +601,7 @@ SpGistGetBuffer(Relation index, int flags, int needSpace, bool *isNew)
Buffer buffer;
Page page;
- buffer = ReadBuffer(index, lup->blkno);
+ buffer = ReadBuffer(index, lup->blkno, NULL);
if (!ConditionalLockBuffer(buffer))
{
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index eeddacd0d5..b4cf3470b6 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -628,7 +628,7 @@ spgvacuumpage(spgBulkDeleteState *bds, BlockNumber blkno)
vacuum_delay_point(false);
buffer = ReadBufferExtended(index, MAIN_FORKNUM, blkno,
- RBM_NORMAL, bds->info->strategy);
+ RBM_NORMAL, bds->info->strategy, NULL);
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
page = (Page) BufferGetPage(buffer);
@@ -709,7 +709,7 @@ spgprocesspending(spgBulkDeleteState *bds)
/* examine the referenced page */
blkno = ItemPointerGetBlockNumber(&pitem->tid);
buffer = ReadBufferExtended(index, MAIN_FORKNUM, blkno,
- RBM_NORMAL, bds->info->strategy);
+ RBM_NORMAL, bds->info->strategy, NULL);
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
page = (Page) BufferGetPage(buffer);
diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index 14d583ae7a..8efecc6e19 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -1300,7 +1300,7 @@ log_newpage_range(Relation rel, ForkNumber forknum,
while (nbufs < XLR_MAX_BLOCK_ID && blkno < endblk)
{
Buffer buf = ReadBufferExtended(rel, forknum, blkno,
- RBM_NORMAL, NULL);
+ RBM_NORMAL, NULL, NULL);
LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index eff0990957..7567e4987f 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -758,6 +758,10 @@ CREATE VIEW pg_statio_all_tables AS
pg_stat_get_blocks_hit(C.oid) AS heap_blks_hit,
I.idx_blks_read AS idx_blks_read,
I.idx_blks_hit AS idx_blks_hit,
+ I.idx_metadata_blks_read AS idx_metadata_blks_read,
+ I.idx_metadata_blks_hit AS idx_metadata_blks_hit,
+ I.idx_record_blks_read AS idx_record_blks_read,
+ I.idx_record_blks_hit AS idx_record_blks_hit,
pg_stat_get_blocks_fetched(T.oid) -
pg_stat_get_blocks_hit(T.oid) AS toast_blks_read,
pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
@@ -771,7 +775,17 @@ CREATE VIEW pg_statio_all_tables AS
pg_stat_get_blocks_hit(indexrelid))::bigint
AS idx_blks_read,
sum(pg_stat_get_blocks_hit(indexrelid))::bigint
- AS idx_blks_hit
+ AS idx_blks_hit,
+ sum(pg_stat_get_metadata_blocks_fetched(indexrelid) -
+ pg_stat_get_metadata_blocks_hit(indexrelid))::bigint
+ AS idx_metadata_blks_read,
+ sum(pg_stat_get_metadata_blocks_hit(indexrelid))::bigint
+ AS idx_metadata_blks_hit,
+ sum(pg_stat_get_record_blocks_fetched(indexrelid) -
+ pg_stat_get_record_blocks_hit(indexrelid))::bigint
+ AS idx_record_blks_read,
+ sum(pg_stat_get_record_blocks_hit(indexrelid))::bigint
+ AS idx_record_blks_hit
FROM pg_index WHERE indrelid = C.oid ) I ON true
LEFT JOIN LATERAL (
SELECT sum(pg_stat_get_blocks_fetched(indexrelid) -
@@ -828,7 +842,13 @@ CREATE VIEW pg_statio_all_indexes AS
I.relname AS indexrelname,
pg_stat_get_blocks_fetched(I.oid) -
pg_stat_get_blocks_hit(I.oid) AS idx_blks_read,
- pg_stat_get_blocks_hit(I.oid) AS idx_blks_hit
+ pg_stat_get_blocks_hit(I.oid) AS idx_blks_hit,
+ pg_stat_get_metadata_blocks_fetched(I.oid) -
+ pg_stat_get_metadata_blocks_hit(I.oid) AS idx_metadata_blks_read,
+ pg_stat_get_metadata_blocks_hit(I.oid) AS idx_metadata_blks_hit,
+ pg_stat_get_record_blocks_fetched(I.oid) -
+ pg_stat_get_record_blocks_hit(I.oid) AS idx_record_blks_read,
+ pg_stat_get_record_blocks_hit(I.oid) AS idx_record_blks_hit
FROM pg_class C JOIN
pg_index X ON C.oid = X.indrelid JOIN
pg_class I ON I.oid = X.indexrelid
@@ -1062,6 +1082,12 @@ CREATE VIEW pg_stat_database AS
pg_stat_get_db_blocks_fetched(D.oid) -
pg_stat_get_db_blocks_hit(D.oid) AS blks_read,
pg_stat_get_db_blocks_hit(D.oid) AS blks_hit,
+ pg_stat_get_db_metadata_blocks_fetched(D.oid) -
+ pg_stat_get_db_metadata_blocks_hit(D.oid) AS metadata_blks_read,
+ pg_stat_get_db_metadata_blocks_hit(D.oid) AS metadata_blks_hit,
+ pg_stat_get_db_record_blocks_fetched(D.oid) -
+ pg_stat_get_db_record_blocks_hit(D.oid) AS record_blks_read,
+ pg_stat_get_db_record_blocks_hit(D.oid) AS record_blks_hit,
pg_stat_get_db_tuples_returned(D.oid) AS tup_returned,
pg_stat_get_db_tuples_fetched(D.oid) AS tup_fetched,
pg_stat_get_db_tuples_inserted(D.oid) AS tup_inserted,
diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index 4b7c5113aa..270c9bf826 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
@@ -1194,7 +1194,7 @@ read_seq_tuple(Relation rel, Buffer *buf, HeapTuple seqdatatuple)
sequence_magic *sm;
Form_pg_sequence_data seq;
- *buf = ReadBuffer(rel, 0);
+ *buf = ReadBuffer(rel, 0, NULL);
LockBuffer(*buf, BUFFER_LOCK_EXCLUSIVE);
page = BufferGetPage(*buf);
diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c
index 99e44ed99f..03fbc85eac 100644
--- a/src/backend/storage/aio/read_stream.c
+++ b/src/backend/storage/aio/read_stream.c
@@ -267,7 +267,8 @@ read_stream_start_pending_read(ReadStream *stream, bool suppress_advice)
&stream->buffers[buffer_index],
stream->pending_read_blocknum,
&nblocks,
- flags);
+ flags,
+ NULL);
stream->pinned_buffers += nblocks;
/* Remember whether we need to wait before returning this buffer. */
@@ -659,7 +660,8 @@ read_stream_next_buffer(ReadStream *stream, void **per_buffer_data)
&stream->buffers[oldest_buffer_index],
next_blocknum,
stream->advice_enabled ?
- READ_BUFFERS_ISSUE_ADVICE : 0)))
+ READ_BUFFERS_ISSUE_ADVICE : 0,
+ NULL)))
{
/* Fast return. */
return buffer;
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 75cfc2b6fe..6acb8089a5 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -486,7 +486,8 @@ ForgetPrivateRefCountEntry(PrivateRefCountEntry *ref)
static Buffer ReadBuffer_common(Relation rel,
SMgrRelation smgr, char smgr_persistence,
ForkNumber forkNum, BlockNumber blockNum,
- ReadBufferMode mode, BufferAccessStrategy strategy);
+ ReadBufferMode mode, BufferAccessStrategy strategy,
+ bool *hit);
static BlockNumber ExtendBufferedRelCommon(BufferManagerRelation bmr,
ForkNumber fork,
BufferAccessStrategy strategy,
@@ -743,9 +744,9 @@ ReadRecentBuffer(RelFileLocator rlocator, ForkNumber forkNum, BlockNumber blockN
* fork with RBM_NORMAL mode and default strategy.
*/
Buffer
-ReadBuffer(Relation reln, BlockNumber blockNum)
+ReadBuffer(Relation reln, BlockNumber blockNum, bool *hit)
{
- return ReadBufferExtended(reln, MAIN_FORKNUM, blockNum, RBM_NORMAL, NULL);
+ return ReadBufferExtended(reln, MAIN_FORKNUM, blockNum, RBM_NORMAL, NULL, hit);
}
/*
@@ -791,7 +792,7 @@ ReadBuffer(Relation reln, BlockNumber blockNum)
*/
inline Buffer
ReadBufferExtended(Relation reln, ForkNumber forkNum, BlockNumber blockNum,
- ReadBufferMode mode, BufferAccessStrategy strategy)
+ ReadBufferMode mode, BufferAccessStrategy strategy, bool *hit)
{
Buffer buf;
@@ -810,7 +811,7 @@ ReadBufferExtended(Relation reln, ForkNumber forkNum, BlockNumber blockNum,
* miss.
*/
buf = ReadBuffer_common(reln, RelationGetSmgr(reln), 0,
- forkNum, blockNum, mode, strategy);
+ forkNum, blockNum, mode, strategy, hit);
return buf;
}
@@ -836,7 +837,7 @@ ReadBufferWithoutRelcache(RelFileLocator rlocator, ForkNumber forkNum,
return ReadBuffer_common(NULL, smgr,
permanent ? RELPERSISTENCE_PERMANENT : RELPERSISTENCE_UNLOGGED,
forkNum, blockNum,
- mode, strategy);
+ mode, strategy, NULL);
}
/*
@@ -1004,7 +1005,7 @@ ExtendBufferedRelTo(BufferManagerRelation bmr,
{
Assert(extended_by == 0);
buffer = ReadBuffer_common(bmr.rel, bmr.smgr, bmr.relpersistence,
- fork, extend_to - 1, mode, strategy);
+ fork, extend_to - 1, mode, strategy, NULL);
}
return buffer;
@@ -1109,7 +1110,8 @@ PinBufferForBlock(Relation rel,
ForkNumber forkNum,
BlockNumber blockNum,
BufferAccessStrategy strategy,
- bool *foundPtr)
+ bool *foundPtr,
+ bool *hit)
{
BufferDesc *bufHdr;
IOContext io_context;
@@ -1160,8 +1162,11 @@ PinBufferForBlock(Relation rel,
* zeroed instead), the per-relation stats always count them.
*/
pgstat_count_buffer_read(rel);
- if (*foundPtr)
+ if (*foundPtr) {
+ if (hit)
+ *hit = true;
pgstat_count_buffer_hit(rel);
+ }
}
if (*foundPtr)
{
@@ -1189,7 +1194,8 @@ static pg_attribute_always_inline Buffer
ReadBuffer_common(Relation rel, SMgrRelation smgr, char smgr_persistence,
ForkNumber forkNum,
BlockNumber blockNum, ReadBufferMode mode,
- BufferAccessStrategy strategy)
+ BufferAccessStrategy strategy,
+ bool *hit)
{
ReadBuffersOperation operation;
Buffer buffer;
@@ -1227,7 +1233,7 @@ ReadBuffer_common(Relation rel, SMgrRelation smgr, char smgr_persistence,
bool found;
buffer = PinBufferForBlock(rel, smgr, persistence,
- forkNum, blockNum, strategy, &found);
+ forkNum, blockNum, strategy, &found, hit);
ZeroAndLockBuffer(buffer, mode, found);
return buffer;
}
@@ -1244,7 +1250,8 @@ ReadBuffer_common(Relation rel, SMgrRelation smgr, char smgr_persistence,
if (StartReadBuffer(&operation,
&buffer,
blockNum,
- flags))
+ flags,
+ hit))
WaitReadBuffers(&operation);
return buffer;
@@ -1255,7 +1262,8 @@ StartReadBuffersImpl(ReadBuffersOperation *operation,
Buffer *buffers,
BlockNumber blockNum,
int *nblocks,
- int flags)
+ int flags,
+ bool *hit)
{
int actual_nblocks = *nblocks;
int io_buffers_len = 0;
@@ -1274,7 +1282,8 @@ StartReadBuffersImpl(ReadBuffersOperation *operation,
operation->forknum,
blockNum + i,
operation->strategy,
- &found);
+ &found,
+ hit);
if (found)
{
@@ -1365,9 +1374,10 @@ StartReadBuffers(ReadBuffersOperation *operation,
Buffer *buffers,
BlockNumber blockNum,
int *nblocks,
- int flags)
+ int flags,
+ bool *hit)
{
- return StartReadBuffersImpl(operation, buffers, blockNum, nblocks, flags);
+ return StartReadBuffersImpl(operation, buffers, blockNum, nblocks, flags, hit);
}
/*
@@ -1379,12 +1389,13 @@ bool
StartReadBuffer(ReadBuffersOperation *operation,
Buffer *buffer,
BlockNumber blocknum,
- int flags)
+ int flags,
+ bool *hit)
{
int nblocks = 1;
bool result;
- result = StartReadBuffersImpl(operation, buffer, blocknum, &nblocks, flags);
+ result = StartReadBuffersImpl(operation, buffer, blocknum, &nblocks, flags, hit);
Assert(nblocks == 1); /* single block can't be short */
return result;
@@ -2590,7 +2601,8 @@ MarkBufferDirty(Buffer buffer)
Buffer
ReleaseAndReadBuffer(Buffer buffer,
Relation relation,
- BlockNumber blockNum)
+ BlockNumber blockNum,
+ bool *hit)
{
ForkNumber forkNum = MAIN_FORKNUM;
BufferDesc *bufHdr;
@@ -2619,7 +2631,7 @@ ReleaseAndReadBuffer(Buffer buffer,
}
}
- return ReadBuffer(relation, blockNum);
+ return ReadBuffer(relation, blockNum, hit);
}
/*
diff --git a/src/backend/storage/freespace/freespace.c b/src/backend/storage/freespace/freespace.c
index 4773a9cc65..f9f6259ad4 100644
--- a/src/backend/storage/freespace/freespace.c
+++ b/src/backend/storage/freespace/freespace.c
@@ -593,7 +593,7 @@ fsm_readbuf(Relation rel, FSMAddress addr, bool extend)
return InvalidBuffer;
}
else
- buf = ReadBufferExtended(rel, FSM_FORKNUM, blkno, RBM_ZERO_ON_ERROR, NULL);
+ buf = ReadBufferExtended(rel, FSM_FORKNUM, blkno, RBM_ZERO_ON_ERROR, NULL, NULL);
/*
* Initializing the page when needed is trickier than it looks, because of
diff --git a/src/backend/utils/activity/pgstat_database.c b/src/backend/utils/activity/pgstat_database.c
index 05a8ccfdb7..bbfd75715d 100644
--- a/src/backend/utils/activity/pgstat_database.c
+++ b/src/backend/utils/activity/pgstat_database.c
@@ -407,6 +407,10 @@ pgstat_database_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
PGSTAT_ACCUM_DBCOUNT(xact_rollback);
PGSTAT_ACCUM_DBCOUNT(blocks_fetched);
PGSTAT_ACCUM_DBCOUNT(blocks_hit);
+ PGSTAT_ACCUM_DBCOUNT(metadata_blocks_fetched);
+ PGSTAT_ACCUM_DBCOUNT(metadata_blocks_hit);
+ PGSTAT_ACCUM_DBCOUNT(record_blocks_fetched);
+ PGSTAT_ACCUM_DBCOUNT(record_blocks_hit);
PGSTAT_ACCUM_DBCOUNT(tuples_returned);
PGSTAT_ACCUM_DBCOUNT(tuples_fetched);
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index d64595a165..c879c3b2a6 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -871,6 +871,10 @@ pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
tabentry->ins_since_vacuum += lstats->counts.tuples_inserted;
tabentry->blocks_fetched += lstats->counts.blocks_fetched;
tabentry->blocks_hit += lstats->counts.blocks_hit;
+ tabentry->metadata_blocks_fetched += lstats->counts.metadata_blocks_fetched;
+ tabentry->metadata_blocks_hit += lstats->counts.metadata_blocks_hit;
+ tabentry->record_blocks_fetched += lstats->counts.record_blocks_fetched;
+ tabentry->record_blocks_hit += lstats->counts.record_blocks_hit;
/* Clamp live_tuples in case of negative delta_live_tuples */
tabentry->live_tuples = Max(tabentry->live_tuples, 0);
@@ -888,6 +892,10 @@ pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
dbentry->tuples_deleted += lstats->counts.tuples_deleted;
dbentry->blocks_fetched += lstats->counts.blocks_fetched;
dbentry->blocks_hit += lstats->counts.blocks_hit;
+ dbentry->metadata_blocks_fetched += lstats->counts.metadata_blocks_fetched;
+ dbentry->metadata_blocks_hit += lstats->counts.metadata_blocks_hit;
+ dbentry->record_blocks_fetched += lstats->counts.record_blocks_fetched;
+ dbentry->record_blocks_hit += lstats->counts.record_blocks_hit;
return true;
}
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index e9096a8849..565a96773d 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -67,6 +67,18 @@ PG_STAT_GET_RELENTRY_INT64(blocks_fetched)
/* pg_stat_get_blocks_hit */
PG_STAT_GET_RELENTRY_INT64(blocks_hit)
+/* pg_stat_get_metadata_blocks_fetched */
+PG_STAT_GET_RELENTRY_INT64(metadata_blocks_fetched)
+
+/* pg_stat_get_metadata_blocks_hit */
+PG_STAT_GET_RELENTRY_INT64(metadata_blocks_hit)
+
+/* pg_stat_get_record_blocks_fetched */
+PG_STAT_GET_RELENTRY_INT64(record_blocks_fetched)
+
+/* pg_stat_get_record_blocks_hit */
+PG_STAT_GET_RELENTRY_INT64(record_blocks_hit)
+
/* pg_stat_get_dead_tuples */
PG_STAT_GET_RELENTRY_INT64(dead_tuples)
@@ -1031,6 +1043,18 @@ PG_STAT_GET_DBENTRY_INT64(blocks_fetched)
/* pg_stat_get_db_blocks_hit */
PG_STAT_GET_DBENTRY_INT64(blocks_hit)
+/* pg_stat_get_db_metadata_blocks_fetched */
+PG_STAT_GET_DBENTRY_INT64(metadata_blocks_fetched)
+
+/* pg_stat_get_db_metadata_blocks_hit */
+PG_STAT_GET_DBENTRY_INT64(metadata_blocks_hit)
+
+/* pg_stat_get_db_record_blocks_fetched */
+PG_STAT_GET_DBENTRY_INT64(record_blocks_fetched)
+
+/* pg_stat_get_db_record_blocks_hit */
+PG_STAT_GET_DBENTRY_INT64(record_blocks_hit)
+
/* pg_stat_get_db_conflict_bufferpin */
PG_STAT_GET_DBENTRY_INT64(conflict_bufferpin)
diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index 000c7289b8..9d7e2a6e45 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -1247,10 +1247,10 @@ extern int _bt_getrootheight(Relation rel);
extern void _bt_metaversion(Relation rel, bool *heapkeyspace,
bool *allequalimage);
extern void _bt_checkpage(Relation rel, Buffer buf);
-extern Buffer _bt_getbuf(Relation rel, BlockNumber blkno, int access);
+extern Buffer _bt_getbuf(Relation rel, BlockNumber blkno, int access, bool *hit);
extern Buffer _bt_allocbuf(Relation rel, Relation heaprel);
extern Buffer _bt_relandgetbuf(Relation rel, Buffer obuf,
- BlockNumber blkno, int access);
+ BlockNumber blkno, int access, bool *hit);
extern void _bt_relbuf(Relation rel, Buffer buf);
extern void _bt_lockbuf(Relation rel, Buffer buf, int access);
extern void _bt_unlockbuf(Relation rel, Buffer buf);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 9e803d610d..5902827510 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5517,6 +5517,22 @@
proname => 'pg_stat_get_blocks_hit', provolatile => 's', proparallel => 'r',
prorettype => 'int8', proargtypes => 'oid',
prosrc => 'pg_stat_get_blocks_hit' },
+ { oid => '8888', descr => 'statistics: number of record blocks fetched',
+ proname => 'pg_stat_get_record_blocks_fetched', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_record_blocks_fetched' },
+{ oid => '8889', descr => 'statistics: number of record blocks found in cache',
+ proname => 'pg_stat_get_record_blocks_hit', provolatile => 's', proparallel => 'r',
+ prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_record_blocks_hit' },
+{ oid => '8890', descr => 'statistics: number of metadata blocks fetched',
+ proname => 'pg_stat_get_metadata_blocks_fetched', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_metadata_blocks_fetched' },
+{ oid => '8891', descr => 'statistics: number of metadata blocks found in cache',
+ proname => 'pg_stat_get_metadata_blocks_hit', provolatile => 's', proparallel => 'r',
+ prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_metadata_blocks_hit' },
{ oid => '2781', descr => 'statistics: last manual vacuum time for a table',
proname => 'pg_stat_get_last_vacuum_time', provolatile => 's',
proparallel => 'r', prorettype => 'timestamptz', proargtypes => 'oid',
@@ -5717,6 +5733,22 @@
proname => 'pg_stat_get_db_blocks_hit', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
prosrc => 'pg_stat_get_db_blocks_hit' },
+{ oid => '8892', descr => 'statistics: number of db record blocks fetched',
+ proname => 'pg_stat_get_db_record_blocks_fetched', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_db_record_blocks_fetched' },
+{ oid => '8893', descr => 'statistics: blocks found in cache for database',
+ proname => 'pg_stat_get_db_record_blocks_hit', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_db_record_blocks_hit' },
+{ oid => '8894', descr => 'statistics: number of metadata blocks fetched',
+ proname => 'pg_stat_get_db_metadata_blocks_fetched', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_db_metadata_blocks_fetched' },
+{ oid => '8895', descr => 'statistics: number of metadata blocks found in cache',
+ proname => 'pg_stat_get_db_metadata_blocks_hit', provolatile => 's', proparallel => 'r',
+ prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_db_metadata_blocks_hit' },
{ oid => '2758', descr => 'statistics: tuples returned for database',
proname => 'pg_stat_get_db_tuples_returned', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 53f2a8458e..b846ef7529 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -154,6 +154,11 @@ typedef struct PgStat_TableCounts
PgStat_Counter blocks_fetched;
PgStat_Counter blocks_hit;
+
+ PgStat_Counter metadata_blocks_fetched;
+ PgStat_Counter metadata_blocks_hit;
+ PgStat_Counter record_blocks_fetched;
+ PgStat_Counter record_blocks_hit;
} PgStat_TableCounts;
/* ----------
@@ -364,6 +369,10 @@ typedef struct PgStat_StatDBEntry
PgStat_Counter xact_rollback;
PgStat_Counter blocks_fetched;
PgStat_Counter blocks_hit;
+ PgStat_Counter metadata_blocks_fetched;
+ PgStat_Counter metadata_blocks_hit;
+ PgStat_Counter record_blocks_fetched;
+ PgStat_Counter record_blocks_hit;
PgStat_Counter tuples_returned;
PgStat_Counter tuples_fetched;
PgStat_Counter tuples_inserted;
@@ -459,6 +468,11 @@ typedef struct PgStat_StatTabEntry
PgStat_Counter blocks_fetched;
PgStat_Counter blocks_hit;
+ PgStat_Counter metadata_blocks_fetched;
+ PgStat_Counter metadata_blocks_hit;
+ PgStat_Counter record_blocks_fetched;
+ PgStat_Counter record_blocks_hit;
+
TimestampTz last_vacuum_time; /* user initiated vacuum */
PgStat_Counter vacuum_count;
TimestampTz last_autovacuum_time; /* autovacuum initiated */
@@ -707,6 +721,37 @@ extern void pgstat_report_analyze(Relation rel,
if (pgstat_should_count_relation(rel)) \
(rel)->pgstat_info->counts.blocks_hit++; \
} while (0)
+#define pgstat_count_metadata_buffer(rel, hit) \
+ do { \
+ if (pgstat_should_count_relation(rel)) { \
+ (rel)->pgstat_info->counts.metadata_blocks_fetched++; \
+ if ((hit)) \
+ (rel)->pgstat_info->counts.metadata_blocks_hit++; \
+ } \
+ } while (0)
+#define pgstat_count_record_buffer(rel, hit) \
+ do { \
+ if (pgstat_should_count_relation(rel)) { \
+ (rel)->pgstat_info->counts.record_blocks_fetched++; \
+ if ((hit)) \
+ (rel)->pgstat_info->counts.record_blocks_hit++; \
+ } \
+ } while (0)
+#define pgstat_count_buffer(rel, metadata, hit) \
+ do { \
+ if (pgstat_should_count_relation(rel)) { \
+ if ((metadata)) { \
+ (rel)->pgstat_info->counts.metadata_blocks_fetched++;\
+ if ((hit)) \
+ (rel)->pgstat_info->counts.metadata_blocks_hit++;\
+ } \
+ else { \
+ (rel)->pgstat_info->counts.record_blocks_fetched++; \
+ if ((hit)) \
+ (rel)->pgstat_info->counts.record_blocks_hit++; \
+ } \
+ } \
+ } while (0)
extern void pgstat_count_heap_insert(Relation rel, PgStat_Counter n);
extern void pgstat_count_heap_update(Relation rel, bool hot, bool newpage);
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index 7c1e4316dd..8f287f0841 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -201,10 +201,10 @@ extern PrefetchBufferResult PrefetchBuffer(Relation reln, ForkNumber forkNum,
BlockNumber blockNum);
extern bool ReadRecentBuffer(RelFileLocator rlocator, ForkNumber forkNum,
BlockNumber blockNum, Buffer recent_buffer);
-extern Buffer ReadBuffer(Relation reln, BlockNumber blockNum);
+extern Buffer ReadBuffer(Relation reln, BlockNumber blockNum, bool *hit);
extern Buffer ReadBufferExtended(Relation reln, ForkNumber forkNum,
BlockNumber blockNum, ReadBufferMode mode,
- BufferAccessStrategy strategy);
+ BufferAccessStrategy strategy, bool *hit);
extern Buffer ReadBufferWithoutRelcache(RelFileLocator rlocator,
ForkNumber forkNum, BlockNumber blockNum,
ReadBufferMode mode, BufferAccessStrategy strategy,
@@ -213,12 +213,14 @@ extern Buffer ReadBufferWithoutRelcache(RelFileLocator rlocator,
extern bool StartReadBuffer(ReadBuffersOperation *operation,
Buffer *buffer,
BlockNumber blocknum,
- int flags);
+ int flags,
+ bool *hit);
extern bool StartReadBuffers(ReadBuffersOperation *operation,
Buffer *buffers,
BlockNumber blockNum,
int *nblocks,
- int flags);
+ int flags,
+ bool *hit);
extern void WaitReadBuffers(ReadBuffersOperation *operation);
extern void ReleaseBuffer(Buffer buffer);
@@ -229,7 +231,7 @@ extern void MarkBufferDirty(Buffer buffer);
extern void IncrBufferRefCount(Buffer buffer);
extern void CheckBufferIsPinnedOnce(Buffer buffer);
extern Buffer ReleaseAndReadBuffer(Buffer buffer, Relation relation,
- BlockNumber blockNum);
+ BlockNumber blockNum, bool *hit);
extern Buffer ExtendBufferedRel(BufferManagerRelation bmr,
ForkNumber forkNum,
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 5baba8d39f..5217cc74a9 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1847,6 +1847,10 @@ pg_stat_database| SELECT oid AS datid,
pg_stat_get_db_xact_rollback(oid) AS xact_rollback,
(pg_stat_get_db_blocks_fetched(oid) - pg_stat_get_db_blocks_hit(oid)) AS blks_read,
pg_stat_get_db_blocks_hit(oid) AS blks_hit,
+ (pg_stat_get_db_metadata_blocks_fetched(oid) - pg_stat_get_db_metadata_blocks_hit(oid)) AS metadata_blks_read,
+ pg_stat_get_db_metadata_blocks_hit(oid) AS metadata_blks_hit,
+ (pg_stat_get_db_record_blocks_fetched(oid) - pg_stat_get_db_record_blocks_hit(oid)) AS record_blks_read,
+ pg_stat_get_db_record_blocks_hit(oid) AS record_blks_hit,
pg_stat_get_db_tuples_returned(oid) AS tup_returned,
pg_stat_get_db_tuples_fetched(oid) AS tup_fetched,
pg_stat_get_db_tuples_inserted(oid) AS tup_inserted,
@@ -2342,7 +2346,11 @@ pg_statio_all_indexes| SELECT c.oid AS relid,
c.relname,
i.relname AS indexrelname,
(pg_stat_get_blocks_fetched(i.oid) - pg_stat_get_blocks_hit(i.oid)) AS idx_blks_read,
- pg_stat_get_blocks_hit(i.oid) AS idx_blks_hit
+ pg_stat_get_blocks_hit(i.oid) AS idx_blks_hit,
+ (pg_stat_get_metadata_blocks_fetched(i.oid) - pg_stat_get_metadata_blocks_hit(i.oid)) AS idx_metadata_blks_read,
+ pg_stat_get_metadata_blocks_hit(i.oid) AS idx_metadata_blks_hit,
+ (pg_stat_get_record_blocks_fetched(i.oid) - pg_stat_get_record_blocks_hit(i.oid)) AS idx_record_blks_read,
+ pg_stat_get_record_blocks_hit(i.oid) AS idx_record_blks_hit
FROM (((pg_class c
JOIN pg_index x ON ((c.oid = x.indrelid)))
JOIN pg_class i ON ((i.oid = x.indexrelid)))
@@ -2363,6 +2371,10 @@ pg_statio_all_tables| SELECT c.oid AS relid,
pg_stat_get_blocks_hit(c.oid) AS heap_blks_hit,
i.idx_blks_read,
i.idx_blks_hit,
+ i.idx_metadata_blks_read,
+ i.idx_metadata_blks_hit,
+ i.idx_record_blks_read,
+ i.idx_record_blks_hit,
(pg_stat_get_blocks_fetched(t.oid) - pg_stat_get_blocks_hit(t.oid)) AS toast_blks_read,
pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit,
x.idx_blks_read AS tidx_blks_read,
@@ -2371,7 +2383,11 @@ pg_statio_all_tables| SELECT c.oid AS relid,
LEFT JOIN pg_class t ON ((c.reltoastrelid = t.oid)))
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
LEFT JOIN LATERAL ( SELECT (sum((pg_stat_get_blocks_fetched(pg_index.indexrelid) - pg_stat_get_blocks_hit(pg_index.indexrelid))))::bigint AS idx_blks_read,
- (sum(pg_stat_get_blocks_hit(pg_index.indexrelid)))::bigint AS idx_blks_hit
+ (sum(pg_stat_get_blocks_hit(pg_index.indexrelid)))::bigint AS idx_blks_hit,
+ (sum((pg_stat_get_metadata_blocks_fetched(pg_index.indexrelid) - pg_stat_get_metadata_blocks_hit(pg_index.indexrelid))))::bigint AS idx_metadata_blks_read,
+ (sum(pg_stat_get_metadata_blocks_hit(pg_index.indexrelid)))::bigint AS idx_metadata_blks_hit,
+ (sum((pg_stat_get_record_blocks_fetched(pg_index.indexrelid) - pg_stat_get_record_blocks_hit(pg_index.indexrelid))))::bigint AS idx_record_blks_read,
+ (sum(pg_stat_get_record_blocks_hit(pg_index.indexrelid)))::bigint AS idx_record_blks_hit
FROM pg_index
WHERE (pg_index.indrelid = c.oid)) i ON (true))
LEFT JOIN LATERAL ( SELECT (sum((pg_stat_get_blocks_fetched(pg_index.indexrelid) - pg_stat_get_blocks_hit(pg_index.indexrelid))))::bigint AS idx_blks_read,
@@ -2385,7 +2401,11 @@ pg_statio_sys_indexes| SELECT relid,
relname,
indexrelname,
idx_blks_read,
- idx_blks_hit
+ idx_blks_hit,
+ idx_metadata_blks_read,
+ idx_metadata_blks_hit,
+ idx_record_blks_read,
+ idx_record_blks_hit
FROM pg_statio_all_indexes
WHERE ((schemaname = ANY (ARRAY['pg_catalog'::name, 'information_schema'::name])) OR (schemaname ~ '^pg_toast'::text));
pg_statio_sys_sequences| SELECT relid,
@@ -2402,6 +2422,10 @@ pg_statio_sys_tables| SELECT relid,
heap_blks_hit,
idx_blks_read,
idx_blks_hit,
+ idx_metadata_blks_read,
+ idx_metadata_blks_hit,
+ idx_record_blks_read,
+ idx_record_blks_hit,
toast_blks_read,
toast_blks_hit,
tidx_blks_read,
@@ -2414,7 +2438,11 @@ pg_statio_user_indexes| SELECT relid,
relname,
indexrelname,
idx_blks_read,
- idx_blks_hit
+ idx_blks_hit,
+ idx_metadata_blks_read,
+ idx_metadata_blks_hit,
+ idx_record_blks_read,
+ idx_record_blks_hit
FROM pg_statio_all_indexes
WHERE ((schemaname <> ALL (ARRAY['pg_catalog'::name, 'information_schema'::name])) AND (schemaname !~ '^pg_toast'::text));
pg_statio_user_sequences| SELECT relid,
@@ -2431,6 +2459,10 @@ pg_statio_user_tables| SELECT relid,
heap_blks_hit,
idx_blks_read,
idx_blks_hit,
+ idx_metadata_blks_read,
+ idx_metadata_blks_hit,
+ idx_record_blks_read,
+ idx_record_blks_hit,
toast_blks_read,
toast_blks_hit,
tidx_blks_read,
--
2.39.5 (Apple Git-154)
Hi,
Just attaching the complete patch which now covers all index types, docs
and tests.
You can run the following to see it in action:
create table test (id serial primary key);
insert into test select * from generate_series(0,30000);
select pg_stat_reset();
select * from test where id=3000;
select * from pg_statio_all_indexes where indexrelname = 'test_pkey';
This will show that there were 2 index blocks read from shared buffers
(hit): 1 metadata and one record.
Cheers,
Mircea
Show quoted text
On 28/02/2025 21:58, Mircea Cadariu wrote:
Hi,
For the purpose of writing a blog post I was checking the index stats
recorded for a workload, but found them rather confusing. Following
along the code with the debugger it eventually made sense, and I could
eventually understand what's counted. Looking around a bit, I
discovered an older discussion [1] in the mailing lists and learned
that the issue is known. The proposal in that thread is to start
counting separate metadata and record stats depending on what type of
index block is retrieved.I realized those would have helped me better understand the collected
index stats, so I started working on a patch to add these in the
system views. Attached is a WIP patch file with partial coverage of
the B-Tree index code. The implementation follows the existing stats
collection approach and the naming convention proposed in [1]. Let me
know if what I'm doing is feasible and if there's any concerns I could
address. Next steps would be to replace all places where I currently
pass in NULL with proper counting, as well as update tests and docs.Looking forward to your feedback! Thanks!
Cheers,
Mircea[1]:
/messages/by-id/CAH2-WzmdZqxCS1widYzjDAM+Z-Jz=ejJoaWXDVw9Qy1UsK0tLA@mail.gmail.com
Attachments:
0001-Add-separate-record-leaf-and-metadata-stats-for-inde.patchtext/plain; charset=UTF-8; name=0001-Add-separate-record-leaf-and-metadata-stats-for-inde.patchDownload
From 74273804d72b3b80e13023c6526bcdf87bc4889b Mon Sep 17 00:00:00 2001
From: Mircea Cadariu <cadariu.mircea@gmail.com>
Date: Fri, 11 Apr 2025 20:30:31 +0100
Subject: [PATCH] Add separate record (leaf) and metadata stats for index
buffers in the system views.
To achieve this, we pass on a boolean flag from the index code to the bufmgr.
We use this back in the index code and update the counters accordingly depending
on whether it's a metadata or record block it's just read.
---
contrib/amcheck/verify_gin.c | 6 +-
contrib/amcheck/verify_nbtree.c | 6 +-
contrib/bloom/blinsert.c | 6 +-
contrib/bloom/blscan.c | 2 +-
contrib/bloom/blutils.c | 6 +-
contrib/bloom/blvacuum.c | 6 +-
contrib/pageinspect/btreefuncs.c | 8 +-
contrib/pageinspect/rawpage.c | 2 +-
contrib/pg_surgery/heap_surgery.c | 2 +-
contrib/pg_visibility/pg_visibility.c | 2 +-
contrib/pgstattuple/pgstatapprox.c | 2 +-
contrib/pgstattuple/pgstatindex.c | 8 +-
contrib/pgstattuple/pgstattuple.c | 8 +-
doc/src/sgml/monitoring.sgml | 110 ++++++++++++++
src/backend/access/brin/brin.c | 6 +-
src/backend/access/brin/brin_pageops.c | 8 +-
src/backend/access/brin/brin_revmap.c | 24 +++-
src/backend/access/gin/ginbtree.c | 20 ++-
src/backend/access/gin/ginfast.c | 25 +++-
src/backend/access/gin/ginget.c | 12 +-
src/backend/access/gin/ginutil.c | 13 +-
src/backend/access/gin/ginvacuum.c | 22 +--
src/backend/access/gist/gist.c | 23 ++-
src/backend/access/gist/gistbuild.c | 23 ++-
src/backend/access/gist/gistget.c | 8 +-
src/backend/access/gist/gistutil.c | 5 +-
src/backend/access/gist/gistvacuum.c | 6 +-
src/backend/access/hash/hash.c | 5 +-
src/backend/access/hash/hashpage.c | 21 ++-
src/backend/access/heap/heapam.c | 16 +--
src/backend/access/heap/heapam_handler.c | 6 +-
src/backend/access/heap/hio.c | 8 +-
src/backend/access/heap/vacuumlazy.c | 2 +-
src/backend/access/heap/visibilitymap.c | 2 +-
src/backend/access/nbtree/nbtinsert.c | 35 +++--
src/backend/access/nbtree/nbtpage.c | 67 ++++++---
src/backend/access/nbtree/nbtree.c | 2 +-
src/backend/access/nbtree/nbtsearch.c | 34 +++--
src/backend/access/nbtree/nbtutils.c | 5 +-
src/backend/access/spgist/spgdoinsert.c | 8 +-
src/backend/access/spgist/spgscan.c | 8 +-
src/backend/access/spgist/spgutils.c | 17 ++-
src/backend/access/spgist/spgvacuum.c | 2 +-
src/backend/access/transam/xloginsert.c | 2 +-
src/backend/catalog/system_views.sql | 30 +++-
src/backend/commands/sequence.c | 2 +-
src/backend/storage/aio/read_stream.c | 6 +-
src/backend/storage/buffer/bufmgr.c | 54 ++++---
src/backend/storage/freespace/freespace.c | 2 +-
src/backend/utils/activity/pgstat_database.c | 4 +
src/backend/utils/activity/pgstat_relation.c | 8 ++
src/backend/utils/adt/pgstatfuncs.c | 24 ++++
src/include/access/nbtree.h | 4 +-
src/include/catalog/pg_proc.dat | 32 +++++
src/include/pgstat.h | 43 ++++++
src/include/storage/bufmgr.h | 12 +-
src/test/modules/test_aio/test_aio.c | 4 +-
src/test/regress/expected/rules.out | 40 +++++-
src/test/regress/expected/stats.out | 144 +++++++++++++++++++
src/test/regress/sql/stats.sql | 94 ++++++++++++
60 files changed, 910 insertions(+), 202 deletions(-)
diff --git a/contrib/amcheck/verify_gin.c b/contrib/amcheck/verify_gin.c
index 318fe33051..10d4908234 100644
--- a/contrib/amcheck/verify_gin.c
+++ b/contrib/amcheck/verify_gin.c
@@ -173,7 +173,7 @@ gin_check_posting_tree_parent_keys_consistency(Relation rel, BlockNumber posting
CHECK_FOR_INTERRUPTS();
buffer = ReadBufferExtended(rel, MAIN_FORKNUM, stack->blkno,
- RBM_NORMAL, strategy);
+ RBM_NORMAL, strategy, NULL);
LockBuffer(buffer, GIN_SHARE);
page = (Page) BufferGetPage(buffer);
@@ -439,7 +439,7 @@ gin_check_parent_keys_consistency(Relation rel,
CHECK_FOR_INTERRUPTS();
buffer = ReadBufferExtended(rel, MAIN_FORKNUM, stack->blkno,
- RBM_NORMAL, strategy);
+ RBM_NORMAL, strategy, NULL);
LockBuffer(buffer, GIN_SHARE);
page = (Page) BufferGetPage(buffer);
lsn = BufferGetLSNAtomic(buffer);
@@ -732,7 +732,7 @@ gin_refind_parent(Relation rel, BlockNumber parentblkno,
IndexTuple result = NULL;
parentbuf = ReadBufferExtended(rel, MAIN_FORKNUM, parentblkno, RBM_NORMAL,
- strategy);
+ strategy, NULL);
LockBuffer(parentbuf, GIN_SHARE);
parentpage = BufferGetPage(parentbuf);
diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index f11c43a0ed..5cf51513e2 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -1125,7 +1125,7 @@ bt_recheck_sibling_links(BtreeCheckState *state,
/* Couple locks in the usual order for nbtree: Left to right */
lbuf = ReadBufferExtended(state->rel, MAIN_FORKNUM, leftcurrent,
- RBM_NORMAL, state->checkstrategy);
+ RBM_NORMAL, state->checkstrategy, NULL);
LockBuffer(lbuf, BT_READ);
_bt_checkpage(state->rel, lbuf);
page = BufferGetPage(lbuf);
@@ -1149,7 +1149,7 @@ bt_recheck_sibling_links(BtreeCheckState *state,
{
newtargetbuf = ReadBufferExtended(state->rel, MAIN_FORKNUM,
newtargetblock, RBM_NORMAL,
- state->checkstrategy);
+ state->checkstrategy, NULL);
LockBuffer(newtargetbuf, BT_READ);
_bt_checkpage(state->rel, newtargetbuf);
page = BufferGetPage(newtargetbuf);
@@ -3315,7 +3315,7 @@ palloc_btree_page(BtreeCheckState *state, BlockNumber blocknum)
* longer than we must.
*/
buffer = ReadBufferExtended(state->rel, MAIN_FORKNUM, blocknum, RBM_NORMAL,
- state->checkstrategy);
+ state->checkstrategy, NULL);
LockBuffer(buffer, BT_READ);
/*
diff --git a/contrib/bloom/blinsert.c b/contrib/bloom/blinsert.c
index 7866438122..dd7c0b9b10 100644
--- a/contrib/bloom/blinsert.c
+++ b/contrib/bloom/blinsert.c
@@ -204,7 +204,7 @@ blinsert(Relation index, Datum *values, bool *isnull,
* At first, try to insert new tuple to the first page in notFullPage
* array. If successful, we don't need to modify the meta page.
*/
- metaBuffer = ReadBuffer(index, BLOOM_METAPAGE_BLKNO);
+ metaBuffer = ReadBuffer(index, BLOOM_METAPAGE_BLKNO, NULL);
LockBuffer(metaBuffer, BUFFER_LOCK_SHARE);
metaData = BloomPageGetMeta(BufferGetPage(metaBuffer));
@@ -216,7 +216,7 @@ blinsert(Relation index, Datum *values, bool *isnull,
/* Don't hold metabuffer lock while doing insert */
LockBuffer(metaBuffer, BUFFER_LOCK_UNLOCK);
- buffer = ReadBuffer(index, blkno);
+ buffer = ReadBuffer(index, blkno, NULL);
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
state = GenericXLogStart(index);
@@ -283,7 +283,7 @@ blinsert(Relation index, Datum *values, bool *isnull,
blkno = metaData->notFullPage[nStart];
Assert(blkno != InvalidBlockNumber);
- buffer = ReadBuffer(index, blkno);
+ buffer = ReadBuffer(index, blkno, NULL);
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
page = GenericXLogRegisterBuffer(state, buffer, 0);
diff --git a/contrib/bloom/blscan.c b/contrib/bloom/blscan.c
index d072f47fe2..17b920ffd2 100644
--- a/contrib/bloom/blscan.c
+++ b/contrib/bloom/blscan.c
@@ -125,7 +125,7 @@ blgetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
Page page;
buffer = ReadBufferExtended(scan->indexRelation, MAIN_FORKNUM,
- blkno, RBM_NORMAL, bas);
+ blkno, RBM_NORMAL, bas, NULL);
LockBuffer(buffer, BUFFER_LOCK_SHARE);
page = BufferGetPage(buffer);
diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c
index 2c0e71eedc..a17fa0b99a 100644
--- a/contrib/bloom/blutils.c
+++ b/contrib/bloom/blutils.c
@@ -188,7 +188,7 @@ initBloomState(BloomState *state, Relation index)
opts = MemoryContextAlloc(index->rd_indexcxt, sizeof(BloomOptions));
- buffer = ReadBuffer(index, BLOOM_METAPAGE_BLKNO);
+ buffer = ReadBuffer(index, BLOOM_METAPAGE_BLKNO, NULL);
LockBuffer(buffer, BUFFER_LOCK_SHARE);
page = BufferGetPage(buffer);
@@ -367,7 +367,7 @@ BloomNewBuffer(Relation index)
if (blkno == InvalidBlockNumber)
break;
- buffer = ReadBuffer(index, blkno);
+ buffer = ReadBuffer(index, blkno, NULL);
/*
* We have to guard against the possibility that someone else already
@@ -459,7 +459,7 @@ BloomInitMetapage(Relation index, ForkNumber forknum)
* block number 0 (BLOOM_METAPAGE_BLKNO). No need to hold the extension
* lock because there cannot be concurrent inserters yet.
*/
- metaBuffer = ReadBufferExtended(index, forknum, P_NEW, RBM_NORMAL, NULL);
+ metaBuffer = ReadBufferExtended(index, forknum, P_NEW, RBM_NORMAL, NULL, NULL);
LockBuffer(metaBuffer, BUFFER_LOCK_EXCLUSIVE);
Assert(BufferGetBlockNumber(metaBuffer) == BLOOM_METAPAGE_BLKNO);
diff --git a/contrib/bloom/blvacuum.c b/contrib/bloom/blvacuum.c
index 86b15a75f6..ebe769d375 100644
--- a/contrib/bloom/blvacuum.c
+++ b/contrib/bloom/blvacuum.c
@@ -60,7 +60,7 @@ blbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
vacuum_delay_point(false);
buffer = ReadBufferExtended(index, MAIN_FORKNUM, blkno,
- RBM_NORMAL, info->strategy);
+ RBM_NORMAL, info->strategy, NULL);
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
gxlogState = GenericXLogStart(index);
@@ -139,7 +139,7 @@ blbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
* info could already be out of date at this point, but blinsert() will
* cope if so.
*/
- buffer = ReadBuffer(index, BLOOM_METAPAGE_BLKNO);
+ buffer = ReadBuffer(index, BLOOM_METAPAGE_BLKNO, NULL);
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
gxlogState = GenericXLogStart(index);
@@ -190,7 +190,7 @@ blvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
vacuum_delay_point(false);
buffer = ReadBufferExtended(index, MAIN_FORKNUM, blkno,
- RBM_NORMAL, info->strategy);
+ RBM_NORMAL, info->strategy, NULL);
LockBuffer(buffer, BUFFER_LOCK_SHARE);
page = (Page) BufferGetPage(buffer);
diff --git a/contrib/pageinspect/btreefuncs.c b/contrib/pageinspect/btreefuncs.c
index 294821231f..f23a219098 100644
--- a/contrib/pageinspect/btreefuncs.c
+++ b/contrib/pageinspect/btreefuncs.c
@@ -280,7 +280,7 @@ bt_page_stats_internal(PG_FUNCTION_ARGS, enum pageinspect_version ext_version)
bt_index_block_validate(rel, blkno);
- buffer = ReadBuffer(rel, blkno);
+ buffer = ReadBuffer(rel, blkno, NULL);
LockBuffer(buffer, BUFFER_LOCK_SHARE);
/* keep compiler quiet */
@@ -420,7 +420,7 @@ bt_multi_page_stats(PG_FUNCTION_ARGS)
BTPageStat stat;
TupleDesc tupleDesc;
- buffer = ReadBuffer(rel, uargs->blkno);
+ buffer = ReadBuffer(rel, uargs->blkno, NULL);
LockBuffer(buffer, BUFFER_LOCK_SHARE);
/* keep compiler quiet */
@@ -649,7 +649,7 @@ bt_page_items_internal(PG_FUNCTION_ARGS, enum pageinspect_version ext_version)
bt_index_block_validate(rel, blkno);
- buffer = ReadBuffer(rel, blkno);
+ buffer = ReadBuffer(rel, blkno, NULL);
LockBuffer(buffer, BUFFER_LOCK_SHARE);
/*
@@ -873,7 +873,7 @@ bt_metap(PG_FUNCTION_ARGS)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot access temporary tables of other sessions")));
- buffer = ReadBuffer(rel, 0);
+ buffer = ReadBuffer(rel, 0, NULL);
LockBuffer(buffer, BUFFER_LOCK_SHARE);
page = BufferGetPage(buffer);
diff --git a/contrib/pageinspect/rawpage.c b/contrib/pageinspect/rawpage.c
index 0d57123aa2..5bfb5072b3 100644
--- a/contrib/pageinspect/rawpage.c
+++ b/contrib/pageinspect/rawpage.c
@@ -188,7 +188,7 @@ get_raw_page_internal(text *relname, ForkNumber forknum, BlockNumber blkno)
/* Take a verbatim copy of the page */
- buf = ReadBufferExtended(rel, forknum, blkno, RBM_NORMAL, NULL);
+ buf = ReadBufferExtended(rel, forknum, blkno, RBM_NORMAL, NULL, NULL);
LockBuffer(buf, BUFFER_LOCK_SHARE);
memcpy(raw_page_data, BufferGetPage(buf), BLCKSZ);
diff --git a/contrib/pg_surgery/heap_surgery.c b/contrib/pg_surgery/heap_surgery.c
index 3e86283beb..255f73460d 100644
--- a/contrib/pg_surgery/heap_surgery.c
+++ b/contrib/pg_surgery/heap_surgery.c
@@ -175,7 +175,7 @@ heap_force_common(FunctionCallInfo fcinfo, HeapTupleForceOption heap_force_opt)
continue;
}
- buf = ReadBuffer(rel, blkno);
+ buf = ReadBuffer(rel, blkno, NULL);
LockBufferForCleanup(buf);
page = BufferGetPage(buf);
diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index d79ef35006..45d827d3ba 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -154,7 +154,7 @@ pg_visibility(PG_FUNCTION_ARGS)
/* Here we have to explicitly check rel size ... */
if (blkno < RelationGetNumberOfBlocks(rel))
{
- buffer = ReadBuffer(rel, blkno);
+ buffer = ReadBuffer(rel, blkno, NULL);
LockBuffer(buffer, BUFFER_LOCK_SHARE);
page = BufferGetPage(buffer);
diff --git a/contrib/pgstattuple/pgstatapprox.c b/contrib/pgstattuple/pgstatapprox.c
index a59ff4e9d4..46efc49522 100644
--- a/contrib/pgstattuple/pgstatapprox.c
+++ b/contrib/pgstattuple/pgstatapprox.c
@@ -94,7 +94,7 @@ statapprox_heap(Relation rel, output_type *stat)
}
buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno,
- RBM_NORMAL, bstrategy);
+ RBM_NORMAL, bstrategy, NULL);
LockBuffer(buf, BUFFER_LOCK_SHARE);
diff --git a/contrib/pgstattuple/pgstatindex.c b/contrib/pgstattuple/pgstatindex.c
index 4b9d76ec4e..2f6b4bbe8c 100644
--- a/contrib/pgstattuple/pgstatindex.c
+++ b/contrib/pgstattuple/pgstatindex.c
@@ -250,7 +250,7 @@ pgstatindex_impl(Relation rel, FunctionCallInfo fcinfo)
* Read metapage
*/
{
- Buffer buffer = ReadBufferExtended(rel, MAIN_FORKNUM, 0, RBM_NORMAL, bstrategy);
+ Buffer buffer = ReadBufferExtended(rel, MAIN_FORKNUM, 0, RBM_NORMAL, bstrategy, NULL);
Page page = BufferGetPage(buffer);
BTMetaPageData *metad = BTPageGetMeta(page);
@@ -286,7 +286,7 @@ pgstatindex_impl(Relation rel, FunctionCallInfo fcinfo)
CHECK_FOR_INTERRUPTS();
/* Read and lock buffer */
- buffer = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL, bstrategy);
+ buffer = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL, bstrategy, NULL);
LockBuffer(buffer, BUFFER_LOCK_SHARE);
page = BufferGetPage(buffer);
@@ -542,7 +542,7 @@ pgstatginindex_internal(Oid relid, FunctionCallInfo fcinfo)
/*
* Read metapage
*/
- buffer = ReadBuffer(rel, GIN_METAPAGE_BLKNO);
+ buffer = ReadBuffer(rel, GIN_METAPAGE_BLKNO, NULL);
LockBuffer(buffer, GIN_SHARE);
page = BufferGetPage(buffer);
metadata = GinPageGetMeta(page);
@@ -645,7 +645,7 @@ pgstathashindex(PG_FUNCTION_ARGS)
CHECK_FOR_INTERRUPTS();
buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
- bstrategy);
+ bstrategy, NULL);
LockBuffer(buf, BUFFER_LOCK_SHARE);
page = (Page) BufferGetPage(buf);
diff --git a/contrib/pgstattuple/pgstattuple.c b/contrib/pgstattuple/pgstattuple.c
index 0d9c2b0b65..27645938c4 100644
--- a/contrib/pgstattuple/pgstattuple.c
+++ b/contrib/pgstattuple/pgstattuple.c
@@ -376,7 +376,7 @@ pgstat_heap(Relation rel, FunctionCallInfo fcinfo)
CHECK_FOR_INTERRUPTS();
buffer = ReadBufferExtended(rel, MAIN_FORKNUM, block,
- RBM_NORMAL, hscan->rs_strategy);
+ RBM_NORMAL, hscan->rs_strategy, NULL);
LockBuffer(buffer, BUFFER_LOCK_SHARE);
stat.free_space += PageGetExactFreeSpace((Page) BufferGetPage(buffer));
UnlockReleaseBuffer(buffer);
@@ -389,7 +389,7 @@ pgstat_heap(Relation rel, FunctionCallInfo fcinfo)
CHECK_FOR_INTERRUPTS();
buffer = ReadBufferExtended(rel, MAIN_FORKNUM, block,
- RBM_NORMAL, hscan->rs_strategy);
+ RBM_NORMAL, hscan->rs_strategy, NULL);
LockBuffer(buffer, BUFFER_LOCK_SHARE);
stat.free_space += PageGetExactFreeSpace((Page) BufferGetPage(buffer));
UnlockReleaseBuffer(buffer);
@@ -414,7 +414,7 @@ pgstat_btree_page(pgstattuple_type *stat, Relation rel, BlockNumber blkno,
Buffer buf;
Page page;
- buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL, bstrategy);
+ buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL, bstrategy, NULL);
LockBuffer(buf, BT_READ);
page = BufferGetPage(buf);
@@ -500,7 +500,7 @@ pgstat_gist_page(pgstattuple_type *stat, Relation rel, BlockNumber blkno,
Buffer buf;
Page page;
- buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL, bstrategy);
+ buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL, bstrategy, NULL);
LockBuffer(buf, GIST_SHARE);
gistcheckpage(rel, buf);
page = BufferGetPage(buf);
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index c421d89edf..500258b9d9 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3404,6 +3404,44 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>metadata_blks_read</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of metadata (non-leaf) index disk blocks read in this database
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>metadata_blks_hit</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of times metadata (non-leaf) index disk blocks were found already in the buffer
+ cache
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>record_blks_read</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of record (leaf) index disk blocks read in this database
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>record_blks_hit</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of times record (leaf) index disk blocks were found already in the buffer
+ cache
+ </para></entry>
+ </row>
+
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>blks_hit</structfield> <type>bigint</type>
@@ -4366,6 +4404,42 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>idx_metadata_blks_read</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of metadata (non-leaf) index disk blocks read from all indexes on this table
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>idx_metadata_blks_hit</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of metadata (non-leaf) index block hits in all indexes on this table
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>idx_record_blks_read</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of record (leaf) index disk blocks read from all indexes on this table
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>idx_record_blks_hit</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of record (leaf) index block hits in all indexes on this table
+ </para></entry>
+ </row>
+
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>idx_blks_hit</structfield> <type>bigint</type>
@@ -4502,6 +4576,42 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>idx_metadata_blks_read</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of metadata (non-leaf) index disk blocks read from this index
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>idx_metadata_blks_hit</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of metadata (non-leaf) index buffer hits in this index
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>idx_record_blks_read</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of record disk (leaf) index blocks read from this index
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>idx_record_blks_hit</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of record (leaf) index buffer hits in this index
+ </para></entry>
+ </row>
+
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>idx_blks_hit</structfield> <type>bigint</type>
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 01e1db7f85..7cf33abfad 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -1650,8 +1650,10 @@ brinGetStats(Relation index, BrinStatsData *stats)
Buffer metabuffer;
Page metapage;
BrinMetaPageData *metadata;
+ bool hit = false;
- metabuffer = ReadBuffer(index, BRIN_METAPAGE_BLKNO);
+ metabuffer = ReadBuffer(index, BRIN_METAPAGE_BLKNO, &hit);
+ pgstat_count_metadata_index_buffer(index, hit);
LockBuffer(metabuffer, BUFFER_LOCK_SHARE);
metapage = BufferGetPage(metabuffer);
metadata = (BrinMetaPageData *) PageGetContents(metapage);
@@ -2186,7 +2188,7 @@ brin_vacuum_scan(Relation idxrel, BufferAccessStrategy strategy)
CHECK_FOR_INTERRUPTS();
buf = ReadBufferExtended(idxrel, MAIN_FORKNUM, blkno,
- RBM_NORMAL, strategy);
+ RBM_NORMAL, strategy, NULL);
brin_page_cleanup(idxrel, buf);
diff --git a/src/backend/access/brin/brin_pageops.c b/src/backend/access/brin/brin_pageops.c
index 6d8dd1512d..392bd54b3f 100644
--- a/src/backend/access/brin/brin_pageops.c
+++ b/src/backend/access/brin/brin_pageops.c
@@ -15,6 +15,7 @@
#include "access/brin_revmap.h"
#include "access/brin_xlog.h"
#include "access/xloginsert.h"
+#include "pgstat.h"
#include "miscadmin.h"
#include "storage/bufmgr.h"
#include "storage/freespace.h"
@@ -694,6 +695,7 @@ brin_getinsertbuffer(Relation irel, Buffer oldbuf, Size itemsz,
BlockNumber newblk;
Page page;
Size freespace;
+ bool hit = false;
/* callers must have checked */
Assert(itemsz <= BrinMaxItemSize);
@@ -739,7 +741,8 @@ brin_getinsertbuffer(Relation irel, Buffer oldbuf, Size itemsz,
LockRelationForExtension(irel, ExclusiveLock);
extensionLockHeld = true;
}
- buf = ReadBuffer(irel, P_NEW);
+ buf = ReadBuffer(irel, P_NEW, &hit);
+ pgstat_count_record_index_buffer(irel, hit);
newblk = BufferGetBlockNumber(buf);
*extended = true;
@@ -756,7 +759,8 @@ brin_getinsertbuffer(Relation irel, Buffer oldbuf, Size itemsz,
}
else
{
- buf = ReadBuffer(irel, newblk);
+ buf = ReadBuffer(irel, newblk, &hit);
+ pgstat_count_record_index_buffer(irel, hit);
}
/*
diff --git a/src/backend/access/brin/brin_revmap.c b/src/backend/access/brin/brin_revmap.c
index 4e380ecc71..bb29738e2b 100644
--- a/src/backend/access/brin/brin_revmap.c
+++ b/src/backend/access/brin/brin_revmap.c
@@ -27,6 +27,7 @@
#include "access/brin_xlog.h"
#include "access/rmgr.h"
#include "access/xloginsert.h"
+#include "pgstat.h"
#include "miscadmin.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"
@@ -73,8 +74,10 @@ brinRevmapInitialize(Relation idxrel, BlockNumber *pagesPerRange)
Buffer meta;
BrinMetaPageData *metadata;
Page page;
+ bool hit = false;
- meta = ReadBuffer(idxrel, BRIN_METAPAGE_BLKNO);
+ meta = ReadBuffer(idxrel, BRIN_METAPAGE_BLKNO, &hit);
+ pgstat_count_metadata_index_buffer(idxrel, hit);
LockBuffer(meta, BUFFER_LOCK_SHARE);
page = BufferGetPage(meta);
metadata = (BrinMetaPageData *) PageGetContents(page);
@@ -203,6 +206,7 @@ brinGetTupleForHeapBlock(BrinRevmap *revmap, BlockNumber heapBlk,
ItemId lp;
BrinTuple *tup;
ItemPointerData previptr;
+ bool hit = false;
/* normalize the heap block number to be the first page in the range */
heapBlk = (heapBlk / revmap->rm_pagesPerRange) * revmap->rm_pagesPerRange;
@@ -231,7 +235,8 @@ brinGetTupleForHeapBlock(BrinRevmap *revmap, BlockNumber heapBlk,
ReleaseBuffer(revmap->rm_currBuf);
Assert(mapBlk != InvalidBlockNumber);
- revmap->rm_currBuf = ReadBuffer(revmap->rm_irel, mapBlk);
+ revmap->rm_currBuf = ReadBuffer(revmap->rm_irel, mapBlk, &hit);
+ pgstat_count_metadata_index_buffer(revmap->rm_irel, hit);
}
LockBuffer(revmap->rm_currBuf, BUFFER_LOCK_SHARE);
@@ -269,7 +274,8 @@ brinGetTupleForHeapBlock(BrinRevmap *revmap, BlockNumber heapBlk,
{
if (BufferIsValid(*buf))
ReleaseBuffer(*buf);
- *buf = ReadBuffer(idxRel, blk);
+ *buf = ReadBuffer(idxRel, blk, &hit);
+ pgstat_count_metadata_index_buffer(idxRel, hit);
}
LockBuffer(*buf, mode);
page = BufferGetPage(*buf);
@@ -335,6 +341,7 @@ brinRevmapDesummarizeRange(Relation idxrel, BlockNumber heapBlk)
OffsetNumber revmapOffset;
OffsetNumber regOffset;
ItemId lp;
+ bool hit = false;
revmap = brinRevmapInitialize(idxrel, &pagesPerRange);
@@ -363,7 +370,8 @@ brinRevmapDesummarizeRange(Relation idxrel, BlockNumber heapBlk)
return true;
}
- regBuf = ReadBuffer(idxrel, ItemPointerGetBlockNumber(iptr));
+ regBuf = ReadBuffer(idxrel, ItemPointerGetBlockNumber(iptr), &hit);
+ pgstat_count_record_index_buffer(idxrel, hit);
LockBuffer(regBuf, BUFFER_LOCK_EXCLUSIVE);
regPg = BufferGetPage(regBuf);
@@ -463,6 +471,7 @@ static Buffer
revmap_get_buffer(BrinRevmap *revmap, BlockNumber heapBlk)
{
BlockNumber mapBlk;
+ bool hit = false;
/* Translate the heap block number to physical index location. */
mapBlk = revmap_get_blkno(revmap, heapBlk);
@@ -485,7 +494,8 @@ revmap_get_buffer(BrinRevmap *revmap, BlockNumber heapBlk)
if (revmap->rm_currBuf != InvalidBuffer)
ReleaseBuffer(revmap->rm_currBuf);
- revmap->rm_currBuf = ReadBuffer(revmap->rm_irel, mapBlk);
+ revmap->rm_currBuf = ReadBuffer(revmap->rm_irel, mapBlk, &hit);
+ pgstat_count_metadata_index_buffer(revmap->rm_irel, hit);
}
return revmap->rm_currBuf;
@@ -528,6 +538,7 @@ revmap_physical_extend(BrinRevmap *revmap)
BlockNumber mapBlk;
BlockNumber nblocks;
Relation irel = revmap->rm_irel;
+ bool hit = false;
/*
* Lock the metapage. This locks out concurrent extensions of the revmap,
@@ -553,7 +564,8 @@ revmap_physical_extend(BrinRevmap *revmap)
nblocks = RelationGetNumberOfBlocks(irel);
if (mapBlk < nblocks)
{
- buf = ReadBuffer(irel, mapBlk);
+ buf = ReadBuffer(irel, mapBlk, &hit);
+ pgstat_count_metadata_index_buffer(irel, hit);
LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
page = BufferGetPage(buf);
}
diff --git a/src/backend/access/gin/ginbtree.c b/src/backend/access/gin/ginbtree.c
index 26a0bdc206..b35657ee8e 100644
--- a/src/backend/access/gin/ginbtree.c
+++ b/src/backend/access/gin/ginbtree.c
@@ -18,6 +18,7 @@
#include "access/ginxlog.h"
#include "access/xloginsert.h"
#include "miscadmin.h"
+#include "pgstat.h"
#include "storage/predicate.h"
#include "utils/injection_point.h"
#include "utils/memutils.h"
@@ -84,10 +85,12 @@ ginFindLeafPage(GinBtree btree, bool searchMode,
bool rootConflictCheck)
{
GinBtreeStack *stack;
+ bool hit = false;
stack = (GinBtreeStack *) palloc(sizeof(GinBtreeStack));
stack->blkno = btree->rootBlkno;
- stack->buffer = ReadBuffer(btree->index, btree->rootBlkno);
+ stack->buffer = ReadBuffer(btree->index, btree->rootBlkno, &hit);
+ pgstat_count_metadata_index_buffer(btree->index, hit);
stack->parent = NULL;
stack->predictNumber = 1;
@@ -148,7 +151,9 @@ ginFindLeafPage(GinBtree btree, bool searchMode,
{
/* in search mode we may forget path to leaf */
stack->blkno = child;
- stack->buffer = ReleaseAndReadBuffer(stack->buffer, btree->index, stack->blkno);
+ stack->buffer = ReleaseAndReadBuffer(stack->buffer, btree->index, stack->blkno,
+ &hit);
+ pgstat_count_index_buffer(btree->index, GinPageIsLeaf(BufferGetPage(stack->buffer)), hit);
}
else
{
@@ -157,7 +162,8 @@ ginFindLeafPage(GinBtree btree, bool searchMode,
ptr->parent = stack;
stack = ptr;
stack->blkno = child;
- stack->buffer = ReadBuffer(btree->index, stack->blkno);
+ stack->buffer = ReadBuffer(btree->index, stack->blkno, &hit);
+ pgstat_count_index_buffer(btree->index, GinPageIsLeaf(BufferGetPage(stack->buffer)), hit);
stack->predictNumber = 1;
}
}
@@ -177,12 +183,14 @@ Buffer
ginStepRight(Buffer buffer, Relation index, int lockmode)
{
Buffer nextbuffer;
+ bool hit = false;
Page page = BufferGetPage(buffer);
bool isLeaf = GinPageIsLeaf(page);
bool isData = GinPageIsData(page);
BlockNumber blkno = GinPageGetOpaque(page)->rightlink;
- nextbuffer = ReadBuffer(index, blkno);
+ nextbuffer = ReadBuffer(index, blkno, &hit);
+ pgstat_count_index_buffer(index, isLeaf, hit);
LockBuffer(nextbuffer, lockmode);
UnlockReleaseBuffer(buffer);
@@ -224,6 +232,7 @@ ginFindParents(GinBtree btree, GinBtreeStack *stack)
OffsetNumber offset;
GinBtreeStack *root;
GinBtreeStack *ptr;
+ bool hit = false;
/*
* Unwind the stack all the way up to the root, leaving only the root
@@ -314,7 +323,8 @@ ginFindParents(GinBtree btree, GinBtreeStack *stack)
/* Descend down to next level */
blkno = leftmostBlkno;
- buffer = ReadBuffer(btree->index, blkno);
+ buffer = ReadBuffer(btree->index, blkno, &hit);
+ pgstat_count_index_buffer(btree->index, GinPageIsLeaf(BufferGetPage(buffer)), hit);
}
}
diff --git a/src/backend/access/gin/ginfast.c b/src/backend/access/gin/ginfast.c
index a6d88572cc..bd10910222 100644
--- a/src/backend/access/gin/ginfast.c
+++ b/src/backend/access/gin/ginfast.c
@@ -25,6 +25,7 @@
#include "catalog/pg_am.h"
#include "commands/vacuum.h"
#include "miscadmin.h"
+#include "pgstat.h"
#include "port/pg_bitutils.h"
#include "postmaster/autovacuum.h"
#include "storage/indexfsm.h"
@@ -229,6 +230,7 @@ ginHeapTupleFastInsert(GinState *ginstate, GinTupleCollector *collector)
bool needCleanup = false;
int cleanupSize;
bool needWal;
+ bool hit = false;
if (collector->ntuples == 0)
return;
@@ -239,7 +241,8 @@ ginHeapTupleFastInsert(GinState *ginstate, GinTupleCollector *collector)
data.ntuples = 0;
data.newRightlink = data.prevTail = InvalidBlockNumber;
- metabuffer = ReadBuffer(index, GIN_METAPAGE_BLKNO);
+ metabuffer = ReadBuffer(index, GIN_METAPAGE_BLKNO, &hit);
+ pgstat_count_metadata_index_buffer(index, hit);
metapage = BufferGetPage(metabuffer);
/*
@@ -319,7 +322,8 @@ ginHeapTupleFastInsert(GinState *ginstate, GinTupleCollector *collector)
data.prevTail = metadata->tail;
data.newRightlink = sublist.head;
- buffer = ReadBuffer(index, metadata->tail);
+ buffer = ReadBuffer(index, metadata->tail, &hit);
+ pgstat_count_metadata_index_buffer(index, hit);
LockBuffer(buffer, GIN_EXCLUSIVE);
page = BufferGetPage(buffer);
@@ -358,7 +362,8 @@ ginHeapTupleFastInsert(GinState *ginstate, GinTupleCollector *collector)
CheckForSerializableConflictIn(index, NULL, GIN_METAPAGE_BLKNO);
- buffer = ReadBuffer(index, metadata->tail);
+ buffer = ReadBuffer(index, metadata->tail, &hit);
+ pgstat_count_metadata_index_buffer(index, hit);
LockBuffer(buffer, GIN_EXCLUSIVE);
page = BufferGetPage(buffer);
@@ -557,6 +562,7 @@ shiftList(Relation index, Buffer metabuffer, BlockNumber newHead,
Page metapage;
GinMetaPageData *metadata;
BlockNumber blknoToDelete;
+ bool hit = false;
metapage = BufferGetPage(metabuffer);
metadata = GinPageGetMeta(metapage);
@@ -575,7 +581,8 @@ shiftList(Relation index, Buffer metabuffer, BlockNumber newHead,
while (data.ndeleted < GIN_NDELETE_AT_ONCE && blknoToDelete != newHead)
{
freespace[data.ndeleted] = blknoToDelete;
- buffers[data.ndeleted] = ReadBuffer(index, blknoToDelete);
+ buffers[data.ndeleted] = ReadBuffer(index, blknoToDelete, &hit);
+ pgstat_count_metadata_index_buffer(index, hit);
LockBuffer(buffers[data.ndeleted], GIN_EXCLUSIVE);
page = BufferGetPage(buffers[data.ndeleted]);
@@ -796,6 +803,7 @@ ginInsertCleanup(GinState *ginstate, bool full_clean,
bool cleanupFinish = false;
bool fsm_vac = false;
int workMemory;
+ bool hit = false;
/*
* We would like to prevent concurrent cleanup process. For that we will
@@ -827,7 +835,8 @@ ginInsertCleanup(GinState *ginstate, bool full_clean,
workMemory = work_mem;
}
- metabuffer = ReadBuffer(index, GIN_METAPAGE_BLKNO);
+ metabuffer = ReadBuffer(index, GIN_METAPAGE_BLKNO, &hit);
+ pgstat_count_metadata_index_buffer(index, hit);
LockBuffer(metabuffer, GIN_SHARE);
metapage = BufferGetPage(metabuffer);
metadata = GinPageGetMeta(metapage);
@@ -850,7 +859,8 @@ ginInsertCleanup(GinState *ginstate, bool full_clean,
* Read and lock head of pending list
*/
blkno = metadata->head;
- buffer = ReadBuffer(index, blkno);
+ buffer = ReadBuffer(index, blkno, &hit);
+ pgstat_count_record_index_buffer(index, hit);
LockBuffer(buffer, GIN_SHARE);
page = BufferGetPage(buffer);
@@ -1003,7 +1013,8 @@ ginInsertCleanup(GinState *ginstate, bool full_clean,
* Read next page in pending list
*/
vacuum_delay_point(false);
- buffer = ReadBuffer(index, blkno);
+ buffer = ReadBuffer(index, blkno, &hit);
+ pgstat_count_record_index_buffer(index, hit);
LockBuffer(buffer, GIN_SHARE);
page = BufferGetPage(buffer);
}
diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c
index f29ccd3c2d..fce1162e6a 100644
--- a/src/backend/access/gin/ginget.c
+++ b/src/backend/access/gin/ginget.c
@@ -18,6 +18,7 @@
#include "access/relscan.h"
#include "common/pg_prng.h"
#include "miscadmin.h"
+#include "pgstat.h"
#include "storage/predicate.h"
#include "utils/datum.h"
#include "utils/memutils.h"
@@ -1467,6 +1468,7 @@ scanGetCandidate(IndexScanDesc scan, pendingPosition *pos)
OffsetNumber maxoff;
Page page;
IndexTuple itup;
+ bool hit = false;
ItemPointerSetInvalid(&pos->item);
for (;;)
@@ -1493,7 +1495,8 @@ scanGetCandidate(IndexScanDesc scan, pendingPosition *pos)
* current page. So, we lock next page before releasing the
* current one
*/
- Buffer tmpbuf = ReadBuffer(scan->indexRelation, blkno);
+ Buffer tmpbuf = ReadBuffer(scan->indexRelation, blkno, &hit);
+ pgstat_count_record_index_buffer(scan->indexRelation, hit);
LockBuffer(tmpbuf, GIN_SHARE);
UnlockReleaseBuffer(pos->pendingBuffer);
@@ -1840,10 +1843,12 @@ scanPendingInsert(IndexScanDesc scan, TIDBitmap *tbm, int64 *ntids)
match;
int i;
pendingPosition pos;
- Buffer metabuffer = ReadBuffer(scan->indexRelation, GIN_METAPAGE_BLKNO);
+ bool hit = false;
+ Buffer metabuffer = ReadBuffer(scan->indexRelation, GIN_METAPAGE_BLKNO, &hit);
Page page;
BlockNumber blkno;
+ pgstat_count_metadata_index_buffer(scan->indexRelation, hit);
*ntids = 0;
/*
@@ -1867,7 +1872,8 @@ scanPendingInsert(IndexScanDesc scan, TIDBitmap *tbm, int64 *ntids)
return;
}
- pos.pendingBuffer = ReadBuffer(scan->indexRelation, blkno);
+ pos.pendingBuffer = ReadBuffer(scan->indexRelation, blkno, &hit);
+ pgstat_count_record_index_buffer(scan->indexRelation, hit);
LockBuffer(pos.pendingBuffer, GIN_SHARE);
pos.firstOffset = FirstOffsetNumber;
UnlockReleaseBuffer(metabuffer);
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index 78f7b7a249..0d42f4ca87 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -22,6 +22,7 @@
#include "catalog/pg_type.h"
#include "commands/progress.h"
#include "commands/vacuum.h"
+#include "pgstat.h"
#include "miscadmin.h"
#include "storage/indexfsm.h"
#include "utils/builtins.h"
@@ -305,6 +306,7 @@ Buffer
GinNewBuffer(Relation index)
{
Buffer buffer;
+ bool hit = false;
/* First, try to get a page from FSM */
for (;;)
@@ -314,7 +316,8 @@ GinNewBuffer(Relation index)
if (blkno == InvalidBlockNumber)
break;
- buffer = ReadBuffer(index, blkno);
+ buffer = ReadBuffer(index, blkno, &hit);
+ pgstat_count_index_buffer(index, GinPageIsLeaf(BufferGetPage(buffer)), hit);
/*
* We have to guard against the possibility that someone else already
@@ -630,8 +633,10 @@ ginGetStats(Relation index, GinStatsData *stats)
Buffer metabuffer;
Page metapage;
GinMetaPageData *metadata;
+ bool hit = false;
- metabuffer = ReadBuffer(index, GIN_METAPAGE_BLKNO);
+ metabuffer = ReadBuffer(index, GIN_METAPAGE_BLKNO, &hit);
+ pgstat_count_metadata_index_buffer(index, hit);
LockBuffer(metabuffer, GIN_SHARE);
metapage = BufferGetPage(metabuffer);
metadata = GinPageGetMeta(metapage);
@@ -657,8 +662,10 @@ ginUpdateStats(Relation index, const GinStatsData *stats, bool is_build)
Buffer metabuffer;
Page metapage;
GinMetaPageData *metadata;
+ bool hit = false;
- metabuffer = ReadBuffer(index, GIN_METAPAGE_BLKNO);
+ metabuffer = ReadBuffer(index, GIN_METAPAGE_BLKNO, &hit);
+ pgstat_count_metadata_index_buffer(index, hit);
LockBuffer(metabuffer, GIN_EXCLUSIVE);
metapage = BufferGetPage(metabuffer);
metadata = GinPageGetMeta(metapage);
diff --git a/src/backend/access/gin/ginvacuum.c b/src/backend/access/gin/ginvacuum.c
index fbbe3a6dd7..250bffa49a 100644
--- a/src/backend/access/gin/ginvacuum.c
+++ b/src/backend/access/gin/ginvacuum.c
@@ -143,11 +143,11 @@ ginDeletePage(GinVacuumState *gvs, BlockNumber deleteBlkno, BlockNumber leftBlkn
* deletable, parent and left pages.
*/
lBuffer = ReadBufferExtended(gvs->index, MAIN_FORKNUM, leftBlkno,
- RBM_NORMAL, gvs->strategy);
+ RBM_NORMAL, gvs->strategy, NULL);
dBuffer = ReadBufferExtended(gvs->index, MAIN_FORKNUM, deleteBlkno,
- RBM_NORMAL, gvs->strategy);
+ RBM_NORMAL, gvs->strategy, NULL);
pBuffer = ReadBufferExtended(gvs->index, MAIN_FORKNUM, parentBlkno,
- RBM_NORMAL, gvs->strategy);
+ RBM_NORMAL, gvs->strategy, NULL);
page = BufferGetPage(dBuffer);
rightlink = GinPageGetOpaque(page)->rightlink;
@@ -270,7 +270,7 @@ ginScanToDelete(GinVacuumState *gvs, BlockNumber blkno, bool isRoot,
}
buffer = ReadBufferExtended(gvs->index, MAIN_FORKNUM, blkno,
- RBM_NORMAL, gvs->strategy);
+ RBM_NORMAL, gvs->strategy, NULL);
if (!isRoot)
LockBuffer(buffer, GIN_EXCLUSIVE);
@@ -355,7 +355,7 @@ ginVacuumPostingTreeLeaves(GinVacuumState *gvs, BlockNumber blkno)
PostingItem *pitem;
buffer = ReadBufferExtended(gvs->index, MAIN_FORKNUM, blkno,
- RBM_NORMAL, gvs->strategy);
+ RBM_NORMAL, gvs->strategy, NULL);
LockBuffer(buffer, GIN_SHARE);
page = BufferGetPage(buffer);
@@ -396,7 +396,7 @@ ginVacuumPostingTreeLeaves(GinVacuumState *gvs, BlockNumber blkno)
break;
buffer = ReadBufferExtended(gvs->index, MAIN_FORKNUM, blkno,
- RBM_NORMAL, gvs->strategy);
+ RBM_NORMAL, gvs->strategy, NULL);
LockBuffer(buffer, GIN_EXCLUSIVE);
page = BufferGetPage(buffer);
}
@@ -419,7 +419,7 @@ ginVacuumPostingTree(GinVacuumState *gvs, BlockNumber rootBlkno)
*tmp;
buffer = ReadBufferExtended(gvs->index, MAIN_FORKNUM, rootBlkno,
- RBM_NORMAL, gvs->strategy);
+ RBM_NORMAL, gvs->strategy, NULL);
/*
* Lock posting tree root for cleanup to ensure there are no
@@ -598,7 +598,7 @@ ginbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
gvs.result = stats;
buffer = ReadBufferExtended(index, MAIN_FORKNUM, blkno,
- RBM_NORMAL, info->strategy);
+ RBM_NORMAL, info->strategy, NULL);
/* find leaf page */
for (;;)
@@ -631,7 +631,7 @@ ginbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
UnlockReleaseBuffer(buffer);
buffer = ReadBufferExtended(index, MAIN_FORKNUM, blkno,
- RBM_NORMAL, info->strategy);
+ RBM_NORMAL, info->strategy, NULL);
}
/* right now we found leftmost page in entry's BTree */
@@ -674,7 +674,7 @@ ginbulkdelete(IndexVacuumInfo *info, IndexBulkDeleteResult *stats,
break;
buffer = ReadBufferExtended(index, MAIN_FORKNUM, blkno,
- RBM_NORMAL, info->strategy);
+ RBM_NORMAL, info->strategy, NULL);
LockBuffer(buffer, GIN_EXCLUSIVE);
}
@@ -751,7 +751,7 @@ ginvacuumcleanup(IndexVacuumInfo *info, IndexBulkDeleteResult *stats)
vacuum_delay_point(false);
buffer = ReadBufferExtended(index, MAIN_FORKNUM, blkno,
- RBM_NORMAL, info->strategy);
+ RBM_NORMAL, info->strategy, NULL);
LockBuffer(buffer, GIN_SHARE);
page = (Page) BufferGetPage(buffer);
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index 7b24380c97..77df96a280 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -20,6 +20,7 @@
#include "catalog/pg_collation.h"
#include "commands/vacuum.h"
#include "miscadmin.h"
+#include "pgstat.h"
#include "nodes/execnodes.h"
#include "storage/predicate.h"
#include "utils/fmgrprotos.h"
@@ -645,6 +646,7 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace,
GISTInsertStack *stack;
GISTInsertState state;
bool xlocked = false;
+ bool hit = false;
memset(&state, 0, sizeof(GISTInsertState));
state.freespace = freespace;
@@ -683,8 +685,10 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace,
state.stack = stack = stack->parent;
}
- if (XLogRecPtrIsInvalid(stack->lsn))
- stack->buffer = ReadBuffer(state.r, stack->blkno);
+ if (XLogRecPtrIsInvalid(stack->lsn)) {
+ stack->buffer = ReadBuffer(state.r, stack->blkno, &hit);
+ pgstat_count_index_buffer(state.r, !GistPageIsLeaf(BufferGetPage(stack->buffer)), hit);
+ }
/*
* Be optimistic and grab shared lock first. Swap it for an exclusive
@@ -923,6 +927,7 @@ gistFindPath(Relation r, BlockNumber child, OffsetNumber *downlinkoffnum)
GISTInsertStack *top,
*ptr;
BlockNumber blkno;
+ bool hit = false;
top = (GISTInsertStack *) palloc0(sizeof(GISTInsertStack));
top->blkno = GIST_ROOT_BLKNO;
@@ -935,10 +940,11 @@ gistFindPath(Relation r, BlockNumber child, OffsetNumber *downlinkoffnum)
top = linitial(fifo);
fifo = list_delete_first(fifo);
- buffer = ReadBuffer(r, top->blkno);
+ buffer = ReadBuffer(r, top->blkno, &hit);
LockBuffer(buffer, GIST_SHARE);
gistcheckpage(r, buffer);
page = (Page) BufferGetPage(buffer);
+ pgstat_count_index_buffer(r, !GistPageIsLeaf(page), hit);
if (GistPageIsLeaf(page))
{
@@ -1031,6 +1037,7 @@ gistFindCorrectParent(Relation r, GISTInsertStack *child, bool is_build)
IndexTuple idxtuple;
OffsetNumber maxoff;
GISTInsertStack *ptr;
+ bool hit = false;
gistcheckpage(r, parent->buffer);
parent->page = (Page) BufferGetPage(parent->buffer);
@@ -1095,10 +1102,11 @@ gistFindCorrectParent(Relation r, GISTInsertStack *child, bool is_build)
*/
break;
}
- parent->buffer = ReadBuffer(r, parent->blkno);
+ parent->buffer = ReadBuffer(r, parent->blkno, &hit);
LockBuffer(parent->buffer, GIST_EXCLUSIVE);
gistcheckpage(r, parent->buffer);
parent->page = (Page) BufferGetPage(parent->buffer);
+ pgstat_count_index_buffer(r, !GistPageIsLeaf(parent->page), hit);
}
/*
@@ -1120,8 +1128,9 @@ gistFindCorrectParent(Relation r, GISTInsertStack *child, bool is_build)
/* note we don't lock them or gistcheckpage them here! */
while (ptr)
{
- ptr->buffer = ReadBuffer(r, ptr->blkno);
+ ptr->buffer = ReadBuffer(r, ptr->blkno, &hit);
ptr->page = (Page) BufferGetPage(ptr->buffer);
+ pgstat_count_index_buffer(r, !GistPageIsLeaf(ptr->page), hit);
ptr = ptr->parent;
}
@@ -1203,6 +1212,7 @@ gistfixsplit(GISTInsertState *state, GISTSTATE *giststate)
Buffer buf;
Page page;
List *splitinfo = NIL;
+ bool hit = false;
ereport(LOG,
(errmsg("fixing incomplete split in index \"%s\", block %u",
@@ -1235,7 +1245,8 @@ gistfixsplit(GISTInsertState *state, GISTSTATE *giststate)
if (GistFollowRight(page))
{
/* lock next page */
- buf = ReadBuffer(state->r, GistPageGetOpaque(page)->rightlink);
+ buf = ReadBuffer(state->r, GistPageGetOpaque(page)->rightlink, &hit);
+ pgstat_count_index_buffer(state->r, !GistPageIsLeaf(BufferGetPage(buf)), hit);
LockBuffer(buf, GIST_EXCLUSIVE);
}
else
diff --git a/src/backend/access/gist/gistbuild.c b/src/backend/access/gist/gistbuild.c
index 9e707167d9..8c7fc6e7d5 100644
--- a/src/backend/access/gist/gistbuild.c
+++ b/src/backend/access/gist/gistbuild.c
@@ -41,6 +41,7 @@
#include "miscadmin.h"
#include "nodes/execnodes.h"
#include "optimizer/optimizer.h"
+#include "pgstat.h"
#include "storage/bufmgr.h"
#include "storage/bulk_write.h"
@@ -935,6 +936,7 @@ gistProcessItup(GISTBuildState *buildstate, IndexTuple itup,
int level;
OffsetNumber downlinkoffnum = InvalidOffsetNumber;
BlockNumber parentblkno = InvalidBlockNumber;
+ bool hit = false;
CHECK_FOR_INTERRUPTS();
@@ -966,10 +968,12 @@ gistProcessItup(GISTBuildState *buildstate, IndexTuple itup,
* descend down to.
*/
- buffer = ReadBuffer(indexrel, blkno);
+ buffer = ReadBuffer(indexrel, blkno, &hit);
LockBuffer(buffer, GIST_EXCLUSIVE);
page = (Page) BufferGetPage(buffer);
+ pgstat_count_index_buffer(indexrel, !GistPageIsLeaf(page), hit);
+
childoffnum = gistchoose(indexrel, page, itup, giststate);
iid = PageGetItemId(page, childoffnum);
idxtuple = (IndexTuple) PageGetItem(page, iid);
@@ -1029,7 +1033,8 @@ gistProcessItup(GISTBuildState *buildstate, IndexTuple itup,
* We've reached a leaf page. Place the tuple here.
*/
Assert(level == 0);
- buffer = ReadBuffer(indexrel, blkno);
+ buffer = ReadBuffer(indexrel, blkno, &hit);
+ pgstat_count_record_index_buffer(indexrel, hit);
LockBuffer(buffer, GIST_EXCLUSIVE);
gistbufferinginserttuples(buildstate, buffer, level,
&itup, 1, InvalidOffsetNumber,
@@ -1061,6 +1066,7 @@ gistbufferinginserttuples(GISTBuildState *buildstate, Buffer buffer, int level,
List *splitinfo;
bool is_split;
BlockNumber placed_to_blk = InvalidBlockNumber;
+ bool hit = false;
is_split = gistplacetopage(buildstate->indexrel,
buildstate->freespace,
@@ -1102,7 +1108,8 @@ gistbufferinginserttuples(GISTBuildState *buildstate, Buffer buffer, int level,
ItemId iid = PageGetItemId(page, off);
IndexTuple idxtuple = (IndexTuple) PageGetItem(page, iid);
BlockNumber childblkno = ItemPointerGetBlockNumber(&(idxtuple->t_tid));
- Buffer childbuf = ReadBuffer(buildstate->indexrel, childblkno);
+ Buffer childbuf = ReadBuffer(buildstate->indexrel, childblkno, &hit);
+ pgstat_count_record_index_buffer(buildstate->indexrel, hit);
LockBuffer(childbuf, GIST_SHARE);
gistMemorizeAllDownlinks(buildstate, childbuf);
@@ -1232,6 +1239,7 @@ gistBufferingFindCorrectParent(GISTBuildState *buildstate,
Page page;
OffsetNumber maxoff;
OffsetNumber off;
+ bool hit = false;
if (level > 0)
parent = gistGetParent(buildstate, childblkno);
@@ -1246,8 +1254,9 @@ gistBufferingFindCorrectParent(GISTBuildState *buildstate,
parent = *parentblkno;
}
- buffer = ReadBuffer(buildstate->indexrel, parent);
+ buffer = ReadBuffer(buildstate->indexrel, parent, &hit);
page = BufferGetPage(buffer);
+ pgstat_count_index_buffer(buildstate->indexrel, !GistPageIsLeaf(page), hit);
LockBuffer(buffer, GIST_EXCLUSIVE);
gistcheckpage(buildstate->indexrel, buffer);
maxoff = PageGetMaxOffsetNumber(page);
@@ -1440,8 +1449,9 @@ gistGetMaxLevel(Relation index)
Buffer buffer;
Page page;
IndexTuple itup;
+ bool hit = false;
- buffer = ReadBuffer(index, blkno);
+ buffer = ReadBuffer(index, blkno, &hit);
/*
* There's no concurrent access during index build, so locking is just
@@ -1454,9 +1464,12 @@ gistGetMaxLevel(Relation index)
{
/* We hit the bottom, so we're done. */
UnlockReleaseBuffer(buffer);
+ pgstat_count_record_index_buffer(index, hit);
break;
}
+ pgstat_count_metadata_index_buffer(index, hit);
+
/*
* Pick the first downlink on the page, and follow it. It doesn't
* matter which downlink we choose, the tree has the same depth
diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c
index 387d997234..5f6803e621 100644
--- a/src/backend/access/gist/gistget.c
+++ b/src/backend/access/gist/gistget.c
@@ -44,18 +44,20 @@ gistkillitems(IndexScanDesc scan)
ItemId iid;
int i;
bool killedsomething = false;
+ bool hit = false;
Assert(so->curBlkno != InvalidBlockNumber);
Assert(!XLogRecPtrIsInvalid(so->curPageLSN));
Assert(so->killedItems != NULL);
- buffer = ReadBuffer(scan->indexRelation, so->curBlkno);
+ buffer = ReadBuffer(scan->indexRelation, so->curBlkno, &hit);
if (!BufferIsValid(buffer))
return;
LockBuffer(buffer, GIST_SHARE);
gistcheckpage(scan->indexRelation, buffer);
page = BufferGetPage(buffer);
+ pgstat_count_index_buffer(scan->indexRelation, !GistPageIsLeaf(page), hit);
/*
* If page LSN differs it means that the page was modified since the last
@@ -337,14 +339,16 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem,
OffsetNumber maxoff;
OffsetNumber i;
MemoryContext oldcxt;
+ bool hit = false;
Assert(!GISTSearchItemIsHeap(*pageItem));
- buffer = ReadBuffer(scan->indexRelation, pageItem->blkno);
+ buffer = ReadBuffer(scan->indexRelation, pageItem->blkno, &hit);
LockBuffer(buffer, GIST_SHARE);
PredicateLockPage(r, BufferGetBlockNumber(buffer), scan->xs_snapshot);
gistcheckpage(scan->indexRelation, buffer);
page = BufferGetPage(buffer);
+ pgstat_count_index_buffer(scan->indexRelation, !GistPageIsLeaf(page), hit);
opaque = GistPageGetOpaque(page);
/*
diff --git a/src/backend/access/gist/gistutil.c b/src/backend/access/gist/gistutil.c
index a6b701943d..5bbc29a0ee 100644
--- a/src/backend/access/gist/gistutil.c
+++ b/src/backend/access/gist/gistutil.c
@@ -19,6 +19,7 @@
#include "access/htup_details.h"
#include "access/reloptions.h"
#include "common/pg_prng.h"
+#include "pgstat.h"
#include "storage/indexfsm.h"
#include "utils/float.h"
#include "utils/fmgrprotos.h"
@@ -824,6 +825,7 @@ Buffer
gistNewBuffer(Relation r, Relation heaprel)
{
Buffer buffer;
+ bool hit = false;
/* First, try to get a page from FSM */
for (;;)
@@ -833,7 +835,7 @@ gistNewBuffer(Relation r, Relation heaprel)
if (blkno == InvalidBlockNumber)
break; /* nothing left in FSM */
- buffer = ReadBuffer(r, blkno);
+ buffer = ReadBuffer(r, blkno, &hit);
/*
* We have to guard against the possibility that someone else already
@@ -842,6 +844,7 @@ gistNewBuffer(Relation r, Relation heaprel)
if (ConditionalLockBuffer(buffer))
{
Page page = BufferGetPage(buffer);
+ pgstat_count_index_buffer(r, !GistPageIsLeaf(page), hit);
/*
* If the page was never initialized, it's OK to use.
diff --git a/src/backend/access/gist/gistvacuum.c b/src/backend/access/gist/gistvacuum.c
index 6a359c98c6..9cfcbcc3f3 100644
--- a/src/backend/access/gist/gistvacuum.c
+++ b/src/backend/access/gist/gistvacuum.c
@@ -491,7 +491,7 @@ restart:
vacuum_delay_point(false);
buffer = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
- info->strategy);
+ info->strategy, NULL);
goto restart;
}
}
@@ -524,7 +524,7 @@ gistvacuum_delete_empty_pages(IndexVacuumInfo *info, GistVacState *vstate)
int deleted;
buffer = ReadBufferExtended(rel, MAIN_FORKNUM, (BlockNumber) blkno,
- RBM_NORMAL, info->strategy);
+ RBM_NORMAL, info->strategy, NULL);
LockBuffer(buffer, GIST_SHARE);
page = (Page) BufferGetPage(buffer);
@@ -590,7 +590,7 @@ gistvacuum_delete_empty_pages(IndexVacuumInfo *info, GistVacState *vstate)
break;
leafbuf = ReadBufferExtended(rel, MAIN_FORKNUM, leafs_to_delete[i],
- RBM_NORMAL, info->strategy);
+ RBM_NORMAL, info->strategy, NULL);
LockBuffer(leafbuf, GIST_EXCLUSIVE);
gistcheckpage(rel, leafbuf);
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 53061c819f..d991b5d5f7 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -505,6 +505,7 @@ loop_top:
HashPageOpaque bucket_opaque;
Page page;
bool split_cleanup = false;
+ bool hit = false;
/* Get address of bucket's start page */
bucket_blkno = BUCKET_TO_BLKNO(cachedmetap, cur_bucket);
@@ -515,7 +516,9 @@ loop_top:
* We need to acquire a cleanup lock on the primary bucket page to out
* wait concurrent scans before deleting the dead tuples.
*/
- buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL, info->strategy);
+ buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL, info->strategy,
+ &hit);
+ pgstat_count_record_index_buffer(rel, hit);
LockBufferForCleanup(buf);
_hash_checkpage(rel, buf, LH_BUCKET_PAGE);
diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index b8e5bd005e..173b406e0c 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -32,6 +32,7 @@
#include "access/hash_xlog.h"
#include "access/xloginsert.h"
#include "miscadmin.h"
+#include "pgstat.h"
#include "port/pg_bitutils.h"
#include "storage/predicate.h"
#include "storage/smgr.h"
@@ -70,11 +71,13 @@ Buffer
_hash_getbuf(Relation rel, BlockNumber blkno, int access, int flags)
{
Buffer buf;
+ bool hit = false;
if (blkno == P_NEW)
elog(ERROR, "hash AM does not use P_NEW");
- buf = ReadBuffer(rel, blkno);
+ buf = ReadBuffer(rel, blkno, &hit);
+ pgstat_count_index_buffer(rel, flags == LH_META_PAGE, hit);
if (access != HASH_NOLOCK)
LockBuffer(buf, access);
@@ -96,11 +99,13 @@ Buffer
_hash_getbuf_with_condlock_cleanup(Relation rel, BlockNumber blkno, int flags)
{
Buffer buf;
+ bool hit = false;
if (blkno == P_NEW)
elog(ERROR, "hash AM does not use P_NEW");
- buf = ReadBuffer(rel, blkno);
+ buf = ReadBuffer(rel, blkno, &hit);
+ pgstat_count_index_buffer(rel, flags == LH_META_PAGE, hit);
if (!ConditionalLockBufferForCleanup(buf))
{
@@ -135,12 +140,14 @@ Buffer
_hash_getinitbuf(Relation rel, BlockNumber blkno)
{
Buffer buf;
+ bool hit = false;
if (blkno == P_NEW)
elog(ERROR, "hash AM does not use P_NEW");
buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_ZERO_AND_LOCK,
- NULL);
+ NULL, &hit);
+ pgstat_count_record_index_buffer(rel, hit);
/* ref count and lock type are correct */
@@ -199,6 +206,7 @@ _hash_getnewbuf(Relation rel, BlockNumber blkno, ForkNumber forkNum)
{
BlockNumber nblocks = RelationGetNumberOfBlocksInFork(rel, forkNum);
Buffer buf;
+ bool hit = false;
if (blkno == P_NEW)
elog(ERROR, "hash AM does not use P_NEW");
@@ -218,7 +226,8 @@ _hash_getnewbuf(Relation rel, BlockNumber blkno, ForkNumber forkNum)
else
{
buf = ReadBufferExtended(rel, forkNum, blkno, RBM_ZERO_AND_LOCK,
- NULL);
+ NULL, &hit);
+ pgstat_count_record_index_buffer(rel, hit);
}
/* ref count and lock type are correct */
@@ -241,11 +250,13 @@ _hash_getbuf_with_strategy(Relation rel, BlockNumber blkno,
BufferAccessStrategy bstrategy)
{
Buffer buf;
+ bool hit = false;
if (blkno == P_NEW)
elog(ERROR, "hash AM does not use P_NEW");
- buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL, bstrategy);
+ buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL, bstrategy, &hit);
+ pgstat_count_index_buffer(rel, flags == LH_META_PAGE, hit);
if (access != HASH_NOLOCK)
LockBuffer(buf, access);
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ed2e302179..162620c029 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1586,7 +1586,7 @@ heap_fetch(Relation relation,
/*
* Fetch and pin the appropriate page of the relation.
*/
- buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(tid));
+ buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(tid), NULL);
/*
* Need share lock on buffer to examine tuple commit status.
@@ -1880,7 +1880,7 @@ heap_get_latest_tid(TableScanDesc sscan,
/*
* Read, pin, and lock the page.
*/
- buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(&ctid));
+ buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(&ctid), NULL);
LockBuffer(buffer, BUFFER_LOCK_SHARE);
page = BufferGetPage(buffer);
@@ -2776,7 +2776,7 @@ heap_delete(Relation relation, ItemPointer tid,
errmsg("cannot delete tuples during a parallel operation")));
block = ItemPointerGetBlockNumber(tid);
- buffer = ReadBuffer(relation, block);
+ buffer = ReadBuffer(relation, block, NULL);
page = BufferGetPage(buffer);
/*
@@ -3305,7 +3305,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
block = ItemPointerGetBlockNumber(otid);
INJECTION_POINT("heap_update-before-pin");
- buffer = ReadBuffer(relation, block);
+ buffer = ReadBuffer(relation, block, NULL);
page = BufferGetPage(buffer);
/*
@@ -4561,7 +4561,7 @@ heap_lock_tuple(Relation relation, HeapTuple tuple,
bool have_tuple_lock = false;
bool cleared_all_frozen = false;
- *buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(tid));
+ *buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(tid), NULL);
block = ItemPointerGetBlockNumber(tid);
/*
@@ -6057,7 +6057,7 @@ heap_finish_speculative(Relation relation, ItemPointer tid)
ItemId lp = NULL;
HeapTupleHeader htup;
- buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(tid));
+ buffer = ReadBuffer(relation, ItemPointerGetBlockNumber(tid), NULL);
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
page = (Page) BufferGetPage(buffer);
@@ -6148,7 +6148,7 @@ heap_abort_speculative(Relation relation, ItemPointer tid)
Assert(ItemPointerIsValid(tid));
block = ItemPointerGetBlockNumber(tid);
- buffer = ReadBuffer(relation, block);
+ buffer = ReadBuffer(relation, block, NULL);
page = BufferGetPage(buffer);
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
@@ -8231,7 +8231,7 @@ heap_index_delete_tuples(Relation rel, TM_IndexDeleteOp *delstate)
UnlockReleaseBuffer(buf);
blkno = ItemPointerGetBlockNumber(htid);
- buf = ReadBuffer(rel, blkno);
+ buf = ReadBuffer(rel, blkno, NULL);
nblocksaccessed++;
Assert(!delstate->bottomup ||
nblocksaccessed <= BOTTOMUP_MAX_NBLOCKS);
diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
index ac082fefa7..119528481c 100644
--- a/src/backend/access/heap/heapam_handler.c
+++ b/src/backend/access/heap/heapam_handler.c
@@ -132,7 +132,8 @@ heapam_index_fetch_tuple(struct IndexFetchTableData *scan,
hscan->xs_cbuf = ReleaseAndReadBuffer(hscan->xs_cbuf,
hscan->xs_base.rel,
- ItemPointerGetBlockNumber(tid));
+ ItemPointerGetBlockNumber(tid),
+ NULL);
/*
* Prune page, but only if we weren't already on this page
@@ -2250,7 +2251,8 @@ heapam_scan_sample_next_block(TableScanDesc scan, SampleScanState *scanstate)
/* Read page using selected strategy */
hscan->rs_cbuf = ReadBufferExtended(hscan->rs_base.rs_rd, MAIN_FORKNUM,
- blockno, RBM_NORMAL, hscan->rs_strategy);
+ blockno, RBM_NORMAL, hscan->rs_strategy,
+ NULL);
/* in pagemode, prune the page and determine visible tuple offsets */
if (hscan->rs_base.rs_flags & SO_ALLOW_PAGEMODE)
diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index c482c9d61b..7d5afcc6bc 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -93,7 +93,7 @@ ReadBufferBI(Relation relation, BlockNumber targetBlock,
/* If not bulk-insert, exactly like ReadBuffer */
if (!bistate)
return ReadBufferExtended(relation, MAIN_FORKNUM, targetBlock,
- mode, NULL);
+ mode, NULL, NULL);
/* If we have the desired block already pinned, re-pin and return it */
if (bistate->current_buf != InvalidBuffer)
@@ -117,7 +117,7 @@ ReadBufferBI(Relation relation, BlockNumber targetBlock,
/* Perform a read using the buffer strategy */
buffer = ReadBufferExtended(relation, MAIN_FORKNUM, targetBlock,
- mode, bistate->strategy);
+ mode, bistate->strategy, NULL);
/* Save the selected block as target for future inserts */
IncrBufferRefCount(buffer);
@@ -640,7 +640,7 @@ loop:
else if (otherBlock < targetBlock)
{
/* lock other buffer first */
- buffer = ReadBuffer(relation, targetBlock);
+ buffer = ReadBuffer(relation, targetBlock, NULL);
if (PageIsAllVisible(BufferGetPage(buffer)))
visibilitymap_pin(relation, targetBlock, vmbuffer);
LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE);
@@ -649,7 +649,7 @@ loop:
else
{
/* lock target buffer first */
- buffer = ReadBuffer(relation, targetBlock);
+ buffer = ReadBuffer(relation, targetBlock, NULL);
if (PageIsAllVisible(BufferGetPage(buffer)))
visibilitymap_pin(relation, targetBlock, vmbuffer);
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f28326bad0..e935192466 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -3422,7 +3422,7 @@ count_nondeletable_pages(LVRelState *vacrel, bool *lock_waiter_detected)
}
buf = ReadBufferExtended(vacrel->rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
- vacrel->bstrategy);
+ vacrel->bstrategy, NULL);
/* In this phase we only need shared access to the buffer */
LockBuffer(buf, BUFFER_LOCK_SHARE);
diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
index 745a04ef26..a31f3098de 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -582,7 +582,7 @@ vm_readbuf(Relation rel, BlockNumber blkno, bool extend)
}
else
buf = ReadBufferExtended(rel, VISIBILITYMAP_FORKNUM, blkno,
- RBM_ZERO_ON_ERROR, NULL);
+ RBM_ZERO_ON_ERROR, NULL, NULL);
/*
* Initializing the page when needed is trickier than it looks, because of
diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index aa82cede30..a9c83bb575 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -21,6 +21,7 @@
#include "access/xloginsert.h"
#include "common/int.h"
#include "common/pg_prng.h"
+#include "pgstat.h"
#include "lib/qunique.h"
#include "miscadmin.h"
#include "storage/lmgr.h"
@@ -323,7 +324,7 @@ _bt_search_insert(Relation rel, Relation heaprel, BTInsertState insertstate)
if (RelationGetTargetBlock(rel) != InvalidBlockNumber)
{
/* Simulate a _bt_getbuf() call with conditional locking */
- insertstate->buf = ReadBuffer(rel, RelationGetTargetBlock(rel));
+ insertstate->buf = ReadBuffer(rel, RelationGetTargetBlock(rel), NULL);
if (_bt_conditionallockbuf(rel, insertstate->buf))
{
Page page;
@@ -423,6 +424,7 @@ _bt_check_unique(Relation rel, BTInsertState insertstate, Relation heapRel,
bool inposting = false;
bool prevalldead = true;
int curposti = 0;
+ bool hit = false;
/* Assume unique until we find a duplicate */
*is_unique = true;
@@ -733,9 +735,10 @@ _bt_check_unique(Relation rel, BTInsertState insertstate, Relation heapRel,
{
BlockNumber nblkno = opaque->btpo_next;
- nbuf = _bt_relandgetbuf(rel, nbuf, nblkno, BT_READ);
+ nbuf = _bt_relandgetbuf(rel, nbuf, nblkno, BT_READ, &hit);
page = BufferGetPage(nbuf);
opaque = BTPageGetOpaque(page);
+ pgstat_count_index_buffer(rel, !P_ISLEAF(opaque), hit);
if (!P_IGNORE(opaque))
break;
if (P_RIGHTMOST(opaque))
@@ -1040,7 +1043,9 @@ _bt_stepright(Relation rel, Relation heaprel, BTInsertState insertstate,
rblkno = opaque->btpo_next;
for (;;)
{
- rbuf = _bt_relandgetbuf(rel, rbuf, rblkno, BT_WRITE);
+ bool hit = false;
+ rbuf = _bt_relandgetbuf(rel, rbuf, rblkno, BT_WRITE, &hit);
+ pgstat_count_index_buffer(rel, !P_ISLEAF(opaque), hit);
page = BufferGetPage(rbuf);
opaque = BTPageGetOpaque(page);
@@ -1256,10 +1261,13 @@ _bt_insertonpg(Relation rel,
*/
if (unlikely(split_only_page))
{
+ bool hit = false;
+
Assert(!isleaf);
Assert(BufferIsValid(cbuf));
- metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_WRITE);
+ metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_WRITE, &hit);
+ pgstat_count_metadata_index_buffer(rel, hit);
metapg = BufferGetPage(metabuf);
metad = BTPageGetMeta(metapg);
@@ -1890,7 +1898,9 @@ _bt_split(Relation rel, Relation heaprel, BTScanInsert itup_key, Buffer buf,
*/
if (!isrightmost)
{
- sbuf = _bt_getbuf(rel, oopaque->btpo_next, BT_WRITE);
+ bool hit = false;
+ sbuf = _bt_getbuf(rel, oopaque->btpo_next, BT_WRITE, &hit);
+ pgstat_count_index_buffer(rel, !P_ISLEAF(oopaque), hit);
spage = BufferGetPage(sbuf);
sopaque = BTPageGetOpaque(spage);
if (sopaque->btpo_prev != origpagenumber)
@@ -2247,12 +2257,14 @@ _bt_finish_split(Relation rel, Relation heaprel, Buffer lbuf, BTStack stack)
BTPageOpaque rpageop;
bool wasroot;
bool wasonly;
+ bool hit = false;
Assert(P_INCOMPLETE_SPLIT(lpageop));
Assert(heaprel != NULL);
/* Lock right sibling, the one missing the downlink */
- rbuf = _bt_getbuf(rel, lpageop->btpo_next, BT_WRITE);
+ rbuf = _bt_getbuf(rel, lpageop->btpo_next, BT_WRITE, &hit);
+ pgstat_count_index_buffer(rel, !P_ISLEAF(lpageop), hit);
rpage = BufferGetPage(rbuf);
rpageop = BTPageGetOpaque(rpage);
@@ -2264,7 +2276,8 @@ _bt_finish_split(Relation rel, Relation heaprel, Buffer lbuf, BTStack stack)
BTMetaPageData *metad;
/* acquire lock on the metapage */
- metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_WRITE);
+ metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_WRITE, &hit);
+ pgstat_count_metadata_index_buffer(rel, hit);
metapg = BufferGetPage(metabuf);
metad = BTPageGetMeta(metapg);
@@ -2320,6 +2333,7 @@ _bt_getstackbuf(Relation rel, Relation heaprel, BTStack stack, BlockNumber child
{
BlockNumber blkno;
OffsetNumber start;
+ bool hit = false;
blkno = stack->bts_blkno;
start = stack->bts_offset;
@@ -2330,9 +2344,10 @@ _bt_getstackbuf(Relation rel, Relation heaprel, BTStack stack, BlockNumber child
Page page;
BTPageOpaque opaque;
- buf = _bt_getbuf(rel, blkno, BT_WRITE);
+ buf = _bt_getbuf(rel, blkno, BT_WRITE, &hit);
page = BufferGetPage(buf);
opaque = BTPageGetOpaque(page);
+ pgstat_count_index_buffer(rel, !P_ISLEAF(opaque), hit);
Assert(heaprel != NULL);
if (P_INCOMPLETE_SPLIT(opaque))
@@ -2460,6 +2475,7 @@ _bt_newlevel(Relation rel, Relation heaprel, Buffer lbuf, Buffer rbuf)
Buffer metabuf;
Page metapg;
BTMetaPageData *metad;
+ bool hit = false;
lbkno = BufferGetBlockNumber(lbuf);
rbkno = BufferGetBlockNumber(rbuf);
@@ -2472,9 +2488,10 @@ _bt_newlevel(Relation rel, Relation heaprel, Buffer lbuf, Buffer rbuf)
rootblknum = BufferGetBlockNumber(rootbuf);
/* acquire lock on the metapage */
- metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_WRITE);
+ metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_WRITE, &hit);
metapg = BufferGetPage(metabuf);
metad = BTPageGetMeta(metapg);
+ pgstat_count_metadata_index_buffer(rel, hit);
/*
* Create downlink item for left page (old root). The key value used is
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index c79dd38ee1..2c7183a41b 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -30,6 +30,7 @@
#include "access/xloginsert.h"
#include "common/int.h"
#include "miscadmin.h"
+#include "pgstat.h"
#include "storage/indexfsm.h"
#include "storage/predicate.h"
#include "storage/procarray.h"
@@ -183,13 +184,15 @@ _bt_vacuum_needs_cleanup(Relation rel)
BTMetaPageData *metad;
uint32 btm_version;
BlockNumber prev_num_delpages;
+ bool hit = false;
/*
* Copy details from metapage to local variables quickly.
*
* Note that we deliberately avoid using cached version of metapage here.
*/
- metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ);
+ metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ, &hit);
+ pgstat_count_metadata_index_buffer(rel, hit);
metapg = BufferGetPage(metabuf);
metad = BTPageGetMeta(metapg);
btm_version = metad->btm_version;
@@ -234,6 +237,7 @@ _bt_set_cleanup_info(Relation rel, BlockNumber num_delpages)
Buffer metabuf;
Page metapg;
BTMetaPageData *metad;
+ bool hit = false;
/*
* On-disk compatibility note: The btm_last_cleanup_num_delpages metapage
@@ -253,7 +257,8 @@ _bt_set_cleanup_info(Relation rel, BlockNumber num_delpages)
* no longer used as of PostgreSQL 14. We set it to -1.0 on rewrite, just
* to be consistent.
*/
- metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ);
+ metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ, &hit);
+ pgstat_count_metadata_index_buffer(rel, hit);
metapg = BufferGetPage(metabuf);
metad = BTPageGetMeta(metapg);
@@ -350,6 +355,7 @@ _bt_getroot(Relation rel, Relation heaprel, int access)
BlockNumber rootblkno;
uint32 rootlevel;
BTMetaPageData *metad;
+ bool hit = false;
Assert(access == BT_READ || heaprel != NULL);
@@ -373,7 +379,8 @@ _bt_getroot(Relation rel, Relation heaprel, int access)
Assert(rootblkno != P_NONE);
rootlevel = metad->btm_fastlevel;
- rootbuf = _bt_getbuf(rel, rootblkno, BT_READ);
+ rootbuf = _bt_getbuf(rel, rootblkno, BT_READ, &hit);
+ pgstat_count_metadata_index_buffer(rel, hit);
rootpage = BufferGetPage(rootbuf);
rootopaque = BTPageGetOpaque(rootpage);
@@ -399,7 +406,8 @@ _bt_getroot(Relation rel, Relation heaprel, int access)
rel->rd_amcache = NULL;
}
- metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ);
+ metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ, &hit);
+ pgstat_count_metadata_index_buffer(rel, hit);
metad = _bt_getmeta(rel, metabuf);
/* if no root page initialized yet, do it */
@@ -535,7 +543,8 @@ _bt_getroot(Relation rel, Relation heaprel, int access)
for (;;)
{
- rootbuf = _bt_relandgetbuf(rel, rootbuf, rootblkno, BT_READ);
+ rootbuf = _bt_relandgetbuf(rel, rootbuf, rootblkno, BT_READ, &hit);
+ pgstat_count_metadata_index_buffer(rel, hit);
rootpage = BufferGetPage(rootbuf);
rootopaque = BTPageGetOpaque(rootpage);
@@ -588,6 +597,7 @@ _bt_gettrueroot(Relation rel)
BlockNumber rootblkno;
uint32 rootlevel;
BTMetaPageData *metad;
+ bool hit = false;
/*
* We don't try to use cached metapage data here, since (a) this path is
@@ -599,7 +609,8 @@ _bt_gettrueroot(Relation rel)
pfree(rel->rd_amcache);
rel->rd_amcache = NULL;
- metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ);
+ metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ, &hit);
+ pgstat_count_metadata_index_buffer(rel, hit);
metapg = BufferGetPage(metabuf);
metaopaque = BTPageGetOpaque(metapg);
metad = BTPageGetMeta(metapg);
@@ -638,7 +649,8 @@ _bt_gettrueroot(Relation rel)
for (;;)
{
- rootbuf = _bt_relandgetbuf(rel, rootbuf, rootblkno, BT_READ);
+ rootbuf = _bt_relandgetbuf(rel, rootbuf, rootblkno, BT_READ, &hit);
+ pgstat_count_metadata_index_buffer(rel, hit);
rootpage = BufferGetPage(rootbuf);
rootopaque = BTPageGetOpaque(rootpage);
@@ -679,8 +691,10 @@ _bt_getrootheight(Relation rel)
if (rel->rd_amcache == NULL)
{
Buffer metabuf;
+ bool hit = false;
- metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ);
+ metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ, &hit);
+ pgstat_count_metadata_index_buffer(rel, hit);
metad = _bt_getmeta(rel, metabuf);
/*
@@ -743,8 +757,10 @@ _bt_metaversion(Relation rel, bool *heapkeyspace, bool *allequalimage)
if (rel->rd_amcache == NULL)
{
Buffer metabuf;
+ bool hit = false;
- metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ);
+ metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ, &hit);
+ pgstat_count_metadata_index_buffer(rel, hit);
metad = _bt_getmeta(rel, metabuf);
/*
@@ -842,14 +858,14 @@ _bt_checkpage(Relation rel, Buffer buf)
* as _bt_lockbuf().
*/
Buffer
-_bt_getbuf(Relation rel, BlockNumber blkno, int access)
+_bt_getbuf(Relation rel, BlockNumber blkno, int access, bool *hit)
{
Buffer buf;
Assert(BlockNumberIsValid(blkno));
/* Read an existing block of the relation */
- buf = ReadBuffer(rel, blkno);
+ buf = ReadBuffer(rel, blkno, hit);
_bt_lockbuf(rel, buf, access);
_bt_checkpage(rel, buf);
@@ -903,7 +919,7 @@ _bt_allocbuf(Relation rel, Relation heaprel)
blkno = GetFreeIndexPage(rel);
if (blkno == InvalidBlockNumber)
break;
- buf = ReadBuffer(rel, blkno);
+ buf = ReadBuffer(rel, blkno, NULL);
if (_bt_conditionallockbuf(rel, buf))
{
page = BufferGetPage(buf);
@@ -1000,14 +1016,14 @@ _bt_allocbuf(Relation rel, Relation heaprel)
* is when the target page is the same one already in the buffer.
*/
Buffer
-_bt_relandgetbuf(Relation rel, Buffer obuf, BlockNumber blkno, int access)
+_bt_relandgetbuf(Relation rel, Buffer obuf, BlockNumber blkno, int access, bool *hit)
{
Buffer buf;
Assert(BlockNumberIsValid(blkno));
if (BufferIsValid(obuf))
_bt_unlockbuf(rel, obuf);
- buf = ReleaseAndReadBuffer(obuf, rel, blkno);
+ buf = ReleaseAndReadBuffer(obuf, rel, blkno, hit);
_bt_lockbuf(rel, buf, access);
_bt_checkpage(rel, buf);
@@ -1698,14 +1714,16 @@ _bt_leftsib_splitflag(Relation rel, BlockNumber leftsib, BlockNumber target)
Page page;
BTPageOpaque opaque;
bool result;
+ bool hit = false;
/* Easy case: No left sibling */
if (leftsib == P_NONE)
return false;
- buf = _bt_getbuf(rel, leftsib, BT_READ);
+ buf = _bt_getbuf(rel, leftsib, BT_READ, &hit);
page = BufferGetPage(buf);
opaque = BTPageGetOpaque(page);
+ pgstat_count_index_buffer(rel, !P_ISLEAF(opaque), hit);
/*
* If the left sibling was concurrently split, so that its next-pointer
@@ -1758,7 +1776,7 @@ _bt_rightsib_halfdeadflag(Relation rel, BlockNumber leafrightsib)
Assert(leafrightsib != P_NONE);
- buf = _bt_getbuf(rel, leafrightsib, BT_READ);
+ buf = _bt_getbuf(rel, leafrightsib, BT_READ, NULL);
page = BufferGetPage(buf);
opaque = BTPageGetOpaque(page);
@@ -2062,7 +2080,7 @@ _bt_pagedel(Relation rel, Buffer leafbuf, BTVacState *vstate)
if (!rightsib_empty)
break;
- leafbuf = _bt_getbuf(rel, rightsib, BT_WRITE);
+ leafbuf = _bt_getbuf(rel, rightsib, BT_WRITE, NULL);
}
}
@@ -2335,6 +2353,7 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, BlockNumber scanblkno,
uint32 targetlevel;
IndexTuple leafhikey;
BlockNumber leaftopparent;
+ bool hit = false;
page = BufferGetPage(leafbuf);
opaque = BTPageGetOpaque(page);
@@ -2374,7 +2393,8 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, BlockNumber scanblkno,
Assert(target != leafblkno);
/* Fetch the block number of the target's left sibling */
- buf = _bt_getbuf(rel, target, BT_READ);
+ buf = _bt_getbuf(rel, target, BT_READ, &hit);
+ pgstat_count_metadata_index_buffer(rel, hit);
page = BufferGetPage(buf);
opaque = BTPageGetOpaque(page);
leftsib = opaque->btpo_prev;
@@ -2401,7 +2421,7 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, BlockNumber scanblkno,
_bt_lockbuf(rel, leafbuf, BT_WRITE);
if (leftsib != P_NONE)
{
- lbuf = _bt_getbuf(rel, leftsib, BT_WRITE);
+ lbuf = _bt_getbuf(rel, leftsib, BT_WRITE, NULL);
page = BufferGetPage(lbuf);
opaque = BTPageGetOpaque(page);
while (P_ISDELETED(opaque) || opaque->btpo_next != target)
@@ -2449,7 +2469,8 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, BlockNumber scanblkno,
CHECK_FOR_INTERRUPTS();
/* step right one page */
- lbuf = _bt_getbuf(rel, leftsib, BT_WRITE);
+ lbuf = _bt_getbuf(rel, leftsib, BT_WRITE, &hit);
+ pgstat_count_index_buffer(rel, !P_ISLEAF(opaque), hit);
page = BufferGetPage(lbuf);
opaque = BTPageGetOpaque(page);
}
@@ -2513,7 +2534,8 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, BlockNumber scanblkno,
* And next write-lock the (current) right sibling.
*/
rightsib = opaque->btpo_next;
- rbuf = _bt_getbuf(rel, rightsib, BT_WRITE);
+ rbuf = _bt_getbuf(rel, rightsib, BT_WRITE, &hit);
+ pgstat_count_index_buffer(rel, !P_ISLEAF(opaque), hit);
page = BufferGetPage(rbuf);
opaque = BTPageGetOpaque(page);
@@ -2569,7 +2591,8 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, BlockNumber scanblkno,
if (P_RIGHTMOST(opaque))
{
/* rightsib will be the only one left on the level */
- metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_WRITE);
+ metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_WRITE, &hit);
+ pgstat_count_metadata_index_buffer(rel, hit);
metapg = BufferGetPage(metabuf);
metad = BTPageGetMeta(metapg);
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index accc7fe8bb..18afe6ec45 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -1641,7 +1641,7 @@ backtrack:
* nondefault buffer access strategy.
*/
buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL,
- info->strategy);
+ info->strategy, NULL);
goto backtrack;
}
diff --git a/src/backend/access/nbtree/nbtsearch.c b/src/backend/access/nbtree/nbtsearch.c
index f69397623d..30d10bff4b 100644
--- a/src/backend/access/nbtree/nbtsearch.c
+++ b/src/backend/access/nbtree/nbtsearch.c
@@ -126,6 +126,7 @@ _bt_search(Relation rel, Relation heaprel, BTScanInsert key, Buffer *bufP,
IndexTuple itup;
BlockNumber child;
BTStack new_stack;
+ bool hit = false;
/*
* Race -- the page we just grabbed may have split since we read its
@@ -178,7 +179,8 @@ _bt_search(Relation rel, Relation heaprel, BTScanInsert key, Buffer *bufP,
page_access = BT_WRITE;
/* drop the read lock on the page, then acquire one on its child */
- *bufP = _bt_relandgetbuf(rel, *bufP, child, page_access);
+ *bufP = _bt_relandgetbuf(rel, *bufP, child, page_access, &hit);
+ pgstat_count_index_buffer(rel, opaque->btpo_level != 1, hit);
/* okay, all set to move down a level */
stack_in = new_stack;
@@ -249,6 +251,7 @@ _bt_moveright(Relation rel,
Page page;
BTPageOpaque opaque;
int32 cmpval;
+ bool hit = false;
Assert(!forupdate || heaprel != NULL);
@@ -299,14 +302,16 @@ _bt_moveright(Relation rel,
_bt_relbuf(rel, buf);
/* re-acquire the lock in the right mode, and re-check */
- buf = _bt_getbuf(rel, blkno, access);
+ buf = _bt_getbuf(rel, blkno, access, &hit);
+ pgstat_count_index_buffer(rel, !P_ISLEAF(opaque), hit);
continue;
}
if (P_IGNORE(opaque) || _bt_compare(rel, key, page, P_HIKEY) >= cmpval)
{
/* step right one page */
- buf = _bt_relandgetbuf(rel, buf, opaque->btpo_next, access);
+ buf = _bt_relandgetbuf(rel, buf, opaque->btpo_next, access, &hit);
+ pgstat_count_index_buffer(rel, !P_ISLEAF(opaque), hit);
continue;
}
else
@@ -2301,6 +2306,7 @@ static bool
_bt_readnextpage(IndexScanDesc scan, BlockNumber blkno,
BlockNumber lastcurrblkno, ScanDirection dir, bool seized)
{
+ bool hit = false;
Relation rel = scan->indexRelation;
BTScanOpaque so = (BTScanOpaque) scan->opaque;
@@ -2348,7 +2354,8 @@ _bt_readnextpage(IndexScanDesc scan, BlockNumber blkno,
{
/* read blkno, but check for interrupts first */
CHECK_FOR_INTERRUPTS();
- so->currPos.buf = _bt_getbuf(rel, blkno, BT_READ);
+ so->currPos.buf = _bt_getbuf(rel, blkno, BT_READ, &hit);
+ pgstat_count_record_index_buffer(rel, hit);
}
else
{
@@ -2444,10 +2451,11 @@ _bt_lock_and_validate_left(Relation rel, BlockNumber *blkno,
Page page;
BTPageOpaque opaque;
int tries;
+ bool hit = false;
/* check for interrupts while we're not holding any buffer lock */
CHECK_FOR_INTERRUPTS();
- buf = _bt_getbuf(rel, *blkno, BT_READ);
+ buf = _bt_getbuf(rel, *blkno, BT_READ, &hit);
page = BufferGetPage(buf);
opaque = BTPageGetOpaque(page);
@@ -2474,7 +2482,8 @@ _bt_lock_and_validate_left(Relation rel, BlockNumber *blkno,
break;
/* step right */
*blkno = opaque->btpo_next;
- buf = _bt_relandgetbuf(rel, buf, *blkno, BT_READ);
+ buf = _bt_relandgetbuf(rel, buf, *blkno, BT_READ, &hit);
+ pgstat_count_index_buffer(rel, !P_ISLEAF(opaque), hit);
page = BufferGetPage(buf);
opaque = BTPageGetOpaque(page);
}
@@ -2484,9 +2493,10 @@ _bt_lock_and_validate_left(Relation rel, BlockNumber *blkno,
* _bt_readpage, which is passed by caller as lastcurrblkno) to see
* what's up with its prev sibling link
*/
- buf = _bt_relandgetbuf(rel, buf, lastcurrblkno, BT_READ);
+ buf = _bt_relandgetbuf(rel, buf, lastcurrblkno, BT_READ, &hit);
page = BufferGetPage(buf);
opaque = BTPageGetOpaque(page);
+ pgstat_count_index_buffer(rel, !P_ISLEAF(opaque), hit);
if (P_ISDELETED(opaque))
{
/*
@@ -2501,7 +2511,8 @@ _bt_lock_and_validate_left(Relation rel, BlockNumber *blkno,
elog(ERROR, "fell off the end of index \"%s\"",
RelationGetRelationName(rel));
lastcurrblkno = opaque->btpo_next;
- buf = _bt_relandgetbuf(rel, buf, lastcurrblkno, BT_READ);
+ buf = _bt_relandgetbuf(rel, buf, lastcurrblkno, BT_READ, &hit);
+ pgstat_count_index_buffer(rel, !P_ISLEAF(opaque), hit);
page = BufferGetPage(buf);
opaque = BTPageGetOpaque(page);
if (!P_ISDELETED(opaque))
@@ -2558,6 +2569,7 @@ _bt_get_endpoint(Relation rel, uint32 level, bool rightmost)
OffsetNumber offnum;
BlockNumber blkno;
IndexTuple itup;
+ bool hit = false;
/*
* If we are looking for a leaf page, okay to descend from fast root;
@@ -2590,7 +2602,8 @@ _bt_get_endpoint(Relation rel, uint32 level, bool rightmost)
if (blkno == P_NONE)
elog(ERROR, "fell off the end of index \"%s\"",
RelationGetRelationName(rel));
- buf = _bt_relandgetbuf(rel, buf, blkno, BT_READ);
+ buf = _bt_relandgetbuf(rel, buf, blkno, BT_READ, &hit);
+ pgstat_count_record_index_buffer(rel, hit);
page = BufferGetPage(buf);
opaque = BTPageGetOpaque(page);
}
@@ -2613,7 +2626,8 @@ _bt_get_endpoint(Relation rel, uint32 level, bool rightmost)
itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offnum));
blkno = BTreeTupleGetDownLink(itup);
- buf = _bt_relandgetbuf(rel, buf, blkno, BT_READ);
+ buf = _bt_relandgetbuf(rel, buf, blkno, BT_READ, &hit);
+ pgstat_count_record_index_buffer(rel, hit);
page = BufferGetPage(buf);
opaque = BTPageGetOpaque(page);
}
diff --git a/src/backend/access/nbtree/nbtutils.c b/src/backend/access/nbtree/nbtutils.c
index 9e27302fe8..c705559362 100644
--- a/src/backend/access/nbtree/nbtutils.c
+++ b/src/backend/access/nbtree/nbtutils.c
@@ -20,6 +20,7 @@
#include "access/nbtree.h"
#include "access/reloptions.h"
#include "commands/progress.h"
+#include "pgstat.h"
#include "miscadmin.h"
#include "utils/datum.h"
#include "utils/lsyscache.h"
@@ -3318,6 +3319,7 @@ _bt_killitems(IndexScanDesc scan)
int numKilled = so->numKilled;
bool killedsomething = false;
bool droppedpin PG_USED_FOR_ASSERTS_ONLY;
+ bool hit = false;
Assert(BTScanPosIsValid(so->currPos));
@@ -3346,9 +3348,10 @@ _bt_killitems(IndexScanDesc scan)
droppedpin = true;
/* Attempt to re-read the buffer, getting pin and lock. */
- buf = _bt_getbuf(scan->indexRelation, so->currPos.currPage, BT_READ);
+ buf = _bt_getbuf(scan->indexRelation, so->currPos.currPage, BT_READ, &hit);
page = BufferGetPage(buf);
+ pgstat_count_index_buffer(scan->indexRelation, !P_ISLEAF(BTPageGetOpaque(page)), hit);
if (BufferGetLSNAtomic(buf) == so->currPos.lsn)
so->currPos.buf = buf;
else
diff --git a/src/backend/access/spgist/spgdoinsert.c b/src/backend/access/spgist/spgdoinsert.c
index af6b27b213..7673162c5a 100644
--- a/src/backend/access/spgist/spgdoinsert.c
+++ b/src/backend/access/spgist/spgdoinsert.c
@@ -21,6 +21,7 @@
#include "access/xloginsert.h"
#include "common/int.h"
#include "common/pg_prng.h"
+#include "pgstat.h"
#include "miscadmin.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"
@@ -1925,6 +1926,7 @@ spgdoinsert(Relation index, SpGistState *state,
SPPageDesc current,
parent;
FmgrInfo *procinfo = NULL;
+ bool hit = false;
/*
* Look up FmgrInfo of the user-defined choose function once, to save
@@ -2065,13 +2067,15 @@ spgdoinsert(Relation index, SpGistState *state,
else if (parent.buffer == InvalidBuffer)
{
/* we hold no parent-page lock, so no deadlock is possible */
- current.buffer = ReadBuffer(index, current.blkno);
+ current.buffer = ReadBuffer(index, current.blkno, &hit);
+ pgstat_count_record_index_buffer(index, hit);
LockBuffer(current.buffer, BUFFER_LOCK_EXCLUSIVE);
}
else if (current.blkno != parent.blkno)
{
/* descend to a new child page */
- current.buffer = ReadBuffer(index, current.blkno);
+ current.buffer = ReadBuffer(index, current.blkno, &hit);
+ pgstat_count_record_index_buffer(index, hit);
/*
* Attempt to acquire lock on child page. We must beware of
diff --git a/src/backend/access/spgist/spgscan.c b/src/backend/access/spgist/spgscan.c
index 25893050c5..49976f03ff 100644
--- a/src/backend/access/spgist/spgscan.c
+++ b/src/backend/access/spgist/spgscan.c
@@ -846,16 +846,18 @@ redirect:
OffsetNumber offset = ItemPointerGetOffsetNumber(&item->heapPtr);
Page page;
bool isnull;
+ bool hit = false;
if (buffer == InvalidBuffer)
{
- buffer = ReadBuffer(index, blkno);
+ buffer = ReadBuffer(index, blkno, &hit);
+
LockBuffer(buffer, BUFFER_LOCK_SHARE);
}
else if (blkno != BufferGetBlockNumber(buffer))
{
UnlockReleaseBuffer(buffer);
- buffer = ReadBuffer(index, blkno);
+ buffer = ReadBuffer(index, blkno, &hit);
LockBuffer(buffer, BUFFER_LOCK_SHARE);
}
@@ -869,6 +871,7 @@ redirect:
{
/* Page is a leaf - that is, all its tuples are heap items */
OffsetNumber max = PageGetMaxOffsetNumber(page);
+ pgstat_count_record_index_buffer(index, hit);
if (SpGistBlockIsRoot(blkno))
{
@@ -897,6 +900,7 @@ redirect:
SpGistInnerTuple innerTuple = (SpGistInnerTuple)
PageGetItem(page, PageGetItemId(page, offset));
+ pgstat_count_metadata_index_buffer(index, hit);
if (innerTuple->tupstate != SPGIST_LIVE)
{
if (innerTuple->tupstate == SPGIST_REDIRECT)
diff --git a/src/backend/access/spgist/spgutils.c b/src/backend/access/spgist/spgutils.c
index 95fea74e29..6f6876df47 100644
--- a/src/backend/access/spgist/spgutils.c
+++ b/src/backend/access/spgist/spgutils.c
@@ -26,6 +26,7 @@
#include "commands/vacuum.h"
#include "nodes/nodeFuncs.h"
#include "parser/parse_coerce.h"
+#include "pgstat.h"
#include "storage/bufmgr.h"
#include "storage/indexfsm.h"
#include "utils/catcache.h"
@@ -269,8 +270,10 @@ spgGetCache(Relation index)
{
Buffer metabuffer;
SpGistMetaPageData *metadata;
+ bool hit = false;
- metabuffer = ReadBuffer(index, SPGIST_METAPAGE_BLKNO);
+ metabuffer = ReadBuffer(index, SPGIST_METAPAGE_BLKNO, &hit);
+ pgstat_count_metadata_index_buffer(index, hit);
LockBuffer(metabuffer, BUFFER_LOCK_SHARE);
metadata = SpGistPageGetMeta(BufferGetPage(metabuffer));
@@ -394,6 +397,7 @@ Buffer
SpGistNewBuffer(Relation index)
{
Buffer buffer;
+ bool hit = false;
/* First, try to get a page from FSM */
for (;;)
@@ -410,7 +414,7 @@ SpGistNewBuffer(Relation index)
if (SpGistBlockIsFixed(blkno))
continue;
- buffer = ReadBuffer(index, blkno);
+ buffer = ReadBuffer(index, blkno, &hit);
/*
* We have to guard against the possibility that someone else already
@@ -419,6 +423,7 @@ SpGistNewBuffer(Relation index)
if (ConditionalLockBuffer(buffer))
{
Page page = BufferGetPage(buffer);
+ pgstat_count_record_index_buffer(index, hit);
if (PageIsNew(page))
return buffer; /* OK to use, if never initialized */
@@ -449,13 +454,15 @@ SpGistNewBuffer(Relation index)
void
SpGistUpdateMetaPage(Relation index)
{
+ bool hit = false;
SpGistCache *cache = (SpGistCache *) index->rd_amcache;
if (cache != NULL)
{
Buffer metabuffer;
- metabuffer = ReadBuffer(index, SPGIST_METAPAGE_BLKNO);
+ metabuffer = ReadBuffer(index, SPGIST_METAPAGE_BLKNO, &hit);
+ pgstat_count_metadata_index_buffer(index, hit);
if (ConditionalLockBuffer(metabuffer))
{
@@ -568,6 +575,7 @@ allocNewBuffer(Relation index, int flags)
Buffer
SpGistGetBuffer(Relation index, int flags, int needSpace, bool *isNew)
{
+ bool hit = false;
SpGistCache *cache = spgGetCache(index);
SpGistLastUsedPage *lup;
@@ -604,7 +612,7 @@ SpGistGetBuffer(Relation index, int flags, int needSpace, bool *isNew)
Buffer buffer;
Page page;
- buffer = ReadBuffer(index, lup->blkno);
+ buffer = ReadBuffer(index, lup->blkno, &hit);
if (!ConditionalLockBuffer(buffer))
{
@@ -617,6 +625,7 @@ SpGistGetBuffer(Relation index, int flags, int needSpace, bool *isNew)
}
page = BufferGetPage(buffer);
+ pgstat_count_record_index_buffer(index, hit);
if (PageIsNew(page) || SpGistPageIsDeleted(page) || PageIsEmpty(page))
{
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 81171f3545..2161d69452 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -705,7 +705,7 @@ spgprocesspending(spgBulkDeleteState *bds)
/* examine the referenced page */
blkno = ItemPointerGetBlockNumber(&pitem->tid);
buffer = ReadBufferExtended(index, MAIN_FORKNUM, blkno,
- RBM_NORMAL, bds->info->strategy);
+ RBM_NORMAL, bds->info->strategy, NULL);
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
page = (Page) BufferGetPage(buffer);
diff --git a/src/backend/access/transam/xloginsert.c b/src/backend/access/transam/xloginsert.c
index 5ee9d0b028..dd71a1d0c6 100644
--- a/src/backend/access/transam/xloginsert.c
+++ b/src/backend/access/transam/xloginsert.c
@@ -1300,7 +1300,7 @@ log_newpage_range(Relation rel, ForkNumber forknum,
while (nbufs < XLR_MAX_BLOCK_ID && blkno < endblk)
{
Buffer buf = ReadBufferExtended(rel, forknum, blkno,
- RBM_NORMAL, NULL);
+ RBM_NORMAL, NULL, NULL);
LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 15efb02bad..b8edae1fdc 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -771,6 +771,10 @@ CREATE VIEW pg_statio_all_tables AS
pg_stat_get_blocks_hit(C.oid) AS heap_blks_hit,
I.idx_blks_read AS idx_blks_read,
I.idx_blks_hit AS idx_blks_hit,
+ I.idx_metadata_blks_read AS idx_metadata_blks_read,
+ I.idx_metadata_blks_hit AS idx_metadata_blks_hit,
+ I.idx_record_blks_read AS idx_record_blks_read,
+ I.idx_record_blks_hit AS idx_record_blks_hit,
pg_stat_get_blocks_fetched(T.oid) -
pg_stat_get_blocks_hit(T.oid) AS toast_blks_read,
pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
@@ -784,7 +788,17 @@ CREATE VIEW pg_statio_all_tables AS
pg_stat_get_blocks_hit(indexrelid))::bigint
AS idx_blks_read,
sum(pg_stat_get_blocks_hit(indexrelid))::bigint
- AS idx_blks_hit
+ AS idx_blks_hit,
+ sum(pg_stat_get_metadata_blocks_fetched(indexrelid) -
+ pg_stat_get_metadata_blocks_hit(indexrelid))::bigint
+ AS idx_metadata_blks_read,
+ sum(pg_stat_get_metadata_blocks_hit(indexrelid))::bigint
+ AS idx_metadata_blks_hit,
+ sum(pg_stat_get_record_blocks_fetched(indexrelid) -
+ pg_stat_get_record_blocks_hit(indexrelid))::bigint
+ AS idx_record_blks_read,
+ sum(pg_stat_get_record_blocks_hit(indexrelid))::bigint
+ AS idx_record_blks_hit
FROM pg_index WHERE indrelid = C.oid ) I ON true
LEFT JOIN LATERAL (
SELECT sum(pg_stat_get_blocks_fetched(indexrelid) -
@@ -841,7 +855,13 @@ CREATE VIEW pg_statio_all_indexes AS
I.relname AS indexrelname,
pg_stat_get_blocks_fetched(I.oid) -
pg_stat_get_blocks_hit(I.oid) AS idx_blks_read,
- pg_stat_get_blocks_hit(I.oid) AS idx_blks_hit
+ pg_stat_get_blocks_hit(I.oid) AS idx_blks_hit,
+ pg_stat_get_metadata_blocks_fetched(I.oid) -
+ pg_stat_get_metadata_blocks_hit(I.oid) AS idx_metadata_blks_read,
+ pg_stat_get_metadata_blocks_hit(I.oid) AS idx_metadata_blks_hit,
+ pg_stat_get_record_blocks_fetched(I.oid) -
+ pg_stat_get_record_blocks_hit(I.oid) AS idx_record_blks_read,
+ pg_stat_get_record_blocks_hit(I.oid) AS idx_record_blks_hit
FROM pg_class C JOIN
pg_index X ON C.oid = X.indrelid JOIN
pg_class I ON I.oid = X.indexrelid
@@ -1076,6 +1096,12 @@ CREATE VIEW pg_stat_database AS
pg_stat_get_db_blocks_fetched(D.oid) -
pg_stat_get_db_blocks_hit(D.oid) AS blks_read,
pg_stat_get_db_blocks_hit(D.oid) AS blks_hit,
+ pg_stat_get_db_metadata_blocks_fetched(D.oid) -
+ pg_stat_get_db_metadata_blocks_hit(D.oid) AS metadata_blks_read,
+ pg_stat_get_db_metadata_blocks_hit(D.oid) AS metadata_blks_hit,
+ pg_stat_get_db_record_blocks_fetched(D.oid) -
+ pg_stat_get_db_record_blocks_hit(D.oid) AS record_blks_read,
+ pg_stat_get_db_record_blocks_hit(D.oid) AS record_blks_hit,
pg_stat_get_db_tuples_returned(D.oid) AS tup_returned,
pg_stat_get_db_tuples_fetched(D.oid) AS tup_fetched,
pg_stat_get_db_tuples_inserted(D.oid) AS tup_inserted,
diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index 451ae6f7f6..13937d0dfe 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
@@ -1194,7 +1194,7 @@ read_seq_tuple(Relation rel, Buffer *buf, HeapTuple seqdatatuple)
sequence_magic *sm;
Form_pg_sequence_data seq;
- *buf = ReadBuffer(rel, 0);
+ *buf = ReadBuffer(rel, 0, NULL);
LockBuffer(*buf, BUFFER_LOCK_EXCLUSIVE);
page = BufferGetPage(*buf);
diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c
index 0e7f5557f5..7c371ec776 100644
--- a/src/backend/storage/aio/read_stream.c
+++ b/src/backend/storage/aio/read_stream.c
@@ -336,7 +336,8 @@ read_stream_start_pending_read(ReadStream *stream)
&stream->buffers[buffer_index],
stream->pending_read_blocknum,
&nblocks,
- flags);
+ flags,
+ NULL);
stream->pinned_buffers += nblocks;
/* Remember whether we need to wait before returning this buffer. */
@@ -825,7 +826,8 @@ read_stream_next_buffer(ReadStream *stream, void **per_buffer_data)
if (likely(!StartReadBuffer(&stream->ios[0].op,
&stream->buffers[oldest_buffer_index],
next_blocknum,
- flags)))
+ flags,
+ NULL)))
{
/* Fast return. */
return buffer;
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 1f2a9fe997..2bad31af5a 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -495,7 +495,8 @@ ForgetPrivateRefCountEntry(PrivateRefCountEntry *ref)
static Buffer ReadBuffer_common(Relation rel,
SMgrRelation smgr, char smgr_persistence,
ForkNumber forkNum, BlockNumber blockNum,
- ReadBufferMode mode, BufferAccessStrategy strategy);
+ ReadBufferMode mode, BufferAccessStrategy strategy,
+ bool *hit);
static BlockNumber ExtendBufferedRelCommon(BufferManagerRelation bmr,
ForkNumber fork,
BufferAccessStrategy strategy,
@@ -755,9 +756,10 @@ ReadRecentBuffer(RelFileLocator rlocator, ForkNumber forkNum, BlockNumber blockN
* fork with RBM_NORMAL mode and default strategy.
*/
Buffer
-ReadBuffer(Relation reln, BlockNumber blockNum)
+ReadBuffer(Relation reln, BlockNumber blockNum, bool *hit)
{
- return ReadBufferExtended(reln, MAIN_FORKNUM, blockNum, RBM_NORMAL, NULL);
+ return ReadBufferExtended(reln, MAIN_FORKNUM, blockNum, RBM_NORMAL, NULL,
+ hit);
}
/*
@@ -803,7 +805,8 @@ ReadBuffer(Relation reln, BlockNumber blockNum)
*/
inline Buffer
ReadBufferExtended(Relation reln, ForkNumber forkNum, BlockNumber blockNum,
- ReadBufferMode mode, BufferAccessStrategy strategy)
+ ReadBufferMode mode, BufferAccessStrategy strategy,
+ bool *hit)
{
Buffer buf;
@@ -822,7 +825,7 @@ ReadBufferExtended(Relation reln, ForkNumber forkNum, BlockNumber blockNum,
* miss.
*/
buf = ReadBuffer_common(reln, RelationGetSmgr(reln), 0,
- forkNum, blockNum, mode, strategy);
+ forkNum, blockNum, mode, strategy, hit);
return buf;
}
@@ -848,7 +851,7 @@ ReadBufferWithoutRelcache(RelFileLocator rlocator, ForkNumber forkNum,
return ReadBuffer_common(NULL, smgr,
permanent ? RELPERSISTENCE_PERMANENT : RELPERSISTENCE_UNLOGGED,
forkNum, blockNum,
- mode, strategy);
+ mode, strategy, NULL);
}
/*
@@ -1016,7 +1019,7 @@ ExtendBufferedRelTo(BufferManagerRelation bmr,
{
Assert(extended_by == 0);
buffer = ReadBuffer_common(bmr.rel, bmr.smgr, bmr.relpersistence,
- fork, extend_to - 1, mode, strategy);
+ fork, extend_to - 1, mode, strategy, NULL);
}
return buffer;
@@ -1113,7 +1116,8 @@ PinBufferForBlock(Relation rel,
ForkNumber forkNum,
BlockNumber blockNum,
BufferAccessStrategy strategy,
- bool *foundPtr)
+ bool *foundPtr,
+ bool *hit)
{
BufferDesc *bufHdr;
IOContext io_context;
@@ -1164,8 +1168,11 @@ PinBufferForBlock(Relation rel,
* zeroed instead), the per-relation stats always count them.
*/
pgstat_count_buffer_read(rel);
- if (*foundPtr)
+ if (*foundPtr) {
+ if (hit)
+ *hit = true;
pgstat_count_buffer_hit(rel);
+ }
}
if (*foundPtr)
{
@@ -1193,7 +1200,8 @@ static pg_attribute_always_inline Buffer
ReadBuffer_common(Relation rel, SMgrRelation smgr, char smgr_persistence,
ForkNumber forkNum,
BlockNumber blockNum, ReadBufferMode mode,
- BufferAccessStrategy strategy)
+ BufferAccessStrategy strategy,
+ bool *hit)
{
ReadBuffersOperation operation;
Buffer buffer;
@@ -1231,7 +1239,7 @@ ReadBuffer_common(Relation rel, SMgrRelation smgr, char smgr_persistence,
bool found;
buffer = PinBufferForBlock(rel, smgr, persistence,
- forkNum, blockNum, strategy, &found);
+ forkNum, blockNum, strategy, &found, hit);
ZeroAndLockBuffer(buffer, mode, found);
return buffer;
}
@@ -1252,7 +1260,8 @@ ReadBuffer_common(Relation rel, SMgrRelation smgr, char smgr_persistence,
if (StartReadBuffer(&operation,
&buffer,
blockNum,
- flags))
+ flags,
+ hit))
WaitReadBuffers(&operation);
return buffer;
@@ -1264,7 +1273,8 @@ StartReadBuffersImpl(ReadBuffersOperation *operation,
BlockNumber blockNum,
int *nblocks,
int flags,
- bool allow_forwarding)
+ bool allow_forwarding,
+ bool *hit)
{
int actual_nblocks = *nblocks;
int maxcombine = 0;
@@ -1322,7 +1332,8 @@ StartReadBuffersImpl(ReadBuffersOperation *operation,
operation->forknum,
blockNum + i,
operation->strategy,
- &found);
+ &found,
+ hit);
}
if (found)
@@ -1495,10 +1506,11 @@ StartReadBuffers(ReadBuffersOperation *operation,
Buffer *buffers,
BlockNumber blockNum,
int *nblocks,
- int flags)
+ int flags,
+ bool *hit)
{
return StartReadBuffersImpl(operation, buffers, blockNum, nblocks, flags,
- true /* expect forwarded buffers */ );
+ true /* expect forwarded buffers */, hit);
}
/*
@@ -1513,13 +1525,14 @@ bool
StartReadBuffer(ReadBuffersOperation *operation,
Buffer *buffer,
BlockNumber blocknum,
- int flags)
+ int flags,
+ bool *hit)
{
int nblocks = 1;
bool result;
result = StartReadBuffersImpl(operation, buffer, blocknum, &nblocks, flags,
- false /* single block, no forwarding */ );
+ false /* single block, no forwarding */, hit);
Assert(nblocks == 1); /* single block can't be short */
return result;
@@ -3013,7 +3026,8 @@ MarkBufferDirty(Buffer buffer)
Buffer
ReleaseAndReadBuffer(Buffer buffer,
Relation relation,
- BlockNumber blockNum)
+ BlockNumber blockNum,
+ bool *hit)
{
ForkNumber forkNum = MAIN_FORKNUM;
BufferDesc *bufHdr;
@@ -3042,7 +3056,7 @@ ReleaseAndReadBuffer(Buffer buffer,
}
}
- return ReadBuffer(relation, blockNum);
+ return ReadBuffer(relation, blockNum, hit);
}
/*
diff --git a/src/backend/storage/freespace/freespace.c b/src/backend/storage/freespace/freespace.c
index 4773a9cc65..f9f6259ad4 100644
--- a/src/backend/storage/freespace/freespace.c
+++ b/src/backend/storage/freespace/freespace.c
@@ -593,7 +593,7 @@ fsm_readbuf(Relation rel, FSMAddress addr, bool extend)
return InvalidBuffer;
}
else
- buf = ReadBufferExtended(rel, FSM_FORKNUM, blkno, RBM_ZERO_ON_ERROR, NULL);
+ buf = ReadBufferExtended(rel, FSM_FORKNUM, blkno, RBM_ZERO_ON_ERROR, NULL, NULL);
/*
* Initializing the page when needed is trickier than it looks, because of
diff --git a/src/backend/utils/activity/pgstat_database.c b/src/backend/utils/activity/pgstat_database.c
index 52d82d2535..36913aea93 100644
--- a/src/backend/utils/activity/pgstat_database.c
+++ b/src/backend/utils/activity/pgstat_database.c
@@ -443,6 +443,10 @@ pgstat_database_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
PGSTAT_ACCUM_DBCOUNT(xact_rollback);
PGSTAT_ACCUM_DBCOUNT(blocks_fetched);
PGSTAT_ACCUM_DBCOUNT(blocks_hit);
+ PGSTAT_ACCUM_DBCOUNT(metadata_blocks_fetched);
+ PGSTAT_ACCUM_DBCOUNT(metadata_blocks_hit);
+ PGSTAT_ACCUM_DBCOUNT(record_blocks_fetched);
+ PGSTAT_ACCUM_DBCOUNT(record_blocks_hit);
PGSTAT_ACCUM_DBCOUNT(tuples_returned);
PGSTAT_ACCUM_DBCOUNT(tuples_fetched);
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index eeb2d43cb1..ef8a40f4f0 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -880,6 +880,10 @@ pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
tabentry->blocks_fetched += lstats->counts.blocks_fetched;
tabentry->blocks_hit += lstats->counts.blocks_hit;
+ tabentry->metadata_blocks_fetched += lstats->counts.metadata_blocks_fetched;
+ tabentry->metadata_blocks_hit += lstats->counts.metadata_blocks_hit;
+ tabentry->record_blocks_fetched += lstats->counts.record_blocks_fetched;
+ tabentry->record_blocks_hit += lstats->counts.record_blocks_hit;
/* Clamp live_tuples in case of negative delta_live_tuples */
tabentry->live_tuples = Max(tabentry->live_tuples, 0);
@@ -897,6 +901,10 @@ pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
dbentry->tuples_deleted += lstats->counts.tuples_deleted;
dbentry->blocks_fetched += lstats->counts.blocks_fetched;
dbentry->blocks_hit += lstats->counts.blocks_hit;
+ dbentry->metadata_blocks_fetched += lstats->counts.metadata_blocks_fetched;
+ dbentry->metadata_blocks_hit += lstats->counts.metadata_blocks_hit;
+ dbentry->record_blocks_fetched += lstats->counts.record_blocks_fetched;
+ dbentry->record_blocks_hit += lstats->counts.record_blocks_hit;
return true;
}
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 97af7c6554..4b8aab1465 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -67,6 +67,18 @@ PG_STAT_GET_RELENTRY_INT64(blocks_fetched)
/* pg_stat_get_blocks_hit */
PG_STAT_GET_RELENTRY_INT64(blocks_hit)
+/* pg_stat_get_metadata_blocks_fetched */
+PG_STAT_GET_RELENTRY_INT64(metadata_blocks_fetched)
+
+/* pg_stat_get_metadata_blocks_hit */
+PG_STAT_GET_RELENTRY_INT64(metadata_blocks_hit)
+
+/* pg_stat_get_record_blocks_fetched */
+PG_STAT_GET_RELENTRY_INT64(record_blocks_fetched)
+
+/* pg_stat_get_record_blocks_hit */
+PG_STAT_GET_RELENTRY_INT64(record_blocks_hit)
+
/* pg_stat_get_dead_tuples */
PG_STAT_GET_RELENTRY_INT64(dead_tuples)
@@ -1034,6 +1046,18 @@ PG_STAT_GET_DBENTRY_INT64(blocks_fetched)
/* pg_stat_get_db_blocks_hit */
PG_STAT_GET_DBENTRY_INT64(blocks_hit)
+/* pg_stat_get_db_metadata_blocks_fetched */
+PG_STAT_GET_DBENTRY_INT64(metadata_blocks_fetched)
+
+/* pg_stat_get_db_metadata_blocks_hit */
+PG_STAT_GET_DBENTRY_INT64(metadata_blocks_hit)
+
+/* pg_stat_get_db_record_blocks_fetched */
+PG_STAT_GET_DBENTRY_INT64(record_blocks_fetched)
+
+/* pg_stat_get_db_record_blocks_hit */
+PG_STAT_GET_DBENTRY_INT64(record_blocks_hit)
+
/* pg_stat_get_db_conflict_bufferpin */
PG_STAT_GET_DBENTRY_INT64(conflict_bufferpin)
diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index ebca02588d..9e1effdf2d 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -1271,10 +1271,10 @@ extern int _bt_getrootheight(Relation rel);
extern void _bt_metaversion(Relation rel, bool *heapkeyspace,
bool *allequalimage);
extern void _bt_checkpage(Relation rel, Buffer buf);
-extern Buffer _bt_getbuf(Relation rel, BlockNumber blkno, int access);
+extern Buffer _bt_getbuf(Relation rel, BlockNumber blkno, int access, bool *hit);
extern Buffer _bt_allocbuf(Relation rel, Relation heaprel);
extern Buffer _bt_relandgetbuf(Relation rel, Buffer obuf,
- BlockNumber blkno, int access);
+ BlockNumber blkno, int access, bool *hit);
extern void _bt_relbuf(Relation rel, Buffer buf);
extern void _bt_lockbuf(Relation rel, Buffer buf, int access);
extern void _bt_unlockbuf(Relation rel, Buffer buf);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 62beb71da2..3fbcc71eb4 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5579,6 +5579,22 @@
proname => 'pg_stat_get_blocks_hit', provolatile => 's', proparallel => 'r',
prorettype => 'int8', proargtypes => 'oid',
prosrc => 'pg_stat_get_blocks_hit' },
+{ oid => '8888', descr => 'statistics: number of record blocks fetched',
+ proname => 'pg_stat_get_record_blocks_fetched', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_record_blocks_fetched' },
+{ oid => '8889', descr => 'statistics: number of record blocks found in cache',
+ proname => 'pg_stat_get_record_blocks_hit', provolatile => 's', proparallel => 'r',
+ prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_record_blocks_hit' },
+{ oid => '8890', descr => 'statistics: number of metadata blocks fetched',
+ proname => 'pg_stat_get_metadata_blocks_fetched', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_metadata_blocks_fetched' },
+{ oid => '8891', descr => 'statistics: number of metadata blocks found in cache',
+ proname => 'pg_stat_get_metadata_blocks_hit', provolatile => 's', proparallel => 'r',
+ prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_metadata_blocks_hit' },
{ oid => '2781', descr => 'statistics: last manual vacuum time for a table',
proname => 'pg_stat_get_last_vacuum_time', provolatile => 's',
proparallel => 'r', prorettype => 'timestamptz', proargtypes => 'oid',
@@ -5779,6 +5795,22 @@
proname => 'pg_stat_get_db_blocks_hit', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
prosrc => 'pg_stat_get_db_blocks_hit' },
+{ oid => '8892', descr => 'statistics: number of db record blocks fetched',
+ proname => 'pg_stat_get_db_record_blocks_fetched', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_db_record_blocks_fetched' },
+{ oid => '8893', descr => 'statistics: blocks found in cache for database',
+ proname => 'pg_stat_get_db_record_blocks_hit', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_db_record_blocks_hit' },
+{ oid => '8894', descr => 'statistics: number of metadata blocks fetched',
+ proname => 'pg_stat_get_db_metadata_blocks_fetched', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_db_metadata_blocks_fetched' },
+{ oid => '8895', descr => 'statistics: number of metadata blocks found in cache',
+ proname => 'pg_stat_get_db_metadata_blocks_hit', provolatile => 's', proparallel => 'r',
+ prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_db_metadata_blocks_hit' },
{ oid => '2758', descr => 'statistics: tuples returned for database',
proname => 'pg_stat_get_db_tuples_returned', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 378f2f2c2b..28ed6dc471 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -153,6 +153,11 @@ typedef struct PgStat_TableCounts
PgStat_Counter blocks_fetched;
PgStat_Counter blocks_hit;
+
+ PgStat_Counter metadata_blocks_fetched;
+ PgStat_Counter metadata_blocks_hit;
+ PgStat_Counter record_blocks_fetched;
+ PgStat_Counter record_blocks_hit;
} PgStat_TableCounts;
/* ----------
@@ -345,6 +350,10 @@ typedef struct PgStat_StatDBEntry
PgStat_Counter xact_rollback;
PgStat_Counter blocks_fetched;
PgStat_Counter blocks_hit;
+ PgStat_Counter metadata_blocks_fetched;
+ PgStat_Counter metadata_blocks_hit;
+ PgStat_Counter record_blocks_fetched;
+ PgStat_Counter record_blocks_hit;
PgStat_Counter tuples_returned;
PgStat_Counter tuples_fetched;
PgStat_Counter tuples_inserted;
@@ -440,6 +449,11 @@ typedef struct PgStat_StatTabEntry
PgStat_Counter blocks_fetched;
PgStat_Counter blocks_hit;
+ PgStat_Counter metadata_blocks_fetched;
+ PgStat_Counter metadata_blocks_hit;
+ PgStat_Counter record_blocks_fetched;
+ PgStat_Counter record_blocks_hit;
+
TimestampTz last_vacuum_time; /* user initiated vacuum */
PgStat_Counter vacuum_count;
TimestampTz last_autovacuum_time; /* autovacuum initiated */
@@ -711,6 +725,35 @@ extern void pgstat_report_analyze(Relation rel,
if (pgstat_should_count_relation(rel)) \
(rel)->pgstat_info->counts.blocks_hit++; \
} while (0)
+#define pgstat_count_metadata_index_buffer(rel, hit) \
+ do { \
+ if (pgstat_should_count_relation(rel)) { \
+ (rel)->pgstat_info->counts.metadata_blocks_fetched++; \
+ (rel)->pgstat_info->counts.metadata_blocks_hit += (hit); \
+ } \
+ } while (0)
+#define pgstat_count_record_index_buffer(rel, hit) \
+ do { \
+ if (pgstat_should_count_relation(rel)) { \
+ (rel)->pgstat_info->counts.record_blocks_fetched++; \
+ (rel)->pgstat_info->counts.record_blocks_hit += (hit); \
+ } \
+ } while (0)
+#define pgstat_count_index_buffer(rel, metadata, hit) \
+ do { \
+ if (pgstat_should_count_relation(rel)) { \
+ if ((metadata)) { \
+ (rel)->pgstat_info->counts.metadata_blocks_fetched++;\
+ if ((hit)) \
+ (rel)->pgstat_info->counts.metadata_blocks_hit++;\
+ } \
+ else { \
+ (rel)->pgstat_info->counts.record_blocks_fetched++; \
+ if ((hit)) \
+ (rel)->pgstat_info->counts.record_blocks_hit++; \
+ } \
+ } \
+ } while (0)
extern void pgstat_count_heap_insert(Relation rel, PgStat_Counter n);
extern void pgstat_count_heap_update(Relation rel, bool hot, bool newpage);
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index 41fdc1e769..aa079fde48 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -208,10 +208,10 @@ extern PrefetchBufferResult PrefetchBuffer(Relation reln, ForkNumber forkNum,
BlockNumber blockNum);
extern bool ReadRecentBuffer(RelFileLocator rlocator, ForkNumber forkNum,
BlockNumber blockNum, Buffer recent_buffer);
-extern Buffer ReadBuffer(Relation reln, BlockNumber blockNum);
+extern Buffer ReadBuffer(Relation reln, BlockNumber blockNum, bool *hit);
extern Buffer ReadBufferExtended(Relation reln, ForkNumber forkNum,
BlockNumber blockNum, ReadBufferMode mode,
- BufferAccessStrategy strategy);
+ BufferAccessStrategy strategy, bool *hit);
extern Buffer ReadBufferWithoutRelcache(RelFileLocator rlocator,
ForkNumber forkNum, BlockNumber blockNum,
ReadBufferMode mode, BufferAccessStrategy strategy,
@@ -220,12 +220,14 @@ extern Buffer ReadBufferWithoutRelcache(RelFileLocator rlocator,
extern bool StartReadBuffer(ReadBuffersOperation *operation,
Buffer *buffer,
BlockNumber blocknum,
- int flags);
+ int flags,
+ bool *hit);
extern bool StartReadBuffers(ReadBuffersOperation *operation,
Buffer *buffers,
BlockNumber blockNum,
int *nblocks,
- int flags);
+ int flags,
+ bool *hit);
extern void WaitReadBuffers(ReadBuffersOperation *operation);
extern void ReleaseBuffer(Buffer buffer);
@@ -236,7 +238,7 @@ extern void MarkBufferDirty(Buffer buffer);
extern void IncrBufferRefCount(Buffer buffer);
extern void CheckBufferIsPinnedOnce(Buffer buffer);
extern Buffer ReleaseAndReadBuffer(Buffer buffer, Relation relation,
- BlockNumber blockNum);
+ BlockNumber blockNum, bool *hit);
extern Buffer ExtendBufferedRel(BufferManagerRelation bmr,
ForkNumber forkNum,
diff --git a/src/test/modules/test_aio/test_aio.c b/src/test/modules/test_aio/test_aio.c
index 1d776010ef..dbcb9d0f4c 100644
--- a/src/test/modules/test_aio/test_aio.c
+++ b/src/test/modules/test_aio/test_aio.c
@@ -211,7 +211,7 @@ modify_rel_block(PG_FUNCTION_ARGS)
rel = relation_open(relid, AccessExclusiveLock);
buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno,
- RBM_ZERO_ON_ERROR, NULL);
+ RBM_ZERO_ON_ERROR, NULL, NULL);
LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
@@ -312,7 +312,7 @@ create_toy_buffer(Relation rel, BlockNumber blkno)
bool was_pinned = false;
/* place buffer in shared buffers without erroring out */
- buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_ZERO_AND_LOCK, NULL);
+ buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_ZERO_AND_LOCK, NULL, NULL);
LockBuffer(buf, BUFFER_LOCK_UNLOCK);
if (RelationUsesLocalBuffers(rel))
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 6cf828ca8d..62d1ec90f2 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1868,6 +1868,10 @@ pg_stat_database| SELECT oid AS datid,
pg_stat_get_db_xact_rollback(oid) AS xact_rollback,
(pg_stat_get_db_blocks_fetched(oid) - pg_stat_get_db_blocks_hit(oid)) AS blks_read,
pg_stat_get_db_blocks_hit(oid) AS blks_hit,
+ (pg_stat_get_db_metadata_blocks_fetched(oid) - pg_stat_get_db_metadata_blocks_hit(oid)) AS metadata_blks_read,
+ pg_stat_get_db_metadata_blocks_hit(oid) AS metadata_blks_hit,
+ (pg_stat_get_db_record_blocks_fetched(oid) - pg_stat_get_db_record_blocks_hit(oid)) AS record_blks_read,
+ pg_stat_get_db_record_blocks_hit(oid) AS record_blks_hit,
pg_stat_get_db_tuples_returned(oid) AS tup_returned,
pg_stat_get_db_tuples_fetched(oid) AS tup_fetched,
pg_stat_get_db_tuples_inserted(oid) AS tup_inserted,
@@ -2360,7 +2364,11 @@ pg_statio_all_indexes| SELECT c.oid AS relid,
c.relname,
i.relname AS indexrelname,
(pg_stat_get_blocks_fetched(i.oid) - pg_stat_get_blocks_hit(i.oid)) AS idx_blks_read,
- pg_stat_get_blocks_hit(i.oid) AS idx_blks_hit
+ pg_stat_get_blocks_hit(i.oid) AS idx_blks_hit,
+ (pg_stat_get_metadata_blocks_fetched(i.oid) - pg_stat_get_metadata_blocks_hit(i.oid)) AS idx_metadata_blks_read,
+ pg_stat_get_metadata_blocks_hit(i.oid) AS idx_metadata_blks_hit,
+ (pg_stat_get_record_blocks_fetched(i.oid) - pg_stat_get_record_blocks_hit(i.oid)) AS idx_record_blks_read,
+ pg_stat_get_record_blocks_hit(i.oid) AS idx_record_blks_hit
FROM (((pg_class c
JOIN pg_index x ON ((c.oid = x.indrelid)))
JOIN pg_class i ON ((i.oid = x.indexrelid)))
@@ -2381,6 +2389,10 @@ pg_statio_all_tables| SELECT c.oid AS relid,
pg_stat_get_blocks_hit(c.oid) AS heap_blks_hit,
i.idx_blks_read,
i.idx_blks_hit,
+ i.idx_metadata_blks_read,
+ i.idx_metadata_blks_hit,
+ i.idx_record_blks_read,
+ i.idx_record_blks_hit,
(pg_stat_get_blocks_fetched(t.oid) - pg_stat_get_blocks_hit(t.oid)) AS toast_blks_read,
pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit,
x.idx_blks_read AS tidx_blks_read,
@@ -2389,7 +2401,11 @@ pg_statio_all_tables| SELECT c.oid AS relid,
LEFT JOIN pg_class t ON ((c.reltoastrelid = t.oid)))
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
LEFT JOIN LATERAL ( SELECT (sum((pg_stat_get_blocks_fetched(pg_index.indexrelid) - pg_stat_get_blocks_hit(pg_index.indexrelid))))::bigint AS idx_blks_read,
- (sum(pg_stat_get_blocks_hit(pg_index.indexrelid)))::bigint AS idx_blks_hit
+ (sum(pg_stat_get_blocks_hit(pg_index.indexrelid)))::bigint AS idx_blks_hit,
+ (sum((pg_stat_get_metadata_blocks_fetched(pg_index.indexrelid) - pg_stat_get_metadata_blocks_hit(pg_index.indexrelid))))::bigint AS idx_metadata_blks_read,
+ (sum(pg_stat_get_metadata_blocks_hit(pg_index.indexrelid)))::bigint AS idx_metadata_blks_hit,
+ (sum((pg_stat_get_record_blocks_fetched(pg_index.indexrelid) - pg_stat_get_record_blocks_hit(pg_index.indexrelid))))::bigint AS idx_record_blks_read,
+ (sum(pg_stat_get_record_blocks_hit(pg_index.indexrelid)))::bigint AS idx_record_blks_hit
FROM pg_index
WHERE (pg_index.indrelid = c.oid)) i ON (true))
LEFT JOIN LATERAL ( SELECT (sum((pg_stat_get_blocks_fetched(pg_index.indexrelid) - pg_stat_get_blocks_hit(pg_index.indexrelid))))::bigint AS idx_blks_read,
@@ -2403,7 +2419,11 @@ pg_statio_sys_indexes| SELECT relid,
relname,
indexrelname,
idx_blks_read,
- idx_blks_hit
+ idx_blks_hit,
+ idx_metadata_blks_read,
+ idx_metadata_blks_hit,
+ idx_record_blks_read,
+ idx_record_blks_hit
FROM pg_statio_all_indexes
WHERE ((schemaname = ANY (ARRAY['pg_catalog'::name, 'information_schema'::name])) OR (schemaname ~ '^pg_toast'::text));
pg_statio_sys_sequences| SELECT relid,
@@ -2420,6 +2440,10 @@ pg_statio_sys_tables| SELECT relid,
heap_blks_hit,
idx_blks_read,
idx_blks_hit,
+ idx_metadata_blks_read,
+ idx_metadata_blks_hit,
+ idx_record_blks_read,
+ idx_record_blks_hit,
toast_blks_read,
toast_blks_hit,
tidx_blks_read,
@@ -2432,7 +2456,11 @@ pg_statio_user_indexes| SELECT relid,
relname,
indexrelname,
idx_blks_read,
- idx_blks_hit
+ idx_blks_hit,
+ idx_metadata_blks_read,
+ idx_metadata_blks_hit,
+ idx_record_blks_read,
+ idx_record_blks_hit
FROM pg_statio_all_indexes
WHERE ((schemaname <> ALL (ARRAY['pg_catalog'::name, 'information_schema'::name])) AND (schemaname !~ '^pg_toast'::text));
pg_statio_user_sequences| SELECT relid,
@@ -2449,6 +2477,10 @@ pg_statio_user_tables| SELECT relid,
heap_blks_hit,
idx_blks_read,
idx_blks_hit,
+ idx_metadata_blks_read,
+ idx_metadata_blks_hit,
+ idx_record_blks_read,
+ idx_record_blks_hit,
toast_blks_read,
toast_blks_hit,
tidx_blks_read,
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 776f1ad0e5..da41526f16 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1868,4 +1868,148 @@ SELECT * FROM check_estimated_rows('SELECT * FROM table_fillfactor');
(1 row)
DROP TABLE table_fillfactor;
+-- brin indexes: test stats collection for metadata and record index block
+-- hits and reads adding up to idx_blks_read and idx_blks_hit respectively
+SELECT count(*)
+ FROM brin_test
+ WHERE a = 19 AND b = 89;
+ count
+-------
+ 1
+(1 row)
+
+-- ensure pending stats are flushed
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush
+--------------------------
+
+(1 row)
+
+-- check effects
+BEGIN;
+SET LOCAL stats_fetch_consistency = snapshot;
+SELECT idx_metadata_blks_hit + idx_record_blks_hit = idx_blks_hit,
+ idx_metadata_blks_read + idx_record_blks_read = idx_blks_read
+ FROM pg_statio_all_indexes
+ WHERE indexrelname='brin_test_a_idx';
+ ?column? | ?column?
+----------+----------
+ t | t
+(1 row)
+
+COMMIT;
+-- gist indexes: test stats collection for metadata and record index block
+-- hits and reads adding up to idx_blks_read and idx_blks_hit respectively
+select count(*) from gist_point_tbl where p <@ box(point(0,0), point(200, 200));
+ count
+-------
+ 10
+(1 row)
+
+-- ensure pending stats are flushed
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush
+--------------------------
+
+(1 row)
+
+-- check effects
+BEGIN;
+SET LOCAL stats_fetch_consistency = snapshot;
+SELECT idx_metadata_blks_hit + idx_record_blks_hit = idx_blks_hit,
+ idx_metadata_blks_read + idx_record_blks_read = idx_blks_read
+ FROM pg_statio_all_indexes
+ WHERE indexrelname='gist_pointidx';
+ ?column? | ?column?
+----------+----------
+ t | t
+(1 row)
+
+COMMIT;
+-- hash indexes: test stats collection for metadata and record index block
+-- hits and reads adding up to idx_blks_read and idx_blks_hit respectively
+SELECT count(*)
+ FROM hash_name_heap
+ WHERE random = '1505703298';
+ count
+-------
+ 1
+(1 row)
+
+-- ensure pending stats are flushed
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush
+--------------------------
+
+(1 row)
+
+-- check effects
+BEGIN;
+SET LOCAL stats_fetch_consistency = snapshot;
+SELECT idx_metadata_blks_hit + idx_record_blks_hit = idx_blks_hit,
+ idx_metadata_blks_read + idx_record_blks_read = idx_blks_read
+ FROM pg_statio_all_indexes
+ WHERE indexrelname='hash_name_index';
+ ?column? | ?column?
+----------+----------
+ t | t
+(1 row)
+
+COMMIT;
+-- spgist indexes: test stats collection for metadata and record index block
+-- hits and reads adding up to idx_blks_read and idx_blks_hit respectively
+select count(*) from spgist_point_tbl where p <@ box(point(0,0), point(200, 200));
+ count
+-------
+ 9
+(1 row)
+
+-- ensure pending stats are flushed
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush
+--------------------------
+
+(1 row)
+
+-- check effects
+BEGIN;
+SET LOCAL stats_fetch_consistency = snapshot;
+SELECT idx_metadata_blks_hit + idx_record_blks_hit = idx_blks_hit,
+ idx_metadata_blks_read + idx_record_blks_read = idx_blks_read
+ FROM pg_statio_all_indexes
+ WHERE indexrelname='spgist_point_idx';
+ ?column? | ?column?
+----------+----------
+ t | t
+(1 row)
+
+COMMIT;
+-- b-tree indexes: test stats collection for metadata and record index block
+-- hits and reads adding up to idx_blks_read and idx_blks_hit respectively
+select count(*) from tenk2 where unique1 = '1504';
+ count
+-------
+ 1
+(1 row)
+
+-- ensure pending stats are flushed
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush
+--------------------------
+
+(1 row)
+
+-- check effects
+BEGIN;
+SET LOCAL stats_fetch_consistency = snapshot;
+SELECT idx_metadata_blks_hit + idx_record_blks_hit = idx_blks_hit,
+ idx_metadata_blks_read + idx_record_blks_read = idx_blks_read
+ FROM pg_statio_all_indexes
+ WHERE indexrelname='tenk2_unique1';
+ ?column? | ?column?
+----------+----------
+ t | t
+(1 row)
+
+COMMIT;
-- End of Stats Test
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 232ab8db8f..7815b8d1c0 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -925,4 +925,98 @@ SELECT * FROM check_estimated_rows('SELECT * FROM table_fillfactor');
DROP TABLE table_fillfactor;
+-- brin indexes: test stats collection for metadata and record index block
+-- hits and reads adding up to idx_blks_read and idx_blks_hit respectively
+SELECT count(*)
+ FROM brin_test
+ WHERE a = 19 AND b = 89;
+
+-- ensure pending stats are flushed
+SELECT pg_stat_force_next_flush();
+
+-- check effects
+BEGIN;
+SET LOCAL stats_fetch_consistency = snapshot;
+
+SELECT idx_metadata_blks_hit + idx_record_blks_hit = idx_blks_hit,
+ idx_metadata_blks_read + idx_record_blks_read = idx_blks_read
+ FROM pg_statio_all_indexes
+ WHERE indexrelname='brin_test_a_idx';
+
+COMMIT;
+
+-- gist indexes: test stats collection for metadata and record index block
+-- hits and reads adding up to idx_blks_read and idx_blks_hit respectively
+select count(*) from gist_point_tbl where p <@ box(point(0,0), point(200, 200));
+
+-- ensure pending stats are flushed
+SELECT pg_stat_force_next_flush();
+
+-- check effects
+BEGIN;
+SET LOCAL stats_fetch_consistency = snapshot;
+
+SELECT idx_metadata_blks_hit + idx_record_blks_hit = idx_blks_hit,
+ idx_metadata_blks_read + idx_record_blks_read = idx_blks_read
+ FROM pg_statio_all_indexes
+ WHERE indexrelname='gist_pointidx';
+
+COMMIT;
+
+-- hash indexes: test stats collection for metadata and record index block
+-- hits and reads adding up to idx_blks_read and idx_blks_hit respectively
+SELECT count(*)
+ FROM hash_name_heap
+ WHERE random = '1505703298';
+
+-- ensure pending stats are flushed
+SELECT pg_stat_force_next_flush();
+
+-- check effects
+BEGIN;
+SET LOCAL stats_fetch_consistency = snapshot;
+
+SELECT idx_metadata_blks_hit + idx_record_blks_hit = idx_blks_hit,
+ idx_metadata_blks_read + idx_record_blks_read = idx_blks_read
+ FROM pg_statio_all_indexes
+ WHERE indexrelname='hash_name_index';
+
+COMMIT;
+
+-- spgist indexes: test stats collection for metadata and record index block
+-- hits and reads adding up to idx_blks_read and idx_blks_hit respectively
+select count(*) from spgist_point_tbl where p <@ box(point(0,0), point(200, 200));
+
+-- ensure pending stats are flushed
+SELECT pg_stat_force_next_flush();
+
+-- check effects
+BEGIN;
+SET LOCAL stats_fetch_consistency = snapshot;
+
+SELECT idx_metadata_blks_hit + idx_record_blks_hit = idx_blks_hit,
+ idx_metadata_blks_read + idx_record_blks_read = idx_blks_read
+ FROM pg_statio_all_indexes
+ WHERE indexrelname='spgist_point_idx';
+
+COMMIT;
+
+-- b-tree indexes: test stats collection for metadata and record index block
+-- hits and reads adding up to idx_blks_read and idx_blks_hit respectively
+select count(*) from tenk2 where unique1 = '1504';
+
+-- ensure pending stats are flushed
+SELECT pg_stat_force_next_flush();
+
+-- check effects
+BEGIN;
+SET LOCAL stats_fetch_consistency = snapshot;
+
+SELECT idx_metadata_blks_hit + idx_record_blks_hit = idx_blks_hit,
+ idx_metadata_blks_read + idx_record_blks_read = idx_blks_read
+ FROM pg_statio_all_indexes
+ WHERE indexrelname='tenk2_unique1';
+
+COMMIT;
+
-- End of Stats Test
--
2.39.5 (Apple Git-154)
Hi,
Just attaching v2 of the patch. It’s a trimmed down version compared to
what I started with. For context, this is the original discussion on
which this work is based on:
/messages/by-id/CAH2-WzmdZqxCS1widYzjDAM+Z-Jz=ejJoaWXDVw9Qy1UsK0tLA@mail.gmail.com
After applying the patch, you can run the following:
create table test (id bigint primary key);
insert into test select * from generate_series(1, 10_000);
select * from pg_statio_all_indexes where indexrelname = 'test_pkey';
The result will contain a new column: idx_metadata_blks.
Looking forward to your feedback.
Thanks,
Mircea
Attachments:
v2-0001-Add-metadata-non-leaf-index-block-statistics-to-p.patchtext/plain; charset=UTF-8; name=v2-0001-Add-metadata-non-leaf-index-block-statistics-to-p.patchDownload
From d99d1d4d1f39c0068be34fa5c9d0f6efa6c70d2c Mon Sep 17 00:00:00 2001
From: Mircea Cadariu <cadariu.mircea@gmail.com>
Date: Mon, 30 Jun 2025 08:28:37 +0100
Subject: [PATCH v2] Add metadata (non-leaf) index block statistics to pg_stat
functions and system views.
This commit contains the changes for counting and exposing when index metadata block
reads (either hits or from disk) are issued from the index handling code.
---
doc/src/sgml/monitoring.sgml | 28 ++++++++++++++++++++
src/backend/access/brin/brin.c | 1 +
src/backend/access/brin/brin_revmap.c | 5 ++++
src/backend/access/gin/ginbtree.c | 7 +++++
src/backend/access/gin/ginfast.c | 8 ++++++
src/backend/access/gin/ginget.c | 7 ++++-
src/backend/access/gin/ginutil.c | 3 +++
src/backend/access/gist/gist.c | 11 ++++++++
src/backend/access/gist/gistbuild.c | 5 ++++
src/backend/access/gist/gistget.c | 2 ++
src/backend/access/hash/hashpage.c | 7 +++++
src/backend/access/nbtree/nbtinsert.c | 7 +++++
src/backend/access/nbtree/nbtpage.c | 12 +++++++++
src/backend/access/nbtree/nbtsearch.c | 13 +++++++++
src/backend/access/nbtree/nbtutils.c | 1 +
src/backend/access/spgist/spgdoinsert.c | 3 +++
src/backend/access/spgist/spgscan.c | 2 ++
src/backend/access/spgist/spgutils.c | 6 +++++
src/backend/catalog/system_views.sql | 9 +++++--
src/backend/utils/activity/pgstat_database.c | 1 +
src/backend/utils/activity/pgstat_relation.c | 2 ++
src/backend/utils/adt/pgstatfuncs.c | 6 +++++
src/include/catalog/pg_proc.dat | 8 ++++++
src/include/pgstat.h | 16 +++++++++++
src/test/regress/expected/rules.out | 16 ++++++++---
src/test/regress/expected/stats.out | 27 +++++++++++++++++++
src/test/regress/sql/stats.sql | 17 ++++++++++++
27 files changed, 223 insertions(+), 7 deletions(-)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 4265a22d4d..beb1dded0b 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3420,6 +3420,15 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>idx_metadata_blks</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of metadata (non-leaf) index disk blocks hit or read in this database
+ </para></entry>
+ </row>
+
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>tup_returned</structfield> <type>bigint</type>
@@ -4384,6 +4393,15 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>idx_metadata_blks</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of metadata (non-leaf) index disk blocks hit or read from all indexes in this table
+ </para></entry>
+ </row>
+
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>toast_blks_read</structfield> <type>bigint</type>
@@ -4519,6 +4537,16 @@ description | Waiting for a newly initialized WAL file to reach durable storage
Number of buffer hits in this index
</para></entry>
</row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>idx_metadata_blks</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of metadata (non-leaf) index disk blocks read or hit in this index
+ </para></entry>
+ </row>
+
</tbody>
</tgroup>
</table>
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index 4204088fa0..0e7b6f3db3 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -1652,6 +1652,7 @@ brinGetStats(Relation index, BrinStatsData *stats)
BrinMetaPageData *metadata;
metabuffer = ReadBuffer(index, BRIN_METAPAGE_BLKNO);
+ pgstat_count_metadata_buffer(index);
LockBuffer(metabuffer, BUFFER_LOCK_SHARE);
metapage = BufferGetPage(metabuffer);
metadata = (BrinMetaPageData *) PageGetContents(metapage);
diff --git a/src/backend/access/brin/brin_revmap.c b/src/backend/access/brin/brin_revmap.c
index 4e380ecc71..0172ad83d6 100644
--- a/src/backend/access/brin/brin_revmap.c
+++ b/src/backend/access/brin/brin_revmap.c
@@ -27,6 +27,7 @@
#include "access/brin_xlog.h"
#include "access/rmgr.h"
#include "access/xloginsert.h"
+#include "pgstat.h"
#include "miscadmin.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"
@@ -75,6 +76,7 @@ brinRevmapInitialize(Relation idxrel, BlockNumber *pagesPerRange)
Page page;
meta = ReadBuffer(idxrel, BRIN_METAPAGE_BLKNO);
+ pgstat_count_metadata_buffer(idxrel);
LockBuffer(meta, BUFFER_LOCK_SHARE);
page = BufferGetPage(meta);
metadata = (BrinMetaPageData *) PageGetContents(page);
@@ -232,6 +234,7 @@ brinGetTupleForHeapBlock(BrinRevmap *revmap, BlockNumber heapBlk,
Assert(mapBlk != InvalidBlockNumber);
revmap->rm_currBuf = ReadBuffer(revmap->rm_irel, mapBlk);
+ pgstat_count_metadata_buffer(idxRel);
}
LockBuffer(revmap->rm_currBuf, BUFFER_LOCK_SHARE);
@@ -486,6 +489,7 @@ revmap_get_buffer(BrinRevmap *revmap, BlockNumber heapBlk)
ReleaseBuffer(revmap->rm_currBuf);
revmap->rm_currBuf = ReadBuffer(revmap->rm_irel, mapBlk);
+ pgstat_count_metadata_buffer(revmap->rm_irel);
}
return revmap->rm_currBuf;
@@ -554,6 +558,7 @@ revmap_physical_extend(BrinRevmap *revmap)
if (mapBlk < nblocks)
{
buf = ReadBuffer(irel, mapBlk);
+ pgstat_count_metadata_buffer(irel);
LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
page = BufferGetPage(buf);
}
diff --git a/src/backend/access/gin/ginbtree.c b/src/backend/access/gin/ginbtree.c
index 644d484ea5..0d8778f589 100644
--- a/src/backend/access/gin/ginbtree.c
+++ b/src/backend/access/gin/ginbtree.c
@@ -18,6 +18,7 @@
#include "access/ginxlog.h"
#include "access/xloginsert.h"
#include "miscadmin.h"
+#include "pgstat.h"
#include "storage/predicate.h"
#include "utils/injection_point.h"
#include "utils/memutils.h"
@@ -104,6 +105,8 @@ ginFindLeafPage(GinBtree btree, bool searchMode,
page = BufferGetPage(stack->buffer);
+ pgstat_count_metadata_buffer_if(!GinPageIsLeaf(page), btree->index);
+
access = ginTraverseLock(stack->buffer, searchMode);
/*
@@ -191,6 +194,8 @@ ginStepRight(Buffer buffer, Relation index, int lockmode)
if (isLeaf != GinPageIsLeaf(page) || isData != GinPageIsData(page))
elog(ERROR, "right sibling of GIN page is of different type");
+ pgstat_count_metadata_buffer_if(!GinPageIsLeaf(page), index);
+
return nextbuffer;
}
@@ -254,6 +259,8 @@ ginFindParents(GinBtree btree, GinBtreeStack *stack)
page = BufferGetPage(buffer);
if (GinPageIsLeaf(page))
elog(ERROR, "Lost path");
+ else
+ pgstat_count_metadata_buffer(btree->index);
if (GinPageIsIncompleteSplit(page))
{
diff --git a/src/backend/access/gin/ginfast.c b/src/backend/access/gin/ginfast.c
index a6d88572cc..328c3c19e5 100644
--- a/src/backend/access/gin/ginfast.c
+++ b/src/backend/access/gin/ginfast.c
@@ -25,6 +25,7 @@
#include "catalog/pg_am.h"
#include "commands/vacuum.h"
#include "miscadmin.h"
+#include "pgstat.h"
#include "port/pg_bitutils.h"
#include "postmaster/autovacuum.h"
#include "storage/indexfsm.h"
@@ -240,6 +241,7 @@ ginHeapTupleFastInsert(GinState *ginstate, GinTupleCollector *collector)
data.newRightlink = data.prevTail = InvalidBlockNumber;
metabuffer = ReadBuffer(index, GIN_METAPAGE_BLKNO);
+ pgstat_count_metadata_buffer(index);
metapage = BufferGetPage(metabuffer);
/*
@@ -320,6 +322,7 @@ ginHeapTupleFastInsert(GinState *ginstate, GinTupleCollector *collector)
data.newRightlink = sublist.head;
buffer = ReadBuffer(index, metadata->tail);
+ pgstat_count_metadata_buffer(index);
LockBuffer(buffer, GIN_EXCLUSIVE);
page = BufferGetPage(buffer);
@@ -359,6 +362,7 @@ ginHeapTupleFastInsert(GinState *ginstate, GinTupleCollector *collector)
CheckForSerializableConflictIn(index, NULL, GIN_METAPAGE_BLKNO);
buffer = ReadBuffer(index, metadata->tail);
+ pgstat_count_metadata_buffer(index);
LockBuffer(buffer, GIN_EXCLUSIVE);
page = BufferGetPage(buffer);
@@ -576,6 +580,7 @@ shiftList(Relation index, Buffer metabuffer, BlockNumber newHead,
{
freespace[data.ndeleted] = blknoToDelete;
buffers[data.ndeleted] = ReadBuffer(index, blknoToDelete);
+ pgstat_count_metadata_buffer(index);
LockBuffer(buffers[data.ndeleted], GIN_EXCLUSIVE);
page = BufferGetPage(buffers[data.ndeleted]);
@@ -828,6 +833,7 @@ ginInsertCleanup(GinState *ginstate, bool full_clean,
}
metabuffer = ReadBuffer(index, GIN_METAPAGE_BLKNO);
+ pgstat_count_metadata_buffer(index);
LockBuffer(metabuffer, GIN_SHARE);
metapage = BufferGetPage(metabuffer);
metadata = GinPageGetMeta(metapage);
@@ -853,6 +859,7 @@ ginInsertCleanup(GinState *ginstate, bool full_clean,
buffer = ReadBuffer(index, blkno);
LockBuffer(buffer, GIN_SHARE);
page = BufferGetPage(buffer);
+ pgstat_count_metadata_buffer(index);
LockBuffer(metabuffer, GIN_UNLOCK);
@@ -1004,6 +1011,7 @@ ginInsertCleanup(GinState *ginstate, bool full_clean,
*/
vacuum_delay_point(false);
buffer = ReadBuffer(index, blkno);
+ pgstat_count_metadata_buffer(index);
LockBuffer(buffer, GIN_SHARE);
page = BufferGetPage(buffer);
}
diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c
index f29ccd3c2d..86747beb39 100644
--- a/src/backend/access/gin/ginget.c
+++ b/src/backend/access/gin/ginget.c
@@ -18,6 +18,7 @@
#include "access/relscan.h"
#include "common/pg_prng.h"
#include "miscadmin.h"
+#include "pgstat.h"
#include "storage/predicate.h"
#include "utils/datum.h"
#include "utils/memutils.h"
@@ -1491,9 +1492,10 @@ scanGetCandidate(IndexScanDesc scan, pendingPosition *pos)
* Here we must prevent deletion of next page by insertcleanup
* process, which may be trying to obtain exclusive lock on
* current page. So, we lock next page before releasing the
- * current one
+ * current one.
*/
Buffer tmpbuf = ReadBuffer(scan->indexRelation, blkno);
+ pgstat_count_metadata_buffer(scan->indexRelation);
LockBuffer(tmpbuf, GIN_SHARE);
UnlockReleaseBuffer(pos->pendingBuffer);
@@ -1844,6 +1846,8 @@ scanPendingInsert(IndexScanDesc scan, TIDBitmap *tbm, int64 *ntids)
Page page;
BlockNumber blkno;
+ pgstat_count_metadata_buffer(scan->indexRelation);
+
*ntids = 0;
/*
@@ -1868,6 +1872,7 @@ scanPendingInsert(IndexScanDesc scan, TIDBitmap *tbm, int64 *ntids)
}
pos.pendingBuffer = ReadBuffer(scan->indexRelation, blkno);
+ pgstat_count_metadata_buffer(scan->indexRelation);
LockBuffer(pos.pendingBuffer, GIN_SHARE);
pos.firstOffset = FirstOffsetNumber;
UnlockReleaseBuffer(metabuffer);
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index 78f7b7a249..9cc6e6d6c3 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -22,6 +22,7 @@
#include "catalog/pg_type.h"
#include "commands/progress.h"
#include "commands/vacuum.h"
+#include "pgstat.h"
#include "miscadmin.h"
#include "storage/indexfsm.h"
#include "utils/builtins.h"
@@ -632,6 +633,7 @@ ginGetStats(Relation index, GinStatsData *stats)
GinMetaPageData *metadata;
metabuffer = ReadBuffer(index, GIN_METAPAGE_BLKNO);
+ pgstat_count_metadata_buffer(index);
LockBuffer(metabuffer, GIN_SHARE);
metapage = BufferGetPage(metabuffer);
metadata = GinPageGetMeta(metapage);
@@ -659,6 +661,7 @@ ginUpdateStats(Relation index, const GinStatsData *stats, bool is_build)
GinMetaPageData *metadata;
metabuffer = ReadBuffer(index, GIN_METAPAGE_BLKNO);
+ pgstat_count_metadata_buffer(index);
LockBuffer(metabuffer, GIN_EXCLUSIVE);
metapage = BufferGetPage(metabuffer);
metadata = GinPageGetMeta(metapage);
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index 7b24380c97..2e5cc91296 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -20,6 +20,7 @@
#include "catalog/pg_collation.h"
#include "commands/vacuum.h"
#include "miscadmin.h"
+#include "pgstat.h"
#include "nodes/execnodes.h"
#include "storage/predicate.h"
#include "utils/fmgrprotos.h"
@@ -684,7 +685,10 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace,
}
if (XLogRecPtrIsInvalid(stack->lsn))
+ {
stack->buffer = ReadBuffer(state.r, stack->blkno);
+ pgstat_count_metadata_buffer(state.r);
+ }
/*
* Be optimistic and grab shared lock first. Swap it for an exclusive
@@ -949,6 +953,8 @@ gistFindPath(Relation r, BlockNumber child, OffsetNumber *downlinkoffnum)
UnlockReleaseBuffer(buffer);
break;
}
+ else
+ pgstat_count_metadata_buffer(r);
/* currently, internal pages are never deleted */
Assert(!GistPageIsDeleted(page));
@@ -1096,6 +1102,9 @@ gistFindCorrectParent(Relation r, GISTInsertStack *child, bool is_build)
break;
}
parent->buffer = ReadBuffer(r, parent->blkno);
+
+ pgstat_count_metadata_buffer(r);
+
LockBuffer(parent->buffer, GIST_EXCLUSIVE);
gistcheckpage(r, parent->buffer);
parent->page = (Page) BufferGetPage(parent->buffer);
@@ -1122,6 +1131,7 @@ gistFindCorrectParent(Relation r, GISTInsertStack *child, bool is_build)
{
ptr->buffer = ReadBuffer(r, ptr->blkno);
ptr->page = (Page) BufferGetPage(ptr->buffer);
+ pgstat_count_metadata_buffer_if(!GistPageIsLeaf(ptr->page), r);
ptr = ptr->parent;
}
@@ -1236,6 +1246,7 @@ gistfixsplit(GISTInsertState *state, GISTSTATE *giststate)
{
/* lock next page */
buf = ReadBuffer(state->r, GistPageGetOpaque(page)->rightlink);
+ pgstat_count_metadata_buffer_if(!GistPageIsLeaf((Page) page), state->r);
LockBuffer(buf, GIST_EXCLUSIVE);
}
else
diff --git a/src/backend/access/gist/gistbuild.c b/src/backend/access/gist/gistbuild.c
index 9e707167d9..8b2abc30ec 100644
--- a/src/backend/access/gist/gistbuild.c
+++ b/src/backend/access/gist/gistbuild.c
@@ -39,6 +39,7 @@
#include "access/tableam.h"
#include "access/xloginsert.h"
#include "miscadmin.h"
+#include "pgstat.h"
#include "nodes/execnodes.h"
#include "optimizer/optimizer.h"
#include "storage/bufmgr.h"
@@ -967,6 +968,7 @@ gistProcessItup(GISTBuildState *buildstate, IndexTuple itup,
*/
buffer = ReadBuffer(indexrel, blkno);
+ pgstat_count_metadata_buffer(indexrel);
LockBuffer(buffer, GIST_EXCLUSIVE);
page = (Page) BufferGetPage(buffer);
@@ -1248,6 +1250,7 @@ gistBufferingFindCorrectParent(GISTBuildState *buildstate,
buffer = ReadBuffer(buildstate->indexrel, parent);
page = BufferGetPage(buffer);
+ pgstat_count_metadata_buffer(buildstate->indexrel);
LockBuffer(buffer, GIST_EXCLUSIVE);
gistcheckpage(buildstate->indexrel, buffer);
maxoff = PageGetMaxOffsetNumber(page);
@@ -1457,6 +1460,8 @@ gistGetMaxLevel(Relation index)
break;
}
+ pgstat_count_metadata_buffer(index);
+
/*
* Pick the first downlink on the page, and follow it. It doesn't
* matter which downlink we choose, the tree has the same depth
diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c
index 387d997234..7f31331eed 100644
--- a/src/backend/access/gist/gistget.c
+++ b/src/backend/access/gist/gistget.c
@@ -346,6 +346,8 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem,
gistcheckpage(scan->indexRelation, buffer);
page = BufferGetPage(buffer);
opaque = GistPageGetOpaque(page);
+ pgstat_count_metadata_buffer_if(!GistPageIsLeaf((Page) page),
+ scan->indexRelation);
/*
* Check if we need to follow the rightlink. We need to follow it if the
diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index b8e5bd005e..ac282ec11e 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -32,6 +32,7 @@
#include "access/hash_xlog.h"
#include "access/xloginsert.h"
#include "miscadmin.h"
+#include "pgstat.h"
#include "port/pg_bitutils.h"
#include "storage/predicate.h"
#include "storage/smgr.h"
@@ -76,6 +77,8 @@ _hash_getbuf(Relation rel, BlockNumber blkno, int access, int flags)
buf = ReadBuffer(rel, blkno);
+ pgstat_count_metadata_buffer_if(flags == LH_META_PAGE, rel);
+
if (access != HASH_NOLOCK)
LockBuffer(buf, access);
@@ -102,6 +105,8 @@ _hash_getbuf_with_condlock_cleanup(Relation rel, BlockNumber blkno, int flags)
buf = ReadBuffer(rel, blkno);
+ pgstat_count_metadata_buffer_if(flags == LH_META_PAGE, rel);
+
if (!ConditionalLockBufferForCleanup(buf))
{
ReleaseBuffer(buf);
@@ -247,6 +252,8 @@ _hash_getbuf_with_strategy(Relation rel, BlockNumber blkno,
buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL, bstrategy);
+ pgstat_count_metadata_buffer_if(flags == LH_META_PAGE, rel);
+
if (access != HASH_NOLOCK)
LockBuffer(buf, access);
diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index aa82cede30..7802be2d9e 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -21,6 +21,7 @@
#include "access/xloginsert.h"
#include "common/int.h"
#include "common/pg_prng.h"
+#include "pgstat.h"
#include "lib/qunique.h"
#include "miscadmin.h"
#include "storage/lmgr.h"
@@ -1260,6 +1261,7 @@ _bt_insertonpg(Relation rel,
Assert(BufferIsValid(cbuf));
metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_WRITE);
+ pgstat_count_metadata_buffer(rel);
metapg = BufferGetPage(metabuf);
metad = BTPageGetMeta(metapg);
@@ -2256,6 +2258,8 @@ _bt_finish_split(Relation rel, Relation heaprel, Buffer lbuf, BTStack stack)
rpage = BufferGetPage(rbuf);
rpageop = BTPageGetOpaque(rpage);
+ pgstat_count_metadata_buffer_if(!P_ISLEAF(rpageop), rel);
+
/* Could this be a root split? */
if (!stack)
{
@@ -2265,6 +2269,7 @@ _bt_finish_split(Relation rel, Relation heaprel, Buffer lbuf, BTStack stack)
/* acquire lock on the metapage */
metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_WRITE);
+ pgstat_count_metadata_buffer(rel);
metapg = BufferGetPage(metabuf);
metad = BTPageGetMeta(metapg);
@@ -2333,6 +2338,7 @@ _bt_getstackbuf(Relation rel, Relation heaprel, BTStack stack, BlockNumber child
buf = _bt_getbuf(rel, blkno, BT_WRITE);
page = BufferGetPage(buf);
opaque = BTPageGetOpaque(page);
+ pgstat_count_metadata_buffer(rel);
Assert(heaprel != NULL);
if (P_INCOMPLETE_SPLIT(opaque))
@@ -2473,6 +2479,7 @@ _bt_newlevel(Relation rel, Relation heaprel, Buffer lbuf, Buffer rbuf)
/* acquire lock on the metapage */
metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_WRITE);
+ pgstat_count_metadata_buffer(rel);
metapg = BufferGetPage(metabuf);
metad = BTPageGetMeta(metapg);
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index c79dd38ee1..ed812c6f9e 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -30,6 +30,7 @@
#include "access/xloginsert.h"
#include "common/int.h"
#include "miscadmin.h"
+#include "pgstat.h"
#include "storage/indexfsm.h"
#include "storage/predicate.h"
#include "storage/procarray.h"
@@ -190,6 +191,7 @@ _bt_vacuum_needs_cleanup(Relation rel)
* Note that we deliberately avoid using cached version of metapage here.
*/
metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ);
+ pgstat_count_metadata_buffer(rel);
metapg = BufferGetPage(metabuf);
metad = BTPageGetMeta(metapg);
btm_version = metad->btm_version;
@@ -254,6 +256,7 @@ _bt_set_cleanup_info(Relation rel, BlockNumber num_delpages)
* to be consistent.
*/
metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ);
+ pgstat_count_metadata_buffer(rel);
metapg = BufferGetPage(metabuf);
metad = BTPageGetMeta(metapg);
@@ -374,6 +377,7 @@ _bt_getroot(Relation rel, Relation heaprel, int access)
rootlevel = metad->btm_fastlevel;
rootbuf = _bt_getbuf(rel, rootblkno, BT_READ);
+ pgstat_count_metadata_buffer(rel);
rootpage = BufferGetPage(rootbuf);
rootopaque = BTPageGetOpaque(rootpage);
@@ -400,6 +404,7 @@ _bt_getroot(Relation rel, Relation heaprel, int access)
}
metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ);
+ pgstat_count_metadata_buffer(rel);
metad = _bt_getmeta(rel, metabuf);
/* if no root page initialized yet, do it */
@@ -536,6 +541,7 @@ _bt_getroot(Relation rel, Relation heaprel, int access)
for (;;)
{
rootbuf = _bt_relandgetbuf(rel, rootbuf, rootblkno, BT_READ);
+ pgstat_count_metadata_buffer(rel);
rootpage = BufferGetPage(rootbuf);
rootopaque = BTPageGetOpaque(rootpage);
@@ -600,6 +606,7 @@ _bt_gettrueroot(Relation rel)
rel->rd_amcache = NULL;
metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ);
+ pgstat_count_metadata_buffer(rel);
metapg = BufferGetPage(metabuf);
metaopaque = BTPageGetOpaque(metapg);
metad = BTPageGetMeta(metapg);
@@ -639,6 +646,7 @@ _bt_gettrueroot(Relation rel)
for (;;)
{
rootbuf = _bt_relandgetbuf(rel, rootbuf, rootblkno, BT_READ);
+ pgstat_count_metadata_buffer(rel);
rootpage = BufferGetPage(rootbuf);
rootopaque = BTPageGetOpaque(rootpage);
@@ -681,6 +689,7 @@ _bt_getrootheight(Relation rel)
Buffer metabuf;
metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ);
+ pgstat_count_metadata_buffer(rel);
metad = _bt_getmeta(rel, metabuf);
/*
@@ -745,6 +754,7 @@ _bt_metaversion(Relation rel, bool *heapkeyspace, bool *allequalimage)
Buffer metabuf;
metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ);
+ pgstat_count_metadata_buffer(rel);
metad = _bt_getmeta(rel, metabuf);
/*
@@ -2375,6 +2385,7 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, BlockNumber scanblkno,
/* Fetch the block number of the target's left sibling */
buf = _bt_getbuf(rel, target, BT_READ);
+ pgstat_count_metadata_buffer(rel);
page = BufferGetPage(buf);
opaque = BTPageGetOpaque(page);
leftsib = opaque->btpo_prev;
@@ -2570,6 +2581,7 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, BlockNumber scanblkno,
{
/* rightsib will be the only one left on the level */
metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_WRITE);
+ pgstat_count_metadata_buffer(rel);
metapg = BufferGetPage(metabuf);
metad = BTPageGetMeta(metapg);
diff --git a/src/backend/access/nbtree/nbtsearch.c b/src/backend/access/nbtree/nbtsearch.c
index 4af1ff1e9e..a0bb9b5485 100644
--- a/src/backend/access/nbtree/nbtsearch.c
+++ b/src/backend/access/nbtree/nbtsearch.c
@@ -185,6 +185,8 @@ _bt_search(Relation rel, Relation heaprel, BTScanInsert key, Buffer *bufP,
/* drop the read lock on the page, then acquire one on its child */
*bufP = _bt_relandgetbuf(rel, *bufP, child, page_access);
+ pgstat_count_metadata_buffer_if(opaque->btpo_level != 1, rel);
+
/* okay, all set to move down a level */
stack_in = new_stack;
}
@@ -305,6 +307,9 @@ _bt_moveright(Relation rel,
/* re-acquire the lock in the right mode, and re-check */
buf = _bt_getbuf(rel, blkno, access);
+
+ pgstat_count_metadata_buffer_if(!P_ISLEAF(opaque), rel);
+
continue;
}
@@ -312,6 +317,7 @@ _bt_moveright(Relation rel,
{
/* step right one page */
buf = _bt_relandgetbuf(rel, buf, opaque->btpo_next, access);
+ pgstat_count_metadata_buffer_if(!P_ISLEAF(opaque), rel);
continue;
}
else
@@ -2512,6 +2518,7 @@ _bt_lock_and_validate_left(Relation rel, BlockNumber *blkno,
buf = _bt_getbuf(rel, *blkno, BT_READ);
page = BufferGetPage(buf);
opaque = BTPageGetOpaque(page);
+ pgstat_count_metadata_buffer_if(!P_ISLEAF(opaque), rel);
/*
* If this isn't the page we want, walk right till we find what we
@@ -2539,6 +2546,7 @@ _bt_lock_and_validate_left(Relation rel, BlockNumber *blkno,
buf = _bt_relandgetbuf(rel, buf, *blkno, BT_READ);
page = BufferGetPage(buf);
opaque = BTPageGetOpaque(page);
+ pgstat_count_metadata_buffer_if(!P_ISLEAF(opaque), rel);
}
/*
@@ -2549,6 +2557,8 @@ _bt_lock_and_validate_left(Relation rel, BlockNumber *blkno,
buf = _bt_relandgetbuf(rel, buf, lastcurrblkno, BT_READ);
page = BufferGetPage(buf);
opaque = BTPageGetOpaque(page);
+ pgstat_count_metadata_buffer_if(!P_ISLEAF(opaque), rel);
+
if (P_ISDELETED(opaque))
{
/*
@@ -2566,6 +2576,7 @@ _bt_lock_and_validate_left(Relation rel, BlockNumber *blkno,
buf = _bt_relandgetbuf(rel, buf, lastcurrblkno, BT_READ);
page = BufferGetPage(buf);
opaque = BTPageGetOpaque(page);
+ pgstat_count_metadata_buffer_if(!P_ISLEAF(opaque), rel);
if (!P_ISDELETED(opaque))
break;
}
@@ -2655,6 +2666,7 @@ _bt_get_endpoint(Relation rel, uint32 level, bool rightmost)
buf = _bt_relandgetbuf(rel, buf, blkno, BT_READ);
page = BufferGetPage(buf);
opaque = BTPageGetOpaque(page);
+ pgstat_count_metadata_buffer_if(!P_ISLEAF(opaque), rel);
}
/* Done? */
@@ -2678,6 +2690,7 @@ _bt_get_endpoint(Relation rel, uint32 level, bool rightmost)
buf = _bt_relandgetbuf(rel, buf, blkno, BT_READ);
page = BufferGetPage(buf);
opaque = BTPageGetOpaque(page);
+ pgstat_count_metadata_buffer_if(!P_ISLEAF(opaque), rel);
}
return buf;
diff --git a/src/backend/access/nbtree/nbtutils.c b/src/backend/access/nbtree/nbtutils.c
index 9aed207995..54a8e8e603 100644
--- a/src/backend/access/nbtree/nbtutils.c
+++ b/src/backend/access/nbtree/nbtutils.c
@@ -21,6 +21,7 @@
#include "access/reloptions.h"
#include "commands/progress.h"
#include "miscadmin.h"
+#include "pgstat.h"
#include "utils/datum.h"
#include "utils/lsyscache.h"
diff --git a/src/backend/access/spgist/spgdoinsert.c b/src/backend/access/spgist/spgdoinsert.c
index af6b27b213..424639fbb5 100644
--- a/src/backend/access/spgist/spgdoinsert.c
+++ b/src/backend/access/spgist/spgdoinsert.c
@@ -22,6 +22,7 @@
#include "common/int.h"
#include "common/pg_prng.h"
#include "miscadmin.h"
+#include "pgstat.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"
@@ -2160,6 +2161,8 @@ spgdoinsert(Relation index, SpGistState *state,
spgChooseIn in;
spgChooseOut out;
+ pgstat_count_metadata_buffer(index);
+
/*
* spgAddNode and spgSplitTuple cases will loop back to here to
* complete the insertion operation. Just in case the choose
diff --git a/src/backend/access/spgist/spgscan.c b/src/backend/access/spgist/spgscan.c
index 25893050c5..d0be58fb78 100644
--- a/src/backend/access/spgist/spgscan.c
+++ b/src/backend/access/spgist/spgscan.c
@@ -897,6 +897,8 @@ redirect:
SpGistInnerTuple innerTuple = (SpGistInnerTuple)
PageGetItem(page, PageGetItemId(page, offset));
+ pgstat_count_metadata_buffer(index);
+
if (innerTuple->tupstate != SPGIST_LIVE)
{
if (innerTuple->tupstate == SPGIST_REDIRECT)
diff --git a/src/backend/access/spgist/spgutils.c b/src/backend/access/spgist/spgutils.c
index 95fea74e29..3e1b705a20 100644
--- a/src/backend/access/spgist/spgutils.c
+++ b/src/backend/access/spgist/spgutils.c
@@ -26,6 +26,7 @@
#include "commands/vacuum.h"
#include "nodes/nodeFuncs.h"
#include "parser/parse_coerce.h"
+#include "pgstat.h"
#include "storage/bufmgr.h"
#include "storage/indexfsm.h"
#include "utils/catcache.h"
@@ -271,6 +272,7 @@ spgGetCache(Relation index)
SpGistMetaPageData *metadata;
metabuffer = ReadBuffer(index, SPGIST_METAPAGE_BLKNO);
+ pgstat_count_metadata_buffer(index);
LockBuffer(metabuffer, BUFFER_LOCK_SHARE);
metadata = SpGistPageGetMeta(BufferGetPage(metabuffer));
@@ -456,11 +458,13 @@ SpGistUpdateMetaPage(Relation index)
Buffer metabuffer;
metabuffer = ReadBuffer(index, SPGIST_METAPAGE_BLKNO);
+ pgstat_count_metadata_buffer(index);
if (ConditionalLockBuffer(metabuffer))
{
Page metapage = BufferGetPage(metabuffer);
SpGistMetaPageData *metadata = SpGistPageGetMeta(metapage);
+ pgstat_count_metadata_buffer(index);
metadata->lastUsedPages = cache->lastUsedPages;
@@ -650,6 +654,8 @@ SpGistGetBuffer(Relation index, int flags, int needSpace, bool *isNew)
return buffer;
}
}
+ else
+ pgstat_count_metadata_buffer(index);
/*
* fallback to allocation of new buffer
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index e5dbbe61b8..3e13252a53 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -766,6 +766,7 @@ CREATE VIEW pg_statio_all_tables AS
pg_stat_get_blocks_hit(C.oid) AS heap_blks_hit,
I.idx_blks_read AS idx_blks_read,
I.idx_blks_hit AS idx_blks_hit,
+ I.idx_metadata_blks AS idx_metadata_blks,
pg_stat_get_blocks_fetched(T.oid) -
pg_stat_get_blocks_hit(T.oid) AS toast_blks_read,
pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
@@ -779,7 +780,9 @@ CREATE VIEW pg_statio_all_tables AS
pg_stat_get_blocks_hit(indexrelid))::bigint
AS idx_blks_read,
sum(pg_stat_get_blocks_hit(indexrelid))::bigint
- AS idx_blks_hit
+ AS idx_blks_hit,
+ sum(pg_stat_get_idx_metadata_blocks(indexrelid))::bigint
+ AS idx_metadata_blks
FROM pg_index WHERE indrelid = C.oid ) I ON true
LEFT JOIN LATERAL (
SELECT sum(pg_stat_get_blocks_fetched(indexrelid) -
@@ -836,7 +839,8 @@ CREATE VIEW pg_statio_all_indexes AS
I.relname AS indexrelname,
pg_stat_get_blocks_fetched(I.oid) -
pg_stat_get_blocks_hit(I.oid) AS idx_blks_read,
- pg_stat_get_blocks_hit(I.oid) AS idx_blks_hit
+ pg_stat_get_blocks_hit(I.oid) AS idx_blks_hit,
+ pg_stat_get_idx_metadata_blocks(I.oid) AS idx_metadata_blks
FROM pg_class C JOIN
pg_index X ON C.oid = X.indrelid JOIN
pg_class I ON I.oid = X.indexrelid
@@ -1071,6 +1075,7 @@ CREATE VIEW pg_stat_database AS
pg_stat_get_db_blocks_fetched(D.oid) -
pg_stat_get_db_blocks_hit(D.oid) AS blks_read,
pg_stat_get_db_blocks_hit(D.oid) AS blks_hit,
+ pg_stat_get_db_idx_metadata_blocks(D.oid) AS idx_metadata_blks,
pg_stat_get_db_tuples_returned(D.oid) AS tup_returned,
pg_stat_get_db_tuples_fetched(D.oid) AS tup_fetched,
pg_stat_get_db_tuples_inserted(D.oid) AS tup_inserted,
diff --git a/src/backend/utils/activity/pgstat_database.c b/src/backend/utils/activity/pgstat_database.c
index b31f20d41b..2f4a065af9 100644
--- a/src/backend/utils/activity/pgstat_database.c
+++ b/src/backend/utils/activity/pgstat_database.c
@@ -443,6 +443,7 @@ pgstat_database_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
PGSTAT_ACCUM_DBCOUNT(xact_rollback);
PGSTAT_ACCUM_DBCOUNT(blocks_fetched);
PGSTAT_ACCUM_DBCOUNT(blocks_hit);
+ PGSTAT_ACCUM_DBCOUNT(idx_metadata_blocks);
PGSTAT_ACCUM_DBCOUNT(tuples_returned);
PGSTAT_ACCUM_DBCOUNT(tuples_fetched);
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index 28587e2916..3de47c0b76 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -880,6 +880,7 @@ pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
tabentry->blocks_fetched += lstats->counts.blocks_fetched;
tabentry->blocks_hit += lstats->counts.blocks_hit;
+ tabentry->idx_metadata_blocks += lstats->counts.idx_metadata_blocks;
/* Clamp live_tuples in case of negative delta_live_tuples */
tabentry->live_tuples = Max(tabentry->live_tuples, 0);
@@ -897,6 +898,7 @@ pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
dbentry->tuples_deleted += lstats->counts.tuples_deleted;
dbentry->blocks_fetched += lstats->counts.blocks_fetched;
dbentry->blocks_hit += lstats->counts.blocks_hit;
+ dbentry->idx_metadata_blocks += lstats->counts.idx_metadata_blocks;
return true;
}
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 1c12ddbae4..1bc93e6fe1 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -67,6 +67,9 @@ PG_STAT_GET_RELENTRY_INT64(blocks_fetched)
/* pg_stat_get_blocks_hit */
PG_STAT_GET_RELENTRY_INT64(blocks_hit)
+/* pg_stat_get_metadata_blocks */
+PG_STAT_GET_RELENTRY_INT64(idx_metadata_blocks)
+
/* pg_stat_get_dead_tuples */
PG_STAT_GET_RELENTRY_INT64(dead_tuples)
@@ -1034,6 +1037,9 @@ PG_STAT_GET_DBENTRY_INT64(blocks_fetched)
/* pg_stat_get_db_blocks_hit */
PG_STAT_GET_DBENTRY_INT64(blocks_hit)
+/* pg_stat_get_db_metadata_blocks */
+PG_STAT_GET_DBENTRY_INT64(idx_metadata_blocks)
+
/* pg_stat_get_db_conflict_bufferpin */
PG_STAT_GET_DBENTRY_INT64(conflict_bufferpin)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index d4650947c6..a8b60d5ae6 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5580,6 +5580,10 @@
proname => 'pg_stat_get_blocks_hit', provolatile => 's', proparallel => 'r',
prorettype => 'int8', proargtypes => 'oid',
prosrc => 'pg_stat_get_blocks_hit' },
+{ oid => '8888', descr => 'statistics: number of metadata blocks',
+ proname => 'pg_stat_get_idx_metadata_blocks', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_idx_metadata_blocks' },
{ oid => '2781', descr => 'statistics: last manual vacuum time for a table',
proname => 'pg_stat_get_last_vacuum_time', provolatile => 's',
proparallel => 'r', prorettype => 'timestamptz', proargtypes => 'oid',
@@ -5792,6 +5796,10 @@
proname => 'pg_stat_get_db_tuples_inserted', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
prosrc => 'pg_stat_get_db_tuples_inserted' },
+{ oid => '8892', descr => 'statistics: number of db metadata blocks',
+ proname => 'pg_stat_get_db_idx_metadata_blocks', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_db_idx_metadata_blocks' },
{ oid => '2761', descr => 'statistics: tuples updated in database',
proname => 'pg_stat_get_db_tuples_updated', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 378f2f2c2b..5870195f2a 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -153,6 +153,8 @@ typedef struct PgStat_TableCounts
PgStat_Counter blocks_fetched;
PgStat_Counter blocks_hit;
+
+ PgStat_Counter idx_metadata_blocks;
} PgStat_TableCounts;
/* ----------
@@ -345,6 +347,7 @@ typedef struct PgStat_StatDBEntry
PgStat_Counter xact_rollback;
PgStat_Counter blocks_fetched;
PgStat_Counter blocks_hit;
+ PgStat_Counter idx_metadata_blocks;
PgStat_Counter tuples_returned;
PgStat_Counter tuples_fetched;
PgStat_Counter tuples_inserted;
@@ -439,6 +442,7 @@ typedef struct PgStat_StatTabEntry
PgStat_Counter blocks_fetched;
PgStat_Counter blocks_hit;
+ PgStat_Counter idx_metadata_blocks;
TimestampTz last_vacuum_time; /* user initiated vacuum */
PgStat_Counter vacuum_count;
@@ -711,6 +715,18 @@ extern void pgstat_report_analyze(Relation rel,
if (pgstat_should_count_relation(rel)) \
(rel)->pgstat_info->counts.blocks_hit++; \
} while (0)
+#define pgstat_count_metadata_buffer(rel) \
+ do { \
+ if (pgstat_should_count_relation(rel)) { \
+ (rel)->pgstat_info->counts.idx_metadata_blocks++; \
+ } \
+ } while (0)
+#define pgstat_count_metadata_buffer_if(is_metadata, rel) \
+ do { \
+ if (pgstat_should_count_relation(rel) && (is_metadata)) { \
+ (rel)->pgstat_info->counts.idx_metadata_blocks++; \
+ } \
+ } while (0)
extern void pgstat_count_heap_insert(Relation rel, PgStat_Counter n);
extern void pgstat_count_heap_update(Relation rel, bool hot, bool newpage);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 6cf828ca8d..16c2b39b81 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1868,6 +1868,7 @@ pg_stat_database| SELECT oid AS datid,
pg_stat_get_db_xact_rollback(oid) AS xact_rollback,
(pg_stat_get_db_blocks_fetched(oid) - pg_stat_get_db_blocks_hit(oid)) AS blks_read,
pg_stat_get_db_blocks_hit(oid) AS blks_hit,
+ pg_stat_get_db_idx_metadata_blocks(oid) AS idx_metadata_blks,
pg_stat_get_db_tuples_returned(oid) AS tup_returned,
pg_stat_get_db_tuples_fetched(oid) AS tup_fetched,
pg_stat_get_db_tuples_inserted(oid) AS tup_inserted,
@@ -2360,7 +2361,8 @@ pg_statio_all_indexes| SELECT c.oid AS relid,
c.relname,
i.relname AS indexrelname,
(pg_stat_get_blocks_fetched(i.oid) - pg_stat_get_blocks_hit(i.oid)) AS idx_blks_read,
- pg_stat_get_blocks_hit(i.oid) AS idx_blks_hit
+ pg_stat_get_blocks_hit(i.oid) AS idx_blks_hit,
+ pg_stat_get_idx_metadata_blocks(i.oid) AS idx_metadata_blks
FROM (((pg_class c
JOIN pg_index x ON ((c.oid = x.indrelid)))
JOIN pg_class i ON ((i.oid = x.indexrelid)))
@@ -2381,6 +2383,7 @@ pg_statio_all_tables| SELECT c.oid AS relid,
pg_stat_get_blocks_hit(c.oid) AS heap_blks_hit,
i.idx_blks_read,
i.idx_blks_hit,
+ i.idx_metadata_blks,
(pg_stat_get_blocks_fetched(t.oid) - pg_stat_get_blocks_hit(t.oid)) AS toast_blks_read,
pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit,
x.idx_blks_read AS tidx_blks_read,
@@ -2389,7 +2392,8 @@ pg_statio_all_tables| SELECT c.oid AS relid,
LEFT JOIN pg_class t ON ((c.reltoastrelid = t.oid)))
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
LEFT JOIN LATERAL ( SELECT (sum((pg_stat_get_blocks_fetched(pg_index.indexrelid) - pg_stat_get_blocks_hit(pg_index.indexrelid))))::bigint AS idx_blks_read,
- (sum(pg_stat_get_blocks_hit(pg_index.indexrelid)))::bigint AS idx_blks_hit
+ (sum(pg_stat_get_blocks_hit(pg_index.indexrelid)))::bigint AS idx_blks_hit,
+ (sum(pg_stat_get_idx_metadata_blocks(pg_index.indexrelid)))::bigint AS idx_metadata_blks
FROM pg_index
WHERE (pg_index.indrelid = c.oid)) i ON (true))
LEFT JOIN LATERAL ( SELECT (sum((pg_stat_get_blocks_fetched(pg_index.indexrelid) - pg_stat_get_blocks_hit(pg_index.indexrelid))))::bigint AS idx_blks_read,
@@ -2403,7 +2407,8 @@ pg_statio_sys_indexes| SELECT relid,
relname,
indexrelname,
idx_blks_read,
- idx_blks_hit
+ idx_blks_hit,
+ idx_metadata_blks
FROM pg_statio_all_indexes
WHERE ((schemaname = ANY (ARRAY['pg_catalog'::name, 'information_schema'::name])) OR (schemaname ~ '^pg_toast'::text));
pg_statio_sys_sequences| SELECT relid,
@@ -2420,6 +2425,7 @@ pg_statio_sys_tables| SELECT relid,
heap_blks_hit,
idx_blks_read,
idx_blks_hit,
+ idx_metadata_blks,
toast_blks_read,
toast_blks_hit,
tidx_blks_read,
@@ -2432,7 +2438,8 @@ pg_statio_user_indexes| SELECT relid,
relname,
indexrelname,
idx_blks_read,
- idx_blks_hit
+ idx_blks_hit,
+ idx_metadata_blks
FROM pg_statio_all_indexes
WHERE ((schemaname <> ALL (ARRAY['pg_catalog'::name, 'information_schema'::name])) AND (schemaname !~ '^pg_toast'::text));
pg_statio_user_sequences| SELECT relid,
@@ -2449,6 +2456,7 @@ pg_statio_user_tables| SELECT relid,
heap_blks_hit,
idx_blks_read,
idx_blks_hit,
+ idx_metadata_blks,
toast_blks_read,
toast_blks_hit,
tidx_blks_read,
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 776f1ad0e5..eabf251653 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1868,4 +1868,31 @@ SELECT * FROM check_estimated_rows('SELECT * FROM table_fillfactor');
(1 row)
DROP TABLE table_fillfactor;
+-- b-tree indexes: test stats collection for metadata index blocks
+select count(*) from tenk2 where unique1 = '1504';
+ count
+-------
+ 1
+(1 row)
+
+-- ensure pending stats are flushed
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush
+--------------------------
+
+(1 row)
+
+-- check effects
+BEGIN;
+SET LOCAL stats_fetch_consistency = snapshot;
+SELECT idx_metadata_blks < idx_blks_hit + idx_blks_read,
+ idx_metadata_blks > 0
+ FROM pg_statio_all_indexes
+ WHERE indexrelname='tenk2_unique1';
+ ?column? | ?column?
+----------+----------
+ t | t
+(1 row)
+
+COMMIT;
-- End of Stats Test
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 232ab8db8f..ab9179c224 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -925,4 +925,21 @@ SELECT * FROM check_estimated_rows('SELECT * FROM table_fillfactor');
DROP TABLE table_fillfactor;
+-- b-tree indexes: test stats collection for metadata index blocks
+select count(*) from tenk2 where unique1 = '1504';
+
+-- ensure pending stats are flushed
+SELECT pg_stat_force_next_flush();
+
+-- check effects
+BEGIN;
+SET LOCAL stats_fetch_consistency = snapshot;
+
+SELECT idx_metadata_blks < idx_blks_hit + idx_blks_read,
+ idx_metadata_blks > 0
+ FROM pg_statio_all_indexes
+ WHERE indexrelname='tenk2_unique1';
+
+COMMIT;
+
-- End of Stats Test
--
2.39.5 (Apple Git-154)
On 7/4/25 18:00, Mircea Cadariu wrote:
Just attaching v2 of the patch.
Hi Mircea,
Your patch applies cleanly and seems to work well.
IIUC, the index hit ratio should be computed with the following formula:
(idx_blks_hit - idx_metadata_blks) / (idx_blks_hit - idx_metadata_blks +
idx_blks_read)
because most of the index non-leaf pages should be in the cache. Right?
This should probably be documented somewhere?
Here is my testing:
# select tree_level, internal_pages, leaf_pages from
pgstatindex('pgbench_accounts_pkey');
tree_level | internal_pages | leaf_pages
------------+----------------+------------
2 | 20 | 5465
(1 row)
# SELECT DISTINCT pg_buffercache_evict(bufferid)
FROM pg_buffercache
WHERE relfilenode = pg_relation_filenode('pgbench_accounts_pkey');
pg_buffercache_evict
----------------------
(t,f)
(1 row)
# SELECT pg_stat_reset();
pg_stat_reset
---------------
(1 row)
# SELECT max(abalance) FROM pgbench_accounts WHERE aid = 100;
max
-----
0
(1 row)
# select idx_blks_read, idx_blks_hit, idx_metadata_blks from
pg_statio_all_indexes where indexrelname = 'pgbench_accounts_pkey';
idx_blks_read | idx_blks_hit | idx_metadata_blks
---------------+--------------+-------------------
3 | 0 | 2
(1 row)
--> 3 pages: the root of the tree, one internal page and one leaf
#
\q
fyhuel@framework:~$ psql bench
psql (19devel)
Type "help" for help.
# SELECT max(abalance) FROM pgbench_accounts WHERE aid = 100;
max
-----
0
(1 row)
primary sleaf bench [42323] # select idx_blks_read, idx_blks_hit,
idx_metadata_blks from pg_statio_all_indexes where indexrelname =
'pgbench_accounts_pkey';
idx_blks_read | idx_blks_hit | idx_metadata_blks
---------------+--------------+-------------------
4 | 3 | 5
--> 4 more pages: same as before, already in cache, plus the index meta
page, read outside shared buffers because we started a new session?
Hi Frédéric,
Thanks a lot for trying out my (first) patch! Much appreciated.
On 20/07/2025 21:54, Frédéric Yhuel wrote:
Your patch applies cleanly and seems to work well.
Cool!
because most of the index non-leaf pages should be in the cache. Right?
Yes indeed, it's an assumption in the implementation, that the non-leaf
pages will roughly always be in the cache.
This should probably be documented somewhere?
I'm still familiarising myself about what to document where, whether
things should be in the official docs or separate blog posts. In the
patch I only documented the new column next to the existing ones for now.
--> 3 pages: the root of the tree, one internal page and one leaf
Yes, this is correct.
primary sleaf bench [42323] # select idx_blks_read, idx_blks_hit,
idx_metadata_blks from pg_statio_all_indexes where indexrelname =
'pgbench_accounts_pkey';
idx_blks_read | idx_blks_hit | idx_metadata_blks
---------------+--------------+-------------------
4 | 3 | 5--> 4 more pages: same as before, already in cache, plus the index
meta page, read outside shared buffers because we started a new session?
Yes, that's my understanding too.
Thanks!
Kind regards,
Mircea Cadariu
Rebased and dusted off this patch.
Attachments:
v3-0001-Add-metadata-non-leaf-index-block-statistics-to-p.patchtext/plain; charset=UTF-8; name=v3-0001-Add-metadata-non-leaf-index-block-statistics-to-p.patchDownload
From 443dbc15033112f6ec18ef869c7476774600f635 Mon Sep 17 00:00:00 2001
From: Mircea Cadariu <cadariu.mircea@gmail.com>
Date: Thu, 20 Nov 2025 11:41:51 +0000
Subject: [PATCH v1] Add metadata (non-leaf) index block statistics to pg_stat
functions and system views.
This commit contains the changes for counting and exposing when index metadata block
reads (either hits or from disk) are issued from the index handling code.
---
doc/src/sgml/monitoring.sgml | 27 ++++++++++++++++++++
src/backend/access/brin/brin.c | 1 +
src/backend/access/brin/brin_revmap.c | 5 ++++
src/backend/access/gin/ginbtree.c | 7 +++++
src/backend/access/gin/ginfast.c | 3 +++
src/backend/access/gin/ginget.c | 5 +++-
src/backend/access/gin/ginutil.c | 3 +++
src/backend/access/gist/gist.c | 3 +++
src/backend/access/gist/gistbuild.c | 5 ++++
src/backend/access/gist/gistget.c | 2 ++
src/backend/access/hash/hashpage.c | 7 +++++
src/backend/access/nbtree/nbtinsert.c | 7 +++++
src/backend/access/nbtree/nbtpage.c | 12 +++++++++
src/backend/access/nbtree/nbtsearch.c | 13 ++++++++++
src/backend/access/nbtree/nbtutils.c | 1 +
src/backend/access/spgist/spgdoinsert.c | 3 +++
src/backend/access/spgist/spgscan.c | 2 ++
src/backend/access/spgist/spgutils.c | 5 ++++
src/backend/catalog/system_views.sql | 7 ++++-
src/backend/utils/activity/pgstat_database.c | 1 +
src/backend/utils/activity/pgstat_relation.c | 2 ++
src/backend/utils/adt/pgstatfuncs.c | 6 +++++
src/include/catalog/pg_proc.dat | 8 ++++++
src/include/pgstat.h | 16 ++++++++++++
src/test/regress/expected/rules.out | 10 +++++++-
src/test/regress/expected/stats.out | 27 ++++++++++++++++++++
src/test/regress/sql/stats.sql | 17 ++++++++++++
27 files changed, 202 insertions(+), 3 deletions(-)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 436ef0e8bd..fc51ab9693 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3467,6 +3467,15 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>idx_metadata_blks</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of metadata (non-leaf) index disk blocks hit or read in this database
+ </para></entry>
+ </row>
+
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>tup_returned</structfield> <type>bigint</type>
@@ -4451,6 +4460,15 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>idx_metadata_blks</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of metadata (non-leaf) index disk blocks hit or read from all indexes in this table
+ </para></entry>
+ </row>
+
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>toast_blks_read</structfield> <type>bigint</type>
@@ -4596,6 +4614,15 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para></entry>
</row>
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>idx_metadata_blks</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of metadata (non-leaf) index disk blocks read or hit in this index
+ </para></entry>
+ </row>
+
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>stats_reset</structfield> <type>timestamp with time zone</type>
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index cb3331921c..ee8a2315e3 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -1652,6 +1652,7 @@ brinGetStats(Relation index, BrinStatsData *stats)
BrinMetaPageData *metadata;
metabuffer = ReadBuffer(index, BRIN_METAPAGE_BLKNO);
+ pgstat_count_metadata_buffer(index);
LockBuffer(metabuffer, BUFFER_LOCK_SHARE);
metapage = BufferGetPage(metabuffer);
metadata = (BrinMetaPageData *) PageGetContents(metapage);
diff --git a/src/backend/access/brin/brin_revmap.c b/src/backend/access/brin/brin_revmap.c
index 4e380ecc71..0172ad83d6 100644
--- a/src/backend/access/brin/brin_revmap.c
+++ b/src/backend/access/brin/brin_revmap.c
@@ -27,6 +27,7 @@
#include "access/brin_xlog.h"
#include "access/rmgr.h"
#include "access/xloginsert.h"
+#include "pgstat.h"
#include "miscadmin.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"
@@ -75,6 +76,7 @@ brinRevmapInitialize(Relation idxrel, BlockNumber *pagesPerRange)
Page page;
meta = ReadBuffer(idxrel, BRIN_METAPAGE_BLKNO);
+ pgstat_count_metadata_buffer(idxrel);
LockBuffer(meta, BUFFER_LOCK_SHARE);
page = BufferGetPage(meta);
metadata = (BrinMetaPageData *) PageGetContents(page);
@@ -232,6 +234,7 @@ brinGetTupleForHeapBlock(BrinRevmap *revmap, BlockNumber heapBlk,
Assert(mapBlk != InvalidBlockNumber);
revmap->rm_currBuf = ReadBuffer(revmap->rm_irel, mapBlk);
+ pgstat_count_metadata_buffer(idxRel);
}
LockBuffer(revmap->rm_currBuf, BUFFER_LOCK_SHARE);
@@ -486,6 +489,7 @@ revmap_get_buffer(BrinRevmap *revmap, BlockNumber heapBlk)
ReleaseBuffer(revmap->rm_currBuf);
revmap->rm_currBuf = ReadBuffer(revmap->rm_irel, mapBlk);
+ pgstat_count_metadata_buffer(revmap->rm_irel);
}
return revmap->rm_currBuf;
@@ -554,6 +558,7 @@ revmap_physical_extend(BrinRevmap *revmap)
if (mapBlk < nblocks)
{
buf = ReadBuffer(irel, mapBlk);
+ pgstat_count_metadata_buffer(irel);
LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
page = BufferGetPage(buf);
}
diff --git a/src/backend/access/gin/ginbtree.c b/src/backend/access/gin/ginbtree.c
index 644d484ea5..0d8778f589 100644
--- a/src/backend/access/gin/ginbtree.c
+++ b/src/backend/access/gin/ginbtree.c
@@ -18,6 +18,7 @@
#include "access/ginxlog.h"
#include "access/xloginsert.h"
#include "miscadmin.h"
+#include "pgstat.h"
#include "storage/predicate.h"
#include "utils/injection_point.h"
#include "utils/memutils.h"
@@ -104,6 +105,8 @@ ginFindLeafPage(GinBtree btree, bool searchMode,
page = BufferGetPage(stack->buffer);
+ pgstat_count_metadata_buffer_if(!GinPageIsLeaf(page), btree->index);
+
access = ginTraverseLock(stack->buffer, searchMode);
/*
@@ -191,6 +194,8 @@ ginStepRight(Buffer buffer, Relation index, int lockmode)
if (isLeaf != GinPageIsLeaf(page) || isData != GinPageIsData(page))
elog(ERROR, "right sibling of GIN page is of different type");
+ pgstat_count_metadata_buffer_if(!GinPageIsLeaf(page), index);
+
return nextbuffer;
}
@@ -254,6 +259,8 @@ ginFindParents(GinBtree btree, GinBtreeStack *stack)
page = BufferGetPage(buffer);
if (GinPageIsLeaf(page))
elog(ERROR, "Lost path");
+ else
+ pgstat_count_metadata_buffer(btree->index);
if (GinPageIsIncompleteSplit(page))
{
diff --git a/src/backend/access/gin/ginfast.c b/src/backend/access/gin/ginfast.c
index 33816f8551..7b5b872586 100644
--- a/src/backend/access/gin/ginfast.c
+++ b/src/backend/access/gin/ginfast.c
@@ -25,6 +25,7 @@
#include "catalog/pg_am.h"
#include "commands/vacuum.h"
#include "miscadmin.h"
+#include "pgstat.h"
#include "port/pg_bitutils.h"
#include "postmaster/autovacuum.h"
#include "storage/indexfsm.h"
@@ -240,6 +241,7 @@ ginHeapTupleFastInsert(GinState *ginstate, GinTupleCollector *collector)
data.newRightlink = data.prevTail = InvalidBlockNumber;
metabuffer = ReadBuffer(index, GIN_METAPAGE_BLKNO);
+ pgstat_count_metadata_buffer(index);
metapage = BufferGetPage(metabuffer);
/*
@@ -828,6 +830,7 @@ ginInsertCleanup(GinState *ginstate, bool full_clean,
}
metabuffer = ReadBuffer(index, GIN_METAPAGE_BLKNO);
+ pgstat_count_metadata_buffer(index);
LockBuffer(metabuffer, GIN_SHARE);
metapage = BufferGetPage(metabuffer);
metadata = GinPageGetMeta(metapage);
diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c
index 0d4108d05a..c62e068a68 100644
--- a/src/backend/access/gin/ginget.c
+++ b/src/backend/access/gin/ginget.c
@@ -18,6 +18,7 @@
#include "access/relscan.h"
#include "common/pg_prng.h"
#include "miscadmin.h"
+#include "pgstat.h"
#include "storage/predicate.h"
#include "utils/datum.h"
#include "utils/memutils.h"
@@ -1493,7 +1494,7 @@ scanGetCandidate(IndexScanDesc scan, pendingPosition *pos)
* Here we must prevent deletion of next page by insertcleanup
* process, which may be trying to obtain exclusive lock on
* current page. So, we lock next page before releasing the
- * current one
+ * current one.
*/
Buffer tmpbuf = ReadBuffer(scan->indexRelation, blkno);
@@ -1846,6 +1847,8 @@ scanPendingInsert(IndexScanDesc scan, TIDBitmap *tbm, int64 *ntids)
Page page;
BlockNumber blkno;
+ pgstat_count_metadata_buffer(scan->indexRelation);
+
*ntids = 0;
/*
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index 78f7b7a249..9cc6e6d6c3 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -22,6 +22,7 @@
#include "catalog/pg_type.h"
#include "commands/progress.h"
#include "commands/vacuum.h"
+#include "pgstat.h"
#include "miscadmin.h"
#include "storage/indexfsm.h"
#include "utils/builtins.h"
@@ -632,6 +633,7 @@ ginGetStats(Relation index, GinStatsData *stats)
GinMetaPageData *metadata;
metabuffer = ReadBuffer(index, GIN_METAPAGE_BLKNO);
+ pgstat_count_metadata_buffer(index);
LockBuffer(metabuffer, GIN_SHARE);
metapage = BufferGetPage(metabuffer);
metadata = GinPageGetMeta(metapage);
@@ -659,6 +661,7 @@ ginUpdateStats(Relation index, const GinStatsData *stats, bool is_build)
GinMetaPageData *metadata;
metabuffer = ReadBuffer(index, GIN_METAPAGE_BLKNO);
+ pgstat_count_metadata_buffer(index);
LockBuffer(metabuffer, GIN_EXCLUSIVE);
metapage = BufferGetPage(metabuffer);
metadata = GinPageGetMeta(metapage);
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index 3fb1a1285c..54986b4f44 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -21,6 +21,7 @@
#include "commands/vacuum.h"
#include "miscadmin.h"
#include "nodes/execnodes.h"
+#include "pgstat.h"
#include "storage/predicate.h"
#include "utils/fmgrprotos.h"
#include "utils/index_selfuncs.h"
@@ -696,6 +697,7 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace,
}
stack->page = BufferGetPage(stack->buffer);
+ pgstat_count_metadata_buffer_if(!GistPageIsLeaf(stack->page), state.r);
stack->lsn = xlocked ?
PageGetLSN(stack->page) : BufferGetLSNAtomic(stack->buffer);
Assert(!RelationNeedsWAL(state.r) || XLogRecPtrIsValid(stack->lsn));
@@ -1121,6 +1123,7 @@ gistFindCorrectParent(Relation r, GISTInsertStack *child, bool is_build)
{
ptr->buffer = ReadBuffer(r, ptr->blkno);
ptr->page = BufferGetPage(ptr->buffer);
+ pgstat_count_metadata_buffer_if(!GistPageIsLeaf(ptr->page), r);
ptr = ptr->parent;
}
diff --git a/src/backend/access/gist/gistbuild.c b/src/backend/access/gist/gistbuild.c
index be0fd5b753..8e21751797 100644
--- a/src/backend/access/gist/gistbuild.c
+++ b/src/backend/access/gist/gistbuild.c
@@ -39,6 +39,7 @@
#include "access/tableam.h"
#include "access/xloginsert.h"
#include "miscadmin.h"
+#include "pgstat.h"
#include "nodes/execnodes.h"
#include "optimizer/optimizer.h"
#include "storage/bufmgr.h"
@@ -967,6 +968,7 @@ gistProcessItup(GISTBuildState *buildstate, IndexTuple itup,
*/
buffer = ReadBuffer(indexrel, blkno);
+ pgstat_count_metadata_buffer(indexrel);
LockBuffer(buffer, GIST_EXCLUSIVE);
page = BufferGetPage(buffer);
@@ -1248,6 +1250,7 @@ gistBufferingFindCorrectParent(GISTBuildState *buildstate,
buffer = ReadBuffer(buildstate->indexrel, parent);
page = BufferGetPage(buffer);
+ pgstat_count_metadata_buffer(buildstate->indexrel);
LockBuffer(buffer, GIST_EXCLUSIVE);
gistcheckpage(buildstate->indexrel, buffer);
maxoff = PageGetMaxOffsetNumber(page);
@@ -1457,6 +1460,8 @@ gistGetMaxLevel(Relation index)
break;
}
+ pgstat_count_metadata_buffer(index);
+
/*
* Pick the first downlink on the page, and follow it. It doesn't
* matter which downlink we choose, the tree has the same depth
diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c
index 9ba45acfff..57651cfd27 100644
--- a/src/backend/access/gist/gistget.c
+++ b/src/backend/access/gist/gistget.c
@@ -346,6 +346,8 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem,
gistcheckpage(scan->indexRelation, buffer);
page = BufferGetPage(buffer);
opaque = GistPageGetOpaque(page);
+ pgstat_count_metadata_buffer_if(!GistPageIsLeaf((Page) page),
+ scan->indexRelation);
/*
* Check if we need to follow the rightlink. We need to follow it if the
diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index b8e5bd005e..2e94856929 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -32,6 +32,7 @@
#include "access/hash_xlog.h"
#include "access/xloginsert.h"
#include "miscadmin.h"
+#include "pgstat.h"
#include "port/pg_bitutils.h"
#include "storage/predicate.h"
#include "storage/smgr.h"
@@ -76,6 +77,8 @@ _hash_getbuf(Relation rel, BlockNumber blkno, int access, int flags)
buf = ReadBuffer(rel, blkno);
+ pgstat_count_metadata_buffer_if(flags == LH_META_PAGE || flags == LH_BITMAP_PAGE, rel);
+
if (access != HASH_NOLOCK)
LockBuffer(buf, access);
@@ -102,6 +105,8 @@ _hash_getbuf_with_condlock_cleanup(Relation rel, BlockNumber blkno, int flags)
buf = ReadBuffer(rel, blkno);
+ pgstat_count_metadata_buffer_if(flags == LH_META_PAGE || flags == LH_BITMAP_PAGE, rel);
+
if (!ConditionalLockBufferForCleanup(buf))
{
ReleaseBuffer(buf);
@@ -247,6 +252,8 @@ _hash_getbuf_with_strategy(Relation rel, BlockNumber blkno,
buf = ReadBufferExtended(rel, MAIN_FORKNUM, blkno, RBM_NORMAL, bstrategy);
+ pgstat_count_metadata_buffer_if(flags == LH_META_PAGE || flags == LH_BITMAP_PAGE, rel);
+
if (access != HASH_NOLOCK)
LockBuffer(buf, access);
diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index 7c113c007e..48c9dcf9a6 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -22,6 +22,7 @@
#include "access/xloginsert.h"
#include "common/int.h"
#include "common/pg_prng.h"
+#include "pgstat.h"
#include "lib/qunique.h"
#include "miscadmin.h"
#include "storage/lmgr.h"
@@ -1261,6 +1262,7 @@ _bt_insertonpg(Relation rel,
Assert(BufferIsValid(cbuf));
metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_WRITE);
+ pgstat_count_metadata_buffer(rel);
metapg = BufferGetPage(metabuf);
metad = BTPageGetMeta(metapg);
@@ -2263,6 +2265,8 @@ _bt_finish_split(Relation rel, Relation heaprel, Buffer lbuf, BTStack stack)
rpage = BufferGetPage(rbuf);
rpageop = BTPageGetOpaque(rpage);
+ pgstat_count_metadata_buffer_if(!P_ISLEAF(rpageop), rel);
+
/* Could this be a root split? */
if (!stack)
{
@@ -2272,6 +2276,7 @@ _bt_finish_split(Relation rel, Relation heaprel, Buffer lbuf, BTStack stack)
/* acquire lock on the metapage */
metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_WRITE);
+ pgstat_count_metadata_buffer(rel);
metapg = BufferGetPage(metabuf);
metad = BTPageGetMeta(metapg);
@@ -2340,6 +2345,7 @@ _bt_getstackbuf(Relation rel, Relation heaprel, BTStack stack, BlockNumber child
buf = _bt_getbuf(rel, blkno, BT_WRITE);
page = BufferGetPage(buf);
opaque = BTPageGetOpaque(page);
+ pgstat_count_metadata_buffer(rel);
Assert(heaprel != NULL);
if (P_INCOMPLETE_SPLIT(opaque))
@@ -2480,6 +2486,7 @@ _bt_newlevel(Relation rel, Relation heaprel, Buffer lbuf, Buffer rbuf)
/* acquire lock on the metapage */
metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_WRITE);
+ pgstat_count_metadata_buffer(rel);
metapg = BufferGetPage(metabuf);
metad = BTPageGetMeta(metapg);
diff --git a/src/backend/access/nbtree/nbtpage.c b/src/backend/access/nbtree/nbtpage.c
index 30b43a4dd1..5ccb0a5501 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -30,6 +30,7 @@
#include "access/xloginsert.h"
#include "common/int.h"
#include "miscadmin.h"
+#include "pgstat.h"
#include "storage/indexfsm.h"
#include "storage/predicate.h"
#include "storage/procarray.h"
@@ -190,6 +191,7 @@ _bt_vacuum_needs_cleanup(Relation rel)
* Note that we deliberately avoid using cached version of metapage here.
*/
metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ);
+ pgstat_count_metadata_buffer(rel);
metapg = BufferGetPage(metabuf);
metad = BTPageGetMeta(metapg);
btm_version = metad->btm_version;
@@ -254,6 +256,7 @@ _bt_set_cleanup_info(Relation rel, BlockNumber num_delpages)
* to be consistent.
*/
metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ);
+ pgstat_count_metadata_buffer(rel);
metapg = BufferGetPage(metabuf);
metad = BTPageGetMeta(metapg);
@@ -374,6 +377,7 @@ _bt_getroot(Relation rel, Relation heaprel, int access)
rootlevel = metad->btm_fastlevel;
rootbuf = _bt_getbuf(rel, rootblkno, BT_READ);
+ pgstat_count_metadata_buffer_if(rootlevel > 0, rel);
rootpage = BufferGetPage(rootbuf);
rootopaque = BTPageGetOpaque(rootpage);
@@ -400,6 +404,7 @@ _bt_getroot(Relation rel, Relation heaprel, int access)
}
metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ);
+ pgstat_count_metadata_buffer(rel);
metad = _bt_getmeta(rel, metabuf);
/* if no root page initialized yet, do it */
@@ -536,6 +541,7 @@ _bt_getroot(Relation rel, Relation heaprel, int access)
for (;;)
{
rootbuf = _bt_relandgetbuf(rel, rootbuf, rootblkno, BT_READ);
+ pgstat_count_metadata_buffer_if(rootlevel > 0, rel);
rootpage = BufferGetPage(rootbuf);
rootopaque = BTPageGetOpaque(rootpage);
@@ -600,6 +606,7 @@ _bt_gettrueroot(Relation rel)
rel->rd_amcache = NULL;
metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ);
+ pgstat_count_metadata_buffer(rel);
metapg = BufferGetPage(metabuf);
metaopaque = BTPageGetOpaque(metapg);
metad = BTPageGetMeta(metapg);
@@ -639,6 +646,7 @@ _bt_gettrueroot(Relation rel)
for (;;)
{
rootbuf = _bt_relandgetbuf(rel, rootbuf, rootblkno, BT_READ);
+ pgstat_count_metadata_buffer_if(rootlevel > 0, rel);
rootpage = BufferGetPage(rootbuf);
rootopaque = BTPageGetOpaque(rootpage);
@@ -681,6 +689,7 @@ _bt_getrootheight(Relation rel)
Buffer metabuf;
metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ);
+ pgstat_count_metadata_buffer(rel);
metad = _bt_getmeta(rel, metabuf);
/*
@@ -745,6 +754,7 @@ _bt_metaversion(Relation rel, bool *heapkeyspace, bool *allequalimage)
Buffer metabuf;
metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_READ);
+ pgstat_count_metadata_buffer(rel);
metad = _bt_getmeta(rel, metabuf);
/*
@@ -2372,6 +2382,7 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, BlockNumber scanblkno,
/* Fetch the block number of the target's left sibling */
buf = _bt_getbuf(rel, target, BT_READ);
+ pgstat_count_metadata_buffer(rel);
page = BufferGetPage(buf);
opaque = BTPageGetOpaque(page);
leftsib = opaque->btpo_prev;
@@ -2567,6 +2578,7 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, BlockNumber scanblkno,
{
/* rightsib will be the only one left on the level */
metabuf = _bt_getbuf(rel, BTREE_METAPAGE, BT_WRITE);
+ pgstat_count_metadata_buffer(rel);
metapg = BufferGetPage(metabuf);
metad = BTPageGetMeta(metapg);
diff --git a/src/backend/access/nbtree/nbtsearch.c b/src/backend/access/nbtree/nbtsearch.c
index 0605356ec9..00705a8a82 100644
--- a/src/backend/access/nbtree/nbtsearch.c
+++ b/src/backend/access/nbtree/nbtsearch.c
@@ -185,6 +185,8 @@ _bt_search(Relation rel, Relation heaprel, BTScanInsert key, Buffer *bufP,
/* drop the read lock on the page, then acquire one on its child */
*bufP = _bt_relandgetbuf(rel, *bufP, child, page_access);
+ pgstat_count_metadata_buffer_if(opaque->btpo_level != 1, rel);
+
/* okay, all set to move down a level */
stack_in = new_stack;
}
@@ -305,6 +307,9 @@ _bt_moveright(Relation rel,
/* re-acquire the lock in the right mode, and re-check */
buf = _bt_getbuf(rel, blkno, access);
+
+ pgstat_count_metadata_buffer_if(!P_ISLEAF(opaque), rel);
+
continue;
}
@@ -312,6 +317,7 @@ _bt_moveright(Relation rel,
{
/* step right one page */
buf = _bt_relandgetbuf(rel, buf, opaque->btpo_next, access);
+ pgstat_count_metadata_buffer_if(!P_ISLEAF(opaque), rel);
continue;
}
else
@@ -2509,6 +2515,7 @@ _bt_lock_and_validate_left(Relation rel, BlockNumber *blkno,
buf = _bt_getbuf(rel, *blkno, BT_READ);
page = BufferGetPage(buf);
opaque = BTPageGetOpaque(page);
+ pgstat_count_metadata_buffer_if(!P_ISLEAF(opaque), rel);
/*
* If this isn't the page we want, walk right till we find what we
@@ -2536,6 +2543,7 @@ _bt_lock_and_validate_left(Relation rel, BlockNumber *blkno,
buf = _bt_relandgetbuf(rel, buf, *blkno, BT_READ);
page = BufferGetPage(buf);
opaque = BTPageGetOpaque(page);
+ pgstat_count_metadata_buffer_if(!P_ISLEAF(opaque), rel);
}
/*
@@ -2546,6 +2554,8 @@ _bt_lock_and_validate_left(Relation rel, BlockNumber *blkno,
buf = _bt_relandgetbuf(rel, buf, lastcurrblkno, BT_READ);
page = BufferGetPage(buf);
opaque = BTPageGetOpaque(page);
+ pgstat_count_metadata_buffer_if(!P_ISLEAF(opaque), rel);
+
if (P_ISDELETED(opaque))
{
/*
@@ -2563,6 +2573,7 @@ _bt_lock_and_validate_left(Relation rel, BlockNumber *blkno,
buf = _bt_relandgetbuf(rel, buf, lastcurrblkno, BT_READ);
page = BufferGetPage(buf);
opaque = BTPageGetOpaque(page);
+ pgstat_count_metadata_buffer_if(!P_ISLEAF(opaque), rel);
if (!P_ISDELETED(opaque))
break;
}
@@ -2652,6 +2663,7 @@ _bt_get_endpoint(Relation rel, uint32 level, bool rightmost)
buf = _bt_relandgetbuf(rel, buf, blkno, BT_READ);
page = BufferGetPage(buf);
opaque = BTPageGetOpaque(page);
+ pgstat_count_metadata_buffer_if(!P_ISLEAF(opaque), rel);
}
/* Done? */
@@ -2675,6 +2687,7 @@ _bt_get_endpoint(Relation rel, uint32 level, bool rightmost)
buf = _bt_relandgetbuf(rel, buf, blkno, BT_READ);
page = BufferGetPage(buf);
opaque = BTPageGetOpaque(page);
+ pgstat_count_metadata_buffer_if(!P_ISLEAF(opaque), rel);
}
return buf;
diff --git a/src/backend/access/nbtree/nbtutils.c b/src/backend/access/nbtree/nbtutils.c
index ab0f98b028..912edf03dd 100644
--- a/src/backend/access/nbtree/nbtutils.c
+++ b/src/backend/access/nbtree/nbtutils.c
@@ -22,6 +22,7 @@
#include "access/relscan.h"
#include "commands/progress.h"
#include "miscadmin.h"
+#include "pgstat.h"
#include "utils/datum.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
diff --git a/src/backend/access/spgist/spgdoinsert.c b/src/backend/access/spgist/spgdoinsert.c
index 4eadb51877..f777b3dec5 100644
--- a/src/backend/access/spgist/spgdoinsert.c
+++ b/src/backend/access/spgist/spgdoinsert.c
@@ -22,6 +22,7 @@
#include "common/int.h"
#include "common/pg_prng.h"
#include "miscadmin.h"
+#include "pgstat.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"
@@ -2156,6 +2157,8 @@ spgdoinsert(Relation index, SpGistState *state,
spgChooseIn in;
spgChooseOut out;
+ pgstat_count_metadata_buffer(index);
+
/*
* spgAddNode and spgSplitTuple cases will loop back to here to
* complete the insertion operation. Just in case the choose
diff --git a/src/backend/access/spgist/spgscan.c b/src/backend/access/spgist/spgscan.c
index 25893050c5..d0be58fb78 100644
--- a/src/backend/access/spgist/spgscan.c
+++ b/src/backend/access/spgist/spgscan.c
@@ -897,6 +897,8 @@ redirect:
SpGistInnerTuple innerTuple = (SpGistInnerTuple)
PageGetItem(page, PageGetItemId(page, offset));
+ pgstat_count_metadata_buffer(index);
+
if (innerTuple->tupstate != SPGIST_LIVE)
{
if (innerTuple->tupstate == SPGIST_REDIRECT)
diff --git a/src/backend/access/spgist/spgutils.c b/src/backend/access/spgist/spgutils.c
index 87c31da71a..267c3f25b7 100644
--- a/src/backend/access/spgist/spgutils.c
+++ b/src/backend/access/spgist/spgutils.c
@@ -26,6 +26,7 @@
#include "commands/vacuum.h"
#include "nodes/nodeFuncs.h"
#include "parser/parse_coerce.h"
+#include "pgstat.h"
#include "storage/bufmgr.h"
#include "storage/indexfsm.h"
#include "utils/catcache.h"
@@ -271,6 +272,7 @@ spgGetCache(Relation index)
SpGistMetaPageData *metadata;
metabuffer = ReadBuffer(index, SPGIST_METAPAGE_BLKNO);
+ pgstat_count_metadata_buffer(index);
LockBuffer(metabuffer, BUFFER_LOCK_SHARE);
metadata = SpGistPageGetMeta(BufferGetPage(metabuffer));
@@ -456,6 +458,7 @@ SpGistUpdateMetaPage(Relation index)
Buffer metabuffer;
metabuffer = ReadBuffer(index, SPGIST_METAPAGE_BLKNO);
+ pgstat_count_metadata_buffer(index);
if (ConditionalLockBuffer(metabuffer))
{
@@ -650,6 +653,8 @@ SpGistGetBuffer(Relation index, int flags, int needSpace, bool *isNew)
return buffer;
}
}
+ else if (!SpGistPageIsLeaf(page))
+ pgstat_count_metadata_buffer(index);
/*
* fallback to allocation of new buffer
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 95ad29a64b..d140ec46fe 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -785,6 +785,7 @@ CREATE VIEW pg_statio_all_tables AS
pg_stat_get_blocks_hit(C.oid) AS heap_blks_hit,
I.idx_blks_read AS idx_blks_read,
I.idx_blks_hit AS idx_blks_hit,
+ I.idx_metadata_blks AS idx_metadata_blks,
pg_stat_get_blocks_fetched(T.oid) -
pg_stat_get_blocks_hit(T.oid) AS toast_blks_read,
pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
@@ -799,7 +800,9 @@ CREATE VIEW pg_statio_all_tables AS
pg_stat_get_blocks_hit(indexrelid))::bigint
AS idx_blks_read,
sum(pg_stat_get_blocks_hit(indexrelid))::bigint
- AS idx_blks_hit
+ AS idx_blks_hit,
+ sum(pg_stat_get_idx_metadata_blocks(indexrelid))::bigint
+ AS idx_metadata_blks
FROM pg_index WHERE indrelid = C.oid ) I ON true
LEFT JOIN LATERAL (
SELECT sum(pg_stat_get_blocks_fetched(indexrelid) -
@@ -858,6 +861,7 @@ CREATE VIEW pg_statio_all_indexes AS
pg_stat_get_blocks_fetched(I.oid) -
pg_stat_get_blocks_hit(I.oid) AS idx_blks_read,
pg_stat_get_blocks_hit(I.oid) AS idx_blks_hit,
+ pg_stat_get_idx_metadata_blocks(I.oid) AS idx_metadata_blks,
pg_stat_get_stat_reset_time(I.oid) AS stats_reset
FROM pg_class C JOIN
pg_index X ON C.oid = X.indrelid JOIN
@@ -1094,6 +1098,7 @@ CREATE VIEW pg_stat_database AS
pg_stat_get_db_blocks_fetched(D.oid) -
pg_stat_get_db_blocks_hit(D.oid) AS blks_read,
pg_stat_get_db_blocks_hit(D.oid) AS blks_hit,
+ pg_stat_get_db_idx_metadata_blocks(D.oid) AS idx_metadata_blks,
pg_stat_get_db_tuples_returned(D.oid) AS tup_returned,
pg_stat_get_db_tuples_fetched(D.oid) AS tup_fetched,
pg_stat_get_db_tuples_inserted(D.oid) AS tup_inserted,
diff --git a/src/backend/utils/activity/pgstat_database.c b/src/backend/utils/activity/pgstat_database.c
index b31f20d41b..2f4a065af9 100644
--- a/src/backend/utils/activity/pgstat_database.c
+++ b/src/backend/utils/activity/pgstat_database.c
@@ -443,6 +443,7 @@ pgstat_database_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
PGSTAT_ACCUM_DBCOUNT(xact_rollback);
PGSTAT_ACCUM_DBCOUNT(blocks_fetched);
PGSTAT_ACCUM_DBCOUNT(blocks_hit);
+ PGSTAT_ACCUM_DBCOUNT(idx_metadata_blocks);
PGSTAT_ACCUM_DBCOUNT(tuples_returned);
PGSTAT_ACCUM_DBCOUNT(tuples_fetched);
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index 1de477cbee..48b2b28e62 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -880,6 +880,7 @@ pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
tabentry->blocks_fetched += lstats->counts.blocks_fetched;
tabentry->blocks_hit += lstats->counts.blocks_hit;
+ tabentry->idx_metadata_blocks += lstats->counts.idx_metadata_blocks;
/* Clamp live_tuples in case of negative delta_live_tuples */
tabentry->live_tuples = Max(tabentry->live_tuples, 0);
@@ -897,6 +898,7 @@ pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
dbentry->tuples_deleted += lstats->counts.tuples_deleted;
dbentry->blocks_fetched += lstats->counts.blocks_fetched;
dbentry->blocks_hit += lstats->counts.blocks_hit;
+ dbentry->idx_metadata_blocks += lstats->counts.idx_metadata_blocks;
return true;
}
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 3d98d064a9..84e3e5bc45 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -67,6 +67,9 @@ PG_STAT_GET_RELENTRY_INT64(blocks_fetched)
/* pg_stat_get_blocks_hit */
PG_STAT_GET_RELENTRY_INT64(blocks_hit)
+/* pg_stat_get_metadata_blocks */
+PG_STAT_GET_RELENTRY_INT64(idx_metadata_blocks)
+
/* pg_stat_get_dead_tuples */
PG_STAT_GET_RELENTRY_INT64(dead_tuples)
@@ -1055,6 +1058,9 @@ PG_STAT_GET_DBENTRY_INT64(blocks_fetched)
/* pg_stat_get_db_blocks_hit */
PG_STAT_GET_DBENTRY_INT64(blocks_hit)
+/* pg_stat_get_db_metadata_blocks */
+PG_STAT_GET_DBENTRY_INT64(idx_metadata_blocks)
+
/* pg_stat_get_db_conflict_bufferpin */
PG_STAT_GET_DBENTRY_INT64(conflict_bufferpin)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index aaadfd8c74..abdd1e421c 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5596,6 +5596,10 @@
proname => 'pg_stat_get_blocks_hit', provolatile => 's', proparallel => 'r',
prorettype => 'int8', proargtypes => 'oid',
prosrc => 'pg_stat_get_blocks_hit' },
+{ oid => '8888', descr => 'statistics: number of metadata blocks',
+ proname => 'pg_stat_get_idx_metadata_blocks', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_idx_metadata_blocks' },
{ oid => '2781', descr => 'statistics: last manual vacuum time for a table',
proname => 'pg_stat_get_last_vacuum_time', provolatile => 's',
proparallel => 'r', prorettype => 'timestamptz', proargtypes => 'oid',
@@ -5808,6 +5812,10 @@
proname => 'pg_stat_get_db_tuples_inserted', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
prosrc => 'pg_stat_get_db_tuples_inserted' },
+{ oid => '8892', descr => 'statistics: number of db metadata blocks',
+ proname => 'pg_stat_get_db_idx_metadata_blocks', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_db_idx_metadata_blocks' },
{ oid => '2761', descr => 'statistics: tuples updated in database',
proname => 'pg_stat_get_db_tuples_updated', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index a68e725259..db40efedc8 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -156,6 +156,8 @@ typedef struct PgStat_TableCounts
PgStat_Counter blocks_fetched;
PgStat_Counter blocks_hit;
+
+ PgStat_Counter idx_metadata_blocks;
} PgStat_TableCounts;
/* ----------
@@ -348,6 +350,7 @@ typedef struct PgStat_StatDBEntry
PgStat_Counter xact_rollback;
PgStat_Counter blocks_fetched;
PgStat_Counter blocks_hit;
+ PgStat_Counter idx_metadata_blocks;
PgStat_Counter tuples_returned;
PgStat_Counter tuples_fetched;
PgStat_Counter tuples_inserted;
@@ -445,6 +448,7 @@ typedef struct PgStat_StatTabEntry
PgStat_Counter blocks_fetched;
PgStat_Counter blocks_hit;
+ PgStat_Counter idx_metadata_blocks;
TimestampTz last_vacuum_time; /* user initiated vacuum */
PgStat_Counter vacuum_count;
@@ -720,6 +724,18 @@ extern void pgstat_report_analyze(Relation rel,
if (pgstat_should_count_relation(rel)) \
(rel)->pgstat_info->counts.blocks_hit++; \
} while (0)
+#define pgstat_count_metadata_buffer(rel) \
+ do { \
+ if (pgstat_should_count_relation(rel)) { \
+ (rel)->pgstat_info->counts.idx_metadata_blocks++; \
+ } \
+ } while (0)
+#define pgstat_count_metadata_buffer_if(is_metadata, rel) \
+ do { \
+ if (pgstat_should_count_relation(rel) && (is_metadata)) { \
+ (rel)->pgstat_info->counts.idx_metadata_blocks++; \
+ } \
+ } while (0)
extern void pgstat_count_heap_insert(Relation rel, PgStat_Counter n);
extern void pgstat_count_heap_update(Relation rel, bool hot, bool newpage);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 372a2188c2..37cd0969a6 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1882,6 +1882,7 @@ pg_stat_database| SELECT oid AS datid,
pg_stat_get_db_xact_rollback(oid) AS xact_rollback,
(pg_stat_get_db_blocks_fetched(oid) - pg_stat_get_db_blocks_hit(oid)) AS blks_read,
pg_stat_get_db_blocks_hit(oid) AS blks_hit,
+ pg_stat_get_db_idx_metadata_blocks(oid) AS idx_metadata_blks,
pg_stat_get_db_tuples_returned(oid) AS tup_returned,
pg_stat_get_db_tuples_fetched(oid) AS tup_fetched,
pg_stat_get_db_tuples_inserted(oid) AS tup_inserted,
@@ -2389,6 +2390,7 @@ pg_statio_all_indexes| SELECT c.oid AS relid,
i.relname AS indexrelname,
(pg_stat_get_blocks_fetched(i.oid) - pg_stat_get_blocks_hit(i.oid)) AS idx_blks_read,
pg_stat_get_blocks_hit(i.oid) AS idx_blks_hit,
+ pg_stat_get_idx_metadata_blocks(i.oid) AS idx_metadata_blks,
pg_stat_get_stat_reset_time(i.oid) AS stats_reset
FROM (((pg_class c
JOIN pg_index x ON ((c.oid = x.indrelid)))
@@ -2410,6 +2412,7 @@ pg_statio_all_tables| SELECT c.oid AS relid,
pg_stat_get_blocks_hit(c.oid) AS heap_blks_hit,
i.idx_blks_read,
i.idx_blks_hit,
+ i.idx_metadata_blks,
(pg_stat_get_blocks_fetched(t.oid) - pg_stat_get_blocks_hit(t.oid)) AS toast_blks_read,
pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit,
x.idx_blks_read AS tidx_blks_read,
@@ -2419,7 +2422,8 @@ pg_statio_all_tables| SELECT c.oid AS relid,
LEFT JOIN pg_class t ON ((c.reltoastrelid = t.oid)))
LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
LEFT JOIN LATERAL ( SELECT (sum((pg_stat_get_blocks_fetched(pg_index.indexrelid) - pg_stat_get_blocks_hit(pg_index.indexrelid))))::bigint AS idx_blks_read,
- (sum(pg_stat_get_blocks_hit(pg_index.indexrelid)))::bigint AS idx_blks_hit
+ (sum(pg_stat_get_blocks_hit(pg_index.indexrelid)))::bigint AS idx_blks_hit,
+ (sum(pg_stat_get_idx_metadata_blocks(pg_index.indexrelid)))::bigint AS idx_metadata_blks
FROM pg_index
WHERE (pg_index.indrelid = c.oid)) i ON (true))
LEFT JOIN LATERAL ( SELECT (sum((pg_stat_get_blocks_fetched(pg_index.indexrelid) - pg_stat_get_blocks_hit(pg_index.indexrelid))))::bigint AS idx_blks_read,
@@ -2434,6 +2438,7 @@ pg_statio_sys_indexes| SELECT relid,
indexrelname,
idx_blks_read,
idx_blks_hit,
+ idx_metadata_blks,
stats_reset
FROM pg_statio_all_indexes
WHERE ((schemaname = ANY (ARRAY['pg_catalog'::name, 'information_schema'::name])) OR (schemaname ~ '^pg_toast'::text));
@@ -2451,6 +2456,7 @@ pg_statio_sys_tables| SELECT relid,
heap_blks_hit,
idx_blks_read,
idx_blks_hit,
+ idx_metadata_blks,
toast_blks_read,
toast_blks_hit,
tidx_blks_read,
@@ -2465,6 +2471,7 @@ pg_statio_user_indexes| SELECT relid,
indexrelname,
idx_blks_read,
idx_blks_hit,
+ idx_metadata_blks,
stats_reset
FROM pg_statio_all_indexes
WHERE ((schemaname <> ALL (ARRAY['pg_catalog'::name, 'information_schema'::name])) AND (schemaname !~ '^pg_toast'::text));
@@ -2482,6 +2489,7 @@ pg_statio_user_tables| SELECT relid,
heap_blks_hit,
idx_blks_read,
idx_blks_hit,
+ idx_metadata_blks,
toast_blks_read,
toast_blks_hit,
tidx_blks_read,
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 67e1860e98..c78ea27c04 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1910,4 +1910,31 @@ SELECT * FROM check_estimated_rows('SELECT * FROM table_fillfactor');
(1 row)
DROP TABLE table_fillfactor;
+-- b-tree indexes: test stats collection for metadata index blocks
+select count(*) from tenk2 where unique1 = '1504';
+ count
+-------
+ 1
+(1 row)
+
+-- ensure pending stats are flushed
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush
+--------------------------
+
+(1 row)
+
+-- check effects
+BEGIN;
+SET LOCAL stats_fetch_consistency = snapshot;
+SELECT idx_metadata_blks < idx_blks_hit + idx_blks_read,
+ idx_metadata_blks > 0
+ FROM pg_statio_all_indexes
+ WHERE indexrelname='tenk2_unique1';
+ ?column? | ?column?
+----------+----------
+ t | t
+(1 row)
+
+COMMIT;
-- End of Stats Test
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 8768e0f27f..3c61cab7cc 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -944,4 +944,21 @@ SELECT * FROM check_estimated_rows('SELECT * FROM table_fillfactor');
DROP TABLE table_fillfactor;
+-- b-tree indexes: test stats collection for metadata index blocks
+select count(*) from tenk2 where unique1 = '1504';
+
+-- ensure pending stats are flushed
+SELECT pg_stat_force_next_flush();
+
+-- check effects
+BEGIN;
+SET LOCAL stats_fetch_consistency = snapshot;
+
+SELECT idx_metadata_blks < idx_blks_hit + idx_blks_read,
+ idx_metadata_blks > 0
+ FROM pg_statio_all_indexes
+ WHERE indexrelname='tenk2_unique1';
+
+COMMIT;
+
-- End of Stats Test
--
2.39.5 (Apple Git-154)
Hi Mircea,
Rebased and dusted off this patch.
Thanks for the patch. Here are my two cents.
IMO it would be helpful if you could come up with a few more practical
use cases. This change is going to affect pretty much everyone. If
only a few users will benefit from it once in several years, the value
of the patch is arguably low. As an example, can you think of how the
new counters can be used for debugging, checking index integrity,
writing new access methods or perhaps writing property-based tests?
Just several examples that came to my mind first.
Also I'm a bit concerned about the performance impact. It's probably
next to nothing, but if you could measure it on a relatively large
amount of data that would be great. Note that it's not uncommon to
have dozens of different indexes for a single table.
--
Best regards,
Aleksander Alekseev
Hi, Aleksander!
On 20/11/2025 14:56, Aleksander Alekseev wrote:
Hi Mircea,
Rebased and dusted off this patch.
Thanks for the patch. Here are my two cents.
IMO it would be helpful if you could come up with a few more practical
use cases. This change is going to affect pretty much everyone. If
only a few users will benefit from it once in several years, the value
of the patch is arguably low. As an example, can you think of how the
new counters can be used for debugging, checking index integrity,
writing new access methods or perhaps writing property-based tests?
Just several examples that came to my mind first.Also I'm a bit concerned about the performance impact. It's probably
next to nothing, but if you could measure it on a relatively large
amount of data that would be great. Note that it's not uncommon to
have dozens of different indexes for a single table.
Thanks! It hasn't caught on, might just stay an educational exercise on
my side on patch writing and we move on. Thanks for thinking out loud,
your ideas lead me to this: when working on improving the performance of
index traversals, it could be a direct way to show improvement and even
write regression tests: lower amount of internal pages are read than
before. But, one could also just use the existing stats and still be
able to track this..
For the performance checks, indeed, I'm using this script, I observe
no meaningful difference on my laptop:
https://gist.github.com/mcadariu/fc4a6d4eccd56b4447d1d9d05f9b5d79.
--
Thanks,
Mircea Cadariu
Hi Aleksander,
On 20/11/2025 16:56, Aleksander Alekseev wrote:
Thanks for the patch. Here are my two cents.
This is a follow-up on my earlier answers to your questions. Below is a
motivating example and a performance comparison with HEAD.
For database performance we want to ensure our working set fits in
memory (shared buffers and OS page cache).
Index leaf-page hit ratios allows a DBA to detect when actual index
lookups hit disk, which the overall ratio can mask, because the internal
pages are typically cached.
To see this in action, we can do the following. After applying the patch:
CREATE TABLE test_data_large (
id SERIAL PRIMARY KEY,
search_key INTEGER,
data_col TEXT
);
INSERT INTO test_data_large (search_key, data_col)
SELECT i, 'Data-' || i
FROM generate_series(1, 3200000) i;
CREATE INDEX idx_search_key_large ON test_data_large(search_key);
ANALYZE test_data_large;
SELECT pg_stat_reset();
This is our workload we'll run and collect stats for:
DO $$
BEGIN
FOR i IN 1..20000 LOOP
PERFORM 1
FROM test_data_large
WHERE search_key = ((i * 160) % 3200000)
LIMIT 1;
END LOOP;
END $$;
SELECT pg_stat_force_next_flush();
Now we can inspect the stats:
SELECT
ROUND(100.0 * idx_blks_hit / (idx_blks_read + idx_blks_hit), 2) ||
'%' AS "Overall hit ratio",
ROUND(100.0 * (idx_blks_hit - idx_metadata_blks) / ((idx_blks_read
+ idx_blks_hit) - idx_metadata_blks), 2) || '%' AS "Leaf hit ratio"
FROM pg_statio_all_indexes
WHERE indexrelname = 'idx_search_key_large';
This is the result:
Cache hit ratio | Leaf hit ratio
-------------+-----------
85.37% | 56.11%
(1 row)
The overall cache hit ratio looks healthy at ~85%, but the leaf-page hit
ratio is much lower at ~56%, indicating that a large share of index leaf
blocks are actually being read from disk. This suggests that internal
index pages remain cached (as expected, B-tree hierarchy) while the
working set of leaf pages does not fit in memory, leading to more disk
I/O during actual lookups.
Now showing the performance comparison of running this script [1]https://gist.github.com/mcadariu/fc4a6d4eccd56b4447d1d9d05f9b5d79 on my
laptop on HEAD and patch.
HEAD
====
Point Queries
-----------
+------+--------+
| Run | TPS |
+------+--------+
| 1 | 30098 |
| 2 | 30164 |
| 3 | 29903 |
| 4 | 30024 |
| 5 | 29898 |
| 6 | 30023 |
| 7 | 29952 |
| 8 | 29413 |
| 9 | 30062 |
| 10 | 29821 |
+------+--------+
Median: 29988
Range Scans
----------
+------+------+
| Run | TPS |
+------+------+
| 1 | 586 |
| 2 | 584 |
| 3 | 584 |
| 4 | 584 |
| 5 | 584 |
| 6 | 587 |
| 7 | 578 |
| 8 | 562 |
| 9 | 583 |
| 10 | 586 |
+------+------+
Median: 584
Mixed Load
----------
+------+--------+
| Run | TPS |
+------+--------+
| 1 | 16446 |
| 2 | 15842 |
| 3 | 16701 |
| 4 | 16293 |
| 5 | 16633 |
| 6 | 16292 |
| 7 | 16753 |
| 8 | 17047 |
| 9 | 17094 |
| 10 | 17078 |
+------+--------+
Median: 16667
Patch
===
Point Queries
-----------
+------+--------+
| Run | TPS |
+------+--------+
| 1 | 30335 |
| 2 | 30448 |
| 3 | 30372 |
| 4 | 30447 |
| 5 | 30478 |
| 6 | 30482 |
| 7 | 30428 |
| 8 | 30443 |
| 9 | 30433 |
| 10 | 30478 |
+------+--------+
Median: 30445
Range Scans
----------
+------+------+
| Run | TPS |
+------+------+
| 1 | 578 |
| 2 | 586 |
| 3 | 585 |
| 4 | 586 |
| 5 | 587 |
| 6 | 585 |
| 7 | 586 |
| 8 | 586 |
| 9 | 586 |
| 10 | 586 |
+------+------+
Median: 586
Mixed Load
---------
+------+--------+
| Run | TPS |
+------+--------+
| 1 | 17002 |
| 2 | 17078 |
| 3 | 17042 |
| 4 | 17046 |
| 5 | 17007 |
| 6 | 17023 |
| 7 | 17056 |
| 8 | 17071 |
| 9 | 17084 |
| 10 | 17068 |
+------+--------+
Median: 17051
HEAD vs Patch Summary
===============
+----------------------+-----------+-----------+
| Test | HEAD TPS | Patch TPS |
+----------------------+-----------+-----------+
| Point Queries Median | 29988 | 30445 |
| Range Scans Median | 584 | 586 |
| Mixed Load Median | 16667 | 17051 |
+----------------------+-----------+-----------+
Any feedback appreciated on the performance methodology/patch as a whole
welcome.
[1]: https://gist.github.com/mcadariu/fc4a6d4eccd56b4447d1d9d05f9b5d79
--
Thanks,
Mircea Cadariu