relfilenode statistics

Started by Bertrand Drouvot over 1 year ago, 49 messages
#1Bertrand Drouvot
bertranddrouvot.pg@gmail.com
1 attachment(s)

Hi hackers,

Please find attached a POC patch to implement $SUBJECT.

Adding relfilenode statistics has been proposed in [1]. The idea is to allow
tracking dirtied blocks, written blocks, etc. on a per-relation basis.

The attached patch is not in a fully "polished" state yet: there are more places
where we should add relfilenode counters, more APIs to create to retrieve the
relfilenode stats, and so on.

But I think that it is in a state that can be used to discuss the approach it
is implementing (so that we can agree or not on it) before moving forward.

The approach that is implemented in this patch is the following:

- A new PGSTAT_KIND_RELFILENODE is added
- A new attribute (aka relfile) has been added to PgStat_HashKey so that we
can record (dboid, spcOid and relfile) to identify a relfilenode entry
- pgstat_create_transactional() is used in RelationCreateStorage()
- pgstat_drop_transactional() is used in RelationDropStorage()
- RelationPreserveStorage() will remove the entry from the list of dropped stats
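
To make the key layout concrete, here is a minimal standalone sketch of how a relfilenode entry could be identified. Going by the calls in the attached patch, spcOid is passed in the existing objoid slot and the new relfile attribute carries the relfilenode number. The Sketch* names and the kind values are made up for illustration; they are not the real PgStat_HashKey definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for PostgreSQL's Oid and RelFileNumber (both 32-bit unsigned). */
typedef uint32_t Oid;
typedef uint32_t RelFileNumber;

/* Hypothetical kind values; the real enum lives in pgstat.h. */
typedef enum SketchKind
{
	SKETCH_KIND_RELATION = 1,
	SKETCH_KIND_RELFILENODE = 2
} SketchKind;

/* Simplified model of the hash key with the new relfile attribute. */
typedef struct SketchHashKey
{
	SketchKind	kind;
	Oid			dboid;
	Oid			objoid;			/* carries spcOid for relfilenode entries */
	RelFileNumber relfile;		/* 0 (InvalidOid) for every other kind */
} SketchHashKey;

/* Build a key; zero the struct first so no field is left uninitialized. */
SketchHashKey
sketch_make_key(SketchKind kind, Oid dboid, Oid objoid, RelFileNumber relfile)
{
	SketchHashKey key;

	memset(&key, 0, sizeof(key));
	key.kind = kind;
	key.dboid = dboid;
	key.objoid = objoid;
	key.relfile = relfile;
	return key;
}

/* Two keys name the same stats entry only if all four fields match. */
bool
sketch_key_equal(const SketchHashKey *a, const SketchHashKey *b)
{
	assert(a != NULL && b != NULL);
	return a->kind == b->kind &&
		a->dboid == b->dboid &&
		a->objoid == b->objoid &&
		a->relfile == b->relfile;
}
```

With such a key, two relfilenodes in the same database and tablespace still map to distinct entries, which is what lets a rewrite start from a fresh entry.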

The current approach to deal with table rewrite is to:

- copy the relfilenode stats in table_relation_set_new_filelocator() from
the relfilenode stats entry to the shared table stats entry
- in the pg_statio_all_tables view: add the table stats entry (which, due to the
above, contains the stats of the "previous" relfilenodes that were linked to this
relation) to the stats of the relfilenode currently linked to the relation
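
The accounting described in the two points above can be modelled in a few lines. This is only a toy model of the idea, not code from the patch: the Sketch* types and functions are hypothetical, and a single counter stands in for the whole stats entry.

```c
#include <assert.h>
#include <stdint.h>

/* Per-table entry: survives rewrites and collects "previous" relfilenode stats. */
typedef struct SketchTableStats
{
	int64_t		blocks_written;
} SketchTableStats;

/* Per-relfilenode entry: only covers the relfilenode currently in use. */
typedef struct SketchRelFileNodeStats
{
	int64_t		blocks_written;
} SketchRelFileNodeStats;

typedef struct SketchRelation
{
	SketchTableStats table;
	SketchRelFileNodeStats relfilenode;
} SketchRelation;

/* Writes are counted against the current relfilenode only. */
void
sketch_count_writes(SketchRelation *rel, int64_t nblocks)
{
	assert(nblocks >= 0);
	rel->relfilenode.blocks_written += nblocks;
}

/*
 * Rewrite (e.g. TRUNCATE): fold the old relfilenode's counters into the
 * table entry, then start from zero for the new relfilenode.
 */
void
sketch_set_new_filelocator(SketchRelation *rel)
{
	rel->table.blocks_written += rel->relfilenode.blocks_written;
	rel->relfilenode.blocks_written = 0;
}

/* What the view would report: table entry plus current relfilenode entry. */
int64_t
sketch_heap_blks_written(const SketchRelation *rel)
{
	return rel->table.blocks_written + rel->relfilenode.blocks_written;
}
```

In this model the reported value never goes backwards across a rewrite: the old relfilenode's count moves into the table entry instead of being lost with the dropped relfilenode entry.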

The attached patch does this for the new heap_blks_written field in
pg_statio_all_tables. The outcome is:

"
postgres=# create table bdt (a int);
CREATE TABLE
postgres=# select heap_blks_written from pg_statio_all_tables where relname = 'bdt';
heap_blks_written
-------------------
0
(1 row)

postgres=# insert into bdt select generate_series(1,10000);
INSERT 0 10000
postgres=# select heap_blks_written from pg_statio_all_tables where relname = 'bdt';
heap_blks_written
-------------------
0
(1 row)

postgres=# checkpoint;
CHECKPOINT
postgres=# select heap_blks_written from pg_statio_all_tables where relname = 'bdt';
heap_blks_written
-------------------
45
(1 row)

postgres=# truncate table bdt;
TRUNCATE TABLE
postgres=# select heap_blks_written from pg_statio_all_tables where relname = 'bdt';
heap_blks_written
-------------------
45
(1 row)

postgres=# insert into bdt select generate_series(1,10000);
INSERT 0 10000
postgres=# select heap_blks_written from pg_statio_all_tables where relname = 'bdt';
heap_blks_written
-------------------
45
(1 row)

postgres=# checkpoint;
CHECKPOINT
postgres=# select heap_blks_written from pg_statio_all_tables where relname = 'bdt';
heap_blks_written
-------------------
90
(1 row)
"

Some remarks:

- My first attempt was to call pgstat_create_transactional() and
pgstat_drop_transactional() in the same places as is done for relations, but
that did not work well (mainly due to corner cases on rewrite).

- Please ignore the pgstat_count_buffer_read() and
pgstat_count_buffer_hit() calls in pgstat_report_relfilenode_buffer_read()
and pgstat_report_relfilenode_buffer_hit(). Those stats will follow the same
flow as the one described above for the new heap_blks_written field (should we
agree on it).
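
For readers who have not opened the patch yet, here is a rough standalone model of the counter flow it uses for the three relfilenode counters: increments go to a backend-local pending entry, and a flush adds them to the shared entry and zeroes the pending one (compare pgstat_relfilenode_flush_cb() in the attachment). The Sketch* names are hypothetical and locking is left out entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The three counters the patch tracks per relfilenode. */
typedef struct SketchCounters
{
	int64_t		blocks_fetched;
	int64_t		blocks_hit;
	int64_t		blocks_written;
} SketchCounters;

/* Counting a write only touches the local pending entry. */
void
sketch_report_blks_written(SketchCounters *pending)
{
	assert(pending != NULL);
	pending->blocks_written++;
}

/*
 * Flush: accumulate pending counters into the shared entry, then reset
 * the pending entry. The real callback also takes the entry lock.
 */
void
sketch_flush(SketchCounters *shared, SketchCounters *pending)
{
	assert(shared != NULL && pending != NULL);

	shared->blocks_fetched += pending->blocks_fetched;
	shared->blocks_hit += pending->blocks_hit;
	shared->blocks_written += pending->blocks_written;

	memset(pending, 0, sizeof(*pending));
}
```

This is also a plausible reading of why heap_blks_written in the session above only changes after CHECKPOINT: the patch counts writes in FlushBuffer(), and the checkpointer now calls pgstat_report_stat(true) to flush its pending entries.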

Looking forward to your comments and feedback.

Regards,

[1]: /messages/by-id/20231113204439.r4lmys73tessqmak@awork3.anarazel.de

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v1-0001-Provide-relfilenode-statistics.patch (text/x-diff; charset=us-ascii)
From e102c9d15c08c638879ece26008faee58cf4a07e Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Thu, 16 Nov 2023 02:30:01 +0000
Subject: [PATCH v1] Provide relfilenode statistics

---
 src/backend/access/rmgrdesc/xactdesc.c        |   5 +-
 src/backend/catalog/storage.c                 |   8 ++
 src/backend/catalog/system_functions.sql      |   2 +-
 src/backend/catalog/system_views.sql          |   5 +-
 src/backend/postmaster/checkpointer.c         |   5 +
 src/backend/storage/buffer/bufmgr.c           |   6 +-
 src/backend/storage/smgr/md.c                 |   7 ++
 src/backend/utils/activity/pgstat.c           |  39 ++++--
 src/backend/utils/activity/pgstat_database.c  |  12 +-
 src/backend/utils/activity/pgstat_function.c  |  13 +-
 src/backend/utils/activity/pgstat_relation.c  | 112 ++++++++++++++++--
 src/backend/utils/activity/pgstat_replslot.c  |  13 +-
 src/backend/utils/activity/pgstat_shmem.c     |  19 ++-
 .../utils/activity/pgstat_subscription.c      |  12 +-
 src/backend/utils/activity/pgstat_xact.c      |  60 +++++++---
 src/backend/utils/adt/pgstatfuncs.c           |  34 +++++-
 src/include/access/tableam.h                  |  19 +++
 src/include/access/xact.h                     |   1 +
 src/include/catalog/pg_proc.dat               |  14 ++-
 src/include/pgstat.h                          |  19 ++-
 src/include/utils/pgstat_internal.h           |  34 ++++--
 src/test/recovery/t/029_stats_restart.pl      |  40 +++----
 .../recovery/t/030_stats_cleanup_replica.pl   |   6 +-
 src/test/regress/expected/rules.out           |  12 +-
 src/test/regress/expected/stats.out           |  30 ++---
 src/test/regress/sql/stats.sql                |  30 ++---
 src/test/subscription/t/026_stats.pl          |   4 +-
 src/tools/pgindent/typedefs.list              |   1 +
 28 files changed, 415 insertions(+), 147 deletions(-)
   4.6% src/backend/catalog/
  47.8% src/backend/utils/activity/
   6.5% src/backend/utils/adt/
   3.7% src/backend/
   3.3% src/include/access/
   3.3% src/include/catalog/
   6.2% src/include/utils/
   3.3% src/include/
  12.1% src/test/recovery/t/
   5.5% src/test/regress/expected/
   3.0% src/test/

diff --git a/src/backend/access/rmgrdesc/xactdesc.c b/src/backend/access/rmgrdesc/xactdesc.c
index dccca201e0..c02b079645 100644
--- a/src/backend/access/rmgrdesc/xactdesc.c
+++ b/src/backend/access/rmgrdesc/xactdesc.c
@@ -319,10 +319,11 @@ xact_desc_stats(StringInfo buf, const char *label,
 		appendStringInfo(buf, "; %sdropped stats:", label);
 		for (i = 0; i < ndropped; i++)
 		{
-			appendStringInfo(buf, " %d/%u/%u",
+			appendStringInfo(buf, " %d/%u/%u/%u",
 							 dropped_stats[i].kind,
 							 dropped_stats[i].dboid,
-							 dropped_stats[i].objoid);
+							 dropped_stats[i].objoid,
+							 dropped_stats[i].relfile);
 		}
 	}
 }
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index f56b3cc0f2..db6107cd90 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -33,6 +33,7 @@
 #include "storage/smgr.h"
 #include "utils/hsearch.h"
 #include "utils/memutils.h"
+#include "utils/pgstat_internal.h"
 #include "utils/rel.h"
 
 /* GUC variables */
@@ -152,6 +153,7 @@ RelationCreateStorage(RelFileLocator rlocator, char relpersistence,
 	if (needs_wal)
 		log_smgrcreate(&srel->smgr_rlocator.locator, MAIN_FORKNUM);
 
+	pgstat_create_transactional(PGSTAT_KIND_RELFILENODE, rlocator.dbOid, rlocator.spcOid, rlocator.relNumber);
 	/*
 	 * Add the relation to the list of stuff to delete at abort, if we are
 	 * asked to do so.
@@ -227,6 +229,8 @@ RelationDropStorage(Relation rel)
 	 * for now I'll keep the logic simple.
 	 */
 
+	pgstat_drop_transactional(PGSTAT_KIND_RELFILENODE, rel->rd_locator.dbOid, rel->rd_locator.spcOid,  rel->rd_locator.relNumber);
+
 	RelationCloseSmgr(rel);
 }
 
@@ -253,6 +257,9 @@ RelationPreserveStorage(RelFileLocator rlocator, bool atCommit)
 	PendingRelDelete *pending;
 	PendingRelDelete *prev;
 	PendingRelDelete *next;
+	PgStat_SubXactStatus *xact_state;
+
+	xact_state = pgStatXactStack;
 
 	prev = NULL;
 	for (pending = pendingDeletes; pending != NULL; pending = next)
@@ -267,6 +274,7 @@ RelationPreserveStorage(RelFileLocator rlocator, bool atCommit)
 			else
 				pendingDeletes = next;
 			pfree(pending);
+			PgStat_RemoveRelFileNodeFromDroppedStats(xact_state, rlocator);
 			/* prev does not change */
 		}
 		else
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index ae099e328c..140c8d556c 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -681,7 +681,7 @@ REVOKE EXECUTE ON FUNCTION pg_stat_reset_single_function_counters(oid) FROM publ
 
 REVOKE EXECUTE ON FUNCTION pg_stat_reset_replication_slot(text) FROM public;
 
-REVOKE EXECUTE ON FUNCTION pg_stat_have_stats(text, oid, oid) FROM public;
+REVOKE EXECUTE ON FUNCTION pg_stat_have_stats(text, oid, oid, oid) FROM public;
 
 REVOKE EXECUTE ON FUNCTION pg_stat_reset_subscription_stats(oid) FROM public;
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 53047cab5f..b0d7af6df0 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -750,6 +750,7 @@ CREATE VIEW pg_statio_all_tables AS
             C.relname AS relname,
             pg_stat_get_blocks_fetched(C.oid) -
                     pg_stat_get_blocks_hit(C.oid) AS heap_blks_read,
+			pg_stat_get_blocks_written(C.oid) + pg_stat_get_relfilenode_blocks_written(d.oid, CASE WHEN C.reltablespace <> 0 THEN C.reltablespace ELSE d.dattablespace END, C.relfilenode) AS heap_blks_written,
             pg_stat_get_blocks_hit(C.oid) AS heap_blks_hit,
             I.idx_blks_read AS idx_blks_read,
             I.idx_blks_hit AS idx_blks_hit,
@@ -758,7 +759,7 @@ CREATE VIEW pg_statio_all_tables AS
             pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
             X.idx_blks_read AS tidx_blks_read,
             X.idx_blks_hit AS tidx_blks_hit
-    FROM pg_class C LEFT JOIN
+    FROM pg_database d, pg_class C LEFT JOIN
             pg_class T ON C.reltoastrelid = T.oid
             LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
             LEFT JOIN LATERAL (
@@ -775,7 +776,7 @@ CREATE VIEW pg_statio_all_tables AS
                      sum(pg_stat_get_blocks_hit(indexrelid))::bigint
                      AS idx_blks_hit
               FROM pg_index WHERE indrelid = T.oid ) X ON true
-    WHERE C.relkind IN ('r', 't', 'm');
+    WHERE C.relkind IN ('r', 't', 'm') AND d.datname = current_database();
 
 CREATE VIEW pg_statio_sys_tables AS
     SELECT * FROM pg_statio_all_tables
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 3c68a9904d..0ff2812218 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -519,6 +519,11 @@ CheckpointerMain(char *startup_data, size_t startup_data_len)
 		/* Report pending statistics to the cumulative stats system */
 		pgstat_report_checkpointer();
 		pgstat_report_wal(true);
+		/*
+		 *  No need to check for transaction state in checkpointer before
+		 *  calling pgstat_report_stat().
+		 */
+		pgstat_report_stat(true);
 
 		/*
 		 * If any checkpoint flags have been set, redo the loop to handle the
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 49637284f9..06d89ba26b 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1121,9 +1121,9 @@ PinBufferForBlock(Relation rel,
 		 * WaitReadBuffers() (so, not for hits, and not for buffers that are
 		 * zeroed instead), the per-relation stats always count them.
 		 */
-		pgstat_count_buffer_read(rel);
+		pgstat_report_relfilenode_buffer_read(rel);
 		if (*foundPtr)
-			pgstat_count_buffer_hit(rel);
+			pgstat_report_relfilenode_buffer_hit(rel);
 	}
 	if (*foundPtr)
 	{
@@ -3838,6 +3838,8 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln, IOObject io_object,
 
 	pgBufferUsage.shared_blks_written++;
 
+	pgstat_report_relfilenode_blks_written(reln->smgr_rlocator.locator);
+
 	/*
 	 * Mark the buffer as clean (unless BM_JUST_DIRTIED has become set) and
 	 * end the BM_IO_IN_PROGRESS state.
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index bf0f3ca76d..3576749d2d 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -1447,12 +1447,16 @@ DropRelationFiles(RelFileLocator *delrels, int ndelrels, bool isRedo)
 {
 	SMgrRelation *srels;
 	int			i;
+	int         not_freed_count = 0;
 
 	srels = palloc(sizeof(SMgrRelation) * ndelrels);
 	for (i = 0; i < ndelrels; i++)
 	{
 		SMgrRelation srel = smgropen(delrels[i], INVALID_PROC_NUMBER);
 
+		if (!pgstat_drop_entry(PGSTAT_KIND_RELFILENODE, delrels[i].dbOid, delrels[i].spcOid, delrels[i].relNumber))
+			not_freed_count++;
+
 		if (isRedo)
 		{
 			ForkNumber	fork;
@@ -1463,6 +1467,9 @@ DropRelationFiles(RelFileLocator *delrels, int ndelrels, bool isRedo)
 		srels[i] = srel;
 	}
 
+	if (not_freed_count > 0)
+		pgstat_request_entry_refs_gc();
+
 	smgrdounlinkall(srels, ndelrels, isRedo);
 
 	for (i = 0; i < ndelrels; i++)
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index dcc2ad8d95..e3b6f45828 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -288,6 +288,19 @@ static const PgStat_KindInfo pgstat_kind_infos[PGSTAT_NUM_KINDS] = {
 		.delete_pending_cb = pgstat_relation_delete_pending_cb,
 	},
 
+	[PGSTAT_KIND_RELFILENODE] = {
+		.name = "relfilenode",
+
+		.fixed_amount = false,
+
+		.shared_size = sizeof(PgStatShared_RelFileNode),
+		.shared_data_off = offsetof(PgStatShared_RelFileNode, stats),
+		.shared_data_len = sizeof(((PgStatShared_RelFileNode *) 0)->stats),
+		.pending_size = sizeof(PgStat_StatRelFileNodeEntry),
+
+		.flush_pending_cb = pgstat_relfilenode_flush_cb,
+	},
+
 	[PGSTAT_KIND_FUNCTION] = {
 		.name = "function",
 
@@ -651,7 +664,7 @@ pgstat_report_stat(bool force)
 
 	partial_flush = false;
 
-	/* flush database / relation / function / ... stats */
+	/* flush database / relation / function / relfilenode / ... stats */
 	partial_flush |= pgstat_flush_pending_entries(nowait);
 
 	/* flush IO stats */
@@ -731,7 +744,7 @@ pgstat_reset_counters(void)
  * GRANT system.
  */
 void
-pgstat_reset(PgStat_Kind kind, Oid dboid, Oid objoid)
+pgstat_reset(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile)
 {
 	const PgStat_KindInfo *kind_info = pgstat_get_kind_info(kind);
 	TimestampTz ts = GetCurrentTimestamp();
@@ -740,7 +753,7 @@ pgstat_reset(PgStat_Kind kind, Oid dboid, Oid objoid)
 	Assert(!pgstat_get_kind_info(kind)->fixed_amount);
 
 	/* reset the "single counter" */
-	pgstat_reset_entry(kind, dboid, objoid, ts);
+	pgstat_reset_entry(kind, dboid, objoid, relfile, ts);
 
 	if (!kind_info->accessed_across_databases)
 		pgstat_reset_database_timestamp(dboid, ts);
@@ -809,7 +822,7 @@ pgstat_clear_snapshot(void)
 }
 
 void *
-pgstat_fetch_entry(PgStat_Kind kind, Oid dboid, Oid objoid)
+pgstat_fetch_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile)
 {
 	PgStat_HashKey key;
 	PgStat_EntryRef *entry_ref;
@@ -825,6 +838,7 @@ pgstat_fetch_entry(PgStat_Kind kind, Oid dboid, Oid objoid)
 	key.kind = kind;
 	key.dboid = dboid;
 	key.objoid = objoid;
+	key.relfile = relfile;
 
 	/* if we need to build a full snapshot, do so */
 	if (pgstat_fetch_consistency == PGSTAT_FETCH_CONSISTENCY_SNAPSHOT)
@@ -850,7 +864,7 @@ pgstat_fetch_entry(PgStat_Kind kind, Oid dboid, Oid objoid)
 
 	pgStatLocal.snapshot.mode = pgstat_fetch_consistency;
 
-	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid, false, NULL);
+	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid, relfile, false, NULL);
 
 	if (entry_ref == NULL || entry_ref->shared_entry->dropped)
 	{
@@ -919,13 +933,13 @@ pgstat_get_stat_snapshot_timestamp(bool *have_snapshot)
 }
 
 bool
-pgstat_have_entry(PgStat_Kind kind, Oid dboid, Oid objoid)
+pgstat_have_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile)
 {
 	/* fixed-numbered stats always exist */
 	if (pgstat_get_kind_info(kind)->fixed_amount)
 		return true;
 
-	return pgstat_get_entry_ref(kind, dboid, objoid, false, NULL) != NULL;
+	return pgstat_get_entry_ref(kind, dboid, objoid, relfile, false, NULL) != NULL;
 }
 
 /*
@@ -1102,7 +1116,8 @@ pgstat_build_snapshot_fixed(PgStat_Kind kind)
  * created, false otherwise.
  */
 PgStat_EntryRef *
-pgstat_prep_pending_entry(PgStat_Kind kind, Oid dboid, Oid objoid, bool *created_entry)
+pgstat_prep_pending_entry(PgStat_Kind kind, Oid dboid, Oid objoid,
+						  RelFileNumber relfile, bool *created_entry)
 {
 	PgStat_EntryRef *entry_ref;
 
@@ -1117,7 +1132,7 @@ pgstat_prep_pending_entry(PgStat_Kind kind, Oid dboid, Oid objoid, bool *created
 								  ALLOCSET_SMALL_SIZES);
 	}
 
-	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid,
+	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid, relfile,
 									 true, created_entry);
 
 	if (entry_ref->pending == NULL)
@@ -1140,11 +1155,11 @@ pgstat_prep_pending_entry(PgStat_Kind kind, Oid dboid, Oid objoid, bool *created
  * that it shouldn't be needed.
  */
 PgStat_EntryRef *
-pgstat_fetch_pending_entry(PgStat_Kind kind, Oid dboid, Oid objoid)
+pgstat_fetch_pending_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile)
 {
 	PgStat_EntryRef *entry_ref;
 
-	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid, false, NULL);
+	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid, relfile, false, NULL);
 
 	if (entry_ref == NULL || entry_ref->pending == NULL)
 		return NULL;
@@ -1173,7 +1188,7 @@ pgstat_delete_pending_entry(PgStat_EntryRef *entry_ref)
 }
 
 /*
- * Flush out pending stats for database objects (databases, relations,
+ * Flush out pending stats for database objects (databases, relations, relfilenodes,
  * functions).
  */
 static bool
diff --git a/src/backend/utils/activity/pgstat_database.c b/src/backend/utils/activity/pgstat_database.c
index 29bc090974..cf77f2dbdb 100644
--- a/src/backend/utils/activity/pgstat_database.c
+++ b/src/backend/utils/activity/pgstat_database.c
@@ -43,7 +43,7 @@ static PgStat_Counter pgLastSessionReportTime = 0;
 void
 pgstat_drop_database(Oid databaseid)
 {
-	pgstat_drop_transactional(PGSTAT_KIND_DATABASE, databaseid, InvalidOid);
+	pgstat_drop_transactional(PGSTAT_KIND_DATABASE, databaseid, InvalidOid, InvalidOid);
 }
 
 /*
@@ -66,7 +66,7 @@ pgstat_report_autovac(Oid dboid)
 	 * operation so it doesn't matter if we get blocked here a little.
 	 */
 	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_DATABASE,
-											dboid, InvalidOid, false);
+											dboid, InvalidOid, InvalidOid, false);
 
 	dbentry = (PgStatShared_Database *) entry_ref->shared_stats;
 	dbentry->stats.last_autovac_time = GetCurrentTimestamp();
@@ -150,7 +150,7 @@ pgstat_report_checksum_failures_in_db(Oid dboid, int failurecount)
 	 * common enough for that to be a problem.
 	 */
 	entry_ref =
-		pgstat_get_entry_ref_locked(PGSTAT_KIND_DATABASE, dboid, InvalidOid, false);
+		pgstat_get_entry_ref_locked(PGSTAT_KIND_DATABASE, dboid, InvalidOid, InvalidOid, false);
 
 	sharedent = (PgStatShared_Database *) entry_ref->shared_stats;
 	sharedent->stats.checksum_failures += failurecount;
@@ -242,7 +242,7 @@ PgStat_StatDBEntry *
 pgstat_fetch_stat_dbentry(Oid dboid)
 {
 	return (PgStat_StatDBEntry *)
-		pgstat_fetch_entry(PGSTAT_KIND_DATABASE, dboid, InvalidOid);
+		pgstat_fetch_entry(PGSTAT_KIND_DATABASE, dboid, InvalidOid, InvalidOid);
 }
 
 void
@@ -341,7 +341,7 @@ pgstat_prep_database_pending(Oid dboid)
 	Assert(!OidIsValid(dboid) || OidIsValid(MyDatabaseId));
 
 	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_DATABASE, dboid, InvalidOid,
-										  NULL);
+										  InvalidOid, NULL);
 
 	return entry_ref->pending;
 }
@@ -357,7 +357,7 @@ pgstat_reset_database_timestamp(Oid dboid, TimestampTz ts)
 	PgStatShared_Database *dbentry;
 
 	dbref = pgstat_get_entry_ref_locked(PGSTAT_KIND_DATABASE, MyDatabaseId, InvalidOid,
-										false);
+										InvalidOid, false);
 
 	dbentry = (PgStatShared_Database *) dbref->shared_stats;
 	dbentry->stats.stat_reset_timestamp = ts;
diff --git a/src/backend/utils/activity/pgstat_function.c b/src/backend/utils/activity/pgstat_function.c
index d26da551a4..440e44e300 100644
--- a/src/backend/utils/activity/pgstat_function.c
+++ b/src/backend/utils/activity/pgstat_function.c
@@ -46,7 +46,8 @@ pgstat_create_function(Oid proid)
 {
 	pgstat_create_transactional(PGSTAT_KIND_FUNCTION,
 								MyDatabaseId,
-								proid);
+								proid,
+								InvalidOid);
 }
 
 /*
@@ -61,7 +62,8 @@ pgstat_drop_function(Oid proid)
 {
 	pgstat_drop_transactional(PGSTAT_KIND_FUNCTION,
 							  MyDatabaseId,
-							  proid);
+							  proid,
+							  InvalidOid);
 }
 
 /*
@@ -86,6 +88,7 @@ pgstat_init_function_usage(FunctionCallInfo fcinfo,
 	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_FUNCTION,
 										  MyDatabaseId,
 										  fcinfo->flinfo->fn_oid,
+										  InvalidOid,
 										  &created_entry);
 
 	/*
@@ -113,7 +116,7 @@ pgstat_init_function_usage(FunctionCallInfo fcinfo,
 		if (!SearchSysCacheExists1(PROCOID, ObjectIdGetDatum(fcinfo->flinfo->fn_oid)))
 		{
 			pgstat_drop_entry(PGSTAT_KIND_FUNCTION, MyDatabaseId,
-							  fcinfo->flinfo->fn_oid);
+							  fcinfo->flinfo->fn_oid, InvalidOid);
 			ereport(ERROR, errcode(ERRCODE_UNDEFINED_FUNCTION),
 					errmsg("function call to dropped function"));
 		}
@@ -224,7 +227,7 @@ find_funcstat_entry(Oid func_id)
 {
 	PgStat_EntryRef *entry_ref;
 
-	entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_FUNCTION, MyDatabaseId, func_id);
+	entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_FUNCTION, MyDatabaseId, func_id, InvalidOid);
 
 	if (entry_ref)
 		return entry_ref->pending;
@@ -239,5 +242,5 @@ PgStat_StatFuncEntry *
 pgstat_fetch_stat_funcentry(Oid func_id)
 {
 	return (PgStat_StatFuncEntry *)
-		pgstat_fetch_entry(PGSTAT_KIND_FUNCTION, MyDatabaseId, func_id);
+		pgstat_fetch_entry(PGSTAT_KIND_FUNCTION, MyDatabaseId, func_id, InvalidOid);
 }
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index 8a3f7d434c..136dd6c85b 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -44,6 +44,7 @@ typedef struct TwoPhasePgStatRecord
 
 
 static PgStat_TableStatus *pgstat_prep_relation_pending(Oid rel_id, bool isshared);
+PgStat_StatRelFileNodeEntry *pgstat_prep_relfilenode_pending(RelFileLocator locator);
 static void add_tabstat_xact_level(PgStat_TableStatus *pgstat_info, int nest_level);
 static void ensure_tabstat_xact_level(PgStat_TableStatus *pgstat_info);
 static void save_truncdrop_counters(PgStat_TableXactStatus *trans, bool is_drop);
@@ -69,6 +70,7 @@ pgstat_copy_relation_stats(Relation dst, Relation src)
 	dst_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_RELATION,
 										  dst->rd_rel->relisshared ? InvalidOid : MyDatabaseId,
 										  RelationGetRelid(dst),
+										  InvalidOid,
 										  false);
 
 	dstshstats = (PgStatShared_Relation *) dst_ref->shared_stats;
@@ -170,7 +172,7 @@ pgstat_create_relation(Relation rel)
 {
 	pgstat_create_transactional(PGSTAT_KIND_RELATION,
 								rel->rd_rel->relisshared ? InvalidOid : MyDatabaseId,
-								RelationGetRelid(rel));
+								RelationGetRelid(rel), InvalidOid);
 }
 
 /*
@@ -184,7 +186,7 @@ pgstat_drop_relation(Relation rel)
 
 	pgstat_drop_transactional(PGSTAT_KIND_RELATION,
 							  rel->rd_rel->relisshared ? InvalidOid : MyDatabaseId,
-							  RelationGetRelid(rel));
+							  RelationGetRelid(rel), InvalidOid);
 
 	if (!pgstat_should_count_relation(rel))
 		return;
@@ -225,7 +227,7 @@ pgstat_report_vacuum(Oid tableoid, bool shared,
 
 	/* block acquiring lock for the same reason as pgstat_report_autovac() */
 	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_RELATION,
-											dboid, tableoid, false);
+											dboid, tableoid, InvalidOid, false);
 
 	shtabentry = (PgStatShared_Relation *) entry_ref->shared_stats;
 	tabentry = &shtabentry->stats;
@@ -318,6 +320,7 @@ pgstat_report_analyze(Relation rel,
 	/* block acquiring lock for the same reason as pgstat_report_autovac() */
 	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_RELATION, dboid,
 											RelationGetRelid(rel),
+											InvalidOid,
 											false);
 	/* can't get dropped while accessed */
 	Assert(entry_ref != NULL && entry_ref->shared_stats != NULL);
@@ -458,6 +461,19 @@ pgstat_fetch_stat_tabentry(Oid relid)
 	return pgstat_fetch_stat_tabentry_ext(IsSharedRelation(relid), relid);
 }
 
+/*
+ * Support function for the SQL-callable pgstat* functions. Returns
+ * the collected statistics for one relfilenode or NULL. NULL doesn't mean
+ * that the relfilenode doesn't exist, just that there are no statistics, so the
+ * caller is better off to report ZERO instead.
+ */
+PgStat_StatRelFileNodeEntry *
+pgstat_fetch_stat_relfilenodeentry(Oid dboid, Oid spcOid, RelFileNumber relfile)
+{
+	return (PgStat_StatRelFileNodeEntry *)
+		pgstat_fetch_entry(PGSTAT_KIND_RELFILENODE, dboid, spcOid, relfile);
+}
+
 /*
  * More efficient version of pgstat_fetch_stat_tabentry(), allowing to specify
  * whether the to-be-accessed table is a shared relation or not.
@@ -468,7 +484,7 @@ pgstat_fetch_stat_tabentry_ext(bool shared, Oid reloid)
 	Oid			dboid = (shared ? InvalidOid : MyDatabaseId);
 
 	return (PgStat_StatTabEntry *)
-		pgstat_fetch_entry(PGSTAT_KIND_RELATION, dboid, reloid);
+		pgstat_fetch_entry(PGSTAT_KIND_RELATION, dboid, reloid, InvalidOid);
 }
 
 /*
@@ -491,10 +507,10 @@ find_tabstat_entry(Oid rel_id)
 	PgStat_TableStatus *tabentry = NULL;
 	PgStat_TableStatus *tablestatus = NULL;
 
-	entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, MyDatabaseId, rel_id);
+	entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, MyDatabaseId, rel_id, InvalidOid);
 	if (!entry_ref)
 	{
-		entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, InvalidOid, rel_id);
+		entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, InvalidOid, rel_id, InvalidOid);
 		if (!entry_ref)
 			return tablestatus;
 	}
@@ -881,6 +897,38 @@ pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
 	return true;
 }
 
+/*
+ * Flush out pending stats for the relfilenode entry
+ *
+ * If nowait is true, this function returns false if the lock could not
+ * be immediately acquired, otherwise true is returned.
+ */
+bool
+pgstat_relfilenode_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
+{
+	PgStatShared_RelFileNode *sharedent;
+	PgStat_StatRelFileNodeEntry *pendingent;
+
+	pendingent = (PgStat_StatRelFileNodeEntry *) entry_ref->pending;
+	sharedent = (PgStatShared_RelFileNode *) entry_ref->shared_stats;
+
+	if (!pgstat_lock_entry(entry_ref, nowait))
+		return false;
+
+#define PGSTAT_ACCUM_RELFILENODECOUNT(item)      \
+		(sharedent)->stats.item += (pendingent)->item
+
+	PGSTAT_ACCUM_RELFILENODECOUNT(blocks_fetched);
+	PGSTAT_ACCUM_RELFILENODECOUNT(blocks_hit);
+	PGSTAT_ACCUM_RELFILENODECOUNT(blocks_written);
+
+	pgstat_unlock_entry(entry_ref);
+
+	memset(pendingent, 0, sizeof(*pendingent));
+
+	return true;
+}
+
 void
 pgstat_relation_delete_pending_cb(PgStat_EntryRef *entry_ref)
 {
@@ -902,7 +950,7 @@ pgstat_prep_relation_pending(Oid rel_id, bool isshared)
 
 	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_RELATION,
 										  isshared ? InvalidOid : MyDatabaseId,
-										  rel_id, NULL);
+										  rel_id, InvalidOid, NULL);
 	pending = entry_ref->pending;
 	pending->id = rel_id;
 	pending->shared = isshared;
@@ -910,6 +958,56 @@ pgstat_prep_relation_pending(Oid rel_id, bool isshared)
 	return pending;
 }
 
+PgStat_StatRelFileNodeEntry *
+pgstat_prep_relfilenode_pending(RelFileLocator locator)
+{
+	PgStat_EntryRef *entry_ref;
+
+	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_RELFILENODE, locator.dbOid,
+										  locator.spcOid, locator.relNumber, NULL);
+
+	return entry_ref->pending;
+}
+
+void
+pgstat_report_relfilenode_blks_written(RelFileLocator locator)
+{
+	PgStat_StatRelFileNodeEntry *relfileentry = NULL;
+
+	relfileentry = pgstat_prep_relfilenode_pending(locator);
+
+	if (relfileentry)
+		relfileentry->blocks_written++;
+}
+
+void
+pgstat_report_relfilenode_buffer_read(Relation reln)
+{
+	PgStat_StatRelFileNodeEntry *relfileentry = NULL;
+
+	/* For relation stats to survive after a rewrite */
+	pgstat_count_buffer_read(reln);
+
+	relfileentry = pgstat_prep_relfilenode_pending(reln->rd_locator);
+
+	if (relfileentry)
+		relfileentry->blocks_fetched++;
+}
+
+void
+pgstat_report_relfilenode_buffer_hit(Relation reln)
+{
+	PgStat_StatRelFileNodeEntry *relfileentry = NULL;
+
+	/* For relation stats to survive after a rewrite */
+	pgstat_count_buffer_hit(reln);
+
+	relfileentry = pgstat_prep_relfilenode_pending(reln->rd_locator);
+
+	if (relfileentry)
+		relfileentry->blocks_hit++;
+}
+
 /*
  * add a new (sub)transaction state record
  */
diff --git a/src/backend/utils/activity/pgstat_replslot.c b/src/backend/utils/activity/pgstat_replslot.c
index 889e86ac5a..96c6621477 100644
--- a/src/backend/utils/activity/pgstat_replslot.c
+++ b/src/backend/utils/activity/pgstat_replslot.c
@@ -62,7 +62,7 @@ pgstat_reset_replslot(const char *name)
 	 */
 	if (SlotIsLogical(slot))
 		pgstat_reset(PGSTAT_KIND_REPLSLOT, InvalidOid,
-					 ReplicationSlotIndex(slot));
+					 ReplicationSlotIndex(slot), InvalidOid);
 
 	LWLockRelease(ReplicationSlotControlLock);
 }
@@ -82,7 +82,7 @@ pgstat_report_replslot(ReplicationSlot *slot, const PgStat_StatReplSlotEntry *re
 	PgStat_StatReplSlotEntry *statent;
 
 	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_REPLSLOT, InvalidOid,
-											ReplicationSlotIndex(slot), false);
+											ReplicationSlotIndex(slot), InvalidOid, false);
 	shstatent = (PgStatShared_ReplSlot *) entry_ref->shared_stats;
 	statent = &shstatent->stats;
 
@@ -116,7 +116,7 @@ pgstat_create_replslot(ReplicationSlot *slot)
 	Assert(LWLockHeldByMeInMode(ReplicationSlotAllocationLock, LW_EXCLUSIVE));
 
 	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_REPLSLOT, InvalidOid,
-											ReplicationSlotIndex(slot), false);
+											ReplicationSlotIndex(slot), InvalidOid, false);
 	shstatent = (PgStatShared_ReplSlot *) entry_ref->shared_stats;
 
 	/*
@@ -146,7 +146,7 @@ void
 pgstat_acquire_replslot(ReplicationSlot *slot)
 {
 	pgstat_get_entry_ref(PGSTAT_KIND_REPLSLOT, InvalidOid,
-						 ReplicationSlotIndex(slot), true, NULL);
+						 ReplicationSlotIndex(slot), InvalidOid, true, NULL);
 }
 
 /*
@@ -158,7 +158,7 @@ pgstat_drop_replslot(ReplicationSlot *slot)
 	Assert(LWLockHeldByMeInMode(ReplicationSlotAllocationLock, LW_EXCLUSIVE));
 
 	pgstat_drop_entry(PGSTAT_KIND_REPLSLOT, InvalidOid,
-					  ReplicationSlotIndex(slot));
+					  ReplicationSlotIndex(slot), InvalidOid);
 }
 
 /*
@@ -177,7 +177,7 @@ pgstat_fetch_replslot(NameData slotname)
 
 	if (idx != -1)
 		slotentry = (PgStat_StatReplSlotEntry *) pgstat_fetch_entry(PGSTAT_KIND_REPLSLOT,
-																	InvalidOid, idx);
+																	InvalidOid, idx, InvalidOid);
 
 	LWLockRelease(ReplicationSlotControlLock);
 
@@ -209,6 +209,7 @@ pgstat_replslot_from_serialized_name_cb(const NameData *name, PgStat_HashKey *ke
 	key->kind = PGSTAT_KIND_REPLSLOT;
 	key->dboid = InvalidOid;
 	key->objoid = idx;
+	key->relfile = InvalidOid;
 
 	return true;
 }
diff --git a/src/backend/utils/activity/pgstat_shmem.c b/src/backend/utils/activity/pgstat_shmem.c
index 91591da395..d74b07e414 100644
--- a/src/backend/utils/activity/pgstat_shmem.c
+++ b/src/backend/utils/activity/pgstat_shmem.c
@@ -395,10 +395,10 @@ pgstat_get_entry_ref_cached(PgStat_HashKey key, PgStat_EntryRef **entry_ref_p)
  * if the entry is newly created, false otherwise.
  */
 PgStat_EntryRef *
-pgstat_get_entry_ref(PgStat_Kind kind, Oid dboid, Oid objoid, bool create,
-					 bool *created_entry)
+pgstat_get_entry_ref(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile,
+					 bool create, bool *created_entry)
 {
-	PgStat_HashKey key = {.kind = kind,.dboid = dboid,.objoid = objoid};
+	PgStat_HashKey key = {.kind = kind,.dboid = dboid,.objoid = objoid,.relfile = relfile};
 	PgStatShared_HashEntry *shhashent;
 	PgStatShared_Common *shheader = NULL;
 	PgStat_EntryRef *entry_ref;
@@ -611,12 +611,12 @@ pgstat_unlock_entry(PgStat_EntryRef *entry_ref)
  */
 PgStat_EntryRef *
 pgstat_get_entry_ref_locked(PgStat_Kind kind, Oid dboid, Oid objoid,
-							bool nowait)
+							RelFileNumber relfile, bool nowait)
 {
 	PgStat_EntryRef *entry_ref;
 
 	/* find shared table stats entry corresponding to the local entry */
-	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid, true, NULL);
+	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid, relfile, true, NULL);
 
 	/* lock the shared entry to protect the content, skip if failed */
 	if (!pgstat_lock_entry(entry_ref, nowait))
@@ -856,9 +856,9 @@ pgstat_drop_database_and_contents(Oid dboid)
 }
 
 bool
-pgstat_drop_entry(PgStat_Kind kind, Oid dboid, Oid objoid)
+pgstat_drop_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile)
 {
-	PgStat_HashKey key = {.kind = kind,.dboid = dboid,.objoid = objoid};
+	PgStat_HashKey key = {.kind = kind,.dboid = dboid,.objoid = objoid,.relfile = relfile};
 	PgStatShared_HashEntry *shent;
 	bool		freed = true;
 
@@ -931,13 +931,13 @@ shared_stat_reset_contents(PgStat_Kind kind, PgStatShared_Common *header,
  * Reset one variable-numbered stats entry.
  */
 void
-pgstat_reset_entry(PgStat_Kind kind, Oid dboid, Oid objoid, TimestampTz ts)
+pgstat_reset_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile, TimestampTz ts)
 {
 	PgStat_EntryRef *entry_ref;
 
 	Assert(!pgstat_get_kind_info(kind)->fixed_amount);
 
-	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid, false, NULL);
+	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid, relfile, false, NULL);
 	if (!entry_ref || entry_ref->shared_entry->dropped)
 		return;
 
diff --git a/src/backend/utils/activity/pgstat_subscription.c b/src/backend/utils/activity/pgstat_subscription.c
index d9af8de658..9b9ab2861b 100644
--- a/src/backend/utils/activity/pgstat_subscription.c
+++ b/src/backend/utils/activity/pgstat_subscription.c
@@ -30,7 +30,7 @@ pgstat_report_subscription_error(Oid subid, bool is_apply_error)
 	PgStat_BackendSubEntry *pending;
 
 	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_SUBSCRIPTION,
-										  InvalidOid, subid, NULL);
+										  InvalidOid, subid, InvalidOid, NULL);
 	pending = entry_ref->pending;
 
 	if (is_apply_error)
@@ -47,12 +47,12 @@ pgstat_create_subscription(Oid subid)
 {
 	/* Ensures that stats are dropped if transaction rolls back */
 	pgstat_create_transactional(PGSTAT_KIND_SUBSCRIPTION,
-								InvalidOid, subid);
+								InvalidOid, subid, InvalidOid);
 
 	/* Create and initialize the subscription stats entry */
-	pgstat_get_entry_ref(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid,
+	pgstat_get_entry_ref(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid, InvalidOid,
 						 true, NULL);
-	pgstat_reset_entry(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid, 0);
+	pgstat_reset_entry(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid, InvalidOid, 0);
 }
 
 /*
@@ -64,7 +64,7 @@ void
 pgstat_drop_subscription(Oid subid)
 {
 	pgstat_drop_transactional(PGSTAT_KIND_SUBSCRIPTION,
-							  InvalidOid, subid);
+							  InvalidOid, subid, InvalidOid);
 }
 
 /*
@@ -75,7 +75,7 @@ PgStat_StatSubEntry *
 pgstat_fetch_stat_subscription(Oid subid)
 {
 	return (PgStat_StatSubEntry *)
-		pgstat_fetch_entry(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid);
+		pgstat_fetch_entry(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid, InvalidOid);
 }
 
 /*
diff --git a/src/backend/utils/activity/pgstat_xact.c b/src/backend/utils/activity/pgstat_xact.c
index 1877d22f14..b25df5112b 100644
--- a/src/backend/utils/activity/pgstat_xact.c
+++ b/src/backend/utils/activity/pgstat_xact.c
@@ -30,7 +30,7 @@ static void AtEOXact_PgStat_DroppedStats(PgStat_SubXactStatus *xact_state, bool
 static void AtEOSubXact_PgStat_DroppedStats(PgStat_SubXactStatus *xact_state,
 											bool isCommit, int nestDepth);
 
-static PgStat_SubXactStatus *pgStatXactStack = NULL;
+PgStat_SubXactStatus *pgStatXactStack = NULL;
 
 
 /*
@@ -84,7 +84,7 @@ AtEOXact_PgStat_DroppedStats(PgStat_SubXactStatus *xact_state, bool isCommit)
 			 * Transaction that dropped an object committed. Drop the stats
 			 * too.
 			 */
-			if (!pgstat_drop_entry(it->kind, it->dboid, it->objoid))
+			if (!pgstat_drop_entry(it->kind, it->dboid, it->objoid, it->relfile))
 				not_freed_count++;
 		}
 		else if (!isCommit && pending->is_create)
@@ -93,7 +93,7 @@ AtEOXact_PgStat_DroppedStats(PgStat_SubXactStatus *xact_state, bool isCommit)
 			 * Transaction that created an object aborted. Drop the stats
 			 * associated with the object.
 			 */
-			if (!pgstat_drop_entry(it->kind, it->dboid, it->objoid))
+			if (!pgstat_drop_entry(it->kind, it->dboid, it->objoid, it->relfile))
 				not_freed_count++;
 		}
 
@@ -105,6 +105,33 @@ AtEOXact_PgStat_DroppedStats(PgStat_SubXactStatus *xact_state, bool isCommit)
 		pgstat_request_entry_refs_gc();
 }
 
+/*
+ * Remove a relfilenode stat from the list of stats to be dropped.
+ */
+void
+PgStat_RemoveRelFileNodeFromDroppedStats(PgStat_SubXactStatus *xact_state, RelFileLocator rlocator)
+{
+	dlist_mutable_iter iter;
+
+	if (dclist_count(&xact_state->pending_drops) == 0)
+		return;
+
+	dclist_foreach_modify(iter, &xact_state->pending_drops)
+	{
+		PgStat_PendingDroppedStatsItem *pending =
+			dclist_container(PgStat_PendingDroppedStatsItem, node, iter.cur);
+		xl_xact_stats_item *it = &pending->item;
+
+		if (it->kind == PGSTAT_KIND_RELFILENODE && it->dboid == rlocator.dbOid
+			&& it->objoid == rlocator.spcOid && it->relfile == rlocator.relNumber)
+		{
+			dclist_delete_from(&xact_state->pending_drops, &pending->node);
+			pfree(pending);
+			return;
+		}
+	}
+}
+
 /*
  * Called from access/transam/xact.c at subtransaction commit/abort.
  */
@@ -158,7 +185,7 @@ AtEOSubXact_PgStat_DroppedStats(PgStat_SubXactStatus *xact_state,
 			 * Subtransaction creating a new stats object aborted. Drop the
 			 * stats object.
 			 */
-			if (!pgstat_drop_entry(it->kind, it->dboid, it->objoid))
+			if (!pgstat_drop_entry(it->kind, it->dboid, it->objoid, it->relfile))
 				not_freed_count++;
 			pfree(pending);
 		}
@@ -320,7 +347,11 @@ pgstat_execute_transactional_drops(int ndrops, struct xl_xact_stats_item *items,
 	{
 		xl_xact_stats_item *it = &items[i];
 
-		if (!pgstat_drop_entry(it->kind, it->dboid, it->objoid))
+		/* skip relfilenode entries: pgstat_drop_transactional() in RelationDropStorage() handles them */
+		if (it->kind == PGSTAT_KIND_RELFILENODE)
+			continue;
+
+		if (!pgstat_drop_entry(it->kind, it->dboid, it->objoid, it->relfile))
 			not_freed_count++;
 	}
 
@@ -329,7 +360,7 @@ pgstat_execute_transactional_drops(int ndrops, struct xl_xact_stats_item *items,
 }
 
 static void
-create_drop_transactional_internal(PgStat_Kind kind, Oid dboid, Oid objoid, bool is_create)
+create_drop_transactional_internal(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile, bool is_create)
 {
 	int			nest_level = GetCurrentTransactionNestLevel();
 	PgStat_SubXactStatus *xact_state;
@@ -342,6 +373,7 @@ create_drop_transactional_internal(PgStat_Kind kind, Oid dboid, Oid objoid, bool
 	drop->item.kind = kind;
 	drop->item.dboid = dboid;
 	drop->item.objoid = objoid;
+	drop->item.relfile = relfile;
 
 	dclist_push_tail(&xact_state->pending_drops, &drop->node);
 }
@@ -354,18 +386,18 @@ create_drop_transactional_internal(PgStat_Kind kind, Oid dboid, Oid objoid, bool
  * dropped.
  */
 void
-pgstat_create_transactional(PgStat_Kind kind, Oid dboid, Oid objoid)
+pgstat_create_transactional(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile)
 {
-	if (pgstat_get_entry_ref(kind, dboid, objoid, false, NULL))
+	if (pgstat_get_entry_ref(kind, dboid, objoid, relfile, false, NULL))
 	{
 		ereport(WARNING,
-				errmsg("resetting existing statistics for kind %s, db=%u, oid=%u",
-					   (pgstat_get_kind_info(kind))->name, dboid, objoid));
+				errmsg("resetting existing statistics for kind %s, db=%u, oid=%u, relfile=%u",
+					   (pgstat_get_kind_info(kind))->name, dboid, objoid, relfile));
 
-		pgstat_reset(kind, dboid, objoid);
+		pgstat_reset(kind, dboid, objoid, relfile);
 	}
 
-	create_drop_transactional_internal(kind, dboid, objoid, /* create */ true);
+	create_drop_transactional_internal(kind, dboid, objoid, relfile, /* create */ true);
 }
 
 /*
@@ -376,7 +408,7 @@ pgstat_create_transactional(PgStat_Kind kind, Oid dboid, Oid objoid)
  * alive.
  */
 void
-pgstat_drop_transactional(PgStat_Kind kind, Oid dboid, Oid objoid)
+pgstat_drop_transactional(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile)
 {
-	create_drop_transactional_internal(kind, dboid, objoid, /* create */ false);
+	create_drop_transactional_internal(kind, dboid, objoid, relfile, /* create */ false);
 }
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 3876339ee1..e266d96f5e 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -106,6 +106,30 @@ PG_STAT_GET_RELENTRY_INT64(tuples_updated)
 /* pg_stat_get_vacuum_count */
 PG_STAT_GET_RELENTRY_INT64(vacuum_count)
 
+#define PG_STAT_GET_RELFILEENTRY_INT64(stat)						\
+Datum															\
+CppConcat(pg_stat_get_relfilenode_,stat)(PG_FUNCTION_ARGS)					\
+{																\
+	Oid			dboid = PG_GETARG_OID(0);						\
+	Oid			spcOid = PG_GETARG_OID(1);						\
+	RelFileNumber relfile = PG_GETARG_OID(2);					\
+	int64		result;											\
+	PgStat_StatRelFileNodeEntry *relfileentry;								\
+																\
+	if ((relfileentry = pgstat_fetch_stat_relfilenodeentry(dboid, spcOid, relfile)) == NULL)	\
+		result = 0;												\
+	else														\
+		result = (int64) (relfileentry->stat);						\
+																\
+	PG_RETURN_INT64(result);									\
+}
+
+/* pg_stat_get_relfilenode_blocks_written */
+PG_STAT_GET_RELFILEENTRY_INT64(blocks_written)
+
+/* pg_stat_get_blocks_written */
+PG_STAT_GET_RELENTRY_INT64(blocks_written)
+
 #define PG_STAT_GET_RELENTRY_TIMESTAMPTZ(stat)					\
 Datum															\
 CppConcat(pg_stat_get_,stat)(PG_FUNCTION_ARGS)					\
@@ -1752,7 +1776,7 @@ pg_stat_reset_single_table_counters(PG_FUNCTION_ARGS)
 	Oid			taboid = PG_GETARG_OID(0);
 	Oid			dboid = (IsSharedRelation(taboid) ? InvalidOid : MyDatabaseId);
 
-	pgstat_reset(PGSTAT_KIND_RELATION, dboid, taboid);
+	pgstat_reset(PGSTAT_KIND_RELATION, dboid, taboid, InvalidOid);
 
 	PG_RETURN_VOID();
 }
@@ -1762,7 +1786,7 @@ pg_stat_reset_single_function_counters(PG_FUNCTION_ARGS)
 {
 	Oid			funcoid = PG_GETARG_OID(0);
 
-	pgstat_reset(PGSTAT_KIND_FUNCTION, MyDatabaseId, funcoid);
+	pgstat_reset(PGSTAT_KIND_FUNCTION, MyDatabaseId, funcoid, InvalidOid);
 
 	PG_RETURN_VOID();
 }
@@ -1820,7 +1844,7 @@ pg_stat_reset_subscription_stats(PG_FUNCTION_ARGS)
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 					 errmsg("invalid subscription OID %u", subid)));
-		pgstat_reset(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid);
+		pgstat_reset(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid, InvalidOid);
 	}
 
 	PG_RETURN_VOID();
@@ -2028,7 +2052,9 @@ pg_stat_have_stats(PG_FUNCTION_ARGS)
 	char	   *stats_type = text_to_cstring(PG_GETARG_TEXT_P(0));
 	Oid			dboid = PG_GETARG_OID(1);
 	Oid			objoid = PG_GETARG_OID(2);
+	RelFileNumber relfile = PG_GETARG_OID(3);
+
 	PgStat_Kind kind = pgstat_get_kind_from_str(stats_type);
 
-	PG_RETURN_BOOL(pgstat_have_entry(kind, dboid, objoid));
+	PG_RETURN_BOOL(pgstat_have_entry(kind, dboid, objoid, relfile));
 }
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 8e583b45cd..792cd2237e 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -21,7 +21,9 @@
 #include "access/sdir.h"
 #include "access/xact.h"
 #include "executor/tuptable.h"
+#include "pgstat.h"
 #include "storage/read_stream.h"
+#include "utils/pgstat_internal.h"
 #include "utils/rel.h"
 #include "utils/snapshot.h"
 
@@ -1634,6 +1636,23 @@ table_relation_set_new_filelocator(Relation rel,
 								   TransactionId *freezeXid,
 								   MultiXactId *minmulti)
 {
+	PgStat_StatRelFileNodeEntry *relfileentry;
+	PgStat_StatTabEntry *tabentry = NULL;
+	PgStat_EntryRef *entry_ref = NULL;
+	PgStatShared_Relation *shtabentry;
+
+	entry_ref = pgstat_get_entry_ref(PGSTAT_KIND_RELATION, MyDatabaseId, rel->rd_id, InvalidOid, false, NULL);
+	if (entry_ref)
+	{
+		shtabentry = (PgStatShared_Relation *) entry_ref->shared_stats;
+		tabentry = &shtabentry->stats;
+	}
+
+	relfileentry = pgstat_fetch_stat_relfilenodeentry(rel->rd_locator.dbOid, rel->rd_locator.spcOid, rel->rd_locator.relNumber);
+
+	if (tabentry && relfileentry)
+		tabentry->blocks_written += relfileentry->blocks_written;
+
 	rel->rd_tableam->relation_set_new_filelocator(rel, newrlocator,
 												  persistence, freezeXid,
 												  minmulti);
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 6d4439f052..3b9ed65ff6 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -284,6 +284,7 @@ typedef struct xl_xact_stats_item
 	int			kind;
 	Oid			dboid;
 	Oid			objoid;
+	RelFileNumber relfile;
 } xl_xact_stats_item;
 
 typedef struct xl_xact_stats_items
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 6a5476d3c4..912471a1ac 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5374,6 +5374,14 @@
   proname => 'pg_stat_get_tuples_updated', provolatile => 's',
   proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
   prosrc => 'pg_stat_get_tuples_updated' },
+{ oid => '9280', descr => 'statistics: number of blocks written for a relfilenode',
+  proname => 'pg_stat_get_relfilenode_blocks_written', provolatile => 's',
+  proparallel => 'r',
+  proargtypes => 'oid oid oid',
+  prorettype => 'int8',
+  proallargtypes => '{oid,oid,oid,int8}',
+  proargmodes => '{i,i,i,o}',
+  prosrc => 'pg_stat_get_relfilenode_blocks_written' },
 { oid => '1933', descr => 'statistics: number of tuples deleted',
   proname => 'pg_stat_get_tuples_deleted', provolatile => 's',
   proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
@@ -5413,6 +5421,10 @@
   proname => 'pg_stat_get_blocks_hit', provolatile => 's', proparallel => 'r',
   prorettype => 'int8', proargtypes => 'oid',
   prosrc => 'pg_stat_get_blocks_hit' },
+{ oid => '8438', descr => 'statistics: number of blocks written',
+  proname => 'pg_stat_get_blocks_written', provolatile => 's', proparallel => 'r',
+  prorettype => 'int8', proargtypes => 'oid',
+  prosrc => 'pg_stat_get_blocks_written' },
 { oid => '2781', descr => 'statistics: last manual vacuum time for a table',
   proname => 'pg_stat_get_last_vacuum_time', provolatile => 's',
   proparallel => 'r', prorettype => 'timestamptz', proargtypes => 'oid',
@@ -5499,7 +5511,7 @@
 
 { oid => '6230', descr => 'statistics: check if a stats object exists',
   proname => 'pg_stat_have_stats', provolatile => 'v', proparallel => 'r',
-  prorettype => 'bool', proargtypes => 'text oid oid',
+  prorettype => 'bool', proargtypes => 'text oid oid oid',
   prosrc => 'pg_stat_have_stats' },
 
 { oid => '6231', descr => 'statistics: information about subscription stats',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 2136239710..9631689430 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -14,6 +14,7 @@
 #include "datatype/timestamp.h"
 #include "portability/instr_time.h"
 #include "postmaster/pgarch.h"	/* for MAX_XFN_CHARS */
+#include "storage/relfilelocator.h"
 #include "utils/backend_progress.h" /* for backward compatibility */
 #include "utils/backend_status.h"	/* for backward compatibility */
 #include "utils/relcache.h"
@@ -40,6 +41,7 @@ typedef enum PgStat_Kind
 	/* stats for variable-numbered objects */
 	PGSTAT_KIND_DATABASE,		/* database-wide statistics */
 	PGSTAT_KIND_RELATION,		/* per-table statistics */
+	PGSTAT_KIND_RELFILENODE,	/* per-relfilenode statistics */
 	PGSTAT_KIND_FUNCTION,		/* per-function statistics */
 	PGSTAT_KIND_REPLSLOT,		/* per-slot statistics */
 	PGSTAT_KIND_SUBSCRIPTION,	/* per-subscription statistics */
@@ -417,6 +419,7 @@ typedef struct PgStat_StatTabEntry
 
 	PgStat_Counter blocks_fetched;
 	PgStat_Counter blocks_hit;
+	PgStat_Counter blocks_written;
 
 	TimestampTz last_vacuum_time;	/* user initiated vacuum */
 	PgStat_Counter vacuum_count;
@@ -428,6 +431,13 @@ typedef struct PgStat_StatTabEntry
 	PgStat_Counter autoanalyze_count;
 } PgStat_StatTabEntry;
 
+typedef struct PgStat_StatRelFileNodeEntry
+{
+	PgStat_Counter blocks_fetched;
+	PgStat_Counter blocks_hit;
+	PgStat_Counter blocks_written;
+} PgStat_StatRelFileNodeEntry;
+
 typedef struct PgStat_WalStats
 {
 	PgStat_Counter wal_records;
@@ -478,7 +488,7 @@ extern long pgstat_report_stat(bool force);
 extern void pgstat_force_next_flush(void);
 
 extern void pgstat_reset_counters(void);
-extern void pgstat_reset(PgStat_Kind kind, Oid dboid, Oid objoid);
+extern void pgstat_reset(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile);
 extern void pgstat_reset_of_kind(PgStat_Kind kind);
 
 /* stats accessors */
@@ -487,7 +497,7 @@ extern TimestampTz pgstat_get_stat_snapshot_timestamp(bool *have_snapshot);
 
 /* helpers */
 extern PgStat_Kind pgstat_get_kind_from_str(char *kind_str);
-extern bool pgstat_have_entry(PgStat_Kind kind, Oid dboid, Oid objoid);
+extern bool pgstat_have_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile);
 
 
 /*
@@ -596,6 +606,10 @@ extern void pgstat_report_analyze(Relation rel,
 								  PgStat_Counter livetuples, PgStat_Counter deadtuples,
 								  bool resetcounter);
 
+extern void pgstat_report_relfilenode_blks_written(RelFileLocator locator);
+extern void pgstat_report_relfilenode_buffer_read(Relation reln);
+extern void pgstat_report_relfilenode_buffer_hit(Relation reln);
+
 /*
  * If stats are enabled, but pending data hasn't been prepared yet, call
  * pgstat_assoc_relation() to do so. See its comment for why this is done
@@ -655,6 +669,7 @@ extern void pgstat_twophase_postabort(TransactionId xid, uint16 info,
 									  void *recdata, uint32 len);
 
 extern PgStat_StatTabEntry *pgstat_fetch_stat_tabentry(Oid relid);
+extern PgStat_StatRelFileNodeEntry *pgstat_fetch_stat_relfilenodeentry(Oid dboid, Oid spcOid, RelFileNumber relfile);
 extern PgStat_StatTabEntry *pgstat_fetch_stat_tabentry_ext(bool shared,
 														   Oid reloid);
 extern PgStat_TableStatus *find_tabstat_entry(Oid rel_id);
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index dbbca31602..50d5f1a577 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -53,7 +53,8 @@ typedef struct PgStat_HashKey
 {
 	PgStat_Kind kind;			/* statistics entry kind */
 	Oid			dboid;			/* database ID. InvalidOid for shared objects. */
-	Oid			objoid;			/* object ID, either table or function. */
+	Oid			objoid;			/* object ID: table, function, or tablespace. */
+	RelFileNumber relfile;		/* relfilenumber for RelFileLocator. */
 } PgStat_HashKey;
 
 /*
@@ -376,6 +377,12 @@ typedef struct PgStatShared_Relation
 	PgStat_StatTabEntry stats;
 } PgStatShared_Relation;
 
+typedef struct PgStatShared_RelFileNode
+{
+	PgStatShared_Common header;
+	PgStat_StatRelFileNodeEntry stats;
+} PgStatShared_RelFileNode;
+
 typedef struct PgStatShared_Function
 {
 	PgStatShared_Common header;
@@ -498,6 +505,9 @@ static inline size_t pgstat_get_entry_len(PgStat_Kind kind);
 static inline void *pgstat_get_entry_data(PgStat_Kind kind, PgStatShared_Common *entry);
 
 
+extern PgStat_SubXactStatus *pgStatXactStack;
+extern void PgStat_RemoveRelFileNodeFromDroppedStats(PgStat_SubXactStatus *xact_state, RelFileLocator rlocator);
+
 /*
  * Functions in pgstat.c
  */
@@ -511,10 +521,12 @@ extern void pgstat_assert_is_up(void);
 #endif
 
 extern void pgstat_delete_pending_entry(PgStat_EntryRef *entry_ref);
-extern PgStat_EntryRef *pgstat_prep_pending_entry(PgStat_Kind kind, Oid dboid, Oid objoid, bool *created_entry);
-extern PgStat_EntryRef *pgstat_fetch_pending_entry(PgStat_Kind kind, Oid dboid, Oid objoid);
+extern PgStat_EntryRef *pgstat_prep_pending_entry(PgStat_Kind kind, Oid dboid,
+												  Oid objoid, RelFileNumber relfile,
+												  bool *created_entry);
+extern PgStat_EntryRef *pgstat_fetch_pending_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile);
 
-extern void *pgstat_fetch_entry(PgStat_Kind kind, Oid dboid, Oid objoid);
+extern void *pgstat_fetch_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile);
 extern void pgstat_snapshot_fixed(PgStat_Kind kind);
 
 
@@ -582,6 +594,7 @@ extern void AtPrepare_PgStat_Relations(PgStat_SubXactStatus *xact_state);
 extern void PostPrepare_PgStat_Relations(PgStat_SubXactStatus *xact_state);
 
 extern bool pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait);
+extern bool pgstat_relfilenode_flush_cb(PgStat_EntryRef *entry_ref, bool nowait);
 extern void pgstat_relation_delete_pending_cb(PgStat_EntryRef *entry_ref);
 
 
@@ -602,15 +615,16 @@ extern void pgstat_attach_shmem(void);
 extern void pgstat_detach_shmem(void);
 
 extern PgStat_EntryRef *pgstat_get_entry_ref(PgStat_Kind kind, Oid dboid, Oid objoid,
-											 bool create, bool *created_entry);
+											 RelFileNumber relfile, bool create,
+											 bool *created_entry);
 extern bool pgstat_lock_entry(PgStat_EntryRef *entry_ref, bool nowait);
 extern bool pgstat_lock_entry_shared(PgStat_EntryRef *entry_ref, bool nowait);
 extern void pgstat_unlock_entry(PgStat_EntryRef *entry_ref);
-extern bool pgstat_drop_entry(PgStat_Kind kind, Oid dboid, Oid objoid);
+extern bool pgstat_drop_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile);
 extern void pgstat_drop_all_entries(void);
 extern PgStat_EntryRef *pgstat_get_entry_ref_locked(PgStat_Kind kind, Oid dboid, Oid objoid,
-													bool nowait);
-extern void pgstat_reset_entry(PgStat_Kind kind, Oid dboid, Oid objoid, TimestampTz ts);
+													RelFileNumber relfile, bool nowait);
+extern void pgstat_reset_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile, TimestampTz ts);
 extern void pgstat_reset_entries_of_kind(PgStat_Kind kind, TimestampTz ts);
 extern void pgstat_reset_matching_entries(bool (*do_reset) (PgStatShared_HashEntry *, Datum),
 										  Datum match_data,
@@ -655,8 +669,8 @@ extern void pgstat_subscription_reset_timestamp_cb(PgStatShared_Common *header,
  */
 
 extern PgStat_SubXactStatus *pgstat_get_xact_stack_level(int nest_level);
-extern void pgstat_drop_transactional(PgStat_Kind kind, Oid dboid, Oid objoid);
-extern void pgstat_create_transactional(PgStat_Kind kind, Oid dboid, Oid objoid);
+extern void pgstat_drop_transactional(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile);
+extern void pgstat_create_transactional(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile);
 
 
 /*
diff --git a/src/test/recovery/t/029_stats_restart.pl b/src/test/recovery/t/029_stats_restart.pl
index 6a1615a1e8..ee5a404b45 100644
--- a/src/test/recovery/t/029_stats_restart.pl
+++ b/src/test/recovery/t/029_stats_restart.pl
@@ -40,10 +40,10 @@ trigger_funcrel_stat();
 
 # verify stats objects exist
 my $sect = "initial";
-is(have_stats('database', $dboid, 0), 't', "$sect: db stats do exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats('database', $dboid, 0, 0), 't', "$sect: db stats do exist");
+is(have_stats('function', $dboid, $funcoid, 0),
 	't', "$sect: function stats do exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats('relation', $dboid, $tableoid, 0),
 	't', "$sect: relation stats do exist");
 
 # regular shutdown
@@ -64,10 +64,10 @@ copy($og_stats, $statsfile) or die "Copy failed: $!";
 $node->start;
 
 $sect = "copy";
-is(have_stats('database', $dboid, 0), 't', "$sect: db stats do exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats('database', $dboid, 0, 0), 't', "$sect: db stats do exist");
+is(have_stats('function', $dboid, $funcoid, 0),
 	't', "$sect: function stats do exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats('relation', $dboid, $tableoid, 0),
 	't', "$sect: relation stats do exist");
 
 $node->stop('immediate');
@@ -81,10 +81,10 @@ $node->start;
 
 # stats should have been discarded
 $sect = "post immediate";
-is(have_stats('database', $dboid, 0), 'f', "$sect: db stats do not exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats('database', $dboid, 0, 0), 'f', "$sect: db stats do not exist");
+is(have_stats('function', $dboid, $funcoid, 0),
 	'f', "$sect: function stats do exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats('relation', $dboid, $tableoid, 0),
 	'f', "$sect: relation stats do not exist");
 
 # get rid of backup statsfile
@@ -95,10 +95,10 @@ unlink $statsfile or die "cannot unlink $statsfile $!";
 trigger_funcrel_stat();
 
 $sect = "post immediate, new";
-is(have_stats('database', $dboid, 0), 't', "$sect: db stats do exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats('database', $dboid, 0, 0), 't', "$sect: db stats do exist");
+is(have_stats('function', $dboid, $funcoid, 0),
 	't', "$sect: function stats do exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats('relation', $dboid, $tableoid, 0),
 	't', "$sect: relation stats do exist");
 
 # regular shutdown
@@ -114,10 +114,10 @@ $node->start;
 
 # no stats present due to invalid stats file
 $sect = "invalid_overwrite";
-is(have_stats('database', $dboid, 0), 'f', "$sect: db stats do not exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats('database', $dboid, 0, 0), 'f', "$sect: db stats do not exist");
+is(have_stats('function', $dboid, $funcoid, 0),
 	'f', "$sect: function stats do not exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats('relation', $dboid, $tableoid, 0),
 	'f', "$sect: relation stats do not exist");
 
 
@@ -130,10 +130,10 @@ append_file($og_stats, "XYZ");
 $node->start;
 
 $sect = "invalid_append";
-is(have_stats('database', $dboid, 0), 'f', "$sect: db stats do not exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats('database', $dboid, 0, 0), 'f', "$sect: db stats do not exist");
+is(have_stats('function', $dboid, $funcoid, 0),
 	'f', "$sect: function stats do not exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats('relation', $dboid, $tableoid, 0),
 	'f', "$sect: relation stats do not exist");
 
 
@@ -292,10 +292,10 @@ sub trigger_funcrel_stat
 
 sub have_stats
 {
-	my ($kind, $dboid, $objoid) = @_;
+	my ($kind, $dboid, $objoid, $relfile) = @_;
 
 	return $node->safe_psql($connect_db,
-		"SELECT pg_stat_have_stats('$kind', $dboid, $objoid)");
+		"SELECT pg_stat_have_stats('$kind', $dboid, $objoid, $relfile)");
 }
 
 sub overwrite_file
diff --git a/src/test/recovery/t/030_stats_cleanup_replica.pl b/src/test/recovery/t/030_stats_cleanup_replica.pl
index 74b516cc7c..317df24c4f 100644
--- a/src/test/recovery/t/030_stats_cleanup_replica.pl
+++ b/src/test/recovery/t/030_stats_cleanup_replica.pl
@@ -179,9 +179,9 @@ sub test_standby_func_tab_stats_status
 	my %stats;
 
 	$stats{rel} = $node_standby->safe_psql($connect_db,
-		"SELECT pg_stat_have_stats('relation', $dboid, $tableoid)");
+		"SELECT pg_stat_have_stats('relation', $dboid, $tableoid, 0)");
 	$stats{func} = $node_standby->safe_psql($connect_db,
-		"SELECT pg_stat_have_stats('function', $dboid, $funcoid)");
+		"SELECT pg_stat_have_stats('function', $dboid, $funcoid, 0)");
 
 	is_deeply(\%stats, \%expected, "$sect: standby stats as expected");
 
@@ -194,7 +194,7 @@ sub test_standby_db_stats_status
 	my ($connect_db, $dboid, $present) = @_;
 
 	is( $node_standby->safe_psql(
-			$connect_db, "SELECT pg_stat_have_stats('database', $dboid, 0)"),
+			$connect_db, "SELECT pg_stat_have_stats('database', $dboid, 0, 0)"),
 		$present,
 		"$sect: standby db stats as expected");
 }
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index ef658ad740..a2fa165c4c 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2335,6 +2335,11 @@ pg_statio_all_tables| SELECT c.oid AS relid,
     n.nspname AS schemaname,
     c.relname,
     (pg_stat_get_blocks_fetched(c.oid) - pg_stat_get_blocks_hit(c.oid)) AS heap_blks_read,
+    (pg_stat_get_blocks_written(c.oid) + pg_stat_get_relfilenode_blocks_written(d.oid,
+        CASE
+            WHEN (c.reltablespace <> (0)::oid) THEN c.reltablespace
+            ELSE d.dattablespace
+        END, c.relfilenode)) AS heap_blks_written,
     pg_stat_get_blocks_hit(c.oid) AS heap_blks_hit,
     i.idx_blks_read,
     i.idx_blks_hit,
@@ -2342,7 +2347,8 @@ pg_statio_all_tables| SELECT c.oid AS relid,
     pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit,
     x.idx_blks_read AS tidx_blks_read,
     x.idx_blks_hit AS tidx_blks_hit
-   FROM ((((pg_class c
+   FROM pg_database d,
+    ((((pg_class c
      LEFT JOIN pg_class t ON ((c.reltoastrelid = t.oid)))
      LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
      LEFT JOIN LATERAL ( SELECT (sum((pg_stat_get_blocks_fetched(pg_index.indexrelid) - pg_stat_get_blocks_hit(pg_index.indexrelid))))::bigint AS idx_blks_read,
@@ -2353,7 +2359,7 @@ pg_statio_all_tables| SELECT c.oid AS relid,
             (sum(pg_stat_get_blocks_hit(pg_index.indexrelid)))::bigint AS idx_blks_hit
            FROM pg_index
           WHERE (pg_index.indrelid = t.oid)) x ON (true))
-  WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"]));
+  WHERE ((c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"])) AND (d.datname = current_database()));
 pg_statio_sys_indexes| SELECT relid,
     indexrelid,
     schemaname,
@@ -2374,6 +2380,7 @@ pg_statio_sys_tables| SELECT relid,
     schemaname,
     relname,
     heap_blks_read,
+    heap_blks_written,
     heap_blks_hit,
     idx_blks_read,
     idx_blks_hit,
@@ -2403,6 +2410,7 @@ pg_statio_user_tables| SELECT relid,
     schemaname,
     relname,
     heap_blks_read,
+    heap_blks_written,
     heap_blks_hit,
     idx_blks_read,
     idx_blks_hit,
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 6e08898b18..eff0c9372c 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1111,23 +1111,23 @@ ROLLBACK;
 -- pg_stat_have_stats behavior
 ----
 -- fixed-numbered stats exist
-SELECT pg_stat_have_stats('bgwriter', 0, 0);
+SELECT pg_stat_have_stats('bgwriter', 0, 0, 0);
  pg_stat_have_stats 
 --------------------
  t
 (1 row)
 
 -- unknown stats kinds error out
-SELECT pg_stat_have_stats('zaphod', 0, 0);
+SELECT pg_stat_have_stats('zaphod', 0, 0, 0);
 ERROR:  invalid statistics kind: "zaphod"
 -- db stats have objoid 0
-SELECT pg_stat_have_stats('database', :dboid, 1);
+SELECT pg_stat_have_stats('database', :dboid, 1, 0);
  pg_stat_have_stats 
 --------------------
  f
 (1 row)
 
-SELECT pg_stat_have_stats('database', :dboid, 0);
+SELECT pg_stat_have_stats('database', :dboid, 0, 0);
  pg_stat_have_stats 
 --------------------
  t
@@ -1144,21 +1144,21 @@ select a from stats_test_tab1 where a = 3;
  3
 (1 row)
 
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  t
 (1 row)
 
 -- pg_stat_have_stats returns false for dropped index with stats
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  t
 (1 row)
 
 DROP index stats_test_idx1;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  f
@@ -1174,14 +1174,14 @@ select a from stats_test_tab1 where a = 3;
  3
 (1 row)
 
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  t
 (1 row)
 
 ROLLBACK;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  f
@@ -1196,7 +1196,7 @@ select a from stats_test_tab1 where a = 3;
  3
 (1 row)
 
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  t
@@ -1204,7 +1204,7 @@ SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
 
 REINDEX index CONCURRENTLY stats_test_idx1;
 -- false for previous oid
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  f
@@ -1212,7 +1212,7 @@ SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
 
 -- true for new oid
 SELECT 'stats_test_idx1'::regclass::oid AS stats_test_idx1_oid \gset
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  t
@@ -1220,7 +1220,7 @@ SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
 
 -- pg_stat_have_stats returns true for a rolled back drop index with stats
 BEGIN;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  t
@@ -1228,7 +1228,7 @@ SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
 
 DROP index stats_test_idx1;
 ROLLBACK;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  t
@@ -1513,7 +1513,7 @@ SELECT :io_sum_bulkwrite_strategy_extends_after > :io_sum_bulkwrite_strategy_ext
 (1 row)
 
 -- Test IO stats reset
-SELECT pg_stat_have_stats('io', 0, 0);
+SELECT pg_stat_have_stats('io', 0, 0, 0);
  pg_stat_have_stats 
 --------------------
  t
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index d8ac0d06f4..5a40779989 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -539,12 +539,12 @@ ROLLBACK;
 -- pg_stat_have_stats behavior
 ----
 -- fixed-numbered stats exist
-SELECT pg_stat_have_stats('bgwriter', 0, 0);
+SELECT pg_stat_have_stats('bgwriter', 0, 0, 0);
 -- unknown stats kinds error out
-SELECT pg_stat_have_stats('zaphod', 0, 0);
+SELECT pg_stat_have_stats('zaphod', 0, 0, 0);
 -- db stats have objoid 0
-SELECT pg_stat_have_stats('database', :dboid, 1);
-SELECT pg_stat_have_stats('database', :dboid, 0);
+SELECT pg_stat_have_stats('database', :dboid, 1, 0);
+SELECT pg_stat_have_stats('database', :dboid, 0, 0);
 
 -- pg_stat_have_stats returns true for committed index creation
 CREATE table stats_test_tab1 as select generate_series(1,10) a;
@@ -552,40 +552,40 @@ CREATE index stats_test_idx1 on stats_test_tab1(a);
 SELECT 'stats_test_idx1'::regclass::oid AS stats_test_idx1_oid \gset
 SET enable_seqscan TO off;
 select a from stats_test_tab1 where a = 3;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 
 -- pg_stat_have_stats returns false for dropped index with stats
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 DROP index stats_test_idx1;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 
 -- pg_stat_have_stats returns false for rolled back index creation
 BEGIN;
 CREATE index stats_test_idx1 on stats_test_tab1(a);
 SELECT 'stats_test_idx1'::regclass::oid AS stats_test_idx1_oid \gset
 select a from stats_test_tab1 where a = 3;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 ROLLBACK;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 
 -- pg_stat_have_stats returns true for reindex CONCURRENTLY
 CREATE index stats_test_idx1 on stats_test_tab1(a);
 SELECT 'stats_test_idx1'::regclass::oid AS stats_test_idx1_oid \gset
 select a from stats_test_tab1 where a = 3;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 REINDEX index CONCURRENTLY stats_test_idx1;
 -- false for previous oid
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 -- true for new oid
 SELECT 'stats_test_idx1'::regclass::oid AS stats_test_idx1_oid \gset
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 
 -- pg_stat_have_stats returns true for a rolled back drop index with stats
 BEGIN;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 DROP index stats_test_idx1;
 ROLLBACK;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 
 -- put enable_seqscan back to on
 SET enable_seqscan TO on;
@@ -759,7 +759,7 @@ SELECT sum(extends) AS io_sum_bulkwrite_strategy_extends_after
 SELECT :io_sum_bulkwrite_strategy_extends_after > :io_sum_bulkwrite_strategy_extends_before;
 
 -- Test IO stats reset
-SELECT pg_stat_have_stats('io', 0, 0);
+SELECT pg_stat_have_stats('io', 0, 0, 0);
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_pre_reset
   FROM pg_stat_io \gset
 SELECT pg_stat_reset_shared('io');
diff --git a/src/test/subscription/t/026_stats.pl b/src/test/subscription/t/026_stats.pl
index fb3e5629b3..1f4ae5efd5 100644
--- a/src/test/subscription/t/026_stats.pl
+++ b/src/test/subscription/t/026_stats.pl
@@ -263,7 +263,7 @@ $node_subscriber->safe_psql($db, qq(DROP SUBSCRIPTION $sub1_name));
 
 # Subscription stats for sub1 should be gone
 is( $node_subscriber->safe_psql(
-		$db, qq(SELECT pg_stat_have_stats('subscription', 0, $sub1_oid))),
+		$db, qq(SELECT pg_stat_have_stats('subscription', 0, $sub1_oid, 0))),
 	qq(f),
 	qq(Subscription stats for subscription '$sub1_name' should be removed.));
 
@@ -282,7 +282,7 @@ DROP SUBSCRIPTION $sub2_name;
 
 # Subscription stats for sub2 should be gone
 is( $node_subscriber->safe_psql(
-		$db, qq(SELECT pg_stat_have_stats('subscription', 0, $sub2_oid))),
+		$db, qq(SELECT pg_stat_have_stats('subscription', 0, $sub2_oid, 0))),
 	qq(f),
 	qq(Subscription stats for subscription '$sub2_name' should be removed.));
 
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index d427a1c16a..d7385f9bfb 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2118,6 +2118,7 @@ PgStatShared_Function
 PgStatShared_HashEntry
 PgStatShared_IO
 PgStatShared_Relation
+PgStatShared_RelFileNode
 PgStatShared_ReplSlot
 PgStatShared_SLRU
 PgStatShared_Subscription
-- 
2.34.1

#2Robert Haas
robertmhaas@gmail.com
In reply to: Bertrand Drouvot (#1)
Re: relfilenode statistics

Hi Bertrand,

It would be helpful to me if the reasons why we're splitting out
relfilenodestats could be more clearly spelled out. I see Andres's
comment in the thread to which you linked, but it's pretty vague about
why we should do this ("it's not nice") and whether we should do this
("I wonder if this is an argument for") and maybe that's all fine if
Andres is going to be the one to review and commit this, but even
then it would be nice if the rest of us could follow along from home,
and right now I can't.

The commit message is often a good place to spell this kind of thing
out, because then it's included with every version of the patch you
post, and may be of some use to the eventual committer in writing
their commit message. The body of the email where you post the patch
set can be fine, too.

...Robert

#3Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Robert Haas (#2)
Re: relfilenode statistics

Hi Robert,

On Mon, May 27, 2024 at 09:10:13AM -0400, Robert Haas wrote:

Hi Bertrand,

It would be helpful to me if the reasons why we're splitting out
relfilenodestats could be more clearly spelled out. I see Andres's
comment in the thread to which you linked, but it's pretty vague about
why we should do this ("it's not nice") and whether we should do this
("I wonder if this is an argument for") and maybe that's all fine if
Andres is going to be the one to review and commit this, but even
then it would be nice if the rest of us could follow along from home,
and right now I can't.

Thanks for the feedback!

You’re completely right: my previous message is missing a clear explanation of
why I think that relfilenode stats could be useful. Let me try to fix this.

The main argument is that we currently don’t have write counters for relations.
The reason is that we don’t have the relation OID when writing buffers out.
Tracking writes per relfilenode would allow us to track/consolidate writes per
relation (see the example in the v1 patch and in the message up-thread).
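
To make the keying idea concrete, here is a minimal standalone C sketch. It is
not code from the patch: every Toy* name is hypothetical, and a fixed-size array
stands in for the shared stats hash table. It only illustrates that the flush
path can bump a counter knowing nothing but the (dboid, spcOid, relfile) triple.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the (dboid, spcOid, relfile) key. */
typedef struct ToyLocator
{
	uint32_t	db_oid;
	uint32_t	spc_oid;
	uint32_t	rel_number;
} ToyLocator;

typedef struct ToyRelFileStats
{
	ToyLocator	key;
	int			in_use;
	uint64_t	blocks_written;
} ToyRelFileStats;

#define TOY_MAX_ENTRIES 64

/* Toy replacement for the shared stats hash table (assumption). */
static ToyRelFileStats toy_stats[TOY_MAX_ENTRIES];

static int
toy_same_key(ToyLocator a, ToyLocator b)
{
	return a.db_oid == b.db_oid &&
		a.spc_oid == b.spc_oid &&
		a.rel_number == b.rel_number;
}

/* Return the entry for loc; create it when asked to and it is absent. */
ToyRelFileStats *
toy_lookup(ToyLocator loc, int create)
{
	ToyRelFileStats *free_slot = NULL;

	for (size_t i = 0; i < TOY_MAX_ENTRIES; i++)
	{
		if (toy_stats[i].in_use)
		{
			if (toy_same_key(toy_stats[i].key, loc))
				return &toy_stats[i];
		}
		else if (free_slot == NULL)
			free_slot = &toy_stats[i];
	}

	if (!create || free_slot == NULL)
		return NULL;

	free_slot->key = loc;
	free_slot->in_use = 1;
	free_slot->blocks_written = 0;
	return free_slot;
}

/* Called from the toy "buffer flush" path: only the locator is known. */
void
toy_report_block_written(ToyLocator loc)
{
	ToyRelFileStats *entry = toy_lookup(loc, 1);

	assert(entry != NULL);		/* the toy table would be full otherwise */
	entry->blocks_written++;
}

/* Read side: zero when no entry exists for this locator. */
uint64_t
toy_blocks_written(ToyLocator loc)
{
	ToyRelFileStats *entry = toy_lookup(loc, 0);

	return entry ? entry->blocks_written : 0;
}
```

Mapping a relation to its current locator only has to happen on the read side,
which is where the relation OID is available.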

I think that adding instrumentation in this area (write counters) would be
beneficial, just as the counters we currently have for reads are.

The second argument is that this is also beneficial for the "Split index and
table statistics into different types of stats" thread (mentioned in the previous
message). It would allow us to avoid additional branches in some situations (like
the one mentioned by Andres in the link I provided up-thread).

If we agree that the main argument is reason enough to have relfilenode stats,
then I think using them as proposed in the second argument makes sense too:

We’d move the current buffer read and buffer hit counters from the relation stats
to the relfilenode stats (while still being able to retrieve them from the
pg_statio_all_tables/indexes views: see the example for the new heap_blks_written
stat added in the patch). Generally speaking, I think that tracking counters at
a common level (i.e., the relfilenode level instead of the table or index level)
is beneficial (it avoids storing/allocating space for the same counters in
multiple structs) and sounds more intuitive to me.

I also think this opens the door to new ideas: for example, with relfilenode
statistics in place, we could probably also start thinking about tracking
checksum errors per relfilenode.

The commit message is often a good place to spell this kind of thing
out, because then it's included with every version of the patch you
post, and may be of some use to the eventual committer in writing
their commit message. The body of the email where you post the patch
set can be fine, too.

Yeah, I’ll update the commit message in V2 with better explanations once I get
feedback on V1 (should we decide to move forward with the relfilenode stats idea).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#4Robert Haas
robertmhaas@gmail.com
In reply to: Bertrand Drouvot (#3)
Re: relfilenode statistics

On Mon, Jun 3, 2024 at 7:11 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

The main argument is that we currently don’t have write counters for relations.
The reason is that we don’t have the relation OID when writing buffers out.

OK.

The second argument is that this is also beneficial for the "Split index and
table statistics into different types of stats" thread (mentioned in the previous
message). It would allow us to avoid additional branches in some situations (like
the one mentioned by Andres in the link I provided up-thread).

OK.

We’d move the current buffer read and buffer hit counters from the relation stats
to the relfilenode stats (while still being able to retrieve them from the
pg_statio_all_tables/indexes views: see the example for the new heap_blks_written
stat added in the patch). Generally speaking, I think that tracking counters at
a common level (i.e., the relfilenode level instead of the table or index level)
is beneficial (it avoids storing/allocating space for the same counters in
multiple structs) and sounds more intuitive to me.

Hmm. So if I CLUSTER or VACUUM FULL the relation, the relfilenode
changes. Does that mean I lose all of those stats? Is that a problem?
Or is it good? Or what?

I also thought about the other direction. Suppose I drop a
relation and create a new one that gets a different relation OID but
the same relfilenode. But I don't think that's a problem: dropping the
relation should forcibly remove the old stats, so there won't be any
conflict in this case.
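
A minimal standalone C sketch of that reasoning, under the assumption that
entries are keyed by relfilenode number alone (the patch uses a wider key) and
with hypothetical Toy* names throughout: once the drop removes the entry, a new
relation reusing the same relfilenode starts from zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-relfilenode stats entry. */
typedef struct ToyEntry
{
	uint32_t	rel_number;
	int			in_use;
	uint64_t	blocks_written;
} ToyEntry;

#define TOY_SLOTS 8

static ToyEntry toy_entries[TOY_SLOTS];

/* Return the live entry for rel_number, or NULL if there is none. */
ToyEntry *
toy_find(uint32_t rel_number)
{
	for (size_t i = 0; i < TOY_SLOTS; i++)
	{
		if (toy_entries[i].in_use && toy_entries[i].rel_number == rel_number)
			return &toy_entries[i];
	}
	return NULL;
}

/* Create a zeroed entry; a leftover entry would be the conflict case. */
ToyEntry *
toy_create(uint32_t rel_number)
{
	assert(toy_find(rel_number) == NULL);

	for (size_t i = 0; i < TOY_SLOTS; i++)
	{
		if (!toy_entries[i].in_use)
		{
			toy_entries[i].rel_number = rel_number;
			toy_entries[i].in_use = 1;
			toy_entries[i].blocks_written = 0;
			return &toy_entries[i];
		}
	}
	return NULL;				/* no free slot in the toy table */
}

/* Drop the entry; returns 1 if something was removed, 0 otherwise. */
int
toy_drop(uint32_t rel_number)
{
	ToyEntry   *entry = toy_find(rel_number);

	if (entry == NULL)
		return 0;
	entry->in_use = 0;
	return 1;
}
```

Calling toy_create() for a relfilenode number, bumping its counter, calling
toy_drop(), and then toy_create() again with the same number yields an entry
whose counter is back at zero.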

--
Robert Haas
EDB: http://www.enterprisedb.com

#5Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Robert Haas (#4)
Re: relfilenode statistics

On Tue, Jun 04, 2024 at 09:26:27AM -0400, Robert Haas wrote:

On Mon, Jun 3, 2024 at 7:11 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

We’d move the current buffer read and buffer hit counters from the relation stats
to the relfilenode stats (while still being able to retrieve them from the
pg_statio_all_tables/indexes views: see the example for the new heap_blks_written
stat added in the patch). Generally speaking, I think that tracking counters at
a common level (i.e., the relfilenode level instead of the table or index level)
is beneficial (it avoids storing/allocating space for the same counters in
multiple structs) and sounds more intuitive to me.

Hmm. So if I CLUSTER or VACUUM FULL the relation, the relfilenode
changes. Does that mean I lose all of those stats? Is that a problem?
Or is it good? Or what?

I think we should keep the stats with the relation across relfilenode changes.
As a POC, v1 implemented a way to do so during TRUNCATE (see the changes in
table_relation_set_new_filelocator() and in pg_statio_all_tables): as you can
see in the example provided up-thread, the new heap_blks_written statistic has
been preserved during the TRUNCATE.

Please note that the v1 POC only takes care of the new heap_blks_written stat and
that the logic used in table_relation_set_new_filelocator() would probably need
to be applied in rebuild_relation() or such to deal with CLUSTER or VACUUM FULL.

For the relation, the new counter "blocks_written" has been added to the
PgStat_StatTabEntry struct (it's not needed in the PgStat_TableCounts one as the
relfilenode stat takes care of it). It's added in PgStat_StatTabEntry only
to copy/preserve the relfilenode stats during rewrite operations and to retrieve
the stats in pg_statio_all_tables.
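
As a rough illustration of that copy/preserve logic, here is a standalone C
sketch. It is not the patch's code: the Toy* structs and functions are
hypothetical simplifications. The table-level field only accumulates what
earlier relfilenodes had counted, and the value shown to the user is the sum of
that field and the current relfilenode counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counters attached to one relfilenode. */
typedef struct ToyRelFileCounters
{
	uint32_t	rel_number;
	uint64_t	blocks_written;
} ToyRelFileCounters;

/* Hypothetical table-level entry. */
typedef struct ToyTable
{
	uint32_t	table_oid;
	uint64_t	carried_blocks_written; /* from previous relfilenodes */
	ToyRelFileCounters current; /* counters of the current relfilenode */
} ToyTable;

/* A rewrite: fold the old relfilenode counter into the table entry. */
void
toy_set_new_filenode(ToyTable *table, uint32_t new_rel_number)
{
	assert(table != NULL);

	table->carried_blocks_written += table->current.blocks_written;
	table->current.rel_number = new_rel_number;
	table->current.blocks_written = 0;
}

/* What a view would report: carried-over writes plus current writes. */
uint64_t
toy_heap_blks_written(const ToyTable *table)
{
	assert(table != NULL);

	return table->carried_blocks_written + table->current.blocks_written;
}
```

With this shape, a rewrite never makes the reported value go backwards, while
the write path keeps touching only the relfilenode-level counter.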

Then, if we later split the relation stats into index/table stats, we'd have
blocks_written defined in fewer structs (as compared to doing the split without
relfilenode stats in place).

As mentioned up-thread, the new logic has been implemented in v1 only for the
new blocks_written stat (we'd need to do the same for the existing buffer read /
buffer hit if we agree on the approach implemented in v1).

I also thought about the other direction. Suppose I drop a
relation and create a new one that gets a different relation OID but
the same relfilenode. But I don't think that's a problem: dropping the
relation should forcibly remove the old stats, so there won't be any
conflict in this case.

Yeah.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#6Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Bertrand Drouvot (#3)
1 attachment(s)
Re: relfilenode statistics

Hi,

On Mon, Jun 03, 2024 at 11:11:46AM +0000, Bertrand Drouvot wrote:

Yeah, I’ll update the commit message in V2 with better explanations once I get
feedback on V1 (should we decide to move forward with the relfilenode stats idea).

Please find attached v2, a mandatory rebase due to cd312adc56. In passing, it
provides a more detailed commit message (also making clear that the goal of this
patch is to start the discussion and agree on the design before moving forward).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v2-0001-Provide-relfilenode-statistics.patch (text/x-diff; charset=utf-8)
From 81d25e077c9f4eafa5304c257d1b39ee8a811628 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Thu, 16 Nov 2023 02:30:01 +0000
Subject: [PATCH v2] Provide relfilenode statistics
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

We currently don’t have write counters for relations.
The reason is that we don’t have the relation OID when writing buffers out.
Tracking writes per relfilenode would allow us to track/consolidate writes per
relation.

Relfilenode stats are also beneficial for the "Split index and table statistics
into different types of stats" work in progress: they would allow us to avoid
additional branches in some situations.

=== Remarks ===

This is a POC patch. There is still work to do: there are more places where we
should add relfilenode counters, and more APIs to create to retrieve the
relfilenode stats. The patch takes care of the rewrite generated by TRUNCATE,
but there is more to care about, like CLUSTER and VACUUM FULL.

The new logic to retrieve stats in pg_statio_all_tables has been implemented
only for the new blocks_written stat (we'd need to do the same for the existing
buffer read / buffer hit if we agree on the approach implemented here).

The goal of this patch is to start the discussion and agree on the design before
moving forward.
---
 src/backend/access/rmgrdesc/xactdesc.c        |   5 +-
 src/backend/catalog/storage.c                 |   8 ++
 src/backend/catalog/system_functions.sql      |   2 +-
 src/backend/catalog/system_views.sql          |   5 +-
 src/backend/postmaster/checkpointer.c         |   5 +
 src/backend/storage/buffer/bufmgr.c           |   6 +-
 src/backend/storage/smgr/md.c                 |   7 ++
 src/backend/utils/activity/pgstat.c           |  39 ++++--
 src/backend/utils/activity/pgstat_database.c  |  12 +-
 src/backend/utils/activity/pgstat_function.c  |  13 +-
 src/backend/utils/activity/pgstat_relation.c  | 112 ++++++++++++++++--
 src/backend/utils/activity/pgstat_replslot.c  |  13 +-
 src/backend/utils/activity/pgstat_shmem.c     |  19 ++-
 .../utils/activity/pgstat_subscription.c      |  12 +-
 src/backend/utils/activity/pgstat_xact.c      |  60 +++++++---
 src/backend/utils/adt/pgstatfuncs.c           |  34 +++++-
 src/include/access/tableam.h                  |  19 +++
 src/include/access/xact.h                     |   1 +
 src/include/catalog/pg_proc.dat               |  14 ++-
 src/include/pgstat.h                          |  19 ++-
 src/include/utils/pgstat_internal.h           |  34 ++++--
 src/test/recovery/t/029_stats_restart.pl      |  40 +++----
 .../recovery/t/030_stats_cleanup_replica.pl   |   6 +-
 src/test/regress/expected/rules.out           |  12 +-
 src/test/regress/expected/stats.out           |  30 ++---
 src/test/regress/sql/stats.sql                |  30 ++---
 src/test/subscription/t/026_stats.pl          |   4 +-
 src/tools/pgindent/typedefs.list              |   1 +
 28 files changed, 415 insertions(+), 147 deletions(-)
   4.6% src/backend/catalog/
  47.8% src/backend/utils/activity/
   6.5% src/backend/utils/adt/
   3.7% src/backend/
   3.3% src/include/access/
   3.3% src/include/catalog/
   6.2% src/include/utils/
   3.3% src/include/
  12.1% src/test/recovery/t/
   5.5% src/test/regress/expected/
   3.0% src/test/

diff --git a/src/backend/access/rmgrdesc/xactdesc.c b/src/backend/access/rmgrdesc/xactdesc.c
index dccca201e0..c02b079645 100644
--- a/src/backend/access/rmgrdesc/xactdesc.c
+++ b/src/backend/access/rmgrdesc/xactdesc.c
@@ -319,10 +319,11 @@ xact_desc_stats(StringInfo buf, const char *label,
 		appendStringInfo(buf, "; %sdropped stats:", label);
 		for (i = 0; i < ndropped; i++)
 		{
-			appendStringInfo(buf, " %d/%u/%u",
+			appendStringInfo(buf, " %d/%u/%u/%u",
 							 dropped_stats[i].kind,
 							 dropped_stats[i].dboid,
-							 dropped_stats[i].objoid);
+							 dropped_stats[i].objoid,
+							 dropped_stats[i].relfile);
 		}
 	}
 }
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index f56b3cc0f2..db6107cd90 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -33,6 +33,7 @@
 #include "storage/smgr.h"
 #include "utils/hsearch.h"
 #include "utils/memutils.h"
+#include "utils/pgstat_internal.h"
 #include "utils/rel.h"
 
 /* GUC variables */
@@ -152,6 +153,7 @@ RelationCreateStorage(RelFileLocator rlocator, char relpersistence,
 	if (needs_wal)
 		log_smgrcreate(&srel->smgr_rlocator.locator, MAIN_FORKNUM);
 
+	pgstat_create_transactional(PGSTAT_KIND_RELFILENODE, rlocator.dbOid, rlocator.spcOid, rlocator.relNumber);
 	/*
 	 * Add the relation to the list of stuff to delete at abort, if we are
 	 * asked to do so.
@@ -227,6 +229,8 @@ RelationDropStorage(Relation rel)
 	 * for now I'll keep the logic simple.
 	 */
 
+	pgstat_drop_transactional(PGSTAT_KIND_RELFILENODE, rel->rd_locator.dbOid, rel->rd_locator.spcOid,  rel->rd_locator.relNumber);
+
 	RelationCloseSmgr(rel);
 }
 
@@ -253,6 +257,9 @@ RelationPreserveStorage(RelFileLocator rlocator, bool atCommit)
 	PendingRelDelete *pending;
 	PendingRelDelete *prev;
 	PendingRelDelete *next;
+	PgStat_SubXactStatus *xact_state;
+
+	xact_state = pgStatXactStack;
 
 	prev = NULL;
 	for (pending = pendingDeletes; pending != NULL; pending = next)
@@ -267,6 +274,7 @@ RelationPreserveStorage(RelFileLocator rlocator, bool atCommit)
 			else
 				pendingDeletes = next;
 			pfree(pending);
+			PgStat_RemoveRelFileNodeFromDroppedStats(xact_state, rlocator);
 			/* prev does not change */
 		}
 		else
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index ae099e328c..140c8d556c 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -681,7 +681,7 @@ REVOKE EXECUTE ON FUNCTION pg_stat_reset_single_function_counters(oid) FROM publ
 
 REVOKE EXECUTE ON FUNCTION pg_stat_reset_replication_slot(text) FROM public;
 
-REVOKE EXECUTE ON FUNCTION pg_stat_have_stats(text, oid, oid) FROM public;
+REVOKE EXECUTE ON FUNCTION pg_stat_have_stats(text, oid, oid, oid) FROM public;
 
 REVOKE EXECUTE ON FUNCTION pg_stat_reset_subscription_stats(oid) FROM public;
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 53047cab5f..b0d7af6df0 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -750,6 +750,7 @@ CREATE VIEW pg_statio_all_tables AS
             C.relname AS relname,
             pg_stat_get_blocks_fetched(C.oid) -
                     pg_stat_get_blocks_hit(C.oid) AS heap_blks_read,
+			pg_stat_get_blocks_written(C.oid) + pg_stat_get_relfilenode_blocks_written(d.oid, CASE WHEN C.reltablespace <> 0 THEN C.reltablespace ELSE d.dattablespace END, C.relfilenode) AS heap_blks_written,
             pg_stat_get_blocks_hit(C.oid) AS heap_blks_hit,
             I.idx_blks_read AS idx_blks_read,
             I.idx_blks_hit AS idx_blks_hit,
@@ -758,7 +759,7 @@ CREATE VIEW pg_statio_all_tables AS
             pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
             X.idx_blks_read AS tidx_blks_read,
             X.idx_blks_hit AS tidx_blks_hit
-    FROM pg_class C LEFT JOIN
+    FROM pg_database d, pg_class C LEFT JOIN
             pg_class T ON C.reltoastrelid = T.oid
             LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
             LEFT JOIN LATERAL (
@@ -775,7 +776,7 @@ CREATE VIEW pg_statio_all_tables AS
                      sum(pg_stat_get_blocks_hit(indexrelid))::bigint
                      AS idx_blks_hit
               FROM pg_index WHERE indrelid = T.oid ) X ON true
-    WHERE C.relkind IN ('r', 't', 'm');
+    WHERE C.relkind IN ('r', 't', 'm') AND d.datname = current_database();
 
 CREATE VIEW pg_statio_sys_tables AS
     SELECT * FROM pg_statio_all_tables
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 3c68a9904d..0ff2812218 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -519,6 +519,11 @@ CheckpointerMain(char *startup_data, size_t startup_data_len)
 		/* Report pending statistics to the cumulative stats system */
 		pgstat_report_checkpointer();
 		pgstat_report_wal(true);
+		/*
+		 *  No need to check for transaction state in checkpointer before
+		 *  calling pgstat_report_stat().
+		 */
+		pgstat_report_stat(true);
 
 		/*
 		 * If any checkpoint flags have been set, redo the loop to handle the
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 49637284f9..06d89ba26b 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1121,9 +1121,9 @@ PinBufferForBlock(Relation rel,
 		 * WaitReadBuffers() (so, not for hits, and not for buffers that are
 		 * zeroed instead), the per-relation stats always count them.
 		 */
-		pgstat_count_buffer_read(rel);
+		pgstat_report_relfilenode_buffer_read(rel);
 		if (*foundPtr)
-			pgstat_count_buffer_hit(rel);
+			pgstat_report_relfilenode_buffer_hit(rel);
 	}
 	if (*foundPtr)
 	{
@@ -3838,6 +3838,8 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln, IOObject io_object,
 
 	pgBufferUsage.shared_blks_written++;
 
+	pgstat_report_relfilenode_blks_written(reln->smgr_rlocator.locator);
+
 	/*
 	 * Mark the buffer as clean (unless BM_JUST_DIRTIED has become set) and
 	 * end the BM_IO_IN_PROGRESS state.
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 6796756358..5bc5fc65cd 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -1447,12 +1447,16 @@ DropRelationFiles(RelFileLocator *delrels, int ndelrels, bool isRedo)
 {
 	SMgrRelation *srels;
 	int			i;
+	int         not_freed_count = 0;
 
 	srels = palloc(sizeof(SMgrRelation) * ndelrels);
 	for (i = 0; i < ndelrels; i++)
 	{
 		SMgrRelation srel = smgropen(delrels[i], INVALID_PROC_NUMBER);
 
+		if (!pgstat_drop_entry(PGSTAT_KIND_RELFILENODE, delrels[i].dbOid, delrels[i].spcOid, delrels[i].relNumber))
+			not_freed_count++;
+
 		if (isRedo)
 		{
 			ForkNumber	fork;
@@ -1463,6 +1467,9 @@ DropRelationFiles(RelFileLocator *delrels, int ndelrels, bool isRedo)
 		srels[i] = srel;
 	}
 
+	if (not_freed_count > 0)
+		pgstat_request_entry_refs_gc();
+
 	smgrdounlinkall(srels, ndelrels, isRedo);
 
 	for (i = 0; i < ndelrels; i++)
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index dcc2ad8d95..e3b6f45828 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -288,6 +288,19 @@ static const PgStat_KindInfo pgstat_kind_infos[PGSTAT_NUM_KINDS] = {
 		.delete_pending_cb = pgstat_relation_delete_pending_cb,
 	},
 
+	[PGSTAT_KIND_RELFILENODE] = {
+		.name = "relfilenode",
+
+		.fixed_amount = false,
+
+		.shared_size = sizeof(PgStatShared_RelFileNode),
+		.shared_data_off = offsetof(PgStatShared_RelFileNode, stats),
+		.shared_data_len = sizeof(((PgStatShared_RelFileNode *) 0)->stats),
+		.pending_size = sizeof(PgStat_StatRelFileNodeEntry),
+
+		.flush_pending_cb = pgstat_relfilenode_flush_cb,
+	},
+
 	[PGSTAT_KIND_FUNCTION] = {
 		.name = "function",
 
@@ -651,7 +664,7 @@ pgstat_report_stat(bool force)
 
 	partial_flush = false;
 
-	/* flush database / relation / function / ... stats */
+	/* flush database / relation / function / relfilenode / ... stats */
 	partial_flush |= pgstat_flush_pending_entries(nowait);
 
 	/* flush IO stats */
@@ -731,7 +744,7 @@ pgstat_reset_counters(void)
  * GRANT system.
  */
 void
-pgstat_reset(PgStat_Kind kind, Oid dboid, Oid objoid)
+pgstat_reset(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile)
 {
 	const PgStat_KindInfo *kind_info = pgstat_get_kind_info(kind);
 	TimestampTz ts = GetCurrentTimestamp();
@@ -740,7 +753,7 @@ pgstat_reset(PgStat_Kind kind, Oid dboid, Oid objoid)
 	Assert(!pgstat_get_kind_info(kind)->fixed_amount);
 
 	/* reset the "single counter" */
-	pgstat_reset_entry(kind, dboid, objoid, ts);
+	pgstat_reset_entry(kind, dboid, objoid, relfile, ts);
 
 	if (!kind_info->accessed_across_databases)
 		pgstat_reset_database_timestamp(dboid, ts);
@@ -809,7 +822,7 @@ pgstat_clear_snapshot(void)
 }
 
 void *
-pgstat_fetch_entry(PgStat_Kind kind, Oid dboid, Oid objoid)
+pgstat_fetch_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile)
 {
 	PgStat_HashKey key;
 	PgStat_EntryRef *entry_ref;
@@ -825,6 +838,7 @@ pgstat_fetch_entry(PgStat_Kind kind, Oid dboid, Oid objoid)
 	key.kind = kind;
 	key.dboid = dboid;
 	key.objoid = objoid;
+	key.relfile = relfile;
 
 	/* if we need to build a full snapshot, do so */
 	if (pgstat_fetch_consistency == PGSTAT_FETCH_CONSISTENCY_SNAPSHOT)
@@ -850,7 +864,7 @@ pgstat_fetch_entry(PgStat_Kind kind, Oid dboid, Oid objoid)
 
 	pgStatLocal.snapshot.mode = pgstat_fetch_consistency;
 
-	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid, false, NULL);
+	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid, relfile, false, NULL);
 
 	if (entry_ref == NULL || entry_ref->shared_entry->dropped)
 	{
@@ -919,13 +933,13 @@ pgstat_get_stat_snapshot_timestamp(bool *have_snapshot)
 }
 
 bool
-pgstat_have_entry(PgStat_Kind kind, Oid dboid, Oid objoid)
+pgstat_have_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile)
 {
 	/* fixed-numbered stats always exist */
 	if (pgstat_get_kind_info(kind)->fixed_amount)
 		return true;
 
-	return pgstat_get_entry_ref(kind, dboid, objoid, false, NULL) != NULL;
+	return pgstat_get_entry_ref(kind, dboid, objoid, relfile, false, NULL) != NULL;
 }
 
 /*
@@ -1102,7 +1116,8 @@ pgstat_build_snapshot_fixed(PgStat_Kind kind)
  * created, false otherwise.
  */
 PgStat_EntryRef *
-pgstat_prep_pending_entry(PgStat_Kind kind, Oid dboid, Oid objoid, bool *created_entry)
+pgstat_prep_pending_entry(PgStat_Kind kind, Oid dboid, Oid objoid,
+						  RelFileNumber relfile, bool *created_entry)
 {
 	PgStat_EntryRef *entry_ref;
 
@@ -1117,7 +1132,7 @@ pgstat_prep_pending_entry(PgStat_Kind kind, Oid dboid, Oid objoid, bool *created
 								  ALLOCSET_SMALL_SIZES);
 	}
 
-	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid,
+	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid, relfile,
 									 true, created_entry);
 
 	if (entry_ref->pending == NULL)
@@ -1140,11 +1155,11 @@ pgstat_prep_pending_entry(PgStat_Kind kind, Oid dboid, Oid objoid, bool *created
  * that it shouldn't be needed.
  */
 PgStat_EntryRef *
-pgstat_fetch_pending_entry(PgStat_Kind kind, Oid dboid, Oid objoid)
+pgstat_fetch_pending_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile)
 {
 	PgStat_EntryRef *entry_ref;
 
-	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid, false, NULL);
+	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid, relfile, false, NULL);
 
 	if (entry_ref == NULL || entry_ref->pending == NULL)
 		return NULL;
@@ -1173,7 +1188,7 @@ pgstat_delete_pending_entry(PgStat_EntryRef *entry_ref)
 }
 
 /*
- * Flush out pending stats for database objects (databases, relations,
+ * Flush out pending stats for database objects (databases, relations, relfilenodes,
  * functions).
  */
 static bool
diff --git a/src/backend/utils/activity/pgstat_database.c b/src/backend/utils/activity/pgstat_database.c
index 29bc090974..cf77f2dbdb 100644
--- a/src/backend/utils/activity/pgstat_database.c
+++ b/src/backend/utils/activity/pgstat_database.c
@@ -43,7 +43,7 @@ static PgStat_Counter pgLastSessionReportTime = 0;
 void
 pgstat_drop_database(Oid databaseid)
 {
-	pgstat_drop_transactional(PGSTAT_KIND_DATABASE, databaseid, InvalidOid);
+	pgstat_drop_transactional(PGSTAT_KIND_DATABASE, databaseid, InvalidOid, InvalidOid);
 }
 
 /*
@@ -66,7 +66,7 @@ pgstat_report_autovac(Oid dboid)
 	 * operation so it doesn't matter if we get blocked here a little.
 	 */
 	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_DATABASE,
-											dboid, InvalidOid, false);
+											dboid, InvalidOid, InvalidOid, false);
 
 	dbentry = (PgStatShared_Database *) entry_ref->shared_stats;
 	dbentry->stats.last_autovac_time = GetCurrentTimestamp();
@@ -150,7 +150,7 @@ pgstat_report_checksum_failures_in_db(Oid dboid, int failurecount)
 	 * common enough for that to be a problem.
 	 */
 	entry_ref =
-		pgstat_get_entry_ref_locked(PGSTAT_KIND_DATABASE, dboid, InvalidOid, false);
+		pgstat_get_entry_ref_locked(PGSTAT_KIND_DATABASE, dboid, InvalidOid, InvalidOid, false);
 
 	sharedent = (PgStatShared_Database *) entry_ref->shared_stats;
 	sharedent->stats.checksum_failures += failurecount;
@@ -242,7 +242,7 @@ PgStat_StatDBEntry *
 pgstat_fetch_stat_dbentry(Oid dboid)
 {
 	return (PgStat_StatDBEntry *)
-		pgstat_fetch_entry(PGSTAT_KIND_DATABASE, dboid, InvalidOid);
+		pgstat_fetch_entry(PGSTAT_KIND_DATABASE, dboid, InvalidOid, InvalidOid);
 }
 
 void
@@ -341,7 +341,7 @@ pgstat_prep_database_pending(Oid dboid)
 	Assert(!OidIsValid(dboid) || OidIsValid(MyDatabaseId));
 
 	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_DATABASE, dboid, InvalidOid,
-										  NULL);
+										  InvalidOid, NULL);
 
 	return entry_ref->pending;
 }
@@ -357,7 +357,7 @@ pgstat_reset_database_timestamp(Oid dboid, TimestampTz ts)
 	PgStatShared_Database *dbentry;
 
 	dbref = pgstat_get_entry_ref_locked(PGSTAT_KIND_DATABASE, MyDatabaseId, InvalidOid,
-										false);
+										InvalidOid, false);
 
 	dbentry = (PgStatShared_Database *) dbref->shared_stats;
 	dbentry->stats.stat_reset_timestamp = ts;
diff --git a/src/backend/utils/activity/pgstat_function.c b/src/backend/utils/activity/pgstat_function.c
index d26da551a4..440e44e300 100644
--- a/src/backend/utils/activity/pgstat_function.c
+++ b/src/backend/utils/activity/pgstat_function.c
@@ -46,7 +46,8 @@ pgstat_create_function(Oid proid)
 {
 	pgstat_create_transactional(PGSTAT_KIND_FUNCTION,
 								MyDatabaseId,
-								proid);
+								proid,
+								InvalidOid);
 }
 
 /*
@@ -61,7 +62,8 @@ pgstat_drop_function(Oid proid)
 {
 	pgstat_drop_transactional(PGSTAT_KIND_FUNCTION,
 							  MyDatabaseId,
-							  proid);
+							  proid,
+							  InvalidOid);
 }
 
 /*
@@ -86,6 +88,7 @@ pgstat_init_function_usage(FunctionCallInfo fcinfo,
 	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_FUNCTION,
 										  MyDatabaseId,
 										  fcinfo->flinfo->fn_oid,
+										  InvalidOid,
 										  &created_entry);
 
 	/*
@@ -113,7 +116,7 @@ pgstat_init_function_usage(FunctionCallInfo fcinfo,
 		if (!SearchSysCacheExists1(PROCOID, ObjectIdGetDatum(fcinfo->flinfo->fn_oid)))
 		{
 			pgstat_drop_entry(PGSTAT_KIND_FUNCTION, MyDatabaseId,
-							  fcinfo->flinfo->fn_oid);
+							  fcinfo->flinfo->fn_oid, InvalidOid);
 			ereport(ERROR, errcode(ERRCODE_UNDEFINED_FUNCTION),
 					errmsg("function call to dropped function"));
 		}
@@ -224,7 +227,7 @@ find_funcstat_entry(Oid func_id)
 {
 	PgStat_EntryRef *entry_ref;
 
-	entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_FUNCTION, MyDatabaseId, func_id);
+	entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_FUNCTION, MyDatabaseId, func_id, InvalidOid);
 
 	if (entry_ref)
 		return entry_ref->pending;
@@ -239,5 +242,5 @@ PgStat_StatFuncEntry *
 pgstat_fetch_stat_funcentry(Oid func_id)
 {
 	return (PgStat_StatFuncEntry *)
-		pgstat_fetch_entry(PGSTAT_KIND_FUNCTION, MyDatabaseId, func_id);
+		pgstat_fetch_entry(PGSTAT_KIND_FUNCTION, MyDatabaseId, func_id, InvalidOid);
 }
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index 8a3f7d434c..136dd6c85b 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -44,6 +44,7 @@ typedef struct TwoPhasePgStatRecord
 
 
 static PgStat_TableStatus *pgstat_prep_relation_pending(Oid rel_id, bool isshared);
+PgStat_StatRelFileNodeEntry *pgstat_prep_relfilenode_pending(RelFileLocator locator);
 static void add_tabstat_xact_level(PgStat_TableStatus *pgstat_info, int nest_level);
 static void ensure_tabstat_xact_level(PgStat_TableStatus *pgstat_info);
 static void save_truncdrop_counters(PgStat_TableXactStatus *trans, bool is_drop);
@@ -69,6 +70,7 @@ pgstat_copy_relation_stats(Relation dst, Relation src)
 	dst_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_RELATION,
 										  dst->rd_rel->relisshared ? InvalidOid : MyDatabaseId,
 										  RelationGetRelid(dst),
+										  InvalidOid,
 										  false);
 
 	dstshstats = (PgStatShared_Relation *) dst_ref->shared_stats;
@@ -170,7 +172,7 @@ pgstat_create_relation(Relation rel)
 {
 	pgstat_create_transactional(PGSTAT_KIND_RELATION,
 								rel->rd_rel->relisshared ? InvalidOid : MyDatabaseId,
-								RelationGetRelid(rel));
+								RelationGetRelid(rel), InvalidOid);
 }
 
 /*
@@ -184,7 +186,7 @@ pgstat_drop_relation(Relation rel)
 
 	pgstat_drop_transactional(PGSTAT_KIND_RELATION,
 							  rel->rd_rel->relisshared ? InvalidOid : MyDatabaseId,
-							  RelationGetRelid(rel));
+							  RelationGetRelid(rel), InvalidOid);
 
 	if (!pgstat_should_count_relation(rel))
 		return;
@@ -225,7 +227,7 @@ pgstat_report_vacuum(Oid tableoid, bool shared,
 
 	/* block acquiring lock for the same reason as pgstat_report_autovac() */
 	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_RELATION,
-											dboid, tableoid, false);
+											dboid, tableoid, InvalidOid, false);
 
 	shtabentry = (PgStatShared_Relation *) entry_ref->shared_stats;
 	tabentry = &shtabentry->stats;
@@ -318,6 +320,7 @@ pgstat_report_analyze(Relation rel,
 	/* block acquiring lock for the same reason as pgstat_report_autovac() */
 	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_RELATION, dboid,
 											RelationGetRelid(rel),
+											InvalidOid,
 											false);
 	/* can't get dropped while accessed */
 	Assert(entry_ref != NULL && entry_ref->shared_stats != NULL);
@@ -458,6 +461,19 @@ pgstat_fetch_stat_tabentry(Oid relid)
 	return pgstat_fetch_stat_tabentry_ext(IsSharedRelation(relid), relid);
 }
 
+/*
+ * Support function for the SQL-callable pgstat* functions. Returns
+ * the collected statistics for one relfilenode or NULL. NULL doesn't mean
+ * that the relfilenode doesn't exist, just that there are no statistics, so the
+ * caller is better off reporting zero instead.
+ */
+PgStat_StatRelFileNodeEntry *
+pgstat_fetch_stat_relfilenodeentry(Oid dboid, Oid spcOid, RelFileNumber relfile)
+{
+	return (PgStat_StatRelFileNodeEntry *)
+		pgstat_fetch_entry(PGSTAT_KIND_RELFILENODE, dboid, spcOid, relfile);
+}
+
 /*
  * More efficient version of pgstat_fetch_stat_tabentry(), allowing to specify
  * whether the to-be-accessed table is a shared relation or not.
@@ -468,7 +484,7 @@ pgstat_fetch_stat_tabentry_ext(bool shared, Oid reloid)
 	Oid			dboid = (shared ? InvalidOid : MyDatabaseId);
 
 	return (PgStat_StatTabEntry *)
-		pgstat_fetch_entry(PGSTAT_KIND_RELATION, dboid, reloid);
+		pgstat_fetch_entry(PGSTAT_KIND_RELATION, dboid, reloid, InvalidOid);
 }
 
 /*
@@ -491,10 +507,10 @@ find_tabstat_entry(Oid rel_id)
 	PgStat_TableStatus *tabentry = NULL;
 	PgStat_TableStatus *tablestatus = NULL;
 
-	entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, MyDatabaseId, rel_id);
+	entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, MyDatabaseId, rel_id, InvalidOid);
 	if (!entry_ref)
 	{
-		entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, InvalidOid, rel_id);
+		entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, InvalidOid, rel_id, InvalidOid);
 		if (!entry_ref)
 			return tablestatus;
 	}
@@ -881,6 +897,38 @@ pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
 	return true;
 }
 
+/*
+ * Flush out pending stats for the relfilenode entry.
+ *
+ * If nowait is true, this function returns false if the lock could not be
+ * immediately acquired, otherwise true is returned.
+ */
+bool
+pgstat_relfilenode_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
+{
+	PgStatShared_RelFileNode *sharedent;
+	PgStat_StatRelFileNodeEntry *pendingent;
+
+	pendingent = (PgStat_StatRelFileNodeEntry *) entry_ref->pending;
+	sharedent = (PgStatShared_RelFileNode *) entry_ref->shared_stats;
+
+	if (!pgstat_lock_entry(entry_ref, nowait))
+		return false;
+
+#define PGSTAT_ACCUM_RELFILENODECOUNT(item)      \
+		(sharedent)->stats.item += (pendingent)->item
+
+	PGSTAT_ACCUM_RELFILENODECOUNT(blocks_fetched);
+	PGSTAT_ACCUM_RELFILENODECOUNT(blocks_hit);
+	PGSTAT_ACCUM_RELFILENODECOUNT(blocks_written);
+
+	pgstat_unlock_entry(entry_ref);
+
+	memset(pendingent, 0, sizeof(*pendingent));
+
+	return true;
+}
+
 void
 pgstat_relation_delete_pending_cb(PgStat_EntryRef *entry_ref)
 {
@@ -902,7 +950,7 @@ pgstat_prep_relation_pending(Oid rel_id, bool isshared)
 
 	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_RELATION,
 										  isshared ? InvalidOid : MyDatabaseId,
-										  rel_id, NULL);
+										  rel_id, InvalidOid, NULL);
 	pending = entry_ref->pending;
 	pending->id = rel_id;
 	pending->shared = isshared;
@@ -910,6 +958,56 @@ pgstat_prep_relation_pending(Oid rel_id, bool isshared)
 	return pending;
 }
 
+PgStat_StatRelFileNodeEntry *
+pgstat_prep_relfilenode_pending(RelFileLocator locator)
+{
+	PgStat_EntryRef *entry_ref;
+
+	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_RELFILENODE, locator.dbOid,
+										  locator.spcOid, locator.relNumber, NULL);
+
+	return entry_ref->pending;
+}
+
+void
+pgstat_report_relfilenode_blks_written(RelFileLocator locator)
+{
+	PgStat_StatRelFileNodeEntry *relfileentry = NULL;
+
+	relfileentry = pgstat_prep_relfilenode_pending(locator);
+
+	if (relfileentry)
+		relfileentry->blocks_written++;
+}
+
+void
+pgstat_report_relfilenode_buffer_read(Relation reln)
+{
+	PgStat_StatRelFileNodeEntry *relfileentry = NULL;
+
+	/* For relation stats to survive after a rewrite */
+	pgstat_count_buffer_read(reln);
+
+	relfileentry = pgstat_prep_relfilenode_pending(reln->rd_locator);
+
+	if (relfileentry)
+		relfileentry->blocks_fetched++;
+}
+
+void
+pgstat_report_relfilenode_buffer_hit(Relation reln)
+{
+	PgStat_StatRelFileNodeEntry *relfileentry = NULL;
+
+	/* For relation stats to survive after a rewrite */
+	pgstat_count_buffer_hit(reln);
+
+	relfileentry = pgstat_prep_relfilenode_pending(reln->rd_locator);
+
+	if (relfileentry)
+		relfileentry->blocks_hit++;
+}
+
 /*
  * add a new (sub)transaction state record
  */
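To make the flush step in pgstat_relfilenode_flush_cb() above easier to discuss, here is a minimal standalone sketch of the same accumulate-then-reset pattern. The struct and function names are hypothetical stand-ins rather than the patch's types, and locking is left out (the patch takes the entry lock before accumulating).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for PgStat_StatRelFileNodeEntry. */
typedef struct RelFileNodeCounters
{
	int64_t		blocks_fetched;
	int64_t		blocks_hit;
	int64_t		blocks_written;
} RelFileNodeCounters;

/*
 * Fold the backend-local pending counters into the shared entry, then
 * zero the pending counters so the next flush starts from scratch.
 * No locking here: this only illustrates the arithmetic.
 */
void
flush_pending(RelFileNodeCounters *shared, RelFileNodeCounters *pending)
{
	assert(shared != NULL && pending != NULL);

	shared->blocks_fetched += pending->blocks_fetched;
	shared->blocks_hit += pending->blocks_hit;
	shared->blocks_written += pending->blocks_written;

	memset(pending, 0, sizeof(*pending));
}
```

Calling flush_pending() twice in a row without new activity leaves the shared counters unchanged, because the pending side is zeroed after the first call.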
diff --git a/src/backend/utils/activity/pgstat_replslot.c b/src/backend/utils/activity/pgstat_replslot.c
index da11b86744..2e68ed4a09 100644
--- a/src/backend/utils/activity/pgstat_replslot.c
+++ b/src/backend/utils/activity/pgstat_replslot.c
@@ -62,7 +62,7 @@ pgstat_reset_replslot(const char *name)
 	 */
 	if (SlotIsLogical(slot))
 		pgstat_reset(PGSTAT_KIND_REPLSLOT, InvalidOid,
-					 ReplicationSlotIndex(slot));
+					 ReplicationSlotIndex(slot), InvalidOid);
 
 	LWLockRelease(ReplicationSlotControlLock);
 }
@@ -82,7 +82,7 @@ pgstat_report_replslot(ReplicationSlot *slot, const PgStat_StatReplSlotEntry *re
 	PgStat_StatReplSlotEntry *statent;
 
 	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_REPLSLOT, InvalidOid,
-											ReplicationSlotIndex(slot), false);
+											ReplicationSlotIndex(slot), InvalidOid, false);
 	shstatent = (PgStatShared_ReplSlot *) entry_ref->shared_stats;
 	statent = &shstatent->stats;
 
@@ -116,7 +116,7 @@ pgstat_create_replslot(ReplicationSlot *slot)
 	Assert(LWLockHeldByMeInMode(ReplicationSlotAllocationLock, LW_EXCLUSIVE));
 
 	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_REPLSLOT, InvalidOid,
-											ReplicationSlotIndex(slot), false);
+											ReplicationSlotIndex(slot), InvalidOid, false);
 	shstatent = (PgStatShared_ReplSlot *) entry_ref->shared_stats;
 
 	/*
@@ -146,7 +146,7 @@ void
 pgstat_acquire_replslot(ReplicationSlot *slot)
 {
 	pgstat_get_entry_ref(PGSTAT_KIND_REPLSLOT, InvalidOid,
-						 ReplicationSlotIndex(slot), true, NULL);
+						 ReplicationSlotIndex(slot), InvalidOid, true, NULL);
 }
 
 /*
@@ -158,7 +158,7 @@ pgstat_drop_replslot(ReplicationSlot *slot)
 	Assert(LWLockHeldByMeInMode(ReplicationSlotAllocationLock, LW_EXCLUSIVE));
 
 	if (!pgstat_drop_entry(PGSTAT_KIND_REPLSLOT, InvalidOid,
-						   ReplicationSlotIndex(slot)))
+						   ReplicationSlotIndex(slot), InvalidOid))
 		pgstat_request_entry_refs_gc();
 }
 
@@ -178,7 +178,7 @@ pgstat_fetch_replslot(NameData slotname)
 
 	if (idx != -1)
 		slotentry = (PgStat_StatReplSlotEntry *) pgstat_fetch_entry(PGSTAT_KIND_REPLSLOT,
-																	InvalidOid, idx);
+																	InvalidOid, idx, InvalidOid);
 
 	LWLockRelease(ReplicationSlotControlLock);
 
@@ -210,6 +210,7 @@ pgstat_replslot_from_serialized_name_cb(const NameData *name, PgStat_HashKey *ke
 	key->kind = PGSTAT_KIND_REPLSLOT;
 	key->dboid = InvalidOid;
 	key->objoid = idx;
+	key->relfile = InvalidOid;
 
 	return true;
 }
diff --git a/src/backend/utils/activity/pgstat_shmem.c b/src/backend/utils/activity/pgstat_shmem.c
index 4a4b69891d..047e81f756 100644
--- a/src/backend/utils/activity/pgstat_shmem.c
+++ b/src/backend/utils/activity/pgstat_shmem.c
@@ -395,10 +395,10 @@ pgstat_get_entry_ref_cached(PgStat_HashKey key, PgStat_EntryRef **entry_ref_p)
  * if the entry is newly created, false otherwise.
  */
 PgStat_EntryRef *
-pgstat_get_entry_ref(PgStat_Kind kind, Oid dboid, Oid objoid, bool create,
-					 bool *created_entry)
+pgstat_get_entry_ref(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile,
+					 bool create, bool *created_entry)
 {
-	PgStat_HashKey key = {.kind = kind,.dboid = dboid,.objoid = objoid};
+	PgStat_HashKey key = {.kind = kind,.dboid = dboid,.objoid = objoid,.relfile = relfile};
 	PgStatShared_HashEntry *shhashent;
 	PgStatShared_Common *shheader = NULL;
 	PgStat_EntryRef *entry_ref;
@@ -611,12 +611,12 @@ pgstat_unlock_entry(PgStat_EntryRef *entry_ref)
  */
 PgStat_EntryRef *
 pgstat_get_entry_ref_locked(PgStat_Kind kind, Oid dboid, Oid objoid,
-							bool nowait)
+							RelFileNumber relfile, bool nowait)
 {
 	PgStat_EntryRef *entry_ref;
 
 	/* find shared table stats entry corresponding to the local entry */
-	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid, true, NULL);
+	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid, relfile, true, NULL);
 
 	/* lock the shared entry to protect the content, skip if failed */
 	if (!pgstat_lock_entry(entry_ref, nowait))
@@ -867,9 +867,9 @@ pgstat_drop_database_and_contents(Oid dboid)
  * pgstat_gc_entry_refs().
  */
 bool
-pgstat_drop_entry(PgStat_Kind kind, Oid dboid, Oid objoid)
+pgstat_drop_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile)
 {
-	PgStat_HashKey key = {.kind = kind,.dboid = dboid,.objoid = objoid};
+	PgStat_HashKey key = {.kind = kind,.dboid = dboid,.objoid = objoid,.relfile = relfile};
 	PgStatShared_HashEntry *shent;
 	bool		freed = true;
 
@@ -942,13 +942,12 @@ shared_stat_reset_contents(PgStat_Kind kind, PgStatShared_Common *header,
  * Reset one variable-numbered stats entry.
  */
 void
-pgstat_reset_entry(PgStat_Kind kind, Oid dboid, Oid objoid, TimestampTz ts)
+pgstat_reset_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile, TimestampTz ts)
 {
 	PgStat_EntryRef *entry_ref;
 
 	Assert(!pgstat_get_kind_info(kind)->fixed_amount);
-
-	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid, false, NULL);
+	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid, relfile, false, NULL);
 	if (!entry_ref || entry_ref->shared_entry->dropped)
 		return;
 
diff --git a/src/backend/utils/activity/pgstat_subscription.c b/src/backend/utils/activity/pgstat_subscription.c
index d9af8de658..9b9ab2861b 100644
--- a/src/backend/utils/activity/pgstat_subscription.c
+++ b/src/backend/utils/activity/pgstat_subscription.c
@@ -30,7 +30,7 @@ pgstat_report_subscription_error(Oid subid, bool is_apply_error)
 	PgStat_BackendSubEntry *pending;
 
 	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_SUBSCRIPTION,
-										  InvalidOid, subid, NULL);
+										  InvalidOid, subid, InvalidOid, NULL);
 	pending = entry_ref->pending;
 
 	if (is_apply_error)
@@ -47,12 +47,12 @@ pgstat_create_subscription(Oid subid)
 {
 	/* Ensures that stats are dropped if transaction rolls back */
 	pgstat_create_transactional(PGSTAT_KIND_SUBSCRIPTION,
-								InvalidOid, subid);
+								InvalidOid, subid, InvalidOid);
 
 	/* Create and initialize the subscription stats entry */
-	pgstat_get_entry_ref(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid,
+	pgstat_get_entry_ref(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid, InvalidOid,
 						 true, NULL);
-	pgstat_reset_entry(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid, 0);
+	pgstat_reset_entry(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid, InvalidOid, 0);
 }
 
 /*
@@ -64,7 +64,7 @@ void
 pgstat_drop_subscription(Oid subid)
 {
 	pgstat_drop_transactional(PGSTAT_KIND_SUBSCRIPTION,
-							  InvalidOid, subid);
+							  InvalidOid, subid, InvalidOid);
 }
 
 /*
@@ -75,7 +75,7 @@ PgStat_StatSubEntry *
 pgstat_fetch_stat_subscription(Oid subid)
 {
 	return (PgStat_StatSubEntry *)
-		pgstat_fetch_entry(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid);
+		pgstat_fetch_entry(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid, InvalidOid);
 }
 
 /*
diff --git a/src/backend/utils/activity/pgstat_xact.c b/src/backend/utils/activity/pgstat_xact.c
index 1877d22f14..b25df5112b 100644
--- a/src/backend/utils/activity/pgstat_xact.c
+++ b/src/backend/utils/activity/pgstat_xact.c
@@ -30,7 +30,7 @@ static void AtEOXact_PgStat_DroppedStats(PgStat_SubXactStatus *xact_state, bool
 static void AtEOSubXact_PgStat_DroppedStats(PgStat_SubXactStatus *xact_state,
 											bool isCommit, int nestDepth);
 
-static PgStat_SubXactStatus *pgStatXactStack = NULL;
+PgStat_SubXactStatus *pgStatXactStack = NULL;
 
 
 /*
@@ -84,7 +84,7 @@ AtEOXact_PgStat_DroppedStats(PgStat_SubXactStatus *xact_state, bool isCommit)
 			 * Transaction that dropped an object committed. Drop the stats
 			 * too.
 			 */
-			if (!pgstat_drop_entry(it->kind, it->dboid, it->objoid))
+			if (!pgstat_drop_entry(it->kind, it->dboid, it->objoid, it->relfile))
 				not_freed_count++;
 		}
 		else if (!isCommit && pending->is_create)
@@ -93,7 +93,7 @@ AtEOXact_PgStat_DroppedStats(PgStat_SubXactStatus *xact_state, bool isCommit)
 			 * Transaction that created an object aborted. Drop the stats
 			 * associated with the object.
 			 */
-			if (!pgstat_drop_entry(it->kind, it->dboid, it->objoid))
+			if (!pgstat_drop_entry(it->kind, it->dboid, it->objoid, it->relfile))
 				not_freed_count++;
 		}
 
@@ -105,6 +105,33 @@ AtEOXact_PgStat_DroppedStats(PgStat_SubXactStatus *xact_state, bool isCommit)
 		pgstat_request_entry_refs_gc();
 }
 
+/*
+ * Remove a relfilenode stat from the list of stats to be dropped.
+ */
+void
+PgStat_RemoveRelFileNodeFromDroppedStats(PgStat_SubXactStatus *xact_state, RelFileLocator rlocator)
+{
+	dlist_mutable_iter iter;
+
+	if (dclist_count(&xact_state->pending_drops) == 0)
+		return;
+
+	dclist_foreach_modify(iter, &xact_state->pending_drops)
+	{
+		PgStat_PendingDroppedStatsItem *pending =
+			dclist_container(PgStat_PendingDroppedStatsItem, node, iter.cur);
+		xl_xact_stats_item *it = &pending->item;
+
+		if (it->kind == PGSTAT_KIND_RELFILENODE && it->dboid == rlocator.dbOid
+			&& it->objoid == rlocator.spcOid && it->relfile == rlocator.relNumber)
+		{
+			dclist_delete_from(&xact_state->pending_drops, &pending->node);
+			pfree(pending);
+			return;
+		}
+	}
+}
+
 /*
  * Called from access/transam/xact.c at subtransaction commit/abort.
  */
@@ -158,7 +185,7 @@ AtEOSubXact_PgStat_DroppedStats(PgStat_SubXactStatus *xact_state,
 			 * Subtransaction creating a new stats object aborted. Drop the
 			 * stats object.
 			 */
-			if (!pgstat_drop_entry(it->kind, it->dboid, it->objoid))
+			if (!pgstat_drop_entry(it->kind, it->dboid, it->objoid, it->relfile))
 				not_freed_count++;
 			pfree(pending);
 		}
@@ -320,7 +347,11 @@ pgstat_execute_transactional_drops(int ndrops, struct xl_xact_stats_item *items,
 	{
 		xl_xact_stats_item *it = &items[i];
 
-		if (!pgstat_drop_entry(it->kind, it->dboid, it->objoid))
+		/* relfilenode stats are dropped via pgstat_drop_transactional() in RelationDropStorage() */
+		if (it->kind == PGSTAT_KIND_RELFILENODE)
+			continue;
+
+		if (!pgstat_drop_entry(it->kind, it->dboid, it->objoid, it->relfile))
 			not_freed_count++;
 	}
 
@@ -329,7 +360,7 @@ pgstat_execute_transactional_drops(int ndrops, struct xl_xact_stats_item *items,
 }
 
 static void
-create_drop_transactional_internal(PgStat_Kind kind, Oid dboid, Oid objoid, bool is_create)
+create_drop_transactional_internal(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile, bool is_create)
 {
 	int			nest_level = GetCurrentTransactionNestLevel();
 	PgStat_SubXactStatus *xact_state;
@@ -342,6 +373,7 @@ create_drop_transactional_internal(PgStat_Kind kind, Oid dboid, Oid objoid, bool
 	drop->item.kind = kind;
 	drop->item.dboid = dboid;
 	drop->item.objoid = objoid;
+	drop->item.relfile = relfile;
 
 	dclist_push_tail(&xact_state->pending_drops, &drop->node);
 }
@@ -354,18 +386,18 @@ create_drop_transactional_internal(PgStat_Kind kind, Oid dboid, Oid objoid, bool
  * dropped.
  */
 void
-pgstat_create_transactional(PgStat_Kind kind, Oid dboid, Oid objoid)
+pgstat_create_transactional(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile)
 {
-	if (pgstat_get_entry_ref(kind, dboid, objoid, false, NULL))
+	if (pgstat_get_entry_ref(kind, dboid, objoid, relfile, false, NULL))
 	{
 		ereport(WARNING,
-				errmsg("resetting existing statistics for kind %s, db=%u, oid=%u",
-					   (pgstat_get_kind_info(kind))->name, dboid, objoid));
+				errmsg("resetting existing statistics for kind %s, db=%u, oid=%u, relfile=%u",
+					   (pgstat_get_kind_info(kind))->name, dboid, objoid, relfile));
 
-		pgstat_reset(kind, dboid, objoid);
+		pgstat_reset(kind, dboid, objoid, relfile);
 	}
 
-	create_drop_transactional_internal(kind, dboid, objoid, /* create */ true);
+	create_drop_transactional_internal(kind, dboid, objoid, relfile, /* create */ true);
 }
 
 /*
@@ -376,7 +408,7 @@ pgstat_create_transactional(PgStat_Kind kind, Oid dboid, Oid objoid)
  * alive.
  */
 void
-pgstat_drop_transactional(PgStat_Kind kind, Oid dboid, Oid objoid)
+pgstat_drop_transactional(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile)
 {
-	create_drop_transactional_internal(kind, dboid, objoid, /* create */ false);
+	create_drop_transactional_internal(kind, dboid, objoid, relfile, /* create */ false);
 }
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 3876339ee1..e266d96f5e 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -106,6 +106,30 @@ PG_STAT_GET_RELENTRY_INT64(tuples_updated)
 /* pg_stat_get_vacuum_count */
 PG_STAT_GET_RELENTRY_INT64(vacuum_count)
 
+#define PG_STAT_GET_RELFILEENTRY_INT64(stat)						\
+Datum															\
+CppConcat(pg_stat_get_relfilenode_,stat)(PG_FUNCTION_ARGS)					\
+{																\
+	Oid			dboid = PG_GETARG_OID(0);						\
+	Oid			spcOid = PG_GETARG_OID(1);						\
+	RelFileNumber relfile = PG_GETARG_OID(2);					\
+	int64		result;											\
+	PgStat_StatRelFileNodeEntry *relfileentry;								\
+																\
+	if ((relfileentry = pgstat_fetch_stat_relfilenodeentry(dboid, spcOid, relfile)) == NULL)	\
+		result = 0;												\
+	else														\
+		result = (int64) (relfileentry->stat);						\
+																\
+	PG_RETURN_INT64(result);									\
+}
+
+/* pg_stat_get_relfilenode_blocks_written */
+PG_STAT_GET_RELFILEENTRY_INT64(blocks_written)
+
+/* pg_stat_get_blocks_written */
+PG_STAT_GET_RELENTRY_INT64(blocks_written)
+
 #define PG_STAT_GET_RELENTRY_TIMESTAMPTZ(stat)					\
 Datum															\
 CppConcat(pg_stat_get_,stat)(PG_FUNCTION_ARGS)					\
@@ -1752,7 +1776,7 @@ pg_stat_reset_single_table_counters(PG_FUNCTION_ARGS)
 	Oid			taboid = PG_GETARG_OID(0);
 	Oid			dboid = (IsSharedRelation(taboid) ? InvalidOid : MyDatabaseId);
 
-	pgstat_reset(PGSTAT_KIND_RELATION, dboid, taboid);
+	pgstat_reset(PGSTAT_KIND_RELATION, dboid, taboid, InvalidOid);
 
 	PG_RETURN_VOID();
 }
@@ -1762,7 +1786,7 @@ pg_stat_reset_single_function_counters(PG_FUNCTION_ARGS)
 {
 	Oid			funcoid = PG_GETARG_OID(0);
 
-	pgstat_reset(PGSTAT_KIND_FUNCTION, MyDatabaseId, funcoid);
+	pgstat_reset(PGSTAT_KIND_FUNCTION, MyDatabaseId, funcoid, InvalidOid);
 
 	PG_RETURN_VOID();
 }
@@ -1820,7 +1844,7 @@ pg_stat_reset_subscription_stats(PG_FUNCTION_ARGS)
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 					 errmsg("invalid subscription OID %u", subid)));
-		pgstat_reset(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid);
+		pgstat_reset(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid, InvalidOid);
 	}
 
 	PG_RETURN_VOID();
@@ -2028,7 +2052,9 @@ pg_stat_have_stats(PG_FUNCTION_ARGS)
 	char	   *stats_type = text_to_cstring(PG_GETARG_TEXT_P(0));
 	Oid			dboid = PG_GETARG_OID(1);
 	Oid			objoid = PG_GETARG_OID(2);
+	RelFileNumber relfile = PG_GETARG_OID(3);
+
 	PgStat_Kind kind = pgstat_get_kind_from_str(stats_type);
 
-	PG_RETURN_BOOL(pgstat_have_entry(kind, dboid, objoid));
+	PG_RETURN_BOOL(pgstat_have_entry(kind, dboid, objoid, relfile));
 }
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 8e583b45cd..792cd2237e 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -21,7 +21,9 @@
 #include "access/sdir.h"
 #include "access/xact.h"
 #include "executor/tuptable.h"
+#include "pgstat.h"
 #include "storage/read_stream.h"
+#include "utils/pgstat_internal.h"
 #include "utils/rel.h"
 #include "utils/snapshot.h"
 
@@ -1634,6 +1636,23 @@ table_relation_set_new_filelocator(Relation rel,
 								   TransactionId *freezeXid,
 								   MultiXactId *minmulti)
 {
+	PgStat_StatRelFileNodeEntry *relfileentry;
+	PgStat_StatTabEntry *tabentry = NULL;
+	PgStat_EntryRef *entry_ref = NULL;
+	PgStatShared_Relation *shtabentry;
+
+	entry_ref = pgstat_get_entry_ref(PGSTAT_KIND_RELATION, MyDatabaseId, rel->rd_id, InvalidOid, false, NULL);
+	if (entry_ref)
+	{
+		shtabentry = (PgStatShared_Relation *) entry_ref->shared_stats;
+		tabentry = &shtabentry->stats;
+	}
+
+	relfileentry = pgstat_fetch_stat_relfilenodeentry(rel->rd_locator.dbOid, rel->rd_locator.spcOid, rel->rd_locator.relNumber);
+
+	if (tabentry && relfileentry)
+		tabentry->blocks_written += relfileentry->blocks_written;
+
 	rel->rd_tableam->relation_set_new_filelocator(rel, newrlocator,
 												  persistence, freezeXid,
 												  minmulti);
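The rewrite handling in table_relation_set_new_filelocator() above, together with the summing described for pg_statio_all_tables, can be sketched in isolation as follows. This is a simplified model under stated assumptions: the types and function names are hypothetical, only blocks_written is tracked, and no shared-memory locking is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the table and relfilenode stats entries. */
typedef struct TableCounters
{
	int64_t		blocks_written;
} TableCounters;

typedef struct RelFileNodeCounters
{
	int64_t		blocks_written;
} RelFileNodeCounters;

/*
 * On a rewrite, fold the old relfilenode's counter into the table entry
 * so the history is kept once the old relfilenode entry goes away.
 * Either entry may be missing, in which case nothing is carried over.
 */
void
carry_over_on_rewrite(TableCounters *tab, const RelFileNodeCounters *old_relfile)
{
	if (tab == NULL || old_relfile == NULL)
		return;

	assert(old_relfile->blocks_written >= 0);
	tab->blocks_written += old_relfile->blocks_written;
}

/*
 * Value a view would report: history stored in the table entry plus the
 * counter of the current relfilenode. Missing entries count as zero.
 */
int64_t
reported_blocks_written(const TableCounters *tab, const RelFileNodeCounters *cur)
{
	int64_t		total = 0;

	if (tab != NULL)
		total += tab->blocks_written;
	if (cur != NULL)
		total += cur->blocks_written;
	return total;
}
```

For example, a table whose old relfilenode recorded 7 written blocks and whose new relfilenode has recorded 3 would report 10 after the rewrite.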
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 6d4439f052..3b9ed65ff6 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -284,6 +284,7 @@ typedef struct xl_xact_stats_item
 	int			kind;
 	Oid			dboid;
 	Oid			objoid;
+	RelFileNumber relfile;
 } xl_xact_stats_item;
 
 typedef struct xl_xact_stats_items
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 6a5476d3c4..912471a1ac 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5374,6 +5374,14 @@
   proname => 'pg_stat_get_tuples_updated', provolatile => 's',
   proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
   prosrc => 'pg_stat_get_tuples_updated' },
+{ oid => '9280', descr => 'statistics: number of blocks written',
+  proname => 'pg_stat_get_relfilenode_blocks_written', provolatile => 's',
+  proparallel => 'r',
+  proargtypes => 'oid oid oid',
+  prorettype => 'int8',
+  proallargtypes => '{oid,oid,oid,int8}',
+  proargmodes => '{i,i,i,o}',
+  prosrc => 'pg_stat_get_relfilenode_blocks_written' },
 { oid => '1933', descr => 'statistics: number of tuples deleted',
   proname => 'pg_stat_get_tuples_deleted', provolatile => 's',
   proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
@@ -5413,6 +5421,10 @@
   proname => 'pg_stat_get_blocks_hit', provolatile => 's', proparallel => 'r',
   prorettype => 'int8', proargtypes => 'oid',
   prosrc => 'pg_stat_get_blocks_hit' },
+{ oid => '8438', descr => 'statistics: number of blocks written',
+  proname => 'pg_stat_get_blocks_written', provolatile => 's', proparallel => 'r',
+  prorettype => 'int8', proargtypes => 'oid',
+  prosrc => 'pg_stat_get_blocks_written' },
 { oid => '2781', descr => 'statistics: last manual vacuum time for a table',
   proname => 'pg_stat_get_last_vacuum_time', provolatile => 's',
   proparallel => 'r', prorettype => 'timestamptz', proargtypes => 'oid',
@@ -5499,7 +5511,7 @@
 
 { oid => '6230', descr => 'statistics: check if a stats object exists',
   proname => 'pg_stat_have_stats', provolatile => 'v', proparallel => 'r',
-  prorettype => 'bool', proargtypes => 'text oid oid',
+  prorettype => 'bool', proargtypes => 'text oid oid oid',
   prosrc => 'pg_stat_have_stats' },
 
 { oid => '6231', descr => 'statistics: information about subscription stats',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 2136239710..9631689430 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -14,6 +14,7 @@
 #include "datatype/timestamp.h"
 #include "portability/instr_time.h"
 #include "postmaster/pgarch.h"	/* for MAX_XFN_CHARS */
+#include "storage/relfilelocator.h"
 #include "utils/backend_progress.h" /* for backward compatibility */
 #include "utils/backend_status.h"	/* for backward compatibility */
 #include "utils/relcache.h"
@@ -40,6 +41,7 @@ typedef enum PgStat_Kind
 	/* stats for variable-numbered objects */
 	PGSTAT_KIND_DATABASE,		/* database-wide statistics */
 	PGSTAT_KIND_RELATION,		/* per-table statistics */
+	PGSTAT_KIND_RELFILENODE,	/* per-relfilenode statistics */
 	PGSTAT_KIND_FUNCTION,		/* per-function statistics */
 	PGSTAT_KIND_REPLSLOT,		/* per-slot statistics */
 	PGSTAT_KIND_SUBSCRIPTION,	/* per-subscription statistics */
@@ -417,6 +419,7 @@ typedef struct PgStat_StatTabEntry
 
 	PgStat_Counter blocks_fetched;
 	PgStat_Counter blocks_hit;
+	PgStat_Counter blocks_written;
 
 	TimestampTz last_vacuum_time;	/* user initiated vacuum */
 	PgStat_Counter vacuum_count;
@@ -428,6 +431,13 @@ typedef struct PgStat_StatTabEntry
 	PgStat_Counter autoanalyze_count;
 } PgStat_StatTabEntry;
 
+typedef struct PgStat_StatRelFileNodeEntry
+{
+	PgStat_Counter blocks_fetched;
+	PgStat_Counter blocks_hit;
+	PgStat_Counter blocks_written;
+} PgStat_StatRelFileNodeEntry;
+
 typedef struct PgStat_WalStats
 {
 	PgStat_Counter wal_records;
@@ -478,7 +488,7 @@ extern long pgstat_report_stat(bool force);
 extern void pgstat_force_next_flush(void);
 
 extern void pgstat_reset_counters(void);
-extern void pgstat_reset(PgStat_Kind kind, Oid dboid, Oid objoid);
+extern void pgstat_reset(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile);
 extern void pgstat_reset_of_kind(PgStat_Kind kind);
 
 /* stats accessors */
@@ -487,7 +497,7 @@ extern TimestampTz pgstat_get_stat_snapshot_timestamp(bool *have_snapshot);
 
 /* helpers */
 extern PgStat_Kind pgstat_get_kind_from_str(char *kind_str);
-extern bool pgstat_have_entry(PgStat_Kind kind, Oid dboid, Oid objoid);
+extern bool pgstat_have_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile);
 
 
 /*
@@ -596,6 +606,10 @@ extern void pgstat_report_analyze(Relation rel,
 								  PgStat_Counter livetuples, PgStat_Counter deadtuples,
 								  bool resetcounter);
 
+extern void pgstat_report_relfilenode_blks_written(RelFileLocator locator);
+extern void pgstat_report_relfilenode_buffer_read(Relation reln);
+extern void pgstat_report_relfilenode_buffer_hit(Relation reln);
+
 /*
  * If stats are enabled, but pending data hasn't been prepared yet, call
  * pgstat_assoc_relation() to do so. See its comment for why this is done
@@ -655,6 +669,7 @@ extern void pgstat_twophase_postabort(TransactionId xid, uint16 info,
 									  void *recdata, uint32 len);
 
 extern PgStat_StatTabEntry *pgstat_fetch_stat_tabentry(Oid relid);
+extern PgStat_StatRelFileNodeEntry *pgstat_fetch_stat_relfilenodeentry(Oid dboid, Oid spcOid, RelFileNumber relfile);
 extern PgStat_StatTabEntry *pgstat_fetch_stat_tabentry_ext(bool shared,
 														   Oid reloid);
 extern PgStat_TableStatus *find_tabstat_entry(Oid rel_id);
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index dbbca31602..50d5f1a577 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -53,7 +53,8 @@ typedef struct PgStat_HashKey
 {
 	PgStat_Kind kind;			/* statistics entry kind */
 	Oid			dboid;			/* database ID. InvalidOid for shared objects. */
-	Oid			objoid;			/* object ID, either table or function. */
+	Oid			objoid;			/* object ID, either table or function or tablespace. */
+	RelFileNumber relfile;		/* relfilenumber for RelFileLocator. */
 } PgStat_HashKey;
 
 /*
@@ -376,6 +377,12 @@ typedef struct PgStatShared_Relation
 	PgStat_StatTabEntry stats;
 } PgStatShared_Relation;
 
+typedef struct PgStatShared_RelFileNode
+{
+	PgStatShared_Common header;
+	PgStat_StatRelFileNodeEntry stats;
+} PgStatShared_RelFileNode;
+
 typedef struct PgStatShared_Function
 {
 	PgStatShared_Common header;
@@ -498,6 +505,9 @@ static inline size_t pgstat_get_entry_len(PgStat_Kind kind);
 static inline void *pgstat_get_entry_data(PgStat_Kind kind, PgStatShared_Common *entry);
 
 
+extern PgStat_SubXactStatus *pgStatXactStack;
+extern void PgStat_RemoveRelFileNodeFromDroppedStats(PgStat_SubXactStatus *xact_state, RelFileLocator rlocator);
+
 /*
  * Functions in pgstat.c
  */
@@ -511,10 +521,12 @@ extern void pgstat_assert_is_up(void);
 #endif
 
 extern void pgstat_delete_pending_entry(PgStat_EntryRef *entry_ref);
-extern PgStat_EntryRef *pgstat_prep_pending_entry(PgStat_Kind kind, Oid dboid, Oid objoid, bool *created_entry);
-extern PgStat_EntryRef *pgstat_fetch_pending_entry(PgStat_Kind kind, Oid dboid, Oid objoid);
+extern PgStat_EntryRef *pgstat_prep_pending_entry(PgStat_Kind kind, Oid dboid,
+												  Oid objoid, RelFileNumber relfile,
+												  bool *created_entry);
+extern PgStat_EntryRef *pgstat_fetch_pending_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile);
 
-extern void *pgstat_fetch_entry(PgStat_Kind kind, Oid dboid, Oid objoid);
+extern void *pgstat_fetch_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile);
 extern void pgstat_snapshot_fixed(PgStat_Kind kind);
 
 
@@ -582,6 +594,7 @@ extern void AtPrepare_PgStat_Relations(PgStat_SubXactStatus *xact_state);
 extern void PostPrepare_PgStat_Relations(PgStat_SubXactStatus *xact_state);
 
 extern bool pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait);
+extern bool pgstat_relfilenode_flush_cb(PgStat_EntryRef *entry_ref, bool nowait);
 extern void pgstat_relation_delete_pending_cb(PgStat_EntryRef *entry_ref);
 
 
@@ -602,15 +615,16 @@ extern void pgstat_attach_shmem(void);
 extern void pgstat_detach_shmem(void);
 
 extern PgStat_EntryRef *pgstat_get_entry_ref(PgStat_Kind kind, Oid dboid, Oid objoid,
-											 bool create, bool *created_entry);
+											 RelFileNumber relfile, bool create,
+											 bool *created_entry);
 extern bool pgstat_lock_entry(PgStat_EntryRef *entry_ref, bool nowait);
 extern bool pgstat_lock_entry_shared(PgStat_EntryRef *entry_ref, bool nowait);
 extern void pgstat_unlock_entry(PgStat_EntryRef *entry_ref);
-extern bool pgstat_drop_entry(PgStat_Kind kind, Oid dboid, Oid objoid);
+extern bool pgstat_drop_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile);
 extern void pgstat_drop_all_entries(void);
 extern PgStat_EntryRef *pgstat_get_entry_ref_locked(PgStat_Kind kind, Oid dboid, Oid objoid,
-													bool nowait);
-extern void pgstat_reset_entry(PgStat_Kind kind, Oid dboid, Oid objoid, TimestampTz ts);
+													RelFileNumber relfile, bool nowait);
+extern void pgstat_reset_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile, TimestampTz ts);
 extern void pgstat_reset_entries_of_kind(PgStat_Kind kind, TimestampTz ts);
 extern void pgstat_reset_matching_entries(bool (*do_reset) (PgStatShared_HashEntry *, Datum),
 										  Datum match_data,
@@ -655,8 +669,8 @@ extern void pgstat_subscription_reset_timestamp_cb(PgStatShared_Common *header,
  */
 
 extern PgStat_SubXactStatus *pgstat_get_xact_stack_level(int nest_level);
-extern void pgstat_drop_transactional(PgStat_Kind kind, Oid dboid, Oid objoid);
-extern void pgstat_create_transactional(PgStat_Kind kind, Oid dboid, Oid objoid);
+extern void pgstat_drop_transactional(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile);
+extern void pgstat_create_transactional(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile);
 
 
 /*
diff --git a/src/test/recovery/t/029_stats_restart.pl b/src/test/recovery/t/029_stats_restart.pl
index 6a1615a1e8..ee5a404b45 100644
--- a/src/test/recovery/t/029_stats_restart.pl
+++ b/src/test/recovery/t/029_stats_restart.pl
@@ -40,10 +40,10 @@ trigger_funcrel_stat();
 
 # verify stats objects exist
 my $sect = "initial";
-is(have_stats('database', $dboid, 0), 't', "$sect: db stats do exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats('database', $dboid, 0, 0), 't', "$sect: db stats do exist");
+is(have_stats('function', $dboid, $funcoid, 0),
 	't', "$sect: function stats do exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats('relation', $dboid, $tableoid, 0),
 	't', "$sect: relation stats do exist");
 
 # regular shutdown
@@ -64,10 +64,10 @@ copy($og_stats, $statsfile) or die "Copy failed: $!";
 $node->start;
 
 $sect = "copy";
-is(have_stats('database', $dboid, 0), 't', "$sect: db stats do exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats('database', $dboid, 0, 0), 't', "$sect: db stats do exist");
+is(have_stats('function', $dboid, $funcoid, 0),
 	't', "$sect: function stats do exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats('relation', $dboid, $tableoid, 0),
 	't', "$sect: relation stats do exist");
 
 $node->stop('immediate');
@@ -81,10 +81,10 @@ $node->start;
 
 # stats should have been discarded
 $sect = "post immediate";
-is(have_stats('database', $dboid, 0), 'f', "$sect: db stats do not exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats('database', $dboid, 0, 0), 'f', "$sect: db stats do not exist");
+is(have_stats('function', $dboid, $funcoid, 0),
 	'f', "$sect: function stats do exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats('relation', $dboid, $tableoid, 0),
 	'f', "$sect: relation stats do not exist");
 
 # get rid of backup statsfile
@@ -95,10 +95,10 @@ unlink $statsfile or die "cannot unlink $statsfile $!";
 trigger_funcrel_stat();
 
 $sect = "post immediate, new";
-is(have_stats('database', $dboid, 0), 't', "$sect: db stats do exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats('database', $dboid, 0, 0), 't', "$sect: db stats do exist");
+is(have_stats('function', $dboid, $funcoid, 0),
 	't', "$sect: function stats do exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats('relation', $dboid, $tableoid, 0),
 	't', "$sect: relation stats do exist");
 
 # regular shutdown
@@ -114,10 +114,10 @@ $node->start;
 
 # no stats present due to invalid stats file
 $sect = "invalid_overwrite";
-is(have_stats('database', $dboid, 0), 'f', "$sect: db stats do not exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats('database', $dboid, 0, 0), 'f', "$sect: db stats do not exist");
+is(have_stats('function', $dboid, $funcoid, 0),
 	'f', "$sect: function stats do not exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats('relation', $dboid, $tableoid, 0),
 	'f', "$sect: relation stats do not exist");
 
 
@@ -130,10 +130,10 @@ append_file($og_stats, "XYZ");
 $node->start;
 
 $sect = "invalid_append";
-is(have_stats('database', $dboid, 0), 'f', "$sect: db stats do not exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats('database', $dboid, 0, 0), 'f', "$sect: db stats do not exist");
+is(have_stats('function', $dboid, $funcoid, 0),
 	'f', "$sect: function stats do not exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats('relation', $dboid, $tableoid, 0),
 	'f', "$sect: relation stats do not exist");
 
 
@@ -292,10 +292,10 @@ sub trigger_funcrel_stat
 
 sub have_stats
 {
-	my ($kind, $dboid, $objoid) = @_;
+	my ($kind, $dboid, $objoid, $relfile) = @_;
 
 	return $node->safe_psql($connect_db,
-		"SELECT pg_stat_have_stats('$kind', $dboid, $objoid)");
+		"SELECT pg_stat_have_stats('$kind', $dboid, $objoid, $relfile)");
 }
 
 sub overwrite_file
diff --git a/src/test/recovery/t/030_stats_cleanup_replica.pl b/src/test/recovery/t/030_stats_cleanup_replica.pl
index 74b516cc7c..317df24c4f 100644
--- a/src/test/recovery/t/030_stats_cleanup_replica.pl
+++ b/src/test/recovery/t/030_stats_cleanup_replica.pl
@@ -179,9 +179,9 @@ sub test_standby_func_tab_stats_status
 	my %stats;
 
 	$stats{rel} = $node_standby->safe_psql($connect_db,
-		"SELECT pg_stat_have_stats('relation', $dboid, $tableoid)");
+		"SELECT pg_stat_have_stats('relation', $dboid, $tableoid, 0)");
 	$stats{func} = $node_standby->safe_psql($connect_db,
-		"SELECT pg_stat_have_stats('function', $dboid, $funcoid)");
+		"SELECT pg_stat_have_stats('function', $dboid, $funcoid, 0)");
 
 	is_deeply(\%stats, \%expected, "$sect: standby stats as expected");
 
@@ -194,7 +194,7 @@ sub test_standby_db_stats_status
 	my ($connect_db, $dboid, $present) = @_;
 
 	is( $node_standby->safe_psql(
-			$connect_db, "SELECT pg_stat_have_stats('database', $dboid, 0)"),
+			$connect_db, "SELECT pg_stat_have_stats('database', $dboid, 0, 0)"),
 		$present,
 		"$sect: standby db stats as expected");
 }
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index ef658ad740..a2fa165c4c 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2335,6 +2335,11 @@ pg_statio_all_tables| SELECT c.oid AS relid,
     n.nspname AS schemaname,
     c.relname,
     (pg_stat_get_blocks_fetched(c.oid) - pg_stat_get_blocks_hit(c.oid)) AS heap_blks_read,
+    (pg_stat_get_blocks_written(c.oid) + pg_stat_get_relfilenode_blocks_written(d.oid,
+        CASE
+            WHEN (c.reltablespace <> (0)::oid) THEN c.reltablespace
+            ELSE d.dattablespace
+        END, c.relfilenode)) AS heap_blks_written,
     pg_stat_get_blocks_hit(c.oid) AS heap_blks_hit,
     i.idx_blks_read,
     i.idx_blks_hit,
@@ -2342,7 +2347,8 @@ pg_statio_all_tables| SELECT c.oid AS relid,
     pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit,
     x.idx_blks_read AS tidx_blks_read,
     x.idx_blks_hit AS tidx_blks_hit
-   FROM ((((pg_class c
+   FROM pg_database d,
+    ((((pg_class c
      LEFT JOIN pg_class t ON ((c.reltoastrelid = t.oid)))
      LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
      LEFT JOIN LATERAL ( SELECT (sum((pg_stat_get_blocks_fetched(pg_index.indexrelid) - pg_stat_get_blocks_hit(pg_index.indexrelid))))::bigint AS idx_blks_read,
@@ -2353,7 +2359,7 @@ pg_statio_all_tables| SELECT c.oid AS relid,
             (sum(pg_stat_get_blocks_hit(pg_index.indexrelid)))::bigint AS idx_blks_hit
            FROM pg_index
           WHERE (pg_index.indrelid = t.oid)) x ON (true))
-  WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"]));
+  WHERE ((c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"])) AND (d.datname = current_database()));
 pg_statio_sys_indexes| SELECT relid,
     indexrelid,
     schemaname,
@@ -2374,6 +2380,7 @@ pg_statio_sys_tables| SELECT relid,
     schemaname,
     relname,
     heap_blks_read,
+    heap_blks_written,
     heap_blks_hit,
     idx_blks_read,
     idx_blks_hit,
@@ -2403,6 +2410,7 @@ pg_statio_user_tables| SELECT relid,
     schemaname,
     relname,
     heap_blks_read,
+    heap_blks_written,
     heap_blks_hit,
     idx_blks_read,
     idx_blks_hit,
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 6e08898b18..eff0c9372c 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1111,23 +1111,23 @@ ROLLBACK;
 -- pg_stat_have_stats behavior
 ----
 -- fixed-numbered stats exist
-SELECT pg_stat_have_stats('bgwriter', 0, 0);
+SELECT pg_stat_have_stats('bgwriter', 0, 0, 0);
  pg_stat_have_stats 
 --------------------
  t
 (1 row)
 
 -- unknown stats kinds error out
-SELECT pg_stat_have_stats('zaphod', 0, 0);
+SELECT pg_stat_have_stats('zaphod', 0, 0, 0);
 ERROR:  invalid statistics kind: "zaphod"
 -- db stats have objoid 0
-SELECT pg_stat_have_stats('database', :dboid, 1);
+SELECT pg_stat_have_stats('database', :dboid, 1, 0);
  pg_stat_have_stats 
 --------------------
  f
 (1 row)
 
-SELECT pg_stat_have_stats('database', :dboid, 0);
+SELECT pg_stat_have_stats('database', :dboid, 0, 0);
  pg_stat_have_stats 
 --------------------
  t
@@ -1144,21 +1144,21 @@ select a from stats_test_tab1 where a = 3;
  3
 (1 row)
 
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  t
 (1 row)
 
 -- pg_stat_have_stats returns false for dropped index with stats
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  t
 (1 row)
 
 DROP index stats_test_idx1;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  f
@@ -1174,14 +1174,14 @@ select a from stats_test_tab1 where a = 3;
  3
 (1 row)
 
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  t
 (1 row)
 
 ROLLBACK;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  f
@@ -1196,7 +1196,7 @@ select a from stats_test_tab1 where a = 3;
  3
 (1 row)
 
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  t
@@ -1204,7 +1204,7 @@ SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
 
 REINDEX index CONCURRENTLY stats_test_idx1;
 -- false for previous oid
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  f
@@ -1212,7 +1212,7 @@ SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
 
 -- true for new oid
 SELECT 'stats_test_idx1'::regclass::oid AS stats_test_idx1_oid \gset
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  t
@@ -1220,7 +1220,7 @@ SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
 
 -- pg_stat_have_stats returns true for a rolled back drop index with stats
 BEGIN;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  t
@@ -1228,7 +1228,7 @@ SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
 
 DROP index stats_test_idx1;
 ROLLBACK;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  t
@@ -1513,7 +1513,7 @@ SELECT :io_sum_bulkwrite_strategy_extends_after > :io_sum_bulkwrite_strategy_ext
 (1 row)
 
 -- Test IO stats reset
-SELECT pg_stat_have_stats('io', 0, 0);
+SELECT pg_stat_have_stats('io', 0, 0, 0);
  pg_stat_have_stats 
 --------------------
  t
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index d8ac0d06f4..5a40779989 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -539,12 +539,12 @@ ROLLBACK;
 -- pg_stat_have_stats behavior
 ----
 -- fixed-numbered stats exist
-SELECT pg_stat_have_stats('bgwriter', 0, 0);
+SELECT pg_stat_have_stats('bgwriter', 0, 0, 0);
 -- unknown stats kinds error out
-SELECT pg_stat_have_stats('zaphod', 0, 0);
+SELECT pg_stat_have_stats('zaphod', 0, 0, 0);
 -- db stats have objoid 0
-SELECT pg_stat_have_stats('database', :dboid, 1);
-SELECT pg_stat_have_stats('database', :dboid, 0);
+SELECT pg_stat_have_stats('database', :dboid, 1, 0);
+SELECT pg_stat_have_stats('database', :dboid, 0, 0);
 
 -- pg_stat_have_stats returns true for committed index creation
 CREATE table stats_test_tab1 as select generate_series(1,10) a;
@@ -552,40 +552,40 @@ CREATE index stats_test_idx1 on stats_test_tab1(a);
 SELECT 'stats_test_idx1'::regclass::oid AS stats_test_idx1_oid \gset
 SET enable_seqscan TO off;
 select a from stats_test_tab1 where a = 3;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 
 -- pg_stat_have_stats returns false for dropped index with stats
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 DROP index stats_test_idx1;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 
 -- pg_stat_have_stats returns false for rolled back index creation
 BEGIN;
 CREATE index stats_test_idx1 on stats_test_tab1(a);
 SELECT 'stats_test_idx1'::regclass::oid AS stats_test_idx1_oid \gset
 select a from stats_test_tab1 where a = 3;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 ROLLBACK;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 
 -- pg_stat_have_stats returns true for reindex CONCURRENTLY
 CREATE index stats_test_idx1 on stats_test_tab1(a);
 SELECT 'stats_test_idx1'::regclass::oid AS stats_test_idx1_oid \gset
 select a from stats_test_tab1 where a = 3;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 REINDEX index CONCURRENTLY stats_test_idx1;
 -- false for previous oid
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 -- true for new oid
 SELECT 'stats_test_idx1'::regclass::oid AS stats_test_idx1_oid \gset
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 
 -- pg_stat_have_stats returns true for a rolled back drop index with stats
 BEGIN;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 DROP index stats_test_idx1;
 ROLLBACK;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 
 -- put enable_seqscan back to on
 SET enable_seqscan TO on;
@@ -759,7 +759,7 @@ SELECT sum(extends) AS io_sum_bulkwrite_strategy_extends_after
 SELECT :io_sum_bulkwrite_strategy_extends_after > :io_sum_bulkwrite_strategy_extends_before;
 
 -- Test IO stats reset
-SELECT pg_stat_have_stats('io', 0, 0);
+SELECT pg_stat_have_stats('io', 0, 0, 0);
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_pre_reset
   FROM pg_stat_io \gset
 SELECT pg_stat_reset_shared('io');
diff --git a/src/test/subscription/t/026_stats.pl b/src/test/subscription/t/026_stats.pl
index fb3e5629b3..1f4ae5efd5 100644
--- a/src/test/subscription/t/026_stats.pl
+++ b/src/test/subscription/t/026_stats.pl
@@ -263,7 +263,7 @@ $node_subscriber->safe_psql($db, qq(DROP SUBSCRIPTION $sub1_name));
 
 # Subscription stats for sub1 should be gone
 is( $node_subscriber->safe_psql(
-		$db, qq(SELECT pg_stat_have_stats('subscription', 0, $sub1_oid))),
+		$db, qq(SELECT pg_stat_have_stats('subscription', 0, $sub1_oid, 0))),
 	qq(f),
 	qq(Subscription stats for subscription '$sub1_name' should be removed.));
 
@@ -282,7 +282,7 @@ DROP SUBSCRIPTION $sub2_name;
 
 # Subscription stats for sub2 should be gone
 is( $node_subscriber->safe_psql(
-		$db, qq(SELECT pg_stat_have_stats('subscription', 0, $sub2_oid))),
+		$db, qq(SELECT pg_stat_have_stats('subscription', 0, $sub2_oid, 0))),
 	qq(f),
 	qq(Subscription stats for subscription '$sub2_name' should be removed.));
 
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index d427a1c16a..d7385f9bfb 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2118,6 +2118,7 @@ PgStatShared_Function
 PgStatShared_HashEntry
 PgStatShared_IO
 PgStatShared_Relation
+PgStatShared_RelFileNode
 PgStatShared_ReplSlot
 PgStatShared_SLRU
 PgStatShared_Subscription
-- 
2.34.1
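
For readers skimming the patch, the central change is that PgStat_HashKey gains a relfile member, so a relfilenode entry can be identified by (dboid, spcOid, relfile), while existing kinds pass 0 for the new member (as the updated pg_stat_have_stats() calls in the tests do). Below is a minimal standalone sketch of that idea. The types and the names StatKind, StatKey and stat_key_equal are stand-ins invented for illustration, not the PostgreSQL definitions, and the field-by-field comparison is an assumption rather than how the patch compares keys.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the PostgreSQL typedefs (assumptions for illustration). */
typedef uint32_t Oid;
typedef uint32_t RelFileNumber;

/* Hypothetical, reduced version of the stats kinds. */
typedef enum StatKind
{
    KIND_RELATION,
    KIND_RELFILENODE
} StatKind;

/* Modeled on the extended PgStat_HashKey from the patch. */
typedef struct StatKey
{
    StatKind      kind;
    Oid           dboid;
    Oid           objoid;   /* table/function OID, or tablespace OID */
    RelFileNumber relfile;  /* 0 for kinds that do not use it */
} StatKey;

/* Two keys name the same entry only if all four members match. */
static bool
stat_key_equal(const StatKey *a, const StatKey *b)
{
    assert(a != NULL && b != NULL);
    return a->kind == b->kind &&
        a->dboid == b->dboid &&
        a->objoid == b->objoid &&
        a->relfile == b->relfile;
}
```

With this layout, two relfilenodes in the same database and tablespace produce different keys, and a relation entry never collides with a relfilenode entry because the kind differs.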

#7Robert Haas
robertmhaas@gmail.com
In reply to: Bertrand Drouvot (#5)
Re: relfilenode statistics

On Wed, Jun 5, 2024 at 1:52 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

I think we should keep the stats in the relation during relfilenode changes.
As a POC, v1 implemented a way to do so during TRUNCATE (see the changes in
table_relation_set_new_filelocator() and in pg_statio_all_tables): as you can
see in the example provided up-thread the new heap_blks_written statistic has
been preserved during the TRUNCATE.

Yeah, I think there's something weird about this design. Somehow we're
ending up with both per-relation and per-relfilenode counters:

+ pg_stat_get_blocks_written(C.oid) +
pg_stat_get_relfilenode_blocks_written(d.oid, CASE WHEN
C.reltablespace <> 0 THEN C.reltablespace ELSE d.dattablespace END,
C.relfilenode) AS heap_blks_written,

I'll defer to Andres if he thinks that's awesome, but to me it does
not seem right to track some blocks written in a per-relation counter
and others in a per-relfilenode counter.
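
To make the concern concrete, here is a minimal standalone sketch of what the view expression above appears to compute. It is not code from the patch: RelWriteStats, on_new_relfilenode and heap_blks_written are hypothetical names, and the sketch only assumes what is described up-thread, namely that the relfilenode counter is folded into the table entry when the relfilenode changes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;           /* stand-in for PostgreSQL's Oid */
#define InvalidOid ((Oid) 0)

/* Hypothetical model of the two counters the view adds together. */
typedef struct RelWriteStats
{
    int64_t rel_blocks_written;     /* writes from previous relfilenodes */
    int64_t relfile_blocks_written; /* writes to the current relfilenode */
} RelWriteStats;

/* Mirrors the CASE in the view: 0 means "database default tablespace". */
static Oid
effective_tablespace(Oid reltablespace, Oid dattablespace)
{
    return (reltablespace != InvalidOid) ? reltablespace : dattablespace;
}

/* On a rewrite, fold the current relfilenode's writes into the relation. */
static void
on_new_relfilenode(RelWriteStats *s)
{
    assert(s != NULL);
    s->rel_blocks_written += s->relfile_blocks_written;
    s->relfile_blocks_written = 0;
}

/* What heap_blks_written would report under this model. */
static int64_t
heap_blks_written(const RelWriteStats *s)
{
    return s->rel_blocks_written + s->relfile_blocks_written;
}
```

The reported total is unchanged by on_new_relfilenode(), which is the "preserved during the TRUNCATE" behavior, but the same quantity now lives in two places.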

--
Robert Haas
EDB: http://www.enterprisedb.com

#8Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#7)
Re: relfilenode statistics

Hi,

On 2024-06-06 12:27:49 -0400, Robert Haas wrote:

On Wed, Jun 5, 2024 at 1:52 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

I think we should keep the stats in the relation during relfilenode changes.
As a POC, v1 implemented a way to do so during TRUNCATE (see the changes in
table_relation_set_new_filelocator() and in pg_statio_all_tables): as you can
see in the example provided up-thread the new heap_blks_written statistic has
been preserved during the TRUNCATE.

Yeah, I think there's something weird about this design. Somehow we're
ending up with both per-relation and per-relfilenode counters:

+ pg_stat_get_blocks_written(C.oid) +
pg_stat_get_relfilenode_blocks_written(d.oid, CASE WHEN
C.reltablespace <> 0 THEN C.reltablespace ELSE d.dattablespace END,
C.relfilenode) AS heap_blks_written,

I'll defer to Andres if he thinks that's awesome, but to me it does
not seem right to track some blocks written in a per-relation counter
and others in a per-relfilenode counter.

It doesn't immediately sound awesome. Nor really necessary?

If we just want to keep prior stats upon a relation rewrite, we can just copy
the stats from the old relfilenode. Or we can decide that those stats don't
really make sense anymore, and start from scratch.
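
The two alternatives can be sketched as follows. This is an illustration under stated assumptions, not a proposed implementation: RelFileNodeStats copies the three counters of PgStat_StatRelFileNodeEntry from the patch, while RewritePolicy and init_new_relfilenode_stats are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same three counters as PgStat_StatRelFileNodeEntry in the patch. */
typedef struct RelFileNodeStats
{
    int64_t blocks_fetched;
    int64_t blocks_hit;
    int64_t blocks_written;
} RelFileNodeStats;

/* Hypothetical switch between the two options discussed here. */
typedef enum RewritePolicy
{
    REWRITE_COPY_STATS,     /* carry the old relfilenode's stats over */
    REWRITE_RESET_STATS     /* treat the new relfilenode as a fresh start */
} RewritePolicy;

/* Initialize the stats entry of the relfilenode created by a rewrite. */
static void
init_new_relfilenode_stats(const RelFileNodeStats *old_entry,
                           RelFileNodeStats *new_entry,
                           RewritePolicy policy)
{
    assert(old_entry != NULL && new_entry != NULL);

    if (policy == REWRITE_COPY_STATS)
        *new_entry = *old_entry;
    else
        memset(new_entry, 0, sizeof(*new_entry));
}
```

Either way there is a single per-relfilenode counter, so the view would not need to add a per-relation counter on top of it.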

I *guess* I could see an occasional benefit in having both counters for "prior
relfilenodes" and "current relfilenode" - except that stats get reset manually
and upon crash anyway, making this less useful than if it were really
"lifetime" stats.

Greetings,

Andres Freund

#9Andres Freund
andres@anarazel.de
In reply to: Bertrand Drouvot (#3)
Re: relfilenode statistics

Hi,

On 2024-06-03 11:11:46 +0000, Bertrand Drouvot wrote:

The main argument is that we currently don’t have writes counters for relations.
The reason is that we don’t have the relation OID when writing buffers out.
Tracking writes per relfilenode would allow us to track/consolidate writes per
relation (example in the v1 patch and in the message up-thread).

I think that adding instrumentation in this area (writes counters) could be
beneficial (like it is for the ones we currently have for reads).

Second argument is that this is also beneficial for the "Split index and
table statistics into different types of stats" thread (mentioned in the previous
message). It would allow us to avoid additional branches in some situations (like
the one mentioned by Andres in the link I provided up-thread).

I think there's another *very* significant benefit:

Right now physical replication doesn't populate statistics fields like
n_dead_tup, which can be a huge issue after failovers, because there's little
information about what autovacuum needs to do.

Auto-analyze *partially* can fix it at times, if it's lucky enough to see
enough dead tuples - but that's not a given and even if it works, is often
wildly inaccurate.

Once we put things like n_dead_tup into per-relfilenode stats, we can populate
them during WAL replay. Thus after a promotion autovacuum has much better
data.

This also is important when we crash: We've been talking about storing a
snapshot of the stats alongside each REDO pointer. Combined with updating
stats during crash recovery, we'll have accurate dead-tuple stats once recovery
has finished.
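
A rough sketch of why keying by relfilenode helps here: replay knows the RelFileLocator of the block it is touching but not the relation OID, so counters keyed by locator can still be updated. The code below is a toy model under that assumption. RelFileLocator is redefined locally with the same three fields as the PostgreSQL struct so the example compiles on its own; the fixed-size table and the names LocatorStats, find_entry, replay_note_dead_tuples and dead_tuples_for are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;
typedef uint32_t RelFileNumber;

/* Local copy of the three fields of PostgreSQL's RelFileLocator. */
typedef struct RelFileLocator
{
    Oid           spcOid;
    Oid           dbOid;
    RelFileNumber relNumber;
} RelFileLocator;

/* Hypothetical per-relfilenode entry holding a dead-tuple counter. */
typedef struct LocatorStats
{
    RelFileLocator locator;
    int64_t        dead_tuples;
    bool           used;
} LocatorStats;

#define MAX_TRACKED 64
static LocatorStats tracked[MAX_TRACKED];

/* Linear lookup by locator; optionally claims a free slot. */
static LocatorStats *
find_entry(RelFileLocator loc, bool create)
{
    LocatorStats *free_slot = NULL;

    for (size_t i = 0; i < MAX_TRACKED; i++)
    {
        LocatorStats *e = &tracked[i];

        if (e->used)
        {
            if (e->locator.spcOid == loc.spcOid &&
                e->locator.dbOid == loc.dbOid &&
                e->locator.relNumber == loc.relNumber)
                return e;
        }
        else if (free_slot == NULL)
            free_slot = e;
    }

    if (!create || free_slot == NULL)
        return NULL;

    free_slot->locator = loc;
    free_slot->dead_tuples = 0;
    free_slot->used = true;
    return free_slot;
}

/* Called from a replay-like path that only has the locator at hand. */
static void
replay_note_dead_tuples(RelFileLocator loc, int64_t ntuples)
{
    LocatorStats *e;

    assert(ntuples >= 0);
    e = find_entry(loc, true);
    if (e != NULL)              /* silently drop if the toy table is full */
        e->dead_tuples += ntuples;
}

/* Returns 0 for a locator that was never seen. */
static int64_t
dead_tuples_for(RelFileLocator loc)
{
    LocatorStats *e = find_entry(loc, false);

    return (e != NULL) ? e->dead_tuples : 0;
}
```

After a promotion, something like dead_tuples_for() is what autovacuum could consult once the relation has been mapped to its current relfilenode.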

Greetings,

Andres Freund

#10Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Andres Freund (#9)
Re: relfilenode statistics

Hi,

On Thu, Jun 06, 2024 at 08:38:06PM -0700, Andres Freund wrote:

Hi,

On 2024-06-03 11:11:46 +0000, Bertrand Drouvot wrote:

The main argument is that we currently don’t have writes counters for relations.
The reason is that we don’t have the relation OID when writing buffers out.
Tracking writes per relfilenode would allow us to track/consolidate writes per
relation (example in the v1 patch and in the message up-thread).

I think that adding instrumentation in this area (writes counters) could be
beneficial (like it is for the ones we currently have for reads).

Second argument is that this is also beneficial for the "Split index and
table statistics into different types of stats" thread (mentioned in the previous
message). It would allow us to avoid additional branches in some situations (like
the one mentioned by Andres in the link I provided up-thread).

I think there's another *very* significant benefit:

Right now physical replication doesn't populate statistics fields like
n_dead_tup, which can be a huge issue after failovers, because there's little
information about what autovacuum needs to do.

Auto-analyze *partially* can fix it at times, if it's lucky enough to see
enough dead tuples - but that's not a given and even if it works, is often
wildly inaccurate.

Once we put things like n_dead_tup into per-relfilenode stats,

Hm - I had in mind to populate relfilenode stats only with stats that are
somehow related to I/O activities. Which ones do you have in mind to put in
relfilenode stats?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#11Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Andres Freund (#8)
Re: relfilenode statistics

Hi,

On Thu, Jun 06, 2024 at 08:17:36PM -0700, Andres Freund wrote:

Hi,

On 2024-06-06 12:27:49 -0400, Robert Haas wrote:

On Wed, Jun 5, 2024 at 1:52 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

I think we should keep the stats in the relation during relfilenode changes.
As a POC, v1 implemented a way to do so during TRUNCATE (see the changes in
table_relation_set_new_filelocator() and in pg_statio_all_tables): as you can
see in the example provided up-thread the new heap_blks_written statistic has
been preserved during the TRUNCATE.

Yeah, I think there's something weird about this design. Somehow we're
ending up with both per-relation and per-relfilenode counters:

+ pg_stat_get_blocks_written(C.oid) +
pg_stat_get_relfilenode_blocks_written(d.oid, CASE WHEN
C.reltablespace <> 0 THEN C.reltablespace ELSE d.dattablespace END,
C.relfilenode) AS heap_blks_written,

I'll defer to Andres if he thinks that's awesome, but to me it does
not seem right to track some blocks written in a per-relation counter
and others in a per-relfilenode counter.

It doesn't immediately sound awesome. Nor really necessary?

If we just want to keep prior stats upon a relation rewrite, we can just copy
the stats from the old relfilenode.

Agree, that's another option. But I think that would be in another field like
"cumulative_XXX" to ensure one could still retrieve stats that are "dedicated"
to this particular "new" relfilenode. Thoughts?
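
A minimal sketch of that idea, to show what I mean. The field name cumulative_blocks_written is just an instance of the "cumulative_XXX" placeholder, and RelFileNodeWriteStats, note_blocks_written and start_new_relfilenode are hypothetical names that do not exist in the v1 patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry with a per-relfilenode and a carried-over counter. */
typedef struct RelFileNodeWriteStats
{
    int64_t blocks_written;            /* this relfilenode only */
    int64_t cumulative_blocks_written; /* this one plus all previous ones */
} RelFileNodeWriteStats;

/* Record writes against the current relfilenode. */
static void
note_blocks_written(RelFileNodeWriteStats *s, int64_t nblocks)
{
    assert(s != NULL && nblocks >= 0);
    s->blocks_written += nblocks;
    s->cumulative_blocks_written += nblocks;
}

/* Build the entry for the relfilenode that replaces old_entry. */
static RelFileNodeWriteStats
start_new_relfilenode(const RelFileNodeWriteStats *old_entry)
{
    RelFileNodeWriteStats new_entry;

    assert(old_entry != NULL);
    new_entry.blocks_written = 0;
    new_entry.cumulative_blocks_written = old_entry->cumulative_blocks_written;
    return new_entry;
}
```

Here blocks_written stays "dedicated" to the new relfilenode while the cumulative field keeps what a view like pg_statio_all_tables would show across rewrites.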

Or we can decide that those stats don't
really make sense anymore, and start from scratch.

I *guess* I could see an occasional benefit in having both counters for "prior
relfilenodes" and "current relfilenode" - except that stats get reset manually
and upon crash anyway, making this less useful than if it were really
"lifetime" stats.

Right, but currently they are not lost during a relation rewrite. If we decide to
not keep the relfilenode stats during a rewrite then things like heap_blks_read
would stop surviving a rewrite (if we move it to relfilenode stats) while it
currently does.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#12Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#8)
Re: relfilenode statistics

On Thu, Jun 6, 2024 at 11:17 PM Andres Freund <andres@anarazel.de> wrote:

If we just want to keep prior stats upon a relation rewrite, we can just copy
the stats from the old relfilenode. Or we can decide that those stats don't
really make sense anymore, and start from scratch.

I think we need to think carefully about what we want the user
experience to be here. "Per-relfilenode stats" could mean "sometimes I
don't know the relation OID so I want to use the relfilenumber
instead, without changing the user experience" or it could mean "some
of these stats actually properly pertain to the relfilenode rather
than the relation so I want to associate them with the right object
and that will affect how the user sees things." We need to decide
which it is. If it's the former, then we need to examine whether the
goal of hiding the distinction between relfilenode stats and relation
stats from the user is in fact feasible. If it's the latter, then we
need to make sure the whole patch reflects that design, which would
include e.g. NOT copying stats from the old to the new relfilenode,
and which would also include documenting the behavior in a way that
will be understandable to users.

In my experience, the worst thing you can do in cases like this is be
somewhere in the middle. Then you tend to end up with stuff like: the
difference isn't supposed to be something that the user knows or cares
about, except that they do have to know and care because you haven't
thoroughly covered up the deception, and often they have to reverse
engineer the behavior because you didn't document what was really
happening because you imagined that they wouldn't notice.

--
Robert Haas
EDB: http://www.enterprisedb.com

#13Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Robert Haas (#12)
Re: relfilenode statistics

Hi,

On Fri, Jun 07, 2024 at 09:24:41AM -0400, Robert Haas wrote:

On Thu, Jun 6, 2024 at 11:17 PM Andres Freund <andres@anarazel.de> wrote:

If we just want to keep prior stats upon a relation rewrite, we can just copy
the stats from the old relfilenode. Or we can decide that those stats don't
really make sense anymore, and start from scratch.

I think we need to think carefully about what we want the user
experience to be here. "Per-relfilenode stats" could mean "sometimes I
don't know the relation OID so I want to use the relfilenumber
instead, without changing the user experience" or it could mean "some
of these stats actually properly pertain to the relfilenode rather
than the relation so I want to associate them with the right object
and that will affect how the user sees things." We need to decide
which it is. If it's the former, then we need to examine whether the
goal of hiding the distinction between relfilenode stats and relation
stats from the user is in fact feasible. If it's the latter, then we
need to make sure the whole patch reflects that design, which would
include e.g. NOT copying stats from the old to the new relfilenode,
and which would also include documenting the behavior in a way that
will be understandable to users.

Thanks for sharing your thoughts!

Let's take the current heap_blks_read as an example: it currently survives
a relation rewrite and I guess we don't want to change the existing user
experience for it.

Now say we want to add "heap_blks_written" (like in this POC patch) then I think
that it makes sense for the user to 1) query this new stat from the same place
as the existing heap_blks_read: from pg_statio_all_tables and 2) to have the same
experience as far as the relation rewrite is concerned (keep the previous stats).

To achieve the rewrite behavior we could:

1) copy the stats from the OLD relfilenode to the relation (like in the POC patch)
2) copy the stats from the OLD relfilenode to the NEW one (could be in a dedicated
field)

The pros of 1) are that the behavior is consistent with the current heap_blks_read
and that the user could still see the current relfilenode stats (through a new API)
if they want to.

In my experience, the worst thing you can do in cases like this is be
somewhere in the middle. Then you tend to end up with stuff like: the
difference isn't supposed to be something that the user knows or cares
about, except that they do have to know and care because you haven't
thoroughly covered up the deception, and often they have to reverse
engineer the behavior because you didn't document what was really
happening because you imagined that they wouldn't notice.

My idea was to move all that is in pg_statio_all_tables to relfilenode stats
and 1) add new stats to pg_statio_all_tables (like heap_blks_written), 2) ensure
the user can still retrieve the stats from pg_statio_all_tables in such a way
that it survives a rewrite, 3) provide dedicated APIs to retrieve
relfilenode stats but only for the current relfilenode, 4) document this
behavior. This is what the POC patch is doing for heap_blks_written (would
need to do the same for heap_blks_read and friends) except for the documentation
part. What do you think?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#14Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Bertrand Drouvot (#13)
Re: relfilenode statistics

At Mon, 10 Jun 2024 08:09:56 +0000, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote in

Hi,

On Fri, Jun 07, 2024 at 09:24:41AM -0400, Robert Haas wrote:

On Thu, Jun 6, 2024 at 11:17 PM Andres Freund <andres@anarazel.de> wrote:

If we just want to keep prior stats upon a relation rewrite, we can just copy
the stats from the old relfilenode. Or we can decide that those stats don't
really make sense anymore, and start from scratch.

I think we need to think carefully about what we want the user
experience to be here. "Per-relfilenode stats" could mean "sometimes I
don't know the relation OID so I want to use the relfilenumber
instead, without changing the user experience" or it could mean "some
of these stats actually properly pertain to the relfilenode rather
than the relation so I want to associate them with the right object
and that will affect how the user sees things." We need to decide
which it is. If it's the former, then we need to examine whether the
goal of hiding the distinction between relfilenode stats and relation
stats from the user is in fact feasible. If it's the latter, then we
need to make sure the whole patch reflects that design, which would
include e.g. NOT copying stats from the old to the new relfilenode,
and which would also include documenting the behavior in a way that
will be understandable to users.

Thanks for sharing your thoughts!

Let's take the current heap_blks_read as an example: it currently survives
a relation rewrite and I guess we don't want to change the existing user
experience for it.

Now say we want to add "heap_blks_written" (like in this POC patch) then I think
that it makes sense for the user to 1) query this new stat from the same place
as the existing heap_blks_read: from pg_statio_all_tables and 2) to have the same
experience as far as the relation rewrite is concerned (keep the previous stats).

To achieve the rewrite behavior we could:

1) copy the stats from the OLD relfilenode to the relation (like in the POC patch)
2) copy the stats from the OLD relfilenode to the NEW one (could be in a dedicated
field)

The pros of 1) are that the behavior is consistent with the current heap_blks_read
and that the user could still see the current relfilenode stats (through a new API)
if they want to.

In my experience, the worst thing you can do in cases like this is be
somewhere in the middle. Then you tend to end up with stuff like: the
difference isn't supposed to be something that the user knows or cares
about, except that they do have to know and care because you haven't
thoroughly covered up the deception, and often they have to reverse
engineer the behavior because you didn't document what was really
happening because you imagined that they wouldn't notice.

My idea was to move all that is in pg_statio_all_tables to relfilenode stats
and 1) add new stats to pg_statio_all_tables (like heap_blks_written), 2) ensure
the user can still retrieve the stats from pg_statio_all_tables in such a way
that it survives a rewrite, 3) provide dedicated APIs to retrieve
relfilenode stats but only for the current relfilenode, 4) document this
behavior. This is what the POC patch is doing for heap_blks_written (would
need to do the same for heap_blks_read and friends) except for the documentation
part. What do you think?

In my opinion, it is certainly strange that bufmgr is aware of
relation kinds, but introducing relfilenode stats to avoid this skew
doesn't seem to be the best way, as it invites inconclusive arguments
like the one raised above. The fact that we transfer counters from old
relfilenodes to new ones indicates that we are not really interested
in counts by relfilenode. If that's the case, wouldn't it be simpler
to call pgstat_count_relation_buffer_read() from bufmgr.c and then
branch according to relkind within that function? If you're concerned
about the additional branch, some ingenuity may be needed.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#15Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Kyotaro Horiguchi (#14)
Re: relfilenode statistics

Hi,

On Tue, Jun 11, 2024 at 03:35:23PM +0900, Kyotaro Horiguchi wrote:

At Mon, 10 Jun 2024 08:09:56 +0000, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote in

My idea was to move all that is in pg_statio_all_tables to relfilenode stats
and 1) add new stats to pg_statio_all_tables (like heap_blks_written), 2) ensure
the user can still retrieve the stats from pg_statio_all_tables in such a way
that it survives a rewrite, 3) provide dedicated APIs to retrieve
relfilenode stats but only for the current relfilenode, 4) document this
behavior. This is what the POC patch is doing for heap_blks_written (would
need to do the same for heap_blks_read and friends) except for the documentation
part. What do you think?

In my opinion,

Thanks for looking at it!

it is certainly strange that bufmgr is aware of
relation kinds, but introducing relfilenode stats to avoid this skew
doesn't seem to be the best way, as it invites inconclusive arguments
like the one raised above. The fact that we transfer counters from old
relfilenodes to new ones indicates that we are not really interested
in counts by relfilenode. If that's the case, wouldn't it be simpler
to call pgstat_count_relation_buffer_read() from bufmgr.c and then
branch according to relkind within that function? If you're concerned
about the additional branch, some ingenuity may be needed.

That may be doable for "read" activities but what about write activities?
Do you mean not relying on relfilenode stats for reads but relying on relfilenode
stats for writes?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#16Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#1)
Re: relfilenode statistics

On Sat, May 25, 2024 at 07:52:02AM +0000, Bertrand Drouvot wrote:

But I think that it is in a state that can be used to discuss the approach it
is implementing (so that we can agree or not on it) before moving
forward.

I have read through the patch to get an idea of how things are done,
and I am troubled by the approach taken (mentioned down by you), but
that's invasive compared to how pgstats wants to be transparent with
its stats kinds.

+   Oid         objoid;         /* object ID, either table or function or tablespace. */
+   RelFileNumber relfile;      /* relfilenumber for RelFileLocator. */
 } PgStat_HashKey;

This adds a relfilenode component to the central hash key used for the
dshash of pgstats, which is something most stats types don't care
about. That looks like the incorrect thing to do to me, particularly
seeing a couple of lines down that a stats kind is assigned so the
HashKey uniqueness is ensured by the KindInfo:
+ [PGSTAT_KIND_RELFILENODE] = {
+ .name = "relfilenode",

FWIW, I have on my stack of patches something to switch the objoid to
8 bytes, actually, which is something that would be required for
pg_stat_statements as query IDs are wider than that and affect all
databases, FWIW. Relfilenodes are 4 bytes, okay still Robert has
proposed a couple of years ago a patch set to bump that to 56 bits,
change reverted in a448e49bcbe4. The objoid is also not something
specific to OIDs, see replication slots with their idx for example.

What you would be looking for instead is to use the relfilenode as an
objoid and keep track of the OID of the original relation in each
PgStat_StatRelFileNodeEntry so that it is possible to know where a past
relfilenode was used? That makes looking back at the past relation's
relfilenodes stats more complicated as it would be necessary to keep a
list of the past relfilenodes for a relation, as well. Perhaps with
some kind of cache that maintains a mapping between the relation and
its relfilenode history?
--
Michael

#17Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#16)
Re: relfilenode statistics

Hi,

On Wed, Jul 10, 2024 at 03:02:34PM +0900, Michael Paquier wrote:

On Sat, May 25, 2024 at 07:52:02AM +0000, Bertrand Drouvot wrote:

But I think that it is in a state that can be used to discuss the approach it
is implementing (so that we can agree or not on it) before moving
forward.

I have read through the patch to get an idea of how things are done,

Thanks!

and I am troubled by the approach taken (mentioned down by you), but
that's invasive compared to how pgstats wants to be transparent with
its stats kinds.

+   Oid         objoid;         /* object ID, either table or function or tablespace. */
+   RelFileNumber relfile;      /* relfilenumber for RelFileLocator. */
} PgStat_HashKey;

This adds a relfilenode component to the central hash key used for the
dshash of pgstats, which is something most stats types don't care
about.

That's right but that's an existing behavior without the patch as:

PGSTAT_KIND_DATABASE does not care about the objoid
PGSTAT_KIND_REPLSLOT does not care about the dboid
PGSTAT_KIND_SUBSCRIPTION does not care about the dboid

That's 3 kinds out of the 5 non-fixed stats kinds.

Not saying it's good, just saying that's an existing behavior.

That looks like the incorrect thing to do to me, particularly
seeing a couple of lines down that a stats kind is assigned so the
HashKey uniqueness is ensured by the KindInfo:
+ [PGSTAT_KIND_RELFILENODE] = {
+ .name = "relfilenode",

You mean, just rely on kind, dboid and relfile to ensure uniqueness?

I'm not sure that would work as there is this comment in relfilelocator.h:

"
* Notice that relNumber is only unique within a database in a particular
* tablespace.
"

So, I think it makes sense to link the hashkey to all the RelFileLocator
fields, meaning:

dboid (linked to RelFileLocator's dbOid)
objoid (linked to RelFileLocator's spcOid)
relfile (linked to RelFileLocator's relNumber)
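
As an illustration of the uniqueness argument above, here is a small, self-contained C sketch. The types below (SketchOid, SketchRelFileNumber, SketchHashKey) are simplified stand-ins defined only for this example, not the real PgStat_HashKey, and the kind value used in the usage note is an arbitrary placeholder. The sketch shows that two relfilenodes sharing the same relNumber in different tablespaces only get distinct keys if the tablespace OID is part of the key.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t SketchOid;				/* stand-in for Oid */
typedef uint32_t SketchRelFileNumber;	/* stand-in for RelFileNumber */

/* Hypothetical key layout mirroring the mapping described above. */
typedef struct SketchHashKey
{
	int			kind;			/* stats kind */
	SketchOid	dboid;			/* RelFileLocator's dbOid */
	SketchOid	objoid;			/* RelFileLocator's spcOid for this kind */
	SketchRelFileNumber relfile;	/* RelFileLocator's relNumber */
} SketchHashKey;

/* Build a key from the three RelFileLocator components. */
SketchHashKey
sketch_relfilenode_key(int kind, SketchOid dbOid, SketchOid spcOid,
					   SketchRelFileNumber relNumber)
{
	SketchHashKey key;

	key.kind = kind;
	key.dboid = dbOid;
	key.objoid = spcOid;
	key.relfile = relNumber;
	return key;
}

/* Full comparison: every component participates. */
bool
sketch_key_equal(const SketchHashKey *a, const SketchHashKey *b)
{
	return a->kind == b->kind &&
		a->dboid == b->dboid &&
		a->objoid == b->objoid &&
		a->relfile == b->relfile;
}

/* Comparison that ignores the tablespace, as if only kind/dboid/relfile were used. */
bool
sketch_key_equal_without_spc(const SketchHashKey *a, const SketchHashKey *b)
{
	return a->kind == b->kind &&
		a->dboid == b->dboid &&
		a->relfile == b->relfile;
}
```

For example, keys built for (db 5, tablespace 1663, relNumber 16384) and (db 5, tablespace 20000, relNumber 16384) compare as different with sketch_key_equal() but collide with sketch_key_equal_without_spc().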

FWIW, I have on my stack of patches something to switch the objoid to
8 bytes, actually, which is something that would be required for
pg_stat_statements as query IDs are wider than that and affect all
databases, FWIW. Relfilenodes are 4 bytes, okay still Robert has
proposed a couple of years ago a patch set to bump that to 56 bits,
change reverted in a448e49bcbe4.

Right, but it really looks like this extra field is needed to ensure
uniqueness (see above).

What you would be looking for instead is to use the relfilenode as an
objoid

Not sure that works, as it looks like uniqueness won't be ensured (see above).

and keep track of the OID of the original relation in each
PgStat_StatRelFileNodeEntry so that it is possible to know where a past
relfilenode was used? That makes looking back at the past relation's
relfilenodes stats more complicated as it would be necessary to keep a
list of the past relfilenodes for a relation, as well. Perhaps with
some kind of cache that maintains a mapping between the relation and
its relfilenode history?

Yeah, I also thought about keeping a list of "previous" relfilenodes stats for a
relation but that would lead to:

1. Keep previous relfilenode stats
2. A more complicated way to look at relation stats (as you said)
3. Extra memory usage

I think the only reason "previous" relfilenode stats are needed is to provide
accurate stats for the relation. Outside of this need, I don't think we would
want to retrieve "individual" previous relfilenode stats in the past.

That's why the POC patch "simply" copies the stats to the relation during a
rewrite (before getting rid of the "previous" relfilenode stats).

What do you think?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#18Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#17)
Re: relfilenode statistics

On Wed, Jul 10, 2024 at 01:38:06PM +0000, Bertrand Drouvot wrote:

On Wed, Jul 10, 2024 at 03:02:34PM +0900, Michael Paquier wrote:

and I am troubled by the approach taken (mentioned down by you), but
that's invasive compared to how pgstats wants to be transparent with
its stats kinds.

+   Oid         objoid;         /* object ID, either table or function or tablespace. */
+   RelFileNumber relfile;      /* relfilenumber for RelFileLocator. */
} PgStat_HashKey;

This adds a relfilenode component to the central hash key used for the
dshash of pgstats, which is something most stats types don't care
about.

That's right but that's an existing behavior without the patch as:

PGSTAT_KIND_DATABASE does not care about the objoid
PGSTAT_KIND_REPLSLOT does not care about the dboid
PGSTAT_KIND_SUBSCRIPTION does not care about the dboid

That's 3 kinds out of the 5 non-fixed stats kinds.

I'd like to think that this is just going to increase across time.

That looks like the incorrect thing to do to me, particularly
seeing a couple of lines down that a stats kind is assigned so the
HashKey uniqueness is ensured by the KindInfo:
+ [PGSTAT_KIND_RELFILENODE] = {
+ .name = "relfilenode",

You mean, just rely on kind, dboid and relfile to ensure uniqueness?

Or table OID for the objid, with a hardcoded number of past
relfilenodes stats stored, to limit bloating the dshash with too many
past stats. See below.

So, I think it makes sense to link the hashkey to all the RelFileLocator
fields, meaning:

dboid (linked to RelFileLocator's dbOid)
objoid (linked to RelFileLocator's spcOid)
relfile (linked to RelFileLocator's relNumber)

Hmm. How about using the table OID as objoid, but store in the stats
of the new KindInfo an array of entries with the relfilenodes (current
and past, perhaps with more data than the relfilenode to ensure the
uniqueness tracking) and each of its stats? The number of past
relfilenodes would be fixed, meaning that there would be a strict
control with the retention of the past stats. When a table is
dropped, removing its relfilenode stats would be as cheap as when its
PGSTAT_KIND_RELATION is dropped.
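
One possible reading of this suggestion can be sketched in self-contained C as follows. Everything here is an assumption made for illustration: the names (SketchTableEntry, SketchRelFileNodeSlot, SKETCH_MAX_RELFILENODES), the retention limit of 4, and the choice to keep the current relfilenode in slot 0 are not taken from the thread or from any patch. The sketch keeps a fixed-size array of per-relfilenode counters under a single table-level entry and drops the oldest slot once the array is full.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_RELFILENODES 4	/* assumed retention limit */

/* Stats for one relfilenode (current or past). */
typedef struct SketchRelFileNodeSlot
{
	uint32_t	relfile;		/* relfilenumber this slot belongs to */
	uint64_t	blocks_written;
} SketchRelFileNodeSlot;

/* One entry per table OID; slot 0 is the current relfilenode. */
typedef struct SketchTableEntry
{
	SketchRelFileNodeSlot slots[SKETCH_MAX_RELFILENODES];
	int			nused;			/* number of valid slots */
} SketchTableEntry;

/* Switch to a new relfilenode, pushing older ones back and dropping the oldest. */
void
sketch_switch_relfilenode(SketchTableEntry *e, uint32_t new_relfile)
{
	int			last = (e->nused < SKETCH_MAX_RELFILENODES)
		? e->nused : SKETCH_MAX_RELFILENODES - 1;

	for (int i = last; i > 0; i--)
		e->slots[i] = e->slots[i - 1];

	e->slots[0].relfile = new_relfile;
	e->slots[0].blocks_written = 0;

	if (e->nused < SKETCH_MAX_RELFILENODES)
		e->nused++;
}

/* Count one written block against the current relfilenode. */
void
sketch_count_write(SketchTableEntry *e)
{
	assert(e->nused > 0);
	e->slots[0].blocks_written++;
}

/* Sum over the current and all retained past relfilenodes. */
uint64_t
sketch_total_blocks_written(const SketchTableEntry *e)
{
	uint64_t	total = 0;

	for (int i = 0; i < e->nused; i++)
		total += e->slots[i].blocks_written;
	return total;
}
```

A consequence of the fixed retention in this sketch is that the table-level total can decrease once the oldest relfilenode falls out of the array, which differs from copying the counters to the relation entry.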

Yeah, I also thought about keeping a list of "previous" relfilenodes stats for a
relation but that would lead to:

1. Keep previous relfilenode stats
2. A more complicated way to look at relation stats (as you said)
3. Extra memory usage

I think the only reason "previous" relfilenode stats are needed is to provide
accurate stats for the relation. Outside of this need, I don't think we would
want to retrieve "individual" previous relfilenode stats in the past.

That's why the POC patch "simply" copies the stats to the relation during a
rewrite (before getting rid of the "previous" relfilenode stats).

Hmm. Okay.
--
Michael

#19Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#18)
Re: relfilenode statistics

Hi,

On Thu, Jul 11, 2024 at 01:58:19PM +0900, Michael Paquier wrote:

On Wed, Jul 10, 2024 at 01:38:06PM +0000, Bertrand Drouvot wrote:

So, I think it makes sense to link the hashkey to all the RelFileLocator
fields, meaning:

dboid (linked to RelFileLocator's dbOid)
objoid (linked to RelFileLocator's spcOid)
relfile (linked to RelFileLocator's relNumber)

Hmm. How about using the table OID as objoid,

The issue is that we don't have the relation OID when writing buffers out (that's
one of the reasons explained in [1]).

[1]: /messages/by-id/Zl2k8u4HDTUW6QlC@ip-10-97-1-34.eu-west-3.compute.internal

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#20Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Bertrand Drouvot (#19)
1 attachment(s)
Re: relfilenode statistics

Hi,

On Thu, Jul 11, 2024 at 06:10:23AM +0000, Bertrand Drouvot wrote:

Hi,

On Thu, Jul 11, 2024 at 01:58:19PM +0900, Michael Paquier wrote:

On Wed, Jul 10, 2024 at 01:38:06PM +0000, Bertrand Drouvot wrote:

So, I think it makes sense to link the hashkey to all the RelFileLocator
fields, meaning:

dboid (linked to RelFileLocator's dbOid)
objoid (linked to RelFileLocator's spcOid)
relfile (linked to RelFileLocator's relNumber)

Hmm. How about using the table OID as objoid,

The issue is that we don't have the relation OID when writing buffers out (that's
one of the reasons explained in [1]).

[1]: /messages/by-id/Zl2k8u4HDTUW6QlC@ip-10-97-1-34.eu-west-3.compute.internal

Regards,

Please find attached a mandatory rebase due to the recent changes around
statistics.

As mentioned up-thread:

The attached patch is not in a fully "polished" state yet: there are more places
where we should add relfilenode counters, and more APIs to create to retrieve the
relfilenode stats.

It is in a state that can be used to discuss the approach it is implementing (as
we have done so far) before moving forward.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v3-0001-Provide-relfilenode-statistics.patch (text/x-diff; charset=utf-8)
From 1df7f2eed01478cdbe36673ef18247452e579f3b Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Thu, 16 Nov 2023 02:30:01 +0000
Subject: [PATCH v3] Provide relfilenode statistics
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

We currently don't have write counters for relations.
The reason is that we don't have the relation OID when writing buffers out.
Tracking writes per relfilenode would allow us to track/consolidate writes per
relation.

relfilenode stats is also beneficial for the "Split index and table statistics
into different types of stats" work in progress: it would allow us to avoid
additional branches in some situations.

=== Remarks ===

This is a POC patch. There is still work to do: there are more places where we
should add relfilenode counters, and more APIs to create to retrieve the
relfilenode stats. The patch takes care of rewrites generated by TRUNCATE, but
there is more to care about, like CLUSTER and VACUUM FULL.

The new logic to retrieve stats in pg_statio_all_tables has been implemented
only for the new blocks_written stat (we'd need to do the same for the existing
buffer read / buffer hit if we agree on the approach implemented here).

The goal of this patch is to start the discussion and agree on the design before
moving forward.
---
 src/backend/access/rmgrdesc/xactdesc.c        |   5 +-
 src/backend/catalog/storage.c                 |   8 ++
 src/backend/catalog/system_functions.sql      |   2 +-
 src/backend/catalog/system_views.sql          |   5 +-
 src/backend/postmaster/checkpointer.c         |   5 +
 src/backend/storage/buffer/bufmgr.c           |   6 +-
 src/backend/storage/smgr/md.c                 |   7 ++
 src/backend/utils/activity/pgstat.c           |  39 ++++--
 src/backend/utils/activity/pgstat_database.c  |  12 +-
 src/backend/utils/activity/pgstat_function.c  |  13 +-
 src/backend/utils/activity/pgstat_relation.c  | 112 ++++++++++++++++--
 src/backend/utils/activity/pgstat_replslot.c  |  13 +-
 src/backend/utils/activity/pgstat_shmem.c     |  19 ++-
 .../utils/activity/pgstat_subscription.c      |  12 +-
 src/backend/utils/activity/pgstat_xact.c      |  60 +++++++---
 src/backend/utils/adt/pgstatfuncs.c           |  34 +++++-
 src/include/access/tableam.h                  |  19 +++
 src/include/access/xact.h                     |   1 +
 src/include/catalog/pg_proc.dat               |  14 ++-
 src/include/pgstat.h                          |  37 ++++--
 src/include/utils/pgstat_internal.h           |  34 ++++--
 src/test/recovery/t/029_stats_restart.pl      |  40 +++----
 .../recovery/t/030_stats_cleanup_replica.pl   |   6 +-
 src/test/regress/expected/rules.out           |  12 +-
 src/test/regress/expected/stats.out           |  30 ++---
 src/test/regress/sql/stats.sql                |  30 ++---
 src/test/subscription/t/026_stats.pl          |   4 +-
 src/tools/pgindent/typedefs.list              |   1 +
 28 files changed, 424 insertions(+), 156 deletions(-)
   4.4% src/backend/catalog/
  46.2% src/backend/utils/activity/
   6.3% src/backend/utils/adt/
   3.6% src/backend/
   3.2% src/include/access/
   3.2% src/include/catalog/
   6.0% src/include/utils/
   6.6% src/include/
  11.7% src/test/recovery/t/
   5.3% src/test/regress/expected/
   3.0% src/

diff --git a/src/backend/access/rmgrdesc/xactdesc.c b/src/backend/access/rmgrdesc/xactdesc.c
index dccca201e0..c02b079645 100644
--- a/src/backend/access/rmgrdesc/xactdesc.c
+++ b/src/backend/access/rmgrdesc/xactdesc.c
@@ -319,10 +319,11 @@ xact_desc_stats(StringInfo buf, const char *label,
 		appendStringInfo(buf, "; %sdropped stats:", label);
 		for (i = 0; i < ndropped; i++)
 		{
-			appendStringInfo(buf, " %d/%u/%u",
+			appendStringInfo(buf, " %d/%u/%u/%u",
 							 dropped_stats[i].kind,
 							 dropped_stats[i].dboid,
-							 dropped_stats[i].objoid);
+							 dropped_stats[i].objoid,
+							 dropped_stats[i].relfile);
 		}
 	}
 }
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index f56b3cc0f2..db6107cd90 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -33,6 +33,7 @@
 #include "storage/smgr.h"
 #include "utils/hsearch.h"
 #include "utils/memutils.h"
+#include "utils/pgstat_internal.h"
 #include "utils/rel.h"
 
 /* GUC variables */
@@ -152,6 +153,7 @@ RelationCreateStorage(RelFileLocator rlocator, char relpersistence,
 	if (needs_wal)
 		log_smgrcreate(&srel->smgr_rlocator.locator, MAIN_FORKNUM);
 
+	pgstat_create_transactional(PGSTAT_KIND_RELFILENODE, rlocator.dbOid, rlocator.spcOid, rlocator.relNumber);
 	/*
 	 * Add the relation to the list of stuff to delete at abort, if we are
 	 * asked to do so.
@@ -227,6 +229,8 @@ RelationDropStorage(Relation rel)
 	 * for now I'll keep the logic simple.
 	 */
 
+	pgstat_drop_transactional(PGSTAT_KIND_RELFILENODE, rel->rd_locator.dbOid, rel->rd_locator.spcOid,  rel->rd_locator.relNumber);
+
 	RelationCloseSmgr(rel);
 }
 
@@ -253,6 +257,9 @@ RelationPreserveStorage(RelFileLocator rlocator, bool atCommit)
 	PendingRelDelete *pending;
 	PendingRelDelete *prev;
 	PendingRelDelete *next;
+	PgStat_SubXactStatus *xact_state;
+
+	xact_state = pgStatXactStack;
 
 	prev = NULL;
 	for (pending = pendingDeletes; pending != NULL; pending = next)
@@ -267,6 +274,7 @@ RelationPreserveStorage(RelFileLocator rlocator, bool atCommit)
 			else
 				pendingDeletes = next;
 			pfree(pending);
+			PgStat_RemoveRelFileNodeFromDroppedStats(xact_state, rlocator);
 			/* prev does not change */
 		}
 		else
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index 623b9539b1..ec60ef72e3 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -684,7 +684,7 @@ REVOKE EXECUTE ON FUNCTION pg_stat_reset_single_function_counters(oid) FROM publ
 
 REVOKE EXECUTE ON FUNCTION pg_stat_reset_replication_slot(text) FROM public;
 
-REVOKE EXECUTE ON FUNCTION pg_stat_have_stats(text, oid, oid) FROM public;
+REVOKE EXECUTE ON FUNCTION pg_stat_have_stats(text, oid, oid, oid) FROM public;
 
 REVOKE EXECUTE ON FUNCTION pg_stat_reset_subscription_stats(oid) FROM public;
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 19cabc9a47..a7b13d877b 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -746,6 +746,7 @@ CREATE VIEW pg_statio_all_tables AS
             C.relname AS relname,
             pg_stat_get_blocks_fetched(C.oid) -
                     pg_stat_get_blocks_hit(C.oid) AS heap_blks_read,
+			pg_stat_get_blocks_written(C.oid) + pg_stat_get_relfilenode_blocks_written(d.oid, CASE WHEN C.reltablespace <> 0 THEN C.reltablespace ELSE d.dattablespace END, C.relfilenode) AS heap_blks_written,
             pg_stat_get_blocks_hit(C.oid) AS heap_blks_hit,
             I.idx_blks_read AS idx_blks_read,
             I.idx_blks_hit AS idx_blks_hit,
@@ -754,7 +755,7 @@ CREATE VIEW pg_statio_all_tables AS
             pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
             X.idx_blks_read AS tidx_blks_read,
             X.idx_blks_hit AS tidx_blks_hit
-    FROM pg_class C LEFT JOIN
+    FROM pg_database d, pg_class C LEFT JOIN
             pg_class T ON C.reltoastrelid = T.oid
             LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
             LEFT JOIN LATERAL (
@@ -771,7 +772,7 @@ CREATE VIEW pg_statio_all_tables AS
                      sum(pg_stat_get_blocks_hit(indexrelid))::bigint
                      AS idx_blks_hit
               FROM pg_index WHERE indrelid = T.oid ) X ON true
-    WHERE C.relkind IN ('r', 't', 'm');
+    WHERE C.relkind IN ('r', 't', 'm') AND d.datname = current_database();
 
 CREATE VIEW pg_statio_sys_tables AS
     SELECT * FROM pg_statio_all_tables
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 199f008bcd..7f13a840c4 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -519,6 +519,11 @@ CheckpointerMain(char *startup_data, size_t startup_data_len)
 		/* Report pending statistics to the cumulative stats system */
 		pgstat_report_checkpointer();
 		pgstat_report_wal(true);
+		/*
+		 *  No need to check for transaction state in checkpointer before
+		 *  calling pgstat_report_stat().
+		 */
+		pgstat_report_stat(true);
 
 		/*
 		 * If any checkpoint flags have been set, redo the loop to handle the
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 4415ba648a..1ac73f672b 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1186,9 +1186,9 @@ PinBufferForBlock(Relation rel,
 		 * WaitReadBuffers() (so, not for hits, and not for buffers that are
 		 * zeroed instead), the per-relation stats always count them.
 		 */
-		pgstat_count_buffer_read(rel);
+		pgstat_report_relfilenode_buffer_read(rel);
 		if (*foundPtr)
-			pgstat_count_buffer_hit(rel);
+			pgstat_report_relfilenode_buffer_hit(rel);
 	}
 	if (*foundPtr)
 	{
@@ -3907,6 +3907,8 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln, IOObject io_object,
 
 	pgBufferUsage.shared_blks_written++;
 
+	pgstat_report_relfilenode_blks_written(reln->smgr_rlocator.locator);
+
 	/*
 	 * Mark the buffer as clean (unless BM_JUST_DIRTIED has become set) and
 	 * end the BM_IO_IN_PROGRESS state.
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 6796756358..5bc5fc65cd 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -1447,12 +1447,16 @@ DropRelationFiles(RelFileLocator *delrels, int ndelrels, bool isRedo)
 {
 	SMgrRelation *srels;
 	int			i;
+	int         not_freed_count = 0;
 
 	srels = palloc(sizeof(SMgrRelation) * ndelrels);
 	for (i = 0; i < ndelrels; i++)
 	{
 		SMgrRelation srel = smgropen(delrels[i], INVALID_PROC_NUMBER);
 
+		if (!pgstat_drop_entry(PGSTAT_KIND_RELFILENODE, delrels[i].dbOid, delrels[i].spcOid, delrels[i].relNumber))
+			not_freed_count++;
+
 		if (isRedo)
 		{
 			ForkNumber	fork;
@@ -1463,6 +1467,9 @@ DropRelationFiles(RelFileLocator *delrels, int ndelrels, bool isRedo)
 		srels[i] = srel;
 	}
 
+	if (not_freed_count > 0)
+		pgstat_request_entry_refs_gc();
+
 	smgrdounlinkall(srels, ndelrels, isRedo);
 
 	for (i = 0; i < ndelrels; i++)
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index b2ca3f39b7..035fdb2aa1 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -308,6 +308,19 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.delete_pending_cb = pgstat_relation_delete_pending_cb,
 	},
 
+	[PGSTAT_KIND_RELFILENODE] = {
+		.name = "relfilenode",
+
+		.fixed_amount = false,
+
+		.shared_size = sizeof(PgStatShared_RelFileNode),
+		.shared_data_off = offsetof(PgStatShared_RelFileNode, stats),
+		.shared_data_len = sizeof(((PgStatShared_RelFileNode *) 0)->stats),
+		.pending_size = sizeof(PgStat_StatRelFileNodeEntry),
+
+		.flush_pending_cb = pgstat_relfilenode_flush_cb,
+	},
+
 	[PGSTAT_KIND_FUNCTION] = {
 		.name = "function",
 
@@ -717,7 +730,7 @@ pgstat_report_stat(bool force)
 
 	partial_flush = false;
 
-	/* flush database / relation / function / ... stats */
+	/* flush database / relation / function / relfilenode / ... stats */
 	partial_flush |= pgstat_flush_pending_entries(nowait);
 
 	/* flush IO stats */
@@ -797,7 +810,7 @@ pgstat_reset_counters(void)
  * GRANT system.
  */
 void
-pgstat_reset(PgStat_Kind kind, Oid dboid, Oid objoid)
+pgstat_reset(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile)
 {
 	const PgStat_KindInfo *kind_info = pgstat_get_kind_info(kind);
 	TimestampTz ts = GetCurrentTimestamp();
@@ -806,7 +819,7 @@ pgstat_reset(PgStat_Kind kind, Oid dboid, Oid objoid)
 	Assert(!pgstat_get_kind_info(kind)->fixed_amount);
 
 	/* reset the "single counter" */
-	pgstat_reset_entry(kind, dboid, objoid, ts);
+	pgstat_reset_entry(kind, dboid, objoid, relfile, ts);
 
 	if (!kind_info->accessed_across_databases)
 		pgstat_reset_database_timestamp(dboid, ts);
@@ -877,7 +890,7 @@ pgstat_clear_snapshot(void)
 }
 
 void *
-pgstat_fetch_entry(PgStat_Kind kind, Oid dboid, Oid objoid)
+pgstat_fetch_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile)
 {
 	PgStat_HashKey key;
 	PgStat_EntryRef *entry_ref;
@@ -893,6 +906,7 @@ pgstat_fetch_entry(PgStat_Kind kind, Oid dboid, Oid objoid)
 	key.kind = kind;
 	key.dboid = dboid;
 	key.objoid = objoid;
+	key.relfile = relfile;
 
 	/* if we need to build a full snapshot, do so */
 	if (pgstat_fetch_consistency == PGSTAT_FETCH_CONSISTENCY_SNAPSHOT)
@@ -918,7 +932,7 @@ pgstat_fetch_entry(PgStat_Kind kind, Oid dboid, Oid objoid)
 
 	pgStatLocal.snapshot.mode = pgstat_fetch_consistency;
 
-	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid, false, NULL);
+	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid, relfile, false, NULL);
 
 	if (entry_ref == NULL || entry_ref->shared_entry->dropped)
 	{
@@ -987,13 +1001,13 @@ pgstat_get_stat_snapshot_timestamp(bool *have_snapshot)
 }
 
 bool
-pgstat_have_entry(PgStat_Kind kind, Oid dboid, Oid objoid)
+pgstat_have_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile)
 {
 	/* fixed-numbered stats always exist */
 	if (pgstat_get_kind_info(kind)->fixed_amount)
 		return true;
 
-	return pgstat_get_entry_ref(kind, dboid, objoid, false, NULL) != NULL;
+	return pgstat_get_entry_ref(kind, dboid, objoid, relfile, false, NULL) != NULL;
 }
 
 /*
@@ -1208,7 +1222,8 @@ pgstat_build_snapshot_fixed(PgStat_Kind kind)
  * created, false otherwise.
  */
 PgStat_EntryRef *
-pgstat_prep_pending_entry(PgStat_Kind kind, Oid dboid, Oid objoid, bool *created_entry)
+pgstat_prep_pending_entry(PgStat_Kind kind, Oid dboid, Oid objoid,
+						  RelFileNumber relfile, bool *created_entry)
 {
 	PgStat_EntryRef *entry_ref;
 
@@ -1223,7 +1238,7 @@ pgstat_prep_pending_entry(PgStat_Kind kind, Oid dboid, Oid objoid, bool *created
 								  ALLOCSET_SMALL_SIZES);
 	}
 
-	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid,
+	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid, relfile,
 									 true, created_entry);
 
 	if (entry_ref->pending == NULL)
@@ -1246,11 +1261,11 @@ pgstat_prep_pending_entry(PgStat_Kind kind, Oid dboid, Oid objoid, bool *created
  * that it shouldn't be needed.
  */
 PgStat_EntryRef *
-pgstat_fetch_pending_entry(PgStat_Kind kind, Oid dboid, Oid objoid)
+pgstat_fetch_pending_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile)
 {
 	PgStat_EntryRef *entry_ref;
 
-	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid, false, NULL);
+	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid, relfile, false, NULL);
 
 	if (entry_ref == NULL || entry_ref->pending == NULL)
 		return NULL;
@@ -1279,7 +1294,7 @@ pgstat_delete_pending_entry(PgStat_EntryRef *entry_ref)
 }
 
 /*
- * Flush out pending stats for database objects (databases, relations,
+ * Flush out pending stats for database objects (databases, relations, relfilenodes,
  * functions).
  */
 static bool
diff --git a/src/backend/utils/activity/pgstat_database.c b/src/backend/utils/activity/pgstat_database.c
index 29bc090974..cf77f2dbdb 100644
--- a/src/backend/utils/activity/pgstat_database.c
+++ b/src/backend/utils/activity/pgstat_database.c
@@ -43,7 +43,7 @@ static PgStat_Counter pgLastSessionReportTime = 0;
 void
 pgstat_drop_database(Oid databaseid)
 {
-	pgstat_drop_transactional(PGSTAT_KIND_DATABASE, databaseid, InvalidOid);
+	pgstat_drop_transactional(PGSTAT_KIND_DATABASE, databaseid, InvalidOid, InvalidOid);
 }
 
 /*
@@ -66,7 +66,7 @@ pgstat_report_autovac(Oid dboid)
 	 * operation so it doesn't matter if we get blocked here a little.
 	 */
 	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_DATABASE,
-											dboid, InvalidOid, false);
+											dboid, InvalidOid, InvalidOid, false);
 
 	dbentry = (PgStatShared_Database *) entry_ref->shared_stats;
 	dbentry->stats.last_autovac_time = GetCurrentTimestamp();
@@ -150,7 +150,7 @@ pgstat_report_checksum_failures_in_db(Oid dboid, int failurecount)
 	 * common enough for that to be a problem.
 	 */
 	entry_ref =
-		pgstat_get_entry_ref_locked(PGSTAT_KIND_DATABASE, dboid, InvalidOid, false);
+		pgstat_get_entry_ref_locked(PGSTAT_KIND_DATABASE, dboid, InvalidOid, InvalidOid, false);
 
 	sharedent = (PgStatShared_Database *) entry_ref->shared_stats;
 	sharedent->stats.checksum_failures += failurecount;
@@ -242,7 +242,7 @@ PgStat_StatDBEntry *
 pgstat_fetch_stat_dbentry(Oid dboid)
 {
 	return (PgStat_StatDBEntry *)
-		pgstat_fetch_entry(PGSTAT_KIND_DATABASE, dboid, InvalidOid);
+		pgstat_fetch_entry(PGSTAT_KIND_DATABASE, dboid, InvalidOid, InvalidOid);
 }
 
 void
@@ -341,7 +341,7 @@ pgstat_prep_database_pending(Oid dboid)
 	Assert(!OidIsValid(dboid) || OidIsValid(MyDatabaseId));
 
 	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_DATABASE, dboid, InvalidOid,
-										  NULL);
+										  InvalidOid, NULL);
 
 	return entry_ref->pending;
 }
@@ -357,7 +357,7 @@ pgstat_reset_database_timestamp(Oid dboid, TimestampTz ts)
 	PgStatShared_Database *dbentry;
 
 	dbref = pgstat_get_entry_ref_locked(PGSTAT_KIND_DATABASE, MyDatabaseId, InvalidOid,
-										false);
+										InvalidOid, false);
 
 	dbentry = (PgStatShared_Database *) dbref->shared_stats;
 	dbentry->stats.stat_reset_timestamp = ts;
diff --git a/src/backend/utils/activity/pgstat_function.c b/src/backend/utils/activity/pgstat_function.c
index d26da551a4..440e44e300 100644
--- a/src/backend/utils/activity/pgstat_function.c
+++ b/src/backend/utils/activity/pgstat_function.c
@@ -46,7 +46,8 @@ pgstat_create_function(Oid proid)
 {
 	pgstat_create_transactional(PGSTAT_KIND_FUNCTION,
 								MyDatabaseId,
-								proid);
+								proid,
+								InvalidOid);
 }
 
 /*
@@ -61,7 +62,8 @@ pgstat_drop_function(Oid proid)
 {
 	pgstat_drop_transactional(PGSTAT_KIND_FUNCTION,
 							  MyDatabaseId,
-							  proid);
+							  proid,
+							  InvalidOid);
 }
 
 /*
@@ -86,6 +88,7 @@ pgstat_init_function_usage(FunctionCallInfo fcinfo,
 	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_FUNCTION,
 										  MyDatabaseId,
 										  fcinfo->flinfo->fn_oid,
+										  InvalidOid,
 										  &created_entry);
 
 	/*
@@ -113,7 +116,7 @@ pgstat_init_function_usage(FunctionCallInfo fcinfo,
 		if (!SearchSysCacheExists1(PROCOID, ObjectIdGetDatum(fcinfo->flinfo->fn_oid)))
 		{
 			pgstat_drop_entry(PGSTAT_KIND_FUNCTION, MyDatabaseId,
-							  fcinfo->flinfo->fn_oid);
+							  fcinfo->flinfo->fn_oid, InvalidOid);
 			ereport(ERROR, errcode(ERRCODE_UNDEFINED_FUNCTION),
 					errmsg("function call to dropped function"));
 		}
@@ -224,7 +227,7 @@ find_funcstat_entry(Oid func_id)
 {
 	PgStat_EntryRef *entry_ref;
 
-	entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_FUNCTION, MyDatabaseId, func_id);
+	entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_FUNCTION, MyDatabaseId, func_id, InvalidOid);
 
 	if (entry_ref)
 		return entry_ref->pending;
@@ -239,5 +242,5 @@ PgStat_StatFuncEntry *
 pgstat_fetch_stat_funcentry(Oid func_id)
 {
 	return (PgStat_StatFuncEntry *)
-		pgstat_fetch_entry(PGSTAT_KIND_FUNCTION, MyDatabaseId, func_id);
+		pgstat_fetch_entry(PGSTAT_KIND_FUNCTION, MyDatabaseId, func_id, InvalidOid);
 }
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index 8a3f7d434c..136dd6c85b 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -44,6 +44,7 @@ typedef struct TwoPhasePgStatRecord
 
 
 static PgStat_TableStatus *pgstat_prep_relation_pending(Oid rel_id, bool isshared);
+PgStat_StatRelFileNodeEntry *pgstat_prep_relfilenode_pending(RelFileLocator locator);
 static void add_tabstat_xact_level(PgStat_TableStatus *pgstat_info, int nest_level);
 static void ensure_tabstat_xact_level(PgStat_TableStatus *pgstat_info);
 static void save_truncdrop_counters(PgStat_TableXactStatus *trans, bool is_drop);
@@ -69,6 +70,7 @@ pgstat_copy_relation_stats(Relation dst, Relation src)
 	dst_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_RELATION,
 										  dst->rd_rel->relisshared ? InvalidOid : MyDatabaseId,
 										  RelationGetRelid(dst),
+										  InvalidOid,
 										  false);
 
 	dstshstats = (PgStatShared_Relation *) dst_ref->shared_stats;
@@ -170,7 +172,7 @@ pgstat_create_relation(Relation rel)
 {
 	pgstat_create_transactional(PGSTAT_KIND_RELATION,
 								rel->rd_rel->relisshared ? InvalidOid : MyDatabaseId,
-								RelationGetRelid(rel));
+								RelationGetRelid(rel), InvalidOid);
 }
 
 /*
@@ -184,7 +186,7 @@ pgstat_drop_relation(Relation rel)
 
 	pgstat_drop_transactional(PGSTAT_KIND_RELATION,
 							  rel->rd_rel->relisshared ? InvalidOid : MyDatabaseId,
-							  RelationGetRelid(rel));
+							  RelationGetRelid(rel), InvalidOid);
 
 	if (!pgstat_should_count_relation(rel))
 		return;
@@ -225,7 +227,7 @@ pgstat_report_vacuum(Oid tableoid, bool shared,
 
 	/* block acquiring lock for the same reason as pgstat_report_autovac() */
 	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_RELATION,
-											dboid, tableoid, false);
+											dboid, tableoid, InvalidOid, false);
 
 	shtabentry = (PgStatShared_Relation *) entry_ref->shared_stats;
 	tabentry = &shtabentry->stats;
@@ -318,6 +320,7 @@ pgstat_report_analyze(Relation rel,
 	/* block acquiring lock for the same reason as pgstat_report_autovac() */
 	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_RELATION, dboid,
 											RelationGetRelid(rel),
+											InvalidOid,
 											false);
 	/* can't get dropped while accessed */
 	Assert(entry_ref != NULL && entry_ref->shared_stats != NULL);
@@ -458,6 +461,19 @@ pgstat_fetch_stat_tabentry(Oid relid)
 	return pgstat_fetch_stat_tabentry_ext(IsSharedRelation(relid), relid);
 }
 
+/*
+ * Support function for the SQL-callable pgstat* functions. Returns
+ * the collected statistics for one relfilenode or NULL. NULL doesn't mean
+ * that the relfilenode doesn't exist, just that there are no statistics, so the
+ * caller is better off to report ZERO instead.
+ */
+PgStat_StatRelFileNodeEntry *
+pgstat_fetch_stat_relfilenodeentry(Oid dboid, Oid spcOid, RelFileNumber relfile)
+{
+	return (PgStat_StatRelFileNodeEntry *)
+		pgstat_fetch_entry(PGSTAT_KIND_RELFILENODE, dboid, spcOid, relfile);
+}
+
 /*
  * More efficient version of pgstat_fetch_stat_tabentry(), allowing to specify
  * whether the to-be-accessed table is a shared relation or not.
@@ -468,7 +484,7 @@ pgstat_fetch_stat_tabentry_ext(bool shared, Oid reloid)
 	Oid			dboid = (shared ? InvalidOid : MyDatabaseId);
 
 	return (PgStat_StatTabEntry *)
-		pgstat_fetch_entry(PGSTAT_KIND_RELATION, dboid, reloid);
+		pgstat_fetch_entry(PGSTAT_KIND_RELATION, dboid, reloid, InvalidOid);
 }
 
 /*
@@ -491,10 +507,10 @@ find_tabstat_entry(Oid rel_id)
 	PgStat_TableStatus *tabentry = NULL;
 	PgStat_TableStatus *tablestatus = NULL;
 
-	entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, MyDatabaseId, rel_id);
+	entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, MyDatabaseId, rel_id, InvalidOid);
 	if (!entry_ref)
 	{
-		entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, InvalidOid, rel_id);
+		entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, InvalidOid, rel_id, InvalidOid);
 		if (!entry_ref)
 			return tablestatus;
 	}
@@ -881,6 +897,38 @@ pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
 	return true;
 }
 
+/*
+ * Flush out pending stats for the relfilenode entry
+ *
+ * If nowait is true, this function returns false if the lock could not be
+ * immediately acquired, otherwise true is returned.
+ */
+bool
+pgstat_relfilenode_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
+{
+	PgStatShared_RelFileNode *sharedent;
+	PgStat_StatRelFileNodeEntry *pendingent;
+
+	pendingent = (PgStat_StatRelFileNodeEntry *) entry_ref->pending;
+	sharedent = (PgStatShared_RelFileNode *) entry_ref->shared_stats;
+
+	if (!pgstat_lock_entry(entry_ref, nowait))
+		return false;
+
+#define PGSTAT_ACCUM_RELFILENODECOUNT(item)      \
+		(sharedent)->stats.item += (pendingent)->item
+
+	PGSTAT_ACCUM_RELFILENODECOUNT(blocks_fetched);
+	PGSTAT_ACCUM_RELFILENODECOUNT(blocks_hit);
+	PGSTAT_ACCUM_RELFILENODECOUNT(blocks_written);
+
+	pgstat_unlock_entry(entry_ref);
+
+	memset(pendingent, 0, sizeof(*pendingent));
+
+	return true;
+}
+
 void
 pgstat_relation_delete_pending_cb(PgStat_EntryRef *entry_ref)
 {
@@ -902,7 +950,7 @@ pgstat_prep_relation_pending(Oid rel_id, bool isshared)
 
 	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_RELATION,
 										  isshared ? InvalidOid : MyDatabaseId,
-										  rel_id, NULL);
+										  rel_id, InvalidOid, NULL);
 	pending = entry_ref->pending;
 	pending->id = rel_id;
 	pending->shared = isshared;
@@ -910,6 +958,56 @@ pgstat_prep_relation_pending(Oid rel_id, bool isshared)
 	return pending;
 }
 
+PgStat_StatRelFileNodeEntry *
+pgstat_prep_relfilenode_pending(RelFileLocator locator)
+{
+	PgStat_EntryRef *entry_ref;
+
+	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_RELFILENODE, locator.dbOid,
+										  locator.spcOid, locator.relNumber, NULL);
+
+	return entry_ref->pending;
+}
+
+void
+pgstat_report_relfilenode_blks_written(RelFileLocator locator)
+{
+	PgStat_StatRelFileNodeEntry *relfileentry = NULL;
+
+	relfileentry = pgstat_prep_relfilenode_pending(locator);
+
+	if (relfileentry)
+		relfileentry->blocks_written++;
+}
+
+void
+pgstat_report_relfilenode_buffer_read(Relation reln)
+{
+	PgStat_StatRelFileNodeEntry *relfileentry = NULL;
+
+	/* For relation stats to survive after a rewrite */
+	pgstat_count_buffer_read(reln);
+
+	relfileentry = pgstat_prep_relfilenode_pending(reln->rd_locator);
+
+	if (relfileentry)
+		relfileentry->blocks_fetched++;
+}
+
+void
+pgstat_report_relfilenode_buffer_hit(Relation reln)
+{
+	PgStat_StatRelFileNodeEntry *relfileentry = NULL;
+
+	/* For relation stats to survive after a rewrite */
+	pgstat_count_buffer_hit(reln);
+
+	relfileentry = pgstat_prep_relfilenode_pending(reln->rd_locator);
+
+	if (relfileentry)
+		relfileentry->blocks_hit++;
+}
+
 /*
  * add a new (sub)transaction state record
  */
diff --git a/src/backend/utils/activity/pgstat_replslot.c b/src/backend/utils/activity/pgstat_replslot.c
index da11b86744..2e68ed4a09 100644
--- a/src/backend/utils/activity/pgstat_replslot.c
+++ b/src/backend/utils/activity/pgstat_replslot.c
@@ -62,7 +62,7 @@ pgstat_reset_replslot(const char *name)
 	 */
 	if (SlotIsLogical(slot))
 		pgstat_reset(PGSTAT_KIND_REPLSLOT, InvalidOid,
-					 ReplicationSlotIndex(slot));
+					 ReplicationSlotIndex(slot), InvalidOid);
 
 	LWLockRelease(ReplicationSlotControlLock);
 }
@@ -82,7 +82,7 @@ pgstat_report_replslot(ReplicationSlot *slot, const PgStat_StatReplSlotEntry *re
 	PgStat_StatReplSlotEntry *statent;
 
 	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_REPLSLOT, InvalidOid,
-											ReplicationSlotIndex(slot), false);
+											ReplicationSlotIndex(slot), InvalidOid, false);
 	shstatent = (PgStatShared_ReplSlot *) entry_ref->shared_stats;
 	statent = &shstatent->stats;
 
@@ -116,7 +116,7 @@ pgstat_create_replslot(ReplicationSlot *slot)
 	Assert(LWLockHeldByMeInMode(ReplicationSlotAllocationLock, LW_EXCLUSIVE));
 
 	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_REPLSLOT, InvalidOid,
-											ReplicationSlotIndex(slot), false);
+											ReplicationSlotIndex(slot), InvalidOid, false);
 	shstatent = (PgStatShared_ReplSlot *) entry_ref->shared_stats;
 
 	/*
@@ -146,7 +146,7 @@ void
 pgstat_acquire_replslot(ReplicationSlot *slot)
 {
 	pgstat_get_entry_ref(PGSTAT_KIND_REPLSLOT, InvalidOid,
-						 ReplicationSlotIndex(slot), true, NULL);
+						 ReplicationSlotIndex(slot), InvalidOid, true, NULL);
 }
 
 /*
@@ -158,7 +158,7 @@ pgstat_drop_replslot(ReplicationSlot *slot)
 	Assert(LWLockHeldByMeInMode(ReplicationSlotAllocationLock, LW_EXCLUSIVE));
 
 	if (!pgstat_drop_entry(PGSTAT_KIND_REPLSLOT, InvalidOid,
-						   ReplicationSlotIndex(slot)))
+						   ReplicationSlotIndex(slot), InvalidOid))
 		pgstat_request_entry_refs_gc();
 }
 
@@ -178,7 +178,7 @@ pgstat_fetch_replslot(NameData slotname)
 
 	if (idx != -1)
 		slotentry = (PgStat_StatReplSlotEntry *) pgstat_fetch_entry(PGSTAT_KIND_REPLSLOT,
-																	InvalidOid, idx);
+																	InvalidOid, idx, InvalidOid);
 
 	LWLockRelease(ReplicationSlotControlLock);
 
@@ -210,6 +210,7 @@ pgstat_replslot_from_serialized_name_cb(const NameData *name, PgStat_HashKey *ke
 	key->kind = PGSTAT_KIND_REPLSLOT;
 	key->dboid = InvalidOid;
 	key->objoid = idx;
+	key->relfile = InvalidOid;
 
 	return true;
 }
diff --git a/src/backend/utils/activity/pgstat_shmem.c b/src/backend/utils/activity/pgstat_shmem.c
index fd09b9d988..4890f8b807 100644
--- a/src/backend/utils/activity/pgstat_shmem.c
+++ b/src/backend/utils/activity/pgstat_shmem.c
@@ -429,10 +429,10 @@ pgstat_get_entry_ref_cached(PgStat_HashKey key, PgStat_EntryRef **entry_ref_p)
  * if the entry is newly created, false otherwise.
  */
 PgStat_EntryRef *
-pgstat_get_entry_ref(PgStat_Kind kind, Oid dboid, Oid objoid, bool create,
-					 bool *created_entry)
+pgstat_get_entry_ref(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile,
+					 bool create, bool *created_entry)
 {
-	PgStat_HashKey key = {.kind = kind,.dboid = dboid,.objoid = objoid};
+	PgStat_HashKey key = {.kind = kind,.dboid = dboid,.objoid = objoid,.relfile = relfile};
 	PgStatShared_HashEntry *shhashent;
 	PgStatShared_Common *shheader = NULL;
 	PgStat_EntryRef *entry_ref;
@@ -645,12 +645,12 @@ pgstat_unlock_entry(PgStat_EntryRef *entry_ref)
  */
 PgStat_EntryRef *
 pgstat_get_entry_ref_locked(PgStat_Kind kind, Oid dboid, Oid objoid,
-							bool nowait)
+							RelFileNumber relfile, bool nowait)
 {
 	PgStat_EntryRef *entry_ref;
 
 	/* find shared table stats entry corresponding to the local entry */
-	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid, true, NULL);
+	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid, relfile, true, NULL);
 
 	/* lock the shared entry to protect the content, skip if failed */
 	if (!pgstat_lock_entry(entry_ref, nowait))
@@ -905,9 +905,9 @@ pgstat_drop_database_and_contents(Oid dboid)
  * pgstat_gc_entry_refs().
  */
 bool
-pgstat_drop_entry(PgStat_Kind kind, Oid dboid, Oid objoid)
+pgstat_drop_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile)
 {
-	PgStat_HashKey key = {.kind = kind,.dboid = dboid,.objoid = objoid};
+	PgStat_HashKey key = {.kind = kind,.dboid = dboid,.objoid = objoid,.relfile = relfile};
 	PgStatShared_HashEntry *shent;
 	bool		freed = true;
 
@@ -980,13 +980,12 @@ shared_stat_reset_contents(PgStat_Kind kind, PgStatShared_Common *header,
  * Reset one variable-numbered stats entry.
  */
 void
-pgstat_reset_entry(PgStat_Kind kind, Oid dboid, Oid objoid, TimestampTz ts)
+pgstat_reset_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile, TimestampTz ts)
 {
 	PgStat_EntryRef *entry_ref;
 
 	Assert(!pgstat_get_kind_info(kind)->fixed_amount);
-
-	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid, false, NULL);
+	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid, relfile, false, NULL);
 	if (!entry_ref || entry_ref->shared_entry->dropped)
 		return;
 
diff --git a/src/backend/utils/activity/pgstat_subscription.c b/src/backend/utils/activity/pgstat_subscription.c
index d9af8de658..9b9ab2861b 100644
--- a/src/backend/utils/activity/pgstat_subscription.c
+++ b/src/backend/utils/activity/pgstat_subscription.c
@@ -30,7 +30,7 @@ pgstat_report_subscription_error(Oid subid, bool is_apply_error)
 	PgStat_BackendSubEntry *pending;
 
 	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_SUBSCRIPTION,
-										  InvalidOid, subid, NULL);
+										  InvalidOid, subid, InvalidOid, NULL);
 	pending = entry_ref->pending;
 
 	if (is_apply_error)
@@ -47,12 +47,12 @@ pgstat_create_subscription(Oid subid)
 {
 	/* Ensures that stats are dropped if transaction rolls back */
 	pgstat_create_transactional(PGSTAT_KIND_SUBSCRIPTION,
-								InvalidOid, subid);
+								InvalidOid, subid, InvalidOid);
 
 	/* Create and initialize the subscription stats entry */
-	pgstat_get_entry_ref(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid,
+	pgstat_get_entry_ref(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid, InvalidOid,
 						 true, NULL);
-	pgstat_reset_entry(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid, 0);
+	pgstat_reset_entry(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid, InvalidOid, 0);
 }
 
 /*
@@ -64,7 +64,7 @@ void
 pgstat_drop_subscription(Oid subid)
 {
 	pgstat_drop_transactional(PGSTAT_KIND_SUBSCRIPTION,
-							  InvalidOid, subid);
+							  InvalidOid, subid, InvalidOid);
 }
 
 /*
@@ -75,7 +75,7 @@ PgStat_StatSubEntry *
 pgstat_fetch_stat_subscription(Oid subid)
 {
 	return (PgStat_StatSubEntry *)
-		pgstat_fetch_entry(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid);
+		pgstat_fetch_entry(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid, InvalidOid);
 }
 
 /*
diff --git a/src/backend/utils/activity/pgstat_xact.c b/src/backend/utils/activity/pgstat_xact.c
index 1877d22f14..b25df5112b 100644
--- a/src/backend/utils/activity/pgstat_xact.c
+++ b/src/backend/utils/activity/pgstat_xact.c
@@ -30,7 +30,7 @@ static void AtEOXact_PgStat_DroppedStats(PgStat_SubXactStatus *xact_state, bool
 static void AtEOSubXact_PgStat_DroppedStats(PgStat_SubXactStatus *xact_state,
 											bool isCommit, int nestDepth);
 
-static PgStat_SubXactStatus *pgStatXactStack = NULL;
+PgStat_SubXactStatus *pgStatXactStack = NULL;
 
 
 /*
@@ -84,7 +84,7 @@ AtEOXact_PgStat_DroppedStats(PgStat_SubXactStatus *xact_state, bool isCommit)
 			 * Transaction that dropped an object committed. Drop the stats
 			 * too.
 			 */
-			if (!pgstat_drop_entry(it->kind, it->dboid, it->objoid))
+			if (!pgstat_drop_entry(it->kind, it->dboid, it->objoid, it->relfile))
 				not_freed_count++;
 		}
 		else if (!isCommit && pending->is_create)
@@ -93,7 +93,7 @@ AtEOXact_PgStat_DroppedStats(PgStat_SubXactStatus *xact_state, bool isCommit)
 			 * Transaction that created an object aborted. Drop the stats
 			 * associated with the object.
 			 */
-			if (!pgstat_drop_entry(it->kind, it->dboid, it->objoid))
+			if (!pgstat_drop_entry(it->kind, it->dboid, it->objoid, it->relfile))
 				not_freed_count++;
 		}
 
@@ -105,6 +105,33 @@ AtEOXact_PgStat_DroppedStats(PgStat_SubXactStatus *xact_state, bool isCommit)
 		pgstat_request_entry_refs_gc();
 }
 
+/*
+ * Remove a relfilenode stat from the list of stats to be dropped.
+ */
+void
+PgStat_RemoveRelFileNodeFromDroppedStats(PgStat_SubXactStatus *xact_state, RelFileLocator rlocator)
+{
+	dlist_mutable_iter iter;
+
+	if (dclist_count(&xact_state->pending_drops) == 0)
+		return;
+
+	dclist_foreach_modify(iter, &xact_state->pending_drops)
+	{
+		PgStat_PendingDroppedStatsItem *pending =
+			dclist_container(PgStat_PendingDroppedStatsItem, node, iter.cur);
+		xl_xact_stats_item *it = &pending->item;
+
+		if (it->kind == PGSTAT_KIND_RELFILENODE && it->dboid == rlocator.dbOid
+			&& it->objoid == rlocator.spcOid && it->relfile == rlocator.relNumber)
+		{
+			dclist_delete_from(&xact_state->pending_drops, &pending->node);
+			pfree(pending);
+			return;
+		}
+	}
+}
+
 /*
  * Called from access/transam/xact.c at subtransaction commit/abort.
  */
@@ -158,7 +185,7 @@ AtEOSubXact_PgStat_DroppedStats(PgStat_SubXactStatus *xact_state,
 			 * Subtransaction creating a new stats object aborted. Drop the
 			 * stats object.
 			 */
-			if (!pgstat_drop_entry(it->kind, it->dboid, it->objoid))
+			if (!pgstat_drop_entry(it->kind, it->dboid, it->objoid, it->relfile))
 				not_freed_count++;
 			pfree(pending);
 		}
@@ -320,7 +347,11 @@ pgstat_execute_transactional_drops(int ndrops, struct xl_xact_stats_item *items,
 	{
 		xl_xact_stats_item *it = &items[i];
 
-		if (!pgstat_drop_entry(it->kind, it->dboid, it->objoid))
+		/* leave it to pgstat_drop_transactional() in RelationDropStorage() */
+		if (it->kind == PGSTAT_KIND_RELFILENODE)
+			continue;
+
+		if (!pgstat_drop_entry(it->kind, it->dboid, it->objoid, it->relfile))
 			not_freed_count++;
 	}
 
@@ -329,7 +360,7 @@ pgstat_execute_transactional_drops(int ndrops, struct xl_xact_stats_item *items,
 }
 
 static void
-create_drop_transactional_internal(PgStat_Kind kind, Oid dboid, Oid objoid, bool is_create)
+create_drop_transactional_internal(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile, bool is_create)
 {
 	int			nest_level = GetCurrentTransactionNestLevel();
 	PgStat_SubXactStatus *xact_state;
@@ -342,6 +373,7 @@ create_drop_transactional_internal(PgStat_Kind kind, Oid dboid, Oid objoid, bool
 	drop->item.kind = kind;
 	drop->item.dboid = dboid;
 	drop->item.objoid = objoid;
+	drop->item.relfile = relfile;
 
 	dclist_push_tail(&xact_state->pending_drops, &drop->node);
 }
@@ -354,18 +386,18 @@ create_drop_transactional_internal(PgStat_Kind kind, Oid dboid, Oid objoid, bool
  * dropped.
  */
 void
-pgstat_create_transactional(PgStat_Kind kind, Oid dboid, Oid objoid)
+pgstat_create_transactional(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile)
 {
-	if (pgstat_get_entry_ref(kind, dboid, objoid, false, NULL))
+	if (pgstat_get_entry_ref(kind, dboid, objoid, relfile, false, NULL))
 	{
 		ereport(WARNING,
-				errmsg("resetting existing statistics for kind %s, db=%u, oid=%u",
-					   (pgstat_get_kind_info(kind))->name, dboid, objoid));
+				errmsg("resetting existing statistics for kind %s, db=%u, oid=%u, relfile=%u",
+					   (pgstat_get_kind_info(kind))->name, dboid, objoid, relfile));
 
-		pgstat_reset(kind, dboid, objoid);
+		pgstat_reset(kind, dboid, objoid, relfile);
 	}
 
-	create_drop_transactional_internal(kind, dboid, objoid, /* create */ true);
+	create_drop_transactional_internal(kind, dboid, objoid, relfile, /* create */ true);
 }
 
 /*
@@ -376,7 +408,7 @@ pgstat_create_transactional(PgStat_Kind kind, Oid dboid, Oid objoid)
  * alive.
  */
 void
-pgstat_drop_transactional(PgStat_Kind kind, Oid dboid, Oid objoid)
+pgstat_drop_transactional(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile)
 {
-	create_drop_transactional_internal(kind, dboid, objoid, /* create */ false);
+	create_drop_transactional_internal(kind, dboid, objoid, relfile, /* create */ false);
 }
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 3221137123..c1d62873a3 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -106,6 +106,30 @@ PG_STAT_GET_RELENTRY_INT64(tuples_updated)
 /* pg_stat_get_vacuum_count */
 PG_STAT_GET_RELENTRY_INT64(vacuum_count)
 
+#define PG_STAT_GET_RELFILEENTRY_INT64(stat)						\
+Datum															\
+CppConcat(pg_stat_get_relfilenode_,stat)(PG_FUNCTION_ARGS)					\
+{																\
+	Oid			dboid = PG_GETARG_OID(0);						\
+	Oid			spcOid = PG_GETARG_OID(1);						\
+	RelFileNumber relfile = PG_GETARG_OID(2);					\
+	int64		result;											\
+	PgStat_StatRelFileNodeEntry *relfileentry;								\
+																\
+	if ((relfileentry = pgstat_fetch_stat_relfilenodeentry(dboid, spcOid, relfile)) == NULL)	\
+		result = 0;												\
+	else														\
+		result = (int64) (relfileentry->stat);						\
+																\
+	PG_RETURN_INT64(result);									\
+}
+
+/* pg_stat_get_relfilenode_blocks_written */
+PG_STAT_GET_RELFILEENTRY_INT64(blocks_written)
+
+/* pg_stat_get_blocks_written */
+PG_STAT_GET_RELENTRY_INT64(blocks_written)
+
 #define PG_STAT_GET_RELENTRY_TIMESTAMPTZ(stat)					\
 Datum															\
 CppConcat(pg_stat_get_,stat)(PG_FUNCTION_ARGS)					\
@@ -1752,7 +1776,7 @@ pg_stat_reset_single_table_counters(PG_FUNCTION_ARGS)
 	Oid			taboid = PG_GETARG_OID(0);
 	Oid			dboid = (IsSharedRelation(taboid) ? InvalidOid : MyDatabaseId);
 
-	pgstat_reset(PGSTAT_KIND_RELATION, dboid, taboid);
+	pgstat_reset(PGSTAT_KIND_RELATION, dboid, taboid, InvalidOid);
 
 	PG_RETURN_VOID();
 }
@@ -1762,7 +1786,7 @@ pg_stat_reset_single_function_counters(PG_FUNCTION_ARGS)
 {
 	Oid			funcoid = PG_GETARG_OID(0);
 
-	pgstat_reset(PGSTAT_KIND_FUNCTION, MyDatabaseId, funcoid);
+	pgstat_reset(PGSTAT_KIND_FUNCTION, MyDatabaseId, funcoid, InvalidOid);
 
 	PG_RETURN_VOID();
 }
@@ -1820,7 +1844,7 @@ pg_stat_reset_subscription_stats(PG_FUNCTION_ARGS)
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 					 errmsg("invalid subscription OID %u", subid)));
-		pgstat_reset(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid);
+		pgstat_reset(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid, InvalidOid);
 	}
 
 	PG_RETURN_VOID();
@@ -2028,7 +2052,9 @@ pg_stat_have_stats(PG_FUNCTION_ARGS)
 	char	   *stats_type = text_to_cstring(PG_GETARG_TEXT_P(0));
 	Oid			dboid = PG_GETARG_OID(1);
 	Oid			objoid = PG_GETARG_OID(2);
+	RelFileNumber relfile = PG_GETARG_OID(3);
+
 	PgStat_Kind kind = pgstat_get_kind_from_str(stats_type);
 
-	PG_RETURN_BOOL(pgstat_have_entry(kind, dboid, objoid));
+	PG_RETURN_BOOL(pgstat_have_entry(kind, dboid, objoid, relfile));
 }
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index da661289c1..3614bae63c 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -21,7 +21,9 @@
 #include "access/sdir.h"
 #include "access/xact.h"
 #include "executor/tuptable.h"
+#include "pgstat.h"
 #include "storage/read_stream.h"
+#include "utils/pgstat_internal.h"
 #include "utils/rel.h"
 #include "utils/snapshot.h"
 
@@ -1624,6 +1626,23 @@ table_relation_set_new_filelocator(Relation rel,
 								   TransactionId *freezeXid,
 								   MultiXactId *minmulti)
 {
+	PgStat_StatRelFileNodeEntry *relfileentry;
+	PgStat_StatTabEntry *tabentry = NULL;
+	PgStat_EntryRef *entry_ref = NULL;
+	PgStatShared_Relation *shtabentry;
+
+	entry_ref = pgstat_get_entry_ref(PGSTAT_KIND_RELATION, rel->rd_rel->relisshared ? InvalidOid : MyDatabaseId, rel->rd_id, InvalidOid, false, NULL);
+	if (entry_ref)
+	{
+		shtabentry = (PgStatShared_Relation *) entry_ref->shared_stats;
+		tabentry = &shtabentry->stats;
+	}
+
+	relfileentry = pgstat_fetch_stat_relfilenodeentry(rel->rd_locator.dbOid, rel->rd_locator.spcOid, rel->rd_locator.relNumber);
+
+	if (tabentry && relfileentry)
+		tabentry->blocks_written += relfileentry->blocks_written;
+
 	rel->rd_tableam->relation_set_new_filelocator(rel, newrlocator,
 												  persistence, freezeXid,
 												  minmulti);
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 6d4439f052..3b9ed65ff6 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -284,6 +284,7 @@ typedef struct xl_xact_stats_item
 	int			kind;
 	Oid			dboid;
 	Oid			objoid;
+	RelFileNumber relfile;
 } xl_xact_stats_item;
 
 typedef struct xl_xact_stats_items
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index d36f6001bb..fae89b1e08 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5407,6 +5407,14 @@
   proname => 'pg_stat_get_tuples_updated', provolatile => 's',
   proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
   prosrc => 'pg_stat_get_tuples_updated' },
+{ oid => '9280', descr => 'statistics: number of blocks written',
+  proname => 'pg_stat_get_relfilenode_blocks_written', provolatile => 's',
+  proparallel => 'r',
+  proargtypes => 'oid oid oid',
+  prorettype => 'int8',
+  proallargtypes => '{oid,oid,oid,int8}',
+  proargmodes => '{i,i,i,o}',
+  prosrc => 'pg_stat_get_relfilenode_blocks_written' },
 { oid => '1933', descr => 'statistics: number of tuples deleted',
   proname => 'pg_stat_get_tuples_deleted', provolatile => 's',
   proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
@@ -5446,6 +5454,10 @@
   proname => 'pg_stat_get_blocks_hit', provolatile => 's', proparallel => 'r',
   prorettype => 'int8', proargtypes => 'oid',
   prosrc => 'pg_stat_get_blocks_hit' },
+{ oid => '8438', descr => 'statistics: number of blocks written',
+  proname => 'pg_stat_get_blocks_written', provolatile => 's', proparallel => 'r',
+  prorettype => 'int8', proargtypes => 'oid',
+  prosrc => 'pg_stat_get_blocks_written' },
 { oid => '2781', descr => 'statistics: last manual vacuum time for a table',
   proname => 'pg_stat_get_last_vacuum_time', provolatile => 's',
   proparallel => 'r', prorettype => 'timestamptz', proargtypes => 'oid',
@@ -5532,7 +5544,7 @@
 
 { oid => '6230', descr => 'statistics: check if a stats object exists',
   proname => 'pg_stat_have_stats', provolatile => 'v', proparallel => 'r',
-  prorettype => 'bool', proargtypes => 'text oid oid',
+  prorettype => 'bool', proargtypes => 'text oid oid oid',
   prosrc => 'pg_stat_have_stats' },
 
 { oid => '6231', descr => 'statistics: information about subscription stats',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index f63159c55c..cfed4f07f5 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -15,6 +15,7 @@
 #include "datatype/timestamp.h"
 #include "portability/instr_time.h"
 #include "postmaster/pgarch.h"	/* for MAX_XFN_CHARS */
+#include "storage/relfilelocator.h"
 #include "utils/backend_progress.h" /* for backward compatibility */
 #include "utils/backend_status.h"	/* for backward compatibility */
 #include "utils/relcache.h"
@@ -45,17 +46,18 @@
 /* stats for variable-numbered objects */
 #define PGSTAT_KIND_DATABASE	1	/* database-wide statistics */
 #define PGSTAT_KIND_RELATION	2	/* per-table statistics */
-#define PGSTAT_KIND_FUNCTION	3	/* per-function statistics */
-#define PGSTAT_KIND_REPLSLOT	4	/* per-slot statistics */
-#define PGSTAT_KIND_SUBSCRIPTION	5	/* per-subscription statistics */
+#define PGSTAT_KIND_RELFILENODE 3   /* per-relfilenode statistics */
+#define PGSTAT_KIND_FUNCTION	4	/* per-function statistics */
+#define PGSTAT_KIND_REPLSLOT	5	/* per-slot statistics */
+#define PGSTAT_KIND_SUBSCRIPTION	6	/* per-subscription statistics */
 
 /* stats for fixed-numbered objects */
-#define PGSTAT_KIND_ARCHIVER	6
-#define PGSTAT_KIND_BGWRITER	7
-#define PGSTAT_KIND_CHECKPOINTER	8
-#define PGSTAT_KIND_IO	9
-#define PGSTAT_KIND_SLRU	10
-#define PGSTAT_KIND_WAL	11
+#define PGSTAT_KIND_ARCHIVER	7
+#define PGSTAT_KIND_BGWRITER	8
+#define PGSTAT_KIND_CHECKPOINTER	9
+#define PGSTAT_KIND_IO	10
+#define PGSTAT_KIND_SLRU	11
+#define PGSTAT_KIND_WAL	12
 
 #define PGSTAT_KIND_BUILTIN_MIN PGSTAT_KIND_DATABASE
 #define PGSTAT_KIND_BUILTIN_MAX PGSTAT_KIND_WAL
@@ -447,6 +449,7 @@ typedef struct PgStat_StatTabEntry
 
 	PgStat_Counter blocks_fetched;
 	PgStat_Counter blocks_hit;
+	PgStat_Counter blocks_written;
 
 	TimestampTz last_vacuum_time;	/* user initiated vacuum */
 	PgStat_Counter vacuum_count;
@@ -458,6 +461,13 @@ typedef struct PgStat_StatTabEntry
 	PgStat_Counter autoanalyze_count;
 } PgStat_StatTabEntry;
 
+typedef struct PgStat_StatRelFileNodeEntry
+{
+	PgStat_Counter blocks_fetched;
+	PgStat_Counter blocks_hit;
+	PgStat_Counter blocks_written;
+} PgStat_StatRelFileNodeEntry;
+
 typedef struct PgStat_WalStats
 {
 	PgStat_Counter wal_records;
@@ -508,7 +518,7 @@ extern long pgstat_report_stat(bool force);
 extern void pgstat_force_next_flush(void);
 
 extern void pgstat_reset_counters(void);
-extern void pgstat_reset(PgStat_Kind kind, Oid dboid, Oid objoid);
+extern void pgstat_reset(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile);
 extern void pgstat_reset_of_kind(PgStat_Kind kind);
 
 /* stats accessors */
@@ -517,7 +527,7 @@ extern TimestampTz pgstat_get_stat_snapshot_timestamp(bool *have_snapshot);
 
 /* helpers */
 extern PgStat_Kind pgstat_get_kind_from_str(char *kind_str);
-extern bool pgstat_have_entry(PgStat_Kind kind, Oid dboid, Oid objoid);
+extern bool pgstat_have_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile);
 
 
 /*
@@ -626,6 +636,10 @@ extern void pgstat_report_analyze(Relation rel,
 								  PgStat_Counter livetuples, PgStat_Counter deadtuples,
 								  bool resetcounter);
 
+extern void pgstat_report_relfilenode_blks_written(RelFileLocator locator);
+extern void pgstat_report_relfilenode_buffer_read(Relation reln);
+extern void pgstat_report_relfilenode_buffer_hit(Relation reln);
+
 /*
  * If stats are enabled, but pending data hasn't been prepared yet, call
  * pgstat_assoc_relation() to do so. See its comment for why this is done
@@ -685,6 +699,7 @@ extern void pgstat_twophase_postabort(TransactionId xid, uint16 info,
 									  void *recdata, uint32 len);
 
 extern PgStat_StatTabEntry *pgstat_fetch_stat_tabentry(Oid relid);
+extern PgStat_StatRelFileNodeEntry *pgstat_fetch_stat_relfilenodeentry(Oid dboid, Oid spcOid, RelFileNumber relfile);
 extern PgStat_StatTabEntry *pgstat_fetch_stat_tabentry_ext(bool shared,
 														   Oid reloid);
 extern PgStat_TableStatus *find_tabstat_entry(Oid rel_id);
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index fb132e439d..47c448d5de 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -53,7 +53,8 @@ typedef struct PgStat_HashKey
 {
 	PgStat_Kind kind;			/* statistics entry kind */
 	Oid			dboid;			/* database ID. InvalidOid for shared objects. */
-	Oid			objoid;			/* object ID, either table or function. */
+	Oid			objoid;			/* object ID, either table or function or tablespace. */
+	RelFileNumber relfile;		/* relfilenumber for RelFileLocator. */
 } PgStat_HashKey;
 
 /*
@@ -390,6 +391,12 @@ typedef struct PgStatShared_Relation
 	PgStat_StatTabEntry stats;
 } PgStatShared_Relation;
 
+typedef struct PgStatShared_RelFileNode
+{
+	PgStatShared_Common header;
+	PgStat_StatRelFileNodeEntry stats;
+} PgStatShared_RelFileNode;
+
 typedef struct PgStatShared_Function
 {
 	PgStatShared_Common header;
@@ -528,6 +535,9 @@ static inline void *pgstat_get_entry_data(PgStat_Kind kind, PgStatShared_Common
 static inline void *pgstat_get_custom_shmem_data(PgStat_Kind kind);
 static inline void *pgstat_get_custom_snapshot_data(PgStat_Kind kind);
 
+extern PgStat_SubXactStatus *pgStatXactStack;
+extern void PgStat_RemoveRelFileNodeFromDroppedStats(PgStat_SubXactStatus *xact_state, RelFileLocator rlocator);
+
 
 /*
  * Functions in pgstat.c
@@ -544,10 +554,12 @@ extern void pgstat_assert_is_up(void);
 #endif
 
 extern void pgstat_delete_pending_entry(PgStat_EntryRef *entry_ref);
-extern PgStat_EntryRef *pgstat_prep_pending_entry(PgStat_Kind kind, Oid dboid, Oid objoid, bool *created_entry);
-extern PgStat_EntryRef *pgstat_fetch_pending_entry(PgStat_Kind kind, Oid dboid, Oid objoid);
+extern PgStat_EntryRef *pgstat_prep_pending_entry(PgStat_Kind kind, Oid dboid,
+												  Oid objoid, RelFileNumber relfile,
+												  bool *created_entry);
+extern PgStat_EntryRef *pgstat_fetch_pending_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile);
 
-extern void *pgstat_fetch_entry(PgStat_Kind kind, Oid dboid, Oid objoid);
+extern void *pgstat_fetch_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile);
 extern void pgstat_snapshot_fixed(PgStat_Kind kind);
 
 
@@ -619,6 +631,7 @@ extern void AtPrepare_PgStat_Relations(PgStat_SubXactStatus *xact_state);
 extern void PostPrepare_PgStat_Relations(PgStat_SubXactStatus *xact_state);
 
 extern bool pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait);
+extern bool pgstat_relfilenode_flush_cb(PgStat_EntryRef *entry_ref, bool nowait);
 extern void pgstat_relation_delete_pending_cb(PgStat_EntryRef *entry_ref);
 
 
@@ -639,15 +652,16 @@ extern void pgstat_attach_shmem(void);
 extern void pgstat_detach_shmem(void);
 
 extern PgStat_EntryRef *pgstat_get_entry_ref(PgStat_Kind kind, Oid dboid, Oid objoid,
-											 bool create, bool *created_entry);
+											 RelFileNumber relfile, bool create,
+											 bool *created_entry);
 extern bool pgstat_lock_entry(PgStat_EntryRef *entry_ref, bool nowait);
 extern bool pgstat_lock_entry_shared(PgStat_EntryRef *entry_ref, bool nowait);
 extern void pgstat_unlock_entry(PgStat_EntryRef *entry_ref);
-extern bool pgstat_drop_entry(PgStat_Kind kind, Oid dboid, Oid objoid);
+extern bool pgstat_drop_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile);
 extern void pgstat_drop_all_entries(void);
 extern PgStat_EntryRef *pgstat_get_entry_ref_locked(PgStat_Kind kind, Oid dboid, Oid objoid,
-													bool nowait);
-extern void pgstat_reset_entry(PgStat_Kind kind, Oid dboid, Oid objoid, TimestampTz ts);
+													RelFileNumber relfile, bool nowait);
+extern void pgstat_reset_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile, TimestampTz ts);
 extern void pgstat_reset_entries_of_kind(PgStat_Kind kind, TimestampTz ts);
 extern void pgstat_reset_matching_entries(bool (*do_reset) (PgStatShared_HashEntry *, Datum),
 										  Datum match_data,
@@ -694,8 +708,8 @@ extern void pgstat_subscription_reset_timestamp_cb(PgStatShared_Common *header,
  */
 
 extern PgStat_SubXactStatus *pgstat_get_xact_stack_level(int nest_level);
-extern void pgstat_drop_transactional(PgStat_Kind kind, Oid dboid, Oid objoid);
-extern void pgstat_create_transactional(PgStat_Kind kind, Oid dboid, Oid objoid);
+extern void pgstat_drop_transactional(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile);
+extern void pgstat_create_transactional(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile);
 
 
 /*
diff --git a/src/test/recovery/t/029_stats_restart.pl b/src/test/recovery/t/029_stats_restart.pl
index 93a7209f69..f9988b5028 100644
--- a/src/test/recovery/t/029_stats_restart.pl
+++ b/src/test/recovery/t/029_stats_restart.pl
@@ -40,10 +40,10 @@ trigger_funcrel_stat();
 
 # verify stats objects exist
 my $sect = "initial";
-is(have_stats('database', $dboid, 0), 't', "$sect: db stats do exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats('database', $dboid, 0, 0), 't', "$sect: db stats do exist");
+is(have_stats('function', $dboid, $funcoid, 0),
 	't', "$sect: function stats do exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats('relation', $dboid, $tableoid, 0),
 	't', "$sect: relation stats do exist");
 
 # regular shutdown
@@ -64,10 +64,10 @@ copy($og_stats, $statsfile) or die "Copy failed: $!";
 $node->start;
 
 $sect = "copy";
-is(have_stats('database', $dboid, 0), 't', "$sect: db stats do exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats('database', $dboid, 0, 0), 't', "$sect: db stats do exist");
+is(have_stats('function', $dboid, $funcoid, 0),
 	't', "$sect: function stats do exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats('relation', $dboid, $tableoid, 0),
 	't', "$sect: relation stats do exist");
 
 $node->stop('immediate');
@@ -81,10 +81,10 @@ $node->start;
 
 # stats should have been discarded
 $sect = "post immediate";
-is(have_stats('database', $dboid, 0), 'f', "$sect: db stats do not exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats('database', $dboid, 0, 0), 'f', "$sect: db stats do not exist");
+is(have_stats('function', $dboid, $funcoid, 0),
 	'f', "$sect: function stats do exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats('relation', $dboid, $tableoid, 0),
 	'f', "$sect: relation stats do not exist");
 
 # get rid of backup statsfile
@@ -95,10 +95,10 @@ unlink $statsfile or die "cannot unlink $statsfile $!";
 trigger_funcrel_stat();
 
 $sect = "post immediate, new";
-is(have_stats('database', $dboid, 0), 't', "$sect: db stats do exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats('database', $dboid, 0, 0), 't', "$sect: db stats do exist");
+is(have_stats('function', $dboid, $funcoid, 0),
 	't', "$sect: function stats do exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats('relation', $dboid, $tableoid, 0),
 	't', "$sect: relation stats do exist");
 
 # regular shutdown
@@ -114,10 +114,10 @@ $node->start;
 
 # no stats present due to invalid stats file
 $sect = "invalid_overwrite";
-is(have_stats('database', $dboid, 0), 'f', "$sect: db stats do not exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats('database', $dboid, 0, 0), 'f', "$sect: db stats do not exist");
+is(have_stats('function', $dboid, $funcoid, 0),
 	'f', "$sect: function stats do not exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats('relation', $dboid, $tableoid, 0),
 	'f', "$sect: relation stats do not exist");
 
 
@@ -130,10 +130,10 @@ append_file($og_stats, "XYZ");
 $node->start;
 
 $sect = "invalid_append";
-is(have_stats('database', $dboid, 0), 'f', "$sect: db stats do not exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats('database', $dboid, 0, 0), 'f', "$sect: db stats do not exist");
+is(have_stats('function', $dboid, $funcoid, 0),
 	'f', "$sect: function stats do not exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats('relation', $dboid, $tableoid, 0),
 	'f', "$sect: relation stats do not exist");
 
 
@@ -292,10 +292,10 @@ sub trigger_funcrel_stat
 
 sub have_stats
 {
-	my ($kind, $dboid, $objoid) = @_;
+	my ($kind, $dboid, $objoid, $relfile) = @_;
 
 	return $node->safe_psql($connect_db,
-		"SELECT pg_stat_have_stats('$kind', $dboid, $objoid)");
+		"SELECT pg_stat_have_stats('$kind', $dboid, $objoid, $relfile)");
 }
 
 sub overwrite_file
diff --git a/src/test/recovery/t/030_stats_cleanup_replica.pl b/src/test/recovery/t/030_stats_cleanup_replica.pl
index 74b516cc7c..317df24c4f 100644
--- a/src/test/recovery/t/030_stats_cleanup_replica.pl
+++ b/src/test/recovery/t/030_stats_cleanup_replica.pl
@@ -179,9 +179,9 @@ sub test_standby_func_tab_stats_status
 	my %stats;
 
 	$stats{rel} = $node_standby->safe_psql($connect_db,
-		"SELECT pg_stat_have_stats('relation', $dboid, $tableoid)");
+		"SELECT pg_stat_have_stats('relation', $dboid, $tableoid, 0)");
 	$stats{func} = $node_standby->safe_psql($connect_db,
-		"SELECT pg_stat_have_stats('function', $dboid, $funcoid)");
+		"SELECT pg_stat_have_stats('function', $dboid, $funcoid, 0)");
 
 	is_deeply(\%stats, \%expected, "$sect: standby stats as expected");
 
@@ -194,7 +194,7 @@ sub test_standby_db_stats_status
 	my ($connect_db, $dboid, $present) = @_;
 
 	is( $node_standby->safe_psql(
-			$connect_db, "SELECT pg_stat_have_stats('database', $dboid, 0)"),
+			$connect_db, "SELECT pg_stat_have_stats('database', $dboid, 0, 0)"),
 		$present,
 		"$sect: standby db stats as expected");
 }
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 5201280669..234356a710 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2335,6 +2335,11 @@ pg_statio_all_tables| SELECT c.oid AS relid,
     n.nspname AS schemaname,
     c.relname,
     (pg_stat_get_blocks_fetched(c.oid) - pg_stat_get_blocks_hit(c.oid)) AS heap_blks_read,
+    (pg_stat_get_blocks_written(c.oid) + pg_stat_get_relfilenode_blocks_written(d.oid,
+        CASE
+            WHEN (c.reltablespace <> (0)::oid) THEN c.reltablespace
+            ELSE d.dattablespace
+        END, c.relfilenode)) AS heap_blks_written,
     pg_stat_get_blocks_hit(c.oid) AS heap_blks_hit,
     i.idx_blks_read,
     i.idx_blks_hit,
@@ -2342,7 +2347,8 @@ pg_statio_all_tables| SELECT c.oid AS relid,
     pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit,
     x.idx_blks_read AS tidx_blks_read,
     x.idx_blks_hit AS tidx_blks_hit
-   FROM ((((pg_class c
+   FROM pg_database d,
+    ((((pg_class c
      LEFT JOIN pg_class t ON ((c.reltoastrelid = t.oid)))
      LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
      LEFT JOIN LATERAL ( SELECT (sum((pg_stat_get_blocks_fetched(pg_index.indexrelid) - pg_stat_get_blocks_hit(pg_index.indexrelid))))::bigint AS idx_blks_read,
@@ -2353,7 +2359,7 @@ pg_statio_all_tables| SELECT c.oid AS relid,
             (sum(pg_stat_get_blocks_hit(pg_index.indexrelid)))::bigint AS idx_blks_hit
            FROM pg_index
           WHERE (pg_index.indrelid = t.oid)) x ON (true))
-  WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"]));
+  WHERE ((c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"])) AND (d.datname = current_database()));
 pg_statio_sys_indexes| SELECT relid,
     indexrelid,
     schemaname,
@@ -2374,6 +2380,7 @@ pg_statio_sys_tables| SELECT relid,
     schemaname,
     relname,
     heap_blks_read,
+    heap_blks_written,
     heap_blks_hit,
     idx_blks_read,
     idx_blks_hit,
@@ -2403,6 +2410,7 @@ pg_statio_user_tables| SELECT relid,
     schemaname,
     relname,
     heap_blks_read,
+    heap_blks_written,
     heap_blks_hit,
     idx_blks_read,
     idx_blks_hit,
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 6e08898b18..eff0c9372c 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1111,23 +1111,23 @@ ROLLBACK;
 -- pg_stat_have_stats behavior
 ----
 -- fixed-numbered stats exist
-SELECT pg_stat_have_stats('bgwriter', 0, 0);
+SELECT pg_stat_have_stats('bgwriter', 0, 0, 0);
  pg_stat_have_stats 
 --------------------
  t
 (1 row)
 
 -- unknown stats kinds error out
-SELECT pg_stat_have_stats('zaphod', 0, 0);
+SELECT pg_stat_have_stats('zaphod', 0, 0, 0);
 ERROR:  invalid statistics kind: "zaphod"
 -- db stats have objoid 0
-SELECT pg_stat_have_stats('database', :dboid, 1);
+SELECT pg_stat_have_stats('database', :dboid, 1, 0);
  pg_stat_have_stats 
 --------------------
  f
 (1 row)
 
-SELECT pg_stat_have_stats('database', :dboid, 0);
+SELECT pg_stat_have_stats('database', :dboid, 0, 0);
  pg_stat_have_stats 
 --------------------
  t
@@ -1144,21 +1144,21 @@ select a from stats_test_tab1 where a = 3;
  3
 (1 row)
 
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  t
 (1 row)
 
 -- pg_stat_have_stats returns false for dropped index with stats
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  t
 (1 row)
 
 DROP index stats_test_idx1;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  f
@@ -1174,14 +1174,14 @@ select a from stats_test_tab1 where a = 3;
  3
 (1 row)
 
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  t
 (1 row)
 
 ROLLBACK;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  f
@@ -1196,7 +1196,7 @@ select a from stats_test_tab1 where a = 3;
  3
 (1 row)
 
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  t
@@ -1204,7 +1204,7 @@ SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
 
 REINDEX index CONCURRENTLY stats_test_idx1;
 -- false for previous oid
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  f
@@ -1212,7 +1212,7 @@ SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
 
 -- true for new oid
 SELECT 'stats_test_idx1'::regclass::oid AS stats_test_idx1_oid \gset
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  t
@@ -1220,7 +1220,7 @@ SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
 
 -- pg_stat_have_stats returns true for a rolled back drop index with stats
 BEGIN;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  t
@@ -1228,7 +1228,7 @@ SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
 
 DROP index stats_test_idx1;
 ROLLBACK;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  t
@@ -1513,7 +1513,7 @@ SELECT :io_sum_bulkwrite_strategy_extends_after > :io_sum_bulkwrite_strategy_ext
 (1 row)
 
 -- Test IO stats reset
-SELECT pg_stat_have_stats('io', 0, 0);
+SELECT pg_stat_have_stats('io', 0, 0, 0);
  pg_stat_have_stats 
 --------------------
  t
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index d8ac0d06f4..5a40779989 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -539,12 +539,12 @@ ROLLBACK;
 -- pg_stat_have_stats behavior
 ----
 -- fixed-numbered stats exist
-SELECT pg_stat_have_stats('bgwriter', 0, 0);
+SELECT pg_stat_have_stats('bgwriter', 0, 0, 0);
 -- unknown stats kinds error out
-SELECT pg_stat_have_stats('zaphod', 0, 0);
+SELECT pg_stat_have_stats('zaphod', 0, 0, 0);
 -- db stats have objoid 0
-SELECT pg_stat_have_stats('database', :dboid, 1);
-SELECT pg_stat_have_stats('database', :dboid, 0);
+SELECT pg_stat_have_stats('database', :dboid, 1, 0);
+SELECT pg_stat_have_stats('database', :dboid, 0, 0);
 
 -- pg_stat_have_stats returns true for committed index creation
 CREATE table stats_test_tab1 as select generate_series(1,10) a;
@@ -552,40 +552,40 @@ CREATE index stats_test_idx1 on stats_test_tab1(a);
 SELECT 'stats_test_idx1'::regclass::oid AS stats_test_idx1_oid \gset
 SET enable_seqscan TO off;
 select a from stats_test_tab1 where a = 3;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 
 -- pg_stat_have_stats returns false for dropped index with stats
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 DROP index stats_test_idx1;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 
 -- pg_stat_have_stats returns false for rolled back index creation
 BEGIN;
 CREATE index stats_test_idx1 on stats_test_tab1(a);
 SELECT 'stats_test_idx1'::regclass::oid AS stats_test_idx1_oid \gset
 select a from stats_test_tab1 where a = 3;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 ROLLBACK;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 
 -- pg_stat_have_stats returns true for reindex CONCURRENTLY
 CREATE index stats_test_idx1 on stats_test_tab1(a);
 SELECT 'stats_test_idx1'::regclass::oid AS stats_test_idx1_oid \gset
 select a from stats_test_tab1 where a = 3;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 REINDEX index CONCURRENTLY stats_test_idx1;
 -- false for previous oid
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 -- true for new oid
 SELECT 'stats_test_idx1'::regclass::oid AS stats_test_idx1_oid \gset
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 
 -- pg_stat_have_stats returns true for a rolled back drop index with stats
 BEGIN;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 DROP index stats_test_idx1;
 ROLLBACK;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 
 -- put enable_seqscan back to on
 SET enable_seqscan TO on;
@@ -759,7 +759,7 @@ SELECT sum(extends) AS io_sum_bulkwrite_strategy_extends_after
 SELECT :io_sum_bulkwrite_strategy_extends_after > :io_sum_bulkwrite_strategy_extends_before;
 
 -- Test IO stats reset
-SELECT pg_stat_have_stats('io', 0, 0);
+SELECT pg_stat_have_stats('io', 0, 0, 0);
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_pre_reset
   FROM pg_stat_io \gset
 SELECT pg_stat_reset_shared('io');
diff --git a/src/test/subscription/t/026_stats.pl b/src/test/subscription/t/026_stats.pl
index fb3e5629b3..1f4ae5efd5 100644
--- a/src/test/subscription/t/026_stats.pl
+++ b/src/test/subscription/t/026_stats.pl
@@ -263,7 +263,7 @@ $node_subscriber->safe_psql($db, qq(DROP SUBSCRIPTION $sub1_name));
 
 # Subscription stats for sub1 should be gone
 is( $node_subscriber->safe_psql(
-		$db, qq(SELECT pg_stat_have_stats('subscription', 0, $sub1_oid))),
+		$db, qq(SELECT pg_stat_have_stats('subscription', 0, $sub1_oid, 0))),
 	qq(f),
 	qq(Subscription stats for subscription '$sub1_name' should be removed.));
 
@@ -282,7 +282,7 @@ DROP SUBSCRIPTION $sub2_name;
 
 # Subscription stats for sub2 should be gone
 is( $node_subscriber->safe_psql(
-		$db, qq(SELECT pg_stat_have_stats('subscription', 0, $sub2_oid))),
+		$db, qq(SELECT pg_stat_have_stats('subscription', 0, $sub2_oid, 0))),
 	qq(f),
 	qq(Subscription stats for subscription '$sub2_name' should be removed.));
 
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 6e6b7c2711..7b6b413c03 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2123,6 +2123,7 @@ PgStatShared_InjectionPoint
 PgStatShared_InjectionPointFixed
 PgStatShared_IO
 PgStatShared_Relation
+PgStatShared_RelFileNode
 PgStatShared_ReplSlot
 PgStatShared_SLRU
 PgStatShared_Subscription
-- 
2.34.1

#21Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Bertrand Drouvot (#20)
1 attachment(s)
Re: relfilenode statistics

Hi,

On Mon, Aug 05, 2024 at 05:28:22AM +0000, Bertrand Drouvot wrote:

Hi,

On Thu, Jul 11, 2024 at 06:10:23AM +0000, Bertrand Drouvot wrote:

Hi,

On Thu, Jul 11, 2024 at 01:58:19PM +0900, Michael Paquier wrote:

On Wed, Jul 10, 2024 at 01:38:06PM +0000, Bertrand Drouvot wrote:

So, I think it makes sense to link the hashkey to all the RelFileLocator
fields, that is:

dboid (linked to RelFileLocator's dbOid)
objoid (linked to RelFileLocator's spcOid)
relfile (linked to RelFileLocator's relNumber)

Hmm. How about using the table OID as objoid,

The issue is that we don't have the relation OID when writing buffers out (that's
one of the reasons explained in [1]).

[1]: /messages/by-id/Zl2k8u4HDTUW6QlC@ip-10-97-1-34.eu-west-3.compute.internal

Regards,

Please find attached a mandatory rebase due to the recent changes around
statistics.

As mentioned up-thread:

The attached patch is not in a fully "polished" state yet: there are more places
where we should add relfilenode counters, more APIs to create to retrieve the
relfilenode stats, and so on.

It is in a state that can be used to discuss the approach it is implementing (as
we have done so far) before moving forward.

Please find attached a mandatory rebase.

In passing: based on the previous discussion (and given that we don't have
the relation OID when writing buffers out), do you see another approach
than the one this patch is implementing?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v4-0001-Provide-relfilenode-statistics.patch (text/x-diff; charset=utf-8)
From 095af2878f8ab85509766807f60e0dadcf0cd018 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Thu, 16 Nov 2023 02:30:01 +0000
Subject: [PATCH v4] Provide relfilenode statistics
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

We currently don’t have write counters for relations.
The reason is that we don’t have the relation OID when writing buffers out.
Tracking writes per relfilenode would allow us to track/consolidate writes per
relation.

Relfilenode stats are also beneficial for the "Split index and table statistics
into different types of stats" work in progress: they would allow us to avoid
additional branches in some situations.

=== Remarks ===

This is a POC patch. There is still work to do: there are more places where we
should add relfilenode counters, and more APIs to create to retrieve the
relfilenode stats. The patch takes care of the rewrite generated by TRUNCATE,
but there are more cases to handle, such as CLUSTER and VACUUM FULL.

The new logic to retrieve stats in pg_statio_all_tables has been implemented
only for the new blocks_written stat (we'd need to do the same for the existing
buffer read / buffer hit if we agree on the approach implemented here).
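
As a rough illustration of the consolidation described above, here is a minimal
standalone sketch. It assumes simplified stand-in structs rather than the real
PgStat_StatTabEntry and PgStat_StatRelFileNodeEntry, and
fold_relfilenode_into_table() and heap_blks_written() are hypothetical names,
not functions from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t PgStat_Counter;

/* Simplified stand-ins for the table and relfilenode stats entries. */
typedef struct TabEntry
{
	PgStat_Counter blocks_written;	/* writes inherited from previous relfilenodes */
} TabEntry;

typedef struct RelFileNodeEntry
{
	PgStat_Counter blocks_written;	/* writes against the current relfilenode */
} RelFileNodeEntry;

/*
 * On a rewrite, add the outgoing relfilenode's counter to the table entry
 * so that the history is kept once the old relfilenode entry is dropped.
 */
static void
fold_relfilenode_into_table(TabEntry *tab, const RelFileNodeEntry *old)
{
	if (tab != NULL && old != NULL)
		tab->blocks_written += old->blocks_written;
}

/*
 * What the view would report: previous relfilenodes (kept in the table
 * entry) plus the current relfilenode. A missing entry counts as zero.
 */
static PgStat_Counter
heap_blks_written(const TabEntry *tab, const RelFileNodeEntry *cur)
{
	PgStat_Counter total = 0;

	if (tab != NULL)
		total += tab->blocks_written;
	if (cur != NULL)
		total += cur->blocks_written;
	assert(total >= 0);
	return total;
}
```

With this shape, the reported value does not go backwards across a rewrite:
the old relfilenode's writes move into the table entry before the new
relfilenode starts counting from zero.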

The goal of this patch is to start the discussion and agree on the design before
moving forward.
---
 src/backend/access/rmgrdesc/xactdesc.c        |   5 +-
 src/backend/catalog/storage.c                 |   8 ++
 src/backend/catalog/system_functions.sql      |   2 +-
 src/backend/catalog/system_views.sql          |   5 +-
 src/backend/postmaster/checkpointer.c         |   5 +
 src/backend/storage/buffer/bufmgr.c           |   6 +-
 src/backend/storage/smgr/md.c                 |   7 ++
 src/backend/utils/activity/pgstat.c           |  39 ++++--
 src/backend/utils/activity/pgstat_database.c  |  12 +-
 src/backend/utils/activity/pgstat_function.c  |  13 +-
 src/backend/utils/activity/pgstat_relation.c  | 112 ++++++++++++++++--
 src/backend/utils/activity/pgstat_replslot.c  |  13 +-
 src/backend/utils/activity/pgstat_shmem.c     |  19 ++-
 .../utils/activity/pgstat_subscription.c      |  14 +--
 src/backend/utils/activity/pgstat_xact.c      |  60 +++++++---
 src/backend/utils/adt/pgstatfuncs.c           |  34 +++++-
 src/include/access/tableam.h                  |  19 +++
 src/include/access/xact.h                     |   1 +
 src/include/catalog/pg_proc.dat               |  14 ++-
 src/include/pgstat.h                          |  37 ++++--
 src/include/utils/pgstat_internal.h           |  34 ++++--
 src/test/recovery/t/029_stats_restart.pl      |  40 +++----
 .../recovery/t/030_stats_cleanup_replica.pl   |   6 +-
 src/test/regress/expected/rules.out           |  12 +-
 src/test/regress/expected/stats.out           |  30 ++---
 src/test/regress/sql/stats.sql                |  30 ++---
 src/test/subscription/t/026_stats.pl          |   4 +-
 src/tools/pgindent/typedefs.list              |   1 +
 28 files changed, 425 insertions(+), 157 deletions(-)
   4.4% src/backend/catalog/
  46.4% src/backend/utils/activity/
   6.2% src/backend/utils/adt/
   3.6% src/backend/
   3.1% src/include/access/
   3.2% src/include/catalog/
   5.9% src/include/utils/
   6.6% src/include/
  11.7% src/test/recovery/t/
   5.3% src/test/regress/expected/
   3.0% src/

diff --git a/src/backend/access/rmgrdesc/xactdesc.c b/src/backend/access/rmgrdesc/xactdesc.c
index dccca201e0..c02b079645 100644
--- a/src/backend/access/rmgrdesc/xactdesc.c
+++ b/src/backend/access/rmgrdesc/xactdesc.c
@@ -319,10 +319,11 @@ xact_desc_stats(StringInfo buf, const char *label,
 		appendStringInfo(buf, "; %sdropped stats:", label);
 		for (i = 0; i < ndropped; i++)
 		{
-			appendStringInfo(buf, " %d/%u/%u",
+			appendStringInfo(buf, " %d/%u/%u/%u",
 							 dropped_stats[i].kind,
 							 dropped_stats[i].dboid,
-							 dropped_stats[i].objoid);
+							 dropped_stats[i].objoid,
+							 dropped_stats[i].relfile);
 		}
 	}
 }
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index f56b3cc0f2..db6107cd90 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -33,6 +33,7 @@
 #include "storage/smgr.h"
 #include "utils/hsearch.h"
 #include "utils/memutils.h"
+#include "utils/pgstat_internal.h"
 #include "utils/rel.h"
 
 /* GUC variables */
@@ -152,6 +153,7 @@ RelationCreateStorage(RelFileLocator rlocator, char relpersistence,
 	if (needs_wal)
 		log_smgrcreate(&srel->smgr_rlocator.locator, MAIN_FORKNUM);
 
+	pgstat_create_transactional(PGSTAT_KIND_RELFILENODE, rlocator.dbOid, rlocator.spcOid, rlocator.relNumber);
 	/*
 	 * Add the relation to the list of stuff to delete at abort, if we are
 	 * asked to do so.
@@ -227,6 +229,8 @@ RelationDropStorage(Relation rel)
 	 * for now I'll keep the logic simple.
 	 */
 
+	pgstat_drop_transactional(PGSTAT_KIND_RELFILENODE, rel->rd_locator.dbOid, rel->rd_locator.spcOid, rel->rd_locator.relNumber);
+
 	RelationCloseSmgr(rel);
 }
 
@@ -253,6 +257,9 @@ RelationPreserveStorage(RelFileLocator rlocator, bool atCommit)
 	PendingRelDelete *pending;
 	PendingRelDelete *prev;
 	PendingRelDelete *next;
+	PgStat_SubXactStatus *xact_state;
+
+	xact_state = pgStatXactStack;
 
 	prev = NULL;
 	for (pending = pendingDeletes; pending != NULL; pending = next)
@@ -267,6 +274,7 @@ RelationPreserveStorage(RelFileLocator rlocator, bool atCommit)
 			else
 				pendingDeletes = next;
 			pfree(pending);
+			PgStat_RemoveRelFileNodeFromDroppedStats(xact_state, rlocator);
 			/* prev does not change */
 		}
 		else
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index 623b9539b1..ec60ef72e3 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -684,7 +684,7 @@ REVOKE EXECUTE ON FUNCTION pg_stat_reset_single_function_counters(oid) FROM publ
 
 REVOKE EXECUTE ON FUNCTION pg_stat_reset_replication_slot(text) FROM public;
 
-REVOKE EXECUTE ON FUNCTION pg_stat_have_stats(text, oid, oid) FROM public;
+REVOKE EXECUTE ON FUNCTION pg_stat_have_stats(text, oid, oid, oid) FROM public;
 
 REVOKE EXECUTE ON FUNCTION pg_stat_reset_subscription_stats(oid) FROM public;
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 7fd5d256a1..0e13b6ae17 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -746,6 +746,7 @@ CREATE VIEW pg_statio_all_tables AS
             C.relname AS relname,
             pg_stat_get_blocks_fetched(C.oid) -
                     pg_stat_get_blocks_hit(C.oid) AS heap_blks_read,
+            pg_stat_get_blocks_written(C.oid) + pg_stat_get_relfilenode_blocks_written(d.oid, CASE WHEN C.reltablespace <> 0 THEN C.reltablespace ELSE d.dattablespace END, C.relfilenode) AS heap_blks_written,
             pg_stat_get_blocks_hit(C.oid) AS heap_blks_hit,
             I.idx_blks_read AS idx_blks_read,
             I.idx_blks_hit AS idx_blks_hit,
@@ -754,7 +755,7 @@ CREATE VIEW pg_statio_all_tables AS
             pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
             X.idx_blks_read AS tidx_blks_read,
             X.idx_blks_hit AS tidx_blks_hit
-    FROM pg_class C LEFT JOIN
+    FROM pg_database d, pg_class C LEFT JOIN
             pg_class T ON C.reltoastrelid = T.oid
             LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
             LEFT JOIN LATERAL (
@@ -771,7 +772,7 @@ CREATE VIEW pg_statio_all_tables AS
                      sum(pg_stat_get_blocks_hit(indexrelid))::bigint
                      AS idx_blks_hit
               FROM pg_index WHERE indrelid = T.oid ) X ON true
-    WHERE C.relkind IN ('r', 't', 'm');
+    WHERE C.relkind IN ('r', 't', 'm') AND d.datname = current_database();
 
 CREATE VIEW pg_statio_sys_tables AS
     SELECT * FROM pg_statio_all_tables
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index eeb73c8572..fd543f243b 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -519,6 +519,11 @@ CheckpointerMain(char *startup_data, size_t startup_data_len)
 		/* Report pending statistics to the cumulative stats system */
 		pgstat_report_checkpointer();
 		pgstat_report_wal(true);
+		/*
+		 * No need to check for transaction state in checkpointer before
+		 * calling pgstat_report_stat().
+		 */
+		pgstat_report_stat(true);
 
 		/*
 		 * If any checkpoint flags have been set, redo the loop to handle the
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 4852044300..7b4c92d312 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1159,9 +1159,9 @@ PinBufferForBlock(Relation rel,
 		 * WaitReadBuffers() (so, not for hits, and not for buffers that are
 		 * zeroed instead), the per-relation stats always count them.
 		 */
-		pgstat_count_buffer_read(rel);
+		pgstat_report_relfilenode_buffer_read(rel);
 		if (*foundPtr)
-			pgstat_count_buffer_hit(rel);
+			pgstat_report_relfilenode_buffer_hit(rel);
 	}
 	if (*foundPtr)
 	{
@@ -3877,6 +3877,8 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln, IOObject io_object,
 
 	pgBufferUsage.shared_blks_written++;
 
+	pgstat_report_relfilenode_blks_written(reln->smgr_rlocator.locator);
+
 	/*
 	 * Mark the buffer as clean (unless BM_JUST_DIRTIED has become set) and
 	 * end the BM_IO_IN_PROGRESS state.
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 6796756358..5bc5fc65cd 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -1447,12 +1447,16 @@ DropRelationFiles(RelFileLocator *delrels, int ndelrels, bool isRedo)
 {
 	SMgrRelation *srels;
 	int			i;
+	int			not_freed_count = 0;
 
 	srels = palloc(sizeof(SMgrRelation) * ndelrels);
 	for (i = 0; i < ndelrels; i++)
 	{
 		SMgrRelation srel = smgropen(delrels[i], INVALID_PROC_NUMBER);
 
+		if (!pgstat_drop_entry(PGSTAT_KIND_RELFILENODE, delrels[i].dbOid, delrels[i].spcOid, delrels[i].relNumber))
+			not_freed_count++;
+
 		if (isRedo)
 		{
 			ForkNumber	fork;
@@ -1463,6 +1467,9 @@ DropRelationFiles(RelFileLocator *delrels, int ndelrels, bool isRedo)
 		srels[i] = srel;
 	}
 
+	if (not_freed_count > 0)
+		pgstat_request_entry_refs_gc();
+
 	smgrdounlinkall(srels, ndelrels, isRedo);
 
 	for (i = 0; i < ndelrels; i++)
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index b2ca3f39b7..035fdb2aa1 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -308,6 +308,19 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.delete_pending_cb = pgstat_relation_delete_pending_cb,
 	},
 
+	[PGSTAT_KIND_RELFILENODE] = {
+		.name = "relfilenode",
+
+		.fixed_amount = false,
+
+		.shared_size = sizeof(PgStatShared_RelFileNode),
+		.shared_data_off = offsetof(PgStatShared_RelFileNode, stats),
+		.shared_data_len = sizeof(((PgStatShared_RelFileNode *) 0)->stats),
+		.pending_size = sizeof(PgStat_StatRelFileNodeEntry),
+
+		.flush_pending_cb = pgstat_relfilenode_flush_cb,
+	},
+
 	[PGSTAT_KIND_FUNCTION] = {
 		.name = "function",
 
@@ -717,7 +730,7 @@ pgstat_report_stat(bool force)
 
 	partial_flush = false;
 
-	/* flush database / relation / function / ... stats */
+	/* flush database / relation / function / relfilenode / ... stats */
 	partial_flush |= pgstat_flush_pending_entries(nowait);
 
 	/* flush IO stats */
@@ -797,7 +810,7 @@ pgstat_reset_counters(void)
  * GRANT system.
  */
 void
-pgstat_reset(PgStat_Kind kind, Oid dboid, Oid objoid)
+pgstat_reset(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile)
 {
 	const PgStat_KindInfo *kind_info = pgstat_get_kind_info(kind);
 	TimestampTz ts = GetCurrentTimestamp();
@@ -806,7 +819,7 @@ pgstat_reset(PgStat_Kind kind, Oid dboid, Oid objoid)
 	Assert(!pgstat_get_kind_info(kind)->fixed_amount);
 
 	/* reset the "single counter" */
-	pgstat_reset_entry(kind, dboid, objoid, ts);
+	pgstat_reset_entry(kind, dboid, objoid, relfile, ts);
 
 	if (!kind_info->accessed_across_databases)
 		pgstat_reset_database_timestamp(dboid, ts);
@@ -877,7 +890,7 @@ pgstat_clear_snapshot(void)
 }
 
 void *
-pgstat_fetch_entry(PgStat_Kind kind, Oid dboid, Oid objoid)
+pgstat_fetch_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile)
 {
 	PgStat_HashKey key;
 	PgStat_EntryRef *entry_ref;
@@ -893,6 +906,7 @@ pgstat_fetch_entry(PgStat_Kind kind, Oid dboid, Oid objoid)
 	key.kind = kind;
 	key.dboid = dboid;
 	key.objoid = objoid;
+	key.relfile = relfile;
 
 	/* if we need to build a full snapshot, do so */
 	if (pgstat_fetch_consistency == PGSTAT_FETCH_CONSISTENCY_SNAPSHOT)
@@ -918,7 +932,7 @@ pgstat_fetch_entry(PgStat_Kind kind, Oid dboid, Oid objoid)
 
 	pgStatLocal.snapshot.mode = pgstat_fetch_consistency;
 
-	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid, false, NULL);
+	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid, relfile, false, NULL);
 
 	if (entry_ref == NULL || entry_ref->shared_entry->dropped)
 	{
@@ -987,13 +1001,13 @@ pgstat_get_stat_snapshot_timestamp(bool *have_snapshot)
 }
 
 bool
-pgstat_have_entry(PgStat_Kind kind, Oid dboid, Oid objoid)
+pgstat_have_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile)
 {
 	/* fixed-numbered stats always exist */
 	if (pgstat_get_kind_info(kind)->fixed_amount)
 		return true;
 
-	return pgstat_get_entry_ref(kind, dboid, objoid, false, NULL) != NULL;
+	return pgstat_get_entry_ref(kind, dboid, objoid, relfile, false, NULL) != NULL;
 }
 
 /*
@@ -1208,7 +1222,8 @@ pgstat_build_snapshot_fixed(PgStat_Kind kind)
  * created, false otherwise.
  */
 PgStat_EntryRef *
-pgstat_prep_pending_entry(PgStat_Kind kind, Oid dboid, Oid objoid, bool *created_entry)
+pgstat_prep_pending_entry(PgStat_Kind kind, Oid dboid, Oid objoid,
+						  RelFileNumber relfile, bool *created_entry)
 {
 	PgStat_EntryRef *entry_ref;
 
@@ -1223,7 +1238,7 @@ pgstat_prep_pending_entry(PgStat_Kind kind, Oid dboid, Oid objoid, bool *created
 								  ALLOCSET_SMALL_SIZES);
 	}
 
-	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid,
+	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid, relfile,
 									 true, created_entry);
 
 	if (entry_ref->pending == NULL)
@@ -1246,11 +1261,11 @@ pgstat_prep_pending_entry(PgStat_Kind kind, Oid dboid, Oid objoid, bool *created
  * that it shouldn't be needed.
  */
 PgStat_EntryRef *
-pgstat_fetch_pending_entry(PgStat_Kind kind, Oid dboid, Oid objoid)
+pgstat_fetch_pending_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile)
 {
 	PgStat_EntryRef *entry_ref;
 
-	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid, false, NULL);
+	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid, relfile, false, NULL);
 
 	if (entry_ref == NULL || entry_ref->pending == NULL)
 		return NULL;
@@ -1279,7 +1294,7 @@ pgstat_delete_pending_entry(PgStat_EntryRef *entry_ref)
 }
 
 /*
- * Flush out pending stats for database objects (databases, relations,
+ * Flush out pending stats for database objects (databases, relations, relfilenodes,
  * functions).
  */
 static bool
diff --git a/src/backend/utils/activity/pgstat_database.c b/src/backend/utils/activity/pgstat_database.c
index 29bc090974..cf77f2dbdb 100644
--- a/src/backend/utils/activity/pgstat_database.c
+++ b/src/backend/utils/activity/pgstat_database.c
@@ -43,7 +43,7 @@ static PgStat_Counter pgLastSessionReportTime = 0;
 void
 pgstat_drop_database(Oid databaseid)
 {
-	pgstat_drop_transactional(PGSTAT_KIND_DATABASE, databaseid, InvalidOid);
+	pgstat_drop_transactional(PGSTAT_KIND_DATABASE, databaseid, InvalidOid, InvalidOid);
 }
 
 /*
@@ -66,7 +66,7 @@ pgstat_report_autovac(Oid dboid)
 	 * operation so it doesn't matter if we get blocked here a little.
 	 */
 	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_DATABASE,
-											dboid, InvalidOid, false);
+											dboid, InvalidOid, InvalidOid, false);
 
 	dbentry = (PgStatShared_Database *) entry_ref->shared_stats;
 	dbentry->stats.last_autovac_time = GetCurrentTimestamp();
@@ -150,7 +150,7 @@ pgstat_report_checksum_failures_in_db(Oid dboid, int failurecount)
 	 * common enough for that to be a problem.
 	 */
 	entry_ref =
-		pgstat_get_entry_ref_locked(PGSTAT_KIND_DATABASE, dboid, InvalidOid, false);
+		pgstat_get_entry_ref_locked(PGSTAT_KIND_DATABASE, dboid, InvalidOid, InvalidOid, false);
 
 	sharedent = (PgStatShared_Database *) entry_ref->shared_stats;
 	sharedent->stats.checksum_failures += failurecount;
@@ -242,7 +242,7 @@ PgStat_StatDBEntry *
 pgstat_fetch_stat_dbentry(Oid dboid)
 {
 	return (PgStat_StatDBEntry *)
-		pgstat_fetch_entry(PGSTAT_KIND_DATABASE, dboid, InvalidOid);
+		pgstat_fetch_entry(PGSTAT_KIND_DATABASE, dboid, InvalidOid, InvalidOid);
 }
 
 void
@@ -341,7 +341,7 @@ pgstat_prep_database_pending(Oid dboid)
 	Assert(!OidIsValid(dboid) || OidIsValid(MyDatabaseId));
 
 	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_DATABASE, dboid, InvalidOid,
-										  NULL);
+										  InvalidOid, NULL);
 
 	return entry_ref->pending;
 }
@@ -357,7 +357,7 @@ pgstat_reset_database_timestamp(Oid dboid, TimestampTz ts)
 	PgStatShared_Database *dbentry;
 
 	dbref = pgstat_get_entry_ref_locked(PGSTAT_KIND_DATABASE, MyDatabaseId, InvalidOid,
-										false);
+										InvalidOid, false);
 
 	dbentry = (PgStatShared_Database *) dbref->shared_stats;
 	dbentry->stats.stat_reset_timestamp = ts;
diff --git a/src/backend/utils/activity/pgstat_function.c b/src/backend/utils/activity/pgstat_function.c
index d26da551a4..440e44e300 100644
--- a/src/backend/utils/activity/pgstat_function.c
+++ b/src/backend/utils/activity/pgstat_function.c
@@ -46,7 +46,8 @@ pgstat_create_function(Oid proid)
 {
 	pgstat_create_transactional(PGSTAT_KIND_FUNCTION,
 								MyDatabaseId,
-								proid);
+								proid,
+								InvalidOid);
 }
 
 /*
@@ -61,7 +62,8 @@ pgstat_drop_function(Oid proid)
 {
 	pgstat_drop_transactional(PGSTAT_KIND_FUNCTION,
 							  MyDatabaseId,
-							  proid);
+							  proid,
+							  InvalidOid);
 }
 
 /*
@@ -86,6 +88,7 @@ pgstat_init_function_usage(FunctionCallInfo fcinfo,
 	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_FUNCTION,
 										  MyDatabaseId,
 										  fcinfo->flinfo->fn_oid,
+										  InvalidOid,
 										  &created_entry);
 
 	/*
@@ -113,7 +116,7 @@ pgstat_init_function_usage(FunctionCallInfo fcinfo,
 		if (!SearchSysCacheExists1(PROCOID, ObjectIdGetDatum(fcinfo->flinfo->fn_oid)))
 		{
 			pgstat_drop_entry(PGSTAT_KIND_FUNCTION, MyDatabaseId,
-							  fcinfo->flinfo->fn_oid);
+							  fcinfo->flinfo->fn_oid, InvalidOid);
 			ereport(ERROR, errcode(ERRCODE_UNDEFINED_FUNCTION),
 					errmsg("function call to dropped function"));
 		}
@@ -224,7 +227,7 @@ find_funcstat_entry(Oid func_id)
 {
 	PgStat_EntryRef *entry_ref;
 
-	entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_FUNCTION, MyDatabaseId, func_id);
+	entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_FUNCTION, MyDatabaseId, func_id, InvalidOid);
 
 	if (entry_ref)
 		return entry_ref->pending;
@@ -239,5 +242,5 @@ PgStat_StatFuncEntry *
 pgstat_fetch_stat_funcentry(Oid func_id)
 {
 	return (PgStat_StatFuncEntry *)
-		pgstat_fetch_entry(PGSTAT_KIND_FUNCTION, MyDatabaseId, func_id);
+		pgstat_fetch_entry(PGSTAT_KIND_FUNCTION, MyDatabaseId, func_id, InvalidOid);
 }
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index 8a3f7d434c..136dd6c85b 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -44,6 +44,7 @@ typedef struct TwoPhasePgStatRecord
 
 
 static PgStat_TableStatus *pgstat_prep_relation_pending(Oid rel_id, bool isshared);
+PgStat_StatRelFileNodeEntry *pgstat_prep_relfilenode_pending(RelFileLocator locator);
 static void add_tabstat_xact_level(PgStat_TableStatus *pgstat_info, int nest_level);
 static void ensure_tabstat_xact_level(PgStat_TableStatus *pgstat_info);
 static void save_truncdrop_counters(PgStat_TableXactStatus *trans, bool is_drop);
@@ -69,6 +70,7 @@ pgstat_copy_relation_stats(Relation dst, Relation src)
 	dst_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_RELATION,
 										  dst->rd_rel->relisshared ? InvalidOid : MyDatabaseId,
 										  RelationGetRelid(dst),
+										  InvalidOid,
 										  false);
 
 	dstshstats = (PgStatShared_Relation *) dst_ref->shared_stats;
@@ -170,7 +172,7 @@ pgstat_create_relation(Relation rel)
 {
 	pgstat_create_transactional(PGSTAT_KIND_RELATION,
 								rel->rd_rel->relisshared ? InvalidOid : MyDatabaseId,
-								RelationGetRelid(rel));
+								RelationGetRelid(rel), InvalidOid);
 }
 
 /*
@@ -184,7 +186,7 @@ pgstat_drop_relation(Relation rel)
 
 	pgstat_drop_transactional(PGSTAT_KIND_RELATION,
 							  rel->rd_rel->relisshared ? InvalidOid : MyDatabaseId,
-							  RelationGetRelid(rel));
+							  RelationGetRelid(rel), InvalidOid);
 
 	if (!pgstat_should_count_relation(rel))
 		return;
@@ -225,7 +227,7 @@ pgstat_report_vacuum(Oid tableoid, bool shared,
 
 	/* block acquiring lock for the same reason as pgstat_report_autovac() */
 	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_RELATION,
-											dboid, tableoid, false);
+											dboid, tableoid, InvalidOid, false);
 
 	shtabentry = (PgStatShared_Relation *) entry_ref->shared_stats;
 	tabentry = &shtabentry->stats;
@@ -318,6 +320,7 @@ pgstat_report_analyze(Relation rel,
 	/* block acquiring lock for the same reason as pgstat_report_autovac() */
 	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_RELATION, dboid,
 											RelationGetRelid(rel),
+											InvalidOid,
 											false);
 	/* can't get dropped while accessed */
 	Assert(entry_ref != NULL && entry_ref->shared_stats != NULL);
@@ -458,6 +461,19 @@ pgstat_fetch_stat_tabentry(Oid relid)
 	return pgstat_fetch_stat_tabentry_ext(IsSharedRelation(relid), relid);
 }
 
+/*
+ * Support function for the SQL-callable pgstat* functions. Returns
+ * the collected statistics for one relfilenode or NULL. NULL doesn't mean
+ * that the relfilenode doesn't exist, just that there are no statistics, so
+ * the caller should report zero instead.
+ */
+PgStat_StatRelFileNodeEntry *
+pgstat_fetch_stat_relfilenodeentry(Oid dboid, Oid spcOid, RelFileNumber relfile)
+{
+	return (PgStat_StatRelFileNodeEntry *)
+		pgstat_fetch_entry(PGSTAT_KIND_RELFILENODE, dboid, spcOid, relfile);
+}
+
 /*
  * More efficient version of pgstat_fetch_stat_tabentry(), allowing to specify
  * whether the to-be-accessed table is a shared relation or not.
@@ -468,7 +484,7 @@ pgstat_fetch_stat_tabentry_ext(bool shared, Oid reloid)
 	Oid			dboid = (shared ? InvalidOid : MyDatabaseId);
 
 	return (PgStat_StatTabEntry *)
-		pgstat_fetch_entry(PGSTAT_KIND_RELATION, dboid, reloid);
+		pgstat_fetch_entry(PGSTAT_KIND_RELATION, dboid, reloid, InvalidOid);
 }
 
 /*
@@ -491,10 +507,10 @@ find_tabstat_entry(Oid rel_id)
 	PgStat_TableStatus *tabentry = NULL;
 	PgStat_TableStatus *tablestatus = NULL;
 
-	entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, MyDatabaseId, rel_id);
+	entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, MyDatabaseId, rel_id, InvalidOid);
 	if (!entry_ref)
 	{
-		entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, InvalidOid, rel_id);
+		entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, InvalidOid, rel_id, InvalidOid);
 		if (!entry_ref)
 			return tablestatus;
 	}
@@ -881,6 +897,38 @@ pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
 	return true;
 }
 
+/*
+ * Flush out pending stats for the relfilenode entry
+ *
+ * If nowait is true, this function returns false if the lock could not be
+ * acquired immediately; otherwise it returns true.
+ */
+bool
+pgstat_relfilenode_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
+{
+	PgStatShared_RelFileNode *sharedent;
+	PgStat_StatRelFileNodeEntry *pendingent;
+
+	pendingent = (PgStat_StatRelFileNodeEntry *) entry_ref->pending;
+	sharedent = (PgStatShared_RelFileNode *) entry_ref->shared_stats;
+
+	if (!pgstat_lock_entry(entry_ref, nowait))
+		return false;
+
+#define PGSTAT_ACCUM_RELFILENODECOUNT(item)      \
+		(sharedent)->stats.item += (pendingent)->item
+
+	PGSTAT_ACCUM_RELFILENODECOUNT(blocks_fetched);
+	PGSTAT_ACCUM_RELFILENODECOUNT(blocks_hit);
+	PGSTAT_ACCUM_RELFILENODECOUNT(blocks_written);
+
+	pgstat_unlock_entry(entry_ref);
+
+	memset(pendingent, 0, sizeof(*pendingent));
+
+	return true;
+}
+
 void
 pgstat_relation_delete_pending_cb(PgStat_EntryRef *entry_ref)
 {
@@ -902,7 +950,7 @@ pgstat_prep_relation_pending(Oid rel_id, bool isshared)
 
 	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_RELATION,
 										  isshared ? InvalidOid : MyDatabaseId,
-										  rel_id, NULL);
+										  rel_id, InvalidOid, NULL);
 	pending = entry_ref->pending;
 	pending->id = rel_id;
 	pending->shared = isshared;
@@ -910,6 +958,56 @@ pgstat_prep_relation_pending(Oid rel_id, bool isshared)
 	return pending;
 }
 
+PgStat_StatRelFileNodeEntry *
+pgstat_prep_relfilenode_pending(RelFileLocator locator)
+{
+	PgStat_EntryRef *entry_ref;
+
+	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_RELFILENODE, locator.dbOid,
+										  locator.spcOid, locator.relNumber, NULL);
+
+	return entry_ref->pending;
+}
+
+void
+pgstat_report_relfilenode_blks_written(RelFileLocator locator)
+{
+	PgStat_StatRelFileNodeEntry *relfileentry = NULL;
+
+	relfileentry = pgstat_prep_relfilenode_pending(locator);
+
+	if (relfileentry)
+		relfileentry->blocks_written++;
+}
+
+void
+pgstat_report_relfilenode_buffer_read(Relation reln)
+{
+	PgStat_StatRelFileNodeEntry *relfileentry = NULL;
+
+	/* Also count in the relation stats, so that they survive a rewrite */
+	pgstat_count_buffer_read(reln);
+
+	relfileentry = pgstat_prep_relfilenode_pending(reln->rd_locator);
+
+	if (relfileentry)
+		relfileentry->blocks_fetched++;
+}
+
+void
+pgstat_report_relfilenode_buffer_hit(Relation reln)
+{
+	PgStat_StatRelFileNodeEntry *relfileentry = NULL;
+
+	/* Also count in the relation stats, so that they survive a rewrite */
+	pgstat_count_buffer_hit(reln);
+
+	relfileentry = pgstat_prep_relfilenode_pending(reln->rd_locator);
+
+	if (relfileentry)
+		relfileentry->blocks_hit++;
+}
+
 /*
  * add a new (sub)transaction state record
  */
diff --git a/src/backend/utils/activity/pgstat_replslot.c b/src/backend/utils/activity/pgstat_replslot.c
index da11b86744..2e68ed4a09 100644
--- a/src/backend/utils/activity/pgstat_replslot.c
+++ b/src/backend/utils/activity/pgstat_replslot.c
@@ -62,7 +62,7 @@ pgstat_reset_replslot(const char *name)
 	 */
 	if (SlotIsLogical(slot))
 		pgstat_reset(PGSTAT_KIND_REPLSLOT, InvalidOid,
-					 ReplicationSlotIndex(slot));
+					 ReplicationSlotIndex(slot), InvalidOid);
 
 	LWLockRelease(ReplicationSlotControlLock);
 }
@@ -82,7 +82,7 @@ pgstat_report_replslot(ReplicationSlot *slot, const PgStat_StatReplSlotEntry *re
 	PgStat_StatReplSlotEntry *statent;
 
 	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_REPLSLOT, InvalidOid,
-											ReplicationSlotIndex(slot), false);
+											ReplicationSlotIndex(slot), InvalidOid, false);
 	shstatent = (PgStatShared_ReplSlot *) entry_ref->shared_stats;
 	statent = &shstatent->stats;
 
@@ -116,7 +116,7 @@ pgstat_create_replslot(ReplicationSlot *slot)
 	Assert(LWLockHeldByMeInMode(ReplicationSlotAllocationLock, LW_EXCLUSIVE));
 
 	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_REPLSLOT, InvalidOid,
-											ReplicationSlotIndex(slot), false);
+											ReplicationSlotIndex(slot), InvalidOid, false);
 	shstatent = (PgStatShared_ReplSlot *) entry_ref->shared_stats;
 
 	/*
@@ -146,7 +146,7 @@ void
 pgstat_acquire_replslot(ReplicationSlot *slot)
 {
 	pgstat_get_entry_ref(PGSTAT_KIND_REPLSLOT, InvalidOid,
-						 ReplicationSlotIndex(slot), true, NULL);
+						 ReplicationSlotIndex(slot), InvalidOid, true, NULL);
 }
 
 /*
@@ -158,7 +158,7 @@ pgstat_drop_replslot(ReplicationSlot *slot)
 	Assert(LWLockHeldByMeInMode(ReplicationSlotAllocationLock, LW_EXCLUSIVE));
 
 	if (!pgstat_drop_entry(PGSTAT_KIND_REPLSLOT, InvalidOid,
-						   ReplicationSlotIndex(slot)))
+						   ReplicationSlotIndex(slot), InvalidOid))
 		pgstat_request_entry_refs_gc();
 }
 
@@ -178,7 +178,7 @@ pgstat_fetch_replslot(NameData slotname)
 
 	if (idx != -1)
 		slotentry = (PgStat_StatReplSlotEntry *) pgstat_fetch_entry(PGSTAT_KIND_REPLSLOT,
-																	InvalidOid, idx);
+																	InvalidOid, idx, InvalidOid);
 
 	LWLockRelease(ReplicationSlotControlLock);
 
@@ -210,6 +210,7 @@ pgstat_replslot_from_serialized_name_cb(const NameData *name, PgStat_HashKey *ke
 	key->kind = PGSTAT_KIND_REPLSLOT;
 	key->dboid = InvalidOid;
 	key->objoid = idx;
+	key->relfile = InvalidOid;
 
 	return true;
 }
diff --git a/src/backend/utils/activity/pgstat_shmem.c b/src/backend/utils/activity/pgstat_shmem.c
index ec93bf6902..5eb6a0483a 100644
--- a/src/backend/utils/activity/pgstat_shmem.c
+++ b/src/backend/utils/activity/pgstat_shmem.c
@@ -429,10 +429,10 @@ pgstat_get_entry_ref_cached(PgStat_HashKey key, PgStat_EntryRef **entry_ref_p)
  * if the entry is newly created, false otherwise.
  */
 PgStat_EntryRef *
-pgstat_get_entry_ref(PgStat_Kind kind, Oid dboid, Oid objoid, bool create,
-					 bool *created_entry)
+pgstat_get_entry_ref(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile,
+					 bool create, bool *created_entry)
 {
-	PgStat_HashKey key = {.kind = kind,.dboid = dboid,.objoid = objoid};
+	PgStat_HashKey key = {.kind = kind,.dboid = dboid,.objoid = objoid,.relfile = relfile};
 	PgStatShared_HashEntry *shhashent;
 	PgStatShared_Common *shheader = NULL;
 	PgStat_EntryRef *entry_ref;
@@ -645,12 +645,12 @@ pgstat_unlock_entry(PgStat_EntryRef *entry_ref)
  */
 PgStat_EntryRef *
 pgstat_get_entry_ref_locked(PgStat_Kind kind, Oid dboid, Oid objoid,
-							bool nowait)
+							RelFileNumber relfile, bool nowait)
 {
 	PgStat_EntryRef *entry_ref;
 
 	/* find shared table stats entry corresponding to the local entry */
-	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid, true, NULL);
+	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid, relfile, true, NULL);
 
 	/* lock the shared entry to protect the content, skip if failed */
 	if (!pgstat_lock_entry(entry_ref, nowait))
@@ -905,9 +905,9 @@ pgstat_drop_database_and_contents(Oid dboid)
  * pgstat_gc_entry_refs().
  */
 bool
-pgstat_drop_entry(PgStat_Kind kind, Oid dboid, Oid objoid)
+pgstat_drop_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile)
 {
-	PgStat_HashKey key = {.kind = kind,.dboid = dboid,.objoid = objoid};
+	PgStat_HashKey key = {.kind = kind,.dboid = dboid,.objoid = objoid,.relfile = relfile};
 	PgStatShared_HashEntry *shent;
 	bool		freed = true;
 
@@ -980,13 +980,12 @@ shared_stat_reset_contents(PgStat_Kind kind, PgStatShared_Common *header,
  * Reset one variable-numbered stats entry.
  */
 void
-pgstat_reset_entry(PgStat_Kind kind, Oid dboid, Oid objoid, TimestampTz ts)
+pgstat_reset_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile, TimestampTz ts)
 {
 	PgStat_EntryRef *entry_ref;
 
 	Assert(!pgstat_get_kind_info(kind)->fixed_amount);
-
-	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid, false, NULL);
+	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid, relfile, false, NULL);
 	if (!entry_ref || entry_ref->shared_entry->dropped)
 		return;
 
diff --git a/src/backend/utils/activity/pgstat_subscription.c b/src/backend/utils/activity/pgstat_subscription.c
index e06c92727e..417c81246d 100644
--- a/src/backend/utils/activity/pgstat_subscription.c
+++ b/src/backend/utils/activity/pgstat_subscription.c
@@ -30,7 +30,7 @@ pgstat_report_subscription_error(Oid subid, bool is_apply_error)
 	PgStat_BackendSubEntry *pending;
 
 	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_SUBSCRIPTION,
-										  InvalidOid, subid, NULL);
+										  InvalidOid, subid, InvalidOid, NULL);
 	pending = entry_ref->pending;
 
 	if (is_apply_error)
@@ -49,7 +49,7 @@ pgstat_report_subscription_conflict(Oid subid, ConflictType type)
 	PgStat_BackendSubEntry *pending;
 
 	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_SUBSCRIPTION,
-										  InvalidOid, subid, NULL);
+										  InvalidOid, subid, InvalidOid, NULL);
 	pending = entry_ref->pending;
 	pending->conflict_count[type]++;
 }
@@ -62,12 +62,12 @@ pgstat_create_subscription(Oid subid)
 {
 	/* Ensures that stats are dropped if transaction rolls back */
 	pgstat_create_transactional(PGSTAT_KIND_SUBSCRIPTION,
-								InvalidOid, subid);
+								InvalidOid, subid, InvalidOid);
 
 	/* Create and initialize the subscription stats entry */
-	pgstat_get_entry_ref(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid,
+	pgstat_get_entry_ref(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid, InvalidOid,
 						 true, NULL);
-	pgstat_reset_entry(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid, 0);
+	pgstat_reset_entry(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid, InvalidOid, 0);
 }
 
 /*
@@ -79,7 +79,7 @@ void
 pgstat_drop_subscription(Oid subid)
 {
 	pgstat_drop_transactional(PGSTAT_KIND_SUBSCRIPTION,
-							  InvalidOid, subid);
+							  InvalidOid, subid, InvalidOid);
 }
 
 /*
@@ -90,7 +90,7 @@ PgStat_StatSubEntry *
 pgstat_fetch_stat_subscription(Oid subid)
 {
 	return (PgStat_StatSubEntry *)
-		pgstat_fetch_entry(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid);
+		pgstat_fetch_entry(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid, InvalidOid);
 }
 
 /*
diff --git a/src/backend/utils/activity/pgstat_xact.c b/src/backend/utils/activity/pgstat_xact.c
index 1877d22f14..b25df5112b 100644
--- a/src/backend/utils/activity/pgstat_xact.c
+++ b/src/backend/utils/activity/pgstat_xact.c
@@ -30,7 +30,7 @@ static void AtEOXact_PgStat_DroppedStats(PgStat_SubXactStatus *xact_state, bool
 static void AtEOSubXact_PgStat_DroppedStats(PgStat_SubXactStatus *xact_state,
 											bool isCommit, int nestDepth);
 
-static PgStat_SubXactStatus *pgStatXactStack = NULL;
+PgStat_SubXactStatus *pgStatXactStack = NULL;
 
 
 /*
@@ -84,7 +84,7 @@ AtEOXact_PgStat_DroppedStats(PgStat_SubXactStatus *xact_state, bool isCommit)
 			 * Transaction that dropped an object committed. Drop the stats
 			 * too.
 			 */
-			if (!pgstat_drop_entry(it->kind, it->dboid, it->objoid))
+			if (!pgstat_drop_entry(it->kind, it->dboid, it->objoid, it->relfile))
 				not_freed_count++;
 		}
 		else if (!isCommit && pending->is_create)
@@ -93,7 +93,7 @@ AtEOXact_PgStat_DroppedStats(PgStat_SubXactStatus *xact_state, bool isCommit)
 			 * Transaction that created an object aborted. Drop the stats
 			 * associated with the object.
 			 */
-			if (!pgstat_drop_entry(it->kind, it->dboid, it->objoid))
+			if (!pgstat_drop_entry(it->kind, it->dboid, it->objoid, it->relfile))
 				not_freed_count++;
 		}
 
@@ -105,6 +105,33 @@ AtEOXact_PgStat_DroppedStats(PgStat_SubXactStatus *xact_state, bool isCommit)
 		pgstat_request_entry_refs_gc();
 }
 
+/*
+ * Remove a relfilenode stat from the list of stats to be dropped.
+ */
+void
+PgStat_RemoveRelFileNodeFromDroppedStats(PgStat_SubXactStatus *xact_state, RelFileLocator rlocator)
+{
+	dlist_mutable_iter iter;
+
+	if (dclist_count(&xact_state->pending_drops) == 0)
+		return;
+
+	dclist_foreach_modify(iter, &xact_state->pending_drops)
+	{
+		PgStat_PendingDroppedStatsItem *pending =
+			dclist_container(PgStat_PendingDroppedStatsItem, node, iter.cur);
+		xl_xact_stats_item *it = &pending->item;
+
+		if (it->kind == PGSTAT_KIND_RELFILENODE && it->dboid == rlocator.dbOid
+			&& it->objoid == rlocator.spcOid && it->relfile == rlocator.relNumber)
+		{
+			dclist_delete_from(&xact_state->pending_drops, &pending->node);
+			pfree(pending);
+			return;
+		}
+	}
+}
+
 /*
  * Called from access/transam/xact.c at subtransaction commit/abort.
  */
@@ -158,7 +185,7 @@ AtEOSubXact_PgStat_DroppedStats(PgStat_SubXactStatus *xact_state,
 			 * Subtransaction creating a new stats object aborted. Drop the
 			 * stats object.
 			 */
-			if (!pgstat_drop_entry(it->kind, it->dboid, it->objoid))
+			if (!pgstat_drop_entry(it->kind, it->dboid, it->objoid, it->relfile))
 				not_freed_count++;
 			pfree(pending);
 		}
@@ -320,7 +347,11 @@ pgstat_execute_transactional_drops(int ndrops, struct xl_xact_stats_item *items,
 	{
 		xl_xact_stats_item *it = &items[i];
 
-		if (!pgstat_drop_entry(it->kind, it->dboid, it->objoid))
+		/* relfilenode stats are dropped by pgstat_drop_transactional(), called from RelationDropStorage() */
+		if (it->kind == PGSTAT_KIND_RELFILENODE)
+			continue;
+
+		if (!pgstat_drop_entry(it->kind, it->dboid, it->objoid, it->relfile))
 			not_freed_count++;
 	}
 
@@ -329,7 +360,7 @@ pgstat_execute_transactional_drops(int ndrops, struct xl_xact_stats_item *items,
 }
 
 static void
-create_drop_transactional_internal(PgStat_Kind kind, Oid dboid, Oid objoid, bool is_create)
+create_drop_transactional_internal(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile, bool is_create)
 {
 	int			nest_level = GetCurrentTransactionNestLevel();
 	PgStat_SubXactStatus *xact_state;
@@ -342,6 +373,7 @@ create_drop_transactional_internal(PgStat_Kind kind, Oid dboid, Oid objoid, bool
 	drop->item.kind = kind;
 	drop->item.dboid = dboid;
 	drop->item.objoid = objoid;
+	drop->item.relfile = relfile;
 
 	dclist_push_tail(&xact_state->pending_drops, &drop->node);
 }
@@ -354,18 +386,18 @@ create_drop_transactional_internal(PgStat_Kind kind, Oid dboid, Oid objoid, bool
  * dropped.
  */
 void
-pgstat_create_transactional(PgStat_Kind kind, Oid dboid, Oid objoid)
+pgstat_create_transactional(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile)
 {
-	if (pgstat_get_entry_ref(kind, dboid, objoid, false, NULL))
+	if (pgstat_get_entry_ref(kind, dboid, objoid, relfile, false, NULL))
 	{
 		ereport(WARNING,
-				errmsg("resetting existing statistics for kind %s, db=%u, oid=%u",
-					   (pgstat_get_kind_info(kind))->name, dboid, objoid));
+				errmsg("resetting existing statistics for kind %s, db=%u, oid=%u, relfile=%u",
+					   (pgstat_get_kind_info(kind))->name, dboid, objoid, relfile));
 
-		pgstat_reset(kind, dboid, objoid);
+		pgstat_reset(kind, dboid, objoid, relfile);
 	}
 
-	create_drop_transactional_internal(kind, dboid, objoid, /* create */ true);
+	create_drop_transactional_internal(kind, dboid, objoid, relfile, /* create */ true);
 }
 
 /*
@@ -376,7 +408,7 @@ pgstat_create_transactional(PgStat_Kind kind, Oid dboid, Oid objoid)
  * alive.
  */
 void
-pgstat_drop_transactional(PgStat_Kind kind, Oid dboid, Oid objoid)
+pgstat_drop_transactional(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile)
 {
-	create_drop_transactional_internal(kind, dboid, objoid, /* create */ false);
+	create_drop_transactional_internal(kind, dboid, objoid, relfile, /* create */ false);
 }
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 97dc09ac0d..443687947a 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -106,6 +106,30 @@ PG_STAT_GET_RELENTRY_INT64(tuples_updated)
 /* pg_stat_get_vacuum_count */
 PG_STAT_GET_RELENTRY_INT64(vacuum_count)
 
+#define PG_STAT_GET_RELFILEENTRY_INT64(stat)						\
+Datum															\
+CppConcat(pg_stat_get_relfilenode_,stat)(PG_FUNCTION_ARGS)					\
+{																\
+	Oid			dboid = PG_GETARG_OID(0);						\
+	Oid			spcOid = PG_GETARG_OID(1);						\
+	RelFileNumber relfile = PG_GETARG_OID(2);					\
+	int64		result;											\
+	PgStat_StatRelFileNodeEntry *relfileentry;								\
+																\
+	if ((relfileentry = pgstat_fetch_stat_relfilenodeentry(dboid, spcOid, relfile)) == NULL)	\
+		result = 0;												\
+	else														\
+		result = (int64) (relfileentry->stat);						\
+																\
+	PG_RETURN_INT64(result);									\
+}
+
+/* pg_stat_get_relfilenode_blocks_written */
+PG_STAT_GET_RELFILEENTRY_INT64(blocks_written)
+
+/* pg_stat_get_blocks_written */
+PG_STAT_GET_RELENTRY_INT64(blocks_written)
+
 #define PG_STAT_GET_RELENTRY_TIMESTAMPTZ(stat)					\
 Datum															\
 CppConcat(pg_stat_get_,stat)(PG_FUNCTION_ARGS)					\
@@ -1752,7 +1776,7 @@ pg_stat_reset_single_table_counters(PG_FUNCTION_ARGS)
 	Oid			taboid = PG_GETARG_OID(0);
 	Oid			dboid = (IsSharedRelation(taboid) ? InvalidOid : MyDatabaseId);
 
-	pgstat_reset(PGSTAT_KIND_RELATION, dboid, taboid);
+	pgstat_reset(PGSTAT_KIND_RELATION, dboid, taboid, InvalidRelFileNumber);
 
 	PG_RETURN_VOID();
 }
@@ -1762,7 +1786,7 @@ pg_stat_reset_single_function_counters(PG_FUNCTION_ARGS)
 {
 	Oid			funcoid = PG_GETARG_OID(0);
 
-	pgstat_reset(PGSTAT_KIND_FUNCTION, MyDatabaseId, funcoid);
+	pgstat_reset(PGSTAT_KIND_FUNCTION, MyDatabaseId, funcoid, InvalidRelFileNumber);
 
 	PG_RETURN_VOID();
 }
@@ -1820,7 +1844,7 @@ pg_stat_reset_subscription_stats(PG_FUNCTION_ARGS)
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 					 errmsg("invalid subscription OID %u", subid)));
-		pgstat_reset(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid);
+		pgstat_reset(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid, InvalidRelFileNumber);
 	}
 
 	PG_RETURN_VOID();
@@ -2047,7 +2071,9 @@ pg_stat_have_stats(PG_FUNCTION_ARGS)
 	char	   *stats_type = text_to_cstring(PG_GETARG_TEXT_P(0));
 	Oid			dboid = PG_GETARG_OID(1);
 	Oid			objoid = PG_GETARG_OID(2);
+	RelFileNumber relfile = PG_GETARG_OID(3);
+
 	PgStat_Kind kind = pgstat_get_kind_from_str(stats_type);
 
-	PG_RETURN_BOOL(pgstat_have_entry(kind, dboid, objoid));
+	PG_RETURN_BOOL(pgstat_have_entry(kind, dboid, objoid, relfile));
 }
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index da661289c1..3614bae63c 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -21,7 +21,9 @@
 #include "access/sdir.h"
 #include "access/xact.h"
 #include "executor/tuptable.h"
+#include "pgstat.h"
 #include "storage/read_stream.h"
+#include "utils/pgstat_internal.h"
 #include "utils/rel.h"
 #include "utils/snapshot.h"
 
@@ -1624,6 +1626,23 @@ table_relation_set_new_filelocator(Relation rel,
 								   TransactionId *freezeXid,
 								   MultiXactId *minmulti)
 {
+	PgStat_StatRelFileNodeEntry *relfileentry;
+	PgStat_StatTabEntry *tabentry = NULL;
+	PgStat_EntryRef *entry_ref = NULL;
+	PgStatShared_Relation *shtabentry;
+
+	entry_ref = pgstat_get_entry_ref(PGSTAT_KIND_RELATION, rel->rd_rel->relisshared ? InvalidOid : MyDatabaseId, rel->rd_id, InvalidRelFileNumber, false, NULL);
+	if (entry_ref)
+	{
+		shtabentry = (PgStatShared_Relation *) entry_ref->shared_stats;
+		tabentry = &shtabentry->stats;
+	}
+
+	relfileentry = pgstat_fetch_stat_relfilenodeentry(rel->rd_locator.dbOid, rel->rd_locator.spcOid, rel->rd_locator.relNumber);
+
+	if (tabentry && relfileentry)
+		tabentry->blocks_written += relfileentry->blocks_written;
+
 	rel->rd_tableam->relation_set_new_filelocator(rel, newrlocator,
 												  persistence, freezeXid,
 												  minmulti);
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 6d4439f052..3b9ed65ff6 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -284,6 +284,7 @@ typedef struct xl_xact_stats_item
 	int			kind;
 	Oid			dboid;
 	Oid			objoid;
+	RelFileNumber relfile;
 } xl_xact_stats_item;
 
 typedef struct xl_xact_stats_items
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index ff5436acac..c098d58753 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5407,6 +5407,14 @@
   proname => 'pg_stat_get_tuples_updated', provolatile => 's',
   proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
   prosrc => 'pg_stat_get_tuples_updated' },
+{ oid => '9280', descr => 'statistics: number of blocks written for a relfilenode',
+  proname => 'pg_stat_get_relfilenode_blocks_written', provolatile => 's',
+  proparallel => 'r',
+  proargtypes => 'oid oid oid',
+  prorettype => 'int8',
+  proallargtypes => '{oid,oid,oid,int8}',
+  proargmodes => '{i,i,i,o}',
+  prosrc => 'pg_stat_get_relfilenode_blocks_written' },
 { oid => '1933', descr => 'statistics: number of tuples deleted',
   proname => 'pg_stat_get_tuples_deleted', provolatile => 's',
   proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
@@ -5446,6 +5454,10 @@
   proname => 'pg_stat_get_blocks_hit', provolatile => 's', proparallel => 'r',
   prorettype => 'int8', proargtypes => 'oid',
   prosrc => 'pg_stat_get_blocks_hit' },
+{ oid => '8438', descr => 'statistics: number of blocks written',
+  proname => 'pg_stat_get_blocks_written', provolatile => 's', proparallel => 'r',
+  prorettype => 'int8', proargtypes => 'oid',
+  prosrc => 'pg_stat_get_blocks_written' },
 { oid => '2781', descr => 'statistics: last manual vacuum time for a table',
   proname => 'pg_stat_get_last_vacuum_time', provolatile => 's',
   proparallel => 'r', prorettype => 'timestamptz', proargtypes => 'oid',
@@ -5532,7 +5544,7 @@
 
 { oid => '6230', descr => 'statistics: check if a stats object exists',
   proname => 'pg_stat_have_stats', provolatile => 'v', proparallel => 'r',
-  prorettype => 'bool', proargtypes => 'text oid oid',
+  prorettype => 'bool', proargtypes => 'text oid oid oid',
   prosrc => 'pg_stat_have_stats' },
 
 { oid => '6231', descr => 'statistics: information about subscription stats',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index be2c91168a..afb913a336 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -16,6 +16,7 @@
 #include "portability/instr_time.h"
 #include "postmaster/pgarch.h"	/* for MAX_XFN_CHARS */
 #include "replication/conflict.h"
+#include "storage/relfilelocator.h"
 #include "utils/backend_progress.h" /* for backward compatibility */
 #include "utils/backend_status.h"	/* for backward compatibility */
 #include "utils/relcache.h"
@@ -46,17 +47,18 @@
 /* stats for variable-numbered objects */
 #define PGSTAT_KIND_DATABASE	1	/* database-wide statistics */
 #define PGSTAT_KIND_RELATION	2	/* per-table statistics */
-#define PGSTAT_KIND_FUNCTION	3	/* per-function statistics */
-#define PGSTAT_KIND_REPLSLOT	4	/* per-slot statistics */
-#define PGSTAT_KIND_SUBSCRIPTION	5	/* per-subscription statistics */
+#define PGSTAT_KIND_RELFILENODE	3	/* per-relfilenode statistics */
+#define PGSTAT_KIND_FUNCTION	4	/* per-function statistics */
+#define PGSTAT_KIND_REPLSLOT	5	/* per-slot statistics */
+#define PGSTAT_KIND_SUBSCRIPTION	6	/* per-subscription statistics */
 
 /* stats for fixed-numbered objects */
-#define PGSTAT_KIND_ARCHIVER	6
-#define PGSTAT_KIND_BGWRITER	7
-#define PGSTAT_KIND_CHECKPOINTER	8
-#define PGSTAT_KIND_IO	9
-#define PGSTAT_KIND_SLRU	10
-#define PGSTAT_KIND_WAL	11
+#define PGSTAT_KIND_ARCHIVER	7
+#define PGSTAT_KIND_BGWRITER	8
+#define PGSTAT_KIND_CHECKPOINTER	9
+#define PGSTAT_KIND_IO	10
+#define PGSTAT_KIND_SLRU	11
+#define PGSTAT_KIND_WAL	12
 
 #define PGSTAT_KIND_BUILTIN_MIN PGSTAT_KIND_DATABASE
 #define PGSTAT_KIND_BUILTIN_MAX PGSTAT_KIND_WAL
@@ -450,6 +452,7 @@ typedef struct PgStat_StatTabEntry
 
 	PgStat_Counter blocks_fetched;
 	PgStat_Counter blocks_hit;
+	PgStat_Counter blocks_written;
 
 	TimestampTz last_vacuum_time;	/* user initiated vacuum */
 	PgStat_Counter vacuum_count;
@@ -461,6 +464,13 @@ typedef struct PgStat_StatTabEntry
 	PgStat_Counter autoanalyze_count;
 } PgStat_StatTabEntry;
 
+typedef struct PgStat_StatRelFileNodeEntry
+{
+	PgStat_Counter blocks_fetched;
+	PgStat_Counter blocks_hit;
+	PgStat_Counter blocks_written;
+} PgStat_StatRelFileNodeEntry;
+
 typedef struct PgStat_WalStats
 {
 	PgStat_Counter wal_records;
@@ -511,7 +521,7 @@ extern long pgstat_report_stat(bool force);
 extern void pgstat_force_next_flush(void);
 
 extern void pgstat_reset_counters(void);
-extern void pgstat_reset(PgStat_Kind kind, Oid dboid, Oid objoid);
+extern void pgstat_reset(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile);
 extern void pgstat_reset_of_kind(PgStat_Kind kind);
 
 /* stats accessors */
@@ -520,7 +530,7 @@ extern TimestampTz pgstat_get_stat_snapshot_timestamp(bool *have_snapshot);
 
 /* helpers */
 extern PgStat_Kind pgstat_get_kind_from_str(char *kind_str);
-extern bool pgstat_have_entry(PgStat_Kind kind, Oid dboid, Oid objoid);
+extern bool pgstat_have_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile);
 
 
 /*
@@ -629,6 +639,10 @@ extern void pgstat_report_analyze(Relation rel,
 								  PgStat_Counter livetuples, PgStat_Counter deadtuples,
 								  bool resetcounter);
 
+extern void pgstat_report_relfilenode_blks_written(RelFileLocator locator);
+extern void pgstat_report_relfilenode_buffer_read(Relation reln);
+extern void pgstat_report_relfilenode_buffer_hit(Relation reln);
+
 /*
  * If stats are enabled, but pending data hasn't been prepared yet, call
  * pgstat_assoc_relation() to do so. See its comment for why this is done
@@ -688,6 +702,7 @@ extern void pgstat_twophase_postabort(TransactionId xid, uint16 info,
 									  void *recdata, uint32 len);
 
 extern PgStat_StatTabEntry *pgstat_fetch_stat_tabentry(Oid relid);
+extern PgStat_StatRelFileNodeEntry *pgstat_fetch_stat_relfilenodeentry(Oid dboid, Oid spcOid, RelFileNumber relfile);
 extern PgStat_StatTabEntry *pgstat_fetch_stat_tabentry_ext(bool shared,
 														   Oid reloid);
 extern PgStat_TableStatus *find_tabstat_entry(Oid rel_id);
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index fb132e439d..47c448d5de 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -53,7 +53,8 @@ typedef struct PgStat_HashKey
 {
 	PgStat_Kind kind;			/* statistics entry kind */
 	Oid			dboid;			/* database ID. InvalidOid for shared objects. */
-	Oid			objoid;			/* object ID, either table or function. */
+	Oid			objoid;			/* object ID (table, function, or tablespace for relfilenode stats). */
+	RelFileNumber relfile;		/* relfilenumber for RelFileLocator. */
 } PgStat_HashKey;
 
 /*
@@ -390,6 +391,12 @@ typedef struct PgStatShared_Relation
 	PgStat_StatTabEntry stats;
 } PgStatShared_Relation;
 
+typedef struct PgStatShared_RelFileNode
+{
+	PgStatShared_Common header;
+	PgStat_StatRelFileNodeEntry stats;
+} PgStatShared_RelFileNode;
+
 typedef struct PgStatShared_Function
 {
 	PgStatShared_Common header;
@@ -528,6 +535,9 @@ static inline void *pgstat_get_entry_data(PgStat_Kind kind, PgStatShared_Common
 static inline void *pgstat_get_custom_shmem_data(PgStat_Kind kind);
 static inline void *pgstat_get_custom_snapshot_data(PgStat_Kind kind);
 
+extern PgStat_SubXactStatus *pgStatXactStack;
+extern void PgStat_RemoveRelFileNodeFromDroppedStats(PgStat_SubXactStatus *xact_state, RelFileLocator rlocator);
+
 
 /*
  * Functions in pgstat.c
@@ -544,10 +554,12 @@ extern void pgstat_assert_is_up(void);
 #endif
 
 extern void pgstat_delete_pending_entry(PgStat_EntryRef *entry_ref);
-extern PgStat_EntryRef *pgstat_prep_pending_entry(PgStat_Kind kind, Oid dboid, Oid objoid, bool *created_entry);
-extern PgStat_EntryRef *pgstat_fetch_pending_entry(PgStat_Kind kind, Oid dboid, Oid objoid);
+extern PgStat_EntryRef *pgstat_prep_pending_entry(PgStat_Kind kind, Oid dboid,
+												  Oid objoid, RelFileNumber relfile,
+												  bool *created_entry);
+extern PgStat_EntryRef *pgstat_fetch_pending_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile);
 
-extern void *pgstat_fetch_entry(PgStat_Kind kind, Oid dboid, Oid objoid);
+extern void *pgstat_fetch_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile);
 extern void pgstat_snapshot_fixed(PgStat_Kind kind);
 
 
@@ -619,6 +631,7 @@ extern void AtPrepare_PgStat_Relations(PgStat_SubXactStatus *xact_state);
 extern void PostPrepare_PgStat_Relations(PgStat_SubXactStatus *xact_state);
 
 extern bool pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait);
+extern bool pgstat_relfilenode_flush_cb(PgStat_EntryRef *entry_ref, bool nowait);
 extern void pgstat_relation_delete_pending_cb(PgStat_EntryRef *entry_ref);
 
 
@@ -639,15 +652,16 @@ extern void pgstat_attach_shmem(void);
 extern void pgstat_detach_shmem(void);
 
 extern PgStat_EntryRef *pgstat_get_entry_ref(PgStat_Kind kind, Oid dboid, Oid objoid,
-											 bool create, bool *created_entry);
+											 RelFileNumber relfile, bool create,
+											 bool *created_entry);
 extern bool pgstat_lock_entry(PgStat_EntryRef *entry_ref, bool nowait);
 extern bool pgstat_lock_entry_shared(PgStat_EntryRef *entry_ref, bool nowait);
 extern void pgstat_unlock_entry(PgStat_EntryRef *entry_ref);
-extern bool pgstat_drop_entry(PgStat_Kind kind, Oid dboid, Oid objoid);
+extern bool pgstat_drop_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile);
 extern void pgstat_drop_all_entries(void);
 extern PgStat_EntryRef *pgstat_get_entry_ref_locked(PgStat_Kind kind, Oid dboid, Oid objoid,
-													bool nowait);
-extern void pgstat_reset_entry(PgStat_Kind kind, Oid dboid, Oid objoid, TimestampTz ts);
+													RelFileNumber relfile, bool nowait);
+extern void pgstat_reset_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile, TimestampTz ts);
 extern void pgstat_reset_entries_of_kind(PgStat_Kind kind, TimestampTz ts);
 extern void pgstat_reset_matching_entries(bool (*do_reset) (PgStatShared_HashEntry *, Datum),
 										  Datum match_data,
@@ -694,8 +708,8 @@ extern void pgstat_subscription_reset_timestamp_cb(PgStatShared_Common *header,
  */
 
 extern PgStat_SubXactStatus *pgstat_get_xact_stack_level(int nest_level);
-extern void pgstat_drop_transactional(PgStat_Kind kind, Oid dboid, Oid objoid);
-extern void pgstat_create_transactional(PgStat_Kind kind, Oid dboid, Oid objoid);
+extern void pgstat_drop_transactional(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile);
+extern void pgstat_create_transactional(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile);
 
 
 /*
diff --git a/src/test/recovery/t/029_stats_restart.pl b/src/test/recovery/t/029_stats_restart.pl
index 93a7209f69..f9988b5028 100644
--- a/src/test/recovery/t/029_stats_restart.pl
+++ b/src/test/recovery/t/029_stats_restart.pl
@@ -40,10 +40,10 @@ trigger_funcrel_stat();
 
 # verify stats objects exist
 my $sect = "initial";
-is(have_stats('database', $dboid, 0), 't', "$sect: db stats do exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats('database', $dboid, 0, 0), 't', "$sect: db stats do exist");
+is(have_stats('function', $dboid, $funcoid, 0),
 	't', "$sect: function stats do exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats('relation', $dboid, $tableoid, 0),
 	't', "$sect: relation stats do exist");
 
 # regular shutdown
@@ -64,10 +64,10 @@ copy($og_stats, $statsfile) or die "Copy failed: $!";
 $node->start;
 
 $sect = "copy";
-is(have_stats('database', $dboid, 0), 't', "$sect: db stats do exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats('database', $dboid, 0, 0), 't', "$sect: db stats do exist");
+is(have_stats('function', $dboid, $funcoid, 0),
 	't', "$sect: function stats do exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats('relation', $dboid, $tableoid, 0),
 	't', "$sect: relation stats do exist");
 
 $node->stop('immediate');
@@ -81,10 +81,10 @@ $node->start;
 
 # stats should have been discarded
 $sect = "post immediate";
-is(have_stats('database', $dboid, 0), 'f', "$sect: db stats do not exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats('database', $dboid, 0, 0), 'f', "$sect: db stats do not exist");
+is(have_stats('function', $dboid, $funcoid, 0),
 	'f', "$sect: function stats do exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats('relation', $dboid, $tableoid, 0),
 	'f', "$sect: relation stats do not exist");
 
 # get rid of backup statsfile
@@ -95,10 +95,10 @@ unlink $statsfile or die "cannot unlink $statsfile $!";
 trigger_funcrel_stat();
 
 $sect = "post immediate, new";
-is(have_stats('database', $dboid, 0), 't', "$sect: db stats do exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats('database', $dboid, 0, 0), 't', "$sect: db stats do exist");
+is(have_stats('function', $dboid, $funcoid, 0),
 	't', "$sect: function stats do exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats('relation', $dboid, $tableoid, 0),
 	't', "$sect: relation stats do exist");
 
 # regular shutdown
@@ -114,10 +114,10 @@ $node->start;
 
 # no stats present due to invalid stats file
 $sect = "invalid_overwrite";
-is(have_stats('database', $dboid, 0), 'f', "$sect: db stats do not exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats('database', $dboid, 0, 0), 'f', "$sect: db stats do not exist");
+is(have_stats('function', $dboid, $funcoid, 0),
 	'f', "$sect: function stats do not exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats('relation', $dboid, $tableoid, 0),
 	'f', "$sect: relation stats do not exist");
 
 
@@ -130,10 +130,10 @@ append_file($og_stats, "XYZ");
 $node->start;
 
 $sect = "invalid_append";
-is(have_stats('database', $dboid, 0), 'f', "$sect: db stats do not exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats('database', $dboid, 0, 0), 'f', "$sect: db stats do not exist");
+is(have_stats('function', $dboid, $funcoid, 0),
 	'f', "$sect: function stats do not exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats('relation', $dboid, $tableoid, 0),
 	'f', "$sect: relation stats do not exist");
 
 
@@ -292,10 +292,10 @@ sub trigger_funcrel_stat
 
 sub have_stats
 {
-	my ($kind, $dboid, $objoid) = @_;
+	my ($kind, $dboid, $objoid, $relfile) = @_;
 
 	return $node->safe_psql($connect_db,
-		"SELECT pg_stat_have_stats('$kind', $dboid, $objoid)");
+		"SELECT pg_stat_have_stats('$kind', $dboid, $objoid, $relfile)");
 }
 
 sub overwrite_file
diff --git a/src/test/recovery/t/030_stats_cleanup_replica.pl b/src/test/recovery/t/030_stats_cleanup_replica.pl
index 74b516cc7c..317df24c4f 100644
--- a/src/test/recovery/t/030_stats_cleanup_replica.pl
+++ b/src/test/recovery/t/030_stats_cleanup_replica.pl
@@ -179,9 +179,9 @@ sub test_standby_func_tab_stats_status
 	my %stats;
 
 	$stats{rel} = $node_standby->safe_psql($connect_db,
-		"SELECT pg_stat_have_stats('relation', $dboid, $tableoid)");
+		"SELECT pg_stat_have_stats('relation', $dboid, $tableoid, 0)");
 	$stats{func} = $node_standby->safe_psql($connect_db,
-		"SELECT pg_stat_have_stats('function', $dboid, $funcoid)");
+		"SELECT pg_stat_have_stats('function', $dboid, $funcoid, 0)");
 
 	is_deeply(\%stats, \%expected, "$sect: standby stats as expected");
 
@@ -194,7 +194,7 @@ sub test_standby_db_stats_status
 	my ($connect_db, $dboid, $present) = @_;
 
 	is( $node_standby->safe_psql(
-			$connect_db, "SELECT pg_stat_have_stats('database', $dboid, 0)"),
+			$connect_db, "SELECT pg_stat_have_stats('database', $dboid, 0, 0)"),
 		$present,
 		"$sect: standby db stats as expected");
 }
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index a1626f3fae..a9b3f36cd9 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2340,6 +2340,11 @@ pg_statio_all_tables| SELECT c.oid AS relid,
     n.nspname AS schemaname,
     c.relname,
     (pg_stat_get_blocks_fetched(c.oid) - pg_stat_get_blocks_hit(c.oid)) AS heap_blks_read,
+    (pg_stat_get_blocks_written(c.oid) + pg_stat_get_relfilenode_blocks_written(d.oid,
+        CASE
+            WHEN (c.reltablespace <> (0)::oid) THEN c.reltablespace
+            ELSE d.dattablespace
+        END, c.relfilenode)) AS heap_blks_written,
     pg_stat_get_blocks_hit(c.oid) AS heap_blks_hit,
     i.idx_blks_read,
     i.idx_blks_hit,
@@ -2347,7 +2352,8 @@ pg_statio_all_tables| SELECT c.oid AS relid,
     pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit,
     x.idx_blks_read AS tidx_blks_read,
     x.idx_blks_hit AS tidx_blks_hit
-   FROM ((((pg_class c
+   FROM pg_database d,
+    ((((pg_class c
      LEFT JOIN pg_class t ON ((c.reltoastrelid = t.oid)))
      LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
      LEFT JOIN LATERAL ( SELECT (sum((pg_stat_get_blocks_fetched(pg_index.indexrelid) - pg_stat_get_blocks_hit(pg_index.indexrelid))))::bigint AS idx_blks_read,
@@ -2358,7 +2364,7 @@ pg_statio_all_tables| SELECT c.oid AS relid,
             (sum(pg_stat_get_blocks_hit(pg_index.indexrelid)))::bigint AS idx_blks_hit
            FROM pg_index
           WHERE (pg_index.indrelid = t.oid)) x ON (true))
-  WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"]));
+  WHERE ((c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"])) AND (d.datname = current_database()));
 pg_statio_sys_indexes| SELECT relid,
     indexrelid,
     schemaname,
@@ -2379,6 +2385,7 @@ pg_statio_sys_tables| SELECT relid,
     schemaname,
     relname,
     heap_blks_read,
+    heap_blks_written,
     heap_blks_hit,
     idx_blks_read,
     idx_blks_hit,
@@ -2408,6 +2415,7 @@ pg_statio_user_tables| SELECT relid,
     schemaname,
     relname,
     heap_blks_read,
+    heap_blks_written,
     heap_blks_hit,
     idx_blks_read,
     idx_blks_hit,
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 6e08898b18..eff0c9372c 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1111,23 +1111,23 @@ ROLLBACK;
 -- pg_stat_have_stats behavior
 ----
 -- fixed-numbered stats exist
-SELECT pg_stat_have_stats('bgwriter', 0, 0);
+SELECT pg_stat_have_stats('bgwriter', 0, 0, 0);
  pg_stat_have_stats 
 --------------------
  t
 (1 row)
 
 -- unknown stats kinds error out
-SELECT pg_stat_have_stats('zaphod', 0, 0);
+SELECT pg_stat_have_stats('zaphod', 0, 0, 0);
 ERROR:  invalid statistics kind: "zaphod"
 -- db stats have objoid 0
-SELECT pg_stat_have_stats('database', :dboid, 1);
+SELECT pg_stat_have_stats('database', :dboid, 1, 0);
  pg_stat_have_stats 
 --------------------
  f
 (1 row)
 
-SELECT pg_stat_have_stats('database', :dboid, 0);
+SELECT pg_stat_have_stats('database', :dboid, 0, 0);
  pg_stat_have_stats 
 --------------------
  t
@@ -1144,21 +1144,21 @@ select a from stats_test_tab1 where a = 3;
  3
 (1 row)
 
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  t
 (1 row)
 
 -- pg_stat_have_stats returns false for dropped index with stats
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  t
 (1 row)
 
 DROP index stats_test_idx1;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  f
@@ -1174,14 +1174,14 @@ select a from stats_test_tab1 where a = 3;
  3
 (1 row)
 
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  t
 (1 row)
 
 ROLLBACK;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  f
@@ -1196,7 +1196,7 @@ select a from stats_test_tab1 where a = 3;
  3
 (1 row)
 
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  t
@@ -1204,7 +1204,7 @@ SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
 
 REINDEX index CONCURRENTLY stats_test_idx1;
 -- false for previous oid
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  f
@@ -1212,7 +1212,7 @@ SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
 
 -- true for new oid
 SELECT 'stats_test_idx1'::regclass::oid AS stats_test_idx1_oid \gset
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  t
@@ -1220,7 +1220,7 @@ SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
 
 -- pg_stat_have_stats returns true for a rolled back drop index with stats
 BEGIN;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  t
@@ -1228,7 +1228,7 @@ SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
 
 DROP index stats_test_idx1;
 ROLLBACK;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  t
@@ -1513,7 +1513,7 @@ SELECT :io_sum_bulkwrite_strategy_extends_after > :io_sum_bulkwrite_strategy_ext
 (1 row)
 
 -- Test IO stats reset
-SELECT pg_stat_have_stats('io', 0, 0);
+SELECT pg_stat_have_stats('io', 0, 0, 0);
  pg_stat_have_stats 
 --------------------
  t
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index d8ac0d06f4..5a40779989 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -539,12 +539,12 @@ ROLLBACK;
 -- pg_stat_have_stats behavior
 ----
 -- fixed-numbered stats exist
-SELECT pg_stat_have_stats('bgwriter', 0, 0);
+SELECT pg_stat_have_stats('bgwriter', 0, 0, 0);
 -- unknown stats kinds error out
-SELECT pg_stat_have_stats('zaphod', 0, 0);
+SELECT pg_stat_have_stats('zaphod', 0, 0, 0);
 -- db stats have objoid 0
-SELECT pg_stat_have_stats('database', :dboid, 1);
-SELECT pg_stat_have_stats('database', :dboid, 0);
+SELECT pg_stat_have_stats('database', :dboid, 1, 0);
+SELECT pg_stat_have_stats('database', :dboid, 0, 0);
 
 -- pg_stat_have_stats returns true for committed index creation
 CREATE table stats_test_tab1 as select generate_series(1,10) a;
@@ -552,40 +552,40 @@ CREATE index stats_test_idx1 on stats_test_tab1(a);
 SELECT 'stats_test_idx1'::regclass::oid AS stats_test_idx1_oid \gset
 SET enable_seqscan TO off;
 select a from stats_test_tab1 where a = 3;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 
 -- pg_stat_have_stats returns false for dropped index with stats
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 DROP index stats_test_idx1;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 
 -- pg_stat_have_stats returns false for rolled back index creation
 BEGIN;
 CREATE index stats_test_idx1 on stats_test_tab1(a);
 SELECT 'stats_test_idx1'::regclass::oid AS stats_test_idx1_oid \gset
 select a from stats_test_tab1 where a = 3;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 ROLLBACK;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 
 -- pg_stat_have_stats returns true for reindex CONCURRENTLY
 CREATE index stats_test_idx1 on stats_test_tab1(a);
 SELECT 'stats_test_idx1'::regclass::oid AS stats_test_idx1_oid \gset
 select a from stats_test_tab1 where a = 3;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 REINDEX index CONCURRENTLY stats_test_idx1;
 -- false for previous oid
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 -- true for new oid
 SELECT 'stats_test_idx1'::regclass::oid AS stats_test_idx1_oid \gset
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 
 -- pg_stat_have_stats returns true for a rolled back drop index with stats
 BEGIN;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 DROP index stats_test_idx1;
 ROLLBACK;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 
 -- put enable_seqscan back to on
 SET enable_seqscan TO on;
@@ -759,7 +759,7 @@ SELECT sum(extends) AS io_sum_bulkwrite_strategy_extends_after
 SELECT :io_sum_bulkwrite_strategy_extends_after > :io_sum_bulkwrite_strategy_extends_before;
 
 -- Test IO stats reset
-SELECT pg_stat_have_stats('io', 0, 0);
+SELECT pg_stat_have_stats('io', 0, 0, 0);
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_pre_reset
   FROM pg_stat_io \gset
 SELECT pg_stat_reset_shared('io');
diff --git a/src/test/subscription/t/026_stats.pl b/src/test/subscription/t/026_stats.pl
index 6b6a5b0b1b..89ebf5aa2c 100644
--- a/src/test/subscription/t/026_stats.pl
+++ b/src/test/subscription/t/026_stats.pl
@@ -290,7 +290,7 @@ $node_subscriber->safe_psql($db, qq(DROP SUBSCRIPTION $sub1_name));
 
 # Subscription stats for sub1 should be gone
 is( $node_subscriber->safe_psql(
-		$db, qq(SELECT pg_stat_have_stats('subscription', 0, $sub1_oid))),
+		$db, qq(SELECT pg_stat_have_stats('subscription', 0, $sub1_oid, 0))),
 	qq(f),
 	qq(Subscription stats for subscription '$sub1_name' should be removed.));
 
@@ -309,7 +309,7 @@ DROP SUBSCRIPTION $sub2_name;
 
 # Subscription stats for sub2 should be gone
 is( $node_subscriber->safe_psql(
-		$db, qq(SELECT pg_stat_have_stats('subscription', 0, $sub2_oid))),
+		$db, qq(SELECT pg_stat_have_stats('subscription', 0, $sub2_oid, 0))),
 	qq(f),
 	qq(Subscription stats for subscription '$sub2_name' should be removed.));
 
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index df3f336bec..1682273876 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2126,6 +2126,7 @@ PgStatShared_InjectionPoint
 PgStatShared_InjectionPointFixed
 PgStatShared_IO
 PgStatShared_Relation
+PgStatShared_RelFileNode
 PgStatShared_ReplSlot
 PgStatShared_SLRU
 PgStatShared_Subscription
-- 
2.34.1

#22Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Bertrand Drouvot (#21)
1 attachment(s)
Re: relfilenode statistics

Hi,

On Thu, Sep 05, 2024 at 04:48:36AM +0000, Bertrand Drouvot wrote:

Please find attached a mandatory rebase.

In passing: based on the previous discussion (and given that we don't have
the relation OID when writing buffers out), do you see another approach
than the one this patch is implementing?

Attached v5, mandatory rebase due to recent changes in the stats area.
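
For readers following along, the constraint mentioned above (no relation OID
when writing buffers out) can be illustrated with a small standalone sketch.
It is not code from the patch: every Toy* type and toy_* function below is
hypothetical, and a fixed-size array stands in for the shared stats hash
table. It only shows that a write counter can be kept per
(database, tablespace, relfile number), which is the information a
RelFileLocator carries in the buffer-flush path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for PostgreSQL's Oid and RelFileNumber types. */
typedef uint32_t Oid;
typedef uint32_t RelFileNumber;

/* Toy key: (database, tablespace, relfile number) identifies one relfilenode. */
typedef struct ToyRelFileKey
{
	Oid			dboid;
	Oid			spcoid;
	RelFileNumber relfile;
} ToyRelFileKey;

typedef struct ToyRelFileStats
{
	ToyRelFileKey key;
	uint64_t	blocks_written;
	int			in_use;
} ToyRelFileStats;

#define TOY_MAX_ENTRIES 16

/* Fixed-size table standing in for the shared stats hash table. */
static ToyRelFileStats toy_entries[TOY_MAX_ENTRIES];

static int
toy_key_equal(ToyRelFileKey a, ToyRelFileKey b)
{
	return a.dboid == b.dboid && a.spcoid == b.spcoid && a.relfile == b.relfile;
}

/* Look up an entry; create it when "create" is nonzero and a slot is free. */
ToyRelFileStats *
toy_get_entry(ToyRelFileKey key, int create)
{
	ToyRelFileStats *free_slot = NULL;

	for (size_t i = 0; i < TOY_MAX_ENTRIES; i++)
	{
		if (toy_entries[i].in_use && toy_key_equal(toy_entries[i].key, key))
			return &toy_entries[i];
		if (!toy_entries[i].in_use && free_slot == NULL)
			free_slot = &toy_entries[i];
	}

	if (!create || free_slot == NULL)
		return NULL;

	free_slot->key = key;
	free_slot->blocks_written = 0;
	free_slot->in_use = 1;
	return free_slot;
}

/* Count one block write; only the locator fields are needed, no relation OID. */
void
toy_report_block_written(ToyRelFileKey key)
{
	ToyRelFileStats *entry = toy_get_entry(key, 1);

	assert(entry != NULL);		/* the toy table would be full otherwise */
	entry->blocks_written++;
}

/* Read the counter; a missing entry is reported as zero. */
uint64_t
toy_blocks_written(ToyRelFileKey key)
{
	ToyRelFileStats *entry = toy_get_entry(key, 0);

	return entry ? entry->blocks_written : 0;
}
```

Reporting a missing entry as zero mirrors the comment in the patch that a
NULL result only means "no statistics", not that the relfilenode is absent.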

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v5-0001-Provide-relfilenode-statistics.patch (text/x-diff; charset=utf-8)
From 027a0df0d560e51d62675bda750e84165097812a Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Thu, 16 Nov 2023 02:30:01 +0000
Subject: [PATCH v5] Provide relfilenode statistics
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

We currently don’t have write counters for relations.
The reason is that we don’t have the relation OID when writing buffers out.
Tracking writes per relfilenode would allow us to track/consolidate writes per
relation.

Relfilenode stats are also beneficial for the "Split index and table statistics
into different types of stats" work in progress: it would allow us to avoid
additional branches in some situations.

=== Remarks ===

This is a POC patch. There is still work to do: there are more places where we
should add relfilenode counters, and more APIs to create to retrieve the
relfilenode stats. The patch takes care of rewrites generated by TRUNCATE, but
there is more to care about, like CLUSTER and VACUUM FULL.

The new logic to retrieve stats in pg_statio_all_tables has been implemented
only for the new blocks_written stat (we'd need to do the same for the existing
buffer read / buffer hit if we agree on the approach implemented here).
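
As a rough illustration of that retrieval logic (a standalone sketch, not the
patch's code; the ToyTable type and toy_* functions are hypothetical): the
value shown by the view is the table-level counter plus the counter of the
relation's current relfilenode. The sketch assumes, as described earlier in
the thread, that a rewrite folds the old relfilenode's counter into the
table-level entry before the new relfilenode starts from zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-table view of the two counters that get summed. */
typedef struct ToyTable
{
	uint64_t	carried_blocks_written;	/* inherited from earlier relfilenodes */
	uint64_t	current_blocks_written;	/* counted on the current relfilenode */
} ToyTable;

/* One block written out for the table's current relfilenode. */
void
toy_count_write(ToyTable *t)
{
	assert(t != NULL);
	t->current_blocks_written++;
}

/* A rewrite: keep the old relfilenode's writes, start the new one at zero. */
void
toy_rewrite(ToyTable *t)
{
	assert(t != NULL);
	t->carried_blocks_written += t->current_blocks_written;
	t->current_blocks_written = 0;
}

/* What the view would report: carried-over writes plus current writes. */
uint64_t
toy_heap_blks_written(const ToyTable *t)
{
	assert(t != NULL);
	return t->carried_blocks_written + t->current_blocks_written;
}
```

With this shape the reported total does not go backwards across a rewrite:
three writes, a rewrite, then two more writes add up to five.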

The goal of this patch is to start the discussion and agree on the design before
moving forward.
---
 src/backend/access/rmgrdesc/xactdesc.c        |   5 +-
 src/backend/catalog/storage.c                 |   8 ++
 src/backend/catalog/system_functions.sql      |   2 +-
 src/backend/catalog/system_views.sql          |   5 +-
 src/backend/postmaster/checkpointer.c         |   5 +
 src/backend/storage/buffer/bufmgr.c           |   6 +-
 src/backend/storage/smgr/md.c                 |   7 ++
 src/backend/utils/activity/pgstat.c           |  39 ++++--
 src/backend/utils/activity/pgstat_database.c  |  12 +-
 src/backend/utils/activity/pgstat_function.c  |  13 +-
 src/backend/utils/activity/pgstat_relation.c  | 112 ++++++++++++++++--
 src/backend/utils/activity/pgstat_replslot.c  |  13 +-
 src/backend/utils/activity/pgstat_shmem.c     |  19 ++-
 .../utils/activity/pgstat_subscription.c      |  14 +--
 src/backend/utils/activity/pgstat_xact.c      |  60 +++++++---
 src/backend/utils/adt/pgstatfuncs.c           |  34 +++++-
 src/include/access/tableam.h                  |  19 +++
 src/include/access/xact.h                     |   1 +
 src/include/catalog/pg_proc.dat               |  14 ++-
 src/include/pgstat.h                          |  37 ++++--
 src/include/utils/pgstat_internal.h           |  34 ++++--
 src/test/recovery/t/029_stats_restart.pl      |  40 +++----
 .../recovery/t/030_stats_cleanup_replica.pl   |   6 +-
 src/test/regress/expected/rules.out           |  12 +-
 src/test/regress/expected/stats.out           |  30 ++---
 src/test/regress/sql/stats.sql                |  30 ++---
 src/test/subscription/t/026_stats.pl          |   4 +-
 src/tools/pgindent/typedefs.list              |   1 +
 28 files changed, 425 insertions(+), 157 deletions(-)
   4.4% src/backend/catalog/
  46.4% src/backend/utils/activity/
   6.2% src/backend/utils/adt/
   3.6% src/backend/
   3.1% src/include/access/
   3.2% src/include/catalog/
   5.9% src/include/utils/
   6.6% src/include/
  11.7% src/test/recovery/t/
   5.3% src/test/regress/expected/
   3.0% src/

diff --git a/src/backend/access/rmgrdesc/xactdesc.c b/src/backend/access/rmgrdesc/xactdesc.c
index dccca201e0..c02b079645 100644
--- a/src/backend/access/rmgrdesc/xactdesc.c
+++ b/src/backend/access/rmgrdesc/xactdesc.c
@@ -319,10 +319,11 @@ xact_desc_stats(StringInfo buf, const char *label,
 		appendStringInfo(buf, "; %sdropped stats:", label);
 		for (i = 0; i < ndropped; i++)
 		{
-			appendStringInfo(buf, " %d/%u/%u",
+			appendStringInfo(buf, " %d/%u/%u/%u",
 							 dropped_stats[i].kind,
 							 dropped_stats[i].dboid,
-							 dropped_stats[i].objoid);
+							 dropped_stats[i].objoid,
+							 dropped_stats[i].relfile);
 		}
 	}
 }
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index f56b3cc0f2..db6107cd90 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -33,6 +33,7 @@
 #include "storage/smgr.h"
 #include "utils/hsearch.h"
 #include "utils/memutils.h"
+#include "utils/pgstat_internal.h"
 #include "utils/rel.h"
 
 /* GUC variables */
@@ -152,6 +153,7 @@ RelationCreateStorage(RelFileLocator rlocator, char relpersistence,
 	if (needs_wal)
 		log_smgrcreate(&srel->smgr_rlocator.locator, MAIN_FORKNUM);
 
+	pgstat_create_transactional(PGSTAT_KIND_RELFILENODE, rlocator.dbOid, rlocator.spcOid, rlocator.relNumber);
 	/*
 	 * Add the relation to the list of stuff to delete at abort, if we are
 	 * asked to do so.
@@ -227,6 +229,8 @@ RelationDropStorage(Relation rel)
 	 * for now I'll keep the logic simple.
 	 */
 
+	pgstat_drop_transactional(PGSTAT_KIND_RELFILENODE, rel->rd_locator.dbOid, rel->rd_locator.spcOid,  rel->rd_locator.relNumber);
+
 	RelationCloseSmgr(rel);
 }
 
@@ -253,6 +257,9 @@ RelationPreserveStorage(RelFileLocator rlocator, bool atCommit)
 	PendingRelDelete *pending;
 	PendingRelDelete *prev;
 	PendingRelDelete *next;
+	PgStat_SubXactStatus *xact_state;
+
+	xact_state = pgStatXactStack;
 
 	prev = NULL;
 	for (pending = pendingDeletes; pending != NULL; pending = next)
@@ -267,6 +274,7 @@ RelationPreserveStorage(RelFileLocator rlocator, bool atCommit)
 			else
 				pendingDeletes = next;
 			pfree(pending);
+			PgStat_RemoveRelFileNodeFromDroppedStats(xact_state, rlocator);
 			/* prev does not change */
 		}
 		else
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index 623b9539b1..ec60ef72e3 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -684,7 +684,7 @@ REVOKE EXECUTE ON FUNCTION pg_stat_reset_single_function_counters(oid) FROM publ
 
 REVOKE EXECUTE ON FUNCTION pg_stat_reset_replication_slot(text) FROM public;
 
-REVOKE EXECUTE ON FUNCTION pg_stat_have_stats(text, oid, oid) FROM public;
+REVOKE EXECUTE ON FUNCTION pg_stat_have_stats(text, oid, oid, oid) FROM public;
 
 REVOKE EXECUTE ON FUNCTION pg_stat_reset_subscription_stats(oid) FROM public;
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 7fd5d256a1..0e13b6ae17 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -746,6 +746,7 @@ CREATE VIEW pg_statio_all_tables AS
             C.relname AS relname,
             pg_stat_get_blocks_fetched(C.oid) -
                     pg_stat_get_blocks_hit(C.oid) AS heap_blks_read,
+			pg_stat_get_blocks_written(C.oid) + pg_stat_get_relfilenode_blocks_written(d.oid, CASE WHEN C.reltablespace <> 0 THEN C.reltablespace ELSE d.dattablespace END, C.relfilenode) AS heap_blks_written,
             pg_stat_get_blocks_hit(C.oid) AS heap_blks_hit,
             I.idx_blks_read AS idx_blks_read,
             I.idx_blks_hit AS idx_blks_hit,
@@ -754,7 +755,7 @@ CREATE VIEW pg_statio_all_tables AS
             pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
             X.idx_blks_read AS tidx_blks_read,
             X.idx_blks_hit AS tidx_blks_hit
-    FROM pg_class C LEFT JOIN
+    FROM pg_database d, pg_class C LEFT JOIN
             pg_class T ON C.reltoastrelid = T.oid
             LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
             LEFT JOIN LATERAL (
@@ -771,7 +772,7 @@ CREATE VIEW pg_statio_all_tables AS
                      sum(pg_stat_get_blocks_hit(indexrelid))::bigint
                      AS idx_blks_hit
               FROM pg_index WHERE indrelid = T.oid ) X ON true
-    WHERE C.relkind IN ('r', 't', 'm');
+    WHERE C.relkind IN ('r', 't', 'm') AND d.datname = current_database();
 
 CREATE VIEW pg_statio_sys_tables AS
     SELECT * FROM pg_statio_all_tables
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index eeb73c8572..fd543f243b 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -519,6 +519,11 @@ CheckpointerMain(char *startup_data, size_t startup_data_len)
 		/* Report pending statistics to the cumulative stats system */
 		pgstat_report_checkpointer();
 		pgstat_report_wal(true);
+		/*
+		 *  No need to check for transaction state in checkpointer before
+		 *  calling pgstat_report_stat().
+		 */
+		pgstat_report_stat(true);
 
 		/*
 		 * If any checkpoint flags have been set, redo the loop to handle the
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 4852044300..7b4c92d312 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1159,9 +1159,9 @@ PinBufferForBlock(Relation rel,
 		 * WaitReadBuffers() (so, not for hits, and not for buffers that are
 		 * zeroed instead), the per-relation stats always count them.
 		 */
-		pgstat_count_buffer_read(rel);
+		pgstat_report_relfilenode_buffer_read(rel);
 		if (*foundPtr)
-			pgstat_count_buffer_hit(rel);
+			pgstat_report_relfilenode_buffer_hit(rel);
 	}
 	if (*foundPtr)
 	{
@@ -3877,6 +3877,8 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln, IOObject io_object,
 
 	pgBufferUsage.shared_blks_written++;
 
+	pgstat_report_relfilenode_blks_written(reln->smgr_rlocator.locator);
+
 	/*
 	 * Mark the buffer as clean (unless BM_JUST_DIRTIED has become set) and
 	 * end the BM_IO_IN_PROGRESS state.
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 6796756358..5bc5fc65cd 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -1447,12 +1447,16 @@ DropRelationFiles(RelFileLocator *delrels, int ndelrels, bool isRedo)
 {
 	SMgrRelation *srels;
 	int			i;
+	int         not_freed_count = 0;
 
 	srels = palloc(sizeof(SMgrRelation) * ndelrels);
 	for (i = 0; i < ndelrels; i++)
 	{
 		SMgrRelation srel = smgropen(delrels[i], INVALID_PROC_NUMBER);
 
+		if (!pgstat_drop_entry(PGSTAT_KIND_RELFILENODE, delrels[i].dbOid, delrels[i].spcOid, delrels[i].relNumber))
+			not_freed_count++;
+
 		if (isRedo)
 		{
 			ForkNumber	fork;
@@ -1463,6 +1467,9 @@ DropRelationFiles(RelFileLocator *delrels, int ndelrels, bool isRedo)
 		srels[i] = srel;
 	}
 
+	if (not_freed_count > 0)
+		pgstat_request_entry_refs_gc();
+
 	smgrdounlinkall(srels, ndelrels, isRedo);
 
 	for (i = 0; i < ndelrels; i++)
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index a7f2dfc744..241dced63c 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -308,6 +308,19 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.delete_pending_cb = pgstat_relation_delete_pending_cb,
 	},
 
+	[PGSTAT_KIND_RELFILENODE] = {
+		.name = "relfilenode",
+
+		.fixed_amount = false,
+
+		.shared_size = sizeof(PgStatShared_RelFileNode),
+		.shared_data_off = offsetof(PgStatShared_RelFileNode, stats),
+		.shared_data_len = sizeof(((PgStatShared_RelFileNode *) 0)->stats),
+		.pending_size = sizeof(PgStat_StatRelFileNodeEntry),
+
+		.flush_pending_cb = pgstat_relfilenode_flush_cb,
+	},
+
 	[PGSTAT_KIND_FUNCTION] = {
 		.name = "function",
 
@@ -757,7 +770,7 @@ pgstat_report_stat(bool force)
 
 	partial_flush = false;
 
-	/* flush database / relation / function / ... stats */
+	/* flush database / relation / function / relfilenode / ... stats */
 	partial_flush |= pgstat_flush_pending_entries(nowait);
 
 	/* flush of fixed-numbered stats */
@@ -846,7 +859,7 @@ pgstat_reset_counters(void)
  * GRANT system.
  */
 void
-pgstat_reset(PgStat_Kind kind, Oid dboid, Oid objoid)
+pgstat_reset(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile)
 {
 	const PgStat_KindInfo *kind_info = pgstat_get_kind_info(kind);
 	TimestampTz ts = GetCurrentTimestamp();
@@ -855,7 +868,7 @@ pgstat_reset(PgStat_Kind kind, Oid dboid, Oid objoid)
 	Assert(!pgstat_get_kind_info(kind)->fixed_amount);
 
 	/* reset the "single counter" */
-	pgstat_reset_entry(kind, dboid, objoid, ts);
+	pgstat_reset_entry(kind, dboid, objoid, relfile, ts);
 
 	if (!kind_info->accessed_across_databases)
 		pgstat_reset_database_timestamp(dboid, ts);
@@ -926,7 +939,7 @@ pgstat_clear_snapshot(void)
 }
 
 void *
-pgstat_fetch_entry(PgStat_Kind kind, Oid dboid, Oid objoid)
+pgstat_fetch_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile)
 {
 	PgStat_HashKey key;
 	PgStat_EntryRef *entry_ref;
@@ -942,6 +955,7 @@ pgstat_fetch_entry(PgStat_Kind kind, Oid dboid, Oid objoid)
 	key.kind = kind;
 	key.dboid = dboid;
 	key.objoid = objoid;
+	key.relfile = relfile;
 
 	/* if we need to build a full snapshot, do so */
 	if (pgstat_fetch_consistency == PGSTAT_FETCH_CONSISTENCY_SNAPSHOT)
@@ -967,7 +981,7 @@ pgstat_fetch_entry(PgStat_Kind kind, Oid dboid, Oid objoid)
 
 	pgStatLocal.snapshot.mode = pgstat_fetch_consistency;
 
-	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid, false, NULL);
+	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid, relfile, false, NULL);
 
 	if (entry_ref == NULL || entry_ref->shared_entry->dropped)
 	{
@@ -1036,13 +1050,13 @@ pgstat_get_stat_snapshot_timestamp(bool *have_snapshot)
 }
 
 bool
-pgstat_have_entry(PgStat_Kind kind, Oid dboid, Oid objoid)
+pgstat_have_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile)
 {
 	/* fixed-numbered stats always exist */
 	if (pgstat_get_kind_info(kind)->fixed_amount)
 		return true;
 
-	return pgstat_get_entry_ref(kind, dboid, objoid, false, NULL) != NULL;
+	return pgstat_get_entry_ref(kind, dboid, objoid, relfile, false, NULL) != NULL;
 }
 
 /*
@@ -1257,7 +1271,8 @@ pgstat_build_snapshot_fixed(PgStat_Kind kind)
  * created, false otherwise.
  */
 PgStat_EntryRef *
-pgstat_prep_pending_entry(PgStat_Kind kind, Oid dboid, Oid objoid, bool *created_entry)
+pgstat_prep_pending_entry(PgStat_Kind kind, Oid dboid, Oid objoid,
+						  RelFileNumber relfile, bool *created_entry)
 {
 	PgStat_EntryRef *entry_ref;
 
@@ -1272,7 +1287,7 @@ pgstat_prep_pending_entry(PgStat_Kind kind, Oid dboid, Oid objoid, bool *created
 								  ALLOCSET_SMALL_SIZES);
 	}
 
-	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid,
+	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid, relfile,
 									 true, created_entry);
 
 	if (entry_ref->pending == NULL)
@@ -1295,11 +1310,11 @@ pgstat_prep_pending_entry(PgStat_Kind kind, Oid dboid, Oid objoid, bool *created
  * that it shouldn't be needed.
  */
 PgStat_EntryRef *
-pgstat_fetch_pending_entry(PgStat_Kind kind, Oid dboid, Oid objoid)
+pgstat_fetch_pending_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile)
 {
 	PgStat_EntryRef *entry_ref;
 
-	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid, false, NULL);
+	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid, relfile, false, NULL);
 
 	if (entry_ref == NULL || entry_ref->pending == NULL)
 		return NULL;
@@ -1328,7 +1343,7 @@ pgstat_delete_pending_entry(PgStat_EntryRef *entry_ref)
 }
 
 /*
- * Flush out pending stats for database objects (databases, relations,
+ * Flush out pending stats for database objects (databases, relations, relfilenodes,
  * functions).
  */
 static bool
diff --git a/src/backend/utils/activity/pgstat_database.c b/src/backend/utils/activity/pgstat_database.c
index 29bc090974..cf77f2dbdb 100644
--- a/src/backend/utils/activity/pgstat_database.c
+++ b/src/backend/utils/activity/pgstat_database.c
@@ -43,7 +43,7 @@ static PgStat_Counter pgLastSessionReportTime = 0;
 void
 pgstat_drop_database(Oid databaseid)
 {
-	pgstat_drop_transactional(PGSTAT_KIND_DATABASE, databaseid, InvalidOid);
+	pgstat_drop_transactional(PGSTAT_KIND_DATABASE, databaseid, InvalidOid, InvalidOid);
 }
 
 /*
@@ -66,7 +66,7 @@ pgstat_report_autovac(Oid dboid)
 	 * operation so it doesn't matter if we get blocked here a little.
 	 */
 	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_DATABASE,
-											dboid, InvalidOid, false);
+											dboid, InvalidOid, InvalidOid, false);
 
 	dbentry = (PgStatShared_Database *) entry_ref->shared_stats;
 	dbentry->stats.last_autovac_time = GetCurrentTimestamp();
@@ -150,7 +150,7 @@ pgstat_report_checksum_failures_in_db(Oid dboid, int failurecount)
 	 * common enough for that to be a problem.
 	 */
 	entry_ref =
-		pgstat_get_entry_ref_locked(PGSTAT_KIND_DATABASE, dboid, InvalidOid, false);
+		pgstat_get_entry_ref_locked(PGSTAT_KIND_DATABASE, dboid, InvalidOid, InvalidOid, false);
 
 	sharedent = (PgStatShared_Database *) entry_ref->shared_stats;
 	sharedent->stats.checksum_failures += failurecount;
@@ -242,7 +242,7 @@ PgStat_StatDBEntry *
 pgstat_fetch_stat_dbentry(Oid dboid)
 {
 	return (PgStat_StatDBEntry *)
-		pgstat_fetch_entry(PGSTAT_KIND_DATABASE, dboid, InvalidOid);
+		pgstat_fetch_entry(PGSTAT_KIND_DATABASE, dboid, InvalidOid, InvalidOid);
 }
 
 void
@@ -341,7 +341,7 @@ pgstat_prep_database_pending(Oid dboid)
 	Assert(!OidIsValid(dboid) || OidIsValid(MyDatabaseId));
 
 	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_DATABASE, dboid, InvalidOid,
-										  NULL);
+										  InvalidOid, NULL);
 
 	return entry_ref->pending;
 }
@@ -357,7 +357,7 @@ pgstat_reset_database_timestamp(Oid dboid, TimestampTz ts)
 	PgStatShared_Database *dbentry;
 
 	dbref = pgstat_get_entry_ref_locked(PGSTAT_KIND_DATABASE, MyDatabaseId, InvalidOid,
-										false);
+										InvalidOid, false);
 
 	dbentry = (PgStatShared_Database *) dbref->shared_stats;
 	dbentry->stats.stat_reset_timestamp = ts;
diff --git a/src/backend/utils/activity/pgstat_function.c b/src/backend/utils/activity/pgstat_function.c
index d26da551a4..440e44e300 100644
--- a/src/backend/utils/activity/pgstat_function.c
+++ b/src/backend/utils/activity/pgstat_function.c
@@ -46,7 +46,8 @@ pgstat_create_function(Oid proid)
 {
 	pgstat_create_transactional(PGSTAT_KIND_FUNCTION,
 								MyDatabaseId,
-								proid);
+								proid,
+								InvalidOid);
 }
 
 /*
@@ -61,7 +62,8 @@ pgstat_drop_function(Oid proid)
 {
 	pgstat_drop_transactional(PGSTAT_KIND_FUNCTION,
 							  MyDatabaseId,
-							  proid);
+							  proid,
+							  InvalidOid);
 }
 
 /*
@@ -86,6 +88,7 @@ pgstat_init_function_usage(FunctionCallInfo fcinfo,
 	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_FUNCTION,
 										  MyDatabaseId,
 										  fcinfo->flinfo->fn_oid,
+										  InvalidOid,
 										  &created_entry);
 
 	/*
@@ -113,7 +116,7 @@ pgstat_init_function_usage(FunctionCallInfo fcinfo,
 		if (!SearchSysCacheExists1(PROCOID, ObjectIdGetDatum(fcinfo->flinfo->fn_oid)))
 		{
 			pgstat_drop_entry(PGSTAT_KIND_FUNCTION, MyDatabaseId,
-							  fcinfo->flinfo->fn_oid);
+							  fcinfo->flinfo->fn_oid, InvalidOid);
 			ereport(ERROR, errcode(ERRCODE_UNDEFINED_FUNCTION),
 					errmsg("function call to dropped function"));
 		}
@@ -224,7 +227,7 @@ find_funcstat_entry(Oid func_id)
 {
 	PgStat_EntryRef *entry_ref;
 
-	entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_FUNCTION, MyDatabaseId, func_id);
+	entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_FUNCTION, MyDatabaseId, func_id, InvalidOid);
 
 	if (entry_ref)
 		return entry_ref->pending;
@@ -239,5 +242,5 @@ PgStat_StatFuncEntry *
 pgstat_fetch_stat_funcentry(Oid func_id)
 {
 	return (PgStat_StatFuncEntry *)
-		pgstat_fetch_entry(PGSTAT_KIND_FUNCTION, MyDatabaseId, func_id);
+		pgstat_fetch_entry(PGSTAT_KIND_FUNCTION, MyDatabaseId, func_id, InvalidOid);
 }
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index 8a3f7d434c..136dd6c85b 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -44,6 +44,7 @@ typedef struct TwoPhasePgStatRecord
 
 
 static PgStat_TableStatus *pgstat_prep_relation_pending(Oid rel_id, bool isshared);
+PgStat_StatRelFileNodeEntry *pgstat_prep_relfilenode_pending(RelFileLocator locator);
 static void add_tabstat_xact_level(PgStat_TableStatus *pgstat_info, int nest_level);
 static void ensure_tabstat_xact_level(PgStat_TableStatus *pgstat_info);
 static void save_truncdrop_counters(PgStat_TableXactStatus *trans, bool is_drop);
@@ -69,6 +70,7 @@ pgstat_copy_relation_stats(Relation dst, Relation src)
 	dst_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_RELATION,
 										  dst->rd_rel->relisshared ? InvalidOid : MyDatabaseId,
 										  RelationGetRelid(dst),
+										  InvalidOid,
 										  false);
 
 	dstshstats = (PgStatShared_Relation *) dst_ref->shared_stats;
@@ -170,7 +172,7 @@ pgstat_create_relation(Relation rel)
 {
 	pgstat_create_transactional(PGSTAT_KIND_RELATION,
 								rel->rd_rel->relisshared ? InvalidOid : MyDatabaseId,
-								RelationGetRelid(rel));
+								RelationGetRelid(rel), InvalidOid);
 }
 
 /*
@@ -184,7 +186,7 @@ pgstat_drop_relation(Relation rel)
 
 	pgstat_drop_transactional(PGSTAT_KIND_RELATION,
 							  rel->rd_rel->relisshared ? InvalidOid : MyDatabaseId,
-							  RelationGetRelid(rel));
+							  RelationGetRelid(rel), InvalidOid);
 
 	if (!pgstat_should_count_relation(rel))
 		return;
@@ -225,7 +227,7 @@ pgstat_report_vacuum(Oid tableoid, bool shared,
 
 	/* block acquiring lock for the same reason as pgstat_report_autovac() */
 	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_RELATION,
-											dboid, tableoid, false);
+											dboid, tableoid, InvalidOid, false);
 
 	shtabentry = (PgStatShared_Relation *) entry_ref->shared_stats;
 	tabentry = &shtabentry->stats;
@@ -318,6 +320,7 @@ pgstat_report_analyze(Relation rel,
 	/* block acquiring lock for the same reason as pgstat_report_autovac() */
 	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_RELATION, dboid,
 											RelationGetRelid(rel),
+											InvalidOid,
 											false);
 	/* can't get dropped while accessed */
 	Assert(entry_ref != NULL && entry_ref->shared_stats != NULL);
@@ -458,6 +461,19 @@ pgstat_fetch_stat_tabentry(Oid relid)
 	return pgstat_fetch_stat_tabentry_ext(IsSharedRelation(relid), relid);
 }
 
+/*
+ * Support function for the SQL-callable pgstat* functions. Returns
+ * the collected statistics for one relfilenode or NULL. NULL doesn't mean
+ * that the relfilenode doesn't exist, just that there are no statistics, so the
+ * caller is better off to report ZERO instead.
+ */
+PgStat_StatRelFileNodeEntry *
+pgstat_fetch_stat_relfilenodeentry(Oid dboid, Oid spcOid, RelFileNumber relfile)
+{
+	return (PgStat_StatRelFileNodeEntry *)
+		pgstat_fetch_entry(PGSTAT_KIND_RELFILENODE, dboid, spcOid, relfile);
+}
+
 /*
  * More efficient version of pgstat_fetch_stat_tabentry(), allowing to specify
  * whether the to-be-accessed table is a shared relation or not.
@@ -468,7 +484,7 @@ pgstat_fetch_stat_tabentry_ext(bool shared, Oid reloid)
 	Oid			dboid = (shared ? InvalidOid : MyDatabaseId);
 
 	return (PgStat_StatTabEntry *)
-		pgstat_fetch_entry(PGSTAT_KIND_RELATION, dboid, reloid);
+		pgstat_fetch_entry(PGSTAT_KIND_RELATION, dboid, reloid, InvalidOid);
 }
 
 /*
@@ -491,10 +507,10 @@ find_tabstat_entry(Oid rel_id)
 	PgStat_TableStatus *tabentry = NULL;
 	PgStat_TableStatus *tablestatus = NULL;
 
-	entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, MyDatabaseId, rel_id);
+	entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, MyDatabaseId, rel_id, InvalidOid);
 	if (!entry_ref)
 	{
-		entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, InvalidOid, rel_id);
+		entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, InvalidOid, rel_id, InvalidOid);
 		if (!entry_ref)
 			return tablestatus;
 	}
@@ -881,6 +897,38 @@ pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
 	return true;
 }
 
+/*
+ * Flush out pending stats for the relfilenode entry
+ *
+ * If nowait is true, this function returns false if the lock could not
+ * be immediately acquired, otherwise true is returned.
+ */
+bool
+pgstat_relfilenode_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
+{
+	PgStatShared_RelFileNode *sharedent;
+	PgStat_StatRelFileNodeEntry *pendingent;
+
+	pendingent = (PgStat_StatRelFileNodeEntry *) entry_ref->pending;
+	sharedent = (PgStatShared_RelFileNode *) entry_ref->shared_stats;
+
+	if (!pgstat_lock_entry(entry_ref, nowait))
+		return false;
+
+#define PGSTAT_ACCUM_RELFILENODECOUNT(item)      \
+		(sharedent)->stats.item += (pendingent)->item
+
+	PGSTAT_ACCUM_RELFILENODECOUNT(blocks_fetched);
+	PGSTAT_ACCUM_RELFILENODECOUNT(blocks_hit);
+	PGSTAT_ACCUM_RELFILENODECOUNT(blocks_written);
+
+	pgstat_unlock_entry(entry_ref);
+
+	memset(pendingent, 0, sizeof(*pendingent));
+
+	return true;
+}
+
 void
 pgstat_relation_delete_pending_cb(PgStat_EntryRef *entry_ref)
 {
@@ -902,7 +950,7 @@ pgstat_prep_relation_pending(Oid rel_id, bool isshared)
 
 	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_RELATION,
 										  isshared ? InvalidOid : MyDatabaseId,
-										  rel_id, NULL);
+										  rel_id, InvalidOid, NULL);
 	pending = entry_ref->pending;
 	pending->id = rel_id;
 	pending->shared = isshared;
@@ -910,6 +958,56 @@ pgstat_prep_relation_pending(Oid rel_id, bool isshared)
 	return pending;
 }
 
+PgStat_StatRelFileNodeEntry *
+pgstat_prep_relfilenode_pending(RelFileLocator locator)
+{
+	PgStat_EntryRef *entry_ref;
+
+	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_RELFILENODE, locator.dbOid,
+										  locator.spcOid, locator.relNumber, NULL);
+
+	return entry_ref->pending;
+}
+
+void
+pgstat_report_relfilenode_blks_written(RelFileLocator locator)
+{
+	PgStat_StatRelFileNodeEntry *relfileentry = NULL;
+
+	relfileentry = pgstat_prep_relfilenode_pending(locator);
+
+	if (relfileentry)
+		relfileentry->blocks_written++;
+}
+
+void
+pgstat_report_relfilenode_buffer_read(Relation reln)
+{
+	PgStat_StatRelFileNodeEntry *relfileentry = NULL;
+
+	/* For relation stats to survive after a rewrite */
+	pgstat_count_buffer_read(reln);
+
+	relfileentry = pgstat_prep_relfilenode_pending(reln->rd_locator);
+
+	if (relfileentry)
+		relfileentry->blocks_fetched++;
+}
+
+void
+pgstat_report_relfilenode_buffer_hit(Relation reln)
+{
+	PgStat_StatRelFileNodeEntry *relfileentry = NULL;
+
+	/* For relation stats to survive after a rewrite */
+	pgstat_count_buffer_hit(reln);
+
+	relfileentry = pgstat_prep_relfilenode_pending(reln->rd_locator);
+
+	if (relfileentry)
+		relfileentry->blocks_hit++;
+}
+
 /*
  * add a new (sub)transaction state record
  */
diff --git a/src/backend/utils/activity/pgstat_replslot.c b/src/backend/utils/activity/pgstat_replslot.c
index da11b86744..2e68ed4a09 100644
--- a/src/backend/utils/activity/pgstat_replslot.c
+++ b/src/backend/utils/activity/pgstat_replslot.c
@@ -62,7 +62,7 @@ pgstat_reset_replslot(const char *name)
 	 */
 	if (SlotIsLogical(slot))
 		pgstat_reset(PGSTAT_KIND_REPLSLOT, InvalidOid,
-					 ReplicationSlotIndex(slot));
+					 ReplicationSlotIndex(slot), InvalidOid);
 
 	LWLockRelease(ReplicationSlotControlLock);
 }
@@ -82,7 +82,7 @@ pgstat_report_replslot(ReplicationSlot *slot, const PgStat_StatReplSlotEntry *re
 	PgStat_StatReplSlotEntry *statent;
 
 	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_REPLSLOT, InvalidOid,
-											ReplicationSlotIndex(slot), false);
+											ReplicationSlotIndex(slot), InvalidOid, false);
 	shstatent = (PgStatShared_ReplSlot *) entry_ref->shared_stats;
 	statent = &shstatent->stats;
 
@@ -116,7 +116,7 @@ pgstat_create_replslot(ReplicationSlot *slot)
 	Assert(LWLockHeldByMeInMode(ReplicationSlotAllocationLock, LW_EXCLUSIVE));
 
 	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_REPLSLOT, InvalidOid,
-											ReplicationSlotIndex(slot), false);
+											ReplicationSlotIndex(slot), InvalidOid, false);
 	shstatent = (PgStatShared_ReplSlot *) entry_ref->shared_stats;
 
 	/*
@@ -146,7 +146,7 @@ void
 pgstat_acquire_replslot(ReplicationSlot *slot)
 {
 	pgstat_get_entry_ref(PGSTAT_KIND_REPLSLOT, InvalidOid,
-						 ReplicationSlotIndex(slot), true, NULL);
+						 ReplicationSlotIndex(slot), InvalidOid, true, NULL);
 }
 
 /*
@@ -158,7 +158,7 @@ pgstat_drop_replslot(ReplicationSlot *slot)
 	Assert(LWLockHeldByMeInMode(ReplicationSlotAllocationLock, LW_EXCLUSIVE));
 
 	if (!pgstat_drop_entry(PGSTAT_KIND_REPLSLOT, InvalidOid,
-						   ReplicationSlotIndex(slot)))
+						   ReplicationSlotIndex(slot), InvalidOid))
 		pgstat_request_entry_refs_gc();
 }
 
@@ -178,7 +178,7 @@ pgstat_fetch_replslot(NameData slotname)
 
 	if (idx != -1)
 		slotentry = (PgStat_StatReplSlotEntry *) pgstat_fetch_entry(PGSTAT_KIND_REPLSLOT,
-																	InvalidOid, idx);
+																	InvalidOid, idx, InvalidOid);
 
 	LWLockRelease(ReplicationSlotControlLock);
 
@@ -210,6 +210,7 @@ pgstat_replslot_from_serialized_name_cb(const NameData *name, PgStat_HashKey *ke
 	key->kind = PGSTAT_KIND_REPLSLOT;
 	key->dboid = InvalidOid;
 	key->objoid = idx;
+	key->relfile = InvalidOid;
 
 	return true;
 }
diff --git a/src/backend/utils/activity/pgstat_shmem.c b/src/backend/utils/activity/pgstat_shmem.c
index ec93bf6902..5eb6a0483a 100644
--- a/src/backend/utils/activity/pgstat_shmem.c
+++ b/src/backend/utils/activity/pgstat_shmem.c
@@ -429,10 +429,10 @@ pgstat_get_entry_ref_cached(PgStat_HashKey key, PgStat_EntryRef **entry_ref_p)
  * if the entry is newly created, false otherwise.
  */
 PgStat_EntryRef *
-pgstat_get_entry_ref(PgStat_Kind kind, Oid dboid, Oid objoid, bool create,
-					 bool *created_entry)
+pgstat_get_entry_ref(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile,
+					 bool create, bool *created_entry)
 {
-	PgStat_HashKey key = {.kind = kind,.dboid = dboid,.objoid = objoid};
+	PgStat_HashKey key = {.kind = kind,.dboid = dboid,.objoid = objoid,.relfile = relfile};
 	PgStatShared_HashEntry *shhashent;
 	PgStatShared_Common *shheader = NULL;
 	PgStat_EntryRef *entry_ref;
@@ -645,12 +645,12 @@ pgstat_unlock_entry(PgStat_EntryRef *entry_ref)
  */
 PgStat_EntryRef *
 pgstat_get_entry_ref_locked(PgStat_Kind kind, Oid dboid, Oid objoid,
-							bool nowait)
+							RelFileNumber relfile, bool nowait)
 {
 	PgStat_EntryRef *entry_ref;
 
 	/* find shared table stats entry corresponding to the local entry */
-	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid, true, NULL);
+	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid, relfile, true, NULL);
 
 	/* lock the shared entry to protect the content, skip if failed */
 	if (!pgstat_lock_entry(entry_ref, nowait))
@@ -905,9 +905,9 @@ pgstat_drop_database_and_contents(Oid dboid)
  * pgstat_gc_entry_refs().
  */
 bool
-pgstat_drop_entry(PgStat_Kind kind, Oid dboid, Oid objoid)
+pgstat_drop_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile)
 {
-	PgStat_HashKey key = {.kind = kind,.dboid = dboid,.objoid = objoid};
+	PgStat_HashKey key = {.kind = kind,.dboid = dboid,.objoid = objoid,.relfile = relfile};
 	PgStatShared_HashEntry *shent;
 	bool		freed = true;
 
@@ -980,13 +980,12 @@ shared_stat_reset_contents(PgStat_Kind kind, PgStatShared_Common *header,
  * Reset one variable-numbered stats entry.
  */
 void
-pgstat_reset_entry(PgStat_Kind kind, Oid dboid, Oid objoid, TimestampTz ts)
+pgstat_reset_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile, TimestampTz ts)
 {
 	PgStat_EntryRef *entry_ref;
 
 	Assert(!pgstat_get_kind_info(kind)->fixed_amount);
-
-	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid, false, NULL);
+	entry_ref = pgstat_get_entry_ref(kind, dboid, objoid, relfile, false, NULL);
 	if (!entry_ref || entry_ref->shared_entry->dropped)
 		return;
 
diff --git a/src/backend/utils/activity/pgstat_subscription.c b/src/backend/utils/activity/pgstat_subscription.c
index e06c92727e..417c81246d 100644
--- a/src/backend/utils/activity/pgstat_subscription.c
+++ b/src/backend/utils/activity/pgstat_subscription.c
@@ -30,7 +30,7 @@ pgstat_report_subscription_error(Oid subid, bool is_apply_error)
 	PgStat_BackendSubEntry *pending;
 
 	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_SUBSCRIPTION,
-										  InvalidOid, subid, NULL);
+										  InvalidOid, subid, InvalidOid, NULL);
 	pending = entry_ref->pending;
 
 	if (is_apply_error)
@@ -49,7 +49,7 @@ pgstat_report_subscription_conflict(Oid subid, ConflictType type)
 	PgStat_BackendSubEntry *pending;
 
 	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_SUBSCRIPTION,
-										  InvalidOid, subid, NULL);
+										  InvalidOid, subid, InvalidOid, NULL);
 	pending = entry_ref->pending;
 	pending->conflict_count[type]++;
 }
@@ -62,12 +62,12 @@ pgstat_create_subscription(Oid subid)
 {
 	/* Ensures that stats are dropped if transaction rolls back */
 	pgstat_create_transactional(PGSTAT_KIND_SUBSCRIPTION,
-								InvalidOid, subid);
+								InvalidOid, subid, InvalidOid);
 
 	/* Create and initialize the subscription stats entry */
-	pgstat_get_entry_ref(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid,
+	pgstat_get_entry_ref(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid, InvalidOid,
 						 true, NULL);
-	pgstat_reset_entry(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid, 0);
+	pgstat_reset_entry(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid, InvalidOid, 0);
 }
 
 /*
@@ -79,7 +79,7 @@ void
 pgstat_drop_subscription(Oid subid)
 {
 	pgstat_drop_transactional(PGSTAT_KIND_SUBSCRIPTION,
-							  InvalidOid, subid);
+							  InvalidOid, subid, InvalidOid);
 }
 
 /*
@@ -90,7 +90,7 @@ PgStat_StatSubEntry *
 pgstat_fetch_stat_subscription(Oid subid)
 {
 	return (PgStat_StatSubEntry *)
-		pgstat_fetch_entry(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid);
+		pgstat_fetch_entry(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid, InvalidOid);
 }
 
 /*
diff --git a/src/backend/utils/activity/pgstat_xact.c b/src/backend/utils/activity/pgstat_xact.c
index 1877d22f14..b25df5112b 100644
--- a/src/backend/utils/activity/pgstat_xact.c
+++ b/src/backend/utils/activity/pgstat_xact.c
@@ -30,7 +30,7 @@ static void AtEOXact_PgStat_DroppedStats(PgStat_SubXactStatus *xact_state, bool
 static void AtEOSubXact_PgStat_DroppedStats(PgStat_SubXactStatus *xact_state,
 											bool isCommit, int nestDepth);
 
-static PgStat_SubXactStatus *pgStatXactStack = NULL;
+PgStat_SubXactStatus *pgStatXactStack = NULL;
 
 
 /*
@@ -84,7 +84,7 @@ AtEOXact_PgStat_DroppedStats(PgStat_SubXactStatus *xact_state, bool isCommit)
 			 * Transaction that dropped an object committed. Drop the stats
 			 * too.
 			 */
-			if (!pgstat_drop_entry(it->kind, it->dboid, it->objoid))
+			if (!pgstat_drop_entry(it->kind, it->dboid, it->objoid, it->relfile))
 				not_freed_count++;
 		}
 		else if (!isCommit && pending->is_create)
@@ -93,7 +93,7 @@ AtEOXact_PgStat_DroppedStats(PgStat_SubXactStatus *xact_state, bool isCommit)
 			 * Transaction that created an object aborted. Drop the stats
 			 * associated with the object.
 			 */
-			if (!pgstat_drop_entry(it->kind, it->dboid, it->objoid))
+			if (!pgstat_drop_entry(it->kind, it->dboid, it->objoid, it->relfile))
 				not_freed_count++;
 		}
 
@@ -105,6 +105,33 @@ AtEOXact_PgStat_DroppedStats(PgStat_SubXactStatus *xact_state, bool isCommit)
 		pgstat_request_entry_refs_gc();
 }
 
+/*
+ * Remove a relfilenode stat from the list of stats to be dropped.
+ */
+void
+PgStat_RemoveRelFileNodeFromDroppedStats(PgStat_SubXactStatus *xact_state, RelFileLocator rlocator)
+{
+	dlist_mutable_iter iter;
+
+	if (dclist_count(&xact_state->pending_drops) == 0)
+		return;
+
+	dclist_foreach_modify(iter, &xact_state->pending_drops)
+	{
+		PgStat_PendingDroppedStatsItem *pending =
+			dclist_container(PgStat_PendingDroppedStatsItem, node, iter.cur);
+		xl_xact_stats_item *it = &pending->item;
+
+		if (it->kind == PGSTAT_KIND_RELFILENODE && it->dboid == rlocator.dbOid &&
+			it->objoid == rlocator.spcOid && it->relfile == rlocator.relNumber)
+		{
+			dclist_delete_from(&xact_state->pending_drops, &pending->node);
+			pfree(pending);
+			return;
+		}
+	}
+}
+
 /*
  * Called from access/transam/xact.c at subtransaction commit/abort.
  */
@@ -158,7 +185,7 @@ AtEOSubXact_PgStat_DroppedStats(PgStat_SubXactStatus *xact_state,
 			 * Subtransaction creating a new stats object aborted. Drop the
 			 * stats object.
 			 */
-			if (!pgstat_drop_entry(it->kind, it->dboid, it->objoid))
+			if (!pgstat_drop_entry(it->kind, it->dboid, it->objoid, it->relfile))
 				not_freed_count++;
 			pfree(pending);
 		}
@@ -320,7 +347,11 @@ pgstat_execute_transactional_drops(int ndrops, struct xl_xact_stats_item *items,
 	{
 		xl_xact_stats_item *it = &items[i];
 
-		if (!pgstat_drop_entry(it->kind, it->dboid, it->objoid))
+		/* relfilenode stats are dropped via pgstat_drop_transactional() in RelationDropStorage() */
+		if (it->kind == PGSTAT_KIND_RELFILENODE)
+			continue;
+
+		if (!pgstat_drop_entry(it->kind, it->dboid, it->objoid, it->relfile))
 			not_freed_count++;
 	}
 
@@ -329,7 +360,7 @@ pgstat_execute_transactional_drops(int ndrops, struct xl_xact_stats_item *items,
 }
 
 static void
-create_drop_transactional_internal(PgStat_Kind kind, Oid dboid, Oid objoid, bool is_create)
+create_drop_transactional_internal(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile, bool is_create)
 {
 	int			nest_level = GetCurrentTransactionNestLevel();
 	PgStat_SubXactStatus *xact_state;
@@ -342,6 +373,7 @@ create_drop_transactional_internal(PgStat_Kind kind, Oid dboid, Oid objoid, bool
 	drop->item.kind = kind;
 	drop->item.dboid = dboid;
 	drop->item.objoid = objoid;
+	drop->item.relfile = relfile;
 
 	dclist_push_tail(&xact_state->pending_drops, &drop->node);
 }
@@ -354,18 +386,18 @@ create_drop_transactional_internal(PgStat_Kind kind, Oid dboid, Oid objoid, bool
  * dropped.
  */
 void
-pgstat_create_transactional(PgStat_Kind kind, Oid dboid, Oid objoid)
+pgstat_create_transactional(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile)
 {
-	if (pgstat_get_entry_ref(kind, dboid, objoid, false, NULL))
+	if (pgstat_get_entry_ref(kind, dboid, objoid, relfile, false, NULL))
 	{
 		ereport(WARNING,
-				errmsg("resetting existing statistics for kind %s, db=%u, oid=%u",
-					   (pgstat_get_kind_info(kind))->name, dboid, objoid));
+				errmsg("resetting existing statistics for kind %s, db=%u, oid=%u, relfile=%u",
+					   (pgstat_get_kind_info(kind))->name, dboid, objoid, relfile));
 
-		pgstat_reset(kind, dboid, objoid);
+		pgstat_reset(kind, dboid, objoid, relfile);
 	}
 
-	create_drop_transactional_internal(kind, dboid, objoid, /* create */ true);
+	create_drop_transactional_internal(kind, dboid, objoid, relfile, /* create */ true);
 }
 
 /*
@@ -376,7 +408,7 @@ pgstat_create_transactional(PgStat_Kind kind, Oid dboid, Oid objoid)
  * alive.
  */
 void
-pgstat_drop_transactional(PgStat_Kind kind, Oid dboid, Oid objoid)
+pgstat_drop_transactional(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile)
 {
-	create_drop_transactional_internal(kind, dboid, objoid, /* create */ false);
+	create_drop_transactional_internal(kind, dboid, objoid, relfile, /* create */ false);
 }
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 33c7b25560..2a53a8ee24 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -106,6 +106,30 @@ PG_STAT_GET_RELENTRY_INT64(tuples_updated)
 /* pg_stat_get_vacuum_count */
 PG_STAT_GET_RELENTRY_INT64(vacuum_count)
 
+#define PG_STAT_GET_RELFILEENTRY_INT64(stat)						\
+Datum															\
+CppConcat(pg_stat_get_relfilenode_,stat)(PG_FUNCTION_ARGS)					\
+{																\
+	Oid			dboid = PG_GETARG_OID(0);						\
+	Oid			spcOid = PG_GETARG_OID(1);						\
+	RelFileNumber relfile = PG_GETARG_OID(2);					\
+	int64		result;											\
+	PgStat_StatRelFileNodeEntry *relfileentry;								\
+																\
+	if ((relfileentry = pgstat_fetch_stat_relfilenodeentry(dboid, spcOid, relfile)) == NULL)	\
+		result = 0;												\
+	else														\
+		result = (int64) (relfileentry->stat);						\
+																\
+	PG_RETURN_INT64(result);									\
+}
+
+/* pg_stat_get_relfilenode_blocks_written */
+PG_STAT_GET_RELFILEENTRY_INT64(blocks_written)
+
+/* pg_stat_get_blocks_written */
+PG_STAT_GET_RELENTRY_INT64(blocks_written)
+
 #define PG_STAT_GET_RELENTRY_TIMESTAMPTZ(stat)					\
 Datum															\
 CppConcat(pg_stat_get_,stat)(PG_FUNCTION_ARGS)					\
@@ -1752,7 +1776,7 @@ pg_stat_reset_single_table_counters(PG_FUNCTION_ARGS)
 	Oid			taboid = PG_GETARG_OID(0);
 	Oid			dboid = (IsSharedRelation(taboid) ? InvalidOid : MyDatabaseId);
 
-	pgstat_reset(PGSTAT_KIND_RELATION, dboid, taboid);
+	pgstat_reset(PGSTAT_KIND_RELATION, dboid, taboid, InvalidOid);
 
 	PG_RETURN_VOID();
 }
@@ -1762,7 +1786,7 @@ pg_stat_reset_single_function_counters(PG_FUNCTION_ARGS)
 {
 	Oid			funcoid = PG_GETARG_OID(0);
 
-	pgstat_reset(PGSTAT_KIND_FUNCTION, MyDatabaseId, funcoid);
+	pgstat_reset(PGSTAT_KIND_FUNCTION, MyDatabaseId, funcoid, InvalidOid);
 
 	PG_RETURN_VOID();
 }
@@ -1820,7 +1844,7 @@ pg_stat_reset_subscription_stats(PG_FUNCTION_ARGS)
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 					 errmsg("invalid subscription OID %u", subid)));
-		pgstat_reset(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid);
+		pgstat_reset(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid, InvalidOid);
 	}
 
 	PG_RETURN_VOID();
@@ -2047,7 +2071,9 @@ pg_stat_have_stats(PG_FUNCTION_ARGS)
 	char	   *stats_type = text_to_cstring(PG_GETARG_TEXT_P(0));
 	Oid			dboid = PG_GETARG_OID(1);
 	Oid			objoid = PG_GETARG_OID(2);
+	RelFileNumber relfile = PG_GETARG_OID(3);
+
 	PgStat_Kind kind = pgstat_get_kind_from_str(stats_type);
 
-	PG_RETURN_BOOL(pgstat_have_entry(kind, dboid, objoid));
+	PG_RETURN_BOOL(pgstat_have_entry(kind, dboid, objoid, relfile));
 }
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index da661289c1..3614bae63c 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -21,7 +21,9 @@
 #include "access/sdir.h"
 #include "access/xact.h"
 #include "executor/tuptable.h"
+#include "pgstat.h"
 #include "storage/read_stream.h"
+#include "utils/pgstat_internal.h"
 #include "utils/rel.h"
 #include "utils/snapshot.h"
 
@@ -1624,6 +1626,23 @@ table_relation_set_new_filelocator(Relation rel,
 								   TransactionId *freezeXid,
 								   MultiXactId *minmulti)
 {
+	PgStat_StatRelFileNodeEntry *relfileentry;
+	PgStat_StatTabEntry *tabentry = NULL;
+	PgStat_EntryRef *entry_ref = NULL;
+	PgStatShared_Relation *shtabentry;
+
+	entry_ref = pgstat_get_entry_ref(PGSTAT_KIND_RELATION, MyDatabaseId, rel->rd_id, InvalidOid, false, NULL);
+	if (entry_ref)
+	{
+		shtabentry = (PgStatShared_Relation *) entry_ref->shared_stats;
+		tabentry = &shtabentry->stats;
+	}
+
+	relfileentry = pgstat_fetch_stat_relfilenodeentry(rel->rd_locator.dbOid, rel->rd_locator.spcOid, rel->rd_locator.relNumber);
+
+	if (tabentry && relfileentry)
+		tabentry->blocks_written += relfileentry->blocks_written;
+
 	rel->rd_tableam->relation_set_new_filelocator(rel, newrlocator,
 												  persistence, freezeXid,
 												  minmulti);
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 6d4439f052..3b9ed65ff6 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -284,6 +284,7 @@ typedef struct xl_xact_stats_item
 	int			kind;
 	Oid			dboid;
 	Oid			objoid;
+	RelFileNumber relfile;
 } xl_xact_stats_item;
 
 typedef struct xl_xact_stats_items
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index ff5436acac..c098d58753 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5407,6 +5407,14 @@
   proname => 'pg_stat_get_tuples_updated', provolatile => 's',
   proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
   prosrc => 'pg_stat_get_tuples_updated' },
+{ oid => '9280', descr => 'statistics: number of blocks written',
+  proname => 'pg_stat_get_relfilenode_blocks_written', provolatile => 's',
+  proparallel => 'r',
+  proargtypes => 'oid oid oid',
+  prorettype => 'int8',
+  proallargtypes => '{oid,oid,oid,int8}',
+  proargmodes => '{i,i,i,o}',
+  prosrc => 'pg_stat_get_relfilenode_blocks_written' },
 { oid => '1933', descr => 'statistics: number of tuples deleted',
   proname => 'pg_stat_get_tuples_deleted', provolatile => 's',
   proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
@@ -5446,6 +5454,10 @@
   proname => 'pg_stat_get_blocks_hit', provolatile => 's', proparallel => 'r',
   prorettype => 'int8', proargtypes => 'oid',
   prosrc => 'pg_stat_get_blocks_hit' },
+{ oid => '8438', descr => 'statistics: number of blocks written',
+  proname => 'pg_stat_get_blocks_written', provolatile => 's', proparallel => 'r',
+  prorettype => 'int8', proargtypes => 'oid',
+  prosrc => 'pg_stat_get_blocks_written' },
 { oid => '2781', descr => 'statistics: last manual vacuum time for a table',
   proname => 'pg_stat_get_last_vacuum_time', provolatile => 's',
   proparallel => 'r', prorettype => 'timestamptz', proargtypes => 'oid',
@@ -5532,7 +5544,7 @@
 
 { oid => '6230', descr => 'statistics: check if a stats object exists',
   proname => 'pg_stat_have_stats', provolatile => 'v', proparallel => 'r',
-  prorettype => 'bool', proargtypes => 'text oid oid',
+  prorettype => 'bool', proargtypes => 'text oid oid oid',
   prosrc => 'pg_stat_have_stats' },
 
 { oid => '6231', descr => 'statistics: information about subscription stats',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index be2c91168a..afb913a336 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -16,6 +16,7 @@
 #include "portability/instr_time.h"
 #include "postmaster/pgarch.h"	/* for MAX_XFN_CHARS */
 #include "replication/conflict.h"
+#include "storage/relfilelocator.h"
 #include "utils/backend_progress.h" /* for backward compatibility */
 #include "utils/backend_status.h"	/* for backward compatibility */
 #include "utils/relcache.h"
@@ -46,17 +47,18 @@
 /* stats for variable-numbered objects */
 #define PGSTAT_KIND_DATABASE	1	/* database-wide statistics */
 #define PGSTAT_KIND_RELATION	2	/* per-table statistics */
-#define PGSTAT_KIND_FUNCTION	3	/* per-function statistics */
-#define PGSTAT_KIND_REPLSLOT	4	/* per-slot statistics */
-#define PGSTAT_KIND_SUBSCRIPTION	5	/* per-subscription statistics */
+#define PGSTAT_KIND_RELFILENODE	3	/* per-relfilenode statistics */
+#define PGSTAT_KIND_FUNCTION	4	/* per-function statistics */
+#define PGSTAT_KIND_REPLSLOT	5	/* per-slot statistics */
+#define PGSTAT_KIND_SUBSCRIPTION	6	/* per-subscription statistics */
 
 /* stats for fixed-numbered objects */
-#define PGSTAT_KIND_ARCHIVER	6
-#define PGSTAT_KIND_BGWRITER	7
-#define PGSTAT_KIND_CHECKPOINTER	8
-#define PGSTAT_KIND_IO	9
-#define PGSTAT_KIND_SLRU	10
-#define PGSTAT_KIND_WAL	11
+#define PGSTAT_KIND_ARCHIVER	7
+#define PGSTAT_KIND_BGWRITER	8
+#define PGSTAT_KIND_CHECKPOINTER	9
+#define PGSTAT_KIND_IO	10
+#define PGSTAT_KIND_SLRU	11
+#define PGSTAT_KIND_WAL	12
 
 #define PGSTAT_KIND_BUILTIN_MIN PGSTAT_KIND_DATABASE
 #define PGSTAT_KIND_BUILTIN_MAX PGSTAT_KIND_WAL
@@ -450,6 +452,7 @@ typedef struct PgStat_StatTabEntry
 
 	PgStat_Counter blocks_fetched;
 	PgStat_Counter blocks_hit;
+	PgStat_Counter blocks_written;
 
 	TimestampTz last_vacuum_time;	/* user initiated vacuum */
 	PgStat_Counter vacuum_count;
@@ -461,6 +464,13 @@ typedef struct PgStat_StatTabEntry
 	PgStat_Counter autoanalyze_count;
 } PgStat_StatTabEntry;
 
+typedef struct PgStat_StatRelFileNodeEntry
+{
+	PgStat_Counter blocks_fetched;
+	PgStat_Counter blocks_hit;
+	PgStat_Counter blocks_written;
+} PgStat_StatRelFileNodeEntry;
+
 typedef struct PgStat_WalStats
 {
 	PgStat_Counter wal_records;
@@ -511,7 +521,7 @@ extern long pgstat_report_stat(bool force);
 extern void pgstat_force_next_flush(void);
 
 extern void pgstat_reset_counters(void);
-extern void pgstat_reset(PgStat_Kind kind, Oid dboid, Oid objoid);
+extern void pgstat_reset(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile);
 extern void pgstat_reset_of_kind(PgStat_Kind kind);
 
 /* stats accessors */
@@ -520,7 +530,7 @@ extern TimestampTz pgstat_get_stat_snapshot_timestamp(bool *have_snapshot);
 
 /* helpers */
 extern PgStat_Kind pgstat_get_kind_from_str(char *kind_str);
-extern bool pgstat_have_entry(PgStat_Kind kind, Oid dboid, Oid objoid);
+extern bool pgstat_have_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile);
 
 
 /*
@@ -629,6 +639,10 @@ extern void pgstat_report_analyze(Relation rel,
 								  PgStat_Counter livetuples, PgStat_Counter deadtuples,
 								  bool resetcounter);
 
+extern void pgstat_report_relfilenode_blks_written(RelFileLocator locator);
+extern void pgstat_report_relfilenode_buffer_read(Relation reln);
+extern void pgstat_report_relfilenode_buffer_hit(Relation reln);
+
 /*
  * If stats are enabled, but pending data hasn't been prepared yet, call
  * pgstat_assoc_relation() to do so. See its comment for why this is done
@@ -688,6 +702,7 @@ extern void pgstat_twophase_postabort(TransactionId xid, uint16 info,
 									  void *recdata, uint32 len);
 
 extern PgStat_StatTabEntry *pgstat_fetch_stat_tabentry(Oid relid);
+extern PgStat_StatRelFileNodeEntry *pgstat_fetch_stat_relfilenodeentry(Oid dboid, Oid spcOid, RelFileNumber relfile);
 extern PgStat_StatTabEntry *pgstat_fetch_stat_tabentry_ext(bool shared,
 														   Oid reloid);
 extern PgStat_TableStatus *find_tabstat_entry(Oid rel_id);
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index bba90e898d..3f5a705789 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -53,7 +53,8 @@ typedef struct PgStat_HashKey
 {
 	PgStat_Kind kind;			/* statistics entry kind */
 	Oid			dboid;			/* database ID. InvalidOid for shared objects. */
-	Oid			objoid;			/* object ID, either table or function. */
+	Oid			objoid;			/* object ID: table, function, or (for relfilenode stats) tablespace. */
+	RelFileNumber relfile;		/* relfilenumber for relfilenode stats, InvalidOid otherwise. */
 } PgStat_HashKey;
 
 /*
@@ -409,6 +410,12 @@ typedef struct PgStatShared_Relation
 	PgStat_StatTabEntry stats;
 } PgStatShared_Relation;
 
+typedef struct PgStatShared_RelFileNode
+{
+	PgStatShared_Common header;
+	PgStat_StatRelFileNodeEntry stats;
+} PgStatShared_RelFileNode;
+
 typedef struct PgStatShared_Function
 {
 	PgStatShared_Common header;
@@ -547,6 +554,9 @@ static inline void *pgstat_get_entry_data(PgStat_Kind kind, PgStatShared_Common
 static inline void *pgstat_get_custom_shmem_data(PgStat_Kind kind);
 static inline void *pgstat_get_custom_snapshot_data(PgStat_Kind kind);
 
+extern PgStat_SubXactStatus *pgStatXactStack;
+extern void PgStat_RemoveRelFileNodeFromDroppedStats(PgStat_SubXactStatus *xact_state, RelFileLocator rlocator);
+
 
 /*
  * Functions in pgstat.c
@@ -563,10 +573,12 @@ extern void pgstat_assert_is_up(void);
 #endif
 
 extern void pgstat_delete_pending_entry(PgStat_EntryRef *entry_ref);
-extern PgStat_EntryRef *pgstat_prep_pending_entry(PgStat_Kind kind, Oid dboid, Oid objoid, bool *created_entry);
-extern PgStat_EntryRef *pgstat_fetch_pending_entry(PgStat_Kind kind, Oid dboid, Oid objoid);
+extern PgStat_EntryRef *pgstat_prep_pending_entry(PgStat_Kind kind, Oid dboid,
+												  Oid objoid, RelFileNumber relfile,
+												  bool *created_entry);
+extern PgStat_EntryRef *pgstat_fetch_pending_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile);
 
-extern void *pgstat_fetch_entry(PgStat_Kind kind, Oid dboid, Oid objoid);
+extern void *pgstat_fetch_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile);
 extern void pgstat_snapshot_fixed(PgStat_Kind kind);
 
 
@@ -641,6 +653,7 @@ extern void AtPrepare_PgStat_Relations(PgStat_SubXactStatus *xact_state);
 extern void PostPrepare_PgStat_Relations(PgStat_SubXactStatus *xact_state);
 
 extern bool pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait);
+extern bool pgstat_relfilenode_flush_cb(PgStat_EntryRef *entry_ref, bool nowait);
 extern void pgstat_relation_delete_pending_cb(PgStat_EntryRef *entry_ref);
 
 
@@ -661,15 +674,16 @@ extern void pgstat_attach_shmem(void);
 extern void pgstat_detach_shmem(void);
 
 extern PgStat_EntryRef *pgstat_get_entry_ref(PgStat_Kind kind, Oid dboid, Oid objoid,
-											 bool create, bool *created_entry);
+											 RelFileNumber relfile, bool create,
+											 bool *created_entry);
 extern bool pgstat_lock_entry(PgStat_EntryRef *entry_ref, bool nowait);
 extern bool pgstat_lock_entry_shared(PgStat_EntryRef *entry_ref, bool nowait);
 extern void pgstat_unlock_entry(PgStat_EntryRef *entry_ref);
-extern bool pgstat_drop_entry(PgStat_Kind kind, Oid dboid, Oid objoid);
+extern bool pgstat_drop_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile);
 extern void pgstat_drop_all_entries(void);
 extern PgStat_EntryRef *pgstat_get_entry_ref_locked(PgStat_Kind kind, Oid dboid, Oid objoid,
-													bool nowait);
-extern void pgstat_reset_entry(PgStat_Kind kind, Oid dboid, Oid objoid, TimestampTz ts);
+													RelFileNumber relfile, bool nowait);
+extern void pgstat_reset_entry(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile, TimestampTz ts);
 extern void pgstat_reset_entries_of_kind(PgStat_Kind kind, TimestampTz ts);
 extern void pgstat_reset_matching_entries(bool (*do_reset) (PgStatShared_HashEntry *, Datum),
 										  Datum match_data,
@@ -718,8 +732,8 @@ extern void pgstat_subscription_reset_timestamp_cb(PgStatShared_Common *header,
  */
 
 extern PgStat_SubXactStatus *pgstat_get_xact_stack_level(int nest_level);
-extern void pgstat_drop_transactional(PgStat_Kind kind, Oid dboid, Oid objoid);
-extern void pgstat_create_transactional(PgStat_Kind kind, Oid dboid, Oid objoid);
+extern void pgstat_drop_transactional(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile);
+extern void pgstat_create_transactional(PgStat_Kind kind, Oid dboid, Oid objoid, RelFileNumber relfile);
 
 
 /*
diff --git a/src/test/recovery/t/029_stats_restart.pl b/src/test/recovery/t/029_stats_restart.pl
index 93a7209f69..f9988b5028 100644
--- a/src/test/recovery/t/029_stats_restart.pl
+++ b/src/test/recovery/t/029_stats_restart.pl
@@ -40,10 +40,10 @@ trigger_funcrel_stat();
 
 # verify stats objects exist
 my $sect = "initial";
-is(have_stats('database', $dboid, 0), 't', "$sect: db stats do exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats('database', $dboid, 0, 0), 't', "$sect: db stats do exist");
+is(have_stats('function', $dboid, $funcoid, 0),
 	't', "$sect: function stats do exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats('relation', $dboid, $tableoid, 0),
 	't', "$sect: relation stats do exist");
 
 # regular shutdown
@@ -64,10 +64,10 @@ copy($og_stats, $statsfile) or die "Copy failed: $!";
 $node->start;
 
 $sect = "copy";
-is(have_stats('database', $dboid, 0), 't', "$sect: db stats do exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats('database', $dboid, 0, 0), 't', "$sect: db stats do exist");
+is(have_stats('function', $dboid, $funcoid, 0),
 	't', "$sect: function stats do exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats('relation', $dboid, $tableoid, 0),
 	't', "$sect: relation stats do exist");
 
 $node->stop('immediate');
@@ -81,10 +81,10 @@ $node->start;
 
 # stats should have been discarded
 $sect = "post immediate";
-is(have_stats('database', $dboid, 0), 'f', "$sect: db stats do not exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats('database', $dboid, 0, 0), 'f', "$sect: db stats do not exist");
+is(have_stats('function', $dboid, $funcoid, 0),
 	'f', "$sect: function stats do exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats('relation', $dboid, $tableoid, 0),
 	'f', "$sect: relation stats do not exist");
 
 # get rid of backup statsfile
@@ -95,10 +95,10 @@ unlink $statsfile or die "cannot unlink $statsfile $!";
 trigger_funcrel_stat();
 
 $sect = "post immediate, new";
-is(have_stats('database', $dboid, 0), 't', "$sect: db stats do exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats('database', $dboid, 0, 0), 't', "$sect: db stats do exist");
+is(have_stats('function', $dboid, $funcoid, 0),
 	't', "$sect: function stats do exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats('relation', $dboid, $tableoid, 0),
 	't', "$sect: relation stats do exist");
 
 # regular shutdown
@@ -114,10 +114,10 @@ $node->start;
 
 # no stats present due to invalid stats file
 $sect = "invalid_overwrite";
-is(have_stats('database', $dboid, 0), 'f', "$sect: db stats do not exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats('database', $dboid, 0, 0), 'f', "$sect: db stats do not exist");
+is(have_stats('function', $dboid, $funcoid, 0),
 	'f', "$sect: function stats do not exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats('relation', $dboid, $tableoid, 0),
 	'f', "$sect: relation stats do not exist");
 
 
@@ -130,10 +130,10 @@ append_file($og_stats, "XYZ");
 $node->start;
 
 $sect = "invalid_append";
-is(have_stats('database', $dboid, 0), 'f', "$sect: db stats do not exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats('database', $dboid, 0, 0), 'f', "$sect: db stats do not exist");
+is(have_stats('function', $dboid, $funcoid, 0),
 	'f', "$sect: function stats do not exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats('relation', $dboid, $tableoid, 0),
 	'f', "$sect: relation stats do not exist");
 
 
@@ -292,10 +292,10 @@ sub trigger_funcrel_stat
 
 sub have_stats
 {
-	my ($kind, $dboid, $objoid) = @_;
+	my ($kind, $dboid, $objoid, $relfile) = @_;
 
 	return $node->safe_psql($connect_db,
-		"SELECT pg_stat_have_stats('$kind', $dboid, $objoid)");
+		"SELECT pg_stat_have_stats('$kind', $dboid, $objoid, $relfile)");
 }
 
 sub overwrite_file
diff --git a/src/test/recovery/t/030_stats_cleanup_replica.pl b/src/test/recovery/t/030_stats_cleanup_replica.pl
index 74b516cc7c..317df24c4f 100644
--- a/src/test/recovery/t/030_stats_cleanup_replica.pl
+++ b/src/test/recovery/t/030_stats_cleanup_replica.pl
@@ -179,9 +179,9 @@ sub test_standby_func_tab_stats_status
 	my %stats;
 
 	$stats{rel} = $node_standby->safe_psql($connect_db,
-		"SELECT pg_stat_have_stats('relation', $dboid, $tableoid)");
+		"SELECT pg_stat_have_stats('relation', $dboid, $tableoid, 0)");
 	$stats{func} = $node_standby->safe_psql($connect_db,
-		"SELECT pg_stat_have_stats('function', $dboid, $funcoid)");
+		"SELECT pg_stat_have_stats('function', $dboid, $funcoid, 0)");
 
 	is_deeply(\%stats, \%expected, "$sect: standby stats as expected");
 
@@ -194,7 +194,7 @@ sub test_standby_db_stats_status
 	my ($connect_db, $dboid, $present) = @_;
 
 	is( $node_standby->safe_psql(
-			$connect_db, "SELECT pg_stat_have_stats('database', $dboid, 0)"),
+			$connect_db, "SELECT pg_stat_have_stats('database', $dboid, 0, 0)"),
 		$present,
 		"$sect: standby db stats as expected");
 }
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index a1626f3fae..a9b3f36cd9 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2340,6 +2340,11 @@ pg_statio_all_tables| SELECT c.oid AS relid,
     n.nspname AS schemaname,
     c.relname,
     (pg_stat_get_blocks_fetched(c.oid) - pg_stat_get_blocks_hit(c.oid)) AS heap_blks_read,
+    (pg_stat_get_blocks_written(c.oid) + pg_stat_get_relfilenode_blocks_written(d.oid,
+        CASE
+            WHEN (c.reltablespace <> (0)::oid) THEN c.reltablespace
+            ELSE d.dattablespace
+        END, c.relfilenode)) AS heap_blks_written,
     pg_stat_get_blocks_hit(c.oid) AS heap_blks_hit,
     i.idx_blks_read,
     i.idx_blks_hit,
@@ -2347,7 +2352,8 @@ pg_statio_all_tables| SELECT c.oid AS relid,
     pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit,
     x.idx_blks_read AS tidx_blks_read,
     x.idx_blks_hit AS tidx_blks_hit
-   FROM ((((pg_class c
+   FROM pg_database d,
+    ((((pg_class c
      LEFT JOIN pg_class t ON ((c.reltoastrelid = t.oid)))
      LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
      LEFT JOIN LATERAL ( SELECT (sum((pg_stat_get_blocks_fetched(pg_index.indexrelid) - pg_stat_get_blocks_hit(pg_index.indexrelid))))::bigint AS idx_blks_read,
@@ -2358,7 +2364,7 @@ pg_statio_all_tables| SELECT c.oid AS relid,
             (sum(pg_stat_get_blocks_hit(pg_index.indexrelid)))::bigint AS idx_blks_hit
            FROM pg_index
           WHERE (pg_index.indrelid = t.oid)) x ON (true))
-  WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"]));
+  WHERE ((c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"])) AND (d.datname = current_database()));
 pg_statio_sys_indexes| SELECT relid,
     indexrelid,
     schemaname,
@@ -2379,6 +2385,7 @@ pg_statio_sys_tables| SELECT relid,
     schemaname,
     relname,
     heap_blks_read,
+    heap_blks_written,
     heap_blks_hit,
     idx_blks_read,
     idx_blks_hit,
@@ -2408,6 +2415,7 @@ pg_statio_user_tables| SELECT relid,
     schemaname,
     relname,
     heap_blks_read,
+    heap_blks_written,
     heap_blks_hit,
     idx_blks_read,
     idx_blks_hit,
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 6e08898b18..eff0c9372c 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1111,23 +1111,23 @@ ROLLBACK;
 -- pg_stat_have_stats behavior
 ----
 -- fixed-numbered stats exist
-SELECT pg_stat_have_stats('bgwriter', 0, 0);
+SELECT pg_stat_have_stats('bgwriter', 0, 0, 0);
  pg_stat_have_stats 
 --------------------
  t
 (1 row)
 
 -- unknown stats kinds error out
-SELECT pg_stat_have_stats('zaphod', 0, 0);
+SELECT pg_stat_have_stats('zaphod', 0, 0, 0);
 ERROR:  invalid statistics kind: "zaphod"
 -- db stats have objoid 0
-SELECT pg_stat_have_stats('database', :dboid, 1);
+SELECT pg_stat_have_stats('database', :dboid, 1, 0);
  pg_stat_have_stats 
 --------------------
  f
 (1 row)
 
-SELECT pg_stat_have_stats('database', :dboid, 0);
+SELECT pg_stat_have_stats('database', :dboid, 0, 0);
  pg_stat_have_stats 
 --------------------
  t
@@ -1144,21 +1144,21 @@ select a from stats_test_tab1 where a = 3;
  3
 (1 row)
 
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  t
 (1 row)
 
 -- pg_stat_have_stats returns false for dropped index with stats
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  t
 (1 row)
 
 DROP index stats_test_idx1;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  f
@@ -1174,14 +1174,14 @@ select a from stats_test_tab1 where a = 3;
  3
 (1 row)
 
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  t
 (1 row)
 
 ROLLBACK;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  f
@@ -1196,7 +1196,7 @@ select a from stats_test_tab1 where a = 3;
  3
 (1 row)
 
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  t
@@ -1204,7 +1204,7 @@ SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
 
 REINDEX index CONCURRENTLY stats_test_idx1;
 -- false for previous oid
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  f
@@ -1212,7 +1212,7 @@ SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
 
 -- true for new oid
 SELECT 'stats_test_idx1'::regclass::oid AS stats_test_idx1_oid \gset
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  t
@@ -1220,7 +1220,7 @@ SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
 
 -- pg_stat_have_stats returns true for a rolled back drop index with stats
 BEGIN;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  t
@@ -1228,7 +1228,7 @@ SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
 
 DROP index stats_test_idx1;
 ROLLBACK;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  t
@@ -1513,7 +1513,7 @@ SELECT :io_sum_bulkwrite_strategy_extends_after > :io_sum_bulkwrite_strategy_ext
 (1 row)
 
 -- Test IO stats reset
-SELECT pg_stat_have_stats('io', 0, 0);
+SELECT pg_stat_have_stats('io', 0, 0, 0);
  pg_stat_have_stats 
 --------------------
  t
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index d8ac0d06f4..5a40779989 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -539,12 +539,12 @@ ROLLBACK;
 -- pg_stat_have_stats behavior
 ----
 -- fixed-numbered stats exist
-SELECT pg_stat_have_stats('bgwriter', 0, 0);
+SELECT pg_stat_have_stats('bgwriter', 0, 0, 0);
 -- unknown stats kinds error out
-SELECT pg_stat_have_stats('zaphod', 0, 0);
+SELECT pg_stat_have_stats('zaphod', 0, 0, 0);
 -- db stats have objoid 0
-SELECT pg_stat_have_stats('database', :dboid, 1);
-SELECT pg_stat_have_stats('database', :dboid, 0);
+SELECT pg_stat_have_stats('database', :dboid, 1, 0);
+SELECT pg_stat_have_stats('database', :dboid, 0, 0);
 
 -- pg_stat_have_stats returns true for committed index creation
 CREATE table stats_test_tab1 as select generate_series(1,10) a;
@@ -552,40 +552,40 @@ CREATE index stats_test_idx1 on stats_test_tab1(a);
 SELECT 'stats_test_idx1'::regclass::oid AS stats_test_idx1_oid \gset
 SET enable_seqscan TO off;
 select a from stats_test_tab1 where a = 3;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 
 -- pg_stat_have_stats returns false for dropped index with stats
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 DROP index stats_test_idx1;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 
 -- pg_stat_have_stats returns false for rolled back index creation
 BEGIN;
 CREATE index stats_test_idx1 on stats_test_tab1(a);
 SELECT 'stats_test_idx1'::regclass::oid AS stats_test_idx1_oid \gset
 select a from stats_test_tab1 where a = 3;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 ROLLBACK;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 
 -- pg_stat_have_stats returns true for reindex CONCURRENTLY
 CREATE index stats_test_idx1 on stats_test_tab1(a);
 SELECT 'stats_test_idx1'::regclass::oid AS stats_test_idx1_oid \gset
 select a from stats_test_tab1 where a = 3;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 REINDEX index CONCURRENTLY stats_test_idx1;
 -- false for previous oid
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 -- true for new oid
 SELECT 'stats_test_idx1'::regclass::oid AS stats_test_idx1_oid \gset
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 
 -- pg_stat_have_stats returns true for a rolled back drop index with stats
 BEGIN;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 DROP index stats_test_idx1;
 ROLLBACK;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 
 -- put enable_seqscan back to on
 SET enable_seqscan TO on;
@@ -759,7 +759,7 @@ SELECT sum(extends) AS io_sum_bulkwrite_strategy_extends_after
 SELECT :io_sum_bulkwrite_strategy_extends_after > :io_sum_bulkwrite_strategy_extends_before;
 
 -- Test IO stats reset
-SELECT pg_stat_have_stats('io', 0, 0);
+SELECT pg_stat_have_stats('io', 0, 0, 0);
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_pre_reset
   FROM pg_stat_io \gset
 SELECT pg_stat_reset_shared('io');
diff --git a/src/test/subscription/t/026_stats.pl b/src/test/subscription/t/026_stats.pl
index 6b6a5b0b1b..89ebf5aa2c 100644
--- a/src/test/subscription/t/026_stats.pl
+++ b/src/test/subscription/t/026_stats.pl
@@ -290,7 +290,7 @@ $node_subscriber->safe_psql($db, qq(DROP SUBSCRIPTION $sub1_name));
 
 # Subscription stats for sub1 should be gone
 is( $node_subscriber->safe_psql(
-		$db, qq(SELECT pg_stat_have_stats('subscription', 0, $sub1_oid))),
+		$db, qq(SELECT pg_stat_have_stats('subscription', 0, $sub1_oid, 0))),
 	qq(f),
 	qq(Subscription stats for subscription '$sub1_name' should be removed.));
 
@@ -309,7 +309,7 @@ DROP SUBSCRIPTION $sub2_name;
 
 # Subscription stats for sub2 should be gone
 is( $node_subscriber->safe_psql(
-		$db, qq(SELECT pg_stat_have_stats('subscription', 0, $sub2_oid))),
+		$db, qq(SELECT pg_stat_have_stats('subscription', 0, $sub2_oid, 0))),
 	qq(f),
 	qq(Subscription stats for subscription '$sub2_name' should be removed.));
 
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e9ebddde24..ab315e16dd 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2126,6 +2126,7 @@ PgStatShared_InjectionPoint
 PgStatShared_InjectionPointFixed
 PgStatShared_IO
 PgStatShared_Relation
+PgStatShared_RelFileNode
 PgStatShared_ReplSlot
 PgStatShared_SLRU
 PgStatShared_Subscription
-- 
2.34.1

#23Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Bertrand Drouvot (#22)
Re: relfilenode statistics

Hi,

On Tue, Sep 10, 2024 at 05:30:32AM +0000, Bertrand Drouvot wrote:

Hi,

On Thu, Sep 05, 2024 at 04:48:36AM +0000, Bertrand Drouvot wrote:

Please find attached a mandatory rebase.

In passing, checking whether, based on the previous discussion (and given that we
don't have the relation OID when writing buffers out), you see another approach
than the one this patch is implementing?

Attached v5, mandatory rebase due to recent changes in the stats area.

Attached v6, mandatory rebase due to b14e9ce7d5.

Note that 0001 is the same as the one proposed in [0] and needs to be applied
here to make the stats machinery work as expected with the relfile added in
the stats hash key (though it deserves its own dedicated thread as explained in [0]).

Don't look at 0001 and 0002 as I think we need more design discussion.

=== Sum up the feedback received up-thread

I re-read this thread and it appears that there are 3 main remarks:

R1: Andres did propose to add stuff like "n_dead_tup" (see [1]), to provide
even more benefits.

R2: Robert mentioned ([2]) that we need to decide between "sometimes I
don't know the relation OID so I want to use the relfilenumber
instead, without changing the user experience" and "some
of these stats actually properly pertain to the relfilenode rather
than the relation so I want to associate them with the right object
and that will affect how the user sees things".

R3: Michael had concerns about adding a new field (the relfile) in the hash key,
see [3].

=== My thoughts:

While my initial idea was that the relfilenode stats would deal only with I/O
activities, it also looks like it would be beneficial to add stuff like
"n_dead_tup".

Then I think we should go with the "sometimes I don't know the relation OID
so I want to use the relfilenumber instead, without changing the user experience"
way.

Regarding the concern about adding a new field in the hash key, I think we can't
avoid that as we don't have the relation OID when writing buffers out.
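
To illustrate why the write path pushes toward a key that carries the relfile,
here is a minimal sketch under stated assumptions. DemoLocator, DemoEntry,
DemoStats and demo_count_block_written() are hypothetical names used only for
illustration, not PostgreSQL APIs. The point is that the writer only knows the
file locator (tablespace, database, relfilenumber), so that triple is the only
thing it can key a counter on:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ENTRIES 8

/* Hypothetical stand-in for a file locator: all that a buffer writer knows. */
typedef struct DemoLocator
{
    uint32_t    spcOid;         /* tablespace */
    uint32_t    dbOid;          /* database */
    uint32_t    relNumber;      /* relfilenumber */
} DemoLocator;

typedef struct DemoEntry
{
    DemoLocator key;
    uint64_t    blocks_written;
} DemoEntry;

/* Tiny fixed-size table standing in for the shared stats hash (assumption). */
typedef struct DemoStats
{
    DemoEntry   entries[DEMO_MAX_ENTRIES];
    size_t      nentries;
} DemoStats;

static int
demo_locator_equal(DemoLocator a, DemoLocator b)
{
    return a.spcOid == b.spcOid &&
        a.dbOid == b.dbOid &&
        a.relNumber == b.relNumber;
}

/* Find the entry for a locator, creating it on first use; NULL if full. */
static DemoEntry *
demo_get_entry(DemoStats *stats, DemoLocator loc)
{
    assert(stats != NULL);

    for (size_t i = 0; i < stats->nentries; i++)
    {
        if (demo_locator_equal(stats->entries[i].key, loc))
            return &stats->entries[i];
    }

    if (stats->nentries == DEMO_MAX_ENTRIES)
        return NULL;

    stats->entries[stats->nentries].key = loc;
    stats->entries[stats->nentries].blocks_written = 0;
    return &stats->entries[stats->nentries++];
}

/* Write path: no relation OID is needed, only the locator. Returns 1 on success. */
static int
demo_count_block_written(DemoStats *stats, DemoLocator loc)
{
    DemoEntry  *entry = demo_get_entry(stats, loc);

    if (entry == NULL)
        return 0;
    entry->blocks_written++;
    return 1;
}
```

Calling demo_count_block_written() twice with the same locator and once with
another one leaves two entries in the table, each with its own counter. Mapping
those counters back to a relation is a separate step, which is where the user
experience question raised in R2 comes in.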

=== Moving forward

I would go for trying to store everything that is "relation" related into the
relfilenode stats (that will then include n_dead_tup among other things) and
try to hide the distinction between relfilenode stats and relation stats from
the user.

Thoughts on moving forward that way?

[0]: /messages/by-id/Zyb7RW1y9dVfO0UH@ip-10-97-1-34.eu-west-3.compute.internal
[1]: /messages/by-id/20240607033806.6gwgolihss72cj6r@awork3.anarazel.de
[2]: /messages/by-id/CA+TgmoZtwT6h=nyuQ1J9GNSrRyhf0fv7Ai6FzO=bH0C9Bf6tew@mail.gmail.com
[3]: /messages/by-id/Zo9j69GhexDpeV4k@paquier.xyz

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#24Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Bertrand Drouvot (#23)
2 attachment(s)
Re: relfilenode statistics

On Mon, Nov 04, 2024 at 09:27:38AM +0000, Bertrand Drouvot wrote:

On Tue, Sep 10, 2024 at 05:30:32AM +0000, Bertrand Drouvot wrote:

Hi,

On Thu, Sep 05, 2024 at 04:48:36AM +0000, Bertrand Drouvot wrote:

Please find attached a mandatory rebase.

In passing, checking whether, based on the previous discussion (and given that we
don't have the relation OID when writing buffers out), you see another approach
than the one this patch is implementing?

Attached v5, mandatory rebase due to recent changes in the stats area.

Attached v6, mandatory rebase due to b14e9ce7d5.

Note that 0001 is the same as the one proposed in [0] and needs to be applied
here to make the stats machinery work as expected with the relfile added in
the stats hash key (though it deserves its own dedicated thread as explained in [0]).

Don't look at 0001 and 0002 as I think we need more design discussion.

=== Sum up the feedback received up-thread

I re-read this thread and it appears that there are 3 main remarks:

R1: Andres did propose to add stuff like "n_dead_tup" (see [1]), to provide
even more benefits.

R2: Robert mentioned ([2]) that we need to decide between "sometimes I
don't know the relation OID so I want to use the relfilenumber
instead, without changing the user experience" and "some
of these stats actually properly pertain to the relfilenode rather
than the relation so I want to associate them with the right object
and that will affect how the user sees things".

R3: Michael had concerns about adding a new field (the relfile) in the hash key,
see [3].

=== My thoughts:

While my initial idea was that the relfilenode stats would deal only with I/O
activities, it also looks like it would be beneficial to add stuff like
"n_dead_tup".

Then I think we should go with the "sometimes I don't know the relation OID
so I want to use the relfilenumber instead, without changing the user experience"
way.

Regarding the concern about adding a new field in the hash key, I think we can't
avoid that as we don't have the relation OID when writing buffers out.

=== Moving forward

I would go for trying to store everything that is "relation" related into the
relfilenode stats (that will then include n_dead_tup among other things) and
try to hide the distinction between relfilenode stats and relation stats from
the user.

Thoughts on moving forward that way?

[0]: /messages/by-id/Zyb7RW1y9dVfO0UH@ip-10-97-1-34.eu-west-3.compute.internal
[1]: /messages/by-id/20240607033806.6gwgolihss72cj6r@awork3.anarazel.de
[2]: /messages/by-id/CA+TgmoZtwT6h=nyuQ1J9GNSrRyhf0fv7Ai6FzO=bH0C9Bf6tew@mail.gmail.com
[3]: /messages/by-id/Zo9j69GhexDpeV4k@paquier.xyz

+ Andres and Robert as both are quoted in my previous message.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v6-0001-Clear-padding-in-PgStat_HashKey-keys.patch (text/x-diff; charset=us-ascii)
From 0dbb3dc1bd66c63730696b69fbe768024c8bfb04 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Sat, 2 Nov 2024 14:21:18 +0000
Subject: [PATCH v6 1/2] Clear padding in PgStat_HashKey keys

PgStat_HashKey keys are currently initialized in a way that could result in random
data in the padding bytes (if there was padding in PgStat_HashKey which is not
the case currently).

We are using sizeof(PgStat_HashKey) in pgstat_cmp_hash_key() and we compute the
hash key in pgstat_hash_hash_key() using the PgStat_HashKey struct size as
input. So, we have to ensure that no random data can be stored in the padding
bytes (if any) of a PgStat_HashKey key.
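
The concern above can be sketched as follows. This is a minimal illustration
under stated assumptions, not the patch itself: DemoKey is a hypothetical
struct (not PgStat_HashKey) chosen so that common ABIs insert padding after
its first member, and demo_key_init() / demo_key_equal() are illustrative
helpers. Because the comparison runs memcmp() over the whole struct size, the
padding bytes take part in it, so the key is zeroed before its fields are set:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical key, NOT the real PgStat_HashKey: on common ABIs the compiler
 * inserts padding after 'kind' so that 'objid' is 8-byte aligned.
 */
typedef struct DemoKey
{
    uint8_t     kind;
    uint64_t    objid;
} DemoKey;

/* Zero the whole key first so that any padding bytes are deterministic. */
static void
demo_key_init(DemoKey *key, uint8_t kind, uint64_t objid)
{
    assert(key != NULL);

    memset(key, 0, sizeof(DemoKey));
    key->kind = kind;
    key->objid = objid;
}

/* Byte-wise comparison over the full struct size, padding included. */
static int
demo_key_equal(const DemoKey *a, const DemoKey *b)
{
    return memcmp(a, b, sizeof(DemoKey)) == 0;
}
```

Without the memset(), two keys with the same kind and objid could compare as
different if their padding bytes happened to hold different leftover values.
This relies on member stores leaving padding untouched, which holds in
practice with the usual compilers.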
---
 src/backend/utils/activity/pgstat.c       |  3 +++
 src/backend/utils/activity/pgstat_shmem.c | 18 ++++++++++++++++--
 2 files changed, 19 insertions(+), 2 deletions(-)
 100.0% src/backend/utils/activity/

diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index be48432cc3..ea8c5691e8 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -938,6 +938,9 @@ pgstat_fetch_entry(PgStat_Kind kind, Oid dboid, uint64 objid)
 
 	pgstat_prep_snapshot();
 
+	/* clear padding */
+	memset(&key, 0, sizeof(struct PgStat_HashKey));
+
 	key.kind = kind;
 	key.dboid = dboid;
 	key.objid = objid;
diff --git a/src/backend/utils/activity/pgstat_shmem.c b/src/backend/utils/activity/pgstat_shmem.c
index a09c6fee05..c1b7ff76b1 100644
--- a/src/backend/utils/activity/pgstat_shmem.c
+++ b/src/backend/utils/activity/pgstat_shmem.c
@@ -432,11 +432,18 @@ PgStat_EntryRef *
 pgstat_get_entry_ref(PgStat_Kind kind, Oid dboid, uint64 objid, bool create,
 					 bool *created_entry)
 {
-	PgStat_HashKey key = {.kind = kind,.dboid = dboid,.objid = objid};
+	PgStat_HashKey key;
 	PgStatShared_HashEntry *shhashent;
 	PgStatShared_Common *shheader = NULL;
 	PgStat_EntryRef *entry_ref;
 
+	/* clear padding */
+	memset(&key, 0, sizeof(struct PgStat_HashKey));
+
+	key.kind = kind;
+	key.dboid = dboid;
+	key.objid = objid;
+
 	/*
 	 * passing in created_entry only makes sense if we possibly could create
 	 * entry.
@@ -908,10 +915,17 @@ pgstat_drop_database_and_contents(Oid dboid)
 bool
 pgstat_drop_entry(PgStat_Kind kind, Oid dboid, uint64 objid)
 {
-	PgStat_HashKey key = {.kind = kind,.dboid = dboid,.objid = objid};
+	PgStat_HashKey key;
 	PgStatShared_HashEntry *shent;
 	bool		freed = true;
 
+	/* clear padding */
+	memset(&key, 0, sizeof(struct PgStat_HashKey));
+
+	key.kind = kind;
+	key.dboid = dboid;
+	key.objid = objid;
+
 	/* delete local reference */
 	if (pgStatEntryRefHash)
 	{
-- 
2.34.1

v6-0002-Provide-relfilenode-statistics.patch (text/x-diff; charset=utf-8)
From a815f124e4283d4f0eb1ce6e58da81c1dcd5bf8e Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Thu, 16 Nov 2023 02:30:01 +0000
Subject: [PATCH v6 2/2] Provide relfilenode statistics
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

We currently don’t have write counters for relations.
The reason is that we don’t have the relation OID when writing buffers out.
Tracking writes per relfilenode would allow us to track/consolidate writes per
relation.

relfilenode stats are also beneficial for the "Split index and table statistics
into different types of stats" work in progress: it would allow us to avoid
additional branches in some situations.

=== Remarks ===

This is a POC patch. There is still work to do: there are more places where we
should add relfilenode counters, more APIs to create to retrieve the relfilenode
stats, and the patch takes care of the rewrite generated by TRUNCATE but there is
more to handle, such as CLUSTER and VACUUM FULL.

The new logic to retrieve stats in pg_statio_all_tables has been implemented
only for the new blocks_written stat (we'd need to do the same for the existing
buffer read / buffer hit if we agree on the approach implemented here).

The goal of this patch is to start the discussion and agree on the design before
moving forward.
---
 src/backend/access/rmgrdesc/xactdesc.c        |   5 +-
 src/backend/catalog/storage.c                 |   8 ++
 src/backend/catalog/system_functions.sql      |   2 +-
 src/backend/catalog/system_views.sql          |   5 +-
 src/backend/postmaster/checkpointer.c         |   5 +
 src/backend/storage/buffer/bufmgr.c           |   6 +-
 src/backend/storage/smgr/md.c                 |   8 ++
 src/backend/utils/activity/pgstat.c           |  56 +++++----
 src/backend/utils/activity/pgstat_database.c  |  12 +-
 src/backend/utils/activity/pgstat_function.c  |  13 +-
 src/backend/utils/activity/pgstat_relation.c  | 112 ++++++++++++++++--
 src/backend/utils/activity/pgstat_replslot.c  |  13 +-
 src/backend/utils/activity/pgstat_shmem.c     |  19 +--
 .../utils/activity/pgstat_subscription.c      |  14 +--
 src/backend/utils/activity/pgstat_xact.c      |  66 ++++++++---
 src/backend/utils/adt/pgstatfuncs.c           |  34 +++++-
 src/include/access/tableam.h                  |  19 +++
 src/include/access/xact.h                     |   1 +
 src/include/catalog/pg_proc.dat               |  14 ++-
 src/include/pgstat.h                          |  37 ++++--
 src/include/utils/pgstat_internal.h           |  33 ++++--
 src/test/recovery/t/029_stats_restart.pl      |  40 +++----
 .../recovery/t/030_stats_cleanup_replica.pl   |   6 +-
 src/test/regress/expected/rules.out           |  12 +-
 src/test/regress/expected/stats.out           |  30 ++---
 src/test/regress/sql/stats.sql                |  30 ++---
 src/test/subscription/t/026_stats.pl          |   4 +-
 src/tools/pgindent/typedefs.list              |   1 +
 28 files changed, 443 insertions(+), 162 deletions(-)
   4.2% src/backend/catalog/
  48.9% src/backend/utils/activity/
   6.0% src/backend/utils/adt/
   3.7% src/backend/
   3.0% src/include/access/
   3.1% src/include/catalog/
   5.1% src/include/utils/
   6.3% src/include/
  11.2% src/test/recovery/t/
   5.1% src/test/regress/expected/

diff --git a/src/backend/access/rmgrdesc/xactdesc.c b/src/backend/access/rmgrdesc/xactdesc.c
index 889cb955c1..5a15cdd460 100644
--- a/src/backend/access/rmgrdesc/xactdesc.c
+++ b/src/backend/access/rmgrdesc/xactdesc.c
@@ -322,10 +322,11 @@ xact_desc_stats(StringInfo buf, const char *label,
 			uint64		objid =
 				((uint64) dropped_stats[i].objid_hi) << 32 | dropped_stats[i].objid_lo;
 
-			appendStringInfo(buf, " %d/%u/%llu",
+			appendStringInfo(buf, " %d/%u/%llu/%u",
 							 dropped_stats[i].kind,
 							 dropped_stats[i].dboid,
-							 (unsigned long long) objid);
+							 (unsigned long long) objid,
+							 dropped_stats[i].relfile);
 		}
 	}
 }
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index f56b3cc0f2..db6107cd90 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -33,6 +33,7 @@
 #include "storage/smgr.h"
 #include "utils/hsearch.h"
 #include "utils/memutils.h"
+#include "utils/pgstat_internal.h"
 #include "utils/rel.h"
 
 /* GUC variables */
@@ -152,6 +153,7 @@ RelationCreateStorage(RelFileLocator rlocator, char relpersistence,
 	if (needs_wal)
 		log_smgrcreate(&srel->smgr_rlocator.locator, MAIN_FORKNUM);
 
+	pgstat_create_transactional(PGSTAT_KIND_RELFILENODE, rlocator.dbOid, rlocator.spcOid, rlocator.relNumber);
 	/*
 	 * Add the relation to the list of stuff to delete at abort, if we are
 	 * asked to do so.
@@ -227,6 +229,8 @@ RelationDropStorage(Relation rel)
 	 * for now I'll keep the logic simple.
 	 */
 
+	pgstat_drop_transactional(PGSTAT_KIND_RELFILENODE, rel->rd_locator.dbOid, rel->rd_locator.spcOid,  rel->rd_locator.relNumber);
+
 	RelationCloseSmgr(rel);
 }
 
@@ -253,6 +257,9 @@ RelationPreserveStorage(RelFileLocator rlocator, bool atCommit)
 	PendingRelDelete *pending;
 	PendingRelDelete *prev;
 	PendingRelDelete *next;
+	PgStat_SubXactStatus *xact_state;
+
+	xact_state = pgStatXactStack;
 
 	prev = NULL;
 	for (pending = pendingDeletes; pending != NULL; pending = next)
@@ -267,6 +274,7 @@ RelationPreserveStorage(RelFileLocator rlocator, bool atCommit)
 			else
 				pendingDeletes = next;
 			pfree(pending);
+			PgStat_RemoveRelFileNodeFromDroppedStats(xact_state, rlocator);
 			/* prev does not change */
 		}
 		else
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index 20d3b9b73f..3b9e822de1 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -718,7 +718,7 @@ REVOKE EXECUTE ON FUNCTION pg_stat_reset_single_function_counters(oid) FROM publ
 
 REVOKE EXECUTE ON FUNCTION pg_stat_reset_replication_slot(text) FROM public;
 
-REVOKE EXECUTE ON FUNCTION pg_stat_have_stats(text, oid, int8) FROM public;
+REVOKE EXECUTE ON FUNCTION pg_stat_have_stats(text, oid, int8, oid) FROM public;
 
 REVOKE EXECUTE ON FUNCTION pg_stat_reset_subscription_stats(oid) FROM public;
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 3456b821bc..7e582399bb 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -746,6 +746,7 @@ CREATE VIEW pg_statio_all_tables AS
             C.relname AS relname,
             pg_stat_get_blocks_fetched(C.oid) -
                     pg_stat_get_blocks_hit(C.oid) AS heap_blks_read,
+			pg_stat_get_blocks_written(C.oid) + pg_stat_get_relfilenode_blocks_written(d.oid, CASE WHEN C.reltablespace <> 0 THEN C.reltablespace ELSE d.dattablespace END, C.relfilenode) AS heap_blks_written,
             pg_stat_get_blocks_hit(C.oid) AS heap_blks_hit,
             I.idx_blks_read AS idx_blks_read,
             I.idx_blks_hit AS idx_blks_hit,
@@ -754,7 +755,7 @@ CREATE VIEW pg_statio_all_tables AS
             pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
             X.idx_blks_read AS tidx_blks_read,
             X.idx_blks_hit AS tidx_blks_hit
-    FROM pg_class C LEFT JOIN
+    FROM pg_database d, pg_class C LEFT JOIN
             pg_class T ON C.reltoastrelid = T.oid
             LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
             LEFT JOIN LATERAL (
@@ -771,7 +772,7 @@ CREATE VIEW pg_statio_all_tables AS
                      sum(pg_stat_get_blocks_hit(indexrelid))::bigint
                      AS idx_blks_hit
               FROM pg_index WHERE indrelid = T.oid ) X ON true
-    WHERE C.relkind IN ('r', 't', 'm');
+    WHERE C.relkind IN ('r', 't', 'm') AND d.datname = current_database();
 
 CREATE VIEW pg_statio_sys_tables AS
     SELECT * FROM pg_statio_all_tables
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 982572a75d..808c9d52d5 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -530,6 +530,11 @@ CheckpointerMain(char *startup_data, size_t startup_data_len)
 		/* Report pending statistics to the cumulative stats system */
 		pgstat_report_checkpointer();
 		pgstat_report_wal(true);
+		/*
+		 *  No need to check for transaction state in checkpointer before
+		 *  calling pgstat_report_stat().
+		 */
+		pgstat_report_stat(true);
 
 		/*
 		 * If any checkpoint flags have been set, redo the loop to handle the
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 0f02bf62fa..38cd5dddb6 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1159,9 +1159,9 @@ PinBufferForBlock(Relation rel,
 		 * WaitReadBuffers() (so, not for hits, and not for buffers that are
 		 * zeroed instead), the per-relation stats always count them.
 		 */
-		pgstat_count_buffer_read(rel);
+		pgstat_report_relfilenode_buffer_read(rel);
 		if (*foundPtr)
-			pgstat_count_buffer_hit(rel);
+			pgstat_report_relfilenode_buffer_hit(rel);
 	}
 	if (*foundPtr)
 	{
@@ -3895,6 +3895,8 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln, IOObject io_object,
 
 	pgBufferUsage.shared_blks_written++;
 
+	pgstat_report_relfilenode_blks_written(reln->smgr_rlocator.locator);
+
 	/*
 	 * Mark the buffer as clean (unless BM_JUST_DIRTIED has become set) and
 	 * end the BM_IO_IN_PROGRESS state.
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index cc8a80ee96..0f71e9016a 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -38,6 +38,7 @@
 #include "storage/smgr.h"
 #include "storage/sync.h"
 #include "utils/memutils.h"
+#include "utils/pgstat_internal.h"
 
 /*
  * The magnetic disk storage manager keeps track of open file
@@ -1468,12 +1469,16 @@ DropRelationFiles(RelFileLocator *delrels, int ndelrels, bool isRedo)
 {
 	SMgrRelation *srels;
 	int			i;
+	int         not_freed_count = 0;
 
 	srels = palloc(sizeof(SMgrRelation) * ndelrels);
 	for (i = 0; i < ndelrels; i++)
 	{
 		SMgrRelation srel = smgropen(delrels[i], INVALID_PROC_NUMBER);
 
+		if (!pgstat_drop_entry(PGSTAT_KIND_RELFILENODE, delrels[i].dbOid, delrels[i].spcOid, delrels[i].relNumber))
+			not_freed_count++;
+
 		if (isRedo)
 		{
 			ForkNumber	fork;
@@ -1484,6 +1489,9 @@ DropRelationFiles(RelFileLocator *delrels, int ndelrels, bool isRedo)
 		srels[i] = srel;
 	}
 
+	if (not_freed_count > 0)
+		pgstat_request_entry_refs_gc();
+
 	smgrdounlinkall(srels, ndelrels, isRedo);
 
 	for (i = 0; i < ndelrels; i++)
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index ea8c5691e8..151ba38aa7 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -307,6 +307,19 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.delete_pending_cb = pgstat_relation_delete_pending_cb,
 	},
 
+	[PGSTAT_KIND_RELFILENODE] = {
+		.name = "relfilenode",
+
+		.fixed_amount = false,
+
+		.shared_size = sizeof(PgStatShared_RelFileNode),
+		.shared_data_off = offsetof(PgStatShared_RelFileNode, stats),
+		.shared_data_len = sizeof(((PgStatShared_RelFileNode *) 0)->stats),
+		.pending_size = sizeof(PgStat_StatRelFileNodeEntry),
+
+		.flush_pending_cb = pgstat_relfilenode_flush_cb,
+	},
+
 	[PGSTAT_KIND_FUNCTION] = {
 		.name = "function",
 
@@ -756,7 +769,7 @@ pgstat_report_stat(bool force)
 
 	partial_flush = false;
 
-	/* flush database / relation / function / ... stats */
+	/* flush database / relation / function / relfilenode / ... stats */
 	partial_flush |= pgstat_flush_pending_entries(nowait);
 
 	/* flush of fixed-numbered stats */
@@ -845,7 +858,7 @@ pgstat_reset_counters(void)
  * GRANT system.
  */
 void
-pgstat_reset(PgStat_Kind kind, Oid dboid, uint64 objid)
+pgstat_reset(PgStat_Kind kind, Oid dboid, uint64 objid, RelFileNumber relfile)
 {
 	const PgStat_KindInfo *kind_info = pgstat_get_kind_info(kind);
 	TimestampTz ts = GetCurrentTimestamp();
@@ -854,7 +867,7 @@ pgstat_reset(PgStat_Kind kind, Oid dboid, uint64 objid)
 	Assert(!pgstat_get_kind_info(kind)->fixed_amount);
 
 	/* reset the "single counter" */
-	pgstat_reset_entry(kind, dboid, objid, ts);
+	pgstat_reset_entry(kind, dboid, objid, relfile, ts);
 
 	if (!kind_info->accessed_across_databases)
 		pgstat_reset_database_timestamp(dboid, ts);
@@ -925,7 +938,7 @@ pgstat_clear_snapshot(void)
 }
 
 void *
-pgstat_fetch_entry(PgStat_Kind kind, Oid dboid, uint64 objid)
+pgstat_fetch_entry(PgStat_Kind kind, Oid dboid, uint64 objid, RelFileNumber relfile)
 {
 	PgStat_HashKey key;
 	PgStat_EntryRef *entry_ref;
@@ -944,6 +957,7 @@ pgstat_fetch_entry(PgStat_Kind kind, Oid dboid, uint64 objid)
 	key.kind = kind;
 	key.dboid = dboid;
 	key.objid = objid;
+	key.relfile = relfile;
 
 	/* if we need to build a full snapshot, do so */
 	if (pgstat_fetch_consistency == PGSTAT_FETCH_CONSISTENCY_SNAPSHOT)
@@ -969,7 +983,7 @@ pgstat_fetch_entry(PgStat_Kind kind, Oid dboid, uint64 objid)
 
 	pgStatLocal.snapshot.mode = pgstat_fetch_consistency;
 
-	entry_ref = pgstat_get_entry_ref(kind, dboid, objid, false, NULL);
+	entry_ref = pgstat_get_entry_ref(kind, dboid, objid, relfile, false, NULL);
 
 	if (entry_ref == NULL || entry_ref->shared_entry->dropped)
 	{
@@ -1038,13 +1052,13 @@ pgstat_get_stat_snapshot_timestamp(bool *have_snapshot)
 }
 
 bool
-pgstat_have_entry(PgStat_Kind kind, Oid dboid, uint64 objid)
+pgstat_have_entry(PgStat_Kind kind, Oid dboid, uint64 objid, RelFileNumber relfile)
 {
 	/* fixed-numbered stats always exist */
 	if (pgstat_get_kind_info(kind)->fixed_amount)
 		return true;
 
-	return pgstat_get_entry_ref(kind, dboid, objid, false, NULL) != NULL;
+	return pgstat_get_entry_ref(kind, dboid, objid, relfile, false, NULL) != NULL;
 }
 
 /*
@@ -1259,7 +1273,8 @@ pgstat_build_snapshot_fixed(PgStat_Kind kind)
  * created, false otherwise.
  */
 PgStat_EntryRef *
-pgstat_prep_pending_entry(PgStat_Kind kind, Oid dboid, uint64 objid, bool *created_entry)
+pgstat_prep_pending_entry(PgStat_Kind kind, Oid dboid, uint64 objid,
+						  RelFileNumber relfile, bool *created_entry)
 {
 	PgStat_EntryRef *entry_ref;
 
@@ -1274,7 +1289,7 @@ pgstat_prep_pending_entry(PgStat_Kind kind, Oid dboid, uint64 objid, bool *creat
 								  ALLOCSET_SMALL_SIZES);
 	}
 
-	entry_ref = pgstat_get_entry_ref(kind, dboid, objid,
+	entry_ref = pgstat_get_entry_ref(kind, dboid, objid, relfile,
 									 true, created_entry);
 
 	if (entry_ref->pending == NULL)
@@ -1297,11 +1312,11 @@ pgstat_prep_pending_entry(PgStat_Kind kind, Oid dboid, uint64 objid, bool *creat
  * that it shouldn't be needed.
  */
 PgStat_EntryRef *
-pgstat_fetch_pending_entry(PgStat_Kind kind, Oid dboid, uint64 objid)
+pgstat_fetch_pending_entry(PgStat_Kind kind, Oid dboid, uint64 objid, RelFileNumber relfile)
 {
 	PgStat_EntryRef *entry_ref;
 
-	entry_ref = pgstat_get_entry_ref(kind, dboid, objid, false, NULL);
+	entry_ref = pgstat_get_entry_ref(kind, dboid, objid, relfile, false, NULL);
 
 	if (entry_ref == NULL || entry_ref->pending == NULL)
 		return NULL;
@@ -1330,7 +1345,7 @@ pgstat_delete_pending_entry(PgStat_EntryRef *entry_ref)
 }
 
 /*
- * Flush out pending stats for database objects (databases, relations,
+ * Flush out pending stats for database objects (databases, relations, relfilenodes,
  * functions).
  */
 static bool
@@ -1650,9 +1665,10 @@ pgstat_write_statsfile(XLogRecPtr redo)
 		 */
 		if (!pgstat_is_kind_valid(ps->key.kind))
 		{
-			elog(WARNING, "found unknown stats entry %u/%u/%llu",
+			elog(WARNING, "found unknown stats entry %u/%u/%llu/%u",
 				 ps->key.kind, ps->key.dboid,
-				 (unsigned long long) ps->key.objid);
+				 (unsigned long long) ps->key.objid,
+				 ps->key.relfile);
 			continue;
 		}
 
@@ -1888,9 +1904,9 @@ pgstat_read_statsfile(XLogRecPtr redo)
 
 						if (!pgstat_is_kind_valid(key.kind))
 						{
-							elog(WARNING, "invalid stats kind for entry %u/%u/%llu of type %c",
+							elog(WARNING, "invalid stats kind for entry %u/%u/%llu/%u of type %c",
 								 key.kind, key.dboid,
-								 (unsigned long long) key.objid, t);
+								 (unsigned long long) key.objid, key.relfile, t);
 							goto error;
 						}
 					}
@@ -1961,9 +1977,9 @@ pgstat_read_statsfile(XLogRecPtr redo)
 					if (found)
 					{
 						dshash_release_lock(pgStatLocal.shared_hash, p);
-						elog(WARNING, "found duplicate stats entry %u/%u/%llu of type %c",
+						elog(WARNING, "found duplicate stats entry %u/%u/%llu/%u of type %c",
 							 key.kind, key.dboid,
-							 (unsigned long long) key.objid, t);
+							 (unsigned long long) key.objid, key.relfile, t);
 						goto error;
 					}
 
@@ -1974,9 +1990,9 @@ pgstat_read_statsfile(XLogRecPtr redo)
 									pgstat_get_entry_data(key.kind, header),
 									pgstat_get_entry_len(key.kind)))
 					{
-						elog(WARNING, "could not read data for entry %u/%u/%llu of type %c",
+						elog(WARNING, "could not read data for entry %u/%u/%llu/%u of type %c",
 							 key.kind, key.dboid,
-							 (unsigned long long) key.objid, t);
+							 (unsigned long long) key.objid, key.relfile, t);
 						goto error;
 					}
 
diff --git a/src/backend/utils/activity/pgstat_database.c b/src/backend/utils/activity/pgstat_database.c
index 29bc090974..cf77f2dbdb 100644
--- a/src/backend/utils/activity/pgstat_database.c
+++ b/src/backend/utils/activity/pgstat_database.c
@@ -43,7 +43,7 @@ static PgStat_Counter pgLastSessionReportTime = 0;
 void
 pgstat_drop_database(Oid databaseid)
 {
-	pgstat_drop_transactional(PGSTAT_KIND_DATABASE, databaseid, InvalidOid);
+	pgstat_drop_transactional(PGSTAT_KIND_DATABASE, databaseid, InvalidOid, InvalidOid);
 }
 
 /*
@@ -66,7 +66,7 @@ pgstat_report_autovac(Oid dboid)
 	 * operation so it doesn't matter if we get blocked here a little.
 	 */
 	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_DATABASE,
-											dboid, InvalidOid, false);
+											dboid, InvalidOid, InvalidOid, false);
 
 	dbentry = (PgStatShared_Database *) entry_ref->shared_stats;
 	dbentry->stats.last_autovac_time = GetCurrentTimestamp();
@@ -150,7 +150,7 @@ pgstat_report_checksum_failures_in_db(Oid dboid, int failurecount)
 	 * common enough for that to be a problem.
 	 */
 	entry_ref =
-		pgstat_get_entry_ref_locked(PGSTAT_KIND_DATABASE, dboid, InvalidOid, false);
+		pgstat_get_entry_ref_locked(PGSTAT_KIND_DATABASE, dboid, InvalidOid, InvalidOid, false);
 
 	sharedent = (PgStatShared_Database *) entry_ref->shared_stats;
 	sharedent->stats.checksum_failures += failurecount;
@@ -242,7 +242,7 @@ PgStat_StatDBEntry *
 pgstat_fetch_stat_dbentry(Oid dboid)
 {
 	return (PgStat_StatDBEntry *)
-		pgstat_fetch_entry(PGSTAT_KIND_DATABASE, dboid, InvalidOid);
+		pgstat_fetch_entry(PGSTAT_KIND_DATABASE, dboid, InvalidOid, InvalidOid);
 }
 
 void
@@ -341,7 +341,7 @@ pgstat_prep_database_pending(Oid dboid)
 	Assert(!OidIsValid(dboid) || OidIsValid(MyDatabaseId));
 
 	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_DATABASE, dboid, InvalidOid,
-										  NULL);
+										  InvalidOid, NULL);
 
 	return entry_ref->pending;
 }
@@ -357,7 +357,7 @@ pgstat_reset_database_timestamp(Oid dboid, TimestampTz ts)
 	PgStatShared_Database *dbentry;
 
 	dbref = pgstat_get_entry_ref_locked(PGSTAT_KIND_DATABASE, MyDatabaseId, InvalidOid,
-										false);
+										InvalidOid, false);
 
 	dbentry = (PgStatShared_Database *) dbref->shared_stats;
 	dbentry->stats.stat_reset_timestamp = ts;
diff --git a/src/backend/utils/activity/pgstat_function.c b/src/backend/utils/activity/pgstat_function.c
index d26da551a4..440e44e300 100644
--- a/src/backend/utils/activity/pgstat_function.c
+++ b/src/backend/utils/activity/pgstat_function.c
@@ -46,7 +46,8 @@ pgstat_create_function(Oid proid)
 {
 	pgstat_create_transactional(PGSTAT_KIND_FUNCTION,
 								MyDatabaseId,
-								proid);
+								proid,
+								InvalidOid);
 }
 
 /*
@@ -61,7 +62,8 @@ pgstat_drop_function(Oid proid)
 {
 	pgstat_drop_transactional(PGSTAT_KIND_FUNCTION,
 							  MyDatabaseId,
-							  proid);
+							  proid,
+							  InvalidOid);
 }
 
 /*
@@ -86,6 +88,7 @@ pgstat_init_function_usage(FunctionCallInfo fcinfo,
 	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_FUNCTION,
 										  MyDatabaseId,
 										  fcinfo->flinfo->fn_oid,
+										  InvalidOid,
 										  &created_entry);
 
 	/*
@@ -113,7 +116,7 @@ pgstat_init_function_usage(FunctionCallInfo fcinfo,
 		if (!SearchSysCacheExists1(PROCOID, ObjectIdGetDatum(fcinfo->flinfo->fn_oid)))
 		{
 			pgstat_drop_entry(PGSTAT_KIND_FUNCTION, MyDatabaseId,
-							  fcinfo->flinfo->fn_oid);
+							  fcinfo->flinfo->fn_oid, InvalidOid);
 			ereport(ERROR, errcode(ERRCODE_UNDEFINED_FUNCTION),
 					errmsg("function call to dropped function"));
 		}
@@ -224,7 +227,7 @@ find_funcstat_entry(Oid func_id)
 {
 	PgStat_EntryRef *entry_ref;
 
-	entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_FUNCTION, MyDatabaseId, func_id);
+	entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_FUNCTION, MyDatabaseId, func_id, InvalidOid);
 
 	if (entry_ref)
 		return entry_ref->pending;
@@ -239,5 +242,5 @@ PgStat_StatFuncEntry *
 pgstat_fetch_stat_funcentry(Oid func_id)
 {
 	return (PgStat_StatFuncEntry *)
-		pgstat_fetch_entry(PGSTAT_KIND_FUNCTION, MyDatabaseId, func_id);
+		pgstat_fetch_entry(PGSTAT_KIND_FUNCTION, MyDatabaseId, func_id, InvalidOid);
 }
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index faba8b64d2..cc71c23760 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -43,6 +43,7 @@ typedef struct TwoPhasePgStatRecord
 
 
 static PgStat_TableStatus *pgstat_prep_relation_pending(Oid rel_id, bool isshared);
+PgStat_StatRelFileNodeEntry *pgstat_prep_relfilenode_pending(RelFileLocator locator);
 static void add_tabstat_xact_level(PgStat_TableStatus *pgstat_info, int nest_level);
 static void ensure_tabstat_xact_level(PgStat_TableStatus *pgstat_info);
 static void save_truncdrop_counters(PgStat_TableXactStatus *trans, bool is_drop);
@@ -68,6 +69,7 @@ pgstat_copy_relation_stats(Relation dst, Relation src)
 	dst_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_RELATION,
 										  dst->rd_rel->relisshared ? InvalidOid : MyDatabaseId,
 										  RelationGetRelid(dst),
+										  InvalidOid,
 										  false);
 
 	dstshstats = (PgStatShared_Relation *) dst_ref->shared_stats;
@@ -169,7 +171,7 @@ pgstat_create_relation(Relation rel)
 {
 	pgstat_create_transactional(PGSTAT_KIND_RELATION,
 								rel->rd_rel->relisshared ? InvalidOid : MyDatabaseId,
-								RelationGetRelid(rel));
+								RelationGetRelid(rel), InvalidOid);
 }
 
 /*
@@ -183,7 +185,7 @@ pgstat_drop_relation(Relation rel)
 
 	pgstat_drop_transactional(PGSTAT_KIND_RELATION,
 							  rel->rd_rel->relisshared ? InvalidOid : MyDatabaseId,
-							  RelationGetRelid(rel));
+							  RelationGetRelid(rel), InvalidOid);
 
 	if (!pgstat_should_count_relation(rel))
 		return;
@@ -224,7 +226,7 @@ pgstat_report_vacuum(Oid tableoid, bool shared,
 
 	/* block acquiring lock for the same reason as pgstat_report_autovac() */
 	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_RELATION,
-											dboid, tableoid, false);
+											dboid, tableoid, InvalidOid, false);
 
 	shtabentry = (PgStatShared_Relation *) entry_ref->shared_stats;
 	tabentry = &shtabentry->stats;
@@ -317,6 +319,7 @@ pgstat_report_analyze(Relation rel,
 	/* block acquiring lock for the same reason as pgstat_report_autovac() */
 	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_RELATION, dboid,
 											RelationGetRelid(rel),
+											InvalidOid,
 											false);
 	/* can't get dropped while accessed */
 	Assert(entry_ref != NULL && entry_ref->shared_stats != NULL);
@@ -457,6 +460,19 @@ pgstat_fetch_stat_tabentry(Oid relid)
 	return pgstat_fetch_stat_tabentry_ext(IsSharedRelation(relid), relid);
 }
 
+/*
+ * Support function for the SQL-callable pgstat* functions. Returns
+ * the collected statistics for one relfilenode or NULL. NULL doesn't mean
+ * that the relfilenode doesn't exist, just that there are no statistics, so the
+ * caller should report zero instead.
+ */
+PgStat_StatRelFileNodeEntry *
+pgstat_fetch_stat_relfilenodeentry(Oid dboid, Oid spcOid, RelFileNumber relfile)
+{
+	return (PgStat_StatRelFileNodeEntry *)
+		pgstat_fetch_entry(PGSTAT_KIND_RELFILENODE, dboid, spcOid, relfile);
+}
+
 /*
  * More efficient version of pgstat_fetch_stat_tabentry(), allowing to specify
  * whether the to-be-accessed table is a shared relation or not.
@@ -467,7 +483,7 @@ pgstat_fetch_stat_tabentry_ext(bool shared, Oid reloid)
 	Oid			dboid = (shared ? InvalidOid : MyDatabaseId);
 
 	return (PgStat_StatTabEntry *)
-		pgstat_fetch_entry(PGSTAT_KIND_RELATION, dboid, reloid);
+		pgstat_fetch_entry(PGSTAT_KIND_RELATION, dboid, reloid, InvalidOid);
 }
 
 /*
@@ -490,10 +506,10 @@ find_tabstat_entry(Oid rel_id)
 	PgStat_TableStatus *tabentry = NULL;
 	PgStat_TableStatus *tablestatus = NULL;
 
-	entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, MyDatabaseId, rel_id);
+	entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, MyDatabaseId, rel_id, InvalidOid);
 	if (!entry_ref)
 	{
-		entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, InvalidOid, rel_id);
+		entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, InvalidOid, rel_id, InvalidOid);
 		if (!entry_ref)
 			return tablestatus;
 	}
@@ -877,6 +893,38 @@ pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
 	return true;
 }
 
+/*
+ * Flush out pending stats for the relfilenode entry
+ *
+ * If nowait is true, this function returns false if the lock could not be
+ * immediately acquired, otherwise true is returned.
+ */
+bool
+pgstat_relfilenode_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
+{
+	PgStatShared_RelFileNode *sharedent;
+	PgStat_StatRelFileNodeEntry *pendingent;
+
+	pendingent = (PgStat_StatRelFileNodeEntry *) entry_ref->pending;
+	sharedent = (PgStatShared_RelFileNode *) entry_ref->shared_stats;
+
+	if (!pgstat_lock_entry(entry_ref, nowait))
+		return false;
+
+#define PGSTAT_ACCUM_RELFILENODECOUNT(item)      \
+		(sharedent)->stats.item += (pendingent)->item
+
+	PGSTAT_ACCUM_RELFILENODECOUNT(blocks_fetched);
+	PGSTAT_ACCUM_RELFILENODECOUNT(blocks_hit);
+	PGSTAT_ACCUM_RELFILENODECOUNT(blocks_written);
+
+	pgstat_unlock_entry(entry_ref);
+
+	memset(pendingent, 0, sizeof(*pendingent));
+
+	return true;
+}
+
 void
 pgstat_relation_delete_pending_cb(PgStat_EntryRef *entry_ref)
 {
@@ -898,7 +946,7 @@ pgstat_prep_relation_pending(Oid rel_id, bool isshared)
 
 	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_RELATION,
 										  isshared ? InvalidOid : MyDatabaseId,
-										  rel_id, NULL);
+										  rel_id, InvalidOid, NULL);
 	pending = entry_ref->pending;
 	pending->id = rel_id;
 	pending->shared = isshared;
@@ -906,6 +954,56 @@ pgstat_prep_relation_pending(Oid rel_id, bool isshared)
 	return pending;
 }
 
+PgStat_StatRelFileNodeEntry *
+pgstat_prep_relfilenode_pending(RelFileLocator locator)
+{
+	PgStat_EntryRef *entry_ref;
+
+	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_RELFILENODE, locator.dbOid,
+										  locator.spcOid, locator.relNumber, NULL);
+
+	return entry_ref->pending;
+}
+
+void
+pgstat_report_relfilenode_blks_written(RelFileLocator locator)
+{
+	PgStat_StatRelFileNodeEntry *relfileentry = NULL;
+
+	relfileentry = pgstat_prep_relfilenode_pending(locator);
+
+	if (relfileentry)
+		relfileentry->blocks_written++;
+}
+
+void
+pgstat_report_relfilenode_buffer_read(Relation reln)
+{
+	PgStat_StatRelFileNodeEntry *relfileentry = NULL;
+
+	/* Also count at the relation level so the stats survive a rewrite */
+	pgstat_count_buffer_read(reln);
+
+	relfileentry = pgstat_prep_relfilenode_pending(reln->rd_locator);
+
+	if (relfileentry)
+		relfileentry->blocks_fetched++;
+}
+
+void
+pgstat_report_relfilenode_buffer_hit(Relation reln)
+{
+	PgStat_StatRelFileNodeEntry *relfileentry = NULL;
+
+	/* Also count at the relation level so the stats survive a rewrite */
+	pgstat_count_buffer_hit(reln);
+
+	relfileentry = pgstat_prep_relfilenode_pending(reln->rd_locator);
+
+	if (relfileentry)
+		relfileentry->blocks_hit++;
+}
+
 /*
  * add a new (sub)transaction state record
  */
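
Reviewer note on the flush callback above: pgstat_relfilenode_flush_cb() follows the usual "accumulate pending counters into the shared entry under the entry lock, then zero the pending copy" pattern. A minimal standalone sketch of that pattern is below. All Demo* names are hypothetical stand-ins rather than PostgreSQL APIs, and the boolean "locked" flag only models a failed nowait lock attempt, not a real LWLock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the per-relfilenode counters in the patch. */
typedef struct DemoRelFileStats
{
	int64_t		blocks_fetched;
	int64_t		blocks_hit;
	int64_t		blocks_written;
} DemoRelFileStats;

/* Hypothetical shared entry; "locked" models a lock held by someone else. */
typedef struct DemoSharedEntry
{
	bool		locked;
	DemoRelFileStats stats;
} DemoSharedEntry;

/*
 * Flush pending counters into the shared entry without waiting.
 * Returns false (leaving pending untouched) if the lock is unavailable,
 * true once the counters were added and the pending copy was reset.
 */
static bool
demo_flush_nowait(DemoSharedEntry *shared, DemoRelFileStats *pending)
{
	if (shared->locked)
		return false;			/* caller keeps the pending data and retries */

	shared->locked = true;
	shared->stats.blocks_fetched += pending->blocks_fetched;
	shared->stats.blocks_hit += pending->blocks_hit;
	shared->stats.blocks_written += pending->blocks_written;
	shared->locked = false;

	/* pending counters are now accounted for, start from zero again */
	memset(pending, 0, sizeof(*pending));
	return true;
}
```

The point of returning false before touching anything is that a failed nowait flush must not lose counts: the pending entry stays intact and is retried on the next flush.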
diff --git a/src/backend/utils/activity/pgstat_replslot.c b/src/backend/utils/activity/pgstat_replslot.c
index ddf2ab9928..da7016313e 100644
--- a/src/backend/utils/activity/pgstat_replslot.c
+++ b/src/backend/utils/activity/pgstat_replslot.c
@@ -62,7 +62,7 @@ pgstat_reset_replslot(const char *name)
 	 */
 	if (SlotIsLogical(slot))
 		pgstat_reset(PGSTAT_KIND_REPLSLOT, InvalidOid,
-					 ReplicationSlotIndex(slot));
+					 ReplicationSlotIndex(slot), InvalidOid);
 
 	LWLockRelease(ReplicationSlotControlLock);
 }
@@ -82,7 +82,7 @@ pgstat_report_replslot(ReplicationSlot *slot, const PgStat_StatReplSlotEntry *re
 	PgStat_StatReplSlotEntry *statent;
 
 	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_REPLSLOT, InvalidOid,
-											ReplicationSlotIndex(slot), false);
+											ReplicationSlotIndex(slot), InvalidOid, false);
 	shstatent = (PgStatShared_ReplSlot *) entry_ref->shared_stats;
 	statent = &shstatent->stats;
 
@@ -116,7 +116,7 @@ pgstat_create_replslot(ReplicationSlot *slot)
 	Assert(LWLockHeldByMeInMode(ReplicationSlotAllocationLock, LW_EXCLUSIVE));
 
 	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_REPLSLOT, InvalidOid,
-											ReplicationSlotIndex(slot), false);
+											ReplicationSlotIndex(slot), InvalidOid, false);
 	shstatent = (PgStatShared_ReplSlot *) entry_ref->shared_stats;
 
 	/*
@@ -146,7 +146,7 @@ void
 pgstat_acquire_replslot(ReplicationSlot *slot)
 {
 	pgstat_get_entry_ref(PGSTAT_KIND_REPLSLOT, InvalidOid,
-						 ReplicationSlotIndex(slot), true, NULL);
+						 ReplicationSlotIndex(slot), InvalidOid, true, NULL);
 }
 
 /*
@@ -158,7 +158,7 @@ pgstat_drop_replslot(ReplicationSlot *slot)
 	Assert(LWLockHeldByMeInMode(ReplicationSlotAllocationLock, LW_EXCLUSIVE));
 
 	if (!pgstat_drop_entry(PGSTAT_KIND_REPLSLOT, InvalidOid,
-						   ReplicationSlotIndex(slot)))
+						   ReplicationSlotIndex(slot), InvalidOid))
 		pgstat_request_entry_refs_gc();
 }
 
@@ -178,7 +178,7 @@ pgstat_fetch_replslot(NameData slotname)
 
 	if (idx != -1)
 		slotentry = (PgStat_StatReplSlotEntry *) pgstat_fetch_entry(PGSTAT_KIND_REPLSLOT,
-																	InvalidOid, idx);
+																	InvalidOid, idx, InvalidOid);
 
 	LWLockRelease(ReplicationSlotControlLock);
 
@@ -210,6 +210,7 @@ pgstat_replslot_from_serialized_name_cb(const NameData *name, PgStat_HashKey *ke
 	key->kind = PGSTAT_KIND_REPLSLOT;
 	key->dboid = InvalidOid;
 	key->objid = idx;
+	key->relfile = InvalidOid;
 
 	return true;
 }
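
Reviewer note on the key layout used throughout these hunks: every existing kind passes InvalidOid for the new relfile member, while a relfilenode entry maps (dbOid, spcOid, relNumber) onto (dboid, objid, relfile). The sketch below illustrates that mapping, assuming hypothetical Demo* names and kind values. It is not the real PgStat_HashKey, and it compares keys field by field, so it does not depend on how padding bytes are handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_INVALID_OID 0u		/* InvalidOid is 0 in PostgreSQL */

/* Hypothetical kind values, for illustration only. */
enum { DEMO_KIND_RELATION = 1, DEMO_KIND_RELFILENODE = 2 };

/* Hypothetical four-part key: the patch adds the trailing relfile member. */
typedef struct DemoHashKey
{
	uint32_t	kind;
	uint32_t	dboid;
	uint64_t	objid;
	uint32_t	relfile;
} DemoHashKey;

/* Key for a relation entry: relfile is always invalid. */
static DemoHashKey
demo_relation_key(uint32_t dboid, uint32_t reloid)
{
	DemoHashKey key = {DEMO_KIND_RELATION, dboid, reloid, DEMO_INVALID_OID};

	return key;
}

/* Key for a relfilenode entry: the tablespace OID travels in objid. */
static DemoHashKey
demo_relfilenode_key(uint32_t dbOid, uint32_t spcOid, uint32_t relNumber)
{
	DemoHashKey key = {DEMO_KIND_RELFILENODE, dbOid, spcOid, relNumber};

	return key;
}

/* Field-wise comparison of two keys. */
static bool
demo_key_equal(const DemoHashKey *a, const DemoHashKey *b)
{
	return a->kind == b->kind && a->dboid == b->dboid &&
		a->objid == b->objid && a->relfile == b->relfile;
}
```

With this layout, two relfilenodes in the same database and tablespace differ only in the relfile member, which is why every lookup helper in the patch has to carry it.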
diff --git a/src/backend/utils/activity/pgstat_shmem.c b/src/backend/utils/activity/pgstat_shmem.c
index c1b7ff76b1..82f955a0b2 100644
--- a/src/backend/utils/activity/pgstat_shmem.c
+++ b/src/backend/utils/activity/pgstat_shmem.c
@@ -429,8 +429,8 @@ pgstat_get_entry_ref_cached(PgStat_HashKey key, PgStat_EntryRef **entry_ref_p)
  * if the entry is newly created, false otherwise.
  */
 PgStat_EntryRef *
-pgstat_get_entry_ref(PgStat_Kind kind, Oid dboid, uint64 objid, bool create,
-					 bool *created_entry)
+pgstat_get_entry_ref(PgStat_Kind kind, Oid dboid, uint64 objid, RelFileNumber relfile,
+					 bool create, bool *created_entry)
 {
 	PgStat_HashKey key;
 	PgStatShared_HashEntry *shhashent;
@@ -443,6 +443,7 @@ pgstat_get_entry_ref(PgStat_Kind kind, Oid dboid, uint64 objid, bool create,
 	key.kind = kind;
 	key.dboid = dboid;
 	key.objid = objid;
+	key.relfile = relfile;
 
 	/*
 	 * passing in created_entry only makes sense if we possibly could create
@@ -652,12 +653,12 @@ pgstat_unlock_entry(PgStat_EntryRef *entry_ref)
  */
 PgStat_EntryRef *
 pgstat_get_entry_ref_locked(PgStat_Kind kind, Oid dboid, uint64 objid,
-							bool nowait)
+							RelFileNumber relfile, bool nowait)
 {
 	PgStat_EntryRef *entry_ref;
 
 	/* find shared table stats entry corresponding to the local entry */
-	entry_ref = pgstat_get_entry_ref(kind, dboid, objid, true, NULL);
+	entry_ref = pgstat_get_entry_ref(kind, dboid, objid, relfile, true, NULL);
 
 	/* lock the shared entry to protect the content, skip if failed */
 	if (!pgstat_lock_entry(entry_ref, nowait))
@@ -827,10 +828,11 @@ pgstat_drop_entry_internal(PgStatShared_HashEntry *shent,
 	 */
 	if (shent->dropped)
 		elog(ERROR,
-			 "trying to drop stats entry already dropped: kind=%s dboid=%u objid=%llu refcount=%u",
+			 "trying to drop stats entry already dropped: kind=%s dboid=%u objid=%llu relfile=%u refcount=%u",
 			 pgstat_get_kind_info(shent->key.kind)->name,
 			 shent->key.dboid,
 			 (unsigned long long) shent->key.objid,
+			 shent->key.relfile,
 			 pg_atomic_read_u32(&shent->refcount));
 	shent->dropped = true;
 
@@ -913,7 +915,7 @@ pgstat_drop_database_and_contents(Oid dboid)
  * pgstat_gc_entry_refs().
  */
 bool
-pgstat_drop_entry(PgStat_Kind kind, Oid dboid, uint64 objid)
+pgstat_drop_entry(PgStat_Kind kind, Oid dboid, uint64 objid, RelFileNumber relfile)
 {
 	PgStat_HashKey key;
 	PgStatShared_HashEntry *shent;
@@ -925,6 +927,7 @@ pgstat_drop_entry(PgStat_Kind kind, Oid dboid, uint64 objid)
 	key.kind = kind;
 	key.dboid = dboid;
 	key.objid = objid;
+	key.relfile = relfile;
 
 	/* delete local reference */
 	if (pgStatEntryRefHash)
@@ -995,13 +998,13 @@ shared_stat_reset_contents(PgStat_Kind kind, PgStatShared_Common *header,
  * Reset one variable-numbered stats entry.
  */
 void
-pgstat_reset_entry(PgStat_Kind kind, Oid dboid, uint64 objid, TimestampTz ts)
+pgstat_reset_entry(PgStat_Kind kind, Oid dboid, uint64 objid, RelFileNumber relfile, TimestampTz ts)
 {
 	PgStat_EntryRef *entry_ref;
 
 	Assert(!pgstat_get_kind_info(kind)->fixed_amount);
 
-	entry_ref = pgstat_get_entry_ref(kind, dboid, objid, false, NULL);
+	entry_ref = pgstat_get_entry_ref(kind, dboid, objid, relfile, false, NULL);
 	if (!entry_ref || entry_ref->shared_entry->dropped)
 		return;
 
diff --git a/src/backend/utils/activity/pgstat_subscription.c b/src/backend/utils/activity/pgstat_subscription.c
index e06c92727e..417c81246d 100644
--- a/src/backend/utils/activity/pgstat_subscription.c
+++ b/src/backend/utils/activity/pgstat_subscription.c
@@ -30,7 +30,7 @@ pgstat_report_subscription_error(Oid subid, bool is_apply_error)
 	PgStat_BackendSubEntry *pending;
 
 	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_SUBSCRIPTION,
-										  InvalidOid, subid, NULL);
+										  InvalidOid, subid, InvalidOid, NULL);
 	pending = entry_ref->pending;
 
 	if (is_apply_error)
@@ -49,7 +49,7 @@ pgstat_report_subscription_conflict(Oid subid, ConflictType type)
 	PgStat_BackendSubEntry *pending;
 
 	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_SUBSCRIPTION,
-										  InvalidOid, subid, NULL);
+										  InvalidOid, subid, InvalidOid, NULL);
 	pending = entry_ref->pending;
 	pending->conflict_count[type]++;
 }
@@ -62,12 +62,12 @@ pgstat_create_subscription(Oid subid)
 {
 	/* Ensures that stats are dropped if transaction rolls back */
 	pgstat_create_transactional(PGSTAT_KIND_SUBSCRIPTION,
-								InvalidOid, subid);
+								InvalidOid, subid, InvalidOid);
 
 	/* Create and initialize the subscription stats entry */
-	pgstat_get_entry_ref(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid,
+	pgstat_get_entry_ref(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid, InvalidOid,
 						 true, NULL);
-	pgstat_reset_entry(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid, 0);
+	pgstat_reset_entry(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid, InvalidOid, 0);
 }
 
 /*
@@ -79,7 +79,7 @@ void
 pgstat_drop_subscription(Oid subid)
 {
 	pgstat_drop_transactional(PGSTAT_KIND_SUBSCRIPTION,
-							  InvalidOid, subid);
+							  InvalidOid, subid, InvalidOid);
 }
 
 /*
@@ -90,7 +90,7 @@ PgStat_StatSubEntry *
 pgstat_fetch_stat_subscription(Oid subid)
 {
 	return (PgStat_StatSubEntry *)
-		pgstat_fetch_entry(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid);
+		pgstat_fetch_entry(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid, InvalidOid);
 }
 
 /*
diff --git a/src/backend/utils/activity/pgstat_xact.c b/src/backend/utils/activity/pgstat_xact.c
index f87a195996..4c663b7d62 100644
--- a/src/backend/utils/activity/pgstat_xact.c
+++ b/src/backend/utils/activity/pgstat_xact.c
@@ -30,7 +30,7 @@ static void AtEOXact_PgStat_DroppedStats(PgStat_SubXactStatus *xact_state, bool
 static void AtEOSubXact_PgStat_DroppedStats(PgStat_SubXactStatus *xact_state,
 											bool isCommit, int nestDepth);
 
-static PgStat_SubXactStatus *pgStatXactStack = NULL;
+PgStat_SubXactStatus *pgStatXactStack = NULL;
 
 
 /*
@@ -85,7 +85,7 @@ AtEOXact_PgStat_DroppedStats(PgStat_SubXactStatus *xact_state, bool isCommit)
 			 * Transaction that dropped an object committed. Drop the stats
 			 * too.
 			 */
-			if (!pgstat_drop_entry(it->kind, it->dboid, objid))
+			if (!pgstat_drop_entry(it->kind, it->dboid, objid, it->relfile))
 				not_freed_count++;
 		}
 		else if (!isCommit && pending->is_create)
@@ -94,7 +94,7 @@ AtEOXact_PgStat_DroppedStats(PgStat_SubXactStatus *xact_state, bool isCommit)
 			 * Transaction that created an object aborted. Drop the stats
 			 * associated with the object.
 			 */
-			if (!pgstat_drop_entry(it->kind, it->dboid, objid))
+			if (!pgstat_drop_entry(it->kind, it->dboid, objid, it->relfile))
 				not_freed_count++;
 		}
 
@@ -106,6 +106,38 @@ AtEOXact_PgStat_DroppedStats(PgStat_SubXactStatus *xact_state, bool isCommit)
 		pgstat_request_entry_refs_gc();
 }
 
+/*
+ * Remove a relfilenode stat from the list of stats to be dropped.
+ */
+void
+PgStat_RemoveRelFileNodeFromDroppedStats(PgStat_SubXactStatus *xact_state, RelFileLocator rlocator)
+{
+	dlist_mutable_iter iter;
+
+	if (dclist_count(&xact_state->pending_drops) == 0)
+		return;
+
+	dclist_foreach_modify(iter, &xact_state->pending_drops)
+	{
+		PgStat_PendingDroppedStatsItem *pending =
+			dclist_container(PgStat_PendingDroppedStatsItem, node, iter.cur);
+		xl_xact_stats_item *it = &pending->item;
+
+		if (it->kind == PGSTAT_KIND_RELFILENODE && it->dboid == rlocator.dbOid
+			&& it->relfile == rlocator.relNumber)
+		{
+			uint64		objid = ((uint64) it->objid_hi) << 32 | it->objid_lo;
+
+			if (objid == rlocator.spcOid)
+			{
+				dclist_delete_from(&xact_state->pending_drops, &pending->node);
+				pfree(pending);
+				return;
+			}
+		}
+	}
+}
+
 /*
  * Called from access/transam/xact.c at subtransaction commit/abort.
  */
@@ -160,7 +192,7 @@ AtEOSubXact_PgStat_DroppedStats(PgStat_SubXactStatus *xact_state,
 			 * Subtransaction creating a new stats object aborted. Drop the
 			 * stats object.
 			 */
-			if (!pgstat_drop_entry(it->kind, it->dboid, objid))
+			if (!pgstat_drop_entry(it->kind, it->dboid, objid, it->relfile))
 				not_freed_count++;
 			pfree(pending);
 		}
@@ -323,7 +355,11 @@ pgstat_execute_transactional_drops(int ndrops, struct xl_xact_stats_item *items,
 		xl_xact_stats_item *it = &items[i];
 		uint64		objid = ((uint64) it->objid_hi) << 32 | it->objid_lo;
 
-		if (!pgstat_drop_entry(it->kind, it->dboid, objid))
+		/* leave it to pgstat_drop_transactional() in RelationDropStorage() */
+		if (it->kind == PGSTAT_KIND_RELFILENODE)
+			continue;
+
+		if (!pgstat_drop_entry(it->kind, it->dboid, objid, it->relfile))
 			not_freed_count++;
 	}
 
@@ -332,7 +368,8 @@ pgstat_execute_transactional_drops(int ndrops, struct xl_xact_stats_item *items,
 }
 
 static void
-create_drop_transactional_internal(PgStat_Kind kind, Oid dboid, uint64 objid, bool is_create)
+create_drop_transactional_internal(PgStat_Kind kind, Oid dboid, uint64 objid,
+								   RelFileNumber relfile, bool is_create)
 {
 	int			nest_level = GetCurrentTransactionNestLevel();
 	PgStat_SubXactStatus *xact_state;
@@ -346,6 +383,7 @@ create_drop_transactional_internal(PgStat_Kind kind, Oid dboid, uint64 objid, bo
 	drop->item.dboid = dboid;
 	drop->item.objid_lo = (uint32) objid;
 	drop->item.objid_hi = (uint32) (objid >> 32);
+	drop->item.relfile = relfile;
 
 	dclist_push_tail(&xact_state->pending_drops, &drop->node);
 }
@@ -358,19 +396,19 @@ create_drop_transactional_internal(PgStat_Kind kind, Oid dboid, uint64 objid, bo
  * dropped.
  */
 void
-pgstat_create_transactional(PgStat_Kind kind, Oid dboid, uint64 objid)
+pgstat_create_transactional(PgStat_Kind kind, Oid dboid, uint64 objid, RelFileNumber relfile)
 {
-	if (pgstat_get_entry_ref(kind, dboid, objid, false, NULL))
+	if (pgstat_get_entry_ref(kind, dboid, objid, relfile, false, NULL))
 	{
 		ereport(WARNING,
-				errmsg("resetting existing statistics for kind %s, db=%u, oid=%llu",
+				errmsg("resetting existing statistics for kind %s, db=%u, oid=%llu, relfile=%u",
 					   (pgstat_get_kind_info(kind))->name, dboid,
-					   (unsigned long long) objid));
+					   (unsigned long long) objid, relfile));
 
-		pgstat_reset(kind, dboid, objid);
+		pgstat_reset(kind, dboid, objid, relfile);
 	}
 
-	create_drop_transactional_internal(kind, dboid, objid, /* create */ true);
+	create_drop_transactional_internal(kind, dboid, objid, relfile, /* create */ true);
 }
 
 /*
@@ -381,7 +419,7 @@ pgstat_create_transactional(PgStat_Kind kind, Oid dboid, uint64 objid)
  * alive.
  */
 void
-pgstat_drop_transactional(PgStat_Kind kind, Oid dboid, uint64 objid)
+pgstat_drop_transactional(PgStat_Kind kind, Oid dboid, uint64 objid, RelFileNumber relfile)
 {
-	create_drop_transactional_internal(kind, dboid, objid, /* create */ false);
+	create_drop_transactional_internal(kind, dboid, objid, relfile, /* create */ false);
 }
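
Reviewer note on the pending-drop items handled above: the item stores the 64-bit objid as two 32-bit halves (objid_lo, objid_hi), and PgStat_RemoveRelFileNodeFromDroppedStats() rebuilds it before comparing against the tablespace OID. The round trip can be sketched as follows. DemoStatsItem and the demo_* helpers are hypothetical and only mirror the fields visible in this diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the item fields used in the hunks above. */
typedef struct DemoStatsItem
{
	int			kind;
	uint32_t	dboid;
	uint32_t	objid_lo;
	uint32_t	objid_hi;
	uint32_t	relfile;
} DemoStatsItem;

/* Split a 64-bit object id into the two stored halves. */
static void
demo_set_objid(DemoStatsItem *item, uint64_t objid)
{
	item->objid_lo = (uint32_t) objid;
	item->objid_hi = (uint32_t) (objid >> 32);
}

/* Rebuild the 64-bit object id from the stored halves. */
static uint64_t
demo_get_objid(const DemoStatsItem *item)
{
	return ((uint64_t) item->objid_hi) << 32 | item->objid_lo;
}

/* Does this item identify the given (dbOid, spcOid, relNumber) triple? */
static bool
demo_matches_locator(const DemoStatsItem *item, uint32_t dbOid,
					 uint32_t spcOid, uint32_t relNumber)
{
	return item->dboid == dbOid &&
		item->relfile == relNumber &&
		demo_get_objid(item) == spcOid;
}
```

For relfilenode entries the objid only ever holds a 32-bit tablespace OID, so objid_hi is expected to be zero. The split exists because other kinds may use the full 64 bits.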
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index f7b50e0b5a..d7cc55c993 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -106,6 +106,30 @@ PG_STAT_GET_RELENTRY_INT64(tuples_updated)
 /* pg_stat_get_vacuum_count */
 PG_STAT_GET_RELENTRY_INT64(vacuum_count)
 
+#define PG_STAT_GET_RELFILEENTRY_INT64(stat)						\
+Datum															\
+CppConcat(pg_stat_get_relfilenode_,stat)(PG_FUNCTION_ARGS)					\
+{																\
+	Oid			dboid = PG_GETARG_OID(0);						\
+	Oid			spcOid = PG_GETARG_OID(1);						\
+	RelFileNumber relfile = PG_GETARG_OID(2);					\
+	int64		result;											\
+	PgStat_StatRelFileNodeEntry *relfileentry;								\
+																\
+	if ((relfileentry = pgstat_fetch_stat_relfilenodeentry(dboid, spcOid, relfile)) == NULL)	\
+		result = 0;												\
+	else														\
+		result = (int64) (relfileentry->stat);						\
+																\
+	PG_RETURN_INT64(result);									\
+}
+
+/* pg_stat_get_relfilenode_blocks_written */
+PG_STAT_GET_RELFILEENTRY_INT64(blocks_written)
+
+/* pg_stat_get_blocks_written */
+PG_STAT_GET_RELENTRY_INT64(blocks_written)
+
 #define PG_STAT_GET_RELENTRY_TIMESTAMPTZ(stat)					\
 Datum															\
 CppConcat(pg_stat_get_,stat)(PG_FUNCTION_ARGS)					\
@@ -1764,7 +1788,7 @@ pg_stat_reset_single_table_counters(PG_FUNCTION_ARGS)
 	Oid			taboid = PG_GETARG_OID(0);
 	Oid			dboid = (IsSharedRelation(taboid) ? InvalidOid : MyDatabaseId);
 
-	pgstat_reset(PGSTAT_KIND_RELATION, dboid, taboid);
+	pgstat_reset(PGSTAT_KIND_RELATION, dboid, taboid, InvalidOid);
 
 	PG_RETURN_VOID();
 }
@@ -1774,7 +1798,7 @@ pg_stat_reset_single_function_counters(PG_FUNCTION_ARGS)
 {
 	Oid			funcoid = PG_GETARG_OID(0);
 
-	pgstat_reset(PGSTAT_KIND_FUNCTION, MyDatabaseId, funcoid);
+	pgstat_reset(PGSTAT_KIND_FUNCTION, MyDatabaseId, funcoid, InvalidOid);
 
 	PG_RETURN_VOID();
 }
@@ -1832,7 +1856,7 @@ pg_stat_reset_subscription_stats(PG_FUNCTION_ARGS)
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 					 errmsg("invalid subscription OID %u", subid)));
-		pgstat_reset(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid);
+		pgstat_reset(PGSTAT_KIND_SUBSCRIPTION, InvalidOid, subid, InvalidOid);
 	}
 
 	PG_RETURN_VOID();
@@ -2059,7 +2083,9 @@ pg_stat_have_stats(PG_FUNCTION_ARGS)
 	char	   *stats_type = text_to_cstring(PG_GETARG_TEXT_P(0));
 	Oid			dboid = PG_GETARG_OID(1);
 	uint64		objid = PG_GETARG_INT64(2);
+	RelFileNumber relfile = PG_GETARG_OID(3);
+
 	PgStat_Kind kind = pgstat_get_kind_from_str(stats_type);
 
-	PG_RETURN_BOOL(pgstat_have_entry(kind, dboid, objid));
+	PG_RETURN_BOOL(pgstat_have_entry(kind, dboid, objid, relfile));
 }
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index adb478a93c..a3ae8465dd 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -21,7 +21,9 @@
 #include "access/sdir.h"
 #include "access/xact.h"
 #include "executor/tuptable.h"
+#include "pgstat.h"
 #include "storage/read_stream.h"
+#include "utils/pgstat_internal.h"
 #include "utils/rel.h"
 #include "utils/snapshot.h"
 
@@ -1633,6 +1635,23 @@ table_relation_set_new_filelocator(Relation rel,
 								   TransactionId *freezeXid,
 								   MultiXactId *minmulti)
 {
+	PgStat_StatRelFileNodeEntry *relfileentry;
+	PgStat_StatTabEntry *tabentry = NULL;
+	PgStat_EntryRef *entry_ref = NULL;
+	PgStatShared_Relation *shtabentry;
+
+	entry_ref = pgstat_get_entry_ref(PGSTAT_KIND_RELATION, rel->rd_rel->relisshared ? InvalidOid : MyDatabaseId, RelationGetRelid(rel), InvalidOid, false, NULL);
+	if (entry_ref)
+	{
+		shtabentry = (PgStatShared_Relation *) entry_ref->shared_stats;
+		tabentry = &shtabentry->stats;
+	}
+
+	relfileentry = pgstat_fetch_stat_relfilenodeentry(rel->rd_locator.dbOid, rel->rd_locator.spcOid, rel->rd_locator.relNumber);
+
+	if (tabentry && relfileentry)
+		tabentry->blocks_written += relfileentry->blocks_written;
+
 	rel->rd_tableam->relation_set_new_filelocator(rel, newrlocator,
 												  persistence, freezeXid,
 												  minmulti);
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index fb64d7413a..7a50e5d008 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -290,6 +290,7 @@ typedef struct xl_xact_stats_item
 	 */
 	uint32		objid_lo;
 	uint32		objid_hi;
+	RelFileNumber relfile;
 } xl_xact_stats_item;
 
 typedef struct xl_xact_stats_items
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index a38e20f5d9..2a565366cd 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5469,6 +5469,14 @@
   proname => 'pg_stat_get_tuples_updated', provolatile => 's',
   proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
   prosrc => 'pg_stat_get_tuples_updated' },
+{ oid => '9280', descr => 'statistics: number of blocks written',
+  proname => 'pg_stat_get_relfilenode_blocks_written', provolatile => 's',
+  proparallel => 'r',
+  proargtypes => 'oid oid oid',
+  prorettype => 'int8',
+  proallargtypes => '{oid,oid,oid,int8}',
+  proargmodes => '{i,i,i,o}',
+  prosrc => 'pg_stat_get_relfilenode_blocks_written' },
 { oid => '1933', descr => 'statistics: number of tuples deleted',
   proname => 'pg_stat_get_tuples_deleted', provolatile => 's',
   proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
@@ -5508,6 +5516,10 @@
   proname => 'pg_stat_get_blocks_hit', provolatile => 's', proparallel => 'r',
   prorettype => 'int8', proargtypes => 'oid',
   prosrc => 'pg_stat_get_blocks_hit' },
+{ oid => '8438', descr => 'statistics: number of blocks written',
+  proname => 'pg_stat_get_blocks_written', provolatile => 's', proparallel => 'r',
+  prorettype => 'int8', proargtypes => 'oid',
+  prosrc => 'pg_stat_get_blocks_written' },
 { oid => '2781', descr => 'statistics: last manual vacuum time for a table',
   proname => 'pg_stat_get_last_vacuum_time', provolatile => 's',
   proparallel => 'r', prorettype => 'timestamptz', proargtypes => 'oid',
@@ -5594,7 +5606,7 @@
 
 { oid => '6230', descr => 'statistics: check if a stats object exists',
   proname => 'pg_stat_have_stats', provolatile => 'v', proparallel => 'r',
-  prorettype => 'bool', proargtypes => 'text oid int8',
+  prorettype => 'bool', proargtypes => 'text oid int8 oid',
   prosrc => 'pg_stat_have_stats' },
 
 { oid => '6231', descr => 'statistics: information about subscription stats',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index df53fa2d4f..ecfbb7cace 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -16,6 +16,7 @@
 #include "portability/instr_time.h"
 #include "postmaster/pgarch.h"	/* for MAX_XFN_CHARS */
 #include "replication/conflict.h"
+#include "storage/relfilelocator.h"
 #include "utils/backend_progress.h" /* for backward compatibility */
 #include "utils/backend_status.h"	/* for backward compatibility */
 #include "utils/relcache.h"
@@ -46,17 +47,18 @@
 /* stats for variable-numbered objects */
 #define PGSTAT_KIND_DATABASE	1	/* database-wide statistics */
 #define PGSTAT_KIND_RELATION	2	/* per-table statistics */
-#define PGSTAT_KIND_FUNCTION	3	/* per-function statistics */
-#define PGSTAT_KIND_REPLSLOT	4	/* per-slot statistics */
-#define PGSTAT_KIND_SUBSCRIPTION	5	/* per-subscription statistics */
+#define PGSTAT_KIND_RELFILENODE 3   /* per-relfilenode statistics */
+#define PGSTAT_KIND_FUNCTION	4	/* per-function statistics */
+#define PGSTAT_KIND_REPLSLOT	5	/* per-slot statistics */
+#define PGSTAT_KIND_SUBSCRIPTION	6	/* per-subscription statistics */
 
 /* stats for fixed-numbered objects */
-#define PGSTAT_KIND_ARCHIVER	6
-#define PGSTAT_KIND_BGWRITER	7
-#define PGSTAT_KIND_CHECKPOINTER	8
-#define PGSTAT_KIND_IO	9
-#define PGSTAT_KIND_SLRU	10
-#define PGSTAT_KIND_WAL	11
+#define PGSTAT_KIND_ARCHIVER	7
+#define PGSTAT_KIND_BGWRITER	8
+#define PGSTAT_KIND_CHECKPOINTER	9
+#define PGSTAT_KIND_IO	10
+#define PGSTAT_KIND_SLRU	11
+#define PGSTAT_KIND_WAL	12
 
 #define PGSTAT_KIND_BUILTIN_MIN PGSTAT_KIND_DATABASE
 #define PGSTAT_KIND_BUILTIN_MAX PGSTAT_KIND_WAL
@@ -452,6 +454,7 @@ typedef struct PgStat_StatTabEntry
 
 	PgStat_Counter blocks_fetched;
 	PgStat_Counter blocks_hit;
+	PgStat_Counter blocks_written;
 
 	TimestampTz last_vacuum_time;	/* user initiated vacuum */
 	PgStat_Counter vacuum_count;
@@ -463,6 +466,13 @@ typedef struct PgStat_StatTabEntry
 	PgStat_Counter autoanalyze_count;
 } PgStat_StatTabEntry;
 
+typedef struct PgStat_StatRelFileNodeEntry
+{
+	PgStat_Counter blocks_fetched;
+	PgStat_Counter blocks_hit;
+	PgStat_Counter blocks_written;
+} PgStat_StatRelFileNodeEntry;
+
 typedef struct PgStat_WalStats
 {
 	PgStat_Counter wal_records;
@@ -513,7 +523,7 @@ extern long pgstat_report_stat(bool force);
 extern void pgstat_force_next_flush(void);
 
 extern void pgstat_reset_counters(void);
-extern void pgstat_reset(PgStat_Kind kind, Oid dboid, uint64 objid);
+extern void pgstat_reset(PgStat_Kind kind, Oid dboid, uint64 objid, RelFileNumber relfile);
 extern void pgstat_reset_of_kind(PgStat_Kind kind);
 
 /* stats accessors */
@@ -522,7 +532,7 @@ extern TimestampTz pgstat_get_stat_snapshot_timestamp(bool *have_snapshot);
 
 /* helpers */
 extern PgStat_Kind pgstat_get_kind_from_str(char *kind_str);
-extern bool pgstat_have_entry(PgStat_Kind kind, Oid dboid, uint64 objid);
+extern bool pgstat_have_entry(PgStat_Kind kind, Oid dboid, uint64 objid, RelFileNumber relfile);
 
 
 /*
@@ -631,6 +641,10 @@ extern void pgstat_report_analyze(Relation rel,
 								  PgStat_Counter livetuples, PgStat_Counter deadtuples,
 								  bool resetcounter);
 
+extern void pgstat_report_relfilenode_blks_written(RelFileLocator locator);
+extern void pgstat_report_relfilenode_buffer_read(Relation reln);
+extern void pgstat_report_relfilenode_buffer_hit(Relation reln);
+
 /*
  * If stats are enabled, but pending data hasn't been prepared yet, call
  * pgstat_assoc_relation() to do so. See its comment for why this is done
@@ -690,6 +704,7 @@ extern void pgstat_twophase_postabort(TransactionId xid, uint16 info,
 									  void *recdata, uint32 len);
 
 extern PgStat_StatTabEntry *pgstat_fetch_stat_tabentry(Oid relid);
+extern PgStat_StatRelFileNodeEntry *pgstat_fetch_stat_relfilenodeentry(Oid dboid, Oid spcOid, RelFileNumber relfile);
 extern PgStat_StatTabEntry *pgstat_fetch_stat_tabentry_ext(bool shared,
 														   Oid reloid);
 extern PgStat_TableStatus *find_tabstat_entry(Oid rel_id);
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 61b2e1f96b..b5700708fc 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -55,6 +55,7 @@ typedef struct PgStat_HashKey
 	Oid			dboid;			/* database ID. InvalidOid for shared objects. */
 	uint64		objid;			/* object ID (table, function, etc.), or
 								 * identifier. */
+	RelFileNumber	relfile;	/* relfilenumber for RelFileLocator. */
 } PgStat_HashKey;
 
 /*
@@ -410,6 +411,12 @@ typedef struct PgStatShared_Relation
 	PgStat_StatTabEntry stats;
 } PgStatShared_Relation;
 
+typedef struct PgStatShared_RelFileNode
+{
+	PgStatShared_Common header;
+	PgStat_StatRelFileNodeEntry stats;
+} PgStatShared_RelFileNode;
+
 typedef struct PgStatShared_Function
 {
 	PgStatShared_Common header;
@@ -548,6 +555,9 @@ static inline void *pgstat_get_entry_data(PgStat_Kind kind, PgStatShared_Common
 static inline void *pgstat_get_custom_shmem_data(PgStat_Kind kind);
 static inline void *pgstat_get_custom_snapshot_data(PgStat_Kind kind);
 
+extern PgStat_SubXactStatus *pgStatXactStack;
+extern void PgStat_RemoveRelFileNodeFromDroppedStats(PgStat_SubXactStatus *xact_state, RelFileLocator rlocator);
+
 
 /*
  * Functions in pgstat.c
@@ -565,12 +575,14 @@ extern void pgstat_assert_is_up(void);
 
 extern void pgstat_delete_pending_entry(PgStat_EntryRef *entry_ref);
 extern PgStat_EntryRef *pgstat_prep_pending_entry(PgStat_Kind kind, Oid dboid,
-												  uint64 objid,
+												  uint64 objid, RelFileNumber relfile,
 												  bool *created_entry);
 extern PgStat_EntryRef *pgstat_fetch_pending_entry(PgStat_Kind kind,
-												   Oid dboid, uint64 objid);
+												   Oid dboid, uint64 objid,
+												   RelFileNumber relfile);
 
-extern void *pgstat_fetch_entry(PgStat_Kind kind, Oid dboid, uint64 objid);
+extern void *pgstat_fetch_entry(PgStat_Kind kind, Oid dboid, uint64 objid,
+								RelFileNumber relfile);
 extern void pgstat_snapshot_fixed(PgStat_Kind kind);
 
 
@@ -645,6 +657,7 @@ extern void AtPrepare_PgStat_Relations(PgStat_SubXactStatus *xact_state);
 extern void PostPrepare_PgStat_Relations(PgStat_SubXactStatus *xact_state);
 
 extern bool pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait);
+extern bool pgstat_relfilenode_flush_cb(PgStat_EntryRef *entry_ref, bool nowait);
 extern void pgstat_relation_delete_pending_cb(PgStat_EntryRef *entry_ref);
 
 
@@ -665,15 +678,17 @@ extern void pgstat_attach_shmem(void);
 extern void pgstat_detach_shmem(void);
 
 extern PgStat_EntryRef *pgstat_get_entry_ref(PgStat_Kind kind, Oid dboid, uint64 objid,
-											 bool create, bool *created_entry);
+											 RelFileNumber relfile, bool create,
+											 bool *created_entry);
 extern bool pgstat_lock_entry(PgStat_EntryRef *entry_ref, bool nowait);
 extern bool pgstat_lock_entry_shared(PgStat_EntryRef *entry_ref, bool nowait);
 extern void pgstat_unlock_entry(PgStat_EntryRef *entry_ref);
-extern bool pgstat_drop_entry(PgStat_Kind kind, Oid dboid, uint64 objid);
+extern bool pgstat_drop_entry(PgStat_Kind kind, Oid dboid, uint64 objid, RelFileNumber relfile);
 extern void pgstat_drop_all_entries(void);
 extern PgStat_EntryRef *pgstat_get_entry_ref_locked(PgStat_Kind kind, Oid dboid, uint64 objid,
-													bool nowait);
-extern void pgstat_reset_entry(PgStat_Kind kind, Oid dboid, uint64 objid, TimestampTz ts);
+													RelFileNumber relfile, bool nowait);
+extern void pgstat_reset_entry(PgStat_Kind kind, Oid dboid, uint64 objid,
+							   RelFileNumber relfile, TimestampTz ts);
 extern void pgstat_reset_entries_of_kind(PgStat_Kind kind, TimestampTz ts);
 extern void pgstat_reset_matching_entries(bool (*do_reset) (PgStatShared_HashEntry *, Datum),
 										  Datum match_data,
@@ -722,8 +737,8 @@ extern void pgstat_subscription_reset_timestamp_cb(PgStatShared_Common *header,
  */
 
 extern PgStat_SubXactStatus *pgstat_get_xact_stack_level(int nest_level);
-extern void pgstat_drop_transactional(PgStat_Kind kind, Oid dboid, uint64 objid);
-extern void pgstat_create_transactional(PgStat_Kind kind, Oid dboid, uint64 objid);
+extern void pgstat_drop_transactional(PgStat_Kind kind, Oid dboid, uint64 objid, RelFileNumber relfile);
+extern void pgstat_create_transactional(PgStat_Kind kind, Oid dboid, uint64 objid, RelFileNumber relfile);
 
 
 /*
diff --git a/src/test/recovery/t/029_stats_restart.pl b/src/test/recovery/t/029_stats_restart.pl
index d14ac12418..4c83a0c167 100644
--- a/src/test/recovery/t/029_stats_restart.pl
+++ b/src/test/recovery/t/029_stats_restart.pl
@@ -40,10 +40,10 @@ trigger_funcrel_stat();
 
 # verify stats objects exist
 my $sect = "initial";
-is(have_stats('database', $dboid, 0), 't', "$sect: db stats do exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats('database', $dboid, 0, 0), 't', "$sect: db stats do exist");
+is(have_stats('function', $dboid, $funcoid, 0),
 	't', "$sect: function stats do exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats('relation', $dboid, $tableoid, 0),
 	't', "$sect: relation stats do exist");
 
 # regular shutdown
@@ -64,10 +64,10 @@ copy($og_stats, $statsfile) or die "Copy failed: $!";
 $node->start;
 
 $sect = "copy";
-is(have_stats('database', $dboid, 0), 't', "$sect: db stats do exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats('database', $dboid, 0, 0), 't', "$sect: db stats do exist");
+is(have_stats('function', $dboid, $funcoid, 0),
 	't', "$sect: function stats do exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats('relation', $dboid, $tableoid, 0),
 	't', "$sect: relation stats do exist");
 
 $node->stop('immediate');
@@ -81,10 +81,10 @@ $node->start;
 
 # stats should have been discarded
 $sect = "post immediate";
-is(have_stats('database', $dboid, 0), 'f', "$sect: db stats do not exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats('database', $dboid, 0, 0), 'f', "$sect: db stats do not exist");
+is(have_stats('function', $dboid, $funcoid, 0),
 	'f', "$sect: function stats do exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats('relation', $dboid, $tableoid, 0),
 	'f', "$sect: relation stats do not exist");
 
 # get rid of backup statsfile
@@ -95,10 +95,10 @@ unlink $statsfile or die "cannot unlink $statsfile $!";
 trigger_funcrel_stat();
 
 $sect = "post immediate, new";
-is(have_stats('database', $dboid, 0), 't', "$sect: db stats do exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats('database', $dboid, 0, 0), 't', "$sect: db stats do exist");
+is(have_stats('function', $dboid, $funcoid, 0),
 	't', "$sect: function stats do exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats('relation', $dboid, $tableoid, 0),
 	't', "$sect: relation stats do exist");
 
 # regular shutdown
@@ -114,10 +114,10 @@ $node->start;
 
 # no stats present due to invalid stats file
 $sect = "invalid_overwrite";
-is(have_stats('database', $dboid, 0), 'f', "$sect: db stats do not exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats('database', $dboid, 0, 0), 'f', "$sect: db stats do not exist");
+is(have_stats('function', $dboid, $funcoid, 0),
 	'f', "$sect: function stats do not exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats('relation', $dboid, $tableoid, 0),
 	'f', "$sect: relation stats do not exist");
 
 
@@ -130,10 +130,10 @@ append_file($og_stats, "XYZ");
 $node->start;
 
 $sect = "invalid_append";
-is(have_stats('database', $dboid, 0), 'f', "$sect: db stats do not exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats('database', $dboid, 0, 0), 'f', "$sect: db stats do not exist");
+is(have_stats('function', $dboid, $funcoid, 0),
 	'f', "$sect: function stats do not exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats('relation', $dboid, $tableoid, 0),
 	'f', "$sect: relation stats do not exist");
 
 
@@ -292,10 +292,10 @@ sub trigger_funcrel_stat
 
 sub have_stats
 {
-	my ($kind, $dboid, $objid) = @_;
+	my ($kind, $dboid, $objid, $relfile) = @_;
 
 	return $node->safe_psql($connect_db,
-		"SELECT pg_stat_have_stats('$kind', $dboid, $objid)");
+		"SELECT pg_stat_have_stats('$kind', $dboid, $objid, $relfile)");
 }
 
 sub overwrite_file
diff --git a/src/test/recovery/t/030_stats_cleanup_replica.pl b/src/test/recovery/t/030_stats_cleanup_replica.pl
index 74b516cc7c..317df24c4f 100644
--- a/src/test/recovery/t/030_stats_cleanup_replica.pl
+++ b/src/test/recovery/t/030_stats_cleanup_replica.pl
@@ -179,9 +179,9 @@ sub test_standby_func_tab_stats_status
 	my %stats;
 
 	$stats{rel} = $node_standby->safe_psql($connect_db,
-		"SELECT pg_stat_have_stats('relation', $dboid, $tableoid)");
+		"SELECT pg_stat_have_stats('relation', $dboid, $tableoid, 0)");
 	$stats{func} = $node_standby->safe_psql($connect_db,
-		"SELECT pg_stat_have_stats('function', $dboid, $funcoid)");
+		"SELECT pg_stat_have_stats('function', $dboid, $funcoid, 0)");
 
 	is_deeply(\%stats, \%expected, "$sect: standby stats as expected");
 
@@ -194,7 +194,7 @@ sub test_standby_db_stats_status
 	my ($connect_db, $dboid, $present) = @_;
 
 	is( $node_standby->safe_psql(
-			$connect_db, "SELECT pg_stat_have_stats('database', $dboid, 0)"),
+			$connect_db, "SELECT pg_stat_have_stats('database', $dboid, 0, 0)"),
 		$present,
 		"$sect: standby db stats as expected");
 }
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 2b47013f11..e2010a7dc4 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2342,6 +2342,11 @@ pg_statio_all_tables| SELECT c.oid AS relid,
     n.nspname AS schemaname,
     c.relname,
     (pg_stat_get_blocks_fetched(c.oid) - pg_stat_get_blocks_hit(c.oid)) AS heap_blks_read,
+    (pg_stat_get_blocks_written(c.oid) + pg_stat_get_relfilenode_blocks_written(d.oid,
+        CASE
+            WHEN (c.reltablespace <> (0)::oid) THEN c.reltablespace
+            ELSE d.dattablespace
+        END, c.relfilenode)) AS heap_blks_written,
     pg_stat_get_blocks_hit(c.oid) AS heap_blks_hit,
     i.idx_blks_read,
     i.idx_blks_hit,
@@ -2349,7 +2354,8 @@ pg_statio_all_tables| SELECT c.oid AS relid,
     pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit,
     x.idx_blks_read AS tidx_blks_read,
     x.idx_blks_hit AS tidx_blks_hit
-   FROM ((((pg_class c
+   FROM pg_database d,
+    ((((pg_class c
      LEFT JOIN pg_class t ON ((c.reltoastrelid = t.oid)))
      LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
      LEFT JOIN LATERAL ( SELECT (sum((pg_stat_get_blocks_fetched(pg_index.indexrelid) - pg_stat_get_blocks_hit(pg_index.indexrelid))))::bigint AS idx_blks_read,
@@ -2360,7 +2366,7 @@ pg_statio_all_tables| SELECT c.oid AS relid,
             (sum(pg_stat_get_blocks_hit(pg_index.indexrelid)))::bigint AS idx_blks_hit
            FROM pg_index
           WHERE (pg_index.indrelid = t.oid)) x ON (true))
-  WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"]));
+  WHERE ((c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"])) AND (d.datname = current_database()));
 pg_statio_sys_indexes| SELECT relid,
     indexrelid,
     schemaname,
@@ -2381,6 +2387,7 @@ pg_statio_sys_tables| SELECT relid,
     schemaname,
     relname,
     heap_blks_read,
+    heap_blks_written,
     heap_blks_hit,
     idx_blks_read,
     idx_blks_hit,
@@ -2410,6 +2417,7 @@ pg_statio_user_tables| SELECT relid,
     schemaname,
     relname,
     heap_blks_read,
+    heap_blks_written,
     heap_blks_hit,
     idx_blks_read,
     idx_blks_hit,
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 56771f83ed..6ae4c13376 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1111,23 +1111,23 @@ ROLLBACK;
 -- pg_stat_have_stats behavior
 ----
 -- fixed-numbered stats exist
-SELECT pg_stat_have_stats('bgwriter', 0, 0);
+SELECT pg_stat_have_stats('bgwriter', 0, 0, 0);
  pg_stat_have_stats 
 --------------------
  t
 (1 row)
 
 -- unknown stats kinds error out
-SELECT pg_stat_have_stats('zaphod', 0, 0);
+SELECT pg_stat_have_stats('zaphod', 0, 0, 0);
 ERROR:  invalid statistics kind: "zaphod"
 -- db stats have objid 0
-SELECT pg_stat_have_stats('database', :dboid, 1);
+SELECT pg_stat_have_stats('database', :dboid, 1, 0);
  pg_stat_have_stats 
 --------------------
  f
 (1 row)
 
-SELECT pg_stat_have_stats('database', :dboid, 0);
+SELECT pg_stat_have_stats('database', :dboid, 0, 0);
  pg_stat_have_stats 
 --------------------
  t
@@ -1144,21 +1144,21 @@ select a from stats_test_tab1 where a = 3;
  3
 (1 row)
 
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  t
 (1 row)
 
 -- pg_stat_have_stats returns false for dropped index with stats
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  t
 (1 row)
 
 DROP index stats_test_idx1;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  f
@@ -1174,14 +1174,14 @@ select a from stats_test_tab1 where a = 3;
  3
 (1 row)
 
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  t
 (1 row)
 
 ROLLBACK;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  f
@@ -1196,7 +1196,7 @@ select a from stats_test_tab1 where a = 3;
  3
 (1 row)
 
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  t
@@ -1204,7 +1204,7 @@ SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
 
 REINDEX index CONCURRENTLY stats_test_idx1;
 -- false for previous oid
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  f
@@ -1212,7 +1212,7 @@ SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
 
 -- true for new oid
 SELECT 'stats_test_idx1'::regclass::oid AS stats_test_idx1_oid \gset
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  t
@@ -1220,7 +1220,7 @@ SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
 
 -- pg_stat_have_stats returns true for a rolled back drop index with stats
 BEGIN;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  t
@@ -1228,7 +1228,7 @@ SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
 
 DROP index stats_test_idx1;
 ROLLBACK;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
  pg_stat_have_stats 
 --------------------
  t
@@ -1513,7 +1513,7 @@ SELECT :io_sum_bulkwrite_strategy_extends_after > :io_sum_bulkwrite_strategy_ext
 (1 row)
 
 -- Test IO stats reset
-SELECT pg_stat_have_stats('io', 0, 0);
+SELECT pg_stat_have_stats('io', 0, 0, 0);
  pg_stat_have_stats 
 --------------------
  t
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 7147cc2f89..a992737a34 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -539,12 +539,12 @@ ROLLBACK;
 -- pg_stat_have_stats behavior
 ----
 -- fixed-numbered stats exist
-SELECT pg_stat_have_stats('bgwriter', 0, 0);
+SELECT pg_stat_have_stats('bgwriter', 0, 0, 0);
 -- unknown stats kinds error out
-SELECT pg_stat_have_stats('zaphod', 0, 0);
+SELECT pg_stat_have_stats('zaphod', 0, 0, 0);
 -- db stats have objid 0
-SELECT pg_stat_have_stats('database', :dboid, 1);
-SELECT pg_stat_have_stats('database', :dboid, 0);
+SELECT pg_stat_have_stats('database', :dboid, 1, 0);
+SELECT pg_stat_have_stats('database', :dboid, 0, 0);
 
 -- pg_stat_have_stats returns true for committed index creation
 CREATE table stats_test_tab1 as select generate_series(1,10) a;
@@ -552,40 +552,40 @@ CREATE index stats_test_idx1 on stats_test_tab1(a);
 SELECT 'stats_test_idx1'::regclass::oid AS stats_test_idx1_oid \gset
 SET enable_seqscan TO off;
 select a from stats_test_tab1 where a = 3;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 
 -- pg_stat_have_stats returns false for dropped index with stats
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 DROP index stats_test_idx1;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 
 -- pg_stat_have_stats returns false for rolled back index creation
 BEGIN;
 CREATE index stats_test_idx1 on stats_test_tab1(a);
 SELECT 'stats_test_idx1'::regclass::oid AS stats_test_idx1_oid \gset
 select a from stats_test_tab1 where a = 3;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 ROLLBACK;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 
 -- pg_stat_have_stats returns true for reindex CONCURRENTLY
 CREATE index stats_test_idx1 on stats_test_tab1(a);
 SELECT 'stats_test_idx1'::regclass::oid AS stats_test_idx1_oid \gset
 select a from stats_test_tab1 where a = 3;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 REINDEX index CONCURRENTLY stats_test_idx1;
 -- false for previous oid
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 -- true for new oid
 SELECT 'stats_test_idx1'::regclass::oid AS stats_test_idx1_oid \gset
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 
 -- pg_stat_have_stats returns true for a rolled back drop index with stats
 BEGIN;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 DROP index stats_test_idx1;
 ROLLBACK;
-SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid, 0);
 
 -- put enable_seqscan back to on
 SET enable_seqscan TO on;
@@ -759,7 +759,7 @@ SELECT sum(extends) AS io_sum_bulkwrite_strategy_extends_after
 SELECT :io_sum_bulkwrite_strategy_extends_after > :io_sum_bulkwrite_strategy_extends_before;
 
 -- Test IO stats reset
-SELECT pg_stat_have_stats('io', 0, 0);
+SELECT pg_stat_have_stats('io', 0, 0, 0);
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_pre_reset
   FROM pg_stat_io \gset
 SELECT pg_stat_reset_shared('io');
diff --git a/src/test/subscription/t/026_stats.pl b/src/test/subscription/t/026_stats.pl
index 6b6a5b0b1b..89ebf5aa2c 100644
--- a/src/test/subscription/t/026_stats.pl
+++ b/src/test/subscription/t/026_stats.pl
@@ -290,7 +290,7 @@ $node_subscriber->safe_psql($db, qq(DROP SUBSCRIPTION $sub1_name));
 
 # Subscription stats for sub1 should be gone
 is( $node_subscriber->safe_psql(
-		$db, qq(SELECT pg_stat_have_stats('subscription', 0, $sub1_oid))),
+		$db, qq(SELECT pg_stat_have_stats('subscription', 0, $sub1_oid, 0))),
 	qq(f),
 	qq(Subscription stats for subscription '$sub1_name' should be removed.));
 
@@ -309,7 +309,7 @@ DROP SUBSCRIPTION $sub2_name;
 
 # Subscription stats for sub2 should be gone
 is( $node_subscriber->safe_psql(
-		$db, qq(SELECT pg_stat_have_stats('subscription', 0, $sub2_oid))),
+		$db, qq(SELECT pg_stat_have_stats('subscription', 0, $sub2_oid, 0))),
 	qq(f),
 	qq(Subscription stats for subscription '$sub2_name' should be removed.));
 
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 171a7dd5d2..e0b26287f3 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2127,6 +2127,7 @@ PgStatShared_InjectionPoint
 PgStatShared_InjectionPointFixed
 PgStatShared_IO
 PgStatShared_Relation
+PgStatShared_RelFileNode
 PgStatShared_ReplSlot
 PgStatShared_SLRU
 PgStatShared_Subscription
-- 
2.34.1

#25Robert Haas
robertmhaas@gmail.com
In reply to: Bertrand Drouvot (#23)
Re: relfilenode statistics

On Mon, Nov 4, 2024 at 4:27 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

Then I think we should go with the "sometimes I don't know the relation OID
so I want to use the relfilenumber instead, without changing the user experience"
way.

So does the latest version of the patch implement that principle
uniformly throughout?

--
Robert Haas
EDB: http://www.enterprisedb.com

#26Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Robert Haas (#25)
Re: relfilenode statistics

Hi,

On Mon, Nov 04, 2024 at 02:51:10PM -0500, Robert Haas wrote:

On Mon, Nov 4, 2024 at 4:27 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

Then I think we should go with the "sometimes I don't know the relation OID
so I want to use the relfilenumber instead, without changing the user experience"
way.

So does the latest version of the patch implement that principle
uniformly throughout?

No, please don't look at v6-0001 and 0002 (as mentioned up-thread). The purpose
here is mainly to get an agreement on the design before moving forward.

Does it sound OK to you to move forward with the above principle? (I'm +1 on it).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#27Robert Haas
robertmhaas@gmail.com
In reply to: Bertrand Drouvot (#26)
Re: relfilenode statistics

On Tue, Nov 5, 2024 at 1:06 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

Does it sound OK to you to move forward with the above principle? (I'm +1 on it).

Yes, provided we can get a clean implementation of it.

--
Robert Haas
EDB: http://www.enterprisedb.com

#28Kirill Reshke
reshkekirill@gmail.com
In reply to: Bertrand Drouvot (#26)
Re: relfilenode statistics

On Tue, 5 Nov 2024 at 11:06, Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

Does it sound OK to you to move forward with the above principle? (I'm +1 on it).

Hi! I looked through this thread.
Looks like we are still awaiting a patch which stores more counters
(n_dead_tup, ... etc) into relfilenode stats. So, I assume this should
be moved to the next CF.

I also have a very stupid question:
If we don’t have the relation OID when writing buffers out, can we
just store oid to buffertag mapping somewhere and use it?
I suspect that this is a horrible idea, but what's the exact reason?
Is it that we will break too many abstraction layers for such a minor
matter?

--
Best regards,
Kirill Reshke

#29Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Kirill Reshke (#28)
Re: relfilenode statistics

Hi,

On Fri, Nov 29, 2024 at 11:23:12AM +0500, Kirill Reshke wrote:

On Tue, 5 Nov 2024 at 11:06, Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

Does it sound OK to you to move forward with the above principle? (I'm +1 on it).

Hi! I looked through this thread.

Thanks for looking at it!

Looks like we are still awaiting a patch which stores more counters
(n_dead_tup, ... etc) into relfilenode stats.

Yes.

If we don’t have the relation OID when writing buffers out, can we
just store oid to buffertag mapping somewhere and use it?

Do you mean adding the relation OID to the BufferTag? While that could probably
be done from a technical point of view (with a probably non-negligible amount
of refactoring), I can see these cons:

1. We'd increase the BufferDesc size and approach the 64-byte limit (cache
line size) that we don't want to exceed (see the comment above the BufferDesc
definition)
2. Probably a lot of refactoring
3. This new member would be there "only" for stats and reporting purposes, as
it is not needed at all for buffer-related operations
4. Point 3 seems to indicate that's not the right place

So I think 1. and 2. are not worth it given 3. and 4.

There are probably other cons too, though.
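
To illustrate con 1 only, here is a rough sketch of how a stats-only OID eats
into the cache-line budget. All the Demo* structs and their members are made up
for the example (they are not the real BufferTag/BufferDesc layout), and the
64-byte budget is just the figure mentioned above:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;

#define DEMO_CACHE_LINE_SIZE 64	/* budget quoted in the thread */

/* Hypothetical 20-byte tag: five 4-byte members. */
typedef struct DemoTag
{
	Oid			spcOid;
	Oid			dbOid;
	uint32_t	relNumber;
	uint32_t	forkNum;
	uint32_t	blockNum;
} DemoTag;

/* Hypothetical descriptor built around that tag. */
typedef struct DemoDesc
{
	DemoTag		tag;
	int32_t		buf_id;
	uint32_t	state;
	int32_t		wait_backend;
	int32_t		free_next;
	uint32_t	lock_placeholder[4];	/* stands in for a 16-byte lock */
} DemoDesc;

/* Same descriptor plus a relation OID that only stats code would read. */
typedef struct DemoDescWithOid
{
	DemoTag		tag;
	Oid			relOid;
	int32_t		buf_id;
	uint32_t	state;
	int32_t		wait_backend;
	int32_t		free_next;
	uint32_t	lock_placeholder[4];
} DemoDescWithOid;

/* Compile-time guard: fail the build if a descriptor outgrows the budget. */
static_assert(sizeof(DemoDesc) <= DEMO_CACHE_LINE_SIZE,
			  "DemoDesc must fit in one cache line");
static_assert(sizeof(DemoDescWithOid) <= DEMO_CACHE_LINE_SIZE,
			  "DemoDescWithOid must fit in one cache line");

/* Bytes left in the budget for a descriptor of the given size. */
static inline size_t
demo_headroom(size_t desc_size)
{
	return DEMO_CACHE_LINE_SIZE - desc_size;
}
```

With these made-up members the descriptor grows from 52 to 56 bytes, so
demo_headroom() drops from 12 to 8. The real numbers depend on the actual
struct, but the trade-off is the same: every byte spent on reporting is a byte
no longer available to buffer management.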

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#30Kirill Reshke
reshkekirill@gmail.com
In reply to: Bertrand Drouvot (#29)
Re: relfilenode statistics

On Fri, 29 Nov 2024 at 20:20, Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

On Fri, Nov 29, 2024 at 11:23:12AM +0500, Kirill Reshke wrote:

If we don’t have the relation OID when writing buffers out, can we
just store oid to buffertag mapping somewhere and use it?

Do you mean add the relation OID into the BufferTag? While that could probably
be done from a technical point of view (with probably non negligible amount
of refactoring), I can see those cons:

Not exactly; what I had in mind was a separate hashmap in shared
memory, mapping buffertag<>oid.

2. Probably lot of refactoring
3. This new member would be there "only" for stats and reporting purpose as
it is not needed at all for buffer related operations

To this design, your points 2&3 apply.

--
Best regards,
Kirill Reshke

#31Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Kirill Reshke (#30)
Re: relfilenode statistics

Hi,

On Fri, Nov 29, 2024 at 08:52:13PM +0500, Kirill Reshke wrote:

On Fri, 29 Nov 2024 at 20:20, Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

On Fri, Nov 29, 2024 at 11:23:12AM +0500, Kirill Reshke wrote:

If we don’t have the relation OID when writing buffers out, can we
just store oid to buffertag mapping somewhere and use it?

Do you mean add the relation OID into the BufferTag? While that could probably
be done from a technical point of view (with probably non negligible amount
of refactoring), I can see those cons:

Not exactly, what i had in mind was a separate hashmap into shared
memory, mapping buffertag<>oid.

I see.

2. Probably lot of refactoring
3. This new member would be there "only" for stats and reporting purpose as
it is not needed at all for buffer related operations

To this design, your points 2&3 apply.

That said, it might also help for DropRelationBuffers() where we need to scan
the entire buffer pool (there is an optimization in place though). We could
imagine buffertag as key and the value could be the relation OID and each entry
would have next/prev pointers linking to other BufferTags with same OID.

That's probably much more refactoring (and more invasive) than the initial idea
in this thread, but it could have multiple benefits. I'm not very familiar with
the "buffer" area of the code and would also need to study the performance impact
of maintaining this new hash map.
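
To make the next/prev linking idea concrete, here is a minimal sketch, not taken from any patch. It assumes entries live in a slot array (the hash lookup from BufferTag to slot is omitted), and all names (TagEntry, chain_push, chain_remove, chain_length) are hypothetical. Each entry stores the relation OID and is linked to the other entries of the same relation, so that dropping a relation could walk one chain instead of scanning every buffer.

```c
#include <assert.h>
#include <stdint.h>

#define NO_ENTRY (-1)

/* Hypothetical map entry: a simplified BufferTag key plus the relation OID. */
typedef struct TagEntry
{
    uint32_t relNumber;   /* part of the (simplified) BufferTag key */
    uint32_t blockNum;    /* part of the (simplified) BufferTag key */
    uint32_t relOid;      /* value: the owning relation's OID */
    int      prev;        /* previous slot with the same relOid, or NO_ENTRY */
    int      next;        /* next slot with the same relOid, or NO_ENTRY */
} TagEntry;

/* Link slot at the front of a relation's chain; returns the new chain head. */
static int
chain_push(TagEntry *entries, int head, int slot)
{
    assert(slot >= 0);
    entries[slot].prev = NO_ENTRY;
    entries[slot].next = head;
    if (head != NO_ENTRY)
        entries[head].prev = slot;
    return slot;
}

/* Unlink slot from its chain; returns the (possibly changed) chain head. */
static int
chain_remove(TagEntry *entries, int head, int slot)
{
    int prev = entries[slot].prev;
    int next = entries[slot].next;

    if (prev != NO_ENTRY)
        entries[prev].next = next;
    if (next != NO_ENTRY)
        entries[next].prev = prev;
    entries[slot].prev = NO_ENTRY;
    entries[slot].next = NO_ENTRY;
    return (head == slot) ? next : head;
}

/* Count the entries reachable from head, i.e. the buffers of one relation. */
static int
chain_length(const TagEntry *entries, int head)
{
    int n = 0;

    for (int i = head; i != NO_ENTRY; i = entries[i].next)
        n++;
    return n;
}
```

A real implementation would also need locking and a way to find each relation's chain head, which is part of the maintenance cost mentioned above.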

Do you and/or others have any thoughts/ideas about it?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#32Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Bertrand Drouvot (#31)
Re: relfilenode statistics

Hi,

On Tue, Dec 03, 2024 at 10:31:15AM +0000, Bertrand Drouvot wrote:

Hi,

On Fri, Nov 29, 2024 at 08:52:13PM +0500, Kirill Reshke wrote:

On Fri, 29 Nov 2024 at 20:20, Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

On Fri, Nov 29, 2024 at 11:23:12AM +0500, Kirill Reshke wrote:

If we don’t have the relation OID when writing buffers out, can we
just store oid to buffertag mapping somewhere and use it?

Do you mean add the relation OID into the BufferTag? While that could probably
be done from a technical point of view (with probably non negligible amount
of refactoring), I can see those cons:

Not exactly, what i had in mind was a separate hashmap into shared
memory, mapping buffertag<>oid.

I see.

2. Probably lot of refactoring
3. This new member would be there "only" for stats and reporting purpose as
it is not needed at all for buffer related operations

To this design, your points 2&3 apply.

That said, it might also help for DropRelationBuffers() where we need to scan
the entire buffer pool (there is an optimization in place though). We could
imagine buffertag as key and the value could be the relation OID and each entry
would have next/prev pointers linking to other BufferTags with same OID.

That's probably much more refactoring (and more invasive) than the initial idea
in this thread, but it could have multiple benefits. I'm not very familiar with
the "buffer" area of the code and would also need to study the performance impact
of maintaining this new hash map.

Do you and/or others have any thoughts/ideas about it?

As mentioned by Andres in [1], relying on the relation OID would not work to
"recover" the stats because we don't have access to the relation oid during crash
recovery. So, I'm going to resume working on the "initial" idea (i.e having the
stats keyed by relfilenode).

[1]: /messages/by-id/xvetwjsnkhx2gp6np225g2h64f4mfmg6oopkuaiivrpzd2futj@pflk55su36ho

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#33Kirill Reshke
reshkekirill@gmail.com
In reply to: Bertrand Drouvot (#32)
Re: relfilenode statistics

On Fri, 3 Jan 2025 at 21:18, Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

As mentioned by Andres in [1], relying on the relation OID would not work to
"recover" the stats because we don't have access to the relation oid during crash
recovery. So, I'm going to resume working on the "initial" idea (i.e having the
stats keyed by relfilenode).

[1]: /messages/by-id/xvetwjsnkhx2gp6np225g2h64f4mfmg6oopkuaiivrpzd2futj@pflk55su36ho

Hmm. While it is true that catalog lookups cannot be performed during
crash recovery, is it really necessary to save and retrieve statistics
after a crash? Given that statistics are permitted to be outdated and
server crashes are anticipated to be infrequent, it looks like losing
a few analysis runs due to server crashes is acceptable.
In any case, I am totally OK with the relfilenode-based method because
it is generally less restricted (with respect to other PostgreSQL parts,
e.g. WAL replay) and simpler.

Also, this patch needs a rebase;)

--
Best regards,
Kirill Reshke

#34Michael Paquier
michael@paquier.xyz
In reply to: Kirill Reshke (#33)
Re: relfilenode statistics

On Thu, Mar 13, 2025 at 02:00:52PM +0500, Kirill Reshke wrote:

Hmm. While it is true that catalog lookups cannot be performed during
crash recovery, is it really necessary to save and retrieve statistics
after a crash?

Yes, losing stats on crash is a *very* annoying thing. Having no
stats for a relation means that autovacuum gives up entirely on it,
skipping the relation until its stats have been rebuilt, while bloat
accumulates. Being able to recover these stats during crash recovery
is a cheap design that would improve reliability by a large degree.

Given that statistics are permitted to be outdated and
server crashes are anticipated to be infrequent, it looks like losing
a few analysis runs due to server crashes is acceptable.
In any case, I am totally OK with the relfilenode-based method because
it is generally less restricted (with respect to other PostgreSQL parts,
e.g. WAL replay) and simpler.

The startup process is not connected to a database and has no access
to pg_class: the only thing we can know about are the on-disk files,
not their in-catalog OIDs. FWIW, I think that this patch would be a
huge step toward a more reliable stats system.

True that the patch needs a rebase. Bertrand has also mentioned that
some points needed more work.
--
Michael

#35Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#34)
Re: relfilenode statistics

Hi,

On Tue, Sep 16, 2025 at 03:44:25PM +0900, Michael Paquier wrote:

On Thu, Mar 13, 2025 at 02:00:52PM +0500, Kirill Reshke wrote:

Hmm. While it is true that catalog lookups cannot be performed during
crash recovery, is it really necessary to save and retrieve statistics
after a crash?

Yes, losing stats on crash is a *very* annoying thing. Having no
stats for a relation means that autovacuum gives up entirely on
relations it has no stats of, skipping it entirely until they have
rebuilt and bloat would accumulate. Being able to recover these stats
from crash recovery is a cheap design, that would improve reliability
by a large degree.

+1.

The startup process is not connected to a database and has no access
to pg_class: the only thing we can know about are the on-disk files,
not their in-catalog OIDs. FWIW, I think that this patch would be a
huge step toward a more reliable stats system.

True that the patch needs a rebase. Bertrand has also mentioned that
some points needed more work.

Right. I'll come back with a rebase, and a POC proposal on some stats so that
we could agree on the design. Also, it looks like we have a consensus on
"sometimes I don't know the relation OID so I want to use the relfilenumber instead,
without changing the user experience" (see [1]).

As for Michael's concern about adding a new field in the hash key: since 8 bytes
are allocated for the object ID, we can go with:

dboid (linked to RelFileLocator's dbOid)
objoid (linked to RelFileLocator's spcOid and to the RelFileLocator's relNumber)

and avoid adding a new field in the key.
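
As an illustration only, the mapping above could be sketched as follows. This assumes the tablespace OID goes in the high 32 bits of the 8-byte object ID and the relNumber in the low 32 bits; the actual layout is not specified here, and the helper names (pack_relfile_objid, objid_spcoid, objid_relnumber, check_roundtrip) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for PostgreSQL's Oid and RelFileNumber, both 4 bytes wide. */
typedef uint32_t Oid;
typedef uint32_t RelFileNumber;

/*
 * Hypothetical packing: spcOid in the high 32 bits, relNumber in the low
 * 32 bits of the 8-byte object ID.
 */
static inline uint64_t
pack_relfile_objid(Oid spcOid, RelFileNumber relNumber)
{
    return ((uint64_t) spcOid << 32) | (uint64_t) relNumber;
}

/* Recover the tablespace OID from a packed object ID. */
static inline Oid
objid_spcoid(uint64_t objid)
{
    return (Oid) (objid >> 32);
}

/* Recover the relNumber from a packed object ID. */
static inline RelFileNumber
objid_relnumber(uint64_t objid)
{
    return (RelFileNumber) (objid & 0xFFFFFFFFu);
}

/* Self-check: packing then unpacking returns the original pair. */
static inline void
check_roundtrip(Oid spcOid, RelFileNumber relNumber)
{
    uint64_t objid = pack_relfile_objid(spcOid, relNumber);

    assert(objid_spcoid(objid) == spcOid);
    assert(objid_relnumber(objid) == relNumber);
}
```

Because both halves are 4 bytes, the packing is lossless, so two distinct (spcOid, relNumber) pairs within the same database never share an object ID.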

[1]: /messages/by-id/CA+TgmoZ0u6ek_xxYJaGVBk0uEvH5txoYsCwbvxKWe-2xn_G_qg@mail.gmail.com

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#36Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#35)
Re: relfilenode statistics

On Tue, Sep 30, 2025 at 10:13:57AM +0000, Bertrand Drouvot wrote:

As for Michael's concern about adding a new field in the hash key: since 8 bytes
are allocated for the object ID, we can go with:

dboid (linked to RelFileLocator's dbOid)
objoid (linked to RelFileLocator's spcOid and to the RelFileLocator's relNumber)

and avoid adding a new field in the key.

RelFileNumber is a 4-byte Oid, so this mapping should be able to work.

Is there any reason why you would want an efficient filtering of the
contents of the shared hashtable based only on a relnumber or a
tablespace OID? Perhaps yes, like when a relfilenode is dropped into
a bin for an efficient removal from the shared hashtable so that we
don't need to do a seqscan. I just don't remember all the details of
the patch and whether it could act as a bottleneck in some scenarios.
--
Michael

#37Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#36)
Re: relfilenode statistics

Hi,

On Wed, Oct 01, 2025 at 08:05:16AM +0900, Michael Paquier wrote:

On Tue, Sep 30, 2025 at 10:13:57AM +0000, Bertrand Drouvot wrote:

As for Michael's concern about adding a new field in the hash key: since 8 bytes
are allocated for the object ID, we can go with:

dboid (linked to RelFileLocator's dbOid)
objoid (linked to RelFileLocator's spcOid and to the RelFileLocator's relNumber)

and avoid adding a new field in the key.

RelFileNumber is a 4-byte Oid, so this mapping should be able to work.

Right.

Is there any reason why you would want an efficient filtering of the
contents of the shared hashtable based only on a relnumber or a
tablespace OID?

Not that I can think of currently.

Perhaps yes, like when a relfilenode is dropped into
a bin for an efficient removal from the shared hashtable so as we
don't need to do a seqscan, I just don't remember all the details of
the patch and if it could act as a bottleneck in some scenarios.

I think the first step is to replace (i.e. get rid of) PGSTAT_KIND_RELATION with a
brand new PGSTAT_KIND_RELFILENODE and move all the existing stats that are currently
under PGSTAT_KIND_RELATION to this new PGSTAT_KIND_RELFILENODE.

Let's do this by keeping the pg_stat_all_tables|indexes and pg_statio_all_tables|indexes
on top of PGSTAT_KIND_RELFILENODE and ensuring that a relation rewrite keeps
those stats. Once done, we could work from there to add new stats (add write
counters and ensure that some counters (n_dead_tup and friends) are replicated).

Does that make sense to you?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#38Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#37)
Re: relfilenode statistics

On Wed, Oct 01, 2025 at 02:33:11PM +0000, Bertrand Drouvot wrote:

I think the first step is to replace (i.e get rid) PGSTAT_KIND_RELATION by a brand
new PGSTAT_KIND_RELFILENODE and move all the existing stats that are currently
under the PGSTAT_KIND_RELATION to this new PGSTAT_KIND_RELFILENODE.

Likely so, yes.

Let's do this by keeping the pg_stat_all_tables|indexes and pg_statio_all_tables|indexes
on top of the PGSTAT_KIND_RELFILENODE and ensure that a relation rewrite keeps
those stats. Once done, we could work from there to add new stats (add writes
counters and ensure that some counters (n_dead_tup and friends) are replicated).

Do you think it is OK to define non-transactional pending stats as
being always a subset of the transactional stats? I don't quite see
if there would be a case to have stats that are only flushed in a
non-transactional path, while being discarded at the stats report done
at transaction commit time. This means that it may be possible to
structure things so that the pending non-transactional stats structures are
always part of the transactional bits, and that the other way around
is not possible. Perhaps that influences the design choices, at least
a bit.
--
Michael

#39Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#38)
3 attachment(s)
Re: relfilenode statistics

Hi,

On Thu, Oct 02, 2025 at 10:23:11AM +0900, Michael Paquier wrote:

On Wed, Oct 01, 2025 at 02:33:11PM +0000, Bertrand Drouvot wrote:

I think the first step is to replace (i.e get rid) PGSTAT_KIND_RELATION by a brand
new PGSTAT_KIND_RELFILENODE and move all the existing stats that are currently
under the PGSTAT_KIND_RELATION to this new PGSTAT_KIND_RELFILENODE.

Likely so, yes.

PFA the new implementation. It does not introduce a new PGSTAT_KIND_RELFILENODE,
instead it keys the PGSTAT_KIND_RELATION by relfile locator. We may want to
rename PGSTAT_KIND_RELATION to PGSTAT_KIND_RELFILENODE as a next step.

The patch set is structured as follows:

==== 0001

Add stats tests related to rewrite

While there are existing rewrite tests, the stats behavior during rewrites
doesn't have good coverage. This patch adds some tests to record some stats
after different rewrite scenarios.

That way, we'll be able to test that the stats are still the ones we
expect after rewrites. Note that it generates a new stats_1.out (which is quite
large), so we may want to move those new tests to "isolation" instead.

==== 0002

Key PGSTAT_KIND_RELATION by relfile locator

This patch changes the key used for the PGSTAT_KIND_RELATION statistic kind.
Instead of the relation oid, it now relies on:

- dboid (linked to RelFileLocator's dbOid)
- objoid which is the result of a new macro (namely RelFileLocatorToPgStatObjid())
that computes an objoid based on the RelFileLocator's spcOid and the RelFileLocator's
relNumber.

That will allow us to add new stats (add writes counters) and ensure that some
counters (n_dead_tup and friends) are replicated.

The patch introduces pgstat_reloid_to_relfilelocator() to 1) avoid calling
RelationIdGetRelation() to get the relfilelocator based on the relation oid
and 2) handle the partitioned table case.

Please note that:

- when running pg_stat_have_stats('relation',...) we now need to be connected
to the database that hosts the relation. As pg_stat_have_stats() is not
publicly documented, the changes done in 029_stats_restart.pl look
sufficient.

- this patch does not handle rewrites, so some tests are failing. Its only
intent is to ease the review, and it should not be pushed without being
merged with the following patch that handles the rewrites.

- it can be used to test that stats are incremented correctly and that we're
able to retrieve them as long as rewrites are not involved.

==== 0003

handle relation statistics correctly during rewrites

Now that PGSTAT_KIND_RELATION is keyed by relfilenode, we need to handle rewrites.

To do so, this patch:

- Adds PgStat_PendingRewrite, a new struct to track rewrite operations within
a transaction, storing the old locator, new locator, and original locator (for
rewrite chains). This allows stats to be copied from the original location to
the final location at commit time.

- Adds a new function, pgstat_mark_rewrite(), called when a table rewrite begins.
It records the rewrite operation in a local list and detects rewrite chains by
checking if the old_locator matches any existing new_locator, preserving the
chain's original_locator.

- Modifies pgstat_copy_relation_stats(), to accept RelFileLocators instead of
Relations, with a new increment parameter to accumulate stats (needed for rewrite
chains with DML between rewrites).

- Ensures that AtEOXact_PgStat_Relations(), AtPrepare_PgStat_Relations(),
pgstat_twophase_postcommit()/postabort() and pgstat_drop_relation() handle the
PgStat_PendingRewrite list correctly.
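
The chain detection described in the list above can be sketched as follows. This is a simplified model and not the code from the attached patch: the Locator type stands in for RelFileLocator, the fixed-size array stands in for the local list, and the names PendingRewrite, PendingRewriteList, locator_equals and mark_rewrite are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for RelFileLocator. */
typedef struct Locator
{
    uint32_t spcOid;
    uint32_t dbOid;
    uint32_t relNumber;
} Locator;

/* Hypothetical model of one pending rewrite recorded in a transaction. */
typedef struct PendingRewrite
{
    Locator old_locator;       /* locator before this rewrite */
    Locator new_locator;       /* locator after this rewrite */
    Locator original_locator;  /* locator at the start of the rewrite chain */
} PendingRewrite;

#define MAX_PENDING 16

typedef struct PendingRewriteList
{
    PendingRewrite items[MAX_PENDING];
    int            count;
} PendingRewriteList;

static bool
locator_equals(Locator a, Locator b)
{
    return a.spcOid == b.spcOid &&
           a.dbOid == b.dbOid &&
           a.relNumber == b.relNumber;
}

/*
 * Record a rewrite from old_loc to new_loc. If old_loc is the new_locator
 * of an earlier entry, this rewrite continues a chain, so the earlier
 * entry's original_locator is preserved. Returns false if the list is full.
 */
static bool
mark_rewrite(PendingRewriteList *list, Locator old_loc, Locator new_loc)
{
    Locator original = old_loc;

    assert(list != NULL);
    if (list->count >= MAX_PENDING)
        return false;

    for (int i = 0; i < list->count; i++)
    {
        if (locator_equals(list->items[i].new_locator, old_loc))
        {
            original = list->items[i].original_locator;
            break;
        }
    }

    list->items[list->count].old_locator = old_loc;
    list->items[list->count].new_locator = new_loc;
    list->items[list->count].original_locator = original;
    list->count++;
    return true;
}
```

With two rewrites A to B and then B to C in the same transaction, the second entry keeps A as its original locator, which is what lets the stats be copied from the original location to the final one at commit time.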

Note that due to the new flush call in pgstat_twophase_postcommit() we cannot
call GetCurrentTransactionStopTimestamp() in pgstat_relation_flush_cb(). So,
the patch adds a check to handle this special case and calls GetCurrentTimestamp()
instead.
Note that we'd call GetCurrentTimestamp() only if there is a rewrite, so that
the GetCurrentTimestamp() extra cost should be negligible. Another solution
could be to trigger the flush from FinishPreparedTransaction() but that's not
worth the extra complexity.

The new pending_rewrites list is traversed in multiple places. The overhead
should be negligible in comparison to a rewrite and the list should not contain
a lot of rewrites in practice.

Another design that I tried was to copy the stats in pgstat_mark_rewrite() but
that led to difficulties with aborts and subtransactions. It looks to me that
the list approach proposed here makes more sense.

We could also imagine adding a function similar to pg_stat_have_stats() that
would take relfile locator as arguments. That could help validate that after
a rewrite the old stats are gone.

Do you think it is OK to define non-transactional pending stats as
being always a subset of the transactional stats? I don't quite see
if there would be a case to have stats that are only flushed in a
non-transactional path, while being discarded at the stats report done
at transaction commit time. This means that it may be possible to
structure things so as the pending non-transaction stats structure are
always part of the transactional bits, and that the other way around
is not possible. Perhaps that influences the design choices, at least
a bit.

The proposed patch does not change anything in that regard.
It keeps the relation's behavior as it is.

This patch just ensures that a relation rewrite keeps its stats.

Adding new stats (write counters) and ensuring that some counters
(n_dead_tup and friends) are replicated will be done once this one gets in.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v7-0001-Add-stats-tests-related-to-rewrite.patch (text/x-diff; charset=us-ascii)
From 40903079d2f011e6fb33045d7b36b1014fc7a981 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 27 Oct 2025 14:54:23 +0000
Subject: [PATCH v7 1/3] Add stats tests related to rewrite

While there are existing rewrite tests, the stats behavior during rewrites
doesn't have a good coverage. This patch adds some tests to record some stats after
different rewrite scenarios.

That is useful for a following patch where relation statistics will be keyed
by relfilenode. We'll be able to test that the stats are still the ones we
expect after rewrites.
---
 src/test/regress/expected/stats.out   |  321 ++++
 src/test/regress/expected/stats_1.out | 2255 +++++++++++++++++++++++++
 src/test/regress/sql/stats.sql        |  186 ++
 3 files changed, 2762 insertions(+)
  91.8% src/test/regress/expected/
   8.1% src/test/regress/sql/

diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 67e1860e984..06487e367d8 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1910,4 +1910,325 @@ SELECT * FROM check_estimated_rows('SELECT * FROM table_fillfactor');
 (1 row)
 
 DROP TABLE table_fillfactor;
+-- Test some rewrites
+CREATE TABLE test_2pc_timestamp (a int) WITH (autovacuum_enabled = false);
+VACUUM ANALYZE test_2pc_timestamp;
+SELECT last_analyze AS last_vacuum_analyze FROM pg_stat_all_tables WHERE relname = 'test_2pc_timestamp' \gset
+BEGIN;
+ALTER TABLE test_2pc_timestamp ALTER COLUMN a TYPE int;
+PREPARE TRANSACTION 'test';
+COMMIT PREPARED 'test';
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT last_analyze = :'last_vacuum_analyze'::timestamptz FROM pg_stat_all_tables WHERE relname = 'test_2pc_timestamp';
+ ?column? 
+----------
+ t
+(1 row)
+
+DROP TABLE test_2pc_timestamp;
+CREATE TABLE test_2pc_rewrite_alone (a int);
+INSERT INTO test_2pc_rewrite_alone VALUES (1);
+BEGIN;
+ALTER TABLE test_2pc_rewrite_alone ALTER COLUMN a TYPE bigint;
+PREPARE TRANSACTION 'test';
+COMMIT PREPARED 'test';
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT n_tup_ins, n_live_tup, n_dead_tup FROM pg_stat_all_tables WHERE relname = 'test_2pc_rewrite_alone';
+ n_tup_ins | n_live_tup | n_dead_tup 
+-----------+------------+------------
+         1 |          1 |          0
+(1 row)
+
+DROP TABLE test_2pc_rewrite_alone;
+CREATE TABLE test_2pc (a int);
+INSERT INTO test_2pc VALUES (1);
+BEGIN;
+INSERT INTO test_2pc VALUES (1);
+INSERT INTO test_2pc VALUES (2);
+INSERT INTO test_2pc VALUES (3);
+ALTER TABLE test_2pc ALTER COLUMN a TYPE bigint;
+PREPARE TRANSACTION 'test';
+COMMIT PREPARED 'test';
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT n_tup_ins, n_live_tup, n_dead_tup FROM pg_stat_all_tables WHERE relname = 'test_2pc';
+ n_tup_ins | n_live_tup | n_dead_tup 
+-----------+------------+------------
+         4 |          4 |          0
+(1 row)
+
+DROP TABLE test_2pc;
+CREATE TABLE test_2pc_multi (a int);
+INSERT INTO test_2pc_multi VALUES (1);
+BEGIN;
+INSERT INTO test_2pc_multi VALUES (1);
+INSERT INTO test_2pc_multi VALUES (2);
+ALTER TABLE test_2pc_multi ALTER COLUMN a TYPE bigint;
+INSERT INTO test_2pc_multi VALUES (3);
+INSERT INTO test_2pc_multi VALUES (4);
+ALTER TABLE test_2pc_multi ALTER COLUMN a TYPE int;
+INSERT INTO test_2pc_multi VALUES (5);
+PREPARE TRANSACTION 'test';
+COMMIT PREPARED 'test';
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT n_tup_ins, n_live_tup, n_dead_tup FROM pg_stat_all_tables WHERE relname = 'test_2pc_multi';
+ n_tup_ins | n_live_tup | n_dead_tup 
+-----------+------------+------------
+         6 |          6 |          0
+(1 row)
+
+DROP TABLE test_2pc_multi;
+CREATE TABLE test_2pc_rewrite_alone_abort (a int);
+INSERT INTO test_2pc_rewrite_alone_abort VALUES (1);
+BEGIN;
+ALTER TABLE test_2pc_rewrite_alone_abort ALTER COLUMN a TYPE bigint;
+PREPARE TRANSACTION 'test';
+ROLLBACK PREPARED 'test';
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT n_tup_ins, n_live_tup, n_dead_tup FROM pg_stat_all_tables WHERE relname = 'test_2pc_rewrite_alone_abort';
+ n_tup_ins | n_live_tup | n_dead_tup 
+-----------+------------+------------
+         1 |          1 |          0
+(1 row)
+
+DROP TABLE test_2pc_rewrite_alone_abort;
+CREATE TABLE test_2pc_abort (a int);
+INSERT INTO test_2pc_abort VALUES (1);
+BEGIN;
+INSERT INTO test_2pc_abort VALUES (1);
+INSERT INTO test_2pc_abort VALUES (2);
+ALTER TABLE test_2pc_abort ALTER COLUMN a TYPE bigint;
+INSERT INTO test_2pc_abort VALUES (3);
+PREPARE TRANSACTION 'test';
+ROLLBACK PREPARED 'test';
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT n_tup_ins, n_live_tup, n_dead_tup FROM pg_stat_all_tables WHERE relname = 'test_2pc_abort';
+ n_tup_ins | n_live_tup | n_dead_tup 
+-----------+------------+------------
+         4 |          1 |          3
+(1 row)
+
+DROP TABLE test_2pc_abort;
+CREATE TABLE test_2pc_savepoint (a int);
+INSERT INTO test_2pc_savepoint VALUES (1);
+BEGIN;
+SAVEPOINT a;
+INSERT INTO test_2pc_savepoint VALUES (1);
+INSERT INTO test_2pc_savepoint VALUES (2);
+ALTER TABLE test_2pc_savepoint ALTER COLUMN a TYPE bigint;
+SAVEPOINT b;
+INSERT INTO test_2pc_savepoint VALUES (3);
+ALTER TABLE test_2pc_savepoint ALTER COLUMN a TYPE int;
+SAVEPOINT c;
+INSERT INTO test_2pc_savepoint VALUES (4);
+INSERT INTO test_2pc_savepoint VALUES (5);
+ROLLBACK TO SAVEPOINT b;
+PREPARE TRANSACTION 'test';
+COMMIT PREPARED 'test';
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT n_tup_ins, n_live_tup, n_dead_tup FROM pg_stat_all_tables WHERE relname = 'test_2pc_savepoint';
+ n_tup_ins | n_live_tup | n_dead_tup 
+-----------+------------+------------
+         6 |          3 |          3
+(1 row)
+
+DROP TABLE test_2pc_savepoint;
+-- Rewrite without 2PC
+CREATE TABLE test_timestamp (a int) WITH (autovacuum_enabled = false);
+VACUUM ANALYZE test_timestamp;
+SELECT last_analyze AS last_vacuum_analyze FROM pg_stat_all_tables WHERE relname = 'test_timestamp' \gset
+ALTER TABLE test_timestamp ALTER COLUMN a TYPE bigint;
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT last_analyze = :'last_vacuum_analyze'::timestamptz FROM pg_stat_all_tables WHERE relname = 'test_timestamp';
+ ?column? 
+----------
+ t
+(1 row)
+
+DROP TABLE test_timestamp;
+CREATE TABLE test_alone (a int);
+INSERT INTO test_alone VALUES (1);
+BEGIN;
+ALTER TABLE test_alone ALTER COLUMN a TYPE bigint;
+COMMIT;
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT n_tup_ins, n_live_tup, n_dead_tup FROM pg_stat_all_tables WHERE relname = 'test_alone';
+ n_tup_ins | n_live_tup | n_dead_tup 
+-----------+------------+------------
+         1 |          1 |          0
+(1 row)
+
+DROP TABLE test_alone;
+CREATE TABLE test (a int);
+INSERT INTO test VALUES (1);
+BEGIN;
+INSERT INTO test VALUES (1);
+INSERT INTO test VALUES (2);
+INSERT INTO test VALUES (3);
+ALTER TABLE test ALTER COLUMN a TYPE bigint;
+COMMIT;
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT n_tup_ins, n_live_tup, n_dead_tup FROM pg_stat_all_tables WHERE relname = 'test';
+ n_tup_ins | n_live_tup | n_dead_tup 
+-----------+------------+------------
+         4 |          4 |          0
+(1 row)
+
+DROP TABLE test;
+CREATE TABLE test_multi (a int);
+INSERT INTO test_multi VALUES (1);
+BEGIN;
+INSERT INTO test_multi VALUES (1);
+INSERT INTO test_multi VALUES (2);
+ALTER TABLE test_multi ALTER COLUMN a TYPE bigint;
+INSERT INTO test_multi VALUES (3);
+INSERT INTO test_multi VALUES (4);
+ALTER TABLE test_multi ALTER COLUMN a TYPE int;
+INSERT INTO test_multi VALUES (5);
+COMMIT;
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT n_tup_ins, n_live_tup, n_dead_tup FROM pg_stat_all_tables WHERE relname = 'test_multi';
+ n_tup_ins | n_live_tup | n_dead_tup 
+-----------+------------+------------
+         6 |          6 |          0
+(1 row)
+
+DROP TABLE test_multi;
+CREATE TABLE test_rewrite_alone_abort (a int);
+INSERT INTO test_rewrite_alone_abort VALUES (1);
+BEGIN;
+ALTER TABLE test_rewrite_alone_abort ALTER COLUMN a TYPE bigint;
+ROLLBACK;
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT n_tup_ins, n_live_tup, n_dead_tup FROM pg_stat_all_tables WHERE relname = 'test_rewrite_alone_abort';
+ n_tup_ins | n_live_tup | n_dead_tup 
+-----------+------------+------------
+         1 |          1 |          0
+(1 row)
+
+DROP TABLE test_rewrite_alone_abort;
+CREATE TABLE test_abort (a int);
+INSERT INTO test_abort VALUES (1);
+BEGIN;
+INSERT INTO test_abort VALUES (1);
+INSERT INTO test_abort VALUES (2);
+ALTER TABLE test_abort ALTER COLUMN a TYPE bigint;
+INSERT INTO test_abort VALUES (3);
+ROLLBACK;
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT n_tup_ins, n_live_tup, n_dead_tup FROM pg_stat_all_tables WHERE relname = 'test_abort';
+ n_tup_ins | n_live_tup | n_dead_tup 
+-----------+------------+------------
+         4 |          1 |          3
+(1 row)
+
+DROP TABLE test_abort;
+CREATE TABLE test_savepoint (a int);
+INSERT INTO test_savepoint VALUES (1);
+BEGIN;
+SAVEPOINT a;
+INSERT INTO test_savepoint VALUES (1);
+INSERT INTO test_savepoint VALUES (2);
+ALTER TABLE test_savepoint ALTER COLUMN a TYPE bigint;
+SAVEPOINT b;
+INSERT INTO test_savepoint VALUES (3);
+ALTER TABLE test_savepoint ALTER COLUMN a TYPE int;
+SAVEPOINT c;
+INSERT INTO test_savepoint VALUES (4);
+INSERT INTO test_savepoint VALUES (5);
+ROLLBACK TO SAVEPOINT b;
+COMMIT;
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT n_tup_ins, n_live_tup, n_dead_tup FROM pg_stat_all_tables WHERE relname = 'test_savepoint';
+ n_tup_ins | n_live_tup | n_dead_tup 
+-----------+------------+------------
+         6 |          3 |          3
+(1 row)
+
+DROP TABLE test_savepoint;
+CREATE TABLE test_tbs (a int);
+INSERT INTO test_tbs VALUES (1);
+ALTER TABLE test_tbs SET TABLESPACE pg_default;
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT n_tup_ins, n_live_tup, n_dead_tup FROM pg_stat_all_tables WHERE relname = 'test_tbs';
+ n_tup_ins | n_live_tup | n_dead_tup 
+-----------+------------+------------
+         1 |          1 |          0
+(1 row)
+
+DROP TABLE test_tbs;
 -- End of Stats Test
diff --git a/src/test/regress/expected/stats_1.out b/src/test/regress/expected/stats_1.out
new file mode 100644
index 00000000000..629e71fce0d
--- /dev/null
+++ b/src/test/regress/expected/stats_1.out
@@ -0,0 +1,2255 @@
+--
+-- Test cumulative stats system
+--
+-- Must be run after tenk2 has been created (by create_table),
+-- populated (by create_misc) and indexed (by create_index).
+--
+-- conditio sine qua non
+SHOW track_counts;  -- must be on
+ track_counts 
+--------------
+ on
+(1 row)
+
+-- List of backend types, contexts and objects tracked in pg_stat_io.
+\a
+SELECT backend_type, object, context FROM pg_stat_io
+  ORDER BY backend_type COLLATE "C", object COLLATE "C", context COLLATE "C";
+backend_type|object|context
+autovacuum launcher|relation|bulkread
+autovacuum launcher|relation|init
+autovacuum launcher|relation|normal
+autovacuum launcher|wal|init
+autovacuum launcher|wal|normal
+autovacuum worker|relation|bulkread
+autovacuum worker|relation|init
+autovacuum worker|relation|normal
+autovacuum worker|relation|vacuum
+autovacuum worker|wal|init
+autovacuum worker|wal|normal
+background worker|relation|bulkread
+background worker|relation|bulkwrite
+background worker|relation|init
+background worker|relation|normal
+background worker|relation|vacuum
+background worker|temp relation|normal
+background worker|wal|init
+background worker|wal|normal
+background writer|relation|init
+background writer|relation|normal
+background writer|wal|init
+background writer|wal|normal
+checkpointer|relation|init
+checkpointer|relation|normal
+checkpointer|wal|init
+checkpointer|wal|normal
+client backend|relation|bulkread
+client backend|relation|bulkwrite
+client backend|relation|init
+client backend|relation|normal
+client backend|relation|vacuum
+client backend|temp relation|normal
+client backend|wal|init
+client backend|wal|normal
+io worker|relation|bulkread
+io worker|relation|bulkwrite
+io worker|relation|init
+io worker|relation|normal
+io worker|relation|vacuum
+io worker|temp relation|normal
+io worker|wal|init
+io worker|wal|normal
+slotsync worker|relation|bulkread
+slotsync worker|relation|bulkwrite
+slotsync worker|relation|init
+slotsync worker|relation|normal
+slotsync worker|relation|vacuum
+slotsync worker|temp relation|normal
+slotsync worker|wal|init
+slotsync worker|wal|normal
+standalone backend|relation|bulkread
+standalone backend|relation|bulkwrite
+standalone backend|relation|init
+standalone backend|relation|normal
+standalone backend|relation|vacuum
+standalone backend|wal|init
+standalone backend|wal|normal
+startup|relation|bulkread
+startup|relation|bulkwrite
+startup|relation|init
+startup|relation|normal
+startup|relation|vacuum
+startup|wal|init
+startup|wal|normal
+walreceiver|wal|init
+walreceiver|wal|normal
+walsender|relation|bulkread
+walsender|relation|bulkwrite
+walsender|relation|init
+walsender|relation|normal
+walsender|relation|vacuum
+walsender|temp relation|normal
+walsender|wal|init
+walsender|wal|normal
+walsummarizer|wal|init
+walsummarizer|wal|normal
+walwriter|wal|init
+walwriter|wal|normal
+(79 rows)
+\a
+-- ensure that both seqscan and indexscan plans are allowed
+SET enable_seqscan TO on;
+SET enable_indexscan TO on;
+-- for the moment, we don't want index-only scans here
+SET enable_indexonlyscan TO off;
+-- not enabled by default, but we want to test it...
+SET track_functions TO 'all';
+-- record dboid for later use
+SELECT oid AS dboid from pg_database where datname = current_database() \gset
+-- save counters
+BEGIN;
+SET LOCAL stats_fetch_consistency = snapshot;
+CREATE TABLE prevstats AS
+SELECT t.seq_scan, t.seq_tup_read, t.idx_scan, t.idx_tup_fetch,
+       (b.heap_blks_read + b.heap_blks_hit) AS heap_blks,
+       (b.idx_blks_read + b.idx_blks_hit) AS idx_blks,
+       pg_stat_get_snapshot_timestamp() as snap_ts
+  FROM pg_catalog.pg_stat_user_tables AS t,
+       pg_catalog.pg_statio_user_tables AS b
+ WHERE t.relname='tenk2' AND b.relname='tenk2';
+COMMIT;
+-- test effects of TRUNCATE on n_live_tup/n_dead_tup counters
+CREATE TABLE trunc_stats_test(id serial);
+CREATE TABLE trunc_stats_test1(id serial, stuff text);
+CREATE TABLE trunc_stats_test2(id serial);
+CREATE TABLE trunc_stats_test3(id serial, stuff text);
+CREATE TABLE trunc_stats_test4(id serial);
+-- check that n_live_tup is reset to 0 after truncate
+INSERT INTO trunc_stats_test DEFAULT VALUES;
+INSERT INTO trunc_stats_test DEFAULT VALUES;
+INSERT INTO trunc_stats_test DEFAULT VALUES;
+TRUNCATE trunc_stats_test;
+-- test involving a truncate in a transaction; 4 ins but only 1 live
+INSERT INTO trunc_stats_test1 DEFAULT VALUES;
+INSERT INTO trunc_stats_test1 DEFAULT VALUES;
+INSERT INTO trunc_stats_test1 DEFAULT VALUES;
+UPDATE trunc_stats_test1 SET id = id + 10 WHERE id IN (1, 2);
+DELETE FROM trunc_stats_test1 WHERE id = 3;
+BEGIN;
+UPDATE trunc_stats_test1 SET id = id + 100;
+TRUNCATE trunc_stats_test1;
+INSERT INTO trunc_stats_test1 DEFAULT VALUES;
+COMMIT;
+-- use a savepoint: 1 insert, 1 live
+BEGIN;
+INSERT INTO trunc_stats_test2 DEFAULT VALUES;
+INSERT INTO trunc_stats_test2 DEFAULT VALUES;
+SAVEPOINT p1;
+INSERT INTO trunc_stats_test2 DEFAULT VALUES;
+TRUNCATE trunc_stats_test2;
+INSERT INTO trunc_stats_test2 DEFAULT VALUES;
+RELEASE SAVEPOINT p1;
+COMMIT;
+-- rollback a savepoint: this should count 4 inserts and have 2
+-- live tuples after commit (and 2 dead ones due to aborted subxact)
+BEGIN;
+INSERT INTO trunc_stats_test3 DEFAULT VALUES;
+INSERT INTO trunc_stats_test3 DEFAULT VALUES;
+SAVEPOINT p1;
+INSERT INTO trunc_stats_test3 DEFAULT VALUES;
+INSERT INTO trunc_stats_test3 DEFAULT VALUES;
+TRUNCATE trunc_stats_test3;
+INSERT INTO trunc_stats_test3 DEFAULT VALUES;
+ROLLBACK TO SAVEPOINT p1;
+COMMIT;
+-- rollback a truncate: this should count 2 inserts and produce 2 dead tuples
+BEGIN;
+INSERT INTO trunc_stats_test4 DEFAULT VALUES;
+INSERT INTO trunc_stats_test4 DEFAULT VALUES;
+TRUNCATE trunc_stats_test4;
+INSERT INTO trunc_stats_test4 DEFAULT VALUES;
+ROLLBACK;
+-- do a seqscan
+SELECT count(*) FROM tenk2;
+ count 
+-------
+ 10000
+(1 row)
+
+-- do an indexscan
+-- make sure it is not a bitmap scan, which might skip fetching heap tuples
+SET enable_bitmapscan TO off;
+SELECT count(*) FROM tenk2 WHERE unique1 = 1;
+ count 
+-------
+     1
+(1 row)
+
+RESET enable_bitmapscan;
+-- ensure pending stats are flushed
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+-- check effects
+BEGIN;
+SET LOCAL stats_fetch_consistency = snapshot;
+SELECT relname, n_tup_ins, n_tup_upd, n_tup_del, n_live_tup, n_dead_tup
+  FROM pg_stat_user_tables
+ WHERE relname like 'trunc_stats_test%' order by relname;
+      relname      | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup 
+-------------------+-----------+-----------+-----------+------------+------------
+ trunc_stats_test  |         3 |         0 |         0 |          0 |          0
+ trunc_stats_test1 |         4 |         2 |         1 |          1 |          0
+ trunc_stats_test2 |         1 |         0 |         0 |          1 |          0
+ trunc_stats_test3 |         4 |         0 |         0 |          2 |          2
+ trunc_stats_test4 |         2 |         0 |         0 |          0 |          2
+(5 rows)
+
+SELECT st.seq_scan >= pr.seq_scan + 1,
+       st.seq_tup_read >= pr.seq_tup_read + cl.reltuples,
+       st.idx_scan >= pr.idx_scan + 1,
+       st.idx_tup_fetch >= pr.idx_tup_fetch + 1
+  FROM pg_stat_user_tables AS st, pg_class AS cl, prevstats AS pr
+ WHERE st.relname='tenk2' AND cl.relname='tenk2';
+ ?column? | ?column? | ?column? | ?column? 
+----------+----------+----------+----------
+ t        | t        | t        | t
+(1 row)
+
+SELECT st.heap_blks_read + st.heap_blks_hit >= pr.heap_blks + cl.relpages,
+       st.idx_blks_read + st.idx_blks_hit >= pr.idx_blks + 1
+  FROM pg_statio_user_tables AS st, pg_class AS cl, prevstats AS pr
+ WHERE st.relname='tenk2' AND cl.relname='tenk2';
+ ?column? | ?column? 
+----------+----------
+ t        | t
+(1 row)
+
+SELECT pr.snap_ts < pg_stat_get_snapshot_timestamp() as snapshot_newer
+FROM prevstats AS pr;
+ snapshot_newer 
+----------------
+ t
+(1 row)
+
+COMMIT;
+----
+-- Basic tests for track_functions
+---
+CREATE FUNCTION stats_test_func1() RETURNS VOID LANGUAGE plpgsql AS $$BEGIN END;$$;
+SELECT 'stats_test_func1()'::regprocedure::oid AS stats_test_func1_oid \gset
+CREATE FUNCTION stats_test_func2() RETURNS VOID LANGUAGE plpgsql AS $$BEGIN END;$$;
+SELECT 'stats_test_func2()'::regprocedure::oid AS stats_test_func2_oid \gset
+-- test that stats are accumulated
+BEGIN;
+SET LOCAL stats_fetch_consistency = none;
+SELECT pg_stat_get_function_calls(:stats_test_func1_oid);
+ pg_stat_get_function_calls 
+----------------------------
+                           
+(1 row)
+
+SELECT pg_stat_get_xact_function_calls(:stats_test_func1_oid);
+ pg_stat_get_xact_function_calls 
+---------------------------------
+                                
+(1 row)
+
+SELECT stats_test_func1();
+ stats_test_func1 
+------------------
+ 
+(1 row)
+
+SELECT pg_stat_get_xact_function_calls(:stats_test_func1_oid);
+ pg_stat_get_xact_function_calls 
+---------------------------------
+                               1
+(1 row)
+
+SELECT stats_test_func1();
+ stats_test_func1 
+------------------
+ 
+(1 row)
+
+SELECT pg_stat_get_xact_function_calls(:stats_test_func1_oid);
+ pg_stat_get_xact_function_calls 
+---------------------------------
+                               2
+(1 row)
+
+SELECT pg_stat_get_function_calls(:stats_test_func1_oid);
+ pg_stat_get_function_calls 
+----------------------------
+                          0
+(1 row)
+
+COMMIT;
+-- Verify that function stats are not transactional
+-- rolled back savepoint in committing transaction
+BEGIN;
+SELECT stats_test_func2();
+ stats_test_func2 
+------------------
+ 
+(1 row)
+
+SAVEPOINT foo;
+SELECT stats_test_func2();
+ stats_test_func2 
+------------------
+ 
+(1 row)
+
+ROLLBACK TO SAVEPOINT foo;
+SELECT pg_stat_get_xact_function_calls(:stats_test_func2_oid);
+ pg_stat_get_xact_function_calls 
+---------------------------------
+                               2
+(1 row)
+
+SELECT stats_test_func2();
+ stats_test_func2 
+------------------
+ 
+(1 row)
+
+COMMIT;
+-- rolled back transaction
+BEGIN;
+SELECT stats_test_func2();
+ stats_test_func2 
+------------------
+ 
+(1 row)
+
+ROLLBACK;
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+-- check collected stats
+SELECT funcname, calls FROM pg_stat_user_functions WHERE funcid = :stats_test_func1_oid;
+     funcname     | calls 
+------------------+-------
+ stats_test_func1 |     2
+(1 row)
+
+SELECT funcname, calls FROM pg_stat_user_functions WHERE funcid = :stats_test_func2_oid;
+     funcname     | calls 
+------------------+-------
+ stats_test_func2 |     4
+(1 row)
+
+-- check that a rolled back drop function stats leaves stats alive
+BEGIN;
+SELECT funcname, calls FROM pg_stat_user_functions WHERE funcid = :stats_test_func1_oid;
+     funcname     | calls 
+------------------+-------
+ stats_test_func1 |     2
+(1 row)
+
+DROP FUNCTION stats_test_func1();
+-- shouldn't be visible via view
+SELECT funcname, calls FROM pg_stat_user_functions WHERE funcid = :stats_test_func1_oid;
+ funcname | calls 
+----------+-------
+(0 rows)
+
+-- but still via oid access
+SELECT pg_stat_get_function_calls(:stats_test_func1_oid);
+ pg_stat_get_function_calls 
+----------------------------
+                          2
+(1 row)
+
+ROLLBACK;
+SELECT funcname, calls FROM pg_stat_user_functions WHERE funcid = :stats_test_func1_oid;
+     funcname     | calls 
+------------------+-------
+ stats_test_func1 |     2
+(1 row)
+
+SELECT pg_stat_get_function_calls(:stats_test_func1_oid);
+ pg_stat_get_function_calls 
+----------------------------
+                          2
+(1 row)
+
+-- check that function dropped in main transaction leaves no stats behind
+BEGIN;
+DROP FUNCTION stats_test_func1();
+COMMIT;
+SELECT funcname, calls FROM pg_stat_user_functions WHERE funcid = :stats_test_func1_oid;
+ funcname | calls 
+----------+-------
+(0 rows)
+
+SELECT pg_stat_get_function_calls(:stats_test_func1_oid);
+ pg_stat_get_function_calls 
+----------------------------
+                           
+(1 row)
+
+-- check that function dropped in a subtransaction leaves no stats behind
+BEGIN;
+SELECT stats_test_func2();
+ stats_test_func2 
+------------------
+ 
+(1 row)
+
+SAVEPOINT a;
+SELECT stats_test_func2();
+ stats_test_func2 
+------------------
+ 
+(1 row)
+
+SAVEPOINT b;
+DROP FUNCTION stats_test_func2();
+COMMIT;
+SELECT funcname, calls FROM pg_stat_user_functions WHERE funcid = :stats_test_func2_oid;
+ funcname | calls 
+----------+-------
+(0 rows)
+
+SELECT pg_stat_get_function_calls(:stats_test_func2_oid);
+ pg_stat_get_function_calls 
+----------------------------
+                           
+(1 row)
+
+-- Check that stats for relations are dropped. For that we need to access stats
+-- by oid after the DROP TABLE. Save oids.
+CREATE TABLE drop_stats_test();
+INSERT INTO drop_stats_test DEFAULT VALUES;
+SELECT 'drop_stats_test'::regclass::oid AS drop_stats_test_oid \gset
+CREATE TABLE drop_stats_test_xact();
+INSERT INTO drop_stats_test_xact DEFAULT VALUES;
+SELECT 'drop_stats_test_xact'::regclass::oid AS drop_stats_test_xact_oid \gset
+CREATE TABLE drop_stats_test_subxact();
+INSERT INTO drop_stats_test_subxact DEFAULT VALUES;
+SELECT 'drop_stats_test_subxact'::regclass::oid AS drop_stats_test_subxact_oid \gset
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT pg_stat_get_live_tuples(:drop_stats_test_oid);
+ pg_stat_get_live_tuples 
+-------------------------
+                       1
+(1 row)
+
+DROP TABLE drop_stats_test;
+SELECT pg_stat_get_live_tuples(:drop_stats_test_oid);
+ pg_stat_get_live_tuples 
+-------------------------
+                       0
+(1 row)
+
+SELECT pg_stat_get_xact_tuples_inserted(:drop_stats_test_oid);
+ pg_stat_get_xact_tuples_inserted 
+----------------------------------
+                                0
+(1 row)
+
+-- check that rollback protects against having stats dropped and that local
+-- modifications don't pose a problem
+SELECT pg_stat_get_live_tuples(:drop_stats_test_xact_oid);
+ pg_stat_get_live_tuples 
+-------------------------
+                       1
+(1 row)
+
+SELECT pg_stat_get_tuples_inserted(:drop_stats_test_xact_oid);
+ pg_stat_get_tuples_inserted 
+-----------------------------
+                           1
+(1 row)
+
+SELECT pg_stat_get_xact_tuples_inserted(:drop_stats_test_xact_oid);
+ pg_stat_get_xact_tuples_inserted 
+----------------------------------
+                                0
+(1 row)
+
+BEGIN;
+INSERT INTO drop_stats_test_xact DEFAULT VALUES;
+SELECT pg_stat_get_xact_tuples_inserted(:drop_stats_test_xact_oid);
+ pg_stat_get_xact_tuples_inserted 
+----------------------------------
+                                1
+(1 row)
+
+DROP TABLE drop_stats_test_xact;
+SELECT pg_stat_get_xact_tuples_inserted(:drop_stats_test_xact_oid);
+ pg_stat_get_xact_tuples_inserted 
+----------------------------------
+                                0
+(1 row)
+
+ROLLBACK;
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT pg_stat_get_live_tuples(:drop_stats_test_xact_oid);
+ pg_stat_get_live_tuples 
+-------------------------
+                       1
+(1 row)
+
+SELECT pg_stat_get_tuples_inserted(:drop_stats_test_xact_oid);
+ pg_stat_get_tuples_inserted 
+-----------------------------
+                           2
+(1 row)
+
+-- transactional drop
+SELECT pg_stat_get_live_tuples(:drop_stats_test_xact_oid);
+ pg_stat_get_live_tuples 
+-------------------------
+                       1
+(1 row)
+
+SELECT pg_stat_get_tuples_inserted(:drop_stats_test_xact_oid);
+ pg_stat_get_tuples_inserted 
+-----------------------------
+                           2
+(1 row)
+
+BEGIN;
+INSERT INTO drop_stats_test_xact DEFAULT VALUES;
+SELECT pg_stat_get_xact_tuples_inserted(:drop_stats_test_xact_oid);
+ pg_stat_get_xact_tuples_inserted 
+----------------------------------
+                                1
+(1 row)
+
+DROP TABLE drop_stats_test_xact;
+SELECT pg_stat_get_xact_tuples_inserted(:drop_stats_test_xact_oid);
+ pg_stat_get_xact_tuples_inserted 
+----------------------------------
+                                0
+(1 row)
+
+COMMIT;
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT pg_stat_get_live_tuples(:drop_stats_test_xact_oid);
+ pg_stat_get_live_tuples 
+-------------------------
+                       0
+(1 row)
+
+SELECT pg_stat_get_tuples_inserted(:drop_stats_test_xact_oid);
+ pg_stat_get_tuples_inserted 
+-----------------------------
+                           0
+(1 row)
+
+-- savepoint rollback (2 levels)
+SELECT pg_stat_get_live_tuples(:drop_stats_test_subxact_oid);
+ pg_stat_get_live_tuples 
+-------------------------
+                       1
+(1 row)
+
+BEGIN;
+INSERT INTO drop_stats_test_subxact DEFAULT VALUES;
+SAVEPOINT sp1;
+INSERT INTO drop_stats_test_subxact DEFAULT VALUES;
+SELECT pg_stat_get_xact_tuples_inserted(:drop_stats_test_subxact_oid);
+ pg_stat_get_xact_tuples_inserted 
+----------------------------------
+                                2
+(1 row)
+
+SAVEPOINT sp2;
+DROP TABLE drop_stats_test_subxact;
+ROLLBACK TO SAVEPOINT sp2;
+SELECT pg_stat_get_xact_tuples_inserted(:drop_stats_test_subxact_oid);
+ pg_stat_get_xact_tuples_inserted 
+----------------------------------
+                                2
+(1 row)
+
+COMMIT;
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT pg_stat_get_live_tuples(:drop_stats_test_subxact_oid);
+ pg_stat_get_live_tuples 
+-------------------------
+                       3
+(1 row)
+
+-- savepoint rollback (1 level)
+SELECT pg_stat_get_live_tuples(:drop_stats_test_subxact_oid);
+ pg_stat_get_live_tuples 
+-------------------------
+                       3
+(1 row)
+
+BEGIN;
+SAVEPOINT sp1;
+DROP TABLE drop_stats_test_subxact;
+SAVEPOINT sp2;
+ROLLBACK TO SAVEPOINT sp1;
+COMMIT;
+SELECT pg_stat_get_live_tuples(:drop_stats_test_subxact_oid);
+ pg_stat_get_live_tuples 
+-------------------------
+                       3
+(1 row)
+
+-- and now actually drop
+SELECT pg_stat_get_live_tuples(:drop_stats_test_subxact_oid);
+ pg_stat_get_live_tuples 
+-------------------------
+                       3
+(1 row)
+
+BEGIN;
+SAVEPOINT sp1;
+DROP TABLE drop_stats_test_subxact;
+SAVEPOINT sp2;
+RELEASE SAVEPOINT sp1;
+COMMIT;
+SELECT pg_stat_get_live_tuples(:drop_stats_test_subxact_oid);
+ pg_stat_get_live_tuples 
+-------------------------
+                       0
+(1 row)
+
+DROP TABLE trunc_stats_test, trunc_stats_test1, trunc_stats_test2, trunc_stats_test3, trunc_stats_test4;
+DROP TABLE prevstats;
+-----
+-- Test that last_seq_scan, last_idx_scan are correctly maintained
+--
+-- Perform test using a temporary table. That way autovacuum etc won't
+-- interfere. To be able to check that timestamps increase, we sleep for 100ms
+-- between tests, assuming that there aren't systems with a coarser timestamp
+-- granularity.
+-----
+BEGIN;
+CREATE TEMPORARY TABLE test_last_scan(idx_col int primary key, noidx_col int);
+INSERT INTO test_last_scan(idx_col, noidx_col) VALUES(1, 1);
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT last_seq_scan, last_idx_scan FROM pg_stat_all_tables WHERE relid = 'test_last_scan'::regclass;
+ last_seq_scan | last_idx_scan 
+---------------+---------------
+               | 
+(1 row)
+
+COMMIT;
+SELECT stats_reset IS NOT NULL AS has_stats_reset
+  FROM pg_stat_all_tables WHERE relid = 'test_last_scan'::regclass;
+ has_stats_reset 
+-----------------
+ f
+(1 row)
+
+SELECT pg_stat_reset_single_table_counters('test_last_scan'::regclass);
+ pg_stat_reset_single_table_counters 
+-------------------------------------
+ 
+(1 row)
+
+SELECT seq_scan, idx_scan, stats_reset IS NOT NULL AS has_stats_reset
+  FROM pg_stat_all_tables WHERE relid = 'test_last_scan'::regclass;
+ seq_scan | idx_scan | has_stats_reset 
+----------+----------+-----------------
+        0 |        0 | t
+(1 row)
+
+-- ensure we start out with exactly one index and sequential scan
+BEGIN;
+SET LOCAL enable_seqscan TO on;
+SET LOCAL enable_indexscan TO on;
+SET LOCAL enable_bitmapscan TO off;
+EXPLAIN (COSTS off) SELECT count(*) FROM test_last_scan WHERE noidx_col = 1;
+            QUERY PLAN            
+----------------------------------
+ Aggregate
+   ->  Seq Scan on test_last_scan
+         Filter: (noidx_col = 1)
+(3 rows)
+
+SELECT count(*) FROM test_last_scan WHERE noidx_col = 1;
+ count 
+-------
+     1
+(1 row)
+
+SET LOCAL enable_seqscan TO off;
+EXPLAIN (COSTS off) SELECT count(*) FROM test_last_scan WHERE idx_col = 1;
+                          QUERY PLAN                          
+--------------------------------------------------------------
+ Aggregate
+   ->  Index Scan using test_last_scan_pkey on test_last_scan
+         Index Cond: (idx_col = 1)
+(3 rows)
+
+SELECT count(*) FROM test_last_scan WHERE idx_col = 1;
+ count 
+-------
+     1
+(1 row)
+
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+COMMIT;
+-- fetch timestamps from before the next test
+SELECT last_seq_scan AS test_last_seq, last_idx_scan AS test_last_idx
+FROM pg_stat_all_tables WHERE relid = 'test_last_scan'::regclass \gset
+SELECT pg_sleep(0.1); -- assume a minimum timestamp granularity of 100ms
+ pg_sleep 
+----------
+ 
+(1 row)
+
+-- cause one sequential scan
+BEGIN;
+SET LOCAL enable_seqscan TO on;
+SET LOCAL enable_indexscan TO off;
+SET LOCAL enable_bitmapscan TO off;
+EXPLAIN (COSTS off) SELECT count(*) FROM test_last_scan WHERE noidx_col = 1;
+            QUERY PLAN            
+----------------------------------
+ Aggregate
+   ->  Seq Scan on test_last_scan
+         Filter: (noidx_col = 1)
+(3 rows)
+
+SELECT count(*) FROM test_last_scan WHERE noidx_col = 1;
+ count 
+-------
+     1
+(1 row)
+
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+COMMIT;
+-- check that just sequential scan stats were incremented
+SELECT seq_scan, :'test_last_seq' < last_seq_scan AS seq_ok, idx_scan, :'test_last_idx' = last_idx_scan AS idx_ok
+FROM pg_stat_all_tables WHERE relid = 'test_last_scan'::regclass;
+ seq_scan | seq_ok | idx_scan | idx_ok 
+----------+--------+----------+--------
+        2 | t      |        1 | t
+(1 row)
+
+-- fetch timestamps from before the next test
+SELECT last_seq_scan AS test_last_seq, last_idx_scan AS test_last_idx
+FROM pg_stat_all_tables WHERE relid = 'test_last_scan'::regclass \gset
+SELECT pg_sleep(0.1);
+ pg_sleep 
+----------
+ 
+(1 row)
+
+-- cause one index scan
+BEGIN;
+SET LOCAL enable_seqscan TO off;
+SET LOCAL enable_indexscan TO on;
+SET LOCAL enable_bitmapscan TO off;
+EXPLAIN (COSTS off) SELECT count(*) FROM test_last_scan WHERE idx_col = 1;
+                          QUERY PLAN                          
+--------------------------------------------------------------
+ Aggregate
+   ->  Index Scan using test_last_scan_pkey on test_last_scan
+         Index Cond: (idx_col = 1)
+(3 rows)
+
+SELECT count(*) FROM test_last_scan WHERE idx_col = 1;
+ count 
+-------
+     1
+(1 row)
+
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+COMMIT;
+-- check that just index scan stats were incremented
+SELECT seq_scan, :'test_last_seq' = last_seq_scan AS seq_ok, idx_scan, :'test_last_idx' < last_idx_scan AS idx_ok
+FROM pg_stat_all_tables WHERE relid = 'test_last_scan'::regclass;
+ seq_scan | seq_ok | idx_scan | idx_ok 
+----------+--------+----------+--------
+        2 | t      |        2 | t
+(1 row)
+
+-- fetch timestamps from before the next test
+SELECT last_seq_scan AS test_last_seq, last_idx_scan AS test_last_idx
+FROM pg_stat_all_tables WHERE relid = 'test_last_scan'::regclass \gset
+SELECT pg_sleep(0.1);
+ pg_sleep 
+----------
+ 
+(1 row)
+
+-- cause one bitmap index scan
+BEGIN;
+SET LOCAL enable_seqscan TO off;
+SET LOCAL enable_indexscan TO off;
+SET LOCAL enable_bitmapscan TO on;
+EXPLAIN (COSTS off) SELECT count(*) FROM test_last_scan WHERE idx_col = 1;
+                      QUERY PLAN                      
+------------------------------------------------------
+ Aggregate
+   ->  Bitmap Heap Scan on test_last_scan
+         Recheck Cond: (idx_col = 1)
+         ->  Bitmap Index Scan on test_last_scan_pkey
+               Index Cond: (idx_col = 1)
+(5 rows)
+
+SELECT count(*) FROM test_last_scan WHERE idx_col = 1;
+ count 
+-------
+     1
+(1 row)
+
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+COMMIT;
+-- check that just index scan stats were incremented
+SELECT seq_scan, :'test_last_seq' = last_seq_scan AS seq_ok, idx_scan, :'test_last_idx' < last_idx_scan AS idx_ok
+FROM pg_stat_all_tables WHERE relid = 'test_last_scan'::regclass;
+ seq_scan | seq_ok | idx_scan | idx_ok 
+----------+--------+----------+--------
+        2 | t      |        3 | t
+(1 row)
+
+-- check the stats in pg_stat_all_indexes
+SELECT idx_scan, :'test_last_idx' < last_idx_scan AS idx_ok,
+  stats_reset IS NOT NULL AS has_stats_reset
+  FROM pg_stat_all_indexes WHERE indexrelid = 'test_last_scan_pkey'::regclass;
+ idx_scan | idx_ok | has_stats_reset 
+----------+--------+-----------------
+        3 | t      | f
+(1 row)
+
+-- check that the stats in pg_stat_all_indexes are reset
+SELECT pg_stat_reset_single_table_counters('test_last_scan_pkey'::regclass);
+ pg_stat_reset_single_table_counters 
+-------------------------------------
+ 
+(1 row)
+
+SELECT idx_scan, stats_reset IS NOT NULL AS has_stats_reset
+  FROM pg_stat_all_indexes WHERE indexrelid = 'test_last_scan_pkey'::regclass;
+ idx_scan | has_stats_reset 
+----------+-----------------
+        0 | t
+(1 row)
+
+-----
+-- Test reset of some stats for shared table
+-----
+-- This updates the comment of the database currently in use in
+-- pg_shdescription with a fake value, then sets it back to its
+-- original value.
+SELECT shobj_description(d.oid, 'pg_database') as description_before
+  FROM pg_database d WHERE datname = current_database() \gset
+-- force some stats in pg_shdescription.
+BEGIN;
+SELECT current_database() as datname \gset
+COMMENT ON DATABASE :"datname" IS 'This is a test comment';
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+COMMIT;
+-- check that the stats are reset.
+SELECT (n_tup_ins + n_tup_upd) > 0 AS has_data FROM pg_stat_all_tables
+  WHERE relid = 'pg_shdescription'::regclass;
+ has_data 
+----------
+ t
+(1 row)
+
+SELECT pg_stat_reset_single_table_counters('pg_shdescription'::regclass);
+ pg_stat_reset_single_table_counters 
+-------------------------------------
+ 
+(1 row)
+
+SELECT (n_tup_ins + n_tup_upd) > 0 AS has_data FROM pg_stat_all_tables
+  WHERE relid = 'pg_shdescription'::regclass;
+ has_data 
+----------
+ f
+(1 row)
+
+-- set back comment
+\if :{?description_before}
+  COMMENT ON DATABASE :"datname" IS :'description_before';
+\else
+  COMMENT ON DATABASE :"datname" IS NULL;
+\endif
+-----
+-- Test that various stats views are being properly populated
+-----
+-- Test that sessions is incremented when a new session is started in pg_stat_database
+SELECT sessions AS db_stat_sessions FROM pg_stat_database WHERE datname = (SELECT current_database()) \gset
+\c
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT sessions > :db_stat_sessions FROM pg_stat_database WHERE datname = (SELECT current_database());
+ ?column? 
+----------
+ t
+(1 row)
+
+-- Test pg_stat_checkpointer checkpointer-related stats, together with pg_stat_wal
+SELECT num_requested AS rqst_ckpts_before FROM pg_stat_checkpointer \gset
+-- Test pg_stat_wal
+SELECT wal_bytes AS wal_bytes_before FROM pg_stat_wal \gset
+-- Test pg_stat_get_backend_wal()
+SELECT wal_bytes AS backend_wal_bytes_before from pg_stat_get_backend_wal(pg_backend_pid()) \gset
+-- Make a temp table so our temp schema exists
+CREATE TEMP TABLE test_stats_temp AS SELECT 17;
+DROP TABLE test_stats_temp;
+-- Checkpoint twice: The checkpointer reports stats after reporting completion
+-- of the checkpoint. But after a second checkpoint we'll see at least the
+-- results of the first.
+--
+-- While at it, test checkpoint options.  Note that we don't test MODE SPREAD
+-- because it would prolong the test.
+CHECKPOINT (WRONG);
+ERROR:  unrecognized CHECKPOINT option "wrong"
+LINE 1: CHECKPOINT (WRONG);
+                    ^
+CHECKPOINT (MODE WRONG);
+ERROR:  unrecognized MODE option "wrong"
+LINE 1: CHECKPOINT (MODE WRONG);
+                    ^
+CHECKPOINT (MODE FAST, FLUSH_UNLOGGED FALSE);
+CHECKPOINT (FLUSH_UNLOGGED);
+SELECT num_requested > :rqst_ckpts_before FROM pg_stat_checkpointer;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT wal_bytes > :wal_bytes_before FROM pg_stat_wal;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid());
+ ?column? 
+----------
+ t
+(1 row)
+
+-- Test pg_stat_get_backend_idset() and some allied functions.
+-- In particular, verify that their notion of backend ID matches
+-- our temp schema index.
+SELECT (current_schemas(true))[1] = ('pg_temp_' || beid::text) AS match
+FROM pg_stat_get_backend_idset() beid
+WHERE pg_stat_get_backend_pid(beid) = pg_backend_pid();
+ match 
+-------
+ t
+(1 row)
+
+-----
+-- Test that resetting stats works for reset timestamp
+-----
+-- Test that reset_slru with a specified SLRU works.
+SELECT stats_reset AS slru_commit_ts_reset_ts FROM pg_stat_slru WHERE name = 'commit_timestamp' \gset
+SELECT stats_reset AS slru_notify_reset_ts FROM pg_stat_slru WHERE name = 'notify' \gset
+SELECT pg_stat_reset_slru('commit_timestamp');
+ pg_stat_reset_slru 
+--------------------
+ 
+(1 row)
+
+SELECT stats_reset > :'slru_commit_ts_reset_ts'::timestamptz FROM pg_stat_slru WHERE name = 'commit_timestamp';
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT stats_reset AS slru_commit_ts_reset_ts FROM pg_stat_slru WHERE name = 'commit_timestamp' \gset
+-- Test that multiple SLRUs are reset when no specific SLRU provided to reset function
+SELECT pg_stat_reset_slru();
+ pg_stat_reset_slru 
+--------------------
+ 
+(1 row)
+
+SELECT stats_reset > :'slru_commit_ts_reset_ts'::timestamptz FROM pg_stat_slru WHERE name = 'commit_timestamp';
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT stats_reset > :'slru_notify_reset_ts'::timestamptz FROM pg_stat_slru WHERE name = 'notify';
+ ?column? 
+----------
+ t
+(1 row)
+
+-- Test that reset_shared with archiver specified as the stats type works
+SELECT stats_reset AS archiver_reset_ts FROM pg_stat_archiver \gset
+SELECT pg_stat_reset_shared('archiver');
+ pg_stat_reset_shared 
+----------------------
+ 
+(1 row)
+
+SELECT stats_reset > :'archiver_reset_ts'::timestamptz FROM pg_stat_archiver;
+ ?column? 
+----------
+ t
+(1 row)
+
+-- Test that reset_shared with bgwriter specified as the stats type works
+SELECT stats_reset AS bgwriter_reset_ts FROM pg_stat_bgwriter \gset
+SELECT pg_stat_reset_shared('bgwriter');
+ pg_stat_reset_shared 
+----------------------
+ 
+(1 row)
+
+SELECT stats_reset > :'bgwriter_reset_ts'::timestamptz FROM pg_stat_bgwriter;
+ ?column? 
+----------
+ t
+(1 row)
+
+-- Test that reset_shared with checkpointer specified as the stats type works
+SELECT stats_reset AS checkpointer_reset_ts FROM pg_stat_checkpointer \gset
+SELECT pg_stat_reset_shared('checkpointer');
+ pg_stat_reset_shared 
+----------------------
+ 
+(1 row)
+
+SELECT stats_reset > :'checkpointer_reset_ts'::timestamptz FROM pg_stat_checkpointer;
+ ?column? 
+----------
+ t
+(1 row)
+
+-- Test that reset_shared with recovery_prefetch specified as the stats type works
+SELECT stats_reset AS recovery_prefetch_reset_ts FROM pg_stat_recovery_prefetch \gset
+SELECT pg_stat_reset_shared('recovery_prefetch');
+ pg_stat_reset_shared 
+----------------------
+ 
+(1 row)
+
+SELECT stats_reset > :'recovery_prefetch_reset_ts'::timestamptz FROM pg_stat_recovery_prefetch;
+ ?column? 
+----------
+ t
+(1 row)
+
+-- Test that reset_shared with slru specified as the stats type works
+SELECT max(stats_reset) AS slru_reset_ts FROM pg_stat_slru \gset
+SELECT pg_stat_reset_shared('slru');
+ pg_stat_reset_shared 
+----------------------
+ 
+(1 row)
+
+SELECT max(stats_reset) > :'slru_reset_ts'::timestamptz FROM pg_stat_slru;
+ ?column? 
+----------
+ t
+(1 row)
+
+-- Test that reset_shared with wal specified as the stats type works
+SELECT stats_reset AS wal_reset_ts FROM pg_stat_wal \gset
+SELECT pg_stat_reset_shared('wal');
+ pg_stat_reset_shared 
+----------------------
+ 
+(1 row)
+
+SELECT stats_reset > :'wal_reset_ts'::timestamptz FROM pg_stat_wal;
+ ?column? 
+----------
+ t
+(1 row)
+
+-- Test error case for reset_shared with unknown stats type
+SELECT pg_stat_reset_shared('unknown');
+ERROR:  unrecognized reset target: "unknown"
+HINT:  Target must be "archiver", "bgwriter", "checkpointer", "io", "recovery_prefetch", "slru", or "wal".
+-- Test that reset works for pg_stat_database
+-- Since pg_stat_database stats_reset starts out as NULL, reset it once first so we have something to compare it to
+SELECT pg_stat_reset();
+ pg_stat_reset 
+---------------
+ 
+(1 row)
+
+SELECT stats_reset AS db_reset_ts FROM pg_stat_database WHERE datname = (SELECT current_database()) \gset
+SELECT pg_stat_reset();
+ pg_stat_reset 
+---------------
+ 
+(1 row)
+
+SELECT stats_reset > :'db_reset_ts'::timestamptz FROM pg_stat_database WHERE datname = (SELECT current_database());
+ ?column? 
+----------
+ t
+(1 row)
+
+----
+-- pg_stat_get_snapshot_timestamp behavior
+----
+BEGIN;
+SET LOCAL stats_fetch_consistency = snapshot;
+-- no snapshot yet, return NULL
+SELECT pg_stat_get_snapshot_timestamp();
+ pg_stat_get_snapshot_timestamp 
+--------------------------------
+ 
+(1 row)
+
+-- any attempt at accessing stats will build snapshot
+SELECT pg_stat_get_function_calls(0);
+ pg_stat_get_function_calls 
+----------------------------
+                           
+(1 row)
+
+SELECT pg_stat_get_snapshot_timestamp() >= NOW();
+ ?column? 
+----------
+ t
+(1 row)
+
+-- shows NULL again after clearing
+SELECT pg_stat_clear_snapshot();
+ pg_stat_clear_snapshot 
+------------------------
+ 
+(1 row)
+
+SELECT pg_stat_get_snapshot_timestamp();
+ pg_stat_get_snapshot_timestamp 
+--------------------------------
+ 
+(1 row)
+
+COMMIT;
+----
+-- Changing stats_fetch_consistency in a transaction.
+----
+BEGIN;
+-- Stats filled under the cache mode
+SET LOCAL stats_fetch_consistency = cache;
+SELECT pg_stat_get_function_calls(0);
+ pg_stat_get_function_calls 
+----------------------------
+                           
+(1 row)
+
+SELECT pg_stat_get_snapshot_timestamp() IS NOT NULL AS snapshot_ok;
+ snapshot_ok 
+-------------
+ f
+(1 row)
+
+-- Success in accessing pre-existing snapshot data.
+SET LOCAL stats_fetch_consistency = snapshot;
+SELECT pg_stat_get_snapshot_timestamp() IS NOT NULL AS snapshot_ok;
+ snapshot_ok 
+-------------
+ f
+(1 row)
+
+SELECT pg_stat_get_function_calls(0);
+ pg_stat_get_function_calls 
+----------------------------
+                           
+(1 row)
+
+SELECT pg_stat_get_snapshot_timestamp() IS NOT NULL AS snapshot_ok;
+ snapshot_ok 
+-------------
+ t
+(1 row)
+
+-- Snapshot cleared.
+SET LOCAL stats_fetch_consistency = none;
+SELECT pg_stat_get_snapshot_timestamp() IS NOT NULL AS snapshot_ok;
+ snapshot_ok 
+-------------
+ f
+(1 row)
+
+SELECT pg_stat_get_function_calls(0);
+ pg_stat_get_function_calls 
+----------------------------
+                           
+(1 row)
+
+SELECT pg_stat_get_snapshot_timestamp() IS NOT NULL AS snapshot_ok;
+ snapshot_ok 
+-------------
+ f
+(1 row)
+
+ROLLBACK;
+----
+-- pg_stat_have_stats behavior
+----
+-- fixed-numbered stats exist
+SELECT pg_stat_have_stats('bgwriter', 0, 0);
+ pg_stat_have_stats 
+--------------------
+ t
+(1 row)
+
+-- unknown stats kinds error out
+SELECT pg_stat_have_stats('zaphod', 0, 0);
+ERROR:  invalid statistics kind: "zaphod"
+-- db stats have objid 0
+SELECT pg_stat_have_stats('database', :dboid, 1);
+ pg_stat_have_stats 
+--------------------
+ f
+(1 row)
+
+SELECT pg_stat_have_stats('database', :dboid, 0);
+ pg_stat_have_stats 
+--------------------
+ t
+(1 row)
+
+-- pg_stat_have_stats returns true for committed index creation
+CREATE table stats_test_tab1 as select generate_series(1,10) a;
+CREATE index stats_test_idx1 on stats_test_tab1(a);
+SELECT 'stats_test_idx1'::regclass::oid AS stats_test_idx1_oid \gset
+SET enable_seqscan TO off;
+select a from stats_test_tab1 where a = 3;
+ a 
+---
+ 3
+(1 row)
+
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+ pg_stat_have_stats 
+--------------------
+ t
+(1 row)
+
+-- pg_stat_have_stats returns false for dropped index with stats
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+ pg_stat_have_stats 
+--------------------
+ t
+(1 row)
+
+DROP index stats_test_idx1;
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+ pg_stat_have_stats 
+--------------------
+ f
+(1 row)
+
+-- pg_stat_have_stats returns false for rolled back index creation
+BEGIN;
+CREATE index stats_test_idx1 on stats_test_tab1(a);
+SELECT 'stats_test_idx1'::regclass::oid AS stats_test_idx1_oid \gset
+select a from stats_test_tab1 where a = 3;
+ a 
+---
+ 3
+(1 row)
+
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+ pg_stat_have_stats 
+--------------------
+ t
+(1 row)
+
+ROLLBACK;
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+ pg_stat_have_stats 
+--------------------
+ f
+(1 row)
+
+-- pg_stat_have_stats returns true for reindex CONCURRENTLY
+CREATE index stats_test_idx1 on stats_test_tab1(a);
+SELECT 'stats_test_idx1'::regclass::oid AS stats_test_idx1_oid \gset
+select a from stats_test_tab1 where a = 3;
+ a 
+---
+ 3
+(1 row)
+
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+ pg_stat_have_stats 
+--------------------
+ t
+(1 row)
+
+REINDEX index CONCURRENTLY stats_test_idx1;
+-- false for previous oid
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+ pg_stat_have_stats 
+--------------------
+ f
+(1 row)
+
+-- true for new oid
+SELECT 'stats_test_idx1'::regclass::oid AS stats_test_idx1_oid \gset
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+ pg_stat_have_stats 
+--------------------
+ t
+(1 row)
+
+-- pg_stat_have_stats returns true for a rolled back drop index with stats
+BEGIN;
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+ pg_stat_have_stats 
+--------------------
+ t
+(1 row)
+
+DROP index stats_test_idx1;
+ROLLBACK;
+SELECT pg_stat_have_stats('relation', :dboid, :stats_test_idx1_oid);
+ pg_stat_have_stats 
+--------------------
+ t
+(1 row)
+
+-- put enable_seqscan back to on
+SET enable_seqscan TO on;
+-- ensure that stats accessors handle NULL input correctly
+SELECT pg_stat_get_replication_slot(NULL);
+ pg_stat_get_replication_slot 
+------------------------------
+ 
+(1 row)
+
+SELECT pg_stat_get_subscription_stats(NULL);
+ pg_stat_get_subscription_stats 
+--------------------------------
+ 
+(1 row)
+
+-- Test that the following operations are tracked in pg_stat_io and in
+-- backend stats:
+-- - reads of target blocks into shared buffers
+-- - writes of shared buffers to permanent storage
+-- - extends of relations using shared buffers
+-- - fsyncs done to ensure the durability of data dirtying shared buffers
+-- - shared buffer hits
+-- - WAL writes and fsyncs in IOContext IOCONTEXT_NORMAL
+-- There is no test for blocks evicted from shared buffers, because we cannot
+-- be sure of the state of shared buffers at the point the test is run.
+-- Create a regular table and insert some data to generate IOCONTEXT_NORMAL
+-- extends.
+SELECT pid AS checkpointer_pid FROM pg_stat_activity
+  WHERE backend_type = 'checkpointer' \gset
+SELECT sum(extends) AS io_sum_shared_before_extends
+  FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT sum(extends) AS my_io_sum_shared_before_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_io
+  WHERE object = 'relation' \gset io_sum_shared_before_
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE object = 'relation' \gset my_io_sum_shared_before_
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_io
+  WHERE context = 'normal' AND object = 'wal' \gset io_sum_wal_normal_before_
+CREATE TABLE test_io_shared(a int);
+INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT sum(extends) AS io_sum_shared_after_extends
+  FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT :io_sum_shared_after_extends > :io_sum_shared_before_extends;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT sum(extends) AS my_io_sum_shared_after_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
+SELECT :my_io_sum_shared_after_extends > :my_io_sum_shared_before_extends;
+ ?column? 
+----------
+ t
+(1 row)
+
+-- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
+-- and fsyncs in the global stats (usually not for the backend).
+-- See comment above for rationale for two explicit CHECKPOINTs.
+CHECKPOINT;
+CHECKPOINT;
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_io
+  WHERE object = 'relation' \gset io_sum_shared_after_
+SELECT :io_sum_shared_after_writes > :io_sum_shared_before_writes;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT current_setting('fsync') = 'off'
+  OR :io_sum_shared_after_fsyncs > :io_sum_shared_before_fsyncs;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE object = 'relation' \gset my_io_sum_shared_after_
+SELECT :my_io_sum_shared_after_writes >= :my_io_sum_shared_before_writes;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT current_setting('fsync') = 'off'
+  OR :my_io_sum_shared_after_fsyncs >= :my_io_sum_shared_before_fsyncs;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_io
+  WHERE context = 'normal' AND object = 'wal' \gset io_sum_wal_normal_after_
+SELECT current_setting('synchronous_commit') = 'on';
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT :io_sum_wal_normal_after_writes > :io_sum_wal_normal_before_writes;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT current_setting('fsync') = 'off'
+  OR current_setting('wal_sync_method') IN ('open_sync', 'open_datasync')
+  OR :io_sum_wal_normal_after_fsyncs > :io_sum_wal_normal_before_fsyncs;
+ ?column? 
+----------
+ t
+(1 row)
+
+-- Change the tablespace so that the table is rewritten directly, then SELECT
+-- from it to cause it to be read back into shared buffers.
+SELECT sum(reads) AS io_sum_shared_before_reads
+  FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+-- Do this in a transaction to prevent spurious failures due to concurrent accesses to our newly
+-- rewritten table, e.g. by autovacuum.
+BEGIN;
+ALTER TABLE test_io_shared SET TABLESPACE regress_tblspace;
+-- SELECT from the table so that the data is read into shared buffers and
+-- context 'normal', object 'relation' reads are counted.
+SELECT COUNT(*) FROM test_io_shared;
+ count 
+-------
+   100
+(1 row)
+
+COMMIT;
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT sum(reads) AS io_sum_shared_after_reads
+  FROM pg_stat_io WHERE context = 'normal' AND object = 'relation'  \gset
+SELECT :io_sum_shared_after_reads > :io_sum_shared_before_reads;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT sum(hits) AS io_sum_shared_before_hits
+  FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+-- Select from the table again to count hits.
+-- Ensure we generate hits by forcing a nested loop self-join with no
+-- materialize node. The outer side's buffer will stay pinned, preventing its
+-- eviction, while we loop through the inner side and generate hits.
+BEGIN;
+SET LOCAL enable_nestloop TO on; SET LOCAL enable_mergejoin TO off;
+SET LOCAL enable_hashjoin TO off; SET LOCAL enable_material TO off;
+-- ensure plan stays as we expect it to
+EXPLAIN (COSTS OFF) SELECT COUNT(*) FROM test_io_shared t1 INNER JOIN test_io_shared t2 USING (a);
+                QUERY PLAN                 
+-------------------------------------------
+ Aggregate
+   ->  Nested Loop
+         Join Filter: (t1.a = t2.a)
+         ->  Seq Scan on test_io_shared t1
+         ->  Seq Scan on test_io_shared t2
+(5 rows)
+
+SELECT COUNT(*) FROM test_io_shared t1 INNER JOIN test_io_shared t2 USING (a);
+ count 
+-------
+   100
+(1 row)
+
+COMMIT;
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT sum(hits) AS io_sum_shared_after_hits
+  FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT :io_sum_shared_after_hits > :io_sum_shared_before_hits;
+ ?column? 
+----------
+ t
+(1 row)
+
+DROP TABLE test_io_shared;
+-- Test that the following IOCONTEXT_LOCAL IOOps are tracked in pg_stat_io:
+-- - eviction of local buffers in order to reuse them
+-- - reads of temporary table blocks into local buffers
+-- - writes of local buffers to permanent storage
+-- - extends of temporary tables
+-- Set temp_buffers to its minimum so that we can trigger writes with fewer
+-- inserted tuples. Do so in a new session in case temporary tables have been
+-- accessed by previous tests in this session.
+\c
+SET temp_buffers TO 100;
+CREATE TEMPORARY TABLE test_io_local(a int, b TEXT);
+SELECT sum(extends) AS extends, sum(evictions) AS evictions, sum(writes) AS writes
+  FROM pg_stat_io
+  WHERE context = 'normal' AND object = 'temp relation' \gset io_sum_local_before_
+-- Insert tuples into the temporary table, generating extends in the stats.
+-- Insert enough values that we need to reuse and write out dirty local
+-- buffers, generating evictions and writes.
+INSERT INTO test_io_local SELECT generate_series(1, 5000) as id, repeat('a', 200);
+-- Ensure the table is large enough to exceed our temp_buffers setting.
+SELECT pg_relation_size('test_io_local') / current_setting('block_size')::int8 > 100;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT sum(reads) AS io_sum_local_before_reads
+  FROM pg_stat_io WHERE context = 'normal' AND object = 'temp relation' \gset
+-- Read in evicted buffers, generating reads.
+SELECT COUNT(*) FROM test_io_local;
+ count 
+-------
+  5000
+(1 row)
+
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT sum(evictions) AS evictions,
+       sum(reads) AS reads,
+       sum(writes) AS writes,
+       sum(extends) AS extends
+  FROM pg_stat_io
+  WHERE context = 'normal' AND object = 'temp relation'  \gset io_sum_local_after_
+SELECT :io_sum_local_after_evictions > :io_sum_local_before_evictions,
+       :io_sum_local_after_reads > :io_sum_local_before_reads,
+       :io_sum_local_after_writes > :io_sum_local_before_writes,
+       :io_sum_local_after_extends > :io_sum_local_before_extends;
+ ?column? | ?column? | ?column? | ?column? 
+----------+----------+----------+----------
+ t        | t        | t        | t
+(1 row)
+
+-- Change the tablespaces so that the temporary table is rewritten to other
+-- local buffers, exercising a different codepath than standard local buffer
+-- writes.
+ALTER TABLE test_io_local SET TABLESPACE regress_tblspace;
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT sum(writes) AS io_sum_local_new_tblspc_writes
+  FROM pg_stat_io WHERE context = 'normal' AND object = 'temp relation'  \gset
+SELECT :io_sum_local_new_tblspc_writes > :io_sum_local_after_writes;
+ ?column? 
+----------
+ t
+(1 row)
+
+RESET temp_buffers;
+-- Test that reuse of strategy buffers and reads of blocks into these reused
+-- buffers while VACUUMing are tracked in pg_stat_io. If there is sufficient
+-- demand for shared buffers from concurrent queries, some buffers may be
+-- pinned by other backends before they can be reused. In such cases, the
+-- backend will evict a buffer from outside the ring and add it to the
+-- ring. This is considered an eviction and not a reuse.
+-- Set wal_skip_threshold smaller than the expected size of
+-- test_io_vac_strategy so that, even if wal_level is minimal, VACUUM FULL will
+-- fsync the newly rewritten test_io_vac_strategy instead of writing it to WAL.
+-- Writing it to WAL will result in the newly written relation pages being in
+-- shared buffers -- preventing us from testing BAS_VACUUM BufferAccessStrategy
+-- reads.
+SET wal_skip_threshold = '1 kB';
+SELECT sum(reuses) AS reuses, sum(reads) AS reads, sum(evictions) AS evictions
+  FROM pg_stat_io WHERE context = 'vacuum' \gset io_sum_vac_strategy_before_
+CREATE TABLE test_io_vac_strategy(a int, b int) WITH (autovacuum_enabled = 'false');
+INSERT INTO test_io_vac_strategy SELECT i, i from generate_series(1, 4500)i;
+-- Ensure that the next VACUUM will need to perform IO by rewriting the table
+-- first with VACUUM (FULL).
+VACUUM (FULL) test_io_vac_strategy;
+-- Use the minimum BUFFER_USAGE_LIMIT to cause reuses or evictions with the
+-- smallest table possible.
+VACUUM (PARALLEL 0, BUFFER_USAGE_LIMIT 128) test_io_vac_strategy;
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT sum(reuses) AS reuses, sum(reads) AS reads, sum(evictions) AS evictions
+  FROM pg_stat_io WHERE context = 'vacuum' \gset io_sum_vac_strategy_after_
+SELECT :io_sum_vac_strategy_after_reads > :io_sum_vac_strategy_before_reads;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT (:io_sum_vac_strategy_after_reuses + :io_sum_vac_strategy_after_evictions) >
+  (:io_sum_vac_strategy_before_reuses + :io_sum_vac_strategy_before_evictions);
+ ?column? 
+----------
+ t
+(1 row)
+
+RESET wal_skip_threshold;
+-- Test that extends done by a CTAS, which uses a BAS_BULKWRITE
+-- BufferAccessStrategy, are tracked in pg_stat_io.
+SELECT sum(extends) AS io_sum_bulkwrite_strategy_extends_before
+  FROM pg_stat_io WHERE context = 'bulkwrite' \gset
+CREATE TABLE test_io_bulkwrite_strategy AS SELECT i FROM generate_series(1,100)i;
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT sum(extends) AS io_sum_bulkwrite_strategy_extends_after
+  FROM pg_stat_io WHERE context = 'bulkwrite' \gset
+SELECT :io_sum_bulkwrite_strategy_extends_after > :io_sum_bulkwrite_strategy_extends_before;
+ ?column? 
+----------
+ t
+(1 row)
+
+-- Test IO stats reset
+SELECT pg_stat_have_stats('io', 0, 0);
+ pg_stat_have_stats 
+--------------------
+ t
+(1 row)
+
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_pre_reset
+  FROM pg_stat_io \gset
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_pre_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
+SELECT pg_stat_reset_shared('io');
+ pg_stat_reset_shared 
+----------------------
+ 
+(1 row)
+
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_post_reset
+  FROM pg_stat_io \gset
+SELECT :io_stats_post_reset < :io_stats_pre_reset;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
+-- pg_stat_reset_shared() did not reset backend IO stats
+SELECT :my_io_stats_pre_reset <= :my_io_stats_post_reset;
+ ?column? 
+----------
+ t
+(1 row)
+
+-- but pg_stat_reset_backend_stats() does
+SELECT pg_stat_reset_backend_stats(pg_backend_pid());
+ pg_stat_reset_backend_stats 
+-----------------------------
+ 
+(1 row)
+
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_backend_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
+SELECT :my_io_stats_pre_reset > :my_io_stats_post_backend_reset;
+ ?column? 
+----------
+ t
+(1 row)
+
+-- Check invalid input for pg_stat_get_backend_io()
+SELECT pg_stat_get_backend_io(NULL);
+ pg_stat_get_backend_io 
+------------------------
+(0 rows)
+
+SELECT pg_stat_get_backend_io(0);
+ pg_stat_get_backend_io 
+------------------------
+(0 rows)
+
+-- Auxiliary processes return no data.
+SELECT pg_stat_get_backend_io(:checkpointer_pid);
+ pg_stat_get_backend_io 
+------------------------
+(0 rows)
+
+-- test BRIN index doesn't block HOT update
+CREATE TABLE brin_hot (
+  id  integer PRIMARY KEY,
+  val integer NOT NULL
+) WITH (autovacuum_enabled = off, fillfactor = 70);
+INSERT INTO brin_hot SELECT *, 0 FROM generate_series(1, 235);
+CREATE INDEX val_brin ON brin_hot using brin(val);
+CREATE FUNCTION wait_for_hot_stats() RETURNS void AS $$
+DECLARE
+  start_time timestamptz := clock_timestamp();
+  updated bool;
+BEGIN
+  -- we don't want to wait forever; loop will exit after 30 seconds
+  FOR i IN 1 .. 300 LOOP
+    SELECT (pg_stat_get_tuples_hot_updated('brin_hot'::regclass::oid) > 0) INTO updated;
+    EXIT WHEN updated;
+
+    -- wait a little
+    PERFORM pg_sleep_for('100 milliseconds');
+    -- reset stats snapshot so we can test again
+    PERFORM pg_stat_clear_snapshot();
+  END LOOP;
+  -- report time waited in postmaster log (where it won't change test output)
+  RAISE log 'wait_for_hot_stats delayed % seconds',
+    EXTRACT(epoch FROM clock_timestamp() - start_time);
+END
+$$ LANGUAGE plpgsql;
+UPDATE brin_hot SET val = -3 WHERE id = 42;
+-- We can't just call wait_for_hot_stats() at this point, because we only
+-- transmit stats when the session goes idle, and we probably didn't
+-- transmit the last couple of counts yet thanks to the rate-limiting logic
+-- in pgstat_report_stat().  But instead of waiting for the rate limiter's
+-- timeout to elapse, let's just start a new session.  The old one will
+-- then send its stats before dying.
+\c -
+SELECT wait_for_hot_stats();
+ wait_for_hot_stats 
+--------------------
+ 
+(1 row)
+
+SELECT pg_stat_get_tuples_hot_updated('brin_hot'::regclass::oid);
+ pg_stat_get_tuples_hot_updated 
+--------------------------------
+                              1
+(1 row)
+
+DROP TABLE brin_hot;
+DROP FUNCTION wait_for_hot_stats();
+-- Test handling of index predicates - updating attributes in predicates
+-- should not block HOT when summarizing indexes are involved. We update
+-- a row that was not indexed due to the index predicate, and becomes
+-- indexable - the HOT-updated tuple is forwarded to the BRIN index.
+CREATE TABLE brin_hot_2 (a int, b int);
+INSERT INTO brin_hot_2 VALUES (1, 100);
+CREATE INDEX ON brin_hot_2 USING brin (b) WHERE a = 2;
+UPDATE brin_hot_2 SET a = 2;
+EXPLAIN (COSTS OFF) SELECT * FROM brin_hot_2 WHERE a = 2 AND b = 100;
+            QUERY PLAN             
+-----------------------------------
+ Seq Scan on brin_hot_2
+   Filter: ((a = 2) AND (b = 100))
+(2 rows)
+
+SELECT COUNT(*) FROM brin_hot_2 WHERE a = 2 AND b = 100;
+ count 
+-------
+     1
+(1 row)
+
+SET enable_seqscan = off;
+EXPLAIN (COSTS OFF) SELECT * FROM brin_hot_2 WHERE a = 2 AND b = 100;
+                 QUERY PLAN                  
+---------------------------------------------
+ Bitmap Heap Scan on brin_hot_2
+   Recheck Cond: ((b = 100) AND (a = 2))
+   ->  Bitmap Index Scan on brin_hot_2_b_idx
+         Index Cond: (b = 100)
+(4 rows)
+
+SELECT COUNT(*) FROM brin_hot_2 WHERE a = 2 AND b = 100;
+ count 
+-------
+     1
+(1 row)
+
+DROP TABLE brin_hot_2;
+-- Test that updates to indexed columns are still propagated to the
+-- BRIN column.
+-- https://postgr.es/m/05ebcb44-f383-86e3-4f31-0a97a55634cf@enterprisedb.com
+CREATE TABLE brin_hot_3 (a int, filler text) WITH (fillfactor = 10);
+INSERT INTO brin_hot_3 SELECT 1, repeat(' ', 500) FROM generate_series(1, 20);
+CREATE INDEX ON brin_hot_3 USING brin (a) WITH (pages_per_range = 1);
+UPDATE brin_hot_3 SET a = 2;
+EXPLAIN (COSTS OFF) SELECT * FROM brin_hot_3 WHERE a = 2;
+                 QUERY PLAN                  
+---------------------------------------------
+ Bitmap Heap Scan on brin_hot_3
+   Recheck Cond: (a = 2)
+   ->  Bitmap Index Scan on brin_hot_3_a_idx
+         Index Cond: (a = 2)
+(4 rows)
+
+SELECT COUNT(*) FROM brin_hot_3 WHERE a = 2;
+ count 
+-------
+    20
+(1 row)
+
+DROP TABLE brin_hot_3;
+SET enable_seqscan = on;
+-- Test that estimation of relation size works with tuples wider than the
+-- relation fillfactor. We create a table with wide inline attributes and
+-- low fillfactor, insert rows and then see how many rows EXPLAIN shows
+-- before running analyze. We disable autovacuum so that it does not
+-- interfere with the test.
+CREATE TABLE table_fillfactor (
+  n char(1000)
+) with (fillfactor=10, autovacuum_enabled=off);
+INSERT INTO table_fillfactor
+SELECT 'x' FROM generate_series(1,1000);
+SELECT * FROM check_estimated_rows('SELECT * FROM table_fillfactor');
+ estimated | actual 
+-----------+--------
+      1000 |   1000
+(1 row)
+
+DROP TABLE table_fillfactor;
+-- Test some rewrites
+CREATE TABLE test_2pc_timestamp (a int) WITH (autovacuum_enabled = false);
+VACUUM ANALYZE test_2pc_timestamp;
+SELECT last_analyze AS last_vacuum_analyze FROM pg_stat_all_tables WHERE relname = 'test_2pc_timestamp' \gset
+BEGIN;
+ALTER TABLE test_2pc_timestamp ALTER COLUMN a TYPE int;
+PREPARE TRANSACTION 'test';
+ERROR:  prepared transactions are disabled
+HINT:  Set "max_prepared_transactions" to a nonzero value.
+COMMIT PREPARED 'test';
+ERROR:  prepared transaction with identifier "test" does not exist
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT last_analyze = :'last_vacuum_analyze'::timestamptz FROM pg_stat_all_tables WHERE relname = 'test_2pc_timestamp';
+ ?column? 
+----------
+ t
+(1 row)
+
+DROP TABLE test_2pc_timestamp;
+CREATE TABLE test_2pc_rewrite_alone (a int);
+INSERT INTO test_2pc_rewrite_alone VALUES (1);
+BEGIN;
+ALTER TABLE test_2pc_rewrite_alone ALTER COLUMN a TYPE bigint;
+PREPARE TRANSACTION 'test';
+ERROR:  prepared transactions are disabled
+HINT:  Set "max_prepared_transactions" to a nonzero value.
+COMMIT PREPARED 'test';
+ERROR:  prepared transaction with identifier "test" does not exist
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT n_tup_ins, n_live_tup, n_dead_tup FROM pg_stat_all_tables WHERE relname = 'test_2pc_rewrite_alone';
+ n_tup_ins | n_live_tup | n_dead_tup 
+-----------+------------+------------
+         1 |          1 |          0
+(1 row)
+
+DROP TABLE test_2pc_rewrite_alone;
+CREATE TABLE test_2pc (a int);
+INSERT INTO test_2pc VALUES (1);
+BEGIN;
+INSERT INTO test_2pc VALUES (1);
+INSERT INTO test_2pc VALUES (2);
+INSERT INTO test_2pc VALUES (3);
+ALTER TABLE test_2pc ALTER COLUMN a TYPE bigint;
+PREPARE TRANSACTION 'test';
+ERROR:  prepared transactions are disabled
+HINT:  Set "max_prepared_transactions" to a nonzero value.
+COMMIT PREPARED 'test';
+ERROR:  prepared transaction with identifier "test" does not exist
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT n_tup_ins, n_live_tup, n_dead_tup FROM pg_stat_all_tables WHERE relname = 'test_2pc';
+ n_tup_ins | n_live_tup | n_dead_tup 
+-----------+------------+------------
+         4 |          1 |          3
+(1 row)
+
+DROP TABLE test_2pc;
+CREATE TABLE test_2pc_multi (a int);
+INSERT INTO test_2pc_multi VALUES (1);
+BEGIN;
+INSERT INTO test_2pc_multi VALUES (1);
+INSERT INTO test_2pc_multi VALUES (2);
+ALTER TABLE test_2pc_multi ALTER COLUMN a TYPE bigint;
+INSERT INTO test_2pc_multi VALUES (3);
+INSERT INTO test_2pc_multi VALUES (4);
+ALTER TABLE test_2pc_multi ALTER COLUMN a TYPE int;
+INSERT INTO test_2pc_multi VALUES (5);
+PREPARE TRANSACTION 'test';
+ERROR:  prepared transactions are disabled
+HINT:  Set "max_prepared_transactions" to a nonzero value.
+COMMIT PREPARED 'test';
+ERROR:  prepared transaction with identifier "test" does not exist
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT n_tup_ins, n_live_tup, n_dead_tup FROM pg_stat_all_tables WHERE relname = 'test_2pc_multi';
+ n_tup_ins | n_live_tup | n_dead_tup 
+-----------+------------+------------
+         6 |          1 |          5
+(1 row)
+
+DROP TABLE test_2pc_multi;
+CREATE TABLE test_2pc_rewrite_alone_abort (a int);
+INSERT INTO test_2pc_rewrite_alone_abort VALUES (1);
+BEGIN;
+ALTER TABLE test_2pc_rewrite_alone_abort ALTER COLUMN a TYPE bigint;
+PREPARE TRANSACTION 'test';
+ERROR:  prepared transactions are disabled
+HINT:  Set "max_prepared_transactions" to a nonzero value.
+ROLLBACK PREPARED 'test';
+ERROR:  prepared transaction with identifier "test" does not exist
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT n_tup_ins, n_live_tup, n_dead_tup FROM pg_stat_all_tables WHERE relname = 'test_2pc_rewrite_alone_abort';
+ n_tup_ins | n_live_tup | n_dead_tup 
+-----------+------------+------------
+         1 |          1 |          0
+(1 row)
+
+DROP TABLE test_2pc_rewrite_alone_abort;
+CREATE TABLE test_2pc_abort (a int);
+INSERT INTO test_2pc_abort VALUES (1);
+BEGIN;
+INSERT INTO test_2pc_abort VALUES (1);
+INSERT INTO test_2pc_abort VALUES (2);
+ALTER TABLE test_2pc_abort ALTER COLUMN a TYPE bigint;
+INSERT INTO test_2pc_abort VALUES (3);
+PREPARE TRANSACTION 'test';
+ERROR:  prepared transactions are disabled
+HINT:  Set "max_prepared_transactions" to a nonzero value.
+ROLLBACK PREPARED 'test';
+ERROR:  prepared transaction with identifier "test" does not exist
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT n_tup_ins, n_live_tup, n_dead_tup FROM pg_stat_all_tables WHERE relname = 'test_2pc_abort';
+ n_tup_ins | n_live_tup | n_dead_tup 
+-----------+------------+------------
+         4 |          1 |          3
+(1 row)
+
+DROP TABLE test_2pc_abort;
+CREATE TABLE test_2pc_savepoint (a int);
+INSERT INTO test_2pc_savepoint VALUES (1);
+BEGIN;
+SAVEPOINT a;
+INSERT INTO test_2pc_savepoint VALUES (1);
+INSERT INTO test_2pc_savepoint VALUES (2);
+ALTER TABLE test_2pc_savepoint ALTER COLUMN a TYPE bigint;
+SAVEPOINT b;
+INSERT INTO test_2pc_savepoint VALUES (3);
+ALTER TABLE test_2pc_savepoint ALTER COLUMN a TYPE int;
+SAVEPOINT c;
+INSERT INTO test_2pc_savepoint VALUES (4);
+INSERT INTO test_2pc_savepoint VALUES (5);
+ROLLBACK TO SAVEPOINT b;
+PREPARE TRANSACTION 'test';
+ERROR:  prepared transactions are disabled
+HINT:  Set "max_prepared_transactions" to a nonzero value.
+COMMIT PREPARED 'test';
+ERROR:  prepared transaction with identifier "test" does not exist
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT n_tup_ins, n_live_tup, n_dead_tup FROM pg_stat_all_tables WHERE relname = 'test_2pc_savepoint';
+ n_tup_ins | n_live_tup | n_dead_tup 
+-----------+------------+------------
+         6 |          1 |          5
+(1 row)
+
+DROP TABLE test_2pc_savepoint;
+-- Rewrite without 2PC
+CREATE TABLE test_timestamp (a int) WITH (autovacuum_enabled = false);
+VACUUM ANALYZE test_timestamp;
+SELECT last_analyze AS last_vacuum_analyze FROM pg_stat_all_tables WHERE relname = 'test_timestamp' \gset
+ALTER TABLE test_timestamp ALTER COLUMN a TYPE bigint;
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT last_analyze = :'last_vacuum_analyze'::timestamptz FROM pg_stat_all_tables WHERE relname = 'test_timestamp';
+ ?column? 
+----------
+ t
+(1 row)
+
+DROP TABLE test_timestamp;
+CREATE TABLE test_alone (a int);
+INSERT INTO test_alone VALUES (1);
+BEGIN;
+ALTER TABLE test_alone ALTER COLUMN a TYPE bigint;
+COMMIT;
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT n_tup_ins, n_live_tup, n_dead_tup FROM pg_stat_all_tables WHERE relname = 'test_alone';
+ n_tup_ins | n_live_tup | n_dead_tup 
+-----------+------------+------------
+         1 |          1 |          0
+(1 row)
+
+DROP TABLE test_alone;
+CREATE TABLE test (a int);
+INSERT INTO test VALUES (1);
+BEGIN;
+INSERT INTO test VALUES (1);
+INSERT INTO test VALUES (2);
+INSERT INTO test VALUES (3);
+ALTER TABLE test ALTER COLUMN a TYPE bigint;
+COMMIT;
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT n_tup_ins, n_live_tup, n_dead_tup FROM pg_stat_all_tables WHERE relname = 'test';
+ n_tup_ins | n_live_tup | n_dead_tup 
+-----------+------------+------------
+         4 |          4 |          0
+(1 row)
+
+DROP TABLE test;
+CREATE TABLE test_multi (a int);
+INSERT INTO test_multi VALUES (1);
+BEGIN;
+INSERT INTO test_multi VALUES (1);
+INSERT INTO test_multi VALUES (2);
+ALTER TABLE test_multi ALTER COLUMN a TYPE bigint;
+INSERT INTO test_multi VALUES (3);
+INSERT INTO test_multi VALUES (4);
+ALTER TABLE test_multi ALTER COLUMN a TYPE int;
+INSERT INTO test_multi VALUES (5);
+COMMIT;
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT n_tup_ins, n_live_tup, n_dead_tup FROM pg_stat_all_tables WHERE relname = 'test_multi';
+ n_tup_ins | n_live_tup | n_dead_tup 
+-----------+------------+------------
+         6 |          6 |          0
+(1 row)
+
+DROP TABLE test_multi;
+CREATE TABLE test_rewrite_alone_abort (a int);
+INSERT INTO test_rewrite_alone_abort VALUES (1);
+BEGIN;
+ALTER TABLE test_rewrite_alone_abort ALTER COLUMN a TYPE bigint;
+ROLLBACK;
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT n_tup_ins, n_live_tup, n_dead_tup FROM pg_stat_all_tables WHERE relname = 'test_rewrite_alone_abort';
+ n_tup_ins | n_live_tup | n_dead_tup 
+-----------+------------+------------
+         1 |          1 |          0
+(1 row)
+
+DROP TABLE test_rewrite_alone_abort;
+CREATE TABLE test_abort (a int);
+INSERT INTO test_abort VALUES (1);
+BEGIN;
+INSERT INTO test_abort VALUES (1);
+INSERT INTO test_abort VALUES (2);
+ALTER TABLE test_abort ALTER COLUMN a TYPE bigint;
+INSERT INTO test_abort VALUES (3);
+ROLLBACK;
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT n_tup_ins, n_live_tup, n_dead_tup FROM pg_stat_all_tables WHERE relname = 'test_abort';
+ n_tup_ins | n_live_tup | n_dead_tup 
+-----------+------------+------------
+         4 |          1 |          3
+(1 row)
+
+DROP TABLE test_abort;
+CREATE TABLE test_savepoint (a int);
+INSERT INTO test_savepoint VALUES (1);
+BEGIN;
+SAVEPOINT a;
+INSERT INTO test_savepoint VALUES (1);
+INSERT INTO test_savepoint VALUES (2);
+ALTER TABLE test_savepoint ALTER COLUMN a TYPE bigint;
+SAVEPOINT b;
+INSERT INTO test_savepoint VALUES (3);
+ALTER TABLE test_savepoint ALTER COLUMN a TYPE int;
+SAVEPOINT c;
+INSERT INTO test_savepoint VALUES (4);
+INSERT INTO test_savepoint VALUES (5);
+ROLLBACK TO SAVEPOINT b;
+COMMIT;
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT n_tup_ins, n_live_tup, n_dead_tup FROM pg_stat_all_tables WHERE relname = 'test_savepoint';
+ n_tup_ins | n_live_tup | n_dead_tup 
+-----------+------------+------------
+         6 |          3 |          3
+(1 row)
+
+DROP TABLE test_savepoint;
+CREATE TABLE test_tbs (a int);
+INSERT INTO test_tbs VALUES (1);
+ALTER TABLE test_tbs SET TABLESPACE pg_default;
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT n_tup_ins, n_live_tup, n_dead_tup FROM pg_stat_all_tables WHERE relname = 'test_tbs';
+ n_tup_ins | n_live_tup | n_dead_tup 
+-----------+------------+------------
+         1 |          1 |          0
+(1 row)
+
+DROP TABLE test_tbs;
+-- End of Stats Test
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 8768e0f27fd..4130f9254a5 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -944,4 +944,190 @@ SELECT * FROM check_estimated_rows('SELECT * FROM table_fillfactor');
 
 DROP TABLE table_fillfactor;
 
+-- Test some rewrites
+CREATE TABLE test_2pc_timestamp (a int) WITH (autovacuum_enabled = false);
+VACUUM ANALYZE test_2pc_timestamp;
+SELECT last_analyze AS last_vacuum_analyze FROM pg_stat_all_tables WHERE relname = 'test_2pc_timestamp' \gset
+BEGIN;
+ALTER TABLE test_2pc_timestamp ALTER COLUMN a TYPE int;
+PREPARE TRANSACTION 'test';
+COMMIT PREPARED 'test';
+SELECT pg_stat_force_next_flush();
+SELECT last_analyze = :'last_vacuum_analyze'::timestamptz FROM pg_stat_all_tables WHERE relname = 'test_2pc_timestamp';
+DROP TABLE test_2pc_timestamp;
+
+CREATE TABLE test_2pc_rewrite_alone (a int);
+INSERT INTO test_2pc_rewrite_alone VALUES (1);
+BEGIN;
+ALTER TABLE test_2pc_rewrite_alone ALTER COLUMN a TYPE bigint;
+PREPARE TRANSACTION 'test';
+COMMIT PREPARED 'test';
+SELECT pg_stat_force_next_flush();
+SELECT n_tup_ins, n_live_tup, n_dead_tup FROM pg_stat_all_tables WHERE relname = 'test_2pc_rewrite_alone';
+DROP TABLE test_2pc_rewrite_alone;
+
+CREATE TABLE test_2pc (a int);
+INSERT INTO test_2pc VALUES (1);
+BEGIN;
+INSERT INTO test_2pc VALUES (1);
+INSERT INTO test_2pc VALUES (2);
+INSERT INTO test_2pc VALUES (3);
+ALTER TABLE test_2pc ALTER COLUMN a TYPE bigint;
+PREPARE TRANSACTION 'test';
+COMMIT PREPARED 'test';
+SELECT pg_stat_force_next_flush();
+SELECT n_tup_ins, n_live_tup, n_dead_tup FROM pg_stat_all_tables WHERE relname = 'test_2pc';
+DROP TABLE test_2pc;
+
+CREATE TABLE test_2pc_multi (a int);
+INSERT INTO test_2pc_multi VALUES (1);
+BEGIN;
+INSERT INTO test_2pc_multi VALUES (1);
+INSERT INTO test_2pc_multi VALUES (2);
+ALTER TABLE test_2pc_multi ALTER COLUMN a TYPE bigint;
+INSERT INTO test_2pc_multi VALUES (3);
+INSERT INTO test_2pc_multi VALUES (4);
+ALTER TABLE test_2pc_multi ALTER COLUMN a TYPE int;
+INSERT INTO test_2pc_multi VALUES (5);
+PREPARE TRANSACTION 'test';
+COMMIT PREPARED 'test';
+SELECT pg_stat_force_next_flush();
+SELECT n_tup_ins, n_live_tup, n_dead_tup FROM pg_stat_all_tables WHERE relname = 'test_2pc_multi';
+DROP TABLE test_2pc_multi;
+
+CREATE TABLE test_2pc_rewrite_alone_abort (a int);
+INSERT INTO test_2pc_rewrite_alone_abort VALUES (1);
+BEGIN;
+ALTER TABLE test_2pc_rewrite_alone_abort ALTER COLUMN a TYPE bigint;
+PREPARE TRANSACTION 'test';
+ROLLBACK PREPARED 'test';
+SELECT pg_stat_force_next_flush();
+SELECT n_tup_ins, n_live_tup, n_dead_tup FROM pg_stat_all_tables WHERE relname = 'test_2pc_rewrite_alone_abort';
+DROP TABLE test_2pc_rewrite_alone_abort;
+
+CREATE TABLE test_2pc_abort (a int);
+INSERT INTO test_2pc_abort VALUES (1);
+BEGIN;
+INSERT INTO test_2pc_abort VALUES (1);
+INSERT INTO test_2pc_abort VALUES (2);
+ALTER TABLE test_2pc_abort ALTER COLUMN a TYPE bigint;
+INSERT INTO test_2pc_abort VALUES (3);
+PREPARE TRANSACTION 'test';
+ROLLBACK PREPARED 'test';
+SELECT pg_stat_force_next_flush();
+SELECT n_tup_ins, n_live_tup, n_dead_tup FROM pg_stat_all_tables WHERE relname = 'test_2pc_abort';
+DROP TABLE test_2pc_abort;
+
+CREATE TABLE test_2pc_savepoint (a int);
+INSERT INTO test_2pc_savepoint VALUES (1);
+BEGIN;
+SAVEPOINT a;
+INSERT INTO test_2pc_savepoint VALUES (1);
+INSERT INTO test_2pc_savepoint VALUES (2);
+ALTER TABLE test_2pc_savepoint ALTER COLUMN a TYPE bigint;
+SAVEPOINT b;
+INSERT INTO test_2pc_savepoint VALUES (3);
+ALTER TABLE test_2pc_savepoint ALTER COLUMN a TYPE int;
+SAVEPOINT c;
+INSERT INTO test_2pc_savepoint VALUES (4);
+INSERT INTO test_2pc_savepoint VALUES (5);
+ROLLBACK TO SAVEPOINT b;
+PREPARE TRANSACTION 'test';
+COMMIT PREPARED 'test';
+SELECT pg_stat_force_next_flush();
+SELECT n_tup_ins, n_live_tup, n_dead_tup FROM pg_stat_all_tables WHERE relname = 'test_2pc_savepoint';
+DROP TABLE test_2pc_savepoint;
+
+-- Rewrite without 2PC
+CREATE TABLE test_timestamp (a int) WITH (autovacuum_enabled = false);
+VACUUM ANALYZE test_timestamp;
+SELECT last_analyze AS last_vacuum_analyze FROM pg_stat_all_tables WHERE relname = 'test_timestamp' \gset
+ALTER TABLE test_timestamp ALTER COLUMN a TYPE bigint;
+SELECT pg_stat_force_next_flush();
+SELECT last_analyze = :'last_vacuum_analyze'::timestamptz FROM pg_stat_all_tables WHERE relname = 'test_timestamp';
+DROP TABLE test_timestamp;
+
+CREATE TABLE test_alone (a int);
+INSERT INTO test_alone VALUES (1);
+BEGIN;
+ALTER TABLE test_alone ALTER COLUMN a TYPE bigint;
+COMMIT;
+SELECT pg_stat_force_next_flush();
+SELECT n_tup_ins, n_live_tup, n_dead_tup FROM pg_stat_all_tables WHERE relname = 'test_alone';
+DROP TABLE test_alone;
+
+CREATE TABLE test (a int);
+INSERT INTO test VALUES (1);
+BEGIN;
+INSERT INTO test VALUES (1);
+INSERT INTO test VALUES (2);
+INSERT INTO test VALUES (3);
+ALTER TABLE test ALTER COLUMN a TYPE bigint;
+COMMIT;
+SELECT pg_stat_force_next_flush();
+SELECT n_tup_ins, n_live_tup, n_dead_tup FROM pg_stat_all_tables WHERE relname = 'test';
+DROP TABLE test;
+
+CREATE TABLE test_multi (a int);
+INSERT INTO test_multi VALUES (1);
+BEGIN;
+INSERT INTO test_multi VALUES (1);
+INSERT INTO test_multi VALUES (2);
+ALTER TABLE test_multi ALTER COLUMN a TYPE bigint;
+INSERT INTO test_multi VALUES (3);
+INSERT INTO test_multi VALUES (4);
+ALTER TABLE test_multi ALTER COLUMN a TYPE int;
+INSERT INTO test_multi VALUES (5);
+COMMIT;
+SELECT pg_stat_force_next_flush();
+SELECT n_tup_ins, n_live_tup, n_dead_tup FROM pg_stat_all_tables WHERE relname = 'test_multi';
+DROP TABLE test_multi;
+
+CREATE TABLE test_rewrite_alone_abort (a int);
+INSERT INTO test_rewrite_alone_abort VALUES (1);
+BEGIN;
+ALTER TABLE test_rewrite_alone_abort ALTER COLUMN a TYPE bigint;
+ROLLBACK;
+SELECT pg_stat_force_next_flush();
+SELECT n_tup_ins, n_live_tup, n_dead_tup FROM pg_stat_all_tables WHERE relname = 'test_rewrite_alone_abort';
+DROP TABLE test_rewrite_alone_abort;
+
+CREATE TABLE test_abort (a int);
+INSERT INTO test_abort VALUES (1);
+BEGIN;
+INSERT INTO test_abort VALUES (1);
+INSERT INTO test_abort VALUES (2);
+ALTER TABLE test_abort ALTER COLUMN a TYPE bigint;
+INSERT INTO test_abort VALUES (3);
+ROLLBACK;
+SELECT pg_stat_force_next_flush();
+SELECT n_tup_ins, n_live_tup, n_dead_tup FROM pg_stat_all_tables WHERE relname = 'test_abort';
+DROP TABLE test_abort;
+
+CREATE TABLE test_savepoint (a int);
+INSERT INTO test_savepoint VALUES (1);
+BEGIN;
+SAVEPOINT a;
+INSERT INTO test_savepoint VALUES (1);
+INSERT INTO test_savepoint VALUES (2);
+ALTER TABLE test_savepoint ALTER COLUMN a TYPE bigint;
+SAVEPOINT b;
+INSERT INTO test_savepoint VALUES (3);
+ALTER TABLE test_savepoint ALTER COLUMN a TYPE int;
+SAVEPOINT c;
+INSERT INTO test_savepoint VALUES (4);
+INSERT INTO test_savepoint VALUES (5);
+ROLLBACK TO SAVEPOINT b;
+COMMIT;
+SELECT pg_stat_force_next_flush();
+SELECT n_tup_ins, n_live_tup, n_dead_tup FROM pg_stat_all_tables WHERE relname = 'test_savepoint';
+DROP TABLE test_savepoint;
+
+CREATE TABLE test_tbs (a int);
+INSERT INTO test_tbs VALUES (1);
+ALTER TABLE test_tbs SET TABLESPACE pg_default;
+SELECT pg_stat_force_next_flush();
+SELECT n_tup_ins, n_live_tup, n_dead_tup FROM pg_stat_all_tables WHERE relname = 'test_tbs';
+DROP TABLE test_tbs;
+
 -- End of Stats Test
-- 
2.34.1

v7-0002-Key-PGSTAT_KIND_RELATION-by-relfile-locator.patch (text/x-diff; charset=us-ascii)
From 2cc88c7d89f95a7f72476593e861721236d12354 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Wed, 1 Oct 2025 09:45:26 +0000
Subject: [PATCH v7 2/3] Key PGSTAT_KIND_RELATION by relfile locator

This patch changes the key used for the PGSTAT_KIND_RELATION statistic kind.
Instead of the relation oid, it now relies on:

- dboid (linked to RelFileLocator's dbOid)
- objoid which is the result of a new macro (namely RelFileLocatorToPgStatObjid())
that computes an objoid based on the RelFileLocator's spcOid and the
RelFileLocator's relNumber.
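
To illustrate the key layout described above, here is a minimal standalone
sketch. The typedefs and the struct are simplified stand-ins for the real
PostgreSQL definitions, and the objid_to_*() and check_objid_roundtrip()
helpers are hypothetical (they are not part of this patch); only the macro
body is taken from the patch itself.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for PostgreSQL's Oid and RelFileNumber (32-bit unsigned). */
typedef uint32_t Oid;
typedef Oid RelFileNumber;

/* Simplified stand-in for RelFileLocator (storage/relfilelocator.h). */
typedef struct RelFileLocator
{
	Oid			spcOid;			/* tablespace */
	Oid			dbOid;			/* database, 0 for shared relations */
	RelFileNumber relNumber;	/* relation file number */
} RelFileLocator;

/* Same expression as the macro this patch adds to pgstat.h. */
#define RelFileLocatorToPgStatObjid(locator) \
	(((uint64_t) (locator).spcOid << 32) | (locator).relNumber)

/* Hypothetical helper: recover spcOid from the upper 32 bits. */
static inline Oid
objid_to_spcoid(uint64_t objid)
{
	return (Oid) (objid >> 32);
}

/* Hypothetical helper: recover relNumber from the lower 32 bits. */
static inline RelFileNumber
objid_to_relnumber(uint64_t objid)
{
	return (RelFileNumber) (objid & 0xFFFFFFFFu);
}

/* Hypothetical self-check: packing then unpacking returns the inputs. */
static inline void
check_objid_roundtrip(RelFileLocator locator)
{
	uint64_t	objid = RelFileLocatorToPgStatObjid(locator);

	assert(objid_to_spcoid(objid) == locator.spcOid);
	assert(objid_to_relnumber(objid) == locator.relNumber);
}
```

dbOid is not part of the objid because it is already a separate component
of the stats key, so the (dboid, objid) pair still carries the whole locator.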

This will allow us to add new stats (such as write counters) and ensure that
some counters (n_dead_tup and friends) are replicated.

The patch introduces pgstat_reloid_to_relfilelocator() to 1) avoid calling
RelationIdGetRelation() to get the relfilelocator based on the relation oid
and 2) handle the partitioned table case.
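
The decision logic of that lookup can be modelled roughly as below. This is
only a sketch under simplifying assumptions: ClassRow is a hypothetical,
trimmed-down view of a pg_class row, the current database and its default
tablespace are passed in explicitly instead of coming from MyDatabaseId and
MyDatabaseTableSpace, and mapped relations (relfilenode = 0) are simply
rejected, whereas the patch resolves them through the relation mapper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;

#define InvalidOid				((Oid) 0)
#define GLOBALTABLESPACE_OID	((Oid) 1664)	/* pg_global */

/* Simplified stand-in for RelFileLocator. */
typedef struct RelFileLocator
{
	Oid			spcOid;
	Oid			dbOid;
	Oid			relNumber;
} RelFileLocator;

/* Hypothetical, trimmed-down view of a pg_class row. */
typedef struct ClassRow
{
	Oid			oid;
	Oid			relfilenode;
	Oid			reltablespace;	/* 0 means "database default" */
	bool		isshared;
	bool		has_storage;	/* models RELKIND_HAS_STORAGE() */
	bool		is_partitioned; /* models RELKIND_PARTITIONED_TABLE */
} ClassRow;

/*
 * Model of the lookup: returns false when no stats are tracked for the
 * relation, otherwise fills *locator with the stats key components.
 */
static bool
model_reloid_to_locator(const ClassRow *row, Oid my_db,
						Oid my_db_tablespace, RelFileLocator *locator)
{
	assert(row != NULL && locator != NULL);

	/* neither storage nor a partitioned table: nothing to track */
	if (!row->has_storage && !row->is_partitioned)
		return false;

	if (row->is_partitioned)
	{
		/* no storage: synthetic locator using the relation OID */
		locator->dbOid = row->isshared ? InvalidOid : my_db;
		locator->spcOid = InvalidOid;
		locator->relNumber = row->oid;
		return true;
	}

	locator->relNumber = row->relfilenode;
	locator->spcOid = (row->reltablespace != InvalidOid) ?
		row->reltablespace : my_db_tablespace;
	locator->dbOid = (locator->spcOid == GLOBALTABLESPACE_OID) ?
		InvalidOid : my_db;

	/* mapped relations are out of scope for this model */
	return locator->relNumber != InvalidOid;
}
```

For a regular table in the default tablespace this yields the same
(dbOid, spcOid, relNumber) triple as rel->rd_locator, which is what keeps
the OID-based lookups consistent with the entries created from rd_locator.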

Please note that:

- when running pg_stat_have_stats('relation',...) we now need to be connected
to the database that hosts the relation. As pg_stat_have_stats() is not
publicly documented, the changes made in 029_stats_restart.pl look
sufficient.

- this patch does not handle rewrites, so some tests are failing. Its only
intent is to ease the review; it should not be pushed without being
merged with the following patch, which handles the rewrites.

- it can be used to test that stats are incremented correctly and that we're
able to retrieve them as long as rewrites are not involved.
---
 src/backend/access/heap/vacuumlazy.c         |   3 +-
 src/backend/postmaster/autovacuum.c          |   9 +-
 src/backend/utils/activity/pgstat_relation.c | 234 +++++++++++++++----
 src/backend/utils/adt/pgstatfuncs.c          |  22 +-
 src/include/pgstat.h                         |  18 +-
 src/include/utils/pgstat_internal.h          |   1 +
 src/test/recovery/t/029_stats_restart.pl     |  40 ++--
 7 files changed, 249 insertions(+), 78 deletions(-)
   3.3% src/backend/postmaster/
  64.6% src/backend/utils/activity/
   5.3% src/backend/utils/adt/
   7.0% src/include/
  18.7% src/test/recovery/t/

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 61fe623cc60..073f1ff3fe4 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -947,8 +947,7 @@ heap_vacuum_rel(Relation rel, const VacuumParams params,
 	 * soon in cases where the failsafe prevented significant amounts of heap
 	 * vacuuming.
 	 */
-	pgstat_report_vacuum(RelationGetRelid(rel),
-						 rel->rd_rel->relisshared,
+	pgstat_report_vacuum(rel->rd_locator,
 						 Max(vacrel->new_live_tuples, 0),
 						 vacrel->recently_dead_tuples +
 						 vacrel->missed_dead_tuples,
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index ed19c74bb19..662495c72fc 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -2048,8 +2048,7 @@ do_autovacuum(void)
 
 		/* Fetch reloptions and the pgstat entry for this table */
 		relopts = extract_autovac_opts(tuple, pg_class_desc);
-		tabentry = pgstat_fetch_stat_tabentry_ext(classForm->relisshared,
-												  relid);
+		tabentry = pgstat_fetch_stat_tabentry_ext(relid);
 
 		/* Check if it needs vacuum or analyze */
 		relation_needs_vacanalyze(relid, relopts, classForm, tabentry,
@@ -2141,8 +2140,7 @@ do_autovacuum(void)
 		}
 
 		/* Fetch the pgstat entry for this table */
-		tabentry = pgstat_fetch_stat_tabentry_ext(classForm->relisshared,
-												  relid);
+		tabentry = pgstat_fetch_stat_tabentry_ext(relid);
 
 		relation_needs_vacanalyze(relid, relopts, classForm, tabentry,
 								  effective_multixact_freeze_max_age,
@@ -2939,8 +2937,7 @@ recheck_relation_needs_vacanalyze(Oid relid,
 	PgStat_StatTabEntry *tabentry;
 
 	/* fetch the pgstat table entry */
-	tabentry = pgstat_fetch_stat_tabentry_ext(classForm->relisshared,
-											  relid);
+	tabentry = pgstat_fetch_stat_tabentry_ext(relid);
 
 	relation_needs_vacanalyze(relid, avopts, classForm, tabentry,
 							  effective_multixact_freeze_max_age,
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index 1de477cbeeb..7debb14bb5d 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -17,12 +17,17 @@
 
 #include "postgres.h"
 
+#include "access/htup_details.h"
 #include "access/twophase_rmgr.h"
 #include "access/xact.h"
 #include "catalog/catalog.h"
+#include "catalog/pg_tablespace.h"
+#include "storage/lmgr.h"
 #include "utils/memutils.h"
 #include "utils/pgstat_internal.h"
 #include "utils/rel.h"
+#include "utils/relmapper.h"
+#include "utils/syscache.h"
 #include "utils/timestamp.h"
 
 
@@ -36,13 +41,12 @@ typedef struct TwoPhasePgStatRecord
 	PgStat_Counter inserted_pre_truncdrop;
 	PgStat_Counter updated_pre_truncdrop;
 	PgStat_Counter deleted_pre_truncdrop;
-	Oid			id;				/* table's OID */
-	bool		shared;			/* is it a shared catalog? */
+	RelFileLocator locator;		/* table's rd_locator */
 	bool		truncdropped;	/* was the relation truncated/dropped? */
 } TwoPhasePgStatRecord;
 
 
-static PgStat_TableStatus *pgstat_prep_relation_pending(Oid rel_id, bool isshared);
+static PgStat_TableStatus *pgstat_prep_relation_pending(RelFileLocator locator);
 static void add_tabstat_xact_level(PgStat_TableStatus *pgstat_info, int nest_level);
 static void ensure_tabstat_xact_level(PgStat_TableStatus *pgstat_info);
 static void save_truncdrop_counters(PgStat_TableXactStatus *trans, bool is_drop);
@@ -60,8 +64,7 @@ pgstat_copy_relation_stats(Relation dst, Relation src)
 	PgStatShared_Relation *dstshstats;
 	PgStat_EntryRef *dst_ref;
 
-	srcstats = pgstat_fetch_stat_tabentry_ext(src->rd_rel->relisshared,
-											  RelationGetRelid(src));
+	srcstats = pgstat_fetch_stat_tabentry_ext(RelationGetRelid(src));
 	if (!srcstats)
 		return;
 
@@ -94,8 +97,10 @@ pgstat_init_relation(Relation rel)
 
 	/*
 	 * We only count stats for relations with storage and partitioned tables
+	 * and we don't count stats generated during a rewrite.
 	 */
-	if (!RELKIND_HAS_STORAGE(relkind) && relkind != RELKIND_PARTITIONED_TABLE)
+	if ((!RELKIND_HAS_STORAGE(relkind) && relkind != RELKIND_PARTITIONED_TABLE) ||
+		OidIsValid(rel->rd_rel->relrewrite))
 	{
 		rel->pgstat_enabled = false;
 		rel->pgstat_info = NULL;
@@ -130,12 +135,38 @@ pgstat_init_relation(Relation rel)
 void
 pgstat_assoc_relation(Relation rel)
 {
+	RelFileLocator locator;
+
 	Assert(rel->pgstat_enabled);
 	Assert(rel->pgstat_info == NULL);
 
+	/*
+	 * Don't associate stats for relations that neither have storage nor are
+	 * partitioned tables.
+	 */
+	if (!RELKIND_HAS_STORAGE(rel->rd_rel->relkind) &&
+		rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
+		return;
+
+	if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
+		locator = rel->rd_locator;
+	else
+	{
+		/*
+		 * Partitioned tables don't have storage, so construct a synthetic
+		 * locator for statistics tracking. Use the relation OID as relNumber.
+		 * No collision with regular relations is possible because relNumbers
+		 * are also assigned from the pg_class OID space (see
+		 * GetNewRelFileNumber()), making each value unique across the
+		 * database regardless of spcOid.
+		 */
+		locator.dbOid = (rel->rd_rel->relisshared ? InvalidOid : MyDatabaseId);
+		locator.spcOid = InvalidOid;
+		locator.relNumber = rel->rd_id;
+	}
+
 	/* Else find or make the PgStat_TableStatus entry, and update link */
-	rel->pgstat_info = pgstat_prep_relation_pending(RelationGetRelid(rel),
-													rel->rd_rel->relisshared);
+	rel->pgstat_info = pgstat_prep_relation_pending(locator);
 
 	/* don't allow link a stats to multiple relcache entries */
 	Assert(rel->pgstat_info->relation == NULL);
@@ -167,9 +198,13 @@ pgstat_unlink_relation(Relation rel)
 void
 pgstat_create_relation(Relation rel)
 {
+	/* don't track stats for relations without storage */
+	if (!RELKIND_HAS_STORAGE(rel->rd_rel->relkind))
+		return;
+
 	pgstat_create_transactional(PGSTAT_KIND_RELATION,
-								rel->rd_rel->relisshared ? InvalidOid : MyDatabaseId,
-								RelationGetRelid(rel));
+								rel->rd_locator.dbOid,
+								RelFileLocatorToPgStatObjid(rel->rd_locator));
 }
 
 /*
@@ -181,9 +216,13 @@ pgstat_drop_relation(Relation rel)
 	int			nest_level = GetCurrentTransactionNestLevel();
 	PgStat_TableStatus *pgstat_info;
 
+	/* don't track stats for relations without storage */
+	if (!RELKIND_HAS_STORAGE(rel->rd_rel->relkind))
+		return;
+
 	pgstat_drop_transactional(PGSTAT_KIND_RELATION,
-							  rel->rd_rel->relisshared ? InvalidOid : MyDatabaseId,
-							  RelationGetRelid(rel));
+							  rel->rd_locator.dbOid,
+							  RelFileLocatorToPgStatObjid(rel->rd_locator));
 
 	if (!pgstat_should_count_relation(rel))
 		return;
@@ -207,14 +246,12 @@ pgstat_drop_relation(Relation rel)
  * Report that the table was just vacuumed and flush IO statistics.
  */
 void
-pgstat_report_vacuum(Oid tableoid, bool shared,
-					 PgStat_Counter livetuples, PgStat_Counter deadtuples,
-					 TimestampTz starttime)
+pgstat_report_vacuum(RelFileLocator locator, PgStat_Counter livetuples,
+					 PgStat_Counter deadtuples, TimestampTz starttime)
 {
 	PgStat_EntryRef *entry_ref;
 	PgStatShared_Relation *shtabentry;
 	PgStat_StatTabEntry *tabentry;
-	Oid			dboid = (shared ? InvalidOid : MyDatabaseId);
 	TimestampTz ts;
 	PgStat_Counter elapsedtime;
 
@@ -227,7 +264,7 @@ pgstat_report_vacuum(Oid tableoid, bool shared,
 
 	/* block acquiring lock for the same reason as pgstat_report_autovac() */
 	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_RELATION,
-											dboid, tableoid, false);
+											locator.dbOid, RelFileLocatorToPgStatObjid(locator), false);
 
 	shtabentry = (PgStatShared_Relation *) entry_ref->shared_stats;
 	tabentry = &shtabentry->stats;
@@ -286,9 +323,9 @@ pgstat_report_analyze(Relation rel,
 	PgStat_EntryRef *entry_ref;
 	PgStatShared_Relation *shtabentry;
 	PgStat_StatTabEntry *tabentry;
-	Oid			dboid = (rel->rd_rel->relisshared ? InvalidOid : MyDatabaseId);
 	TimestampTz ts;
 	PgStat_Counter elapsedtime;
+	RelFileLocator locator;
 
 	if (!pgstat_track_counts)
 		return;
@@ -326,9 +363,26 @@ pgstat_report_analyze(Relation rel,
 	ts = GetCurrentTimestamp();
 	elapsedtime = TimestampDifferenceMilliseconds(starttime, ts);
 
+	if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
+		locator = rel->rd_locator;
+	else
+	{
+		/*
+		 * Partitioned tables don't have storage, so construct a synthetic
+		 * locator for statistics tracking. Use the relation OID as relNumber.
+		 * No collision with regular relations is possible because relNumbers
+		 * are also assigned from the pg_class OID space (see
+		 * GetNewRelFileNumber()), making each value unique across the
+		 * database regardless of spcOid.
+		 */
+		locator.dbOid = (rel->rd_rel->relisshared ? InvalidOid : MyDatabaseId);
+		locator.spcOid = InvalidOid;
+		locator.relNumber = rel->rd_id;
+	}
 	/* block acquiring lock for the same reason as pgstat_report_autovac() */
-	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_RELATION, dboid,
-											RelationGetRelid(rel),
+	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_RELATION,
+											locator.dbOid,
+											RelFileLocatorToPgStatObjid(locator),
 											false);
 	/* can't get dropped while accessed */
 	Assert(entry_ref != NULL && entry_ref->shared_stats != NULL);
@@ -469,7 +523,7 @@ pgstat_update_heap_dead_tuples(Relation rel, int delta)
 PgStat_StatTabEntry *
 pgstat_fetch_stat_tabentry(Oid relid)
 {
-	return pgstat_fetch_stat_tabentry_ext(IsSharedRelation(relid), relid);
+	return pgstat_fetch_stat_tabentry_ext(relid);
 }
 
 /*
@@ -477,12 +531,19 @@ pgstat_fetch_stat_tabentry(Oid relid)
  * whether the to-be-accessed table is a shared relation or not.
  */
 PgStat_StatTabEntry *
-pgstat_fetch_stat_tabentry_ext(bool shared, Oid reloid)
+pgstat_fetch_stat_tabentry_ext(Oid reloid)
 {
-	Oid			dboid = (shared ? InvalidOid : MyDatabaseId);
+	PgStat_StatTabEntry *tabentry;
+	RelFileLocator locator;
+
+	if (!pgstat_reloid_to_relfilelocator(reloid, &locator))
+		return NULL;
 
-	return (PgStat_StatTabEntry *)
-		pgstat_fetch_entry(PGSTAT_KIND_RELATION, dboid, reloid);
+	/* fetch the stats entry using the relfilenode based key */
+	tabentry = (PgStat_StatTabEntry *) pgstat_fetch_entry(PGSTAT_KIND_RELATION,
+														  locator.dbOid,
+														  RelFileLocatorToPgStatObjid(locator));
+	return tabentry;
 }
 
 /*
@@ -504,14 +565,17 @@ find_tabstat_entry(Oid rel_id)
 	PgStat_TableXactStatus *trans;
 	PgStat_TableStatus *tabentry = NULL;
 	PgStat_TableStatus *tablestatus = NULL;
+	RelFileLocator locator;
+
+	if (!pgstat_reloid_to_relfilelocator(rel_id, &locator))
+		return NULL;
+
+	entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION,
+										   locator.dbOid,
+										   RelFileLocatorToPgStatObjid(locator));
 
-	entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, MyDatabaseId, rel_id);
 	if (!entry_ref)
-	{
-		entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, InvalidOid, rel_id);
-		if (!entry_ref)
-			return tablestatus;
-	}
+		return tablestatus;
 
 	tabentry = (PgStat_TableStatus *) entry_ref->pending;
 	tablestatus = palloc(sizeof(PgStat_TableStatus));
@@ -707,8 +771,12 @@ AtPrepare_PgStat_Relations(PgStat_SubXactStatus *xact_state)
 		record.inserted_pre_truncdrop = trans->inserted_pre_truncdrop;
 		record.updated_pre_truncdrop = trans->updated_pre_truncdrop;
 		record.deleted_pre_truncdrop = trans->deleted_pre_truncdrop;
-		record.id = tabstat->id;
-		record.shared = tabstat->shared;
+
+		if (tabstat->relation != NULL)
+			record.locator = tabstat->relation->rd_locator;
+		else
+			record.locator = tabstat->locator;
+
 		record.truncdropped = trans->truncdropped;
 
 		RegisterTwoPhaseRecord(TWOPHASE_RM_PGSTAT_ID, 0,
@@ -751,7 +819,7 @@ pgstat_twophase_postcommit(FullTransactionId fxid, uint16 info,
 	PgStat_TableStatus *pgstat_info;
 
 	/* Find or create a tabstat entry for the rel */
-	pgstat_info = pgstat_prep_relation_pending(rec->id, rec->shared);
+	pgstat_info = pgstat_prep_relation_pending(rec->locator);
 
 	/* Same math as in AtEOXact_PgStat, commit case */
 	pgstat_info->counts.tuples_inserted += rec->tuples_inserted;
@@ -786,8 +854,8 @@ pgstat_twophase_postabort(FullTransactionId fxid, uint16 info,
 	TwoPhasePgStatRecord *rec = (TwoPhasePgStatRecord *) recdata;
 	PgStat_TableStatus *pgstat_info;
 
-	/* Find or create a tabstat entry for the rel */
-	pgstat_info = pgstat_prep_relation_pending(rec->id, rec->shared);
+	/* Find or create a tabstat entry for the target locator */
+	pgstat_info = pgstat_prep_relation_pending(rec->locator);
 
 	/* Same math as in AtEOXact_PgStat, abort case */
 	if (rec->truncdropped)
@@ -921,17 +989,21 @@ pgstat_relation_reset_timestamp_cb(PgStatShared_Common *header, TimestampTz ts)
  * initialized if not exists.
  */
 static PgStat_TableStatus *
-pgstat_prep_relation_pending(Oid rel_id, bool isshared)
+pgstat_prep_relation_pending(RelFileLocator locator)
 {
 	PgStat_EntryRef *entry_ref;
 	PgStat_TableStatus *pending;
+	uint64		objid;
+
+	objid = RelFileLocatorToPgStatObjid(locator);
 
 	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_RELATION,
-										  isshared ? InvalidOid : MyDatabaseId,
-										  rel_id, NULL);
+										  locator.dbOid,
+										  objid, NULL);
+
 	pending = entry_ref->pending;
-	pending->id = rel_id;
-	pending->shared = isshared;
+	pending->id = objid;
+	pending->locator = locator;
 
 	return pending;
 }
@@ -1010,3 +1082,83 @@ restore_truncdrop_counters(PgStat_TableXactStatus *trans)
 		trans->tuples_deleted = trans->deleted_pre_truncdrop;
 	}
 }
+
+/*
+ * Convert a relation OID to its corresponding RelFileLocator for statistics
+ * tracking purposes.
+ *
+ * Returns true on success, false if the relation doesn't need statistics
+ * tracking.
+ *
+ * For partitioned tables, constructs a synthetic locator using the relation
+ * OID as relNumber, since they don't have storage.
+ */
+bool
+pgstat_reloid_to_relfilelocator(Oid reloid, RelFileLocator *locator)
+{
+	HeapTuple	tuple;
+	Form_pg_class relform;
+	bool		result = true;
+
+	/* get the relation's tuple from pg_class */
+	tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(reloid));
+
+	if (!HeapTupleIsValid(tuple))
+		return false;
+
+	relform = (Form_pg_class) GETSTRUCT(tuple);
+
+	/* skip relations that neither have storage nor are partitioned tables */
+	if (!RELKIND_HAS_STORAGE(relform->relkind) &&
+		relform->relkind != RELKIND_PARTITIONED_TABLE)
+	{
+		ReleaseSysCache(tuple);
+		return false;
+	}
+
+	if (relform->relkind != RELKIND_PARTITIONED_TABLE)
+	{
+		/* build the RelFileLocator */
+		locator->relNumber = relform->relfilenode;
+		locator->spcOid = relform->reltablespace;
+
+		/* handle default tablespace */
+		if (!OidIsValid(locator->spcOid))
+			locator->spcOid = MyDatabaseTableSpace;
+
+		/* handle dbOid for global vs local relations */
+		if (locator->spcOid == GLOBALTABLESPACE_OID)
+			locator->dbOid = InvalidOid;
+		else
+			locator->dbOid = MyDatabaseId;
+
+		/* handle mapped relations */
+		if (!RelFileNumberIsValid(locator->relNumber))
+		{
+			locator->relNumber = RelationMapOidToFilenumber(reloid,
+															relform->relisshared);
+			if (!RelFileNumberIsValid(locator->relNumber))
+			{
+				ReleaseSysCache(tuple);
+				return false;
+			}
+		}
+	}
+	else
+	{
+		/*
+		 * Partitioned tables don't have storage, so construct a synthetic
+		 * locator for statistics tracking. Use the relation OID as relNumber.
+		 * No collision with regular relations is possible because relNumbers
+		 * are also assigned from the pg_class OID space (see
+		 * GetNewRelFileNumber()), making each value unique across the
+		 * database regardless of spcOid.
+		 */
+		locator->dbOid = (relform->relisshared ? InvalidOid : MyDatabaseId);
+		locator->spcOid = InvalidOid;
+		locator->relNumber = relform->oid;
+	}
+
+	ReleaseSysCache(tuple);
+	return result;
+}
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index a710508979e..7924cdf5b97 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -23,13 +23,13 @@
 #include "common/ip.h"
 #include "funcapi.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "postmaster/bgworker.h"
 #include "replication/logicallauncher.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/pgstat_internal.h"
 #include "utils/timestamp.h"
 
 #define UINT32_ACCESS_ONCE(var)		 ((uint32)(*((volatile uint32 *)&(var))))
@@ -1949,9 +1949,14 @@ Datum
 pg_stat_reset_single_table_counters(PG_FUNCTION_ARGS)
 {
 	Oid			taboid = PG_GETARG_OID(0);
-	Oid			dboid = (IsSharedRelation(taboid) ? InvalidOid : MyDatabaseId);
+	RelFileLocator locator;
 
-	pgstat_reset(PGSTAT_KIND_RELATION, dboid, taboid);
+	/* Get the stats locator from the relation OID */
+	if (!pgstat_reloid_to_relfilelocator(taboid, &locator))
+		PG_RETURN_VOID();
+
+	pgstat_reset(PGSTAT_KIND_RELATION, locator.dbOid,
+				 RelFileLocatorToPgStatObjid(locator));
 
 	PG_RETURN_VOID();
 }
@@ -2290,5 +2295,16 @@ pg_stat_have_stats(PG_FUNCTION_ARGS)
 	uint64		objid = PG_GETARG_INT64(2);
 	PgStat_Kind kind = pgstat_get_kind_from_str(stats_type);
 
+	/* Convert relation OID to relfilenode objid */
+	if (kind == PGSTAT_KIND_RELATION)
+	{
+		RelFileLocator locator;
+
+		if (!pgstat_reloid_to_relfilelocator(objid, &locator))
+			PG_RETURN_BOOL(false);
+
+		objid = RelFileLocatorToPgStatObjid(locator);
+	}
+
 	PG_RETURN_BOOL(pgstat_have_entry(kind, dboid, objid));
 }
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 7ae503e71a2..5d0fe79f7e3 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -16,6 +16,7 @@
 #include "portability/instr_time.h"
 #include "postmaster/pgarch.h"	/* for MAX_XFN_CHARS */
 #include "replication/conflict.h"
+#include "storage/relfilelocator.h"
 #include "utils/backend_progress.h" /* for backward compatibility */	/* IWYU pragma: export */
 #include "utils/backend_status.h"	/* for backward compatibility */	/* IWYU pragma: export */
 #include "utils/pgstat_kind.h"
@@ -34,6 +35,12 @@
 /* Default directory to store temporary statistics data in */
 #define PG_STAT_TMP_DIR		"pg_stat_tmp"
 
+/*
+ * Build a pgstat key Objid based on a RelFileLocator.
+ */
+#define RelFileLocatorToPgStatObjid(locator) \
+	(((uint64) (locator).spcOid << 32) | (locator).relNumber)
+
 /* Values for track_functions GUC variable --- order is significant! */
 typedef enum TrackFunctionsLevel
 {
@@ -173,11 +180,11 @@ typedef struct PgStat_TableCounts
  */
 typedef struct PgStat_TableStatus
 {
-	Oid			id;				/* table's OID */
-	bool		shared;			/* is it a shared catalog? */
+	uint64		id;				/* objid from RelFileLocatorToPgStatObjid() */
 	struct PgStat_TableXactStatus *trans;	/* lowest subxact's counts */
 	PgStat_TableCounts counts;	/* event counts to be sent */
 	Relation	relation;		/* rel that is using this entry */
+	RelFileLocator locator;		/* table's relfilelocator */
 } PgStat_TableStatus;
 
 /* ----------
@@ -664,8 +671,8 @@ extern void pgstat_init_relation(Relation rel);
 extern void pgstat_assoc_relation(Relation rel);
 extern void pgstat_unlink_relation(Relation rel);
 
-extern void pgstat_report_vacuum(Oid tableoid, bool shared,
-								 PgStat_Counter livetuples, PgStat_Counter deadtuples,
+extern void pgstat_report_vacuum(RelFileLocator locator, PgStat_Counter livetuples,
+								 PgStat_Counter deadtuples,
 								 TimestampTz starttime);
 extern void pgstat_report_analyze(Relation rel,
 								  PgStat_Counter livetuples, PgStat_Counter deadtuples,
@@ -730,8 +737,7 @@ extern void pgstat_twophase_postabort(FullTransactionId fxid, uint16 info,
 									  void *recdata, uint32 len);
 
 extern PgStat_StatTabEntry *pgstat_fetch_stat_tabentry(Oid relid);
-extern PgStat_StatTabEntry *pgstat_fetch_stat_tabentry_ext(bool shared,
-														   Oid reloid);
+extern PgStat_StatTabEntry *pgstat_fetch_stat_tabentry_ext(Oid reloid);
 extern PgStat_TableStatus *find_tabstat_entry(Oid rel_id);
 
 
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 4d2b8aa6081..1252249231d 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -718,6 +718,7 @@ extern void PostPrepare_PgStat_Relations(PgStat_SubXactStatus *xact_state);
 extern bool pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait);
 extern void pgstat_relation_delete_pending_cb(PgStat_EntryRef *entry_ref);
 extern void pgstat_relation_reset_timestamp_cb(PgStatShared_Common *header, TimestampTz ts);
+extern bool pgstat_reloid_to_relfilelocator(Oid reloid, RelFileLocator *locator);
 
 
 /*
diff --git a/src/test/recovery/t/029_stats_restart.pl b/src/test/recovery/t/029_stats_restart.pl
index 021e2bf361f..3a9c05eaf10 100644
--- a/src/test/recovery/t/029_stats_restart.pl
+++ b/src/test/recovery/t/029_stats_restart.pl
@@ -55,10 +55,10 @@ trigger_funcrel_stat();
 
 # verify stats objects exist
 $sect = "initial";
-is(have_stats('database', $dboid, 0), 't', "$sect: db stats do exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats($connect_db, 'database', $dboid, 0), 't', "$sect: db stats do exist");
+is(have_stats($db_under_test, 'function', $dboid, $funcoid),
 	't', "$sect: function stats do exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats($db_under_test, 'relation', $dboid, $tableoid),
 	't', "$sect: relation stats do exist");
 
 # regular shutdown
@@ -79,10 +79,10 @@ copy($og_stats, $statsfile) or die "Copy failed: $!";
 $node->start;
 
 $sect = "copy";
-is(have_stats('database', $dboid, 0), 't', "$sect: db stats do exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats($connect_db, 'database', $dboid, 0), 't', "$sect: db stats do exist");
+is(have_stats($db_under_test, 'function', $dboid, $funcoid),
 	't', "$sect: function stats do exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats($db_under_test, 'relation', $dboid, $tableoid),
 	't', "$sect: relation stats do exist");
 
 $node->stop('immediate');
@@ -96,10 +96,10 @@ $node->start;
 
 # stats should have been discarded
 $sect = "post immediate";
-is(have_stats('database', $dboid, 0), 'f', "$sect: db stats do not exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats($connect_db, 'database', $dboid, 0), 'f', "$sect: db stats do not exist");
+is(have_stats($db_under_test, 'function', $dboid, $funcoid),
 	'f', "$sect: function stats do exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats($db_under_test, 'relation', $dboid, $tableoid),
 	'f', "$sect: relation stats do not exist");
 
 # get rid of backup statsfile
@@ -110,10 +110,10 @@ unlink $statsfile or die "cannot unlink $statsfile $!";
 trigger_funcrel_stat();
 
 $sect = "post immediate, new";
-is(have_stats('database', $dboid, 0), 't', "$sect: db stats do exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats($connect_db, 'database', $dboid, 0), 't', "$sect: db stats do exist");
+is(have_stats($db_under_test, 'function', $dboid, $funcoid),
 	't', "$sect: function stats do exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats($db_under_test, 'relation', $dboid, $tableoid),
 	't', "$sect: relation stats do exist");
 
 # regular shutdown
@@ -129,10 +129,10 @@ $node->start;
 
 # no stats present due to invalid stats file
 $sect = "invalid_overwrite";
-is(have_stats('database', $dboid, 0), 'f', "$sect: db stats do not exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats($connect_db, 'database', $dboid, 0), 'f', "$sect: db stats do not exist");
+is(have_stats($db_under_test, 'function', $dboid, $funcoid),
 	'f', "$sect: function stats do not exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats($db_under_test, 'relation', $dboid, $tableoid),
 	'f', "$sect: relation stats do not exist");
 
 
@@ -145,10 +145,10 @@ append_file($og_stats, "XYZ");
 $node->start;
 
 $sect = "invalid_append";
-is(have_stats('database', $dboid, 0), 'f', "$sect: db stats do not exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats($connect_db, 'database', $dboid, 0), 'f', "$sect: db stats do not exist");
+is(have_stats($db_under_test, 'function', $dboid, $funcoid),
 	'f', "$sect: function stats do not exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats($db_under_test, 'relation', $dboid, $tableoid),
 	'f', "$sect: relation stats do not exist");
 
 
@@ -307,9 +307,9 @@ sub trigger_funcrel_stat
 
 sub have_stats
 {
-	my ($kind, $dboid, $objid) = @_;
+	my ($db, $kind, $dboid, $objid) = @_;
 
-	return $node->safe_psql($connect_db,
+	return $node->safe_psql($db,
 		"SELECT pg_stat_have_stats('$kind', $dboid, $objid)");
 }
 
-- 
2.34.1
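
As a reading aid for the pgstat.h hunk above: RelFileLocatorToPgStatObjid() packs the tablespace OID into the upper 32 bits of the stats key and the relfilenumber into the lower 32 bits, while the database OID is passed as a separate key field (see the pgstat_fetch_entry() call in pgstat_copy_relation_stats()). Below is a minimal standalone sketch of that packing, assuming 32-bit Oid and RelFileNumber. The Demo*/demo_* names and the two decoding helpers are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for PostgreSQL's Oid and RelFileNumber (assumed 32-bit). */
typedef uint32_t DemoOid;
typedef uint32_t DemoRelFileNumber;

/* Hypothetical stand-in for RelFileLocator. */
typedef struct DemoRelFileLocator
{
    DemoOid           spcOid;     /* tablespace */
    DemoOid           dbOid;      /* database, kept outside the packed key */
    DemoRelFileNumber relNumber;  /* relation file number */
} DemoRelFileLocator;

/* Same expression as the macro: spcOid in the high word, relNumber in the low word. */
static uint64_t
demo_locator_to_objid(DemoRelFileLocator locator)
{
    return ((uint64_t) locator.spcOid << 32) | locator.relNumber;
}

/* Hypothetical inverse helpers, only to show that the packing loses nothing. */
static DemoOid
demo_objid_to_spcoid(uint64_t objid)
{
    return (DemoOid) (objid >> 32);
}

static DemoRelFileNumber
demo_objid_to_relnumber(uint64_t objid)
{
    return (DemoRelFileNumber) (objid & UINT64_C(0xFFFFFFFF));
}
```

Since both halves are 32 bits wide, two locators in the same database collide on this key only if they agree on both tablespace and relfilenumber.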

v7-0003-handle-relation-statistics-correctly-during-rewri.patch (text/x-diff; charset=us-ascii)
From 8f748e64a89603f9e7543ff0da30ceccaf7d5edb Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Tue, 4 Nov 2025 13:52:46 +0000
Subject: [PATCH v7 3/3] handle relation statistics correctly during rewrites

Now that PGSTAT_KIND_RELATION is keyed by relfilenode, we need to handle rewrites.

To do so, this patch:

- Adds PgStat_PendingRewrite, a new struct to track rewrite operations within
a transaction, storing the old locator, new locator, and original locator (for
rewrite chains). This allows stats to be copied from the original location to
the final location at commit time.

- Adds a new function, pgstat_mark_rewrite(), called when a table rewrite begins.
It records the rewrite operation in a local list and detects rewrite chains by
checking if the old_locator matches any existing new_locator, preserving the
chain's original_locator.

- Modifies pgstat_copy_relation_stats() to accept RelFileLocators instead of
Relations, with a new increment parameter to accumulate stats (needed for rewrite
chains with DML between rewrites).

- Ensures that AtEOXact_PgStat_Relations(), AtPrepare_PgStat_Relations(),
pgstat_twophase_postcommit()/postabort() and pgstat_drop_relation() handle the
PgStat_PendingRewrite list correctly.

Note that due to the new flush call in pgstat_twophase_postcommit() we cannot
call GetCurrentTransactionStopTimestamp() in pgstat_relation_flush_cb(). So,
add a check to handle this special case and call GetCurrentTimestamp() instead.
We'd call GetCurrentTimestamp() only if there is a rewrite, so its extra cost
should be negligible. Another solution could be to trigger the flush from
FinishPreparedTransaction() but that's not worth the extra complexity.

The new pending_rewrites list is traversed in multiple places. In typical usage
the list contains only a few entries, so the traversal cost should be
negligible, especially in comparison to the cost of a rewrite.
---
 src/backend/catalog/index.c                  |   2 +-
 src/backend/commands/cluster.c               |   5 +
 src/backend/commands/tablecmds.c             |   6 +
 src/backend/utils/activity/pgstat_relation.c | 391 ++++++++++++++++++-
 src/backend/utils/activity/pgstat_xact.c     |  25 +-
 src/backend/utils/cache/relcache.c           |   6 +
 src/include/pgstat.h                         |   5 +-
 src/tools/pgindent/typedefs.list             |   1 +
 8 files changed, 424 insertions(+), 17 deletions(-)
  92.8% src/backend/utils/activity/
   4.9% src/backend/

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 5d9db167e59..8b6a7652fcf 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1795,7 +1795,7 @@ index_concurrently_swap(Oid newIndexId, Oid oldIndexId, const char *oldName)
 	changeDependenciesOn(RelationRelationId, oldIndexId, newIndexId);
 
 	/* copy over statistics from old to new index */
-	pgstat_copy_relation_stats(newClassRel, oldClassRel);
+	pgstat_copy_relation_stats(newClassRel->rd_locator, oldClassRel->rd_locator, false);
 
 	/* Copy data of pg_statistic from the old index to the new one */
 	CopyStatistics(oldIndexId, newIndexId);
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index b55221d44cd..da75dfa6ab8 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -1196,6 +1196,11 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
 
 		rel1 = relation_open(r1, NoLock);
 		rel2 = relation_open(r2, NoLock);
+
+		/* Mark that a rewrite happened */
+		if (RELKIND_HAS_STORAGE(rel1->rd_rel->relkind))
+			pgstat_mark_rewrite(rel1->rd_locator, rel2->rd_locator);
+
 		rel2->rd_createSubid = rel1->rd_createSubid;
 		rel2->rd_newRelfilelocatorSubid = rel1->rd_newRelfilelocatorSubid;
 		rel2->rd_firstRelfilelocatorSubid = rel1->rd_firstRelfilelocatorSubid;
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 3aac459e483..540923452fb 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -16848,6 +16848,7 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
 	Oid			reltoastrelid;
 	RelFileNumber newrelfilenumber;
 	RelFileLocator newrlocator;
+	RelFileLocator oldrlocator;
 	List	   *reltoastidxids = NIL;
 	ListCell   *lc;
 
@@ -16886,6 +16887,7 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
 	newrlocator = rel->rd_locator;
 	newrlocator.relNumber = newrelfilenumber;
 	newrlocator.spcOid = newTableSpace;
+	oldrlocator = rel->rd_locator;
 
 	/* hand off to AM to actually create new rel storage and copy the data */
 	if (rel->rd_rel->relkind == RELKIND_INDEX)
@@ -16898,6 +16900,10 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
 		table_relation_copy_data(rel, &newrlocator);
 	}
 
+	/* mark that a rewrite happened */
+	if (RELKIND_HAS_STORAGE(rel->rd_rel->relkind))
+		pgstat_mark_rewrite(oldrlocator, newrlocator);
+
 	/*
 	 * Update the pg_class row.
 	 *
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index 7debb14bb5d..15b4663eb77 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -30,6 +30,19 @@
 #include "utils/syscache.h"
 #include "utils/timestamp.h"
 
+/* Pending rewrite operations for stats copying */
+typedef struct PgStat_PendingRewrite
+{
+	RelFileLocator old_locator;
+	RelFileLocator new_locator;
+	RelFileLocator original_locator;
+	int			nest_level;		/* Transaction nesting level where rewrite
+								 * occurred */
+	struct PgStat_PendingRewrite *next;
+} PgStat_PendingRewrite;
+
+/* The pending rewrites list for current transaction */
+static PgStat_PendingRewrite *pending_rewrites = NULL;
 
 /* Record that's written to 2PC state file when pgstat state is persisted */
 typedef struct TwoPhasePgStatRecord
@@ -43,6 +56,8 @@ typedef struct TwoPhasePgStatRecord
 	PgStat_Counter deleted_pre_truncdrop;
 	RelFileLocator locator;		/* table's rd_locator */
 	bool		truncdropped;	/* was the relation truncated/dropped? */
+	RelFileLocator rewrite_old_locator;
+	int			rewrite_nest_level;
 } TwoPhasePgStatRecord;
 
 
@@ -54,27 +69,70 @@ static void restore_truncdrop_counters(PgStat_TableXactStatus *trans);
 
 
 /*
- * Copy stats between relations. This is used for things like REINDEX
+ * Copy stats between RelFileLocators. This is used for things like REINDEX
  * CONCURRENTLY.
  */
 void
-pgstat_copy_relation_stats(Relation dst, Relation src)
+pgstat_copy_relation_stats(RelFileLocator dst, RelFileLocator src, bool increment)
 {
 	PgStat_StatTabEntry *srcstats;
 	PgStatShared_Relation *dstshstats;
 	PgStat_EntryRef *dst_ref;
 
-	srcstats = pgstat_fetch_stat_tabentry_ext(RelationGetRelid(src));
+	srcstats = (PgStat_StatTabEntry *) pgstat_fetch_entry(PGSTAT_KIND_RELATION,
+														  src.dbOid,
+														  RelFileLocatorToPgStatObjid(src));
 	if (!srcstats)
 		return;
 
 	dst_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_RELATION,
-										  dst->rd_rel->relisshared ? InvalidOid : MyDatabaseId,
-										  RelationGetRelid(dst),
+										  dst.dbOid,
+										  RelFileLocatorToPgStatObjid(dst),
 										  false);
 
 	dstshstats = (PgStatShared_Relation *) dst_ref->shared_stats;
-	dstshstats->stats = *srcstats;
+
+	if (!increment)
+		dstshstats->stats = *srcstats;
+	else
+	{
+		/* Increment those statistics */
+#define RELFSTAT_ACC(fld, stats_to_add) \
+	(dstshstats->stats.fld += stats_to_add->fld)
+		RELFSTAT_ACC(numscans, srcstats);
+		RELFSTAT_ACC(tuples_returned, srcstats);
+		RELFSTAT_ACC(tuples_fetched, srcstats);
+		RELFSTAT_ACC(tuples_inserted, srcstats);
+		RELFSTAT_ACC(tuples_updated, srcstats);
+		RELFSTAT_ACC(tuples_deleted, srcstats);
+		RELFSTAT_ACC(tuples_hot_updated, srcstats);
+		RELFSTAT_ACC(tuples_newpage_updated, srcstats);
+		RELFSTAT_ACC(live_tuples, srcstats);
+		RELFSTAT_ACC(dead_tuples, srcstats);
+		RELFSTAT_ACC(mod_since_analyze, srcstats);
+		RELFSTAT_ACC(ins_since_vacuum, srcstats);
+		RELFSTAT_ACC(blocks_fetched, srcstats);
+		RELFSTAT_ACC(blocks_hit, srcstats);
+		RELFSTAT_ACC(vacuum_count, srcstats);
+		RELFSTAT_ACC(autovacuum_count, srcstats);
+		RELFSTAT_ACC(analyze_count, srcstats);
+		RELFSTAT_ACC(autoanalyze_count, srcstats);
+		RELFSTAT_ACC(total_vacuum_time, srcstats);
+		RELFSTAT_ACC(total_autovacuum_time, srcstats);
+		RELFSTAT_ACC(total_analyze_time, srcstats);
+		RELFSTAT_ACC(total_autoanalyze_time, srcstats);
+#undef RELFSTAT_ACC
+
+		/* Replace those statistics */
+#define RELFSTAT_REP(fld, stats_to_rep) \
+	(dstshstats->stats.fld = stats_to_rep->fld)
+		RELFSTAT_REP(lastscan, srcstats);
+		RELFSTAT_REP(last_vacuum_time, srcstats);
+		RELFSTAT_REP(last_autovacuum_time, srcstats);
+		RELFSTAT_REP(last_analyze_time, srcstats);
+		RELFSTAT_REP(last_autoanalyze_time, srcstats);
+#undef RELFSTAT_REP
+	}
 
 	pgstat_unlock_entry(dst_ref);
 }
@@ -136,6 +194,7 @@ void
 pgstat_assoc_relation(Relation rel)
 {
 	RelFileLocator locator;
+	PgStat_TableStatus *pgstat_info;
 
 	Assert(rel->pgstat_enabled);
 	Assert(rel->pgstat_info == NULL);
@@ -165,14 +224,54 @@ pgstat_assoc_relation(Relation rel)
 		locator.relNumber = rel->rd_id;
 	}
 
+	/*
+	 * If this relation was rewritten during the current transaction we may be
+	 * reopening it with its new RelFileLocator. In that case, continue using
+	 * the stats entry associated with the old locator rather than creating a
+	 * new one. This ensures all stats from before and after the rewrite are
+	 * tracked in a single entry which will be properly copied to the new
+	 * locator at transaction commit.
+	 */
+	if (pending_rewrites != NULL)
+	{
+		PgStat_PendingRewrite *rewrite;
+
+		for (rewrite = pending_rewrites; rewrite != NULL; rewrite = rewrite->next)
+		{
+			if (locator.dbOid == rewrite->new_locator.dbOid &&
+				locator.spcOid == rewrite->new_locator.spcOid &&
+				locator.relNumber == rewrite->new_locator.relNumber)
+			{
+				pgstat_info = pgstat_prep_relation_pending(rewrite->old_locator);
+				goto found_entry;
+			}
+		}
+	}
+
 	/* Else find or make the PgStat_TableStatus entry, and update link */
-	rel->pgstat_info = pgstat_prep_relation_pending(locator);
+	pgstat_info = pgstat_prep_relation_pending(locator);
+
+found_entry:
+	rel->pgstat_info = pgstat_info;
+
+	/*
+	 * For relation stats, we key by physical file location, not by relation
+	 * OID. This means during operations like ALTER TYPE it's possible that
+	 * the relation OID changes but the relfilenode stays the same (no actual
+	 * rewrite needed). Unlink the old relation first.
+	 */
+	if (pgstat_info->relation != NULL &&
+		pgstat_info->relation != rel)
+	{
+		pgstat_info->relation->pgstat_info = NULL;
+		pgstat_info->relation = NULL;
+	}
 
 	/* don't allow link a stats to multiple relcache entries */
-	Assert(rel->pgstat_info->relation == NULL);
+	Assert(pgstat_info->relation == NULL);
 
 	/* mark this relation as the owner */
-	rel->pgstat_info->relation = rel;
+	pgstat_info->relation = rel;
 }
 
 /*
@@ -215,14 +314,37 @@ pgstat_drop_relation(Relation rel)
 {
 	int			nest_level = GetCurrentTransactionNestLevel();
 	PgStat_TableStatus *pgstat_info;
+	bool		skip_transactional_drop = false;
 
 	/* don't track stats for relations without storage */
 	if (!RELKIND_HAS_STORAGE(rel->rd_rel->relkind))
 		return;
 
-	pgstat_drop_transactional(PGSTAT_KIND_RELATION,
-							  rel->rd_locator.dbOid,
-							  RelFileLocatorToPgStatObjid(rel->rd_locator));
+	/* Check if this drop is part of a pending rewrite */
+	if (pending_rewrites != NULL)
+	{
+		PgStat_PendingRewrite *rewrite;
+
+		for (rewrite = pending_rewrites; rewrite != NULL; rewrite = rewrite->next)
+		{
+			if (rel->rd_locator.dbOid == rewrite->old_locator.dbOid &&
+				rel->rd_locator.spcOid == rewrite->old_locator.spcOid &&
+				rel->rd_locator.relNumber == rewrite->old_locator.relNumber)
+			{
+				skip_transactional_drop = true;
+				break;
+			}
+		}
+	}
+
+	/*
+	 * If it is part of a rewrite, drop its stats later, for example in
+	 * AtEOXact_PgStat_Relations(), so skip it here.
+	 */
+	if (!skip_transactional_drop)
+		pgstat_drop_transactional(PGSTAT_KIND_RELATION,
+								  rel->rd_locator.dbOid,
+								  RelFileLocatorToPgStatObjid(rel->rd_locator));
 
 	if (!pgstat_should_count_relation(rel))
 		return;
@@ -660,6 +782,48 @@ AtEOXact_PgStat_Relations(PgStat_SubXactStatus *xact_state, bool isCommit)
 		}
 		tabstat->trans = NULL;
 	}
+
+	/* preserve the stats in case of rewrite */
+	if (isCommit && pending_rewrites != NULL)
+	{
+		PgStat_PendingRewrite *rewrite;
+		PgStat_PendingRewrite *prev = NULL;
+		PgStat_PendingRewrite *current = pending_rewrites;
+		PgStat_PendingRewrite *next;
+
+		/* reverse the rewrites list to process in chronological order */
+		while (current != NULL)
+		{
+			next = current->next;
+			current->next = prev;
+			prev = current;
+			current = next;
+		}
+
+		/* now process rewrites in chronological order */
+		for (rewrite = prev; rewrite != NULL; rewrite = rewrite->next)
+		{
+			PgStat_EntryRef *old_entry_ref;
+
+			old_entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION,
+													   rewrite->old_locator.dbOid,
+													   RelFileLocatorToPgStatObjid(rewrite->old_locator));
+
+			if (old_entry_ref && old_entry_ref->pending)
+				pgstat_relation_flush_cb(old_entry_ref, false);
+
+			pgstat_copy_relation_stats(rewrite->new_locator,
+									   rewrite->old_locator, true);
+
+			/* drop old locator's stats */
+			if (!pgstat_drop_entry(PGSTAT_KIND_RELATION,
+								   rewrite->old_locator.dbOid,
+								   RelFileLocatorToPgStatObjid(rewrite->old_locator)))
+				pgstat_request_entry_refs_gc();
+		}
+	}
+
+	pending_rewrites = NULL;
 }
 
 /*
@@ -675,6 +839,30 @@ AtEOSubXact_PgStat_Relations(PgStat_SubXactStatus *xact_state, bool isCommit, in
 	PgStat_TableXactStatus *trans;
 	PgStat_TableXactStatus *next_trans;
 
+	/*
+	 * If we don't commit then remove the associated rewrites if any, to keep
+	 * the rewrite chain in sync with what will be eventually committed.
+	 */
+	if (!isCommit)
+	{
+		PgStat_PendingRewrite **rewrite_ptr = &pending_rewrites;
+
+		while (*rewrite_ptr != NULL)
+		{
+			if ((*rewrite_ptr)->nest_level >= nestDepth)
+			{
+				PgStat_PendingRewrite *to_remove = *rewrite_ptr;
+
+				*rewrite_ptr = (*rewrite_ptr)->next;
+				pfree(to_remove);
+			}
+			else
+			{
+				rewrite_ptr = &((*rewrite_ptr)->next);
+			}
+		}
+	}
+
 	for (trans = xact_state->first; trans != NULL; trans = next_trans)
 	{
 		PgStat_TableStatus *tabstat;
@@ -754,11 +942,19 @@ void
 AtPrepare_PgStat_Relations(PgStat_SubXactStatus *xact_state)
 {
 	PgStat_TableXactStatus *trans;
+	PgStat_PendingRewrite *rewrite;
 
+	/*
+	 * For each tabstat, find its matching rewrite and remove it from the
+	 * pending rewrites list. This way, after processing all tabstats, pending
+	 * rewrites will only contain rewrite-only transactions.
+	 */
 	for (trans = xact_state->first; trans != NULL; trans = trans->next)
 	{
 		PgStat_TableStatus *tabstat PG_USED_FOR_ASSERTS_ONLY;
 		TwoPhasePgStatRecord record;
+		PgStat_PendingRewrite **rewrite_ptr;
+		bool		found_rewrite = false;
 
 		Assert(trans->nest_level == 1);
 		Assert(trans->upper == NULL);
@@ -778,10 +974,83 @@ AtPrepare_PgStat_Relations(PgStat_SubXactStatus *xact_state)
 			record.locator = tabstat->locator;
 
 		record.truncdropped = trans->truncdropped;
+		record.rewrite_nest_level = 0;
+
+		/*
+		 * Look for a matching rewrite and remove it from pending rewrites. We
+		 * check three possible matches:
+		 *
+		 * The new_locator when stats have been added after the rewrite. The
+		 * old_locator when stats have been added before the rewrite but not
+		 * after. The original_locator when this tabstat is part of a rewrite
+		 * chain.
+		 */
+		rewrite_ptr = &pending_rewrites;
+		while (*rewrite_ptr != NULL)
+		{
+			rewrite = *rewrite_ptr;
+
+			if ((record.locator.dbOid == rewrite->new_locator.dbOid &&
+				 record.locator.spcOid == rewrite->new_locator.spcOid &&
+				 record.locator.relNumber == rewrite->new_locator.relNumber) ||
+				(tabstat->locator.dbOid == rewrite->old_locator.dbOid &&
+				 tabstat->locator.spcOid == rewrite->old_locator.spcOid &&
+				 tabstat->locator.relNumber == rewrite->old_locator.relNumber) ||
+				(tabstat->locator.dbOid == rewrite->original_locator.dbOid &&
+				 tabstat->locator.spcOid == rewrite->original_locator.spcOid &&
+				 tabstat->locator.relNumber == rewrite->original_locator.relNumber))
+			{
+				/*
+				 * Found matching rewrite. Record the rewrite information and
+				 * remove this rewrite from the list since it's now handled.
+				 */
+				record.rewrite_old_locator = rewrite->original_locator;
+				record.rewrite_nest_level = rewrite->nest_level;
+				record.locator = rewrite->new_locator;
+				found_rewrite = true;
+
+				/* Remove from pending_rewrites list */
+				*rewrite_ptr = rewrite->next;
+				pfree(rewrite);
+				break;
+			}
+			else
+			{
+				/* Move to next rewrite in the list */
+				rewrite_ptr = &(rewrite->next);
+			}
+		}
+
+		/* If no rewrite found, clear the rewrite fields */
+		if (!found_rewrite)
+		{
+			memset(&record.rewrite_old_locator, 0, sizeof(RelFileLocator));
+		}
+
+		RegisterTwoPhaseRecord(TWOPHASE_RM_PGSTAT_ID, 0,
+							   &record, sizeof(TwoPhasePgStatRecord));
+	}
+
+	/*
+	 * Now process any rewrites still pending. These are rewrite-only
+	 * transactions. We need to preserve their stats even though there's no
+	 * tabstat entry for them.
+	 */
+	for (rewrite = pending_rewrites; rewrite != NULL; rewrite = rewrite->next)
+	{
+		TwoPhasePgStatRecord record;
+
+		memset(&record, 0, sizeof(TwoPhasePgStatRecord));
+		record.locator = rewrite->new_locator;
+		record.rewrite_old_locator = rewrite->original_locator;
+		record.rewrite_nest_level = rewrite->nest_level;
+		record.truncdropped = false;
 
 		RegisterTwoPhaseRecord(TWOPHASE_RM_PGSTAT_ID, 0,
 							   &record, sizeof(TwoPhasePgStatRecord));
 	}
+
+	pending_rewrites = NULL;
 }
 
 /*
@@ -804,6 +1073,8 @@ PostPrepare_PgStat_Relations(PgStat_SubXactStatus *xact_state)
 		tabstat = trans->parent;
 		tabstat->trans = NULL;
 	}
+
+	pending_rewrites = NULL;
 }
 
 /*
@@ -839,6 +1110,29 @@ pgstat_twophase_postcommit(FullTransactionId fxid, uint16 info,
 	pgstat_info->counts.changed_tuples +=
 		rec->tuples_inserted + rec->tuples_updated +
 		rec->tuples_deleted;
+
+	if (rec->rewrite_nest_level > 0)
+	{
+		PgStat_EntryRef *old_entry_ref;
+
+		/* Flush any pending stats for old locator first */
+		old_entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION,
+												   rec->rewrite_old_locator.dbOid,
+												   RelFileLocatorToPgStatObjid(rec->rewrite_old_locator));
+
+		if (old_entry_ref && old_entry_ref->pending)
+			pgstat_relation_flush_cb(old_entry_ref, false);
+
+		/* Copy stats from old to new locator */
+		pgstat_copy_relation_stats(rec->locator, rec->rewrite_old_locator,
+								   true);
+
+		/* Drop old locator's stats */
+		if (!pgstat_drop_entry(PGSTAT_KIND_RELATION,
+							   rec->rewrite_old_locator.dbOid,
+							   RelFileLocatorToPgStatObjid(rec->rewrite_old_locator)))
+			pgstat_request_entry_refs_gc();
+	}
 }
 
 /*
@@ -853,9 +1147,26 @@ pgstat_twophase_postabort(FullTransactionId fxid, uint16 info,
 {
 	TwoPhasePgStatRecord *rec = (TwoPhasePgStatRecord *) recdata;
 	PgStat_TableStatus *pgstat_info;
+	RelFileLocator target_locator;
+
+	/*
+	 * For aborted transactions with rewrites (like TRUNCATE), we need to
+	 * restore stats to the old locator, not the new one. The new locator
+	 * should be dropped since the rewrite is being rolled back.
+	 */
+	if (rec->rewrite_nest_level > 0)
+	{
+		/* Use the old locator */
+		target_locator = rec->rewrite_old_locator;
+	}
+	else
+	{
+		/* No rewrite, use the original locator */
+		target_locator = rec->locator;
+	}
 
 	/* Find or create a tabstat entry for the target locator */
-	pgstat_info = pgstat_prep_relation_pending(rec->locator);
+	pgstat_info = pgstat_prep_relation_pending(target_locator);
 
 	/* Same math as in AtEOXact_PgStat, abort case */
 	if (rec->truncdropped)
@@ -910,7 +1221,17 @@ pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
 	tabentry->numscans += lstats->counts.numscans;
 	if (lstats->counts.numscans)
 	{
-		TimestampTz t = GetCurrentTransactionStopTimestamp();
+		TimestampTz t;
+
+		/*
+		 * Checking the transaction state due to the flush call in
+		 * pgstat_twophase_postcommit() that would break the assertion on the
+		 * state in GetCurrentTransactionStopTimestamp().
+		 */
+		if (!IsTransactionState())
+			t = GetCurrentTransactionStopTimestamp();
+		else
+			t = GetCurrentTimestamp();
 
 		if (t > tabentry->lastscan)
 			tabentry->lastscan = t;
@@ -1162,3 +1483,45 @@ pgstat_reloid_to_relfilelocator(Oid reloid, RelFileLocator *locator)
 	ReleaseSysCache(tuple);
 	return result;
 }
+
+/*
+ * Mark that a relation rewrite has occurred, preserving the original locator
+ * so stats can be copied at transaction commit.
+ */
+void
+pgstat_mark_rewrite(RelFileLocator old_locator, RelFileLocator new_locator)
+{
+	PgStat_PendingRewrite *rewrite;
+	PgStat_PendingRewrite *existing;
+	RelFileLocator original_locator = old_locator;
+
+	for (existing = pending_rewrites; existing != NULL; existing = existing->next)
+	{
+		if (old_locator.dbOid == existing->new_locator.dbOid &&
+			old_locator.spcOid == existing->new_locator.spcOid &&
+			old_locator.relNumber == existing->new_locator.relNumber)
+		{
+			original_locator = existing->original_locator;
+			break;
+		}
+	}
+
+	/* Allocate in TopTransactionContext memory context */
+	rewrite = MemoryContextAlloc(TopTransactionContext,
+								 sizeof(PgStat_PendingRewrite));
+
+	rewrite->old_locator = old_locator;
+	rewrite->new_locator = new_locator;
+	rewrite->original_locator = original_locator;
+	rewrite->nest_level = GetCurrentTransactionNestLevel();
+
+	/* Add to the list */
+	rewrite->next = pending_rewrites;
+	pending_rewrites = rewrite;
+}
+
+void
+pgstat_clear_rewrite(void)
+{
+	pending_rewrites = NULL;
+}
diff --git a/src/backend/utils/activity/pgstat_xact.c b/src/backend/utils/activity/pgstat_xact.c
index bc9864bd8d9..f8cf3755ce2 100644
--- a/src/backend/utils/activity/pgstat_xact.c
+++ b/src/backend/utils/activity/pgstat_xact.c
@@ -55,6 +55,8 @@ AtEOXact_PgStat(bool isCommit, bool parallel)
 	}
 	pgStatXactStack = NULL;
 
+	pgstat_clear_rewrite();
+
 	/* Make sure any stats snapshot is thrown away */
 	pgstat_clear_snapshot();
 }
@@ -360,8 +362,29 @@ create_drop_transactional_internal(PgStat_Kind kind, Oid dboid, uint64 objid, bo
 void
 pgstat_create_transactional(PgStat_Kind kind, Oid dboid, uint64 objid)
 {
-	if (pgstat_get_entry_ref(kind, dboid, objid, false, NULL))
+	PgStat_EntryRef *entry_ref;
+
+	entry_ref = pgstat_get_entry_ref(kind, dboid, objid, false, NULL);
+
+	if (entry_ref)
 	{
+		/*
+		 * For relation stats, we key by physical file location, not by
+		 * relation OID. This means during operations like ALTER TYPE where
+		 * the relation OID changes but the relfilenode stays the same (no
+		 * actual rewrite needed), we'll find an existing entry.
+		 *
+		 * This is expected behavior: we want to preserve stats across the
+		 * catalog change. Simply reset and recreate the entry for the new
+		 * relation OID without warning.
+		 */
+		if (kind == PGSTAT_KIND_RELATION)
+		{
+			pgstat_reset(kind, dboid, objid);
+			create_drop_transactional_internal(kind, dboid, objid, true);
+			return;
+		}
+
 		ereport(WARNING,
 				errmsg("resetting existing statistics for kind %s, db=%u, oid=%" PRIu64,
 					   (pgstat_get_kind_info(kind))->name, dboid,
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 915d0bc9084..7a7f8023eb3 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -85,6 +85,7 @@
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
+#include "utils/pgstat_internal.h"
 #include "utils/relmapper.h"
 #include "utils/resowner.h"
 #include "utils/snapmgr.h"
@@ -3780,6 +3781,7 @@ RelationSetNewRelfilenumber(Relation relation, char persistence)
 	MultiXactId minmulti = InvalidMultiXactId;
 	TransactionId freezeXid = InvalidTransactionId;
 	RelFileLocator newrlocator;
+	RelFileLocator oldrlocator = relation->rd_locator;
 
 	if (!IsBinaryUpgrade)
 	{
@@ -3951,6 +3953,10 @@ RelationSetNewRelfilenumber(Relation relation, char persistence)
 
 	table_close(pg_class, RowExclusiveLock);
 
+	/* Mark that a rewrite happened */
+	if (RELKIND_HAS_STORAGE(relation->rd_rel->relkind))
+		pgstat_mark_rewrite(oldrlocator, newrlocator);
+
 	/*
 	 * Make the pg_class row change or relation map change visible.  This will
 	 * cause the relcache entry to get updated, too.
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 5d0fe79f7e3..332dffde400 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -665,7 +665,7 @@ extern PgStat_FunctionCounts *find_funcstat_entry(Oid func_id);
 
 extern void pgstat_create_relation(Relation rel);
 extern void pgstat_drop_relation(Relation rel);
-extern void pgstat_copy_relation_stats(Relation dst, Relation src);
+extern void pgstat_copy_relation_stats(RelFileLocator dst, RelFileLocator src, bool increment);
 
 extern void pgstat_init_relation(Relation rel);
 extern void pgstat_assoc_relation(Relation rel);
@@ -677,6 +677,9 @@ extern void pgstat_report_vacuum(RelFileLocator locator, PgStat_Counter livetupl
 extern void pgstat_report_analyze(Relation rel,
 								  PgStat_Counter livetuples, PgStat_Counter deadtuples,
 								  bool resetcounter, TimestampTz starttime);
+extern void pgstat_mark_rewrite(RelFileLocator old_locator,
+								RelFileLocator new_locator);
+extern void pgstat_clear_rewrite(void);
 
 /*
  * If stats are enabled, but pending data hasn't been prepared yet, call
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 432509277c9..bd8fcb16dcf 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2248,6 +2248,7 @@ PgStat_KindInfo
 PgStat_LocalState
 PgStat_PendingDroppedStatsItem
 PgStat_PendingIO
+PgStat_PendingRewrite
 PgStat_SLRUStats
 PgStat_ShmemControl
 PgStat_Snapshot
-- 
2.34.1
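
The rewrite-chain bookkeeping described in the 0003 commit message (pgstat_mark_rewrite() plus the subtransaction-abort pruning in AtEOSubXact_PgStat_Relations()) can be sketched outside the backend as follows. This is only an illustration under stated assumptions: the Demo*/demo_* names are hypothetical, malloc()/free() stand in for TopTransactionContext allocation and pfree(), and the nesting level is passed in explicitly instead of being read from GetCurrentTransactionNestLevel().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for RelFileLocator. */
typedef struct DemoLocator
{
    uint32_t spcOid;
    uint32_t dbOid;
    uint32_t relNumber;
} DemoLocator;

/* Hypothetical stand-in for PgStat_PendingRewrite. */
typedef struct DemoRewrite
{
    DemoLocator old_locator;
    DemoLocator new_locator;
    DemoLocator original_locator;   /* first locator of the chain */
    int         nest_level;         /* nesting level of the rewrite */
    struct DemoRewrite *next;
} DemoRewrite;

static bool
demo_locator_equals(DemoLocator a, DemoLocator b)
{
    return a.spcOid == b.spcOid &&
        a.dbOid == b.dbOid &&
        a.relNumber == b.relNumber;
}

/*
 * Record a rewrite at the head of the list. If old_loc is the new_locator of
 * an earlier rewrite, this is a chain, so inherit that entry's
 * original_locator instead of starting a new chain.
 */
static void
demo_mark_rewrite(DemoRewrite **head, DemoLocator old_loc,
                  DemoLocator new_loc, int nest_level)
{
    DemoLocator  original = old_loc;
    DemoRewrite *r;

    for (r = *head; r != NULL; r = r->next)
    {
        if (demo_locator_equals(old_loc, r->new_locator))
        {
            original = r->original_locator;
            break;
        }
    }

    r = malloc(sizeof(DemoRewrite));
    if (r == NULL)
        abort();
    r->old_locator = old_loc;
    r->new_locator = new_loc;
    r->original_locator = original;
    r->nest_level = nest_level;
    r->next = *head;
    *head = r;
}

/* On subtransaction abort, unlink every rewrite recorded at nest_depth or deeper. */
static void
demo_abort_subxact(DemoRewrite **head, int nest_depth)
{
    DemoRewrite **ptr = head;

    while (*ptr != NULL)
    {
        if ((*ptr)->nest_level >= nest_depth)
        {
            DemoRewrite *dead = *ptr;

            *ptr = dead->next;
            free(dead);
        }
        else
            ptr = &(*ptr)->next;
    }
}
```

With rewrites A -> B at level 1 and B -> C at level 2, the second entry carries A as its original_locator. Aborting level 2 leaves only the A -> B entry, which matches the intent of keeping the chain in sync with what will eventually be committed.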

#40Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#39)
Re: relfilenode statistics

On Fri, Nov 07, 2025 at 11:28:27AM +0000, Bertrand Drouvot wrote:

> While there are existing rewrite tests, the stats behavior during rewrites
> doesn't have a good coverage. This patch adds some tests to record some stats
> after different rewrite scenarios.
>
> That way, we'll be able to test that the stats are still the ones we
> expect after rewrites. Note that it generates a new stats_1.out (which is quite
> large), so we may want to move those new tests to "isolation" instead.

Looking at this part of the patch set for now, not looked at the rest
yet. This new stats_1.out is 2k lines long, introduced for the tests
related to rewrites as an effect of 2PC. It seems to me that a split
into a new stats_rewrite would be justified for this case, to reduce
the output duplication.
--
Michael

#41Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#40)
Re: relfilenode statistics

On Sun, Nov 09, 2025 at 08:33:54AM +0900, Michael Paquier wrote:

Looking at this part of the patch set for now, not looked at the rest
yet. This new stats_1.out is 2k lines long, introduced for the tests
related to rewrites as an effect of 2PC. It seems to me that a split
into a new stats_rewrite would be justified for this case, to reduce
the output duplication.

The first patch had an issue with some of the tests checking for dead
tuples: if an autovacuum kicks in before querying the stats, we would
get a dead tuple number of 0. So I have expanded the tests a bit to
avoid autovacuum interactions, which should be enough to avoid noise.
I also did a split into a new file, which should be fine because we
don't rely on a system-wide stats reset, then applied the result.

The patch is spending a great deal of effort on three fronts:
- making sure that the statistics are copied over after a relation
rewrite.
- making sure that we assign a "correct" object ID, assigning
the fields of RelFileLocator based on a relation ID. Mapped and
shared relations make the exercise a bit more difficult. It would be
nice to avoid this kind of duplication with other code paths that
assign a RelFileLocator.
- Partitioned tables, where we don't have a relfilenode but we need to
track statistics. The patch relies on the relation oid to assign a
key, as far as I've read.

Among the three points, the first one is the most invasive in the
patch, it seems, and do we actually want to keep the stats across
rewrites at all? The main reason of doing the relfilenode move
would be to rebuild these stats on a WAL-record basis because the
relfile locator is the only thing we know in the startup process, and
once rewritten the state of the data is different.
relation_needs_vacanalyze() then cares about three fields:
- Number of dead tuples, which would be 0 after a rewrite.
- ins_since_vacuum, which would be 0 after a rewrite.
- mod_since_analyze, for analyze, again 0.
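
As a rough illustration of why these three fields matter, here is a minimal
standalone sketch of the threshold comparisons, assuming the documented
autovacuum formula (base threshold + scale factor * reltuples). The Toy* types
and toy_* functions are hypothetical stand-ins, not the real
relation_needs_vacanalyze(), which also handles wraparound, reloptions and
other details:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the three counters listed above. */
typedef struct ToyTabStats
{
	int64_t		dead_tuples;
	int64_t		ins_since_vacuum;
	int64_t		mod_since_analyze;
} ToyTabStats;

/* Hypothetical container for the autovacuum settings. */
typedef struct ToyAutovacOpts
{
	double		vac_base_thresh;		/* autovacuum_vacuum_threshold */
	double		vac_scale_factor;		/* autovacuum_vacuum_scale_factor */
	double		vac_ins_base_thresh;	/* autovacuum_vacuum_insert_threshold */
	double		vac_ins_scale_factor;	/* autovacuum_vacuum_insert_scale_factor */
	double		anl_base_thresh;		/* autovacuum_analyze_threshold */
	double		anl_scale_factor;		/* autovacuum_analyze_scale_factor */
} ToyAutovacOpts;

/* Simplified: vacuum if dead tuples or inserts exceed their thresholds. */
bool
toy_needs_vacuum(const ToyTabStats *s, const ToyAutovacOpts *o, double reltuples)
{
	double		vacthresh = o->vac_base_thresh + o->vac_scale_factor * reltuples;
	double		vacinsthresh = o->vac_ins_base_thresh +
		o->vac_ins_scale_factor * reltuples;

	return (double) s->dead_tuples > vacthresh ||
		(double) s->ins_since_vacuum > vacinsthresh;
}

/* Simplified: analyze if modifications exceed the analyze threshold. */
bool
toy_needs_analyze(const ToyTabStats *s, const ToyAutovacOpts *o, double reltuples)
{
	double		anlthresh = o->anl_base_thresh + o->anl_scale_factor * reltuples;

	return (double) s->mod_since_analyze > anlthresh;
}
```

With all three counters at 0, as they would be if the stats were not carried
over, neither function in this sketch returns true until new activity
accumulates.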

I have not checked the recent autovacuum scheduling thread to see if
this set changes there.

Are these numbers worth the effort of copying over at the end? Was
this particular point discussed? I've seen this mentioned once here,
but I am wondering what are the arguments in favor of copying the
stats data versus not copying it across rewrites:
/messages/by-id/20240607031736.7izmr2yirznvidka@awork3.anarazel.de
--
Michael

#42Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#41)
Re: relfilenode statistics

Hi,

On Mon, Nov 10, 2025 at 05:53:45PM +0900, Michael Paquier wrote:

On Sun, Nov 09, 2025 at 08:33:54AM +0900, Michael Paquier wrote:

Looking at this part of the patch set for now, not looked at the rest
yet. This new stats_1.out is 2k lines long, introduced for the tests
related to rewrites as an effect of 2PC. It seems to me that a split
into a new stats_rewrite would be justified for this case, to reduce
the output duplication.

did a split into a new file, which should also be fine because we
don't rely on a system-wide stats reset, then applied the result.

Thanks!

The patch is spending a great deal of effort on three fronts:
- making sure that the statistics are copied over after a relation
rewrite.

Right, in 0003.

- making sure that we assign a "correct" object ID, assigning
the fields of RelFileLocator based on a relation ID. Mapped and
shared relations make the exercise a bit more difficult. It would be
nice to avoid this kind of duplication with other code paths that
assign a RelFileLocator.

Are you referring to the new pgstat_reloid_to_relfilelocator() function?
If so, I'll try to avoid code duplication with other code paths as suggested.

- Partitioned tables, where we don't have a relfilenode but we need to
track statistics. The patch relies on the relation oid to assign a
key, as far as I've read.

Right. It's not doing that much in this area. It's needed so that things like
"last_analyze" on a partitioned table are populated (see "Ensure only the
partitioned table is analyzed" in vacuum.sql).
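
To make the keying concrete, here is a small standalone sketch of how the
objid is packed from a locator and what the synthetic locator for a
partitioned table looks like. The Toy* names are hypothetical stand-ins for
Oid, RelFileLocator and RelFileLocatorToPgStatObjid() so that it compiles
outside the tree; the packing mirrors the macro in the attached patch:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t ToyOid;		/* stand-in for Oid / RelFileNumber */

/* Stand-in for RelFileLocator (same three fields). */
typedef struct ToyRelFileLocator
{
	ToyOid		spcOid;			/* tablespace */
	ToyOid		dbOid;			/* database, 0 for shared relations */
	ToyOid		relNumber;		/* relation file number */
} ToyRelFileLocator;

/* Same packing as the patch's macro: spcOid in the high 32 bits. */
#define ToyLocatorToObjid(locator) \
	(((uint64_t) (locator).spcOid << 32) | (locator).relNumber)

/*
 * Synthetic locator for a partitioned table: no storage, so spcOid is
 * left invalid (0) and the relation OID is used as relNumber.
 */
ToyRelFileLocator
toy_partitioned_locator(ToyOid mydb, ToyOid reloid, bool isshared)
{
	ToyRelFileLocator locator;

	locator.dbOid = isshared ? 0 : mydb;
	locator.spcOid = 0;
	locator.relNumber = reloid;
	return locator;
}
```

In this sketch the objid of a partitioned table is numerically equal to its
relation OID, since the high 32 bits are zero, while two locators sharing a
relNumber but living in different tablespaces get different objids.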

Among the three points, the first one is the most invasive in the
patch, it seems, and do we actually want to keep the stats across
rewrites at all?

Not doing so would mean that all stats related to a relation will be lost after
a rewrite. I think that would be a major regression as compared to the current
behavior.

The main reason of doing the relfilenode move
would be to rebuild these stats on a WAL-record basis because the
relfile locator is the only thing we know in the startup process, and
once rewritten the state of the data is different.

relation_needs_vacanalyze() then cares about three fields:
- Number of dead tuples, which would be 0 after a rewrite.
- ins_since_vacuum, which would be 0 after a rewrite.
- mod_since_analyze, for analyze, again 0.

I have not checked the recent autovacuum scheduling thread to see if
this set changes there.

Are these numbers worth the effort of copying over at the end?

I think so, because not copying would impact all of the relation's other stats
(not only the ones linked to relation_needs_vacanalyze()).

Was
this particular point discussed? I've seen this mentioned once here,
but I am wondering what are the arguments in favor of copying the
stats data versus not copying it across rewrites:
/messages/by-id/20240607031736.7izmr2yirznvidka@awork3.anarazel.de

In favor of copying, I would say:

- no regression as compared to the current behavior. That means, for example,
not breaking DBAs' activities/decisions based on the pg_stat_all_tables fields
after a rewrite.

- a rewrite is not changing the number of dead tuples, ins_since_vacuum and
mod_since_analyze. So, if we don't copy those, then we'd change the
relation_needs_vacanalyze() decision(s) as compared to the current one(s) for no
reason (as a rewrite has no impact on those).

In favor of not copying, I would say: it makes the code simpler.

I'm in favor of copying but open to different points of view.
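
To illustrate the trade-off, here is a toy sketch of what "copying" means once
the stats key changes with the relfilenode. Everything below is hypothetical:
ToyCounters and toy_copy_stats() are illustrative names, and the meaning given
to the "increment" flag (add to the destination instead of overwriting it) is
an assumption based only on the prototype in the patch set, not on its actual
implementation:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of per-relation counters. */
typedef struct ToyCounters
{
	int64_t		dead_tuples;
	int64_t		ins_since_vacuum;
	int64_t		mod_since_analyze;
} ToyCounters;

/*
 * Carry counters from the old relfilenode's entry to the new one.
 * Assumed semantics: increment == true adds to whatever the destination
 * already holds, increment == false overwrites it.
 */
void
toy_copy_stats(ToyCounters *dst, const ToyCounters *src, bool increment)
{
	if (increment)
	{
		dst->dead_tuples += src->dead_tuples;
		dst->ins_since_vacuum += src->ins_since_vacuum;
		dst->mod_since_analyze += src->mod_since_analyze;
	}
	else
		*dst = *src;
}
```

Without such a step, the entry for the new relfilenode would simply start from
zero after the rewrite, which is the simpler option being weighed above.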

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#43Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Bertrand Drouvot (#42)
2 attachment(s)
Re: relfilenode statistics

Hi,

On Wed, Nov 12, 2025 at 05:03:55PM +0000, Bertrand Drouvot wrote:

In favor of not copying, I would say: it makes the code simpler.

I'm in favor of copying but open to different points of view.

PFA a mandatory rebase.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v8-0001-Key-PGSTAT_KIND_RELATION-by-relfile-locator.patch (text/x-diff; charset=us-ascii)
From 7908ba56cb8b6255b869af6be13077aa0315d5f1 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Wed, 1 Oct 2025 09:45:26 +0000
Subject: [PATCH v8 1/2] Key PGSTAT_KIND_RELATION by relfile locator

This patch changes the key used for the PGSTAT_KIND_RELATION statistic kind.
Instead of the relation oid, it now relies on:

- dboid (linked to RelFileLocator's dbOid)
- objoid which is the result of a new macro (namely RelFileLocatorToPgStatObjid())
that computes an objoid based on the RelFileLocator's spcOid and the
RelFileLocator's relNumber.

That will allow us to add new stats (for example, write counters) and ensure
that some counters (n_dead_tup and friends) are replicated.

The patch introduces pgstat_reloid_to_relfilelocator() to 1) avoid calling
RelationIdGetRelation() to get the relfilelocator based on the relation oid
and 2) handle the partitioned table case.

Please note that:

- when running pg_stat_have_stats('relation',...) we now need to be connected
to the database that hosts the relation. As pg_stat_have_stats() is not
documented publicly, the changes done in 029_stats_restart.pl look
sufficient.

- this patch does not handle rewrites so some tests are failing. Its only
intent is to ease the review and it should not be pushed without being
merged with the following patch that handles the rewrites.

- it can be used to test that stats are incremented correctly and that we're
able to retrieve them as long as rewrites are not involved.
---
 src/backend/access/heap/vacuumlazy.c         |   3 +-
 src/backend/postmaster/autovacuum.c          |   9 +-
 src/backend/utils/activity/pgstat_relation.c | 234 +++++++++++++++----
 src/backend/utils/adt/pgstatfuncs.c          |  22 +-
 src/include/pgstat.h                         |  18 +-
 src/include/utils/pgstat_internal.h          |   1 +
 src/test/recovery/t/029_stats_restart.pl     |  40 ++--
 7 files changed, 249 insertions(+), 78 deletions(-)
   3.3% src/backend/postmaster/
  64.6% src/backend/utils/activity/
   5.3% src/backend/utils/adt/
   7.0% src/include/
  18.7% src/test/recovery/t/

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 62035b7f9c3..a9b2b4e1033 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -961,8 +961,7 @@ heap_vacuum_rel(Relation rel, const VacuumParams params,
 	 * soon in cases where the failsafe prevented significant amounts of heap
 	 * vacuuming.
 	 */
-	pgstat_report_vacuum(RelationGetRelid(rel),
-						 rel->rd_rel->relisshared,
+	pgstat_report_vacuum(rel->rd_locator,
 						 Max(vacrel->new_live_tuples, 0),
 						 vacrel->recently_dead_tuples +
 						 vacrel->missed_dead_tuples,
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 1bd3924e35e..563a3697690 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -2048,8 +2048,7 @@ do_autovacuum(void)
 
 		/* Fetch reloptions and the pgstat entry for this table */
 		relopts = extract_autovac_opts(tuple, pg_class_desc);
-		tabentry = pgstat_fetch_stat_tabentry_ext(classForm->relisshared,
-												  relid);
+		tabentry = pgstat_fetch_stat_tabentry_ext(relid);
 
 		/* Check if it needs vacuum or analyze */
 		relation_needs_vacanalyze(relid, relopts, classForm, tabentry,
@@ -2141,8 +2140,7 @@ do_autovacuum(void)
 		}
 
 		/* Fetch the pgstat entry for this table */
-		tabentry = pgstat_fetch_stat_tabentry_ext(classForm->relisshared,
-												  relid);
+		tabentry = pgstat_fetch_stat_tabentry_ext(relid);
 
 		relation_needs_vacanalyze(relid, relopts, classForm, tabentry,
 								  effective_multixact_freeze_max_age,
@@ -2939,8 +2937,7 @@ recheck_relation_needs_vacanalyze(Oid relid,
 	PgStat_StatTabEntry *tabentry;
 
 	/* fetch the pgstat table entry */
-	tabentry = pgstat_fetch_stat_tabentry_ext(classForm->relisshared,
-											  relid);
+	tabentry = pgstat_fetch_stat_tabentry_ext(relid);
 
 	relation_needs_vacanalyze(relid, avopts, classForm, tabentry,
 							  effective_multixact_freeze_max_age,
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index b90754f8578..ee6f2eb2bdb 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -17,12 +17,17 @@
 
 #include "postgres.h"
 
+#include "access/htup_details.h"
 #include "access/twophase_rmgr.h"
 #include "access/xact.h"
 #include "catalog/catalog.h"
+#include "catalog/pg_tablespace.h"
+#include "storage/lmgr.h"
 #include "utils/memutils.h"
 #include "utils/pgstat_internal.h"
 #include "utils/rel.h"
+#include "utils/relmapper.h"
+#include "utils/syscache.h"
 #include "utils/timestamp.h"
 
 
@@ -36,13 +41,12 @@ typedef struct TwoPhasePgStatRecord
 	PgStat_Counter inserted_pre_truncdrop;
 	PgStat_Counter updated_pre_truncdrop;
 	PgStat_Counter deleted_pre_truncdrop;
-	Oid			id;				/* table's OID */
-	bool		shared;			/* is it a shared catalog? */
+	RelFileLocator locator;		/* table's rd_locator */
 	bool		truncdropped;	/* was the relation truncated/dropped? */
 } TwoPhasePgStatRecord;
 
 
-static PgStat_TableStatus *pgstat_prep_relation_pending(Oid rel_id, bool isshared);
+static PgStat_TableStatus *pgstat_prep_relation_pending(RelFileLocator locator);
 static void add_tabstat_xact_level(PgStat_TableStatus *pgstat_info, int nest_level);
 static void ensure_tabstat_xact_level(PgStat_TableStatus *pgstat_info);
 static void save_truncdrop_counters(PgStat_TableXactStatus *trans, bool is_drop);
@@ -60,8 +64,7 @@ pgstat_copy_relation_stats(Relation dst, Relation src)
 	PgStatShared_Relation *dstshstats;
 	PgStat_EntryRef *dst_ref;
 
-	srcstats = pgstat_fetch_stat_tabentry_ext(src->rd_rel->relisshared,
-											  RelationGetRelid(src));
+	srcstats = pgstat_fetch_stat_tabentry_ext(RelationGetRelid(src));
 	if (!srcstats)
 		return;
 
@@ -94,8 +97,10 @@ pgstat_init_relation(Relation rel)
 
 	/*
 	 * We only count stats for relations with storage and partitioned tables
+	 * and we don't count stats generated during a rewrite.
 	 */
-	if (!RELKIND_HAS_STORAGE(relkind) && relkind != RELKIND_PARTITIONED_TABLE)
+	if ((!RELKIND_HAS_STORAGE(relkind) && relkind != RELKIND_PARTITIONED_TABLE) ||
+		OidIsValid(rel->rd_rel->relrewrite))
 	{
 		rel->pgstat_enabled = false;
 		rel->pgstat_info = NULL;
@@ -130,12 +135,38 @@ pgstat_init_relation(Relation rel)
 void
 pgstat_assoc_relation(Relation rel)
 {
+	RelFileLocator locator;
+
 	Assert(rel->pgstat_enabled);
 	Assert(rel->pgstat_info == NULL);
 
+	/*
+	 * Don't associate stats for relations that neither have storage nor
+	 * are partitioned tables.
+	 */
+	if (!RELKIND_HAS_STORAGE(rel->rd_rel->relkind) &&
+		rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
+		return;
+
+	if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
+		locator = rel->rd_locator;
+	else
+	{
+		/*
+		 * Partitioned tables don't have storage, so construct a synthetic
+		 * locator for statistics tracking. Use the relation OID as relNumber.
+		 * No collision with regular relations is possible because relNumbers
+		 * are also assigned from the pg_class OID space (see
+		 * GetNewRelFileNumber()), making each value unique across the
+		 * database regardless of spcOid.
+		 */
+		locator.dbOid = (rel->rd_rel->relisshared ? InvalidOid : MyDatabaseId);
+		locator.spcOid = InvalidOid;
+		locator.relNumber = rel->rd_id;
+	}
+
 	/* Else find or make the PgStat_TableStatus entry, and update link */
-	rel->pgstat_info = pgstat_prep_relation_pending(RelationGetRelid(rel),
-													rel->rd_rel->relisshared);
+	rel->pgstat_info = pgstat_prep_relation_pending(locator);
 
 	/* don't allow link a stats to multiple relcache entries */
 	Assert(rel->pgstat_info->relation == NULL);
@@ -167,9 +198,13 @@ pgstat_unlink_relation(Relation rel)
 void
 pgstat_create_relation(Relation rel)
 {
+	/* don't track stats for relations without storage */
+	if (!RELKIND_HAS_STORAGE(rel->rd_rel->relkind))
+		return;
+
 	pgstat_create_transactional(PGSTAT_KIND_RELATION,
-								rel->rd_rel->relisshared ? InvalidOid : MyDatabaseId,
-								RelationGetRelid(rel));
+								rel->rd_locator.dbOid,
+								RelFileLocatorToPgStatObjid(rel->rd_locator));
 }
 
 /*
@@ -181,9 +216,13 @@ pgstat_drop_relation(Relation rel)
 	int			nest_level = GetCurrentTransactionNestLevel();
 	PgStat_TableStatus *pgstat_info;
 
+	/* don't track stats for relations without storage */
+	if (!RELKIND_HAS_STORAGE(rel->rd_rel->relkind))
+		return;
+
 	pgstat_drop_transactional(PGSTAT_KIND_RELATION,
-							  rel->rd_rel->relisshared ? InvalidOid : MyDatabaseId,
-							  RelationGetRelid(rel));
+							  rel->rd_locator.dbOid,
+							  RelFileLocatorToPgStatObjid(rel->rd_locator));
 
 	if (!pgstat_should_count_relation(rel))
 		return;
@@ -207,14 +246,12 @@ pgstat_drop_relation(Relation rel)
  * Report that the table was just vacuumed and flush IO statistics.
  */
 void
-pgstat_report_vacuum(Oid tableoid, bool shared,
-					 PgStat_Counter livetuples, PgStat_Counter deadtuples,
-					 TimestampTz starttime)
+pgstat_report_vacuum(RelFileLocator locator, PgStat_Counter livetuples,
+					 PgStat_Counter deadtuples, TimestampTz starttime)
 {
 	PgStat_EntryRef *entry_ref;
 	PgStatShared_Relation *shtabentry;
 	PgStat_StatTabEntry *tabentry;
-	Oid			dboid = (shared ? InvalidOid : MyDatabaseId);
 	TimestampTz ts;
 	PgStat_Counter elapsedtime;
 
@@ -227,7 +264,7 @@ pgstat_report_vacuum(Oid tableoid, bool shared,
 
 	/* block acquiring lock for the same reason as pgstat_report_autovac() */
 	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_RELATION,
-											dboid, tableoid, false);
+											locator.dbOid, RelFileLocatorToPgStatObjid(locator), false);
 
 	shtabentry = (PgStatShared_Relation *) entry_ref->shared_stats;
 	tabentry = &shtabentry->stats;
@@ -286,9 +323,9 @@ pgstat_report_analyze(Relation rel,
 	PgStat_EntryRef *entry_ref;
 	PgStatShared_Relation *shtabentry;
 	PgStat_StatTabEntry *tabentry;
-	Oid			dboid = (rel->rd_rel->relisshared ? InvalidOid : MyDatabaseId);
 	TimestampTz ts;
 	PgStat_Counter elapsedtime;
+	RelFileLocator locator;
 
 	if (!pgstat_track_counts)
 		return;
@@ -326,9 +363,26 @@ pgstat_report_analyze(Relation rel,
 	ts = GetCurrentTimestamp();
 	elapsedtime = TimestampDifferenceMilliseconds(starttime, ts);
 
+	if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
+		locator = rel->rd_locator;
+	else
+	{
+		/*
+		 * Partitioned tables don't have storage, so construct a synthetic
+		 * locator for statistics tracking. Use the relation OID as relNumber.
+		 * No collision with regular relations is possible because relNumbers
+		 * are also assigned from the pg_class OID space (see
+		 * GetNewRelFileNumber()), making each value unique across the
+		 * database regardless of spcOid.
+		 */
+		locator.dbOid = (rel->rd_rel->relisshared ? InvalidOid : MyDatabaseId);
+		locator.spcOid = InvalidOid;
+		locator.relNumber = rel->rd_id;
+	}
 	/* block acquiring lock for the same reason as pgstat_report_autovac() */
-	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_RELATION, dboid,
-											RelationGetRelid(rel),
+	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_RELATION,
+											locator.dbOid,
+											RelFileLocatorToPgStatObjid(locator),
 											false);
 	/* can't get dropped while accessed */
 	Assert(entry_ref != NULL && entry_ref->shared_stats != NULL);
@@ -469,7 +523,7 @@ pgstat_update_heap_dead_tuples(Relation rel, int delta)
 PgStat_StatTabEntry *
 pgstat_fetch_stat_tabentry(Oid relid)
 {
-	return pgstat_fetch_stat_tabentry_ext(IsSharedRelation(relid), relid);
+	return pgstat_fetch_stat_tabentry_ext(relid);
 }
 
 /*
@@ -477,12 +531,19 @@ pgstat_fetch_stat_tabentry(Oid relid)
  * whether the to-be-accessed table is a shared relation or not.
  */
 PgStat_StatTabEntry *
-pgstat_fetch_stat_tabentry_ext(bool shared, Oid reloid)
+pgstat_fetch_stat_tabentry_ext(Oid reloid)
 {
-	Oid			dboid = (shared ? InvalidOid : MyDatabaseId);
+	PgStat_StatTabEntry *tabentry;
+	RelFileLocator locator;
+
+	if (!pgstat_reloid_to_relfilelocator(reloid, &locator))
+		return NULL;
 
-	return (PgStat_StatTabEntry *)
-		pgstat_fetch_entry(PGSTAT_KIND_RELATION, dboid, reloid);
+	/* fetch the stats entry using the relfilenode based key */
+	tabentry = (PgStat_StatTabEntry *) pgstat_fetch_entry(PGSTAT_KIND_RELATION,
+														  locator.dbOid,
+														  RelFileLocatorToPgStatObjid(locator));
+	return tabentry;
 }
 
 /*
@@ -504,14 +565,17 @@ find_tabstat_entry(Oid rel_id)
 	PgStat_TableXactStatus *trans;
 	PgStat_TableStatus *tabentry = NULL;
 	PgStat_TableStatus *tablestatus = NULL;
+	RelFileLocator locator;
+
+	if (!pgstat_reloid_to_relfilelocator(rel_id, &locator))
+		return NULL;
+
+	entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION,
+										   locator.dbOid,
+										   RelFileLocatorToPgStatObjid(locator));
 
-	entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, MyDatabaseId, rel_id);
 	if (!entry_ref)
-	{
-		entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, InvalidOid, rel_id);
-		if (!entry_ref)
-			return tablestatus;
-	}
+		return tablestatus;
 
 	tabentry = (PgStat_TableStatus *) entry_ref->pending;
 	tablestatus = palloc_object(PgStat_TableStatus);
@@ -707,8 +771,12 @@ AtPrepare_PgStat_Relations(PgStat_SubXactStatus *xact_state)
 		record.inserted_pre_truncdrop = trans->inserted_pre_truncdrop;
 		record.updated_pre_truncdrop = trans->updated_pre_truncdrop;
 		record.deleted_pre_truncdrop = trans->deleted_pre_truncdrop;
-		record.id = tabstat->id;
-		record.shared = tabstat->shared;
+
+		if (tabstat->relation != NULL)
+			record.locator = tabstat->relation->rd_locator;
+		else
+			record.locator = tabstat->locator;
+
 		record.truncdropped = trans->truncdropped;
 
 		RegisterTwoPhaseRecord(TWOPHASE_RM_PGSTAT_ID, 0,
@@ -751,7 +819,7 @@ pgstat_twophase_postcommit(FullTransactionId fxid, uint16 info,
 	PgStat_TableStatus *pgstat_info;
 
 	/* Find or create a tabstat entry for the rel */
-	pgstat_info = pgstat_prep_relation_pending(rec->id, rec->shared);
+	pgstat_info = pgstat_prep_relation_pending(rec->locator);
 
 	/* Same math as in AtEOXact_PgStat, commit case */
 	pgstat_info->counts.tuples_inserted += rec->tuples_inserted;
@@ -786,8 +854,8 @@ pgstat_twophase_postabort(FullTransactionId fxid, uint16 info,
 	TwoPhasePgStatRecord *rec = (TwoPhasePgStatRecord *) recdata;
 	PgStat_TableStatus *pgstat_info;
 
-	/* Find or create a tabstat entry for the rel */
-	pgstat_info = pgstat_prep_relation_pending(rec->id, rec->shared);
+	/* Find or create a tabstat entry for the target locator */
+	pgstat_info = pgstat_prep_relation_pending(rec->locator);
 
 	/* Same math as in AtEOXact_PgStat, abort case */
 	if (rec->truncdropped)
@@ -921,17 +989,21 @@ pgstat_relation_reset_timestamp_cb(PgStatShared_Common *header, TimestampTz ts)
  * initialized if not exists.
  */
 static PgStat_TableStatus *
-pgstat_prep_relation_pending(Oid rel_id, bool isshared)
+pgstat_prep_relation_pending(RelFileLocator locator)
 {
 	PgStat_EntryRef *entry_ref;
 	PgStat_TableStatus *pending;
+	uint64		objid;
+
+	objid = RelFileLocatorToPgStatObjid(locator);
 
 	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_RELATION,
-										  isshared ? InvalidOid : MyDatabaseId,
-										  rel_id, NULL);
+										  locator.dbOid,
+										  objid, NULL);
+
 	pending = entry_ref->pending;
-	pending->id = rel_id;
-	pending->shared = isshared;
+	pending->id = objid;
+	pending->locator = locator;
 
 	return pending;
 }
@@ -1010,3 +1082,83 @@ restore_truncdrop_counters(PgStat_TableXactStatus *trans)
 		trans->tuples_deleted = trans->deleted_pre_truncdrop;
 	}
 }
+
+/*
+ * Convert a relation OID to its corresponding RelFileLocator for statistics
+ * tracking purposes.
+ *
+ * Returns true on success, false if the relation doesn't need statistics
+ * tracking.
+ *
+ * For partitioned tables, constructs a synthetic locator using the relation
+ * OID as relNumber, since they don't have storage.
+ */
+bool
+pgstat_reloid_to_relfilelocator(Oid reloid, RelFileLocator *locator)
+{
+	HeapTuple	tuple;
+	Form_pg_class relform;
+	bool		result = true;
+
+	/* get the relation's tuple from pg_class */
+	tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(reloid));
+
+	if (!HeapTupleIsValid(tuple))
+		return false;
+
+	relform = (Form_pg_class) GETSTRUCT(tuple);
+
+	/* skip relations that neither have storage nor are partitioned tables */
+	if (!RELKIND_HAS_STORAGE(relform->relkind) &&
+		relform->relkind != RELKIND_PARTITIONED_TABLE)
+	{
+		ReleaseSysCache(tuple);
+		return false;
+	}
+
+	if (relform->relkind != RELKIND_PARTITIONED_TABLE)
+	{
+		/* build the RelFileLocator */
+		locator->relNumber = relform->relfilenode;
+		locator->spcOid = relform->reltablespace;
+
+		/* handle default tablespace */
+		if (!OidIsValid(locator->spcOid))
+			locator->spcOid = MyDatabaseTableSpace;
+
+		/* handle dbOid for global vs local relations */
+		if (locator->spcOid == GLOBALTABLESPACE_OID)
+			locator->dbOid = InvalidOid;
+		else
+			locator->dbOid = MyDatabaseId;
+
+		/* handle mapped relations */
+		if (!RelFileNumberIsValid(locator->relNumber))
+		{
+			locator->relNumber = RelationMapOidToFilenumber(reloid,
+															relform->relisshared);
+			if (!RelFileNumberIsValid(locator->relNumber))
+			{
+				ReleaseSysCache(tuple);
+				return false;
+			}
+		}
+	}
+	else
+	{
+		/*
+		 * Partitioned tables don't have storage, so construct a synthetic
+		 * locator for statistics tracking. Use the relation OID as relNumber.
+		 * No collision with regular relations is possible because relNumbers
+		 * are also assigned from the pg_class OID space (see
+		 * GetNewRelFileNumber()), making each value unique across the
+		 * database regardless of spcOid.
+		 */
+		locator->dbOid = (relform->relisshared ? InvalidOid : MyDatabaseId);
+		locator->spcOid = InvalidOid;
+		locator->relNumber = relform->oid;
+	}
+
+	ReleaseSysCache(tuple);
+	return result;
+}
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index ef6fffe60b9..60ffb1679ec 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -23,13 +23,13 @@
 #include "common/ip.h"
 #include "funcapi.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "postmaster/bgworker.h"
 #include "replication/logicallauncher.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/pgstat_internal.h"
 #include "utils/timestamp.h"
 
 #define UINT32_ACCESS_ONCE(var)		 ((uint32)(*((volatile uint32 *)&(var))))
@@ -1949,9 +1949,14 @@ Datum
 pg_stat_reset_single_table_counters(PG_FUNCTION_ARGS)
 {
 	Oid			taboid = PG_GETARG_OID(0);
-	Oid			dboid = (IsSharedRelation(taboid) ? InvalidOid : MyDatabaseId);
+	RelFileLocator locator;
 
-	pgstat_reset(PGSTAT_KIND_RELATION, dboid, taboid);
+	/* Get the stats locator from the relation OID */
+	if (!pgstat_reloid_to_relfilelocator(taboid, &locator))
+		PG_RETURN_VOID();
+
+	pgstat_reset(PGSTAT_KIND_RELATION, locator.dbOid,
+				 RelFileLocatorToPgStatObjid(locator));
 
 	PG_RETURN_VOID();
 }
@@ -2305,5 +2310,16 @@ pg_stat_have_stats(PG_FUNCTION_ARGS)
 	uint64		objid = PG_GETARG_INT64(2);
 	PgStat_Kind kind = pgstat_get_kind_from_str(stats_type);
 
+	/* Convert relation OID to relfilenode objid */
+	if (kind == PGSTAT_KIND_RELATION)
+	{
+		RelFileLocator locator;
+
+		if (!pgstat_reloid_to_relfilelocator(objid, &locator))
+			PG_RETURN_BOOL(false);
+
+		objid = RelFileLocatorToPgStatObjid(locator);
+	}
+
 	PG_RETURN_BOOL(pgstat_have_entry(kind, dboid, objid));
 }
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index f23dd5870da..c9e451094f3 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -17,6 +17,7 @@
 #include "postmaster/pgarch.h"	/* for MAX_XFN_CHARS */
 #include "replication/conflict.h"
 #include "replication/worker_internal.h"
+#include "storage/relfilelocator.h"
 #include "utils/backend_progress.h" /* for backward compatibility */	/* IWYU pragma: export */
 #include "utils/backend_status.h"	/* for backward compatibility */	/* IWYU pragma: export */
 #include "utils/pgstat_kind.h"
@@ -35,6 +36,12 @@
 /* Default directory to store temporary statistics data in */
 #define PG_STAT_TMP_DIR		"pg_stat_tmp"
 
+/*
+ * Build a pgstat key Objid based on a RelFileLocator.
+ */
+#define RelFileLocatorToPgStatObjid(locator) \
+	(((uint64) (locator).spcOid << 32) | (locator).relNumber)
+
 /* Values for track_functions GUC variable --- order is significant! */
 typedef enum TrackFunctionsLevel
 {
@@ -175,11 +182,11 @@ typedef struct PgStat_TableCounts
  */
 typedef struct PgStat_TableStatus
 {
-	Oid			id;				/* table's OID */
-	bool		shared;			/* is it a shared catalog? */
+	uint64		id;				/* objid built from the relfilelocator (stats key) */
 	struct PgStat_TableXactStatus *trans;	/* lowest subxact's counts */
 	PgStat_TableCounts counts;	/* event counts to be sent */
 	Relation	relation;		/* rel that is using this entry */
+	RelFileLocator locator;		/* table's relfilelocator */
 } PgStat_TableStatus;
 
 /* ----------
@@ -669,8 +676,8 @@ extern void pgstat_init_relation(Relation rel);
 extern void pgstat_assoc_relation(Relation rel);
 extern void pgstat_unlink_relation(Relation rel);
 
-extern void pgstat_report_vacuum(Oid tableoid, bool shared,
-								 PgStat_Counter livetuples, PgStat_Counter deadtuples,
+extern void pgstat_report_vacuum(RelFileLocator locator, PgStat_Counter livetuples,
+								 PgStat_Counter deadtuples,
 								 TimestampTz starttime);
 extern void pgstat_report_analyze(Relation rel,
 								  PgStat_Counter livetuples, PgStat_Counter deadtuples,
@@ -735,8 +742,7 @@ extern void pgstat_twophase_postabort(FullTransactionId fxid, uint16 info,
 									  void *recdata, uint32 len);
 
 extern PgStat_StatTabEntry *pgstat_fetch_stat_tabentry(Oid relid);
-extern PgStat_StatTabEntry *pgstat_fetch_stat_tabentry_ext(bool shared,
-														   Oid reloid);
+extern PgStat_StatTabEntry *pgstat_fetch_stat_tabentry_ext(Oid reloid);
 extern PgStat_TableStatus *find_tabstat_entry(Oid rel_id);
 
 
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 5c1ce4d3d6a..7b24928b00d 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -764,6 +764,7 @@ extern void PostPrepare_PgStat_Relations(PgStat_SubXactStatus *xact_state);
 extern bool pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait);
 extern void pgstat_relation_delete_pending_cb(PgStat_EntryRef *entry_ref);
 extern void pgstat_relation_reset_timestamp_cb(PgStatShared_Common *header, TimestampTz ts);
+extern bool pgstat_reloid_to_relfilelocator(Oid reloid, RelFileLocator *locator);
 
 
 /*
diff --git a/src/test/recovery/t/029_stats_restart.pl b/src/test/recovery/t/029_stats_restart.pl
index 021e2bf361f..3a9c05eaf10 100644
--- a/src/test/recovery/t/029_stats_restart.pl
+++ b/src/test/recovery/t/029_stats_restart.pl
@@ -55,10 +55,10 @@ trigger_funcrel_stat();
 
 # verify stats objects exist
 $sect = "initial";
-is(have_stats('database', $dboid, 0), 't', "$sect: db stats do exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats($connect_db, 'database', $dboid, 0), 't', "$sect: db stats do exist");
+is(have_stats($db_under_test, 'function', $dboid, $funcoid),
 	't', "$sect: function stats do exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats($db_under_test, 'relation', $dboid, $tableoid),
 	't', "$sect: relation stats do exist");
 
 # regular shutdown
@@ -79,10 +79,10 @@ copy($og_stats, $statsfile) or die "Copy failed: $!";
 $node->start;
 
 $sect = "copy";
-is(have_stats('database', $dboid, 0), 't', "$sect: db stats do exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats($connect_db, 'database', $dboid, 0), 't', "$sect: db stats do exist");
+is(have_stats($db_under_test, 'function', $dboid, $funcoid),
 	't', "$sect: function stats do exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats($db_under_test, 'relation', $dboid, $tableoid),
 	't', "$sect: relation stats do exist");
 
 $node->stop('immediate');
@@ -96,10 +96,10 @@ $node->start;
 
 # stats should have been discarded
 $sect = "post immediate";
-is(have_stats('database', $dboid, 0), 'f', "$sect: db stats do not exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats($connect_db, 'database', $dboid, 0), 'f', "$sect: db stats do not exist");
+is(have_stats($db_under_test, 'function', $dboid, $funcoid),
 	'f', "$sect: function stats do exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats($db_under_test, 'relation', $dboid, $tableoid),
 	'f', "$sect: relation stats do not exist");
 
 # get rid of backup statsfile
@@ -110,10 +110,10 @@ unlink $statsfile or die "cannot unlink $statsfile $!";
 trigger_funcrel_stat();
 
 $sect = "post immediate, new";
-is(have_stats('database', $dboid, 0), 't', "$sect: db stats do exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats($connect_db, 'database', $dboid, 0), 't', "$sect: db stats do exist");
+is(have_stats($db_under_test, 'function', $dboid, $funcoid),
 	't', "$sect: function stats do exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats($db_under_test, 'relation', $dboid, $tableoid),
 	't', "$sect: relation stats do exist");
 
 # regular shutdown
@@ -129,10 +129,10 @@ $node->start;
 
 # no stats present due to invalid stats file
 $sect = "invalid_overwrite";
-is(have_stats('database', $dboid, 0), 'f', "$sect: db stats do not exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats($connect_db, 'database', $dboid, 0), 'f', "$sect: db stats do not exist");
+is(have_stats($db_under_test, 'function', $dboid, $funcoid),
 	'f', "$sect: function stats do not exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats($db_under_test, 'relation', $dboid, $tableoid),
 	'f', "$sect: relation stats do not exist");
 
 
@@ -145,10 +145,10 @@ append_file($og_stats, "XYZ");
 $node->start;
 
 $sect = "invalid_append";
-is(have_stats('database', $dboid, 0), 'f', "$sect: db stats do not exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats($connect_db, 'database', $dboid, 0), 'f', "$sect: db stats do not exist");
+is(have_stats($db_under_test, 'function', $dboid, $funcoid),
 	'f', "$sect: function stats do not exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats($db_under_test, 'relation', $dboid, $tableoid),
 	'f', "$sect: relation stats do not exist");
 
 
@@ -307,9 +307,9 @@ sub trigger_funcrel_stat
 
 sub have_stats
 {
-	my ($kind, $dboid, $objid) = @_;
+	my ($db, $kind, $dboid, $objid) = @_;
 
-	return $node->safe_psql($connect_db,
+	return $node->safe_psql($db,
 		"SELECT pg_stat_have_stats('$kind', $dboid, $objid)");
 }
 
-- 
2.34.1

v8-0002-handle-relation-statistics-correctly-during-rewri.patch (text/x-diff; charset=us-ascii)
From 2c7c6e20d438b5d660d0931c1736007efa862c99 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Tue, 4 Nov 2025 13:52:46 +0000
Subject: [PATCH v8 2/2] handle relation statistics correctly during rewrites

Now that PGSTAT_KIND_RELATION is keyed by relfilenode, we need to handle rewrites.

To do so, this patch:

- Adds PgStat_PendingRewrite, a new struct to track rewrite operations within
a transaction, storing the old locator, new locator, and original locator (for
rewrite chains). This allows stats to be copied from the original location to
the final location at commit time.

- Adds a new function, pgstat_mark_rewrite(), called when a table rewrite begins.
It records the rewrite operation in a local list and detects rewrite chains by
checking if the old_locator matches any existing new_locator, preserving the
chain's original_locator.

- Modifies pgstat_copy_relation_stats() to accept RelFileLocators instead of
Relations, with a new increment parameter to accumulate stats (needed for rewrite
chains with DML between rewrites).

- Ensures that AtEOXact_PgStat_Relations(), AtPrepare_PgStat_Relations(),
pgstat_twophase_postcommit()/postabort() and pgstat_drop_relation() handle the
PgStat_PendingRewrite list correctly.

Note that due to the new flush call in pgstat_twophase_postcommit() we cannot
call GetCurrentTransactionStopTimestamp() in pgstat_relation_flush_cb(). So a
check is added to handle this special case and call GetCurrentTimestamp() instead.
Note that we'd call GetCurrentTimestamp() only if there is a rewrite, so that
the GetCurrentTimestamp() extra cost should be negligible. Another solution
could be to trigger the flush from FinishPreparedTransaction() but that's not
worth the extra complexity.

The new pending_rewrites list is traversed in multiple places. In typical usage
the list contains only a few entries, so the traversal cost should be negligible,
especially in comparison to the cost of a rewrite itself.
---
 src/backend/catalog/index.c                  |   2 +-
 src/backend/commands/cluster.c               |   5 +
 src/backend/commands/tablecmds.c             |   6 +
 src/backend/utils/activity/pgstat_relation.c | 391 ++++++++++++++++++-
 src/backend/utils/activity/pgstat_xact.c     |  25 +-
 src/backend/utils/cache/relcache.c           |   6 +
 src/include/pgstat.h                         |   5 +-
 src/tools/pgindent/typedefs.list             |   1 +
 8 files changed, 424 insertions(+), 17 deletions(-)
  92.8% src/backend/utils/activity/
   4.9% src/backend/

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 8dea58ad96b..b71925a22c3 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1795,7 +1795,7 @@ index_concurrently_swap(Oid newIndexId, Oid oldIndexId, const char *oldName)
 	changeDependenciesOn(RelationRelationId, oldIndexId, newIndexId);
 
 	/* copy over statistics from old to new index */
-	pgstat_copy_relation_stats(newClassRel, oldClassRel);
+	pgstat_copy_relation_stats(newClassRel->rd_locator, oldClassRel->rd_locator, false);
 
 	/* Copy data of pg_statistic from the old index to the new one */
 	CopyStatistics(oldIndexId, newIndexId);
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 2120c85ccb4..6155b12afab 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -1196,6 +1196,11 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
 
 		rel1 = relation_open(r1, NoLock);
 		rel2 = relation_open(r2, NoLock);
+
+		/* Mark that a rewrite happened */
+		if (RELKIND_HAS_STORAGE(rel1->rd_rel->relkind))
+			pgstat_mark_rewrite(rel1->rd_locator, rel2->rd_locator);
+
 		rel2->rd_createSubid = rel1->rd_createSubid;
 		rel2->rd_newRelfilelocatorSubid = rel1->rd_newRelfilelocatorSubid;
 		rel2->rd_firstRelfilelocatorSubid = rel1->rd_firstRelfilelocatorSubid;
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 6b1a00ed477..9de70f321ed 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -16884,6 +16884,7 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
 	Oid			reltoastrelid;
 	RelFileNumber newrelfilenumber;
 	RelFileLocator newrlocator;
+	RelFileLocator oldrlocator;
 	List	   *reltoastidxids = NIL;
 	ListCell   *lc;
 
@@ -16922,6 +16923,7 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
 	newrlocator = rel->rd_locator;
 	newrlocator.relNumber = newrelfilenumber;
 	newrlocator.spcOid = newTableSpace;
+	oldrlocator = rel->rd_locator;
 
 	/* hand off to AM to actually create new rel storage and copy the data */
 	if (rel->rd_rel->relkind == RELKIND_INDEX)
@@ -16934,6 +16936,10 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
 		table_relation_copy_data(rel, &newrlocator);
 	}
 
+	/* mark that a rewrite happened */
+	if (RELKIND_HAS_STORAGE(rel->rd_rel->relkind))
+		pgstat_mark_rewrite(oldrlocator, newrlocator);
+
 	/*
 	 * Update the pg_class row.
 	 *
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index ee6f2eb2bdb..83103189fb5 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -30,6 +30,19 @@
 #include "utils/syscache.h"
 #include "utils/timestamp.h"
 
+/* Pending rewrite operations for stats copying */
+typedef struct PgStat_PendingRewrite
+{
+	RelFileLocator old_locator;
+	RelFileLocator new_locator;
+	RelFileLocator original_locator;
+	int			nest_level;		/* Transaction nesting level where rewrite
+								 * occurred */
+	struct PgStat_PendingRewrite *next;
+} PgStat_PendingRewrite;
+
+/* The pending rewrites list for current transaction */
+static PgStat_PendingRewrite *pending_rewrites = NULL;
 
 /* Record that's written to 2PC state file when pgstat state is persisted */
 typedef struct TwoPhasePgStatRecord
@@ -43,6 +56,8 @@ typedef struct TwoPhasePgStatRecord
 	PgStat_Counter deleted_pre_truncdrop;
 	RelFileLocator locator;		/* table's rd_locator */
 	bool		truncdropped;	/* was the relation truncated/dropped? */
+	RelFileLocator rewrite_old_locator;
+	int			rewrite_nest_level;
 } TwoPhasePgStatRecord;
 
 
@@ -54,27 +69,70 @@ static void restore_truncdrop_counters(PgStat_TableXactStatus *trans);
 
 
 /*
- * Copy stats between relations. This is used for things like REINDEX
+ * Copy stats between RelFileLocator. This is used for things like REINDEX
  * CONCURRENTLY.
  */
 void
-pgstat_copy_relation_stats(Relation dst, Relation src)
+pgstat_copy_relation_stats(RelFileLocator dst, RelFileLocator src, bool increment)
 {
 	PgStat_StatTabEntry *srcstats;
 	PgStatShared_Relation *dstshstats;
 	PgStat_EntryRef *dst_ref;
 
-	srcstats = pgstat_fetch_stat_tabentry_ext(RelationGetRelid(src));
+	srcstats = (PgStat_StatTabEntry *) pgstat_fetch_entry(PGSTAT_KIND_RELATION,
+														  src.dbOid,
+														  RelFileLocatorToPgStatObjid(src));
 	if (!srcstats)
 		return;
 
 	dst_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_RELATION,
-										  dst->rd_rel->relisshared ? InvalidOid : MyDatabaseId,
-										  RelationGetRelid(dst),
+										  dst.dbOid,
+										  RelFileLocatorToPgStatObjid(dst),
 										  false);
 
 	dstshstats = (PgStatShared_Relation *) dst_ref->shared_stats;
-	dstshstats->stats = *srcstats;
+
+	if (!increment)
+		dstshstats->stats = *srcstats;
+	else
+	{
+		/* Increment those statistics */
+#define RELFSTAT_ACC(fld, stats_to_add) \
+	(dstshstats->stats.fld += stats_to_add->fld)
+		RELFSTAT_ACC(numscans, srcstats);
+		RELFSTAT_ACC(tuples_returned, srcstats);
+		RELFSTAT_ACC(tuples_fetched, srcstats);
+		RELFSTAT_ACC(tuples_inserted, srcstats);
+		RELFSTAT_ACC(tuples_updated, srcstats);
+		RELFSTAT_ACC(tuples_deleted, srcstats);
+		RELFSTAT_ACC(tuples_hot_updated, srcstats);
+		RELFSTAT_ACC(tuples_newpage_updated, srcstats);
+		RELFSTAT_ACC(live_tuples, srcstats);
+		RELFSTAT_ACC(dead_tuples, srcstats);
+		RELFSTAT_ACC(mod_since_analyze, srcstats);
+		RELFSTAT_ACC(ins_since_vacuum, srcstats);
+		RELFSTAT_ACC(blocks_fetched, srcstats);
+		RELFSTAT_ACC(blocks_hit, srcstats);
+		RELFSTAT_ACC(vacuum_count, srcstats);
+		RELFSTAT_ACC(autovacuum_count, srcstats);
+		RELFSTAT_ACC(analyze_count, srcstats);
+		RELFSTAT_ACC(autoanalyze_count, srcstats);
+		RELFSTAT_ACC(total_vacuum_time, srcstats);
+		RELFSTAT_ACC(total_autovacuum_time, srcstats);
+		RELFSTAT_ACC(total_analyze_time, srcstats);
+		RELFSTAT_ACC(total_autoanalyze_time, srcstats);
+#undef RELFSTAT_ACC
+
+		/* Replace those statistics */
+#define RELFSTAT_REP(fld, stats_to_rep) \
+	(dstshstats->stats.fld = stats_to_rep->fld)
+		RELFSTAT_REP(lastscan, srcstats);
+		RELFSTAT_REP(last_vacuum_time, srcstats);
+		RELFSTAT_REP(last_autovacuum_time, srcstats);
+		RELFSTAT_REP(last_analyze_time, srcstats);
+		RELFSTAT_REP(last_autoanalyze_time, srcstats);
+#undef RELFSTAT_REP
+	}
 
 	pgstat_unlock_entry(dst_ref);
 }
@@ -136,6 +194,7 @@ void
 pgstat_assoc_relation(Relation rel)
 {
 	RelFileLocator locator;
+	PgStat_TableStatus *pgstat_info;
 
 	Assert(rel->pgstat_enabled);
 	Assert(rel->pgstat_info == NULL);
@@ -165,14 +224,54 @@ pgstat_assoc_relation(Relation rel)
 		locator.relNumber = rel->rd_id;
 	}
 
+	/*
+	 * If this relation was rewritten during the current transaction we may be
+	 * reopening it with its new RelFileLocator. In that case, continue using
+	 * the stats entry associated with the old locator rather than creating a
+	 * new one. This ensures all stats from before and after the rewrite are
+	 * tracked in a single entry which will be properly copied to the new
+	 * locator at transaction commit.
+	 */
+	if (pending_rewrites != NULL)
+	{
+		PgStat_PendingRewrite *rewrite;
+
+		for (rewrite = pending_rewrites; rewrite != NULL; rewrite = rewrite->next)
+		{
+			if (locator.dbOid == rewrite->new_locator.dbOid &&
+				locator.spcOid == rewrite->new_locator.spcOid &&
+				locator.relNumber == rewrite->new_locator.relNumber)
+			{
+				pgstat_info = pgstat_prep_relation_pending(rewrite->old_locator);
+				goto found_entry;
+			}
+		}
+	}
+
 	/* Else find or make the PgStat_TableStatus entry, and update link */
-	rel->pgstat_info = pgstat_prep_relation_pending(locator);
+	pgstat_info = pgstat_prep_relation_pending(locator);
+
+found_entry:
+	rel->pgstat_info = pgstat_info;
+
+	/*
+	 * For relations stats, we key by physical file location, not by relation
+	 * OID. This means during operations like ALTER TYPE it's possible that
+	 * the relation OID changes but the relfilenode stays the same (no actual
+	 * rewrite needed). Unlink the old relation first.
+	 */
+	if (pgstat_info->relation != NULL &&
+		pgstat_info->relation != rel)
+	{
+		pgstat_info->relation->pgstat_info = NULL;
+		pgstat_info->relation = NULL;
+	}
 
 	/* don't allow link a stats to multiple relcache entries */
-	Assert(rel->pgstat_info->relation == NULL);
+	Assert(pgstat_info->relation == NULL);
 
 	/* mark this relation as the owner */
-	rel->pgstat_info->relation = rel;
+	pgstat_info->relation = rel;
 }
 
 /*
@@ -215,14 +314,37 @@ pgstat_drop_relation(Relation rel)
 {
 	int			nest_level = GetCurrentTransactionNestLevel();
 	PgStat_TableStatus *pgstat_info;
+	bool		skip_transactional_drop = false;
 
 	/* don't track stats for relations without storage */
 	if (!RELKIND_HAS_STORAGE(rel->rd_rel->relkind))
 		return;
 
-	pgstat_drop_transactional(PGSTAT_KIND_RELATION,
-							  rel->rd_locator.dbOid,
-							  RelFileLocatorToPgStatObjid(rel->rd_locator));
+	/* Check if this drop is part of a pending rewrite */
+	if (pending_rewrites != NULL)
+	{
+		PgStat_PendingRewrite *rewrite;
+
+		for (rewrite = pending_rewrites; rewrite != NULL; rewrite = rewrite->next)
+		{
+			if (rel->rd_locator.dbOid == rewrite->old_locator.dbOid &&
+				rel->rd_locator.spcOid == rewrite->old_locator.spcOid &&
+				rel->rd_locator.relNumber == rewrite->old_locator.relNumber)
+			{
+				skip_transactional_drop = true;
+				break;
+			}
+		}
+	}
+
+	/*
+	 * If it is part of a rewrite, drop its stats later, for example in
+	 * AtEOXact_PgStat_Relations(), so skip it here.
+	 */
+	if (!skip_transactional_drop)
+		pgstat_drop_transactional(PGSTAT_KIND_RELATION,
+								  rel->rd_locator.dbOid,
+								  RelFileLocatorToPgStatObjid(rel->rd_locator));
 
 	if (!pgstat_should_count_relation(rel))
 		return;
@@ -660,6 +782,48 @@ AtEOXact_PgStat_Relations(PgStat_SubXactStatus *xact_state, bool isCommit)
 		}
 		tabstat->trans = NULL;
 	}
+
+	/* preserve the stats in case of rewrite */
+	if (isCommit && pending_rewrites != NULL)
+	{
+		PgStat_PendingRewrite *rewrite;
+		PgStat_PendingRewrite *prev = NULL;
+		PgStat_PendingRewrite *current = pending_rewrites;
+		PgStat_PendingRewrite *next;
+
+		/* reverse the rewrites list to process in chronological order */
+		while (current != NULL)
+		{
+			next = current->next;
+			current->next = prev;
+			prev = current;
+			current = next;
+		}
+
+		/* now process rewrites in chronological order */
+		for (rewrite = prev; rewrite != NULL; rewrite = rewrite->next)
+		{
+			PgStat_EntryRef *old_entry_ref;
+
+			old_entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION,
+													   rewrite->old_locator.dbOid,
+													   RelFileLocatorToPgStatObjid(rewrite->old_locator));
+
+			if (old_entry_ref && old_entry_ref->pending)
+				pgstat_relation_flush_cb(old_entry_ref, false);
+
+			pgstat_copy_relation_stats(rewrite->new_locator,
+									   rewrite->old_locator, true);
+
+			/* drop old locator's stats */
+			if (!pgstat_drop_entry(PGSTAT_KIND_RELATION,
+								   rewrite->old_locator.dbOid,
+								   RelFileLocatorToPgStatObjid(rewrite->old_locator)))
+				pgstat_request_entry_refs_gc();
+		}
+	}
+
+	pending_rewrites = NULL;
 }
 
 /*
@@ -675,6 +839,30 @@ AtEOSubXact_PgStat_Relations(PgStat_SubXactStatus *xact_state, bool isCommit, in
 	PgStat_TableXactStatus *trans;
 	PgStat_TableXactStatus *next_trans;
 
+	/*
+	 * If we don't commit then remove the associated rewrites if any, to keep
+	 * the rewrite chain in sync with what will be eventually committed.
+	 */
+	if (!isCommit)
+	{
+		PgStat_PendingRewrite **rewrite_ptr = &pending_rewrites;
+
+		while (*rewrite_ptr != NULL)
+		{
+			if ((*rewrite_ptr)->nest_level >= nestDepth)
+			{
+				PgStat_PendingRewrite *to_remove = *rewrite_ptr;
+
+				*rewrite_ptr = (*rewrite_ptr)->next;
+				pfree(to_remove);
+			}
+			else
+			{
+				rewrite_ptr = &((*rewrite_ptr)->next);
+			}
+		}
+	}
+
 	for (trans = xact_state->first; trans != NULL; trans = next_trans)
 	{
 		PgStat_TableStatus *tabstat;
@@ -754,11 +942,19 @@ void
 AtPrepare_PgStat_Relations(PgStat_SubXactStatus *xact_state)
 {
 	PgStat_TableXactStatus *trans;
+	PgStat_PendingRewrite *rewrite;
 
+	/*
+	 * For each tabstat, find its matching rewrite and remove it from the
+	 * pending rewrites list. This way, after processing all tabstats, pending
+	 * rewrites will only contain rewrite only transactions.
+	 */
 	for (trans = xact_state->first; trans != NULL; trans = trans->next)
 	{
 		PgStat_TableStatus *tabstat PG_USED_FOR_ASSERTS_ONLY;
 		TwoPhasePgStatRecord record;
+		PgStat_PendingRewrite **rewrite_ptr;
+		bool		found_rewrite = false;
 
 		Assert(trans->nest_level == 1);
 		Assert(trans->upper == NULL);
@@ -778,10 +974,83 @@ AtPrepare_PgStat_Relations(PgStat_SubXactStatus *xact_state)
 			record.locator = tabstat->locator;
 
 		record.truncdropped = trans->truncdropped;
+		record.rewrite_nest_level = 0;
+
+		/*
+		 * Look for a matching rewrite and remove it from pending rewrites. We
+		 * check three possible matches:
+		 *
+		 * The new_locator when stats have been added after the rewrite. The
+		 * old_locator when stats have been added before the rewrite but not
+		 * after. The original_locator when this tabstat is part of a rewrite
+		 * chain.
+		 */
+		rewrite_ptr = &pending_rewrites;
+		while (*rewrite_ptr != NULL)
+		{
+			rewrite = *rewrite_ptr;
+
+			if ((record.locator.dbOid == rewrite->new_locator.dbOid &&
+				 record.locator.spcOid == rewrite->new_locator.spcOid &&
+				 record.locator.relNumber == rewrite->new_locator.relNumber) ||
+				(tabstat->locator.dbOid == rewrite->old_locator.dbOid &&
+				 tabstat->locator.spcOid == rewrite->old_locator.spcOid &&
+				 tabstat->locator.relNumber == rewrite->old_locator.relNumber) ||
+				(tabstat->locator.dbOid == rewrite->original_locator.dbOid &&
+				 tabstat->locator.spcOid == rewrite->original_locator.spcOid &&
+				 tabstat->locator.relNumber == rewrite->original_locator.relNumber))
+			{
+				/*
+				 * Found matching rewrite. Record the rewrite information and
+				 * remove this rewrite from the list since it's now handled.
+				 */
+				record.rewrite_old_locator = rewrite->original_locator;
+				record.rewrite_nest_level = rewrite->nest_level;
+				record.locator = rewrite->new_locator;
+				found_rewrite = true;
+
+				/* Remove from pending_rewrites list */
+				*rewrite_ptr = rewrite->next;
+				pfree(rewrite);
+				break;
+			}
+			else
+			{
+				/* Move to next rewrite in the list */
+				rewrite_ptr = &(rewrite->next);
+			}
+		}
+
+		/* If no rewrite found, clear the rewrite fields */
+		if (!found_rewrite)
+		{
+			memset(&record.rewrite_old_locator, 0, sizeof(RelFileLocator));
+		}
+
+		RegisterTwoPhaseRecord(TWOPHASE_RM_PGSTAT_ID, 0,
+							   &record, sizeof(TwoPhasePgStatRecord));
+	}
+
+	/*
+	 * Now process any rewrites still pending. These are rewrite only
+	 * transactions. We need to preserve their stats even though there's no
+	 * tabstat entry for them.
+	 */
+	for (rewrite = pending_rewrites; rewrite != NULL; rewrite = rewrite->next)
+	{
+		TwoPhasePgStatRecord record;
+
+		memset(&record, 0, sizeof(TwoPhasePgStatRecord));
+		record.locator = rewrite->new_locator;
+		record.rewrite_old_locator = rewrite->original_locator;
+		record.rewrite_nest_level = rewrite->nest_level;
+		record.truncdropped = false;
 
 		RegisterTwoPhaseRecord(TWOPHASE_RM_PGSTAT_ID, 0,
 							   &record, sizeof(TwoPhasePgStatRecord));
 	}
+
+	pending_rewrites = NULL;
 }
 
 /*
@@ -804,6 +1073,8 @@ PostPrepare_PgStat_Relations(PgStat_SubXactStatus *xact_state)
 		tabstat = trans->parent;
 		tabstat->trans = NULL;
 	}
+
+	pending_rewrites = NULL;
 }
 
 /*
@@ -839,6 +1110,29 @@ pgstat_twophase_postcommit(FullTransactionId fxid, uint16 info,
 	pgstat_info->counts.changed_tuples +=
 		rec->tuples_inserted + rec->tuples_updated +
 		rec->tuples_deleted;
+
+	if (rec->rewrite_nest_level > 0)
+	{
+		PgStat_EntryRef *old_entry_ref;
+
+		/* Flush any pending stats for old locator first */
+		old_entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION,
+												   rec->rewrite_old_locator.dbOid,
+												   RelFileLocatorToPgStatObjid(rec->rewrite_old_locator));
+
+		if (old_entry_ref && old_entry_ref->pending)
+			pgstat_relation_flush_cb(old_entry_ref, false);
+
+		/* Copy stats from old to new locator */
+		pgstat_copy_relation_stats(rec->locator, rec->rewrite_old_locator,
+								   true);
+
+		/* Drop old locator's stats */
+		if (!pgstat_drop_entry(PGSTAT_KIND_RELATION,
+							   rec->rewrite_old_locator.dbOid,
+							   RelFileLocatorToPgStatObjid(rec->rewrite_old_locator)))
+			pgstat_request_entry_refs_gc();
+	}
 }
 
 /*
@@ -853,9 +1147,26 @@ pgstat_twophase_postabort(FullTransactionId fxid, uint16 info,
 {
 	TwoPhasePgStatRecord *rec = (TwoPhasePgStatRecord *) recdata;
 	PgStat_TableStatus *pgstat_info;
+	RelFileLocator target_locator;
+
+	/*
+	 * For aborted transactions with rewrites (like TRUNCATE), we need to
+	 * restore stats to the old locator, not the new one. The new locator
+	 * should be dropped since the rewrite is being rolled back.
+	 */
+	if (rec->rewrite_nest_level > 0)
+	{
+		/* Use the old locator */
+		target_locator = rec->rewrite_old_locator;
+	}
+	else
+	{
+		/* No rewrite, use the original locator */
+		target_locator = rec->locator;
+	}
 
 	/* Find or create a tabstat entry for the target locator */
-	pgstat_info = pgstat_prep_relation_pending(rec->locator);
+	pgstat_info = pgstat_prep_relation_pending(target_locator);
 
 	/* Same math as in AtEOXact_PgStat, abort case */
 	if (rec->truncdropped)
@@ -910,7 +1221,17 @@ pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
 	tabentry->numscans += lstats->counts.numscans;
 	if (lstats->counts.numscans)
 	{
-		TimestampTz t = GetCurrentTransactionStopTimestamp();
+		TimestampTz t;
+
+		/*
+		 * Checking the transaction state due to the flush call in
+		 * pgstat_twophase_postcommit() that would break the assertion on the
+		 * state in GetCurrentTransactionStopTimestamp().
+		 */
+		if (!IsTransactionState())
+			t = GetCurrentTransactionStopTimestamp();
+		else
+			t = GetCurrentTimestamp();
 
 		if (t > tabentry->lastscan)
 			tabentry->lastscan = t;
@@ -1162,3 +1483,45 @@ pgstat_reloid_to_relfilelocator(Oid reloid, RelFileLocator *locator)
 	ReleaseSysCache(tuple);
 	return result;
 }
+
+/*
+ * Mark that a relation rewrite has occurred, preserving the original locator
+ * so stats can be copied at transaction commit.
+ */
+void
+pgstat_mark_rewrite(RelFileLocator old_locator, RelFileLocator new_locator)
+{
+	PgStat_PendingRewrite *rewrite;
+	PgStat_PendingRewrite *existing;
+	RelFileLocator original_locator = old_locator;
+
+	for (existing = pending_rewrites; existing != NULL; existing = existing->next)
+	{
+		if (old_locator.dbOid == existing->new_locator.dbOid &&
+			old_locator.spcOid == existing->new_locator.spcOid &&
+			old_locator.relNumber == existing->new_locator.relNumber)
+		{
+			original_locator = existing->original_locator;
+			break;
+		}
+	}
+
+	/* Allocate in TopTransactionContext memory context */
+	rewrite = MemoryContextAlloc(TopTransactionContext,
+								 sizeof(PgStat_PendingRewrite));
+
+	rewrite->old_locator = old_locator;
+	rewrite->new_locator = new_locator;
+	rewrite->original_locator = original_locator;
+	rewrite->nest_level = GetCurrentTransactionNestLevel();
+
+	/* Add to the list */
+	rewrite->next = pending_rewrites;
+	pending_rewrites = rewrite;
+}
+
+void
+pgstat_clear_rewrite(void)
+{
+	pending_rewrites = NULL;
+}
diff --git a/src/backend/utils/activity/pgstat_xact.c b/src/backend/utils/activity/pgstat_xact.c
index bc9864bd8d9..f8cf3755ce2 100644
--- a/src/backend/utils/activity/pgstat_xact.c
+++ b/src/backend/utils/activity/pgstat_xact.c
@@ -55,6 +55,8 @@ AtEOXact_PgStat(bool isCommit, bool parallel)
 	}
 	pgStatXactStack = NULL;
 
+	pgstat_clear_rewrite();
+
 	/* Make sure any stats snapshot is thrown away */
 	pgstat_clear_snapshot();
 }
@@ -360,8 +362,29 @@ create_drop_transactional_internal(PgStat_Kind kind, Oid dboid, uint64 objid, bo
 void
 pgstat_create_transactional(PgStat_Kind kind, Oid dboid, uint64 objid)
 {
-	if (pgstat_get_entry_ref(kind, dboid, objid, false, NULL))
+	PgStat_EntryRef *entry_ref;
+
+	entry_ref = pgstat_get_entry_ref(kind, dboid, objid, false, NULL);
+
+	if (entry_ref)
 	{
+		/*
+		 * For relations stats, we key by physical file location, not by
+		 * relation OID. This means during operations like ALTER TYPE where
+		 * the relation OID changes but the relfilenode stays the same (no
+		 * actual rewrite needed), we'll find an existing entry.
+		 *
+		 * This is expected behavior, we want to preserve stats across the
+		 * catalog change. Simply reset and recreate the entry for the new
+		 * relation OID without warning.
+		 */
+		if (kind == PGSTAT_KIND_RELATION)
+		{
+			pgstat_reset(kind, dboid, objid);
+			create_drop_transactional_internal(kind, dboid, objid, true);
+			return;
+		}
+
 		ereport(WARNING,
 				errmsg("resetting existing statistics for kind %s, db=%u, oid=%" PRIu64,
 					   (pgstat_get_kind_info(kind))->name, dboid,
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 2d0cb7bcfd4..c98e5c51d63 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -85,6 +85,7 @@
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
+#include "utils/pgstat_internal.h"
 #include "utils/relmapper.h"
 #include "utils/resowner.h"
 #include "utils/snapmgr.h"
@@ -3780,6 +3781,7 @@ RelationSetNewRelfilenumber(Relation relation, char persistence)
 	MultiXactId minmulti = InvalidMultiXactId;
 	TransactionId freezeXid = InvalidTransactionId;
 	RelFileLocator newrlocator;
+	RelFileLocator oldrlocator = relation->rd_locator;
 
 	if (!IsBinaryUpgrade)
 	{
@@ -3951,6 +3953,10 @@ RelationSetNewRelfilenumber(Relation relation, char persistence)
 
 	table_close(pg_class, RowExclusiveLock);
 
+	/* Mark that a rewrite happened */
+	if (RELKIND_HAS_STORAGE(relation->rd_rel->relkind))
+		pgstat_mark_rewrite(oldrlocator, newrlocator);
+
 	/*
 	 * Make the pg_class row change or relation map change visible.  This will
 	 * cause the relcache entry to get updated, too.
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index c9e451094f3..33d530740d0 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -670,7 +670,7 @@ extern PgStat_FunctionCounts *find_funcstat_entry(Oid func_id);
 
 extern void pgstat_create_relation(Relation rel);
 extern void pgstat_drop_relation(Relation rel);
-extern void pgstat_copy_relation_stats(Relation dst, Relation src);
+extern void pgstat_copy_relation_stats(RelFileLocator dst, RelFileLocator src, bool increment);
 
 extern void pgstat_init_relation(Relation rel);
 extern void pgstat_assoc_relation(Relation rel);
@@ -682,6 +682,9 @@ extern void pgstat_report_vacuum(RelFileLocator locator, PgStat_Counter livetupl
 extern void pgstat_report_analyze(Relation rel,
 								  PgStat_Counter livetuples, PgStat_Counter deadtuples,
 								  bool resetcounter, TimestampTz starttime);
+extern void pgstat_mark_rewrite(RelFileLocator old_locator,
+								RelFileLocator new_locator);
+extern void pgstat_clear_rewrite(void);
 
 /*
  * If stats are enabled, but pending data hasn't been prepared yet, call
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 3451538565e..0ca3eae8026 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2268,6 +2268,7 @@ PgStat_KindInfo
 PgStat_LocalState
 PgStat_PendingDroppedStatsItem
 PgStat_PendingIO
+PgStat_PendingRewrite
 PgStat_SLRUStats
 PgStat_ShmemControl
 PgStat_Snapshot
-- 
2.34.1
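
The rewrite-chain detection described in the 0002 commit message above can be
illustrated in isolation. The following is a simplified standalone sketch, not
the patch's code: it uses malloc() instead of TopTransactionContext, omits
nest_level, and the names Locator, PendingRewrite, locator_equals() and
mark_rewrite() are hypothetical stand-ins chosen for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for RelFileLocator: three 32-bit identifiers. */
typedef struct Locator
{
	uint32_t	spcOid;
	uint32_t	dbOid;
	uint32_t	relNumber;
} Locator;

/* Hypothetical stand-in for PgStat_PendingRewrite (nest_level omitted). */
typedef struct PendingRewrite
{
	Locator		old_locator;
	Locator		new_locator;
	Locator		original_locator;	/* first locator of the rewrite chain */
	struct PendingRewrite *next;
} PendingRewrite;

static bool
locator_equals(Locator a, Locator b)
{
	return a.spcOid == b.spcOid &&
		a.dbOid == b.dbOid &&
		a.relNumber == b.relNumber;
}

/*
 * Record a rewrite old_locator -> new_locator and return the new list head.
 * If old_locator is the target of an earlier rewrite, this rewrite continues
 * that chain and inherits its original_locator.
 */
static PendingRewrite *
mark_rewrite(PendingRewrite *head, Locator old_locator, Locator new_locator)
{
	PendingRewrite *existing;
	PendingRewrite *rewrite;
	Locator		original = old_locator;

	/* a rewrite is assumed to always move to a different locator */
	assert(!locator_equals(old_locator, new_locator));

	for (existing = head; existing != NULL; existing = existing->next)
	{
		if (locator_equals(old_locator, existing->new_locator))
		{
			original = existing->original_locator;
			break;
		}
	}

	rewrite = malloc(sizeof(PendingRewrite));
	if (rewrite == NULL)
		abort();

	rewrite->old_locator = old_locator;
	rewrite->new_locator = new_locator;
	rewrite->original_locator = original;
	rewrite->next = head;		/* newest entry first */
	return rewrite;
}
```

With two rewrites A -> B and then B -> C in the same transaction, the second
entry keeps A as its original_locator, so under this model the stats gathered
on A can still be found once the chain ends at C.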

#44Andres Freund
andres@anarazel.de
In reply to: Bertrand Drouvot (#43)
Re: relfilenode statistics

Hi,

On 2025-12-15 16:29:18 +0000, Bertrand Drouvot wrote:

From 7908ba56cb8b6255b869af6be13077aa0315d5f1 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Wed, 1 Oct 2025 09:45:26 +0000
Subject: [PATCH v8 1/2] Key PGSTAT_KIND_RELATION by relfile locator

This patch changes the key used for the PGSTAT_KIND_RELATION statistic kind.
Instead of the relation oid, it now relies on:

- dboid (linked to RelFileLocator's dbOid)
- objoid which is the result of a new macro (namely RelFileLocatorToPgStatObjid())
that computes an objoid based on the RelFileLocator's spcOid and the
RelFileLocator's relNumber.

I think this needs to make more explicit that this works because the object ID
now is a uint64, and that both the inputs are 32 bits.
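
A minimal sketch of that point, assuming the tablespace OID goes into the high
32 bits and the relNumber into the low 32 bits. The actual definition of
RelFileLocatorToPgStatObjid() is not shown in this part of the thread, so the
layout and the names pack_objid(), objid_spc() and objid_relnumber() are
hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for PostgreSQL's 32-bit Oid and RelFileNumber types. */
typedef uint32_t Oid;
typedef Oid RelFileNumber;

/* The packing below is only lossless because both inputs are 32 bits wide. */
static_assert(sizeof(Oid) == 4, "Oid must be 32 bits");
static_assert(sizeof(RelFileNumber) == 4, "RelFileNumber must be 32 bits");

/* Hypothetical layout: spcOid in the high half, relNumber in the low half. */
static uint64_t
pack_objid(Oid spcOid, RelFileNumber relNumber)
{
	return ((uint64_t) spcOid << 32) | (uint64_t) relNumber;
}

/* Recover the tablespace OID from a packed object ID. */
static Oid
objid_spc(uint64_t objid)
{
	return (Oid) (objid >> 32);
}

/* Recover the relNumber from a packed object ID. */
static RelFileNumber
objid_relnumber(uint64_t objid)
{
	return (RelFileNumber) (objid & UINT64_C(0xFFFFFFFF));
}
```

Under this assumed layout, two distinct (spcOid, relNumber) pairs always map to
distinct 64-bit object IDs, which would not hold if the object ID were still a
32-bit value.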

That will allow us to add new stats (add writes counters) and ensure that some
counters (n_dead_tup and friends) are replicated.

Yay.

The patch introduces pgstat_reloid_to_relfilelocator() to 1) avoid calling
RelationIdGetRelation() to get the relfilelocator based on the relation oid
and 2) handle the partitioned table case.

Please note that:

- when running pg_stat_have_stats('relation',...) we now need to be connected
to the database that hosts the relation. As pg_stat_have_stats() is not
documented publicly, the changes made in 029_stats_restart.pl look
sufficient.

That seems fine.

- this patch does not handle rewrites so some tests are failing. Its only
intent is to ease the review and it should not be pushed without being
merged with the following patch that handles the rewrites.

Makes sense.

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 62035b7f9c3..a9b2b4e1033 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -961,8 +961,7 @@ heap_vacuum_rel(Relation rel, const VacuumParams params,
* soon in cases where the failsafe prevented significant amounts of heap
* vacuuming.
*/
-	pgstat_report_vacuum(RelationGetRelid(rel),
-						 rel->rd_rel->relisshared,
+	pgstat_report_vacuum(rel->rd_locator,
Max(vacrel->new_live_tuples, 0),
vacrel->recently_dead_tuples +
vacrel->missed_dead_tuples,

Why not pass in the Relation itself? Given that we do that already for
pgstat_report_analyze(), it seems like that'd be an improvement even
independent of this change?

diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 1bd3924e35e..563a3697690 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -2048,8 +2048,7 @@ do_autovacuum(void)
/* Fetch reloptions and the pgstat entry for this table */
relopts = extract_autovac_opts(tuple, pg_class_desc);
-		tabentry = pgstat_fetch_stat_tabentry_ext(classForm->relisshared,
-												  relid);
+		tabentry = pgstat_fetch_stat_tabentry_ext(relid);

/* Check if it needs vacuum or analyze */
relation_needs_vacanalyze(relid, relopts, classForm, tabentry,

I don't think this is good - now do_autovacuum() will do a separate syscache
lookup to fetch information the caller already has (due to the
pgstat_reloid_to_relfilelocator() in pgstat_fetch_stat_tabentry_ext()). That's
not too bad for things like viewing stats, but do_autovacuum() does this for
every table in a database...

@@ -326,9 +363,26 @@ pgstat_report_analyze(Relation rel,
ts = GetCurrentTimestamp();
elapsedtime = TimestampDifferenceMilliseconds(starttime, ts);

+	if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
+		locator = rel->rd_locator;
+	else
+	{
+		/*
+		 * Partitioned tables don't have storage, so construct a synthetic
+		 * locator for statistics tracking. Use the relation OID as relNumber.
+		 * No collision with regular relations is possible because relNumbers
+		 * are also assigned from the pg_class OID space (see
+		 * GetNewRelFileNumber()), making each value unique across the
+		 * database regardless of spcOid.
+		 */

I don't think this is true as stated. Two reasons:

1) This afaict guarantees that the relfilenode will not clash with oids, but
it does *NOT* guarantee that it does not clash with other relfilenodes

2) Note that GetNewRelFileNumber() does *NOT* check for conflicts when
creating a new relfilenode for an existing relation:
* If the relfilenumber will also be used as the relation's OID, pass the
* opened pg_class catalog, and this routine will guarantee that the result
* is also an unused OID within pg_class. If the result is to be used only
* as a relfilenumber for an existing relation, pass NULL for pg_class.

Greetings,

Andres Freund

#45Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#44)
Re: relfilenode statistics

On Mon, Dec 15, 2025 at 12:48:25PM -0500, Andres Freund wrote:

I don't think this is true as stated. Two reasons:

1) This afaict guarantees that the relfilenode will not clash with oids, but
it does *NOT* guarantee that it does not clash with other relfilenodes

2) Note that GetNewRelFileNumber() does *NOT* check for conflicts when
creating a new relfilenode for an existing relation:
* If the relfilenumber will also be used as the relation's OID, pass the
* opened pg_class catalog, and this routine will guarantee that the result
* is also an unused OID within pg_class. If the result is to be used only
* as a relfilenumber for an existing relation, pass NULL for pg_class.

FWIW, I am also still troubled by the part of the proposed patch set
where we try to act as if a partitioned table has a relfilenode set,
by using its relid instead in the key for the data.
This leads to a huge amount of complexity in the patch, mainly to
store data for autovacuum that we do not need in the end:
- autovacuum discards partitioned tables in do_autovacuum(), so the
stats related to partitioned tables do not matter when selecting the
relations to process.
- manual vacuums may include partitioned tables to extract their
partitions, with vacuum_rel() discarding them at the end. Well, stats
don't matter anyway.

We only need to attach three fields to let autovacuum know if a
relation needs to run or not: dead_tuples, ins_since_vacuum,
mod_since_analyze. Most of the fields of PgStat_StatTabEntry make sense
only for tables, and a few are required by indexes for pg_stat_all_indexes.
Some fields actually make sense because they refer to on-disk files,
mostly for pg_statio_all_tables (blocks_fetched, blocks_hit).

Hence, why don't we split PgStat_StatTabEntry into three things from
the start, even if it means to duplicate some of them? Say:
- Table fields: includes [auto]vacuum/analyze data, block fields,
fields of pg_stat_all_tables.
- Index fields: no need for the [auto]vacuum/analyze time and counts,
block fields, pg_stat_all_indexes fields.
- Relfilenode fields: dead_tuples, ins_since_vacuum and
mod_since_analyze. Does not apply to partitioned tables and indexes,
only applies to tables. This provides a clean split and embraces the
fact that these are the only three fields we need to worry about during
recovery.
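
As a rough illustration of the split suggested above, the three groups could be laid out as separate structs. This is only a sketch under stated assumptions: the struct names are hypothetical, Counter and Timestamp are stand-ins for PgStat_Counter and TimestampTz, and each struct shows a small subset of fields rather than a proposed final layout.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t Counter;	/* stand-in for PgStat_Counter */
typedef int64_t Timestamp;	/* stand-in for TimestampTz */

/* Hypothetical: table-level fields (vacuum/analyze data, block fields). */
typedef struct TableStatsSketch
{
	Counter		numscans;
	Counter		tuples_inserted;
	Counter		blocks_fetched;
	Counter		blocks_hit;
	Timestamp	last_autovacuum_time;
	Counter		autovacuum_count;
} TableStatsSketch;

/* Hypothetical: index-level fields, without vacuum/analyze times and counts. */
typedef struct IndexStatsSketch
{
	Counter		numscans;
	Counter		tuples_returned;
	Counter		blocks_fetched;
	Counter		blocks_hit;
} IndexStatsSketch;

/* Hypothetical: the three fields tied to the relfilenode. */
typedef struct RelfilenodeStatsSketch
{
	Counter		dead_tuples;
	Counter		ins_since_vacuum;
	Counter		mod_since_analyze;
} RelfilenodeStatsSketch;
```

With such a layout, only the last struct would need to be keyed by relfilenode, while the first two could remain keyed by relation OID.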
--
Michael

#46Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Andres Freund (#44)
2 attachment(s)
Re: relfilenode statistics

Hi,

On Mon, Dec 15, 2025 at 12:48:25PM -0500, Andres Freund wrote:

On 2025-12-15 16:29:18 +0000, Bertrand Drouvot wrote:

From 7908ba56cb8b6255b869af6be13077aa0315d5f1 Mon Sep 17 00:00:00 2001

I think this needs to make more explicit that this works because the object ID
now is a uint64, and that both the inputs are 32 bits.

Yeah, it's now added in the commit message (mentioning b14e9ce7d55c).
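
For readers following along, the packing can be sketched in standalone C as below. This is a minimal illustration, not the patch's code: pack_objid() and the two unpacking helpers are hypothetical names, and Oid and RelFileNumber are plain 32-bit stand-ins for the PostgreSQL types.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for PostgreSQL's Oid and RelFileNumber, both 32-bit unsigned. */
typedef uint32_t Oid;
typedef uint32_t RelFileNumber;

/* Hypothetical helper with the same shape as RelFileLocatorToPgStatObjid(). */
static inline uint64_t
pack_objid(Oid spcOid, RelFileNumber relNumber)
{
	/* Widen before shifting so the tablespace OID is not truncated. */
	return ((uint64_t) spcOid << 32) | (uint64_t) relNumber;
}

/* Recover the upper half: the tablespace OID. */
static inline Oid
objid_spcoid(uint64_t objid)
{
	return (Oid) (objid >> 32);
}

/* Recover the lower half: the relfile number. */
static inline RelFileNumber
objid_relnumber(uint64_t objid)
{
	return (RelFileNumber) (objid & 0xFFFFFFFFu);
}
```

Because both inputs fit in 32 bits, the two halves never overlap, so two locators map to the same objid only if both spcOid and relNumber are equal.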

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 62035b7f9c3..a9b2b4e1033 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -961,8 +961,7 @@ heap_vacuum_rel(Relation rel, const VacuumParams params,
* soon in cases where the failsafe prevented significant amounts of heap
* vacuuming.
*/
-	pgstat_report_vacuum(RelationGetRelid(rel),
-						 rel->rd_rel->relisshared,
+	pgstat_report_vacuum(rel->rd_locator,
Max(vacrel->new_live_tuples, 0),
vacrel->recently_dead_tuples +
vacrel->missed_dead_tuples,

Why not pass in the Relation itself? Given that we do that already for
pgstat_report_analyze(), it seems like that'd be an improvement even
independent of this change?

Makes sense, done in [1].

diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 1bd3924e35e..563a3697690 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -2048,8 +2048,7 @@ do_autovacuum(void)
/* Fetch reloptions and the pgstat entry for this table */
relopts = extract_autovac_opts(tuple, pg_class_desc);
-		tabentry = pgstat_fetch_stat_tabentry_ext(classForm->relisshared,
-												  relid);
+		tabentry = pgstat_fetch_stat_tabentry_ext(relid);

/* Check if it needs vacuum or analyze */
relation_needs_vacanalyze(relid, relopts, classForm, tabentry,

I don't think this is good - now do_autovacuum() will do a separate syscache
lookup to fetch information the caller already has (due to the
pgstat_reloid_to_relfilelocator() in pgstat_fetch_stat_tabentry_ext()). That's
not too bad for things like viewing stats, but do_autovacuum() does this for
every table in a database...

Good point. In the attached I added pgstat_fetch_stat_tabentry_by_locator().
It's called directly in do_autovacuum() and also in pgstat_fetch_stat_tabentry_ext().

I did not check if there are other places where we can call pgstat_fetch_stat_tabentry_by_locator()
directly. I first want to validate that this idea makes sense. Does it?
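
The layering can be pictured with a toy model: a by-OID entry point that pays for one catalog lookup and then delegates, and a by-locator entry point for callers that already hold the catalog row. Everything below is hypothetical (the toy_* names, the in-memory catalog, the lookup counter); it only illustrates the call structure, and it assumes both paths derive the locator in the same way, since otherwise the two keys would differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;

/* Toy stand-in for the pg_class columns that matter here. */
typedef struct ToyClassRow
{
	Oid			oid;
	Oid			reltablespace;
	uint32_t	relfilenode;
} ToyClassRow;

static const ToyClassRow toy_pg_class[] = {
	{16384, 1663, 16384},
	{16390, 1663, 24576},		/* rewritten: relfilenode differs from oid */
};

/* Counts how often the (simulated) catalog is searched. */
static int	toy_catalog_lookups = 0;

static const ToyClassRow *
toy_search_by_oid(Oid reloid)
{
	toy_catalog_lookups++;
	for (size_t i = 0; i < sizeof(toy_pg_class) / sizeof(toy_pg_class[0]); i++)
		if (toy_pg_class[i].oid == reloid)
			return &toy_pg_class[i];
	return NULL;
}

/* Direct path: the caller already knows the locator, no catalog access. */
static uint64_t
toy_key_by_locator(Oid spcOid, uint32_t relNumber)
{
	return ((uint64_t) spcOid << 32) | relNumber;
}

/* Convenience path: one catalog lookup, then delegate to the direct path. */
static int
toy_key_by_oid(Oid reloid, uint64_t *key)
{
	const ToyClassRow *row = toy_search_by_oid(reloid);

	if (row == NULL)
		return 0;
	*key = toy_key_by_locator(row->reltablespace, row->relfilenode);
	return 1;
}
```

A loop that already iterates over catalog rows, as do_autovacuum() does, would use the direct path and leave the lookup counter untouched.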

I don't think this is true as stated. Two reasons:

1) This afaict guarantees that the relfilenode will not clash with oids, but
it does *NOT* guarantee that it does not clash with other relfilenodes

2) Note that GetNewRelFileNumber() does *NOT* check for conflicts when
creating a new relfilenode for an existing relation:
* If the relfilenumber will also be used as the relation's OID, pass the
* opened pg_class catalog, and this routine will guarantee that the result
* is also an unused OID within pg_class. If the result is to be used only
* as a relfilenumber for an existing relation, pass NULL for pg_class.

Oh right, in case of OID wraparound. In the attached I added a new

"
#define PSEUDO_PARTITION_TABLE_SPCOID 1665
"

to ensure uniqueness then.
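
To show why a reserved tablespace value separates the two key spaces, here is a small standalone sketch. It assumes a simplified locator struct and hypothetical toy_* helpers; only the two OID values come from the thread and from pg_tablespace.dat.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Oid;

#define DEFAULTTABLESPACE_OID			1663	/* from pg_tablespace.dat */
#define PSEUDO_PARTITION_TABLE_SPCOID	1665	/* reserved value proposed above */

/* Simplified stand-in for RelFileLocator. */
typedef struct ToyLocator
{
	Oid			spcOid;
	Oid			dbOid;
	uint32_t	relNumber;
} ToyLocator;

/* Same packing idea as the patch: spcOid in the high bits, relNumber low. */
static inline uint64_t
toy_objid(ToyLocator loc)
{
	return ((uint64_t) loc.spcOid << 32) | loc.relNumber;
}

/* Synthetic locator for a partitioned table: relation OID as relNumber. */
static inline ToyLocator
toy_partitioned_locator(Oid dboid, Oid reloid)
{
	ToyLocator	loc;

	loc.spcOid = PSEUDO_PARTITION_TABLE_SPCOID;
	loc.dbOid = dboid;
	loc.relNumber = reloid;
	return loc;
}

/* Two locators share a stats entry only if dbOid and objid both match. */
static inline bool
toy_same_stats_entry(ToyLocator a, ToyLocator b)
{
	return a.dbOid == b.dbOid && toy_objid(a) == toy_objid(b);
}
```

Even if a regular relation's relfilenode happens to equal a partitioned table's OID after wraparound, the differing spcOid half keeps the two keys apart, as long as no real tablespace is ever assigned the reserved OID.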

[1]: /messages/by-id/aUEA6UZZkDCQFgSA@ip-10-97-1-34.eu-west-3.compute.internal

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v9-0001-Key-PGSTAT_KIND_RELATION-by-relfile-locator.patch (text/x-diff; charset=us-ascii)
From 2b0e5ab507d89a5f8f49036af5cbc1456f99939e Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Wed, 1 Oct 2025 09:45:26 +0000
Subject: [PATCH v9 1/2] Key PGSTAT_KIND_RELATION by relfile locator

This patch changes the key used for the PGSTAT_KIND_RELATION statistic kind.
Instead of the relation oid, it now relies on:

- dboid (linked to RelFileLocator's dbOid)
- objoid which is the result of a new macro (namely RelFileLocatorToPgStatObjid())
that computes an objoid based on the RelFileLocator's spcOid and the
RelFileLocator's relNumber.

This is possible as, since b14e9ce7d55c, the objoid is now uint64 and spcOid
and relNumber are 32 bits.

That will allow us to add new stats (add writes counters) and ensure that some
counters (n_dead_tup and friends) are replicated.

The patch introduces pgstat_reloid_to_relfilelocator() to 1) avoid calling
RelationIdGetRelation() to get the relfilelocator based on the relation oid
and 2) handle the partitioned table case.

Please note that:

- when running pg_stat_have_stats('relation',...) we now need to be connected
to the database that hosts the relation. As pg_stat_have_stats() is not
documented publicly, the changes done in 029_stats_restart.pl look
sufficient.

- this patch does not handle rewrites so some tests are failing. Its only
intent is to ease the review and it should not be pushed without being
merged with the following patch that handles the rewrites.

- it can be used to test that stats are incremented correctly and that we're
able to retrieve them as long as rewrites are not involved.
---
 src/backend/access/heap/vacuumlazy.c         |   3 +-
 src/backend/postmaster/autovacuum.c          |  17 +-
 src/backend/utils/activity/pgstat_relation.c | 239 +++++++++++++++----
 src/backend/utils/adt/pgstatfuncs.c          |  22 +-
 src/include/catalog/pg_tablespace.dat        |   4 +
 src/include/catalog/pg_tablespace.h          |   8 +
 src/include/pgstat.h                         |  19 +-
 src/include/utils/pgstat_internal.h          |   1 +
 src/test/recovery/t/029_stats_restart.pl     |  40 ++--
 9 files changed, 275 insertions(+), 78 deletions(-)
   5.9% src/backend/postmaster/
  60.4% src/backend/utils/activity/
   4.9% src/backend/utils/adt/
   3.0% src/include/catalog/
   7.1% src/include/
  17.5% src/test/recovery/t/

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 62035b7f9c3..30778a15639 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -961,8 +961,7 @@ heap_vacuum_rel(Relation rel, const VacuumParams params,
 	 * soon in cases where the failsafe prevented significant amounts of heap
 	 * vacuuming.
 	 */
-	pgstat_report_vacuum(RelationGetRelid(rel),
-						 rel->rd_rel->relisshared,
+	pgstat_report_vacuum(rel,
 						 Max(vacrel->new_live_tuples, 0),
 						 vacrel->recently_dead_tuples +
 						 vacrel->missed_dead_tuples,
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 1bd3924e35e..a11174b25ad 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -2014,12 +2014,16 @@ do_autovacuum(void)
 		bool		dovacuum;
 		bool		doanalyze;
 		bool		wraparound;
+		RelFileLocator locator;
 
 		if (classForm->relkind != RELKIND_RELATION &&
 			classForm->relkind != RELKIND_MATVIEW)
 			continue;
 
 		relid = classForm->oid;
+		locator.dbOid = classForm->relisshared ? InvalidOid : MyDatabaseId;
+		locator.spcOid = classForm->reltablespace;
+		locator.relNumber = classForm->relfilenode;
 
 		/*
 		 * Check if it is a temp table (presumably, of some other backend's).
@@ -2048,8 +2052,7 @@ do_autovacuum(void)
 
 		/* Fetch reloptions and the pgstat entry for this table */
 		relopts = extract_autovac_opts(tuple, pg_class_desc);
-		tabentry = pgstat_fetch_stat_tabentry_ext(classForm->relisshared,
-												  relid);
+		tabentry = pgstat_fetch_stat_tabentry_by_locator(locator);
 
 		/* Check if it needs vacuum or analyze */
 		relation_needs_vacanalyze(relid, relopts, classForm, tabentry,
@@ -2114,6 +2117,7 @@ do_autovacuum(void)
 		bool		dovacuum;
 		bool		doanalyze;
 		bool		wraparound;
+		RelFileLocator locator;
 
 		/*
 		 * We cannot safely process other backends' temp tables, so skip 'em.
@@ -2122,6 +2126,9 @@ do_autovacuum(void)
 			continue;
 
 		relid = classForm->oid;
+		locator.dbOid = classForm->relisshared ? InvalidOid : MyDatabaseId;
+		locator.spcOid = classForm->reltablespace;
+		locator.relNumber = classForm->relfilenode;
 
 		/*
 		 * fetch reloptions -- if this toast table does not have them, try the
@@ -2141,8 +2148,7 @@ do_autovacuum(void)
 		}
 
 		/* Fetch the pgstat entry for this table */
-		tabentry = pgstat_fetch_stat_tabentry_ext(classForm->relisshared,
-												  relid);
+		tabentry = pgstat_fetch_stat_tabentry_by_locator(locator);
 
 		relation_needs_vacanalyze(relid, relopts, classForm, tabentry,
 								  effective_multixact_freeze_max_age,
@@ -2939,8 +2945,7 @@ recheck_relation_needs_vacanalyze(Oid relid,
 	PgStat_StatTabEntry *tabentry;
 
 	/* fetch the pgstat table entry */
-	tabentry = pgstat_fetch_stat_tabentry_ext(classForm->relisshared,
-											  relid);
+	tabentry = pgstat_fetch_stat_tabentry_ext(relid);
 
 	relation_needs_vacanalyze(relid, avopts, classForm, tabentry,
 							  effective_multixact_freeze_max_age,
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index b90754f8578..48bf93cae6e 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -17,12 +17,17 @@
 
 #include "postgres.h"
 
+#include "access/htup_details.h"
 #include "access/twophase_rmgr.h"
 #include "access/xact.h"
 #include "catalog/catalog.h"
+#include "catalog/pg_tablespace.h"
+#include "storage/lmgr.h"
 #include "utils/memutils.h"
 #include "utils/pgstat_internal.h"
 #include "utils/rel.h"
+#include "utils/relmapper.h"
+#include "utils/syscache.h"
 #include "utils/timestamp.h"
 
 
@@ -36,13 +41,12 @@ typedef struct TwoPhasePgStatRecord
 	PgStat_Counter inserted_pre_truncdrop;
 	PgStat_Counter updated_pre_truncdrop;
 	PgStat_Counter deleted_pre_truncdrop;
-	Oid			id;				/* table's OID */
-	bool		shared;			/* is it a shared catalog? */
+	RelFileLocator locator;		/* table's rd_locator */
 	bool		truncdropped;	/* was the relation truncated/dropped? */
 } TwoPhasePgStatRecord;
 
 
-static PgStat_TableStatus *pgstat_prep_relation_pending(Oid rel_id, bool isshared);
+static PgStat_TableStatus *pgstat_prep_relation_pending(RelFileLocator locator);
 static void add_tabstat_xact_level(PgStat_TableStatus *pgstat_info, int nest_level);
 static void ensure_tabstat_xact_level(PgStat_TableStatus *pgstat_info);
 static void save_truncdrop_counters(PgStat_TableXactStatus *trans, bool is_drop);
@@ -60,8 +64,7 @@ pgstat_copy_relation_stats(Relation dst, Relation src)
 	PgStatShared_Relation *dstshstats;
 	PgStat_EntryRef *dst_ref;
 
-	srcstats = pgstat_fetch_stat_tabentry_ext(src->rd_rel->relisshared,
-											  RelationGetRelid(src));
+	srcstats = pgstat_fetch_stat_tabentry_ext(RelationGetRelid(src));
 	if (!srcstats)
 		return;
 
@@ -94,8 +97,10 @@ pgstat_init_relation(Relation rel)
 
 	/*
 	 * We only count stats for relations with storage and partitioned tables
+	 * and we don't count stats generated during a rewrite.
 	 */
-	if (!RELKIND_HAS_STORAGE(relkind) && relkind != RELKIND_PARTITIONED_TABLE)
+	if ((!RELKIND_HAS_STORAGE(relkind) && relkind != RELKIND_PARTITIONED_TABLE) ||
+		OidIsValid(rel->rd_rel->relrewrite))
 	{
 		rel->pgstat_enabled = false;
 		rel->pgstat_info = NULL;
@@ -130,12 +135,37 @@ pgstat_init_relation(Relation rel)
 void
 pgstat_assoc_relation(Relation rel)
 {
+	RelFileLocator locator;
+
 	Assert(rel->pgstat_enabled);
 	Assert(rel->pgstat_info == NULL);
 
+	/*
+	 * Don't associate stats for relations without storage and non partitioned
+	 * tables.
+	 */
+	if (!RELKIND_HAS_STORAGE(rel->rd_rel->relkind) &&
+		rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
+		return;
+
+	if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
+		locator = rel->rd_locator;
+	else
+	{
+		/*
+		 * Partitioned tables don't have storage, so construct a synthetic
+		 * locator for statistics tracking. Use a reserved pseudo tablespace
+		 * OID that cannot conflict with real tablespaces, and the relation
+		 * OID as relNumber. This ensures no collision with regular relations
+		 * even after OID wraparound.
+		 */
+		locator.dbOid = (rel->rd_rel->relisshared ? InvalidOid : MyDatabaseId);
+		locator.spcOid = PSEUDO_PARTITION_TABLE_SPCOID;
+		locator.relNumber = rel->rd_id;
+	}
+
 	/* Else find or make the PgStat_TableStatus entry, and update link */
-	rel->pgstat_info = pgstat_prep_relation_pending(RelationGetRelid(rel),
-													rel->rd_rel->relisshared);
+	rel->pgstat_info = pgstat_prep_relation_pending(locator);
 
 	/* don't allow link a stats to multiple relcache entries */
 	Assert(rel->pgstat_info->relation == NULL);
@@ -167,9 +197,13 @@ pgstat_unlink_relation(Relation rel)
 void
 pgstat_create_relation(Relation rel)
 {
+	/* don't track stats for relations without storage */
+	if (!RELKIND_HAS_STORAGE(rel->rd_rel->relkind))
+		return;
+
 	pgstat_create_transactional(PGSTAT_KIND_RELATION,
-								rel->rd_rel->relisshared ? InvalidOid : MyDatabaseId,
-								RelationGetRelid(rel));
+								rel->rd_locator.dbOid,
+								RelFileLocatorToPgStatObjid(rel->rd_locator));
 }
 
 /*
@@ -181,9 +215,13 @@ pgstat_drop_relation(Relation rel)
 	int			nest_level = GetCurrentTransactionNestLevel();
 	PgStat_TableStatus *pgstat_info;
 
+	/* don't track stats for relations without storage */
+	if (!RELKIND_HAS_STORAGE(rel->rd_rel->relkind))
+		return;
+
 	pgstat_drop_transactional(PGSTAT_KIND_RELATION,
-							  rel->rd_rel->relisshared ? InvalidOid : MyDatabaseId,
-							  RelationGetRelid(rel));
+							  rel->rd_locator.dbOid,
+							  RelFileLocatorToPgStatObjid(rel->rd_locator));
 
 	if (!pgstat_should_count_relation(rel))
 		return;
@@ -207,27 +245,29 @@ pgstat_drop_relation(Relation rel)
  * Report that the table was just vacuumed and flush IO statistics.
  */
 void
-pgstat_report_vacuum(Oid tableoid, bool shared,
-					 PgStat_Counter livetuples, PgStat_Counter deadtuples,
-					 TimestampTz starttime)
+pgstat_report_vacuum(Relation rel, PgStat_Counter livetuples,
+					 PgStat_Counter deadtuples, TimestampTz starttime)
 {
 	PgStat_EntryRef *entry_ref;
 	PgStatShared_Relation *shtabentry;
 	PgStat_StatTabEntry *tabentry;
-	Oid			dboid = (shared ? InvalidOid : MyDatabaseId);
 	TimestampTz ts;
 	PgStat_Counter elapsedtime;
+	RelFileLocator locator;
 
 	if (!pgstat_track_counts)
 		return;
 
+	locator = rel->rd_locator;
 	/* Store the data in the table's hash table entry. */
 	ts = GetCurrentTimestamp();
 	elapsedtime = TimestampDifferenceMilliseconds(starttime, ts);
 
 	/* block acquiring lock for the same reason as pgstat_report_autovac() */
 	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_RELATION,
-											dboid, tableoid, false);
+											locator.dbOid,
+											RelFileLocatorToPgStatObjid(locator),
+											false);
 
 	shtabentry = (PgStatShared_Relation *) entry_ref->shared_stats;
 	tabentry = &shtabentry->stats;
@@ -286,9 +326,9 @@ pgstat_report_analyze(Relation rel,
 	PgStat_EntryRef *entry_ref;
 	PgStatShared_Relation *shtabentry;
 	PgStat_StatTabEntry *tabentry;
-	Oid			dboid = (rel->rd_rel->relisshared ? InvalidOid : MyDatabaseId);
 	TimestampTz ts;
 	PgStat_Counter elapsedtime;
+	RelFileLocator locator;
 
 	if (!pgstat_track_counts)
 		return;
@@ -326,9 +366,25 @@ pgstat_report_analyze(Relation rel,
 	ts = GetCurrentTimestamp();
 	elapsedtime = TimestampDifferenceMilliseconds(starttime, ts);
 
+	if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
+		locator = rel->rd_locator;
+	else
+	{
+		/*
+		 * Partitioned tables don't have storage, so construct a synthetic
+		 * locator for statistics tracking. Use a reserved pseudo tablespace
+		 * OID that cannot conflict with real tablespaces, and the relation
+		 * OID as relNumber. This ensures no collision with regular relations
+		 * even after OID wraparound.
+		 */
+		locator.dbOid = (rel->rd_rel->relisshared ? InvalidOid : MyDatabaseId);
+		locator.spcOid = PSEUDO_PARTITION_TABLE_SPCOID;
+		locator.relNumber = rel->rd_id;
+	}
 	/* block acquiring lock for the same reason as pgstat_report_autovac() */
-	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_RELATION, dboid,
-											RelationGetRelid(rel),
+	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_RELATION,
+											locator.dbOid,
+											RelFileLocatorToPgStatObjid(locator),
 											false);
 	/* can't get dropped while accessed */
 	Assert(entry_ref != NULL && entry_ref->shared_stats != NULL);
@@ -469,7 +525,16 @@ pgstat_update_heap_dead_tuples(Relation rel, int delta)
 PgStat_StatTabEntry *
 pgstat_fetch_stat_tabentry(Oid relid)
 {
-	return pgstat_fetch_stat_tabentry_ext(IsSharedRelation(relid), relid);
+	return pgstat_fetch_stat_tabentry_ext(relid);
+}
+
+PgStat_StatTabEntry *
+pgstat_fetch_stat_tabentry_by_locator(RelFileLocator locator)
+{
+	return (PgStat_StatTabEntry *) pgstat_fetch_entry(
+													  PGSTAT_KIND_RELATION,
+													  locator.dbOid,
+													  RelFileLocatorToPgStatObjid(locator));
 }
 
 /*
@@ -477,12 +542,14 @@ pgstat_fetch_stat_tabentry(Oid relid)
  * whether the to-be-accessed table is a shared relation or not.
  */
 PgStat_StatTabEntry *
-pgstat_fetch_stat_tabentry_ext(bool shared, Oid reloid)
+pgstat_fetch_stat_tabentry_ext(Oid reloid)
 {
-	Oid			dboid = (shared ? InvalidOid : MyDatabaseId);
+	RelFileLocator locator;
 
-	return (PgStat_StatTabEntry *)
-		pgstat_fetch_entry(PGSTAT_KIND_RELATION, dboid, reloid);
+	if (!pgstat_reloid_to_relfilelocator(reloid, &locator))
+		return NULL;
+
+	return pgstat_fetch_stat_tabentry_by_locator(locator);
 }
 
 /*
@@ -504,14 +571,17 @@ find_tabstat_entry(Oid rel_id)
 	PgStat_TableXactStatus *trans;
 	PgStat_TableStatus *tabentry = NULL;
 	PgStat_TableStatus *tablestatus = NULL;
+	RelFileLocator locator;
+
+	if (!pgstat_reloid_to_relfilelocator(rel_id, &locator))
+		return NULL;
+
+	entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION,
+										   locator.dbOid,
+										   RelFileLocatorToPgStatObjid(locator));
 
-	entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, MyDatabaseId, rel_id);
 	if (!entry_ref)
-	{
-		entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, InvalidOid, rel_id);
-		if (!entry_ref)
-			return tablestatus;
-	}
+		return tablestatus;
 
 	tabentry = (PgStat_TableStatus *) entry_ref->pending;
 	tablestatus = palloc_object(PgStat_TableStatus);
@@ -707,8 +777,12 @@ AtPrepare_PgStat_Relations(PgStat_SubXactStatus *xact_state)
 		record.inserted_pre_truncdrop = trans->inserted_pre_truncdrop;
 		record.updated_pre_truncdrop = trans->updated_pre_truncdrop;
 		record.deleted_pre_truncdrop = trans->deleted_pre_truncdrop;
-		record.id = tabstat->id;
-		record.shared = tabstat->shared;
+
+		if (tabstat->relation != NULL)
+			record.locator = tabstat->relation->rd_locator;
+		else
+			record.locator = tabstat->locator;
+
 		record.truncdropped = trans->truncdropped;
 
 		RegisterTwoPhaseRecord(TWOPHASE_RM_PGSTAT_ID, 0,
@@ -751,7 +825,7 @@ pgstat_twophase_postcommit(FullTransactionId fxid, uint16 info,
 	PgStat_TableStatus *pgstat_info;
 
 	/* Find or create a tabstat entry for the rel */
-	pgstat_info = pgstat_prep_relation_pending(rec->id, rec->shared);
+	pgstat_info = pgstat_prep_relation_pending(rec->locator);
 
 	/* Same math as in AtEOXact_PgStat, commit case */
 	pgstat_info->counts.tuples_inserted += rec->tuples_inserted;
@@ -786,8 +860,8 @@ pgstat_twophase_postabort(FullTransactionId fxid, uint16 info,
 	TwoPhasePgStatRecord *rec = (TwoPhasePgStatRecord *) recdata;
 	PgStat_TableStatus *pgstat_info;
 
-	/* Find or create a tabstat entry for the rel */
-	pgstat_info = pgstat_prep_relation_pending(rec->id, rec->shared);
+	/* Find or create a tabstat entry for the target locator */
+	pgstat_info = pgstat_prep_relation_pending(rec->locator);
 
 	/* Same math as in AtEOXact_PgStat, abort case */
 	if (rec->truncdropped)
@@ -921,17 +995,21 @@ pgstat_relation_reset_timestamp_cb(PgStatShared_Common *header, TimestampTz ts)
  * initialized if not exists.
  */
 static PgStat_TableStatus *
-pgstat_prep_relation_pending(Oid rel_id, bool isshared)
+pgstat_prep_relation_pending(RelFileLocator locator)
 {
 	PgStat_EntryRef *entry_ref;
 	PgStat_TableStatus *pending;
+	uint64		objid;
+
+	objid = RelFileLocatorToPgStatObjid(locator);
 
 	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_RELATION,
-										  isshared ? InvalidOid : MyDatabaseId,
-										  rel_id, NULL);
+										  locator.dbOid,
+										  objid, NULL);
+
 	pending = entry_ref->pending;
-	pending->id = rel_id;
-	pending->shared = isshared;
+	pending->id = objid;
+	pending->locator = locator;
 
 	return pending;
 }
@@ -1010,3 +1088,82 @@ restore_truncdrop_counters(PgStat_TableXactStatus *trans)
 		trans->tuples_deleted = trans->deleted_pre_truncdrop;
 	}
 }
+
+/*
+ * Convert a relation OID to its corresponding RelFileLocator for statistics
+ * tracking purposes.
+ *
+ * Returns true on success, false if the relation doesn't need statistics
+ * tracking.
+ *
+ * For partitioned tables, constructs a synthetic locator using the relation
+ * OID as relNumber, since they don't have storage.
+ */
+bool
+pgstat_reloid_to_relfilelocator(Oid reloid, RelFileLocator *locator)
+{
+	HeapTuple	tuple;
+	Form_pg_class relform;
+	bool		result = true;
+
+	/* get the relation's tuple from pg_class */
+	tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(reloid));
+
+	if (!HeapTupleIsValid(tuple))
+		return false;
+
+	relform = (Form_pg_class) GETSTRUCT(tuple);
+
+	/* skip relations without storage and non partitioned tables */
+	if (!RELKIND_HAS_STORAGE(relform->relkind) &&
+		relform->relkind != RELKIND_PARTITIONED_TABLE)
+	{
+		ReleaseSysCache(tuple);
+		return false;
+	}
+
+	if (relform->relkind != RELKIND_PARTITIONED_TABLE)
+	{
+		/* build the RelFileLocator */
+		locator->relNumber = relform->relfilenode;
+		locator->spcOid = relform->reltablespace;
+
+		/* handle default tablespace */
+		if (!OidIsValid(locator->spcOid))
+			locator->spcOid = MyDatabaseTableSpace;
+
+		/* handle dbOid for global vs local relations */
+		if (locator->spcOid == GLOBALTABLESPACE_OID)
+			locator->dbOid = InvalidOid;
+		else
+			locator->dbOid = MyDatabaseId;
+
+		/* handle mapped relations */
+		if (!RelFileNumberIsValid(locator->relNumber))
+		{
+			locator->relNumber = RelationMapOidToFilenumber(reloid,
+															relform->relisshared);
+			if (!RelFileNumberIsValid(locator->relNumber))
+			{
+				ReleaseSysCache(tuple);
+				return false;
+			}
+		}
+	}
+	else
+	{
+		/*
+		 * Partitioned tables don't have storage, so construct a synthetic
+		 * locator for statistics tracking. Use a reserved pseudo tablespace
+		 * OID that cannot conflict with real tablespaces, and the relation
+		 * OID as relNumber. This ensures no collision with regular relations
+		 * even after OID wraparound.
+		 */
+		locator->dbOid = (relform->relisshared ? InvalidOid : MyDatabaseId);
+		locator->spcOid = PSEUDO_PARTITION_TABLE_SPCOID;
+		locator->relNumber = relform->oid;
+	}
+
+	ReleaseSysCache(tuple);
+	return result;
+}
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index ef6fffe60b9..60ffb1679ec 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -23,13 +23,13 @@
 #include "common/ip.h"
 #include "funcapi.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "postmaster/bgworker.h"
 #include "replication/logicallauncher.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/pgstat_internal.h"
 #include "utils/timestamp.h"
 
 #define UINT32_ACCESS_ONCE(var)		 ((uint32)(*((volatile uint32 *)&(var))))
@@ -1949,9 +1949,14 @@ Datum
 pg_stat_reset_single_table_counters(PG_FUNCTION_ARGS)
 {
 	Oid			taboid = PG_GETARG_OID(0);
-	Oid			dboid = (IsSharedRelation(taboid) ? InvalidOid : MyDatabaseId);
+	RelFileLocator locator;
 
-	pgstat_reset(PGSTAT_KIND_RELATION, dboid, taboid);
+	/* Get the stats locator from the relation OID */
+	if (!pgstat_reloid_to_relfilelocator(taboid, &locator))
+		PG_RETURN_VOID();
+
+	pgstat_reset(PGSTAT_KIND_RELATION, locator.dbOid,
+				 RelFileLocatorToPgStatObjid(locator));
 
 	PG_RETURN_VOID();
 }
@@ -2305,5 +2310,16 @@ pg_stat_have_stats(PG_FUNCTION_ARGS)
 	uint64		objid = PG_GETARG_INT64(2);
 	PgStat_Kind kind = pgstat_get_kind_from_str(stats_type);
 
+	/* Convert relation OID to relfilenode objid */
+	if (kind == PGSTAT_KIND_RELATION)
+	{
+		RelFileLocator locator;
+
+		if (!pgstat_reloid_to_relfilelocator(objid, &locator))
+			PG_RETURN_BOOL(false);
+
+		objid = RelFileLocatorToPgStatObjid(locator);
+	}
+
 	PG_RETURN_BOOL(pgstat_have_entry(kind, dboid, objid));
 }
diff --git a/src/include/catalog/pg_tablespace.dat b/src/include/catalog/pg_tablespace.dat
index 1302a3d75cd..9430970fffd 100644
--- a/src/include/catalog/pg_tablespace.dat
+++ b/src/include/catalog/pg_tablespace.dat
@@ -10,6 +10,10 @@
 #
 #----------------------------------------------------------------------
 
+/*
+ * When adding a new one, ensure it does not conflict with
+ * PSEUDO_PARTITION_TABLE_SPCOID.
+ */
 [
 
 { oid => '1663', oid_symbol => 'DEFAULTTABLESPACE_OID',
diff --git a/src/include/catalog/pg_tablespace.h b/src/include/catalog/pg_tablespace.h
index 7816d779d8c..0e2d8051d69 100644
--- a/src/include/catalog/pg_tablespace.h
+++ b/src/include/catalog/pg_tablespace.h
@@ -21,6 +21,14 @@
 #include "catalog/genbki.h"
 #include "catalog/pg_tablespace_d.h"	/* IWYU pragma: export */
 
+/*
+ * Reserved tablespace OID for partitioned table pseudo locators.
+ * This is not an actual tablespace, just a reserved value to distinguish
+ * partitioned table statistics from regular table statistics. Ensures it does
+ * not conflict with the ones in pg_tablespace.dat.
+ */
+#define PSEUDO_PARTITION_TABLE_SPCOID 1665
+
 /* ----------------
  *		pg_tablespace definition.  cpp turns this into
  *		typedef struct FormData_pg_tablespace
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index f23dd5870da..3102f86aa24 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -17,6 +17,7 @@
 #include "postmaster/pgarch.h"	/* for MAX_XFN_CHARS */
 #include "replication/conflict.h"
 #include "replication/worker_internal.h"
+#include "storage/relfilelocator.h"
 #include "utils/backend_progress.h" /* for backward compatibility */	/* IWYU pragma: export */
 #include "utils/backend_status.h"	/* for backward compatibility */	/* IWYU pragma: export */
 #include "utils/pgstat_kind.h"
@@ -35,6 +36,12 @@
 /* Default directory to store temporary statistics data in */
 #define PG_STAT_TMP_DIR		"pg_stat_tmp"
 
+/*
+ * Build a pgstat key Objid based on a RelFileLocator.
+ */
+#define RelFileLocatorToPgStatObjid(locator) \
+	(((uint64) (locator).spcOid << 32) | (locator).relNumber)
+
 /* Values for track_functions GUC variable --- order is significant! */
 typedef enum TrackFunctionsLevel
 {
@@ -175,11 +182,11 @@ typedef struct PgStat_TableCounts
  */
 typedef struct PgStat_TableStatus
 {
-	Oid			id;				/* table's OID */
-	bool		shared;			/* is it a shared catalog? */
+	uint64		id;				/* hash of relfilelocator for stats key */
 	struct PgStat_TableXactStatus *trans;	/* lowest subxact's counts */
 	PgStat_TableCounts counts;	/* event counts to be sent */
 	Relation	relation;		/* rel that is using this entry */
+	RelFileLocator locator;		/* table's relfilelocator */
 } PgStat_TableStatus;
 
 /* ----------
@@ -669,8 +676,8 @@ extern void pgstat_init_relation(Relation rel);
 extern void pgstat_assoc_relation(Relation rel);
 extern void pgstat_unlink_relation(Relation rel);
 
-extern void pgstat_report_vacuum(Oid tableoid, bool shared,
-								 PgStat_Counter livetuples, PgStat_Counter deadtuples,
+extern void pgstat_report_vacuum(Relation rel, PgStat_Counter livetuples,
+								 PgStat_Counter deadtuples,
 								 TimestampTz starttime);
 extern void pgstat_report_analyze(Relation rel,
 								  PgStat_Counter livetuples, PgStat_Counter deadtuples,
@@ -735,8 +742,8 @@ extern void pgstat_twophase_postabort(FullTransactionId fxid, uint16 info,
 									  void *recdata, uint32 len);
 
 extern PgStat_StatTabEntry *pgstat_fetch_stat_tabentry(Oid relid);
-extern PgStat_StatTabEntry *pgstat_fetch_stat_tabentry_ext(bool shared,
-														   Oid reloid);
+extern PgStat_StatTabEntry *pgstat_fetch_stat_tabentry_by_locator(RelFileLocator locator);
+extern PgStat_StatTabEntry *pgstat_fetch_stat_tabentry_ext(Oid reloid);
 extern PgStat_TableStatus *find_tabstat_entry(Oid rel_id);
 
 
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 5c1ce4d3d6a..7b24928b00d 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -764,6 +764,7 @@ extern void PostPrepare_PgStat_Relations(PgStat_SubXactStatus *xact_state);
 extern bool pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait);
 extern void pgstat_relation_delete_pending_cb(PgStat_EntryRef *entry_ref);
 extern void pgstat_relation_reset_timestamp_cb(PgStatShared_Common *header, TimestampTz ts);
+extern bool pgstat_reloid_to_relfilelocator(Oid reloid, RelFileLocator *locator);
 
 
 /*
diff --git a/src/test/recovery/t/029_stats_restart.pl b/src/test/recovery/t/029_stats_restart.pl
index 021e2bf361f..3a9c05eaf10 100644
--- a/src/test/recovery/t/029_stats_restart.pl
+++ b/src/test/recovery/t/029_stats_restart.pl
@@ -55,10 +55,10 @@ trigger_funcrel_stat();
 
 # verify stats objects exist
 $sect = "initial";
-is(have_stats('database', $dboid, 0), 't', "$sect: db stats do exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats($connect_db, 'database', $dboid, 0), 't', "$sect: db stats do exist");
+is(have_stats($db_under_test, 'function', $dboid, $funcoid),
 	't', "$sect: function stats do exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats($db_under_test, 'relation', $dboid, $tableoid),
 	't', "$sect: relation stats do exist");
 
 # regular shutdown
@@ -79,10 +79,10 @@ copy($og_stats, $statsfile) or die "Copy failed: $!";
 $node->start;
 
 $sect = "copy";
-is(have_stats('database', $dboid, 0), 't', "$sect: db stats do exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats($connect_db, 'database', $dboid, 0), 't', "$sect: db stats do exist");
+is(have_stats($db_under_test, 'function', $dboid, $funcoid),
 	't', "$sect: function stats do exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats($db_under_test, 'relation', $dboid, $tableoid),
 	't', "$sect: relation stats do exist");
 
 $node->stop('immediate');
@@ -96,10 +96,10 @@ $node->start;
 
 # stats should have been discarded
 $sect = "post immediate";
-is(have_stats('database', $dboid, 0), 'f', "$sect: db stats do not exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats($connect_db, 'database', $dboid, 0), 'f', "$sect: db stats do not exist");
+is(have_stats($db_under_test, 'function', $dboid, $funcoid),
 	'f', "$sect: function stats do exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats($db_under_test, 'relation', $dboid, $tableoid),
 	'f', "$sect: relation stats do not exist");
 
 # get rid of backup statsfile
@@ -110,10 +110,10 @@ unlink $statsfile or die "cannot unlink $statsfile $!";
 trigger_funcrel_stat();
 
 $sect = "post immediate, new";
-is(have_stats('database', $dboid, 0), 't', "$sect: db stats do exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats($connect_db, 'database', $dboid, 0), 't', "$sect: db stats do exist");
+is(have_stats($db_under_test, 'function', $dboid, $funcoid),
 	't', "$sect: function stats do exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats($db_under_test, 'relation', $dboid, $tableoid),
 	't', "$sect: relation stats do exist");
 
 # regular shutdown
@@ -129,10 +129,10 @@ $node->start;
 
 # no stats present due to invalid stats file
 $sect = "invalid_overwrite";
-is(have_stats('database', $dboid, 0), 'f', "$sect: db stats do not exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats($connect_db, 'database', $dboid, 0), 'f', "$sect: db stats do not exist");
+is(have_stats($db_under_test, 'function', $dboid, $funcoid),
 	'f', "$sect: function stats do not exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats($db_under_test, 'relation', $dboid, $tableoid),
 	'f', "$sect: relation stats do not exist");
 
 
@@ -145,10 +145,10 @@ append_file($og_stats, "XYZ");
 $node->start;
 
 $sect = "invalid_append";
-is(have_stats('database', $dboid, 0), 'f', "$sect: db stats do not exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats($connect_db, 'database', $dboid, 0), 'f', "$sect: db stats do not exist");
+is(have_stats($db_under_test, 'function', $dboid, $funcoid),
 	'f', "$sect: function stats do not exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats($db_under_test, 'relation', $dboid, $tableoid),
 	'f', "$sect: relation stats do not exist");
 
 
@@ -307,9 +307,9 @@ sub trigger_funcrel_stat
 
 sub have_stats
 {
-	my ($kind, $dboid, $objid) = @_;
+	my ($db, $kind, $dboid, $objid) = @_;
 
-	return $node->safe_psql($connect_db,
+	return $node->safe_psql($db,
 		"SELECT pg_stat_have_stats('$kind', $dboid, $objid)");
 }
 
-- 
2.34.1
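
As a reading aid for the 0001 patch above: the RelFileLocatorToPgStatObjid() macro packs the tablespace OID into the high 32 bits of the stats key and the relfilenumber into the low 32 bits, while the database OID travels separately as the dboid part of the key. Below is a minimal standalone sketch of that packing. It assumes plain 32-bit stand-ins for Oid and RelFileNumber so that it compiles outside the PostgreSQL tree, and the two unpack helpers are hypothetical (they are not part of the patch).

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for PostgreSQL's Oid and RelFileNumber (assumed 32-bit unsigned). */
typedef uint32_t Oid;
typedef uint32_t RelFileNumber;

/* Simplified stand-in for RelFileLocator (storage/relfilelocator.h). */
typedef struct RelFileLocator
{
	Oid			spcOid;			/* tablespace */
	Oid			dbOid;			/* database, kept outside the packed objid */
	RelFileNumber relNumber;	/* relation file number */
} RelFileLocator;

/* Same packing as the macro in the patch: spcOid high, relNumber low. */
static uint64_t
locator_to_objid(RelFileLocator locator)
{
	return ((uint64_t) locator.spcOid << 32) | locator.relNumber;
}

/* Hypothetical inverse helpers, for illustration only. */
static Oid
objid_to_spcoid(uint64_t objid)
{
	return (Oid) (objid >> 32);
}

static RelFileNumber
objid_to_relnumber(uint64_t objid)
{
	return (RelFileNumber) (objid & UINT64_C(0xFFFFFFFF));
}
```

Because both halves are 32 bits wide, two locators in the same database map to the same objid only if both their tablespace and relfilenumber match.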

v9-0002-handle-relation-statistics-correctly-during-rewri.patch (text/x-diff; charset=us-ascii)
From 0f4200a12e5f07a42200a2c0a9f4d50719caf05d Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Tue, 4 Nov 2025 13:52:46 +0000
Subject: [PATCH v9 2/2] handle relation statistics correctly during rewrites

Now that PGSTAT_KIND_RELATION is keyed by relfilenode, we need to handle rewrites.

To do so, this patch:

- Adds PgStat_PendingRewrite, a new struct to track rewrite operations within
a transaction, storing the old locator, new locator, and original locator (for
rewrite chains). This allows stats to be copied from the original location to
the final location at commit time.

- Adds a new function, pgstat_mark_rewrite(), called when a table rewrite begins.
It records the rewrite operation in a local list and detects rewrite chains by
checking if the old_locator matches any existing new_locator, preserving the
chain's original_locator.

- Modifies pgstat_copy_relation_stats() to accept RelFileLocators instead of
Relations, with a new increment parameter to accumulate stats (needed for rewrite
chains with DML between rewrites).

- Ensures that AtEOXact_PgStat_Relations(), AtPrepare_PgStat_Relations(),
pgstat_twophase_postcommit()/postabort() and pgstat_drop_relation() handle the
PgStat_PendingRewrite list correctly.

Note that due to the new flush call in pgstat_twophase_postcommit() we cannot
call GetCurrentTransactionStopTimestamp() in pgstat_relation_flush_cb(). So this
patch adds a check to handle this special case and calls GetCurrentTimestamp()
instead. We'd call GetCurrentTimestamp() only if there is a rewrite, so the extra
cost should be negligible. Another solution could be to trigger the flush from
FinishPreparedTransaction() but that's not worth the extra complexity.

The new pending_rewrites list is traversed in multiple places. In typical usage
the list contains only a few entries, so the traversal cost should be negligible,
all the more so in comparison to the cost of a rewrite itself.
---
 src/backend/catalog/index.c                  |   2 +-
 src/backend/commands/cluster.c               |   5 +
 src/backend/commands/tablecmds.c             |   6 +
 src/backend/utils/activity/pgstat_relation.c | 391 ++++++++++++++++++-
 src/backend/utils/activity/pgstat_xact.c     |  25 +-
 src/backend/utils/cache/relcache.c           |   6 +
 src/include/pgstat.h                         |   5 +-
 src/tools/pgindent/typedefs.list             |   1 +
 8 files changed, 424 insertions(+), 17 deletions(-)
  92.8% src/backend/utils/activity/
   4.9% src/backend/

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 8dea58ad96b..b71925a22c3 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1795,7 +1795,7 @@ index_concurrently_swap(Oid newIndexId, Oid oldIndexId, const char *oldName)
 	changeDependenciesOn(RelationRelationId, oldIndexId, newIndexId);
 
 	/* copy over statistics from old to new index */
-	pgstat_copy_relation_stats(newClassRel, oldClassRel);
+	pgstat_copy_relation_stats(newClassRel->rd_locator, oldClassRel->rd_locator, false);
 
 	/* Copy data of pg_statistic from the old index to the new one */
 	CopyStatistics(oldIndexId, newIndexId);
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 2120c85ccb4..6155b12afab 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -1196,6 +1196,11 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
 
 		rel1 = relation_open(r1, NoLock);
 		rel2 = relation_open(r2, NoLock);
+
+		/* Mark that a rewrite happened */
+		if (RELKIND_HAS_STORAGE(rel1->rd_rel->relkind))
+			pgstat_mark_rewrite(rel1->rd_locator, rel2->rd_locator);
+
 		rel2->rd_createSubid = rel1->rd_createSubid;
 		rel2->rd_newRelfilelocatorSubid = rel1->rd_newRelfilelocatorSubid;
 		rel2->rd_firstRelfilelocatorSubid = rel1->rd_firstRelfilelocatorSubid;
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 6b1a00ed477..9de70f321ed 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -16884,6 +16884,7 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
 	Oid			reltoastrelid;
 	RelFileNumber newrelfilenumber;
 	RelFileLocator newrlocator;
+	RelFileLocator oldrlocator;
 	List	   *reltoastidxids = NIL;
 	ListCell   *lc;
 
@@ -16922,6 +16923,7 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
 	newrlocator = rel->rd_locator;
 	newrlocator.relNumber = newrelfilenumber;
 	newrlocator.spcOid = newTableSpace;
+	oldrlocator = rel->rd_locator;
 
 	/* hand off to AM to actually create new rel storage and copy the data */
 	if (rel->rd_rel->relkind == RELKIND_INDEX)
@@ -16934,6 +16936,10 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
 		table_relation_copy_data(rel, &newrlocator);
 	}
 
+	/* mark that a rewrite happened */
+	if (RELKIND_HAS_STORAGE(rel->rd_rel->relkind))
+		pgstat_mark_rewrite(oldrlocator, newrlocator);
+
 	/*
 	 * Update the pg_class row.
 	 *
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index 48bf93cae6e..b6ba8fdf440 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -30,6 +30,19 @@
 #include "utils/syscache.h"
 #include "utils/timestamp.h"
 
+/* Pending rewrite operations for stats copying */
+typedef struct PgStat_PendingRewrite
+{
+	RelFileLocator old_locator;
+	RelFileLocator new_locator;
+	RelFileLocator original_locator;
+	int			nest_level;		/* Transaction nesting level where rewrite
+								 * occurred */
+	struct PgStat_PendingRewrite *next;
+} PgStat_PendingRewrite;
+
+/* The pending rewrites list for current transaction */
+static PgStat_PendingRewrite *pending_rewrites = NULL;
 
 /* Record that's written to 2PC state file when pgstat state is persisted */
 typedef struct TwoPhasePgStatRecord
@@ -43,6 +56,8 @@ typedef struct TwoPhasePgStatRecord
 	PgStat_Counter deleted_pre_truncdrop;
 	RelFileLocator locator;		/* table's rd_locator */
 	bool		truncdropped;	/* was the relation truncated/dropped? */
+	RelFileLocator rewrite_old_locator;
+	int			rewrite_nest_level;
 } TwoPhasePgStatRecord;
 
 
@@ -54,27 +69,70 @@ static void restore_truncdrop_counters(PgStat_TableXactStatus *trans);
 
 
 /*
- * Copy stats between relations. This is used for things like REINDEX
+ * Copy stats between RelFileLocators. This is used for things like REINDEX
  * CONCURRENTLY.
  */
 void
-pgstat_copy_relation_stats(Relation dst, Relation src)
+pgstat_copy_relation_stats(RelFileLocator dst, RelFileLocator src, bool increment)
 {
 	PgStat_StatTabEntry *srcstats;
 	PgStatShared_Relation *dstshstats;
 	PgStat_EntryRef *dst_ref;
 
-	srcstats = pgstat_fetch_stat_tabentry_ext(RelationGetRelid(src));
+	srcstats = (PgStat_StatTabEntry *) pgstat_fetch_entry(PGSTAT_KIND_RELATION,
+														  src.dbOid,
+														  RelFileLocatorToPgStatObjid(src));
 	if (!srcstats)
 		return;
 
 	dst_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_RELATION,
-										  dst->rd_rel->relisshared ? InvalidOid : MyDatabaseId,
-										  RelationGetRelid(dst),
+										  dst.dbOid,
+										  RelFileLocatorToPgStatObjid(dst),
 										  false);
 
 	dstshstats = (PgStatShared_Relation *) dst_ref->shared_stats;
-	dstshstats->stats = *srcstats;
+
+	if (!increment)
+		dstshstats->stats = *srcstats;
+	else
+	{
+		/* Increment those statistics */
+#define RELFSTAT_ACC(fld, stats_to_add) \
+	(dstshstats->stats.fld += stats_to_add->fld)
+		RELFSTAT_ACC(numscans, srcstats);
+		RELFSTAT_ACC(tuples_returned, srcstats);
+		RELFSTAT_ACC(tuples_fetched, srcstats);
+		RELFSTAT_ACC(tuples_inserted, srcstats);
+		RELFSTAT_ACC(tuples_updated, srcstats);
+		RELFSTAT_ACC(tuples_deleted, srcstats);
+		RELFSTAT_ACC(tuples_hot_updated, srcstats);
+		RELFSTAT_ACC(tuples_newpage_updated, srcstats);
+		RELFSTAT_ACC(live_tuples, srcstats);
+		RELFSTAT_ACC(dead_tuples, srcstats);
+		RELFSTAT_ACC(mod_since_analyze, srcstats);
+		RELFSTAT_ACC(ins_since_vacuum, srcstats);
+		RELFSTAT_ACC(blocks_fetched, srcstats);
+		RELFSTAT_ACC(blocks_hit, srcstats);
+		RELFSTAT_ACC(vacuum_count, srcstats);
+		RELFSTAT_ACC(autovacuum_count, srcstats);
+		RELFSTAT_ACC(analyze_count, srcstats);
+		RELFSTAT_ACC(autoanalyze_count, srcstats);
+		RELFSTAT_ACC(total_vacuum_time, srcstats);
+		RELFSTAT_ACC(total_autovacuum_time, srcstats);
+		RELFSTAT_ACC(total_analyze_time, srcstats);
+		RELFSTAT_ACC(total_autoanalyze_time, srcstats);
+#undef RELFSTAT_ACC
+
+		/* Replace those statistics */
+#define RELFSTAT_REP(fld, stats_to_rep) \
+	(dstshstats->stats.fld = stats_to_rep->fld)
+		RELFSTAT_REP(lastscan, srcstats);
+		RELFSTAT_REP(last_vacuum_time, srcstats);
+		RELFSTAT_REP(last_autovacuum_time, srcstats);
+		RELFSTAT_REP(last_analyze_time, srcstats);
+		RELFSTAT_REP(last_autoanalyze_time, srcstats);
+#undef RELFSTAT_REP
+	}
 
 	pgstat_unlock_entry(dst_ref);
 }
@@ -136,6 +194,7 @@ void
 pgstat_assoc_relation(Relation rel)
 {
 	RelFileLocator locator;
+	PgStat_TableStatus *pgstat_info;
 
 	Assert(rel->pgstat_enabled);
 	Assert(rel->pgstat_info == NULL);
@@ -164,14 +223,54 @@ pgstat_assoc_relation(Relation rel)
 		locator.relNumber = rel->rd_id;
 	}
 
+	/*
+	 * If this relation was rewritten during the current transaction we may be
+	 * reopening it with its new RelFileLocator. In that case, continue using
+	 * the stats entry associated with the old locator rather than creating a
+	 * new one. This ensures all stats from before and after the rewrite are
+	 * tracked in a single entry which will be properly copied to the new
+	 * locator at transaction commit.
+	 */
+	if (pending_rewrites != NULL)
+	{
+		PgStat_PendingRewrite *rewrite;
+
+		for (rewrite = pending_rewrites; rewrite != NULL; rewrite = rewrite->next)
+		{
+			if (locator.dbOid == rewrite->new_locator.dbOid &&
+				locator.spcOid == rewrite->new_locator.spcOid &&
+				locator.relNumber == rewrite->new_locator.relNumber)
+			{
+				pgstat_info = pgstat_prep_relation_pending(rewrite->old_locator);
+				goto found_entry;
+			}
+		}
+	}
+
 	/* Else find or make the PgStat_TableStatus entry, and update link */
-	rel->pgstat_info = pgstat_prep_relation_pending(locator);
+	pgstat_info = pgstat_prep_relation_pending(locator);
+
+found_entry:
+	rel->pgstat_info = pgstat_info;
+
+	/*
+	 * For relation stats, we key by physical file location, not by relation
+	 * OID. This means during operations like ALTER TYPE it's possible that
+	 * the relation OID changes but the relfilenode stays the same (no actual
+	 * rewrite needed). Unlink the old relation first.
+	 */
+	if (pgstat_info->relation != NULL &&
+		pgstat_info->relation != rel)
+	{
+		pgstat_info->relation->pgstat_info = NULL;
+		pgstat_info->relation = NULL;
+	}
 
 	/* don't allow link a stats to multiple relcache entries */
-	Assert(rel->pgstat_info->relation == NULL);
+	Assert(pgstat_info->relation == NULL);
 
 	/* mark this relation as the owner */
-	rel->pgstat_info->relation = rel;
+	pgstat_info->relation = rel;
 }
 
 /*
@@ -214,14 +313,37 @@ pgstat_drop_relation(Relation rel)
 {
 	int			nest_level = GetCurrentTransactionNestLevel();
 	PgStat_TableStatus *pgstat_info;
+	bool		skip_transactional_drop = false;
 
 	/* don't track stats for relations without storage */
 	if (!RELKIND_HAS_STORAGE(rel->rd_rel->relkind))
 		return;
 
-	pgstat_drop_transactional(PGSTAT_KIND_RELATION,
-							  rel->rd_locator.dbOid,
-							  RelFileLocatorToPgStatObjid(rel->rd_locator));
+	/* Check if this drop is part of a pending rewrite */
+	if (pending_rewrites != NULL)
+	{
+		PgStat_PendingRewrite *rewrite;
+
+		for (rewrite = pending_rewrites; rewrite != NULL; rewrite = rewrite->next)
+		{
+			if (rel->rd_locator.dbOid == rewrite->old_locator.dbOid &&
+				rel->rd_locator.spcOid == rewrite->old_locator.spcOid &&
+				rel->rd_locator.relNumber == rewrite->old_locator.relNumber)
+			{
+				skip_transactional_drop = true;
+				break;
+			}
+		}
+	}
+
+	/*
+	 * If it is part of a rewrite, drop its stats later, for example in
+	 * AtEOXact_PgStat_Relations(), so skip it here.
+	 */
+	if (!skip_transactional_drop)
+		pgstat_drop_transactional(PGSTAT_KIND_RELATION,
+								  rel->rd_locator.dbOid,
+								  RelFileLocatorToPgStatObjid(rel->rd_locator));
 
 	if (!pgstat_should_count_relation(rel))
 		return;
@@ -666,6 +788,48 @@ AtEOXact_PgStat_Relations(PgStat_SubXactStatus *xact_state, bool isCommit)
 		}
 		tabstat->trans = NULL;
 	}
+
+	/* preserve the stats in case of rewrite */
+	if (isCommit && pending_rewrites != NULL)
+	{
+		PgStat_PendingRewrite *rewrite;
+		PgStat_PendingRewrite *prev = NULL;
+		PgStat_PendingRewrite *current = pending_rewrites;
+		PgStat_PendingRewrite *next;
+
+		/* reverse the rewrites list to process in chronological order */
+		while (current != NULL)
+		{
+			next = current->next;
+			current->next = prev;
+			prev = current;
+			current = next;
+		}
+
+		/* now process rewrites in chronological order */
+		for (rewrite = prev; rewrite != NULL; rewrite = rewrite->next)
+		{
+			PgStat_EntryRef *old_entry_ref;
+
+			old_entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION,
+													   rewrite->old_locator.dbOid,
+													   RelFileLocatorToPgStatObjid(rewrite->old_locator));
+
+			if (old_entry_ref && old_entry_ref->pending)
+				pgstat_relation_flush_cb(old_entry_ref, false);
+
+			pgstat_copy_relation_stats(rewrite->new_locator,
+									   rewrite->old_locator, true);
+
+			/* drop old locator's stats */
+			if (!pgstat_drop_entry(PGSTAT_KIND_RELATION,
+								   rewrite->old_locator.dbOid,
+								   RelFileLocatorToPgStatObjid(rewrite->old_locator)))
+				pgstat_request_entry_refs_gc();
+		}
+	}
+
+	pending_rewrites = NULL;
 }
 
 /*
@@ -681,6 +845,30 @@ AtEOSubXact_PgStat_Relations(PgStat_SubXactStatus *xact_state, bool isCommit, in
 	PgStat_TableXactStatus *trans;
 	PgStat_TableXactStatus *next_trans;
 
+	/*
+	 * If we don't commit then remove the associated rewrites if any, to keep
+	 * the rewrite chain in sync with what will be eventually committed.
+	 */
+	if (!isCommit)
+	{
+		PgStat_PendingRewrite **rewrite_ptr = &pending_rewrites;
+
+		while (*rewrite_ptr != NULL)
+		{
+			if ((*rewrite_ptr)->nest_level >= nestDepth)
+			{
+				PgStat_PendingRewrite *to_remove = *rewrite_ptr;
+
+				*rewrite_ptr = (*rewrite_ptr)->next;
+				pfree(to_remove);
+			}
+			else
+			{
+				rewrite_ptr = &((*rewrite_ptr)->next);
+			}
+		}
+	}
+
 	for (trans = xact_state->first; trans != NULL; trans = next_trans)
 	{
 		PgStat_TableStatus *tabstat;
@@ -760,11 +948,19 @@ void
 AtPrepare_PgStat_Relations(PgStat_SubXactStatus *xact_state)
 {
 	PgStat_TableXactStatus *trans;
+	PgStat_PendingRewrite *rewrite;
 
+	/*
+	 * For each tabstat, find its matching rewrite and remove it from the
+	 * pending rewrites list. This way, after processing all tabstats, pending
+	 * rewrites will only contain rewrite-only transactions.
+	 */
 	for (trans = xact_state->first; trans != NULL; trans = trans->next)
 	{
 		PgStat_TableStatus *tabstat PG_USED_FOR_ASSERTS_ONLY;
 		TwoPhasePgStatRecord record;
+		PgStat_PendingRewrite **rewrite_ptr;
+		bool		found_rewrite = false;
 
 		Assert(trans->nest_level == 1);
 		Assert(trans->upper == NULL);
@@ -784,10 +980,83 @@ AtPrepare_PgStat_Relations(PgStat_SubXactStatus *xact_state)
 			record.locator = tabstat->locator;
 
 		record.truncdropped = trans->truncdropped;
+		record.rewrite_nest_level = 0;
+
+		/*
+		 * Look for a matching rewrite and remove it from pending rewrites. We
+		 * check three possible matches:
+		 *
+		 * The new_locator when stats have been added after the rewrite. The
+		 * old_locator when stats have been added before the rewrite but not
+		 * after. The original_locator when this tabstat is part of a rewrite
+		 * chain.
+		 */
+		rewrite_ptr = &pending_rewrites;
+		while (*rewrite_ptr != NULL)
+		{
+			rewrite = *rewrite_ptr;
+
+			if ((record.locator.dbOid == rewrite->new_locator.dbOid &&
+				 record.locator.spcOid == rewrite->new_locator.spcOid &&
+				 record.locator.relNumber == rewrite->new_locator.relNumber) ||
+				(tabstat->locator.dbOid == rewrite->old_locator.dbOid &&
+				 tabstat->locator.spcOid == rewrite->old_locator.spcOid &&
+				 tabstat->locator.relNumber == rewrite->old_locator.relNumber) ||
+				(tabstat->locator.dbOid == rewrite->original_locator.dbOid &&
+				 tabstat->locator.spcOid == rewrite->original_locator.spcOid &&
+				 tabstat->locator.relNumber == rewrite->original_locator.relNumber))
+			{
+				/*
+				 * Found matching rewrite. Record the rewrite information and
+				 * remove this rewrite from the list since it's now handled.
+				 */
+				record.rewrite_old_locator = rewrite->original_locator;
+				record.rewrite_nest_level = rewrite->nest_level;
+				record.locator = rewrite->new_locator;
+				found_rewrite = true;
+
+				/* Remove from pending_rewrites list */
+				*rewrite_ptr = rewrite->next;
+				pfree(rewrite);
+				break;
+			}
+			else
+			{
+				/* Move to next rewrite in the list */
+				rewrite_ptr = &(rewrite->next);
+			}
+		}
+
+		/* If no rewrite found, clear the rewrite fields */
+		if (!found_rewrite)
+		{
+			memset(&record.rewrite_old_locator, 0, sizeof(RelFileLocator));
+		}
+
+		RegisterTwoPhaseRecord(TWOPHASE_RM_PGSTAT_ID, 0,
+							   &record, sizeof(TwoPhasePgStatRecord));
+	}
+
+	/*
+	 * Now process any rewrites still pending. These are rewrite-only
+	 * transactions. We need to preserve their stats even though there's no
+	 * tabstat entry for them.
+	 */
+	for (rewrite = pending_rewrites; rewrite != NULL; rewrite = rewrite->next)
+	{
+		TwoPhasePgStatRecord record;
+
+		memset(&record, 0, sizeof(TwoPhasePgStatRecord));
+		record.locator = rewrite->new_locator;
+		record.rewrite_old_locator = rewrite->original_locator;
+		record.rewrite_nest_level = rewrite->nest_level;
+		record.truncdropped = false;
 
 		RegisterTwoPhaseRecord(TWOPHASE_RM_PGSTAT_ID, 0,
 							   &record, sizeof(TwoPhasePgStatRecord));
 	}
+
+	pending_rewrites = NULL;
 }
 
 /*
@@ -810,6 +1079,8 @@ PostPrepare_PgStat_Relations(PgStat_SubXactStatus *xact_state)
 		tabstat = trans->parent;
 		tabstat->trans = NULL;
 	}
+
+	pending_rewrites = NULL;
 }
 
 /*
@@ -845,6 +1116,29 @@ pgstat_twophase_postcommit(FullTransactionId fxid, uint16 info,
 	pgstat_info->counts.changed_tuples +=
 		rec->tuples_inserted + rec->tuples_updated +
 		rec->tuples_deleted;
+
+	if (rec->rewrite_nest_level > 0)
+	{
+		PgStat_EntryRef *old_entry_ref;
+
+		/* Flush any pending stats for old locator first */
+		old_entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION,
+												   rec->rewrite_old_locator.dbOid,
+												   RelFileLocatorToPgStatObjid(rec->rewrite_old_locator));
+
+		if (old_entry_ref && old_entry_ref->pending)
+			pgstat_relation_flush_cb(old_entry_ref, false);
+
+		/* Copy stats from old to new locator */
+		pgstat_copy_relation_stats(rec->locator, rec->rewrite_old_locator,
+								   true);
+
+		/* Drop old locator's stats */
+		if (!pgstat_drop_entry(PGSTAT_KIND_RELATION,
+							   rec->rewrite_old_locator.dbOid,
+							   RelFileLocatorToPgStatObjid(rec->rewrite_old_locator)))
+			pgstat_request_entry_refs_gc();
+	}
 }
 
 /*
@@ -859,9 +1153,26 @@ pgstat_twophase_postabort(FullTransactionId fxid, uint16 info,
 {
 	TwoPhasePgStatRecord *rec = (TwoPhasePgStatRecord *) recdata;
 	PgStat_TableStatus *pgstat_info;
+	RelFileLocator target_locator;
+
+	/*
+	 * For aborted transactions with rewrites (like TRUNCATE), we need to
+	 * restore stats to the old locator, not the new one. The new locator
+	 * should be dropped since the rewrite is being rolled back.
+	 */
+	if (rec->rewrite_nest_level > 0)
+	{
+		/* Use the old locator */
+		target_locator = rec->rewrite_old_locator;
+	}
+	else
+	{
+		/* No rewrite, use the original locator */
+		target_locator = rec->locator;
+	}
 
 	/* Find or create a tabstat entry for the target locator */
-	pgstat_info = pgstat_prep_relation_pending(rec->locator);
+	pgstat_info = pgstat_prep_relation_pending(target_locator);
 
 	/* Same math as in AtEOXact_PgStat, abort case */
 	if (rec->truncdropped)
@@ -916,7 +1227,17 @@ pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
 	tabentry->numscans += lstats->counts.numscans;
 	if (lstats->counts.numscans)
 	{
-		TimestampTz t = GetCurrentTransactionStopTimestamp();
+		TimestampTz t;
+
+		/*
+		 * Checking the transaction state due to the flush call in
+		 * pgstat_twophase_postcommit() that would break the assertion on the
+		 * state in GetCurrentTransactionStopTimestamp().
+		 */
+		if (!IsTransactionState())
+			t = GetCurrentTransactionStopTimestamp();
+		else
+			t = GetCurrentTimestamp();
 
 		if (t > tabentry->lastscan)
 			tabentry->lastscan = t;
@@ -1167,3 +1488,45 @@ pgstat_reloid_to_relfilelocator(Oid reloid, RelFileLocator *locator)
 	ReleaseSysCache(tuple);
 	return result;
 }
+
+/*
+ * Mark that a relation rewrite has occurred, preserving the original locator
+ * so stats can be copied at transaction commit.
+ */
+void
+pgstat_mark_rewrite(RelFileLocator old_locator, RelFileLocator new_locator)
+{
+	PgStat_PendingRewrite *rewrite;
+	PgStat_PendingRewrite *existing;
+	RelFileLocator original_locator = old_locator;
+
+	for (existing = pending_rewrites; existing != NULL; existing = existing->next)
+	{
+		if (old_locator.dbOid == existing->new_locator.dbOid &&
+			old_locator.spcOid == existing->new_locator.spcOid &&
+			old_locator.relNumber == existing->new_locator.relNumber)
+		{
+			original_locator = existing->original_locator;
+			break;
+		}
+	}
+
+	/* Allocate in TopTransactionContext memory context */
+	rewrite = MemoryContextAlloc(TopTransactionContext,
+								 sizeof(PgStat_PendingRewrite));
+
+	rewrite->old_locator = old_locator;
+	rewrite->new_locator = new_locator;
+	rewrite->original_locator = original_locator;
+	rewrite->nest_level = GetCurrentTransactionNestLevel();
+
+	/* Add to the list */
+	rewrite->next = pending_rewrites;
+	pending_rewrites = rewrite;
+}
+
+void
+pgstat_clear_rewrite(void)
+{
+	pending_rewrites = NULL;
+}
diff --git a/src/backend/utils/activity/pgstat_xact.c b/src/backend/utils/activity/pgstat_xact.c
index bc9864bd8d9..f8cf3755ce2 100644
--- a/src/backend/utils/activity/pgstat_xact.c
+++ b/src/backend/utils/activity/pgstat_xact.c
@@ -55,6 +55,8 @@ AtEOXact_PgStat(bool isCommit, bool parallel)
 	}
 	pgStatXactStack = NULL;
 
+	pgstat_clear_rewrite();
+
 	/* Make sure any stats snapshot is thrown away */
 	pgstat_clear_snapshot();
 }
@@ -360,8 +362,29 @@ create_drop_transactional_internal(PgStat_Kind kind, Oid dboid, uint64 objid, bo
 void
 pgstat_create_transactional(PgStat_Kind kind, Oid dboid, uint64 objid)
 {
-	if (pgstat_get_entry_ref(kind, dboid, objid, false, NULL))
+	PgStat_EntryRef *entry_ref;
+
+	entry_ref = pgstat_get_entry_ref(kind, dboid, objid, false, NULL);
+
+	if (entry_ref)
 	{
+		/*
+		 * For relation stats, we key by physical file location, not by
+		 * relation OID. This means during operations like ALTER TYPE where
+		 * the relation OID changes but the relfilenode stays the same (no
+		 * actual rewrite needed), we'll find an existing entry.
+		 *
+		 * This is expected behavior, we want to preserve stats across the
+		 * catalog change. Simply reset and recreate the entry for the new
+		 * relation OID without warning.
+		 */
+		if (kind == PGSTAT_KIND_RELATION)
+		{
+			pgstat_reset(kind, dboid, objid);
+			create_drop_transactional_internal(kind, dboid, objid, true);
+			return;
+		}
+
 		ereport(WARNING,
 				errmsg("resetting existing statistics for kind %s, db=%u, oid=%" PRIu64,
 					   (pgstat_get_kind_info(kind))->name, dboid,
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 2d0cb7bcfd4..c98e5c51d63 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -85,6 +85,7 @@
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
+#include "utils/pgstat_internal.h"
 #include "utils/relmapper.h"
 #include "utils/resowner.h"
 #include "utils/snapmgr.h"
@@ -3780,6 +3781,7 @@ RelationSetNewRelfilenumber(Relation relation, char persistence)
 	MultiXactId minmulti = InvalidMultiXactId;
 	TransactionId freezeXid = InvalidTransactionId;
 	RelFileLocator newrlocator;
+	RelFileLocator oldrlocator = relation->rd_locator;
 
 	if (!IsBinaryUpgrade)
 	{
@@ -3951,6 +3953,10 @@ RelationSetNewRelfilenumber(Relation relation, char persistence)
 
 	table_close(pg_class, RowExclusiveLock);
 
+	/* Mark that a rewrite happened */
+	if (RELKIND_HAS_STORAGE(relation->rd_rel->relkind))
+		pgstat_mark_rewrite(oldrlocator, newrlocator);
+
 	/*
 	 * Make the pg_class row change or relation map change visible.  This will
 	 * cause the relcache entry to get updated, too.
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 3102f86aa24..8750c025bbe 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -670,7 +670,7 @@ extern PgStat_FunctionCounts *find_funcstat_entry(Oid func_id);
 
 extern void pgstat_create_relation(Relation rel);
 extern void pgstat_drop_relation(Relation rel);
-extern void pgstat_copy_relation_stats(Relation dst, Relation src);
+extern void pgstat_copy_relation_stats(RelFileLocator dst, RelFileLocator src, bool increment);
 
 extern void pgstat_init_relation(Relation rel);
 extern void pgstat_assoc_relation(Relation rel);
@@ -682,6 +682,9 @@ extern void pgstat_report_vacuum(Relation rel, PgStat_Counter livetuples,
 extern void pgstat_report_analyze(Relation rel,
 								  PgStat_Counter livetuples, PgStat_Counter deadtuples,
 								  bool resetcounter, TimestampTz starttime);
+extern void pgstat_mark_rewrite(RelFileLocator old_locator,
+								RelFileLocator new_locator);
+extern void pgstat_clear_rewrite(void);
 
 /*
  * If stats are enabled, but pending data hasn't been prepared yet, call
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 3451538565e..0ca3eae8026 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2268,6 +2268,7 @@ PgStat_KindInfo
 PgStat_LocalState
 PgStat_PendingDroppedStatsItem
 PgStat_PendingIO
+PgStat_PendingRewrite
 PgStat_SLRUStats
 PgStat_ShmemControl
 PgStat_Snapshot
-- 
2.34.1
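
As a reading aid for the 0002 patch above: the rewrite-chain bookkeeping done by pgstat_mark_rewrite(), and the pruning done on subtransaction abort in AtEOSubXact_PgStat_Relations(), can be sketched in isolation as follows. This is a simplified standalone model, not the patch's code: it assumes a plain three-field Locator struct, uses malloc()/free() instead of TopTransactionContext allocation, takes the nesting level as an explicit argument, and the names mark_rewrite() and forget_rewrites_at_or_above() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for RelFileLocator. */
typedef struct Locator
{
	uint32_t	spcOid;
	uint32_t	dbOid;
	uint32_t	relNumber;
} Locator;

typedef struct PendingRewrite
{
	Locator		old_locator;
	Locator		new_locator;
	Locator		original_locator;	/* first locator of the rewrite chain */
	int			nest_level;			/* nesting level that did the rewrite */
	struct PendingRewrite *next;
} PendingRewrite;

/* Newest rewrite is kept at the head of the list. */
static PendingRewrite *pending = NULL;

static bool
locator_equals(Locator a, Locator b)
{
	return a.spcOid == b.spcOid && a.dbOid == b.dbOid &&
		a.relNumber == b.relNumber;
}

/* Record old -> new; inherit original_locator if old continues a chain. */
static PendingRewrite *
mark_rewrite(Locator old_locator, Locator new_locator, int nest_level)
{
	Locator		original = old_locator;
	PendingRewrite *existing;
	PendingRewrite *rewrite;

	for (existing = pending; existing != NULL; existing = existing->next)
	{
		if (locator_equals(old_locator, existing->new_locator))
		{
			original = existing->original_locator;
			break;
		}
	}

	rewrite = malloc(sizeof(*rewrite));
	assert(rewrite != NULL);
	rewrite->old_locator = old_locator;
	rewrite->new_locator = new_locator;
	rewrite->original_locator = original;
	rewrite->nest_level = nest_level;
	rewrite->next = pending;
	pending = rewrite;
	return rewrite;
}

/* On subtransaction abort, drop rewrites made at nest_depth or deeper. */
static void
forget_rewrites_at_or_above(int nest_depth)
{
	PendingRewrite **ptr = &pending;

	while (*ptr != NULL)
	{
		if ((*ptr)->nest_level >= nest_depth)
		{
			PendingRewrite *to_remove = *ptr;

			*ptr = to_remove->next;
			free(to_remove);
		}
		else
			ptr = &(*ptr)->next;
	}
}
```

With rewrites A -> B and then B -> C in one transaction, both entries end up with original_locator = A, which is why stats can be copied from the chain's starting point to the final locator at commit. The pointer-to-pointer walk in the pruning function unlinks entries without special-casing the list head.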

#47 Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#45)
Re: relfilenode statistics

Hi,

On Tue, Dec 16, 2025 at 04:33:17PM +0900, Michael Paquier wrote:

Hence, why don't we split PgStat_StatTabEntry into three things from
the start, even if it means to duplicate some of them? Say:
- Table fields: includes [auto]vacuum/analyze data, block fields,
fields of pg_stat_all_tables.
- Index fields: no need for the [auto]vacuum/analyze time and counts,
block fields, pg_stat_all_indexes fields.
- Relfilenode fields: dead_tuples, ins_since_vacuum and
mod_since_analyze. Does not apply to partitioned tables and indexes,
only applies to tables. Provides a clean split, embrace the fact that
these are the only three fields we need to worry about during
recovery.

I think that the PSEUDO_PARTITION_TABLE_SPCOID approach just proposed in [1]
is simple enough and solves the collision issue raised by Andres.
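
For illustration only, here is a minimal standalone sketch of why a reserved
tablespace OID keeps the keys apart. The Sketch* names are hypothetical
stand-ins and not identifiers from the patch. The packing mirrors the
RelFileLocatorToPgStatObjid() macro in the attached patch, 1665 is the value
the patch uses for PSEUDO_PARTITION_TABLE_SPCOID and 1663 is
DEFAULTTABLESPACE_OID. The dbOid is carried separately in the stats key, so
it is not part of the packed value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for PostgreSQL's 32-bit Oid and RelFileNumber. */
typedef uint32_t Oid;
typedef uint32_t RelFileNumber;

/* Simplified locator: field names follow RelFileLocator. */
typedef struct SketchLocator
{
	Oid			spcOid;
	Oid			dbOid;
	RelFileNumber relNumber;
} SketchLocator;

/* Values as they appear in the patch and in pg_tablespace.dat. */
#define SKETCH_DEFAULTTABLESPACE_OID			1663
#define SKETCH_PSEUDO_PARTITION_TABLE_SPCOID	1665

/* Same packing as the patch's macro: spcOid in the high 32 bits. */
static uint64_t
sketch_locator_to_objid(SketchLocator locator)
{
	return ((uint64_t) locator.spcOid << 32) | locator.relNumber;
}

/*
 * Partitioned tables have no storage, so build a synthetic locator from
 * the reserved pseudo tablespace OID and the relation OID.
 */
static SketchLocator
sketch_partitioned_locator(Oid dboid, Oid reloid)
{
	SketchLocator locator;

	locator.spcOid = SKETCH_PSEUDO_PARTITION_TABLE_SPCOID;
	locator.dbOid = dboid;
	locator.relNumber = reloid;
	return locator;
}
```

Under these assumptions, a partitioned table whose OID happens to equal the
relfilenumber of a regular relation still gets a different objid, because the
high 32 bits differ as long as no real tablespace is ever assigned OID 1665.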

I think I prefer the unified structure as proposed in the patch (though we
may want to split tables and indexes later on). The reason is that it's
easier to expose publicly.

Indeed, at the very beginning of this thread, in v1, I created a new
PGSTAT_KIND_RELFILENODE and had to make it coexist with PGSTAT_KIND_RELATION and
that led to a discussion on how we should expose them ([2]).

[1]: /messages/by-id/aUEyzoOJtrCLAEeT@ip-10-97-1-34.eu-west-3.compute.internal
[2]: /messages/by-id/CA+TgmoZtwT6h=nyuQ1J9GNSrRyhf0fv7Ai6FzO=bH0C9Bf6tew@mail.gmail.com

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#48Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#45)
Re: relfilenode statistics

Hi,

On 2025-12-16 16:33:17 +0900, Michael Paquier wrote:

On Mon, Dec 15, 2025 at 12:48:25PM -0500, Andres Freund wrote:

I don't think this is true as stated. Two reasons:

1) This afaict guarantees that the relfilenode will not clash with oids, but
it does *NOT* guarantee that it does not clash with other relfilenodes

2) Note that GetNewRelFileNumber() does *NOT* check for conflicts when
creating a new relfilenode for an existing relation:
* If the relfilenumber will also be used as the relation's OID, pass the
* opened pg_class catalog, and this routine will guarantee that the result
* is also an unused OID within pg_class. If the result is to be used only
* as a relfilenumber for an existing relation, pass NULL for pg_class.

FWIW, I am also still troubled by the part of the proposed patch set
where we are trying to hide the idea that a partitioned table has a
relfilenode set by using its relid instead in the key for the data.
This leads to a huge amount of complexity in the patch, mainly to
store data for autovacuum that we do not need at the end:
- autovacuum discards partitioned tables in do_autovacuum(), so the
stats related to partitioned tables that we need to select the
relations do not matter.

I feel like that's an implementation wart that we ought to fix. It's not
infrequently a problem that we don't automatically analyze partitioned
tables. Weren't there even a couple threads on that on the list in the last
weeks?

- manual vacuums may include partitioned tables to extract their
partitions, vacuum_rel() at the end discarding them. Well, stats
don't matter anyway.

We only need to attach three fields to let autovacuum know if a
relation needs to run or not: dead_tuples, ins_since_vacuum,
mod_since_analyze.

That may be true for autovacuum today, but I don't see any reason for
live_tuples, tuples_inserted etc to be inaccurate after a failover.

Most of the fields of PgStat_StatTabEntry make sense
only for tables, a few are required by indexes for pg_stat_all_indexes.
Some fields actually make sense because they refer to on-disk files,
mostly for pg_statio_all_tables (blocks_fetched, blocks_hit).

Hence, why don't we split PgStat_StatTabEntry into three things from
the start, even if it means to duplicate some of them? Say:
- Table fields: includes [auto]vacuum/analyze data, block fields,
fields of pg_stat_all_tables.

What do you mean by "block fields"? pg_statio_all_tables? If so, what's the
point of including them here, rather than in the relfilenode fields?

- Index fields: no need for the [auto]vacuum/analyze time and counts,
block fields, pg_stat_all_indexes fields.

I think we actually should populate the [auto]vac fields for indexes, right
now it's impossible to figure out from stats whether indexes are frequently
scanned as part of vacuum or not.

- Relfilenode fields: dead_tuples, ins_since_vacuum and
mod_since_analyze. Does not apply to partitioned tables and indexes,
only applies to tables. Provides a clean split, embrace the fact that
these are the only three fields we need to worry about during
recovery.

I think we really ought to populate not just these during recovery, but also
at least n_tup_ins, n_tup_upd, n_tup_del, n_tup_hot_upd, n_live_tup.

I don't understand why we would want to only populate these three fields?

I'm not against splitting the index fields off, but it seems pretty orthogonal
to what we're discussing here. If we were to split off index stats into a
separate stat, why wouldn't we keep the statio fields in the relfilenode
stats, since they're obviously intimately tied to that?

Greetings,

Andres Freund

#49Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Bertrand Drouvot (#46)
2 attachment(s)
Re: relfilenode statistics

Hi,

On Tue, Dec 16, 2025 at 10:22:06AM +0000, Bertrand Drouvot wrote:

In the attached

PFA a mandatory rebase due to f4e797171ea.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v10-0001-Key-PGSTAT_KIND_RELATION-by-relfile-locator.patch (text/x-diff; charset=us-ascii)
From bd285932cbe23ef9e70916269b6be9a5aacbaac4 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Wed, 1 Oct 2025 09:45:26 +0000
Subject: [PATCH v10 1/2] Key PGSTAT_KIND_RELATION by relfile locator

This patch changes the key used for the PGSTAT_KIND_RELATION statistic kind.
Instead of the relation oid, it now relies on:

- dboid (linked to RelFileLocator's dbOid)
- objoid which is the result of a new macro (namely RelFileLocatorToPgStatObjid())
that computes an objoid based on the RelFileLocator's spcOid and the
RelFileLocator's relNumber.

This is possible as, since b14e9ce7d55c, the objoid is now uint64 and spcOid
and relNumber are 32 bits.

That will allow us to add new stats (such as write counters) and ensure that
some counters (n_dead_tup and friends) are replicated.

The patch introduces pgstat_reloid_to_relfilelocator() to 1) avoid calling
RelationIdGetRelation() to get the relfilelocator based on the relation oid
and 2) handle the partitioned table case.

Please note that:

- when running pg_stat_have_stats('relation',...) we now need to be connected
to the database that hosts the relation. As pg_stat_have_stats() is not
documented publicly, the changes made in 029_stats_restart.pl look
sufficient.

- this patch does not handle rewrites so some tests are failing. Its only
intent is to ease the review and it should not be pushed without being
merged with the following patch that handles the rewrites.

- it can be used to test that stats are incremented correctly and that we're
able to retrieve them as long as rewrites are not involved.
---
 src/backend/postmaster/autovacuum.c          |  17 +-
 src/backend/utils/activity/pgstat_relation.c | 236 ++++++++++++++++---
 src/backend/utils/adt/pgstatfuncs.c          |  22 +-
 src/include/catalog/pg_tablespace.dat        |   4 +
 src/include/catalog/pg_tablespace.h          |   8 +
 src/include/pgstat.h                         |  15 +-
 src/include/utils/pgstat_internal.h          |   1 +
 src/test/recovery/t/029_stats_restart.pl     |  40 ++--
 8 files changed, 271 insertions(+), 72 deletions(-)
   6.1% src/backend/postmaster/
  61.6% src/backend/utils/activity/
   5.1% src/backend/utils/adt/
   3.2% src/include/catalog/
   5.6% src/include/
  18.1% src/test/recovery/t/

diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 1bd3924e35e..a11174b25ad 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -2014,12 +2014,16 @@ do_autovacuum(void)
 		bool		dovacuum;
 		bool		doanalyze;
 		bool		wraparound;
+		RelFileLocator locator;
 
 		if (classForm->relkind != RELKIND_RELATION &&
 			classForm->relkind != RELKIND_MATVIEW)
 			continue;
 
 		relid = classForm->oid;
+		locator.dbOid = classForm->relisshared ? InvalidOid : MyDatabaseId;
+		locator.spcOid = classForm->reltablespace;
+		locator.relNumber = classForm->relfilenode;
 
 		/*
 		 * Check if it is a temp table (presumably, of some other backend's).
@@ -2048,8 +2052,7 @@ do_autovacuum(void)
 
 		/* Fetch reloptions and the pgstat entry for this table */
 		relopts = extract_autovac_opts(tuple, pg_class_desc);
-		tabentry = pgstat_fetch_stat_tabentry_ext(classForm->relisshared,
-												  relid);
+		tabentry = pgstat_fetch_stat_tabentry_by_locator(locator);
 
 		/* Check if it needs vacuum or analyze */
 		relation_needs_vacanalyze(relid, relopts, classForm, tabentry,
@@ -2114,6 +2117,7 @@ do_autovacuum(void)
 		bool		dovacuum;
 		bool		doanalyze;
 		bool		wraparound;
+		RelFileLocator locator;
 
 		/*
 		 * We cannot safely process other backends' temp tables, so skip 'em.
@@ -2122,6 +2126,9 @@ do_autovacuum(void)
 			continue;
 
 		relid = classForm->oid;
+		locator.dbOid = classForm->relisshared ? InvalidOid : MyDatabaseId;
+		locator.spcOid = classForm->reltablespace;
+		locator.relNumber = classForm->relfilenode;
 
 		/*
 		 * fetch reloptions -- if this toast table does not have them, try the
@@ -2141,8 +2148,7 @@ do_autovacuum(void)
 		}
 
 		/* Fetch the pgstat entry for this table */
-		tabentry = pgstat_fetch_stat_tabentry_ext(classForm->relisshared,
-												  relid);
+		tabentry = pgstat_fetch_stat_tabentry_by_locator(locator);
 
 		relation_needs_vacanalyze(relid, relopts, classForm, tabentry,
 								  effective_multixact_freeze_max_age,
@@ -2939,8 +2945,7 @@ recheck_relation_needs_vacanalyze(Oid relid,
 	PgStat_StatTabEntry *tabentry;
 
 	/* fetch the pgstat table entry */
-	tabentry = pgstat_fetch_stat_tabentry_ext(classForm->relisshared,
-											  relid);
+	tabentry = pgstat_fetch_stat_tabentry_ext(relid);
 
 	relation_needs_vacanalyze(relid, avopts, classForm, tabentry,
 							  effective_multixact_freeze_max_age,
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index 55a10c299db..a267e93f8be 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -17,12 +17,17 @@
 
 #include "postgres.h"
 
+#include "access/htup_details.h"
 #include "access/twophase_rmgr.h"
 #include "access/xact.h"
 #include "catalog/catalog.h"
+#include "catalog/pg_tablespace.h"
+#include "storage/lmgr.h"
 #include "utils/memutils.h"
 #include "utils/pgstat_internal.h"
 #include "utils/rel.h"
+#include "utils/relmapper.h"
+#include "utils/syscache.h"
 #include "utils/timestamp.h"
 
 
@@ -36,13 +41,12 @@ typedef struct TwoPhasePgStatRecord
 	PgStat_Counter inserted_pre_truncdrop;
 	PgStat_Counter updated_pre_truncdrop;
 	PgStat_Counter deleted_pre_truncdrop;
-	Oid			id;				/* table's OID */
-	bool		shared;			/* is it a shared catalog? */
+	RelFileLocator locator;		/* table's rd_locator */
 	bool		truncdropped;	/* was the relation truncated/dropped? */
 } TwoPhasePgStatRecord;
 
 
-static PgStat_TableStatus *pgstat_prep_relation_pending(Oid rel_id, bool isshared);
+static PgStat_TableStatus *pgstat_prep_relation_pending(RelFileLocator locator);
 static void add_tabstat_xact_level(PgStat_TableStatus *pgstat_info, int nest_level);
 static void ensure_tabstat_xact_level(PgStat_TableStatus *pgstat_info);
 static void save_truncdrop_counters(PgStat_TableXactStatus *trans, bool is_drop);
@@ -60,8 +64,7 @@ pgstat_copy_relation_stats(Relation dst, Relation src)
 	PgStatShared_Relation *dstshstats;
 	PgStat_EntryRef *dst_ref;
 
-	srcstats = pgstat_fetch_stat_tabentry_ext(src->rd_rel->relisshared,
-											  RelationGetRelid(src));
+	srcstats = pgstat_fetch_stat_tabentry_ext(RelationGetRelid(src));
 	if (!srcstats)
 		return;
 
@@ -94,8 +97,10 @@ pgstat_init_relation(Relation rel)
 
 	/*
 	 * We only count stats for relations with storage and partitioned tables
+	 * and we don't count stats generated during a rewrite.
 	 */
-	if (!RELKIND_HAS_STORAGE(relkind) && relkind != RELKIND_PARTITIONED_TABLE)
+	if ((!RELKIND_HAS_STORAGE(relkind) && relkind != RELKIND_PARTITIONED_TABLE) ||
+		OidIsValid(rel->rd_rel->relrewrite))
 	{
 		rel->pgstat_enabled = false;
 		rel->pgstat_info = NULL;
@@ -130,12 +135,37 @@ pgstat_init_relation(Relation rel)
 void
 pgstat_assoc_relation(Relation rel)
 {
+	RelFileLocator locator;
+
 	Assert(rel->pgstat_enabled);
 	Assert(rel->pgstat_info == NULL);
 
+	/*
+	 * Don't associate stats for relations without storage and non partitioned
+	 * tables.
+	 */
+	if (!RELKIND_HAS_STORAGE(rel->rd_rel->relkind) &&
+		rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
+		return;
+
+	if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
+		locator = rel->rd_locator;
+	else
+	{
+		/*
+		 * Partitioned tables don't have storage, so construct a synthetic
+		 * locator for statistics tracking. Use a reserved pseudo tablespace
+		 * OID that cannot conflict with real tablespaces, and the relation
+		 * OID as relNumber. This ensures no collision with regular relations
+		 * even after OID wraparound.
+		 */
+		locator.dbOid = (rel->rd_rel->relisshared ? InvalidOid : MyDatabaseId);
+		locator.spcOid = PSEUDO_PARTITION_TABLE_SPCOID;
+		locator.relNumber = rel->rd_id;
+	}
+
 	/* Else find or make the PgStat_TableStatus entry, and update link */
-	rel->pgstat_info = pgstat_prep_relation_pending(RelationGetRelid(rel),
-													rel->rd_rel->relisshared);
+	rel->pgstat_info = pgstat_prep_relation_pending(locator);
 
 	/* don't allow link a stats to multiple relcache entries */
 	Assert(rel->pgstat_info->relation == NULL);
@@ -167,9 +197,13 @@ pgstat_unlink_relation(Relation rel)
 void
 pgstat_create_relation(Relation rel)
 {
+	/* don't track stats for relations without storage */
+	if (!RELKIND_HAS_STORAGE(rel->rd_rel->relkind))
+		return;
+
 	pgstat_create_transactional(PGSTAT_KIND_RELATION,
-								rel->rd_rel->relisshared ? InvalidOid : MyDatabaseId,
-								RelationGetRelid(rel));
+								rel->rd_locator.dbOid,
+								RelFileLocatorToPgStatObjid(rel->rd_locator));
 }
 
 /*
@@ -181,9 +215,13 @@ pgstat_drop_relation(Relation rel)
 	int			nest_level = GetCurrentTransactionNestLevel();
 	PgStat_TableStatus *pgstat_info;
 
+	/* don't track stats for relations without storage */
+	if (!RELKIND_HAS_STORAGE(rel->rd_rel->relkind))
+		return;
+
 	pgstat_drop_transactional(PGSTAT_KIND_RELATION,
-							  rel->rd_rel->relisshared ? InvalidOid : MyDatabaseId,
-							  RelationGetRelid(rel));
+							  rel->rd_locator.dbOid,
+							  RelFileLocatorToPgStatObjid(rel->rd_locator));
 
 	if (!pgstat_should_count_relation(rel))
 		return;
@@ -213,20 +251,23 @@ pgstat_report_vacuum(Relation rel, PgStat_Counter livetuples,
 	PgStat_EntryRef *entry_ref;
 	PgStatShared_Relation *shtabentry;
 	PgStat_StatTabEntry *tabentry;
-	Oid			dboid = (rel->rd_rel->relisshared ? InvalidOid : MyDatabaseId);
 	TimestampTz ts;
 	PgStat_Counter elapsedtime;
+	RelFileLocator locator;
 
 	if (!pgstat_track_counts)
 		return;
 
+	locator = rel->rd_locator;
+
 	/* Store the data in the table's hash table entry. */
 	ts = GetCurrentTimestamp();
 	elapsedtime = TimestampDifferenceMilliseconds(starttime, ts);
 
 	/* block acquiring lock for the same reason as pgstat_report_autovac() */
-	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_RELATION, dboid,
-											RelationGetRelid(rel), false);
+	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_RELATION, locator.dbOid,
+											RelFileLocatorToPgStatObjid(locator),
+											false);
 
 	shtabentry = (PgStatShared_Relation *) entry_ref->shared_stats;
 	tabentry = &shtabentry->stats;
@@ -285,9 +326,9 @@ pgstat_report_analyze(Relation rel,
 	PgStat_EntryRef *entry_ref;
 	PgStatShared_Relation *shtabentry;
 	PgStat_StatTabEntry *tabentry;
-	Oid			dboid = (rel->rd_rel->relisshared ? InvalidOid : MyDatabaseId);
 	TimestampTz ts;
 	PgStat_Counter elapsedtime;
+	RelFileLocator locator;
 
 	if (!pgstat_track_counts)
 		return;
@@ -325,9 +366,25 @@ pgstat_report_analyze(Relation rel,
 	ts = GetCurrentTimestamp();
 	elapsedtime = TimestampDifferenceMilliseconds(starttime, ts);
 
+	if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
+		locator = rel->rd_locator;
+	else
+	{
+		/*
+		 * Partitioned tables don't have storage, so construct a synthetic
+		 * locator for statistics tracking. Use a reserved pseudo tablespace
+		 * OID that cannot conflict with real tablespaces, and the relation
+		 * OID as relNumber. This ensures no collision with regular relations
+		 * even after OID wraparound.
+		 */
+		locator.dbOid = (rel->rd_rel->relisshared ? InvalidOid : MyDatabaseId);
+		locator.spcOid = PSEUDO_PARTITION_TABLE_SPCOID;
+		locator.relNumber = rel->rd_id;
+	}
 	/* block acquiring lock for the same reason as pgstat_report_autovac() */
-	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_RELATION, dboid,
-											RelationGetRelid(rel),
+	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_RELATION,
+											locator.dbOid,
+											RelFileLocatorToPgStatObjid(locator),
 											false);
 	/* can't get dropped while accessed */
 	Assert(entry_ref != NULL && entry_ref->shared_stats != NULL);
@@ -468,7 +525,16 @@ pgstat_update_heap_dead_tuples(Relation rel, int delta)
 PgStat_StatTabEntry *
 pgstat_fetch_stat_tabentry(Oid relid)
 {
-	return pgstat_fetch_stat_tabentry_ext(IsSharedRelation(relid), relid);
+	return pgstat_fetch_stat_tabentry_ext(relid);
+}
+
+PgStat_StatTabEntry *
+pgstat_fetch_stat_tabentry_by_locator(RelFileLocator locator)
+{
+	return (PgStat_StatTabEntry *) pgstat_fetch_entry(
+													  PGSTAT_KIND_RELATION,
+													  locator.dbOid,
+													  RelFileLocatorToPgStatObjid(locator));
 }
 
 /*
@@ -476,12 +542,14 @@ pgstat_fetch_stat_tabentry(Oid relid)
  * whether the to-be-accessed table is a shared relation or not.
  */
 PgStat_StatTabEntry *
-pgstat_fetch_stat_tabentry_ext(bool shared, Oid reloid)
+pgstat_fetch_stat_tabentry_ext(Oid reloid)
 {
-	Oid			dboid = (shared ? InvalidOid : MyDatabaseId);
+	RelFileLocator locator;
 
-	return (PgStat_StatTabEntry *)
-		pgstat_fetch_entry(PGSTAT_KIND_RELATION, dboid, reloid);
+	if (!pgstat_reloid_to_relfilelocator(reloid, &locator))
+		return NULL;
+
+	return pgstat_fetch_stat_tabentry_by_locator(locator);
 }
 
 /*
@@ -503,14 +571,17 @@ find_tabstat_entry(Oid rel_id)
 	PgStat_TableXactStatus *trans;
 	PgStat_TableStatus *tabentry = NULL;
 	PgStat_TableStatus *tablestatus = NULL;
+	RelFileLocator locator;
+
+	if (!pgstat_reloid_to_relfilelocator(rel_id, &locator))
+		return NULL;
+
+	entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION,
+										   locator.dbOid,
+										   RelFileLocatorToPgStatObjid(locator));
 
-	entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, MyDatabaseId, rel_id);
 	if (!entry_ref)
-	{
-		entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION, InvalidOid, rel_id);
-		if (!entry_ref)
-			return tablestatus;
-	}
+		return tablestatus;
 
 	tabentry = (PgStat_TableStatus *) entry_ref->pending;
 	tablestatus = palloc_object(PgStat_TableStatus);
@@ -706,8 +777,12 @@ AtPrepare_PgStat_Relations(PgStat_SubXactStatus *xact_state)
 		record.inserted_pre_truncdrop = trans->inserted_pre_truncdrop;
 		record.updated_pre_truncdrop = trans->updated_pre_truncdrop;
 		record.deleted_pre_truncdrop = trans->deleted_pre_truncdrop;
-		record.id = tabstat->id;
-		record.shared = tabstat->shared;
+
+		if (tabstat->relation != NULL)
+			record.locator = tabstat->relation->rd_locator;
+		else
+			record.locator = tabstat->locator;
+
 		record.truncdropped = trans->truncdropped;
 
 		RegisterTwoPhaseRecord(TWOPHASE_RM_PGSTAT_ID, 0,
@@ -750,7 +825,7 @@ pgstat_twophase_postcommit(FullTransactionId fxid, uint16 info,
 	PgStat_TableStatus *pgstat_info;
 
 	/* Find or create a tabstat entry for the rel */
-	pgstat_info = pgstat_prep_relation_pending(rec->id, rec->shared);
+	pgstat_info = pgstat_prep_relation_pending(rec->locator);
 
 	/* Same math as in AtEOXact_PgStat, commit case */
 	pgstat_info->counts.tuples_inserted += rec->tuples_inserted;
@@ -785,8 +860,8 @@ pgstat_twophase_postabort(FullTransactionId fxid, uint16 info,
 	TwoPhasePgStatRecord *rec = (TwoPhasePgStatRecord *) recdata;
 	PgStat_TableStatus *pgstat_info;
 
-	/* Find or create a tabstat entry for the rel */
-	pgstat_info = pgstat_prep_relation_pending(rec->id, rec->shared);
+	/* Find or create a tabstat entry for the target locator */
+	pgstat_info = pgstat_prep_relation_pending(rec->locator);
 
 	/* Same math as in AtEOXact_PgStat, abort case */
 	if (rec->truncdropped)
@@ -920,17 +995,21 @@ pgstat_relation_reset_timestamp_cb(PgStatShared_Common *header, TimestampTz ts)
  * initialized if not exists.
  */
 static PgStat_TableStatus *
-pgstat_prep_relation_pending(Oid rel_id, bool isshared)
+pgstat_prep_relation_pending(RelFileLocator locator)
 {
 	PgStat_EntryRef *entry_ref;
 	PgStat_TableStatus *pending;
+	uint64		objid;
+
+	objid = RelFileLocatorToPgStatObjid(locator);
 
 	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_RELATION,
-										  isshared ? InvalidOid : MyDatabaseId,
-										  rel_id, NULL);
+										  locator.dbOid,
+										  objid, NULL);
+
 	pending = entry_ref->pending;
-	pending->id = rel_id;
-	pending->shared = isshared;
+	pending->id = objid;
+	pending->locator = locator;
 
 	return pending;
 }
@@ -1009,3 +1088,82 @@ restore_truncdrop_counters(PgStat_TableXactStatus *trans)
 		trans->tuples_deleted = trans->deleted_pre_truncdrop;
 	}
 }
+
+/*
+ * Convert a relation OID to its corresponding RelFileLocator for statistics
+ * tracking purposes.
+ *
+ * Returns true on success, false if the relation doesn't need statistics
+ * tracking.
+ *
+ * For partitioned tables, constructs a synthetic locator using the relation
+ * OID as relNumber, since they don't have storage.
+ */
+bool
+pgstat_reloid_to_relfilelocator(Oid reloid, RelFileLocator *locator)
+{
+	HeapTuple	tuple;
+	Form_pg_class relform;
+	bool		result = true;
+
+	/* get the relation's tuple from pg_class */
+	tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(reloid));
+
+	if (!HeapTupleIsValid(tuple))
+		return false;
+
+	relform = (Form_pg_class) GETSTRUCT(tuple);
+
+	/* skip relations without storage and non partitioned tables */
+	if (!RELKIND_HAS_STORAGE(relform->relkind) &&
+		relform->relkind != RELKIND_PARTITIONED_TABLE)
+	{
+		ReleaseSysCache(tuple);
+		return false;
+	}
+
+	if (relform->relkind != RELKIND_PARTITIONED_TABLE)
+	{
+		/* build the RelFileLocator */
+		locator->relNumber = relform->relfilenode;
+		locator->spcOid = relform->reltablespace;
+
+		/* handle default tablespace */
+		if (!OidIsValid(locator->spcOid))
+			locator->spcOid = MyDatabaseTableSpace;
+
+		/* handle dbOid for global vs local relations */
+		if (locator->spcOid == GLOBALTABLESPACE_OID)
+			locator->dbOid = InvalidOid;
+		else
+			locator->dbOid = MyDatabaseId;
+
+		/* handle mapped relations */
+		if (!RelFileNumberIsValid(locator->relNumber))
+		{
+			locator->relNumber = RelationMapOidToFilenumber(reloid,
+															relform->relisshared);
+			if (!RelFileNumberIsValid(locator->relNumber))
+			{
+				ReleaseSysCache(tuple);
+				return false;
+			}
+		}
+	}
+	else
+	{
+		/*
+		 * Partitioned tables don't have storage, so construct a synthetic
+		 * locator for statistics tracking. Use a reserved pseudo tablespace
+		 * OID that cannot conflict with real tablespaces, and the relation
+		 * OID as relNumber. This ensures no collision with regular relations
+		 * even after OID wraparound.
+		 */
+		locator->dbOid = (relform->relisshared ? InvalidOid : MyDatabaseId);
+		locator->spcOid = PSEUDO_PARTITION_TABLE_SPCOID;
+		locator->relNumber = relform->oid;
+	}
+
+	ReleaseSysCache(tuple);
+	return result;
+}
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index ef6fffe60b9..60ffb1679ec 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -23,13 +23,13 @@
 #include "common/ip.h"
 #include "funcapi.h"
 #include "miscadmin.h"
-#include "pgstat.h"
 #include "postmaster/bgworker.h"
 #include "replication/logicallauncher.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/pgstat_internal.h"
 #include "utils/timestamp.h"
 
 #define UINT32_ACCESS_ONCE(var)		 ((uint32)(*((volatile uint32 *)&(var))))
@@ -1949,9 +1949,14 @@ Datum
 pg_stat_reset_single_table_counters(PG_FUNCTION_ARGS)
 {
 	Oid			taboid = PG_GETARG_OID(0);
-	Oid			dboid = (IsSharedRelation(taboid) ? InvalidOid : MyDatabaseId);
+	RelFileLocator locator;
 
-	pgstat_reset(PGSTAT_KIND_RELATION, dboid, taboid);
+	/* Get the stats locator from the relation OID */
+	if (!pgstat_reloid_to_relfilelocator(taboid, &locator))
+		PG_RETURN_VOID();
+
+	pgstat_reset(PGSTAT_KIND_RELATION, locator.dbOid,
+				 RelFileLocatorToPgStatObjid(locator));
 
 	PG_RETURN_VOID();
 }
@@ -2305,5 +2310,16 @@ pg_stat_have_stats(PG_FUNCTION_ARGS)
 	uint64		objid = PG_GETARG_INT64(2);
 	PgStat_Kind kind = pgstat_get_kind_from_str(stats_type);
 
+	/* Convert relation OID to relfilenode objid */
+	if (kind == PGSTAT_KIND_RELATION)
+	{
+		RelFileLocator locator;
+
+		if (!pgstat_reloid_to_relfilelocator(objid, &locator))
+			PG_RETURN_BOOL(false);
+
+		objid = RelFileLocatorToPgStatObjid(locator);
+	}
+
 	PG_RETURN_BOOL(pgstat_have_entry(kind, dboid, objid));
 }
diff --git a/src/include/catalog/pg_tablespace.dat b/src/include/catalog/pg_tablespace.dat
index 1302a3d75cd..9430970fffd 100644
--- a/src/include/catalog/pg_tablespace.dat
+++ b/src/include/catalog/pg_tablespace.dat
@@ -10,6 +10,10 @@
 #
 #----------------------------------------------------------------------
 
+/*
+ * When adding a new one, ensure it does not conflict with
+ * PSEUDO_PARTITION_TABLE_SPCOID.
+ */
 [
 
 { oid => '1663', oid_symbol => 'DEFAULTTABLESPACE_OID',
diff --git a/src/include/catalog/pg_tablespace.h b/src/include/catalog/pg_tablespace.h
index 7816d779d8c..0e2d8051d69 100644
--- a/src/include/catalog/pg_tablespace.h
+++ b/src/include/catalog/pg_tablespace.h
@@ -21,6 +21,14 @@
 #include "catalog/genbki.h"
 #include "catalog/pg_tablespace_d.h"	/* IWYU pragma: export */
 
+/*
+ * Reserved tablespace OID for partitioned table pseudo locators.
+ * This is not an actual tablespace, just a reserved value to distinguish
+ * partitioned table statistics from regular table statistics. Ensures it does
+ * not conflict with the ones in pg_tablespace.dat.
+ */
+#define PSEUDO_PARTITION_TABLE_SPCOID 1665
+
 /* ----------------
  *		pg_tablespace definition.  cpp turns this into
  *		typedef struct FormData_pg_tablespace
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 6714363144a..3102f86aa24 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -17,6 +17,7 @@
 #include "postmaster/pgarch.h"	/* for MAX_XFN_CHARS */
 #include "replication/conflict.h"
 #include "replication/worker_internal.h"
+#include "storage/relfilelocator.h"
 #include "utils/backend_progress.h" /* for backward compatibility */	/* IWYU pragma: export */
 #include "utils/backend_status.h"	/* for backward compatibility */	/* IWYU pragma: export */
 #include "utils/pgstat_kind.h"
@@ -35,6 +36,12 @@
 /* Default directory to store temporary statistics data in */
 #define PG_STAT_TMP_DIR		"pg_stat_tmp"
 
+/*
+ * Build a pgstat key Objid based on a RelFileLocator.
+ */
+#define RelFileLocatorToPgStatObjid(locator) \
+	(((uint64) (locator).spcOid << 32) | (locator).relNumber)
+
 /* Values for track_functions GUC variable --- order is significant! */
 typedef enum TrackFunctionsLevel
 {
@@ -175,11 +182,11 @@ typedef struct PgStat_TableCounts
  */
 typedef struct PgStat_TableStatus
 {
-	Oid			id;				/* table's OID */
-	bool		shared;			/* is it a shared catalog? */
+	uint64		id;				/* hash of relfilelocator for stats key */
 	struct PgStat_TableXactStatus *trans;	/* lowest subxact's counts */
 	PgStat_TableCounts counts;	/* event counts to be sent */
 	Relation	relation;		/* rel that is using this entry */
+	RelFileLocator locator;		/* table's relfilelocator */
 } PgStat_TableStatus;
 
 /* ----------
@@ -735,8 +742,8 @@ extern void pgstat_twophase_postabort(FullTransactionId fxid, uint16 info,
 									  void *recdata, uint32 len);
 
 extern PgStat_StatTabEntry *pgstat_fetch_stat_tabentry(Oid relid);
-extern PgStat_StatTabEntry *pgstat_fetch_stat_tabentry_ext(bool shared,
-														   Oid reloid);
+extern PgStat_StatTabEntry *pgstat_fetch_stat_tabentry_by_locator(RelFileLocator locator);
+extern PgStat_StatTabEntry *pgstat_fetch_stat_tabentry_ext(Oid reloid);
 extern PgStat_TableStatus *find_tabstat_entry(Oid rel_id);
 
 
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 5c1ce4d3d6a..7b24928b00d 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -764,6 +764,7 @@ extern void PostPrepare_PgStat_Relations(PgStat_SubXactStatus *xact_state);
 extern bool pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait);
 extern void pgstat_relation_delete_pending_cb(PgStat_EntryRef *entry_ref);
 extern void pgstat_relation_reset_timestamp_cb(PgStatShared_Common *header, TimestampTz ts);
+extern bool pgstat_reloid_to_relfilelocator(Oid reloid, RelFileLocator *locator);
 
 
 /*
diff --git a/src/test/recovery/t/029_stats_restart.pl b/src/test/recovery/t/029_stats_restart.pl
index 021e2bf361f..3a9c05eaf10 100644
--- a/src/test/recovery/t/029_stats_restart.pl
+++ b/src/test/recovery/t/029_stats_restart.pl
@@ -55,10 +55,10 @@ trigger_funcrel_stat();
 
 # verify stats objects exist
 $sect = "initial";
-is(have_stats('database', $dboid, 0), 't', "$sect: db stats do exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats($connect_db, 'database', $dboid, 0), 't', "$sect: db stats do exist");
+is(have_stats($db_under_test, 'function', $dboid, $funcoid),
 	't', "$sect: function stats do exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats($db_under_test, 'relation', $dboid, $tableoid),
 	't', "$sect: relation stats do exist");
 
 # regular shutdown
@@ -79,10 +79,10 @@ copy($og_stats, $statsfile) or die "Copy failed: $!";
 $node->start;
 
 $sect = "copy";
-is(have_stats('database', $dboid, 0), 't', "$sect: db stats do exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats($connect_db, 'database', $dboid, 0), 't', "$sect: db stats do exist");
+is(have_stats($db_under_test, 'function', $dboid, $funcoid),
 	't', "$sect: function stats do exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats($db_under_test, 'relation', $dboid, $tableoid),
 	't', "$sect: relation stats do exist");
 
 $node->stop('immediate');
@@ -96,10 +96,10 @@ $node->start;
 
 # stats should have been discarded
 $sect = "post immediate";
-is(have_stats('database', $dboid, 0), 'f', "$sect: db stats do not exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats($connect_db, 'database', $dboid, 0), 'f', "$sect: db stats do not exist");
+is(have_stats($db_under_test, 'function', $dboid, $funcoid),
 	'f', "$sect: function stats do exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats($db_under_test, 'relation', $dboid, $tableoid),
 	'f', "$sect: relation stats do not exist");
 
 # get rid of backup statsfile
@@ -110,10 +110,10 @@ unlink $statsfile or die "cannot unlink $statsfile $!";
 trigger_funcrel_stat();
 
 $sect = "post immediate, new";
-is(have_stats('database', $dboid, 0), 't', "$sect: db stats do exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats($connect_db, 'database', $dboid, 0), 't', "$sect: db stats do exist");
+is(have_stats($db_under_test, 'function', $dboid, $funcoid),
 	't', "$sect: function stats do exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats($db_under_test, 'relation', $dboid, $tableoid),
 	't', "$sect: relation stats do exist");
 
 # regular shutdown
@@ -129,10 +129,10 @@ $node->start;
 
 # no stats present due to invalid stats file
 $sect = "invalid_overwrite";
-is(have_stats('database', $dboid, 0), 'f', "$sect: db stats do not exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats($connect_db, 'database', $dboid, 0), 'f', "$sect: db stats do not exist");
+is(have_stats($db_under_test, 'function', $dboid, $funcoid),
 	'f', "$sect: function stats do not exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats($db_under_test, 'relation', $dboid, $tableoid),
 	'f', "$sect: relation stats do not exist");
 
 
@@ -145,10 +145,10 @@ append_file($og_stats, "XYZ");
 $node->start;
 
 $sect = "invalid_append";
-is(have_stats('database', $dboid, 0), 'f', "$sect: db stats do not exist");
-is(have_stats('function', $dboid, $funcoid),
+is(have_stats($connect_db, 'database', $dboid, 0), 'f', "$sect: db stats do not exist");
+is(have_stats($db_under_test, 'function', $dboid, $funcoid),
 	'f', "$sect: function stats do not exist");
-is(have_stats('relation', $dboid, $tableoid),
+is(have_stats($db_under_test, 'relation', $dboid, $tableoid),
 	'f', "$sect: relation stats do not exist");
 
 
@@ -307,9 +307,9 @@ sub trigger_funcrel_stat
 
 sub have_stats
 {
-	my ($kind, $dboid, $objid) = @_;
+	my ($db, $kind, $dboid, $objid) = @_;
 
-	return $node->safe_psql($connect_db,
+	return $node->safe_psql($db,
 		"SELECT pg_stat_have_stats('$kind', $dboid, $objid)");
 }
 
-- 
2.34.1

v10-0002-handle-relation-statistics-correctly-during-rewr.patch (text/x-diff; charset=us-ascii)
From 272952b5631f092d9bcfdc7865dbf7ae9b8a2b23 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Tue, 4 Nov 2025 13:52:46 +0000
Subject: [PATCH v10 2/2] handle relation statistics correctly during rewrites

Now that PGSTAT_KIND_RELATION is keyed by relfilenode, we need to handle rewrites.

To do so, this patch:

- Adds PgStat_PendingRewrite, a new struct to track rewrite operations within
a transaction, storing the old locator, new locator, and original locator (for
rewrite chains). This allows stats to be copied from the original location to
the final location at commit time.

- Adds a new function, pgstat_mark_rewrite(), called when a table rewrite begins.
It records the rewrite operation in a local list and detects rewrite chains by
checking if the old_locator matches any existing new_locator, preserving the
chain's original_locator.

- Modifies pgstat_copy_relation_stats() to accept RelFileLocators instead of
Relations, with a new increment parameter to accumulate stats (needed for rewrite
chains with DML between rewrites).

- Ensures that AtEOXact_PgStat_Relations(), AtPrepare_PgStat_Relations(),
pgstat_twophase_postcommit()/postabort() and pgstat_drop_relation() handle the
PgStat_PendingRewrite list correctly.

Note that due to the new flush call in pgstat_twophase_postcommit() we cannot
call GetCurrentTransactionStopTimestamp() in pgstat_relation_flush_cb(). So a
check is added to handle this special case and call GetCurrentTimestamp() instead.
GetCurrentTimestamp() would be called only if there is a rewrite, so its extra
cost should be negligible. Another solution could be to trigger the flush from
FinishPreparedTransaction() but that's not worth the extra complexity.

The new pending_rewrites list is traversed in multiple places. In typical usage
the list should contain only a few entries, so the traversal cost should be
negligible, all the more so in comparison to a rewrite.
---
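
The rewrite-chain detection described in the commit message can be sketched in
isolation. This is a minimal standalone model, not the patch's code: the
SketchLocator and SketchRewrite types and every sketch_* function are
hypothetical stand-ins for RelFileLocator, PgStat_PendingRewrite and
pgstat_mark_rewrite(), and malloc() replaces TopTransactionContext allocation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for RelFileLocator; not the PostgreSQL definition. */
typedef struct SketchLocator
{
	unsigned int spcOid;
	unsigned int dbOid;
	unsigned int relNumber;
} SketchLocator;

/* Hypothetical stand-in for PgStat_PendingRewrite (nest_level omitted). */
typedef struct SketchRewrite
{
	SketchLocator old_locator;
	SketchLocator new_locator;
	SketchLocator original_locator;	/* first locator of the chain */
	struct SketchRewrite *next;
} SketchRewrite;

static SketchRewrite *sketch_pending = NULL;

static int
sketch_locator_equals(SketchLocator a, SketchLocator b)
{
	return a.spcOid == b.spcOid &&
		a.dbOid == b.dbOid &&
		a.relNumber == b.relNumber;
}

/* Record a rewrite; inherit original_locator when it extends a chain. */
static void
sketch_mark_rewrite(SketchLocator old_locator, SketchLocator new_locator)
{
	SketchRewrite *existing;
	SketchRewrite *rewrite;
	SketchLocator original = old_locator;

	/* A chain exists if our old locator is some earlier rewrite's new one. */
	for (existing = sketch_pending; existing != NULL; existing = existing->next)
	{
		if (sketch_locator_equals(old_locator, existing->new_locator))
		{
			original = existing->original_locator;
			break;
		}
	}

	rewrite = malloc(sizeof(SketchRewrite));
	assert(rewrite != NULL);
	rewrite->old_locator = old_locator;
	rewrite->new_locator = new_locator;
	rewrite->original_locator = original;

	/* Newest entry goes first, as in the patch. */
	rewrite->next = sketch_pending;
	sketch_pending = rewrite;
}

/*
 * Two chained rewrites A -> B -> C. Returns the relNumber recorded as the
 * chain origin for the latest rewrite. Entries are intentionally not freed
 * in this sketch.
 */
static unsigned int
sketch_demo_chain(void)
{
	SketchLocator a = {1663, 5, 16384};
	SketchLocator b = {1663, 5, 16390};
	SketchLocator c = {1663, 5, 16396};

	sketch_mark_rewrite(a, b);
	sketch_mark_rewrite(b, c);
	return sketch_pending->original_locator.relNumber;
}
```

In this model the second rewrite (B -> C) keeps A as its original_locator,
which is what lets stats be copied from the original location to the final
one at commit time.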
 src/backend/catalog/index.c                  |   2 +-
 src/backend/commands/cluster.c               |   5 +
 src/backend/commands/tablecmds.c             |   6 +
 src/backend/utils/activity/pgstat_relation.c | 391 ++++++++++++++++++-
 src/backend/utils/activity/pgstat_xact.c     |  25 +-
 src/backend/utils/cache/relcache.c           |   6 +
 src/include/pgstat.h                         |   5 +-
 src/tools/pgindent/typedefs.list             |   1 +
 8 files changed, 424 insertions(+), 17 deletions(-)
  92.8% src/backend/utils/activity/
   4.9% src/backend/

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 8dea58ad96b..b71925a22c3 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1795,7 +1795,7 @@ index_concurrently_swap(Oid newIndexId, Oid oldIndexId, const char *oldName)
 	changeDependenciesOn(RelationRelationId, oldIndexId, newIndexId);
 
 	/* copy over statistics from old to new index */
-	pgstat_copy_relation_stats(newClassRel, oldClassRel);
+	pgstat_copy_relation_stats(newClassRel->rd_locator, oldClassRel->rd_locator, false);
 
 	/* Copy data of pg_statistic from the old index to the new one */
 	CopyStatistics(oldIndexId, newIndexId);
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 2120c85ccb4..6155b12afab 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -1196,6 +1196,11 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
 
 		rel1 = relation_open(r1, NoLock);
 		rel2 = relation_open(r2, NoLock);
+
+		/* Mark that a rewrite happened */
+		if (RELKIND_HAS_STORAGE(rel1->rd_rel->relkind))
+			pgstat_mark_rewrite(rel1->rd_locator, rel2->rd_locator);
+
 		rel2->rd_createSubid = rel1->rd_createSubid;
 		rel2->rd_newRelfilelocatorSubid = rel1->rd_newRelfilelocatorSubid;
 		rel2->rd_firstRelfilelocatorSubid = rel1->rd_firstRelfilelocatorSubid;
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 6b1a00ed477..9de70f321ed 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -16884,6 +16884,7 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
 	Oid			reltoastrelid;
 	RelFileNumber newrelfilenumber;
 	RelFileLocator newrlocator;
+	RelFileLocator oldrlocator;
 	List	   *reltoastidxids = NIL;
 	ListCell   *lc;
 
@@ -16922,6 +16923,7 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
 	newrlocator = rel->rd_locator;
 	newrlocator.relNumber = newrelfilenumber;
 	newrlocator.spcOid = newTableSpace;
+	oldrlocator = rel->rd_locator;
 
 	/* hand off to AM to actually create new rel storage and copy the data */
 	if (rel->rd_rel->relkind == RELKIND_INDEX)
@@ -16934,6 +16936,10 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
 		table_relation_copy_data(rel, &newrlocator);
 	}
 
+	/* mark that a rewrite happened */
+	if (RELKIND_HAS_STORAGE(rel->rd_rel->relkind))
+		pgstat_mark_rewrite(oldrlocator, newrlocator);
+
 	/*
 	 * Update the pg_class row.
 	 *
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index a267e93f8be..123fb50d98f 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -30,6 +30,19 @@
 #include "utils/syscache.h"
 #include "utils/timestamp.h"
 
+/* Pending rewrite operations for stats copying */
+typedef struct PgStat_PendingRewrite
+{
+	RelFileLocator old_locator;
+	RelFileLocator new_locator;
+	RelFileLocator original_locator;
+	int			nest_level;		/* Transaction nesting level where rewrite
+								 * occurred */
+	struct PgStat_PendingRewrite *next;
+} PgStat_PendingRewrite;
+
+/* The pending rewrites list for current transaction */
+static PgStat_PendingRewrite *pending_rewrites = NULL;
 
 /* Record that's written to 2PC state file when pgstat state is persisted */
 typedef struct TwoPhasePgStatRecord
@@ -43,6 +56,8 @@ typedef struct TwoPhasePgStatRecord
 	PgStat_Counter deleted_pre_truncdrop;
 	RelFileLocator locator;		/* table's rd_locator */
 	bool		truncdropped;	/* was the relation truncated/dropped? */
+	RelFileLocator rewrite_old_locator;
+	int			rewrite_nest_level;
 } TwoPhasePgStatRecord;
 
 
@@ -54,27 +69,70 @@ static void restore_truncdrop_counters(PgStat_TableXactStatus *trans);
 
 
 /*
- * Copy stats between relations. This is used for things like REINDEX
+ * Copy stats between RelFileLocators. This is used for things like REINDEX
  * CONCURRENTLY.
  */
 void
-pgstat_copy_relation_stats(Relation dst, Relation src)
+pgstat_copy_relation_stats(RelFileLocator dst, RelFileLocator src, bool increment)
 {
 	PgStat_StatTabEntry *srcstats;
 	PgStatShared_Relation *dstshstats;
 	PgStat_EntryRef *dst_ref;
 
-	srcstats = pgstat_fetch_stat_tabentry_ext(RelationGetRelid(src));
+	srcstats = (PgStat_StatTabEntry *) pgstat_fetch_entry(PGSTAT_KIND_RELATION,
+														  src.dbOid,
+														  RelFileLocatorToPgStatObjid(src));
 	if (!srcstats)
 		return;
 
 	dst_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_RELATION,
-										  dst->rd_rel->relisshared ? InvalidOid : MyDatabaseId,
-										  RelationGetRelid(dst),
+										  dst.dbOid,
+										  RelFileLocatorToPgStatObjid(dst),
 										  false);
 
 	dstshstats = (PgStatShared_Relation *) dst_ref->shared_stats;
-	dstshstats->stats = *srcstats;
+
+	if (!increment)
+		dstshstats->stats = *srcstats;
+	else
+	{
+		/* Increment those statistics */
+#define RELFSTAT_ACC(fld, stats_to_add) \
+	(dstshstats->stats.fld += stats_to_add->fld)
+		RELFSTAT_ACC(numscans, srcstats);
+		RELFSTAT_ACC(tuples_returned, srcstats);
+		RELFSTAT_ACC(tuples_fetched, srcstats);
+		RELFSTAT_ACC(tuples_inserted, srcstats);
+		RELFSTAT_ACC(tuples_updated, srcstats);
+		RELFSTAT_ACC(tuples_deleted, srcstats);
+		RELFSTAT_ACC(tuples_hot_updated, srcstats);
+		RELFSTAT_ACC(tuples_newpage_updated, srcstats);
+		RELFSTAT_ACC(live_tuples, srcstats);
+		RELFSTAT_ACC(dead_tuples, srcstats);
+		RELFSTAT_ACC(mod_since_analyze, srcstats);
+		RELFSTAT_ACC(ins_since_vacuum, srcstats);
+		RELFSTAT_ACC(blocks_fetched, srcstats);
+		RELFSTAT_ACC(blocks_hit, srcstats);
+		RELFSTAT_ACC(vacuum_count, srcstats);
+		RELFSTAT_ACC(autovacuum_count, srcstats);
+		RELFSTAT_ACC(analyze_count, srcstats);
+		RELFSTAT_ACC(autoanalyze_count, srcstats);
+		RELFSTAT_ACC(total_vacuum_time, srcstats);
+		RELFSTAT_ACC(total_autovacuum_time, srcstats);
+		RELFSTAT_ACC(total_analyze_time, srcstats);
+		RELFSTAT_ACC(total_autoanalyze_time, srcstats);
+#undef RELFSTAT_ACC
+
+		/* Replace those statistics */
+#define RELFSTAT_REP(fld, stats_to_rep) \
+	(dstshstats->stats.fld = stats_to_rep->fld)
+		RELFSTAT_REP(lastscan, srcstats);
+		RELFSTAT_REP(last_vacuum_time, srcstats);
+		RELFSTAT_REP(last_autovacuum_time, srcstats);
+		RELFSTAT_REP(last_analyze_time, srcstats);
+		RELFSTAT_REP(last_autoanalyze_time, srcstats);
+#undef RELFSTAT_REP
+	}
 
 	pgstat_unlock_entry(dst_ref);
 }
@@ -136,6 +194,7 @@ void
 pgstat_assoc_relation(Relation rel)
 {
 	RelFileLocator locator;
+	PgStat_TableStatus *pgstat_info;
 
 	Assert(rel->pgstat_enabled);
 	Assert(rel->pgstat_info == NULL);
@@ -164,14 +223,54 @@ pgstat_assoc_relation(Relation rel)
 		locator.relNumber = rel->rd_id;
 	}
 
+	/*
+	 * If this relation was rewritten during the current transaction we may be
+	 * reopening it with its new RelFileLocator. In that case, continue using
+	 * the stats entry associated with the old locator rather than creating a
+	 * new one. This ensures all stats from before and after the rewrite are
+	 * tracked in a single entry which will be properly copied to the new
+	 * locator at transaction commit.
+	 */
+	if (pending_rewrites != NULL)
+	{
+		PgStat_PendingRewrite *rewrite;
+
+		for (rewrite = pending_rewrites; rewrite != NULL; rewrite = rewrite->next)
+		{
+			if (locator.dbOid == rewrite->new_locator.dbOid &&
+				locator.spcOid == rewrite->new_locator.spcOid &&
+				locator.relNumber == rewrite->new_locator.relNumber)
+			{
+				pgstat_info = pgstat_prep_relation_pending(rewrite->old_locator);
+				goto found_entry;
+			}
+		}
+	}
+
 	/* Else find or make the PgStat_TableStatus entry, and update link */
-	rel->pgstat_info = pgstat_prep_relation_pending(locator);
+	pgstat_info = pgstat_prep_relation_pending(locator);
+
+found_entry:
+	rel->pgstat_info = pgstat_info;
+
+	/*
+	 * For relation stats, we key by physical file location, not by relation
+	 * OID. This means during operations like ALTER TYPE it's possible that
+	 * the relation OID changes but the relfilenode stays the same (no actual
+	 * rewrite needed). Unlink the old relation first.
+	 */
+	if (pgstat_info->relation != NULL &&
+		pgstat_info->relation != rel)
+	{
+		pgstat_info->relation->pgstat_info = NULL;
+		pgstat_info->relation = NULL;
+	}
 
 	/* don't allow link a stats to multiple relcache entries */
-	Assert(rel->pgstat_info->relation == NULL);
+	Assert(pgstat_info->relation == NULL);
 
 	/* mark this relation as the owner */
-	rel->pgstat_info->relation = rel;
+	pgstat_info->relation = rel;
 }
 
 /*
@@ -214,14 +313,37 @@ pgstat_drop_relation(Relation rel)
 {
 	int			nest_level = GetCurrentTransactionNestLevel();
 	PgStat_TableStatus *pgstat_info;
+	bool		skip_transactional_drop = false;
 
 	/* don't track stats for relations without storage */
 	if (!RELKIND_HAS_STORAGE(rel->rd_rel->relkind))
 		return;
 
-	pgstat_drop_transactional(PGSTAT_KIND_RELATION,
-							  rel->rd_locator.dbOid,
-							  RelFileLocatorToPgStatObjid(rel->rd_locator));
+	/* Check if this drop is part of a pending rewrite */
+	if (pending_rewrites != NULL)
+	{
+		PgStat_PendingRewrite *rewrite;
+
+		for (rewrite = pending_rewrites; rewrite != NULL; rewrite = rewrite->next)
+		{
+			if (rel->rd_locator.dbOid == rewrite->old_locator.dbOid &&
+				rel->rd_locator.spcOid == rewrite->old_locator.spcOid &&
+				rel->rd_locator.relNumber == rewrite->old_locator.relNumber)
+			{
+				skip_transactional_drop = true;
+				break;
+			}
+		}
+	}
+
+	/*
+	 * If it is part of a rewrite, drop its stats later, for example in
+	 * AtEOXact_PgStat_Relations(), so skip it here.
+	 */
+	if (!skip_transactional_drop)
+		pgstat_drop_transactional(PGSTAT_KIND_RELATION,
+								  rel->rd_locator.dbOid,
+								  RelFileLocatorToPgStatObjid(rel->rd_locator));
 
 	if (!pgstat_should_count_relation(rel))
 		return;
@@ -666,6 +788,48 @@ AtEOXact_PgStat_Relations(PgStat_SubXactStatus *xact_state, bool isCommit)
 		}
 		tabstat->trans = NULL;
 	}
+
+	/* preserve the stats in case of rewrite */
+	if (isCommit && pending_rewrites != NULL)
+	{
+		PgStat_PendingRewrite *rewrite;
+		PgStat_PendingRewrite *prev = NULL;
+		PgStat_PendingRewrite *current = pending_rewrites;
+		PgStat_PendingRewrite *next;
+
+		/* reverse the rewrites list to process in chronological order */
+		while (current != NULL)
+		{
+			next = current->next;
+			current->next = prev;
+			prev = current;
+			current = next;
+		}
+
+		/* now process rewrites in chronological order */
+		for (rewrite = prev; rewrite != NULL; rewrite = rewrite->next)
+		{
+			PgStat_EntryRef *old_entry_ref;
+
+			old_entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION,
+													   rewrite->old_locator.dbOid,
+													   RelFileLocatorToPgStatObjid(rewrite->old_locator));
+
+			if (old_entry_ref && old_entry_ref->pending)
+				pgstat_relation_flush_cb(old_entry_ref, false);
+
+			pgstat_copy_relation_stats(rewrite->new_locator,
+									   rewrite->old_locator, true);
+
+			/* drop old locator's stats */
+			if (!pgstat_drop_entry(PGSTAT_KIND_RELATION,
+								   rewrite->old_locator.dbOid,
+								   RelFileLocatorToPgStatObjid(rewrite->old_locator)))
+				pgstat_request_entry_refs_gc();
+		}
+	}
+
+	pending_rewrites = NULL;
 }
 
 /*
@@ -681,6 +845,30 @@ AtEOSubXact_PgStat_Relations(PgStat_SubXactStatus *xact_state, bool isCommit, in
 	PgStat_TableXactStatus *trans;
 	PgStat_TableXactStatus *next_trans;
 
+	/*
+	 * If we don't commit then remove the associated rewrites if any, to keep
+	 * the rewrite chain in sync with what will be eventually committed.
+	 */
+	if (!isCommit)
+	{
+		PgStat_PendingRewrite **rewrite_ptr = &pending_rewrites;
+
+		while (*rewrite_ptr != NULL)
+		{
+			if ((*rewrite_ptr)->nest_level >= nestDepth)
+			{
+				PgStat_PendingRewrite *to_remove = *rewrite_ptr;
+
+				*rewrite_ptr = (*rewrite_ptr)->next;
+				pfree(to_remove);
+			}
+			else
+			{
+				rewrite_ptr = &((*rewrite_ptr)->next);
+			}
+		}
+	}
+
 	for (trans = xact_state->first; trans != NULL; trans = next_trans)
 	{
 		PgStat_TableStatus *tabstat;
@@ -760,11 +948,19 @@ void
 AtPrepare_PgStat_Relations(PgStat_SubXactStatus *xact_state)
 {
 	PgStat_TableXactStatus *trans;
+	PgStat_PendingRewrite *rewrite;
 
+	/*
+	 * For each tabstat, find its matching rewrite and remove it from the
+	 * pending rewrites list. This way, after processing all tabstats, pending
+	 * rewrites will only contain rewrite-only transactions.
+	 */
 	for (trans = xact_state->first; trans != NULL; trans = trans->next)
 	{
 		PgStat_TableStatus *tabstat PG_USED_FOR_ASSERTS_ONLY;
 		TwoPhasePgStatRecord record;
+		PgStat_PendingRewrite **rewrite_ptr;
+		bool		found_rewrite = false;
 
 		Assert(trans->nest_level == 1);
 		Assert(trans->upper == NULL);
@@ -784,10 +980,83 @@ AtPrepare_PgStat_Relations(PgStat_SubXactStatus *xact_state)
 			record.locator = tabstat->locator;
 
 		record.truncdropped = trans->truncdropped;
+		record.rewrite_nest_level = 0;
+
+		/*
+		 * Look for a matching rewrite and remove it from pending rewrites. We
+		 * check three possible matches:
+		 *
+		 * The new_locator when stats have been added after the rewrite. The
+		 * old_locator when stats have been added before the rewrite but not
+		 * after. The original_locator when this tabstat is part of a rewrite
+		 * chain.
+		 */
+		rewrite_ptr = &pending_rewrites;
+		while (*rewrite_ptr != NULL)
+		{
+			rewrite = *rewrite_ptr;
+
+			if ((record.locator.dbOid == rewrite->new_locator.dbOid &&
+				 record.locator.spcOid == rewrite->new_locator.spcOid &&
+				 record.locator.relNumber == rewrite->new_locator.relNumber) ||
+				(tabstat->locator.dbOid == rewrite->old_locator.dbOid &&
+				 tabstat->locator.spcOid == rewrite->old_locator.spcOid &&
+				 tabstat->locator.relNumber == rewrite->old_locator.relNumber) ||
+				(tabstat->locator.dbOid == rewrite->original_locator.dbOid &&
+				 tabstat->locator.spcOid == rewrite->original_locator.spcOid &&
+				 tabstat->locator.relNumber == rewrite->original_locator.relNumber))
+			{
+				/*
+				 * Found matching rewrite. Record the rewrite information and
+				 * remove this rewrite from the list since it's now handled.
+				 */
+				record.rewrite_old_locator = rewrite->original_locator;
+				record.rewrite_nest_level = rewrite->nest_level;
+				record.locator = rewrite->new_locator;
+				found_rewrite = true;
+
+				/* Remove from pending_rewrites list */
+				*rewrite_ptr = rewrite->next;
+				pfree(rewrite);
+				break;
+			}
+			else
+			{
+				/* Move to next rewrite in the list */
+				rewrite_ptr = &(rewrite->next);
+			}
+		}
+
+		/* If no rewrite found, clear the rewrite fields */
+		if (!found_rewrite)
+		{
+			memset(&record.rewrite_old_locator, 0, sizeof(RelFileLocator));
+		}
+
+		RegisterTwoPhaseRecord(TWOPHASE_RM_PGSTAT_ID, 0,
+							   &record, sizeof(TwoPhasePgStatRecord));
+	}
+
+	/*
+	 * Now process any rewrites still pending. These are rewrite-only
+	 * transactions. We need to preserve their stats even though there's no
+	 * tabstat entry for them.
+	 */
+	for (rewrite = pending_rewrites; rewrite != NULL; rewrite = rewrite->next)
+	{
+		TwoPhasePgStatRecord record;
+
+		memset(&record, 0, sizeof(TwoPhasePgStatRecord));
+		record.locator = rewrite->new_locator;
+		record.rewrite_old_locator = rewrite->original_locator;
+		record.rewrite_nest_level = rewrite->nest_level;
+		record.truncdropped = false;
 
 		RegisterTwoPhaseRecord(TWOPHASE_RM_PGSTAT_ID, 0,
 							   &record, sizeof(TwoPhasePgStatRecord));
 	}
+
+	pending_rewrites = NULL;
 }
 
 /*
@@ -810,6 +1079,8 @@ PostPrepare_PgStat_Relations(PgStat_SubXactStatus *xact_state)
 		tabstat = trans->parent;
 		tabstat->trans = NULL;
 	}
+
+	pending_rewrites = NULL;
 }
 
 /*
@@ -845,6 +1116,29 @@ pgstat_twophase_postcommit(FullTransactionId fxid, uint16 info,
 	pgstat_info->counts.changed_tuples +=
 		rec->tuples_inserted + rec->tuples_updated +
 		rec->tuples_deleted;
+
+	if (rec->rewrite_nest_level > 0)
+	{
+		PgStat_EntryRef *old_entry_ref;
+
+		/* Flush any pending stats for old locator first */
+		old_entry_ref = pgstat_fetch_pending_entry(PGSTAT_KIND_RELATION,
+												   rec->rewrite_old_locator.dbOid,
+												   RelFileLocatorToPgStatObjid(rec->rewrite_old_locator));
+
+		if (old_entry_ref && old_entry_ref->pending)
+			pgstat_relation_flush_cb(old_entry_ref, false);
+
+		/* Copy stats from old to new locator */
+		pgstat_copy_relation_stats(rec->locator, rec->rewrite_old_locator,
+								   true);
+
+		/* Drop old locator's stats */
+		if (!pgstat_drop_entry(PGSTAT_KIND_RELATION,
+							   rec->rewrite_old_locator.dbOid,
+							   RelFileLocatorToPgStatObjid(rec->rewrite_old_locator)))
+			pgstat_request_entry_refs_gc();
+	}
 }
 
 /*
@@ -859,9 +1153,26 @@ pgstat_twophase_postabort(FullTransactionId fxid, uint16 info,
 {
 	TwoPhasePgStatRecord *rec = (TwoPhasePgStatRecord *) recdata;
 	PgStat_TableStatus *pgstat_info;
+	RelFileLocator target_locator;
+
+	/*
+	 * For aborted transactions with rewrites (like TRUNCATE), we need to
+	 * restore stats to the old locator, not the new one. The new locator
+	 * should be dropped since the rewrite is being rolled back.
+	 */
+	if (rec->rewrite_nest_level > 0)
+	{
+		/* Use the old locator */
+		target_locator = rec->rewrite_old_locator;
+	}
+	else
+	{
+		/* No rewrite, use the original locator */
+		target_locator = rec->locator;
+	}
 
 	/* Find or create a tabstat entry for the target locator */
-	pgstat_info = pgstat_prep_relation_pending(rec->locator);
+	pgstat_info = pgstat_prep_relation_pending(target_locator);
 
 	/* Same math as in AtEOXact_PgStat, abort case */
 	if (rec->truncdropped)
@@ -916,7 +1227,17 @@ pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
 	tabentry->numscans += lstats->counts.numscans;
 	if (lstats->counts.numscans)
 	{
-		TimestampTz t = GetCurrentTransactionStopTimestamp();
+		TimestampTz t;
+
+		/*
+		 * Check the transaction state, because the flush call in
+		 * pgstat_twophase_postcommit() would otherwise break the assertion
+		 * on the state in GetCurrentTransactionStopTimestamp().
+		 */
+		if (!IsTransactionState())
+			t = GetCurrentTransactionStopTimestamp();
+		else
+			t = GetCurrentTimestamp();
 
 		if (t > tabentry->lastscan)
 			tabentry->lastscan = t;
@@ -1167,3 +1488,45 @@ pgstat_reloid_to_relfilelocator(Oid reloid, RelFileLocator *locator)
 	ReleaseSysCache(tuple);
 	return result;
 }
+
+/*
+ * Mark that a relation rewrite has occurred, preserving the original locator
+ * so stats can be copied at transaction commit.
+ */
+void
+pgstat_mark_rewrite(RelFileLocator old_locator, RelFileLocator new_locator)
+{
+	PgStat_PendingRewrite *rewrite;
+	PgStat_PendingRewrite *existing;
+	RelFileLocator original_locator = old_locator;
+
+	for (existing = pending_rewrites; existing != NULL; existing = existing->next)
+	{
+		if (old_locator.dbOid == existing->new_locator.dbOid &&
+			old_locator.spcOid == existing->new_locator.spcOid &&
+			old_locator.relNumber == existing->new_locator.relNumber)
+		{
+			original_locator = existing->original_locator;
+			break;
+		}
+	}
+
+	/* Allocate in TopTransactionContext memory context */
+	rewrite = MemoryContextAlloc(TopTransactionContext,
+								 sizeof(PgStat_PendingRewrite));
+
+	rewrite->old_locator = old_locator;
+	rewrite->new_locator = new_locator;
+	rewrite->original_locator = original_locator;
+	rewrite->nest_level = GetCurrentTransactionNestLevel();
+
+	/* Add to the list */
+	rewrite->next = pending_rewrites;
+	pending_rewrites = rewrite;
+}
+
+void
+pgstat_clear_rewrite(void)
+{
+	pending_rewrites = NULL;
+}
diff --git a/src/backend/utils/activity/pgstat_xact.c b/src/backend/utils/activity/pgstat_xact.c
index bc9864bd8d9..f8cf3755ce2 100644
--- a/src/backend/utils/activity/pgstat_xact.c
+++ b/src/backend/utils/activity/pgstat_xact.c
@@ -55,6 +55,8 @@ AtEOXact_PgStat(bool isCommit, bool parallel)
 	}
 	pgStatXactStack = NULL;
 
+	pgstat_clear_rewrite();
+
 	/* Make sure any stats snapshot is thrown away */
 	pgstat_clear_snapshot();
 }
@@ -360,8 +362,29 @@ create_drop_transactional_internal(PgStat_Kind kind, Oid dboid, uint64 objid, bo
 void
 pgstat_create_transactional(PgStat_Kind kind, Oid dboid, uint64 objid)
 {
-	if (pgstat_get_entry_ref(kind, dboid, objid, false, NULL))
+	PgStat_EntryRef *entry_ref;
+
+	entry_ref = pgstat_get_entry_ref(kind, dboid, objid, false, NULL);
+
+	if (entry_ref)
 	{
+		/*
+		 * For relation stats, we key by physical file location, not by
+		 * relation OID. This means during operations like ALTER TYPE where
+		 * the relation OID changes but the relfilenode stays the same (no
+		 * actual rewrite needed), we'll find an existing entry.
+		 *
+		 * This is expected behavior: we want to preserve stats across the
+		 * catalog change. Simply reset and recreate the entry for the new
+		 * relation OID without warning.
+		 */
+		if (kind == PGSTAT_KIND_RELATION)
+		{
+			pgstat_reset(kind, dboid, objid);
+			create_drop_transactional_internal(kind, dboid, objid, true);
+			return;
+		}
+
 		ereport(WARNING,
 				errmsg("resetting existing statistics for kind %s, db=%u, oid=%" PRIu64,
 					   (pgstat_get_kind_info(kind))->name, dboid,
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 2d0cb7bcfd4..c98e5c51d63 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -85,6 +85,7 @@
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
+#include "utils/pgstat_internal.h"
 #include "utils/relmapper.h"
 #include "utils/resowner.h"
 #include "utils/snapmgr.h"
@@ -3780,6 +3781,7 @@ RelationSetNewRelfilenumber(Relation relation, char persistence)
 	MultiXactId minmulti = InvalidMultiXactId;
 	TransactionId freezeXid = InvalidTransactionId;
 	RelFileLocator newrlocator;
+	RelFileLocator oldrlocator = relation->rd_locator;
 
 	if (!IsBinaryUpgrade)
 	{
@@ -3951,6 +3953,10 @@ RelationSetNewRelfilenumber(Relation relation, char persistence)
 
 	table_close(pg_class, RowExclusiveLock);
 
+	/* Mark that a rewrite happened */
+	if (RELKIND_HAS_STORAGE(relation->rd_rel->relkind))
+		pgstat_mark_rewrite(oldrlocator, newrlocator);
+
 	/*
 	 * Make the pg_class row change or relation map change visible.  This will
 	 * cause the relcache entry to get updated, too.
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 3102f86aa24..8750c025bbe 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -670,7 +670,7 @@ extern PgStat_FunctionCounts *find_funcstat_entry(Oid func_id);
 
 extern void pgstat_create_relation(Relation rel);
 extern void pgstat_drop_relation(Relation rel);
-extern void pgstat_copy_relation_stats(Relation dst, Relation src);
+extern void pgstat_copy_relation_stats(RelFileLocator dst, RelFileLocator src, bool increment);
 
 extern void pgstat_init_relation(Relation rel);
 extern void pgstat_assoc_relation(Relation rel);
@@ -682,6 +682,9 @@ extern void pgstat_report_vacuum(Relation rel, PgStat_Counter livetuples,
 extern void pgstat_report_analyze(Relation rel,
 								  PgStat_Counter livetuples, PgStat_Counter deadtuples,
 								  bool resetcounter, TimestampTz starttime);
+extern void pgstat_mark_rewrite(RelFileLocator old_locator,
+								RelFileLocator new_locator);
+extern void pgstat_clear_rewrite(void);
 
 /*
  * If stats are enabled, but pending data hasn't been prepared yet, call
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 04845d5e680..3184e9c464b 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2269,6 +2269,7 @@ PgStat_KindInfo
 PgStat_LocalState
 PgStat_PendingDroppedStatsItem
 PgStat_PendingIO
+PgStat_PendingRewrite
 PgStat_SLRUStats
 PgStat_ShmemControl
 PgStat_Snapshot
-- 
2.34.1
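
The subtransaction-abort cleanup added to AtEOSubXact_PgStat_Relations() in the
patch above removes list entries through a pointer-to-pointer, so the head and
interior entries are unlinked the same way. Below is a minimal standalone
sketch of that technique only. SketchNode and the sketch_* functions are
hypothetical names, and malloc()/free() stand in for the memory-context
allocation and pfree() used by the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical list entry carrying only the nesting level. */
typedef struct SketchNode
{
	int			nest_level;
	struct SketchNode *next;
} SketchNode;

/* Push a new entry at the head of the list and return the new head. */
static SketchNode *
sketch_push(SketchNode *head, int nest_level)
{
	SketchNode *node = malloc(sizeof(SketchNode));

	assert(node != NULL);
	node->nest_level = nest_level;
	node->next = head;
	return node;
}

/*
 * Remove every entry recorded at nest_depth or deeper. "link" always points
 * at the pointer that refers to the current entry, so no special case is
 * needed for the head. Returns the number of entries removed.
 */
static int
sketch_prune(SketchNode **head, int nest_depth)
{
	SketchNode **link = head;
	int			removed = 0;

	while (*link != NULL)
	{
		if ((*link)->nest_level >= nest_depth)
		{
			SketchNode *to_remove = *link;

			*link = to_remove->next;
			free(to_remove);
			removed++;
		}
		else
			link = &(*link)->next;
	}
	return removed;
}

/*
 * Build a list with levels 1, 2, 3, 1 and abort nesting depth 2.
 * Returns how many entries survive.
 */
static int
sketch_demo_prune(void)
{
	SketchNode *head = NULL;
	SketchNode *node;
	int			removed;
	int			remaining = 0;

	head = sketch_push(head, 1);
	head = sketch_push(head, 2);
	head = sketch_push(head, 3);
	head = sketch_push(head, 1);

	removed = sketch_prune(&head, 2);
	assert(removed == 2);

	/* Count and free the survivors. */
	while (head != NULL)
	{
		node = head;
		head = head->next;
		free(node);
		remaining++;
	}
	return remaining;
}
```

Only the entries recorded at the aborted nesting depth or deeper are dropped.
Entries from outer levels stay in the list, which keeps the rewrite chain in
sync with what can still commit.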