per backend I/O statistics

Started by Bertrand Drouvot over 1 year ago, 94 messages
#1 Bertrand Drouvot
bertranddrouvot.pg@gmail.com
2 attachment(s)

Hi hackers,

Please find attached a patch to implement $SUBJECT.

While pg_stat_io provides cluster-wide I/O statistics, this patch adds a new
pg_my_stat_io view to display "my" backend I/O statistics and a new
pg_stat_get_backend_io() function to retrieve the I/O statistics for a given
backend pid.

With this per backend level of granularity, one could for example identify
which running backend is responsible for most of the reads, most of the extends,
and so on. The pg_my_stat_io view could also be useful to check the I/O impact
of particular operations or queries in the current session.
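
For illustration, here is a minimal usage sketch based only on the objects
introduced by the attached patches (the pid passed to pg_stat_get_backend_io()
is just a placeholder):

-- my own backend's I/O, honoring stats_fetch_consistency
SELECT object, context, reads, writes, extends
  FROM pg_my_stat_io
 WHERE reads > 0 OR writes > 0 OR extends > 0;

-- I/O of another backend, identified by its pid (12345 is a placeholder)
SELECT backend_type, object, context, reads, writes, extends
  FROM pg_stat_get_backend_io(12345);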

Some remarks:

- it is split into 2 sub-patches: 0001 introduces the necessary changes to provide
the pg_my_stat_io view and 0002 adds the pg_stat_get_backend_io() function.
- the idea of having per backend I/O statistics has already been mentioned in
[1]: /messages/by-id/20230309003438.rectf7xo7pw5t5cj@awork3.anarazel.de

Some implementation choices:

- The KIND_IO stats are still "fixed amount" ones as the maximum number of
backends is fixed.
- The statistics snapshot is made for the global stats (the aggregated ones) and
for my backend stats. The snapshot is not built for all the backends' stats (that
could be memory expensive depending on the number of max connections and given
the fact that PgStat_IO is 16KB long).
- The above point means that pg_stat_get_backend_io() behaves as if
stats_fetch_consistency is set to none (each execution re-fetches counters
from shared memory).
- The above 2 points are also the reasons why the pg_my_stat_io view has been
added (its results honor the stats_fetch_consistency setting). I think it makes
sense to rely on that setting in this case, while I'm not sure it would make
much sense to retrieve another backend's I/O stats while also honoring
stats_fetch_consistency (see the sketch below).
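
To make that last point concrete, here is a small sketch (assuming the objects
from the attached patches): within a transaction run with
stats_fetch_consistency = snapshot, pg_my_stat_io keeps returning the
snapshotted values, while pg_stat_get_backend_io() always re-reads shared
memory:

BEGIN;
SET LOCAL stats_fetch_consistency = snapshot;
SELECT sum(reads) FROM pg_my_stat_io;                            -- snapshot taken here
SELECT sum(reads) FROM pg_my_stat_io;                            -- same value as above
SELECT sum(reads) FROM pg_stat_get_backend_io(pg_backend_pid()); -- fresh counters each call
COMMIT;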

[1]: /messages/by-id/20230309003438.rectf7xo7pw5t5cj@awork3.anarazel.de

Looking forward to your feedback,

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v1-0001-per-backend-I-O-statistics.patch (text/x-diff; charset=us-ascii)
From 04c0a9ed462850a620df61e5d0dc5053717f2b26 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Thu, 22 Aug 2024 15:16:50 +0000
Subject: [PATCH v1 1/2] per backend I/O statistics

While pg_stat_io provides cluster-wide I/O statistics, this commit adds a new
pg_my_stat_io view to display "my" backend I/O statistics. The KIND_IO stats
are still "fixed amount" ones as the maximum number of backends is fixed.

The statistics snapshot is made for the global stats (the aggregated ones) and
for my backend stats. The snapshot is not built for all the backends' stats (that
could be memory expensive depending on the number of max connections and given
the fact that PgStat_IO is 16K bytes long).

A subsequent commit will add a new pg_stat_get_backend_io() function to be
able to retrieve the I/O statistics for a given backend pid.

Bump catalog version.
---
 doc/src/sgml/config.sgml                  |  4 +-
 doc/src/sgml/monitoring.sgml              | 28 +++++++++
 src/backend/catalog/system_views.sql      | 24 +++++++-
 src/backend/utils/activity/pgstat.c       | 10 ++-
 src/backend/utils/activity/pgstat_io.c    | 75 +++++++++++++++++++----
 src/backend/utils/activity/pgstat_shmem.c |  4 +-
 src/backend/utils/adt/pgstatfuncs.c       | 18 +++++-
 src/include/catalog/catversion.h          |  2 +-
 src/include/catalog/pg_proc.dat           |  8 +--
 src/include/pgstat.h                      |  8 ++-
 src/include/utils/pgstat_internal.h       | 13 ++--
 src/test/regress/expected/rules.out       | 21 ++++++-
 src/test/regress/expected/stats.out       | 44 ++++++++++++-
 src/test/regress/sql/stats.sql            | 25 +++++++-
 14 files changed, 245 insertions(+), 39 deletions(-)
  10.7% doc/src/sgml/
   4.1% src/backend/catalog/
  30.9% src/backend/utils/activity/
   5.2% src/backend/utils/adt/
   7.9% src/include/catalog/
   3.4% src/include/utils/
  22.0% src/test/regress/expected/
  12.7% src/test/regress/sql/

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 0aec11f443..2a59b97093 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8331,7 +8331,9 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
         displayed in <link linkend="monitoring-pg-stat-database-view">
         <structname>pg_stat_database</structname></link>,
         <link linkend="monitoring-pg-stat-io-view">
-        <structname>pg_stat_io</structname></link>, in the output of
+        <structname>pg_stat_io</structname></link>,
+        <link linkend="monitoring-pg-my-stat-io-view">
+        <structname>pg_my_stat_io</structname></link>, in the output of
         <xref linkend="sql-explain"/> when the <literal>BUFFERS</literal> option
         is used, in the output of <xref linkend="sql-vacuum"/> when
         the <literal>VERBOSE</literal> option is used, by autovacuum
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 55417a6fa9..27d2548d61 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -488,6 +488,16 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
      </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_my_stat_io</structname><indexterm><primary>pg_my_stat_io</primary></indexterm></entry>
+      <entry>
+       One row for each combination of context and target object containing
+       my backend I/O statistics.
+       See <link linkend="monitoring-pg-my-stat-io-view">
+       <structname>pg_my_stat_io</structname></link> for details.
+     </entry>
+     </row>
+
      <row>
       <entry><structname>pg_stat_replication_slots</structname><indexterm><primary>pg_stat_replication_slots</primary></indexterm></entry>
       <entry>One row per replication slot, showing statistics about the
@@ -2875,6 +2885,24 @@ description | Waiting for a newly initialized WAL file to reach durable storage
 
 
 
+ </sect2>
+
+ <sect2 id="monitoring-pg-my-stat-io-view">
+  <title><structname>pg_my_stat_io</structname></title>
+
+  <indexterm>
+   <primary>pg_my_stat_io</primary>
+  </indexterm>
+
+  <para>
+   The <structname>pg_my_stat_io</structname> view will contain one row for each
+   combination of target I/O object and I/O context, showing
+   my backend I/O statistics. Combinations which do not make sense are
+   omitted. The fields are exactly the same as the ones in the <link
+   linkend="monitoring-pg-stat-io-view"> <structname>pg_stat_io</structname></link>
+   view.
+  </para>
+
  </sect2>
 
  <sect2 id="monitoring-pg-stat-bgwriter-view">
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 19cabc9a47..7fa050893a 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1166,7 +1166,29 @@ SELECT
        b.fsyncs,
        b.fsync_time,
        b.stats_reset
-FROM pg_stat_get_io() b;
+FROM pg_stat_get_io(true) b;
+
+CREATE VIEW pg_my_stat_io AS
+SELECT
+       b.backend_type,
+       b.object,
+       b.context,
+       b.reads,
+       b.read_time,
+       b.writes,
+       b.write_time,
+       b.writebacks,
+       b.writeback_time,
+       b.extends,
+       b.extend_time,
+       b.op_bytes,
+       b.hits,
+       b.evictions,
+       b.reuses,
+       b.fsyncs,
+       b.fsync_time,
+       b.stats_reset
+FROM pg_stat_get_io(false) b;
 
 CREATE VIEW pg_stat_wal AS
     SELECT
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index b2ca3f39b7..b0905e6bf1 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -406,10 +406,11 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 
 		.fixed_amount = true,
 
-		.snapshot_ctl_off = offsetof(PgStat_Snapshot, io),
+		.snapshot_ctl_off = offsetof(PgStat_Snapshot, global_io),
 		.shared_ctl_off = offsetof(PgStat_ShmemControl, io),
-		.shared_data_off = offsetof(PgStatShared_IO, stats),
-		.shared_data_len = sizeof(((PgStatShared_IO *) 0)->stats),
+		/* [de-]serialize global_stats only */
+		.shared_data_off = offsetof(PgStatShared_IO, global_stats),
+		.shared_data_len = sizeof(((PgStatShared_IO *) 0)->global_stats),
 
 		.init_shmem_cb = pgstat_io_init_shmem_cb,
 		.reset_all_cb = pgstat_io_reset_all_cb,
@@ -580,6 +581,9 @@ pgstat_shutdown_hook(int code, Datum arg)
 
 	pgstat_report_stat(true);
 
+	/* reset the pgstat_io counter related to this proc number */
+	pgstat_reset_io_counter_internal(MyProcNumber, GetCurrentTimestamp());
+
 	/* there shouldn't be any pending changes left */
 	Assert(dlist_is_empty(&pgStatPending));
 	dlist_init(&pgStatPending);
diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index 8af55989ee..43280f4892 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -31,6 +31,7 @@ typedef struct PgStat_PendingIO
 static PgStat_PendingIO PendingIOStats;
 bool		have_iostats = false;
 
+void		pgstat_reset_io_counter_internal(int index, TimestampTz ts);
 
 /*
  * Check that stats have not been counted for any combination of IOObject,
@@ -154,11 +155,19 @@ pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op,
 }
 
 PgStat_IO *
-pgstat_fetch_stat_io(void)
+pgstat_fetch_global_stat_io(void)
 {
 	pgstat_snapshot_fixed(PGSTAT_KIND_IO);
 
-	return &pgStatLocal.snapshot.io;
+	return &pgStatLocal.snapshot.global_io;
+}
+
+PgStat_IO *
+pgstat_fetch_my_stat_io(void)
+{
+	pgstat_snapshot_fixed(PGSTAT_KIND_IO);
+
+	return &pgStatLocal.snapshot.my_io;
 }
 
 /*
@@ -174,13 +183,16 @@ pgstat_flush_io(bool nowait)
 {
 	LWLock	   *bktype_lock;
 	PgStat_BktypeIO *bktype_shstats;
+	PgStat_BktypeIO *global_bktype_shstats;
 
 	if (!have_iostats)
 		return false;
 
 	bktype_lock = &pgStatLocal.shmem->io.locks[MyBackendType];
 	bktype_shstats =
-		&pgStatLocal.shmem->io.stats.stats[MyBackendType];
+		&pgStatLocal.shmem->io.stats[MyProcNumber].stats[MyBackendType];
+	global_bktype_shstats =
+		&pgStatLocal.shmem->io.global_stats.stats[MyBackendType];
 
 	if (!nowait)
 		LWLockAcquire(bktype_lock, LW_EXCLUSIVE);
@@ -194,19 +206,28 @@ pgstat_flush_io(bool nowait)
 			for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
 			{
 				instr_time	time;
+				PgStat_Counter counter;
+
+				counter = PendingIOStats.counts[io_object][io_context][io_op];
+
+				bktype_shstats->counts[io_object][io_context][io_op] += counter;
 
-				bktype_shstats->counts[io_object][io_context][io_op] +=
-					PendingIOStats.counts[io_object][io_context][io_op];
+				global_bktype_shstats->counts[io_object][io_context][io_op] +=
+					counter;
 
 				time = PendingIOStats.pending_times[io_object][io_context][io_op];
 
 				bktype_shstats->times[io_object][io_context][io_op] +=
 					INSTR_TIME_GET_MICROSEC(time);
+
+				global_bktype_shstats->times[io_object][io_context][io_op] +=
+					INSTR_TIME_GET_MICROSEC(time);
 			}
 		}
 	}
 
 	Assert(pgstat_bktype_io_stats_valid(bktype_shstats, MyBackendType));
+	Assert(pgstat_bktype_io_stats_valid(global_bktype_shstats, MyBackendType));
 
 	LWLockRelease(bktype_lock);
 
@@ -260,13 +281,38 @@ pgstat_io_init_shmem_cb(void *stats)
 		LWLockInitialize(&stat_shmem->locks[i], LWTRANCHE_PGSTATS_DATA);
 }
 
+void
+pgstat_reset_io_counter_internal(int index, TimestampTz ts)
+{
+	for (int i = 0; i < BACKEND_NUM_TYPES; i++)
+	{
+		LWLock	   *bktype_lock = &pgStatLocal.shmem->io.locks[i];
+		PgStat_BktypeIO *bktype_shstats = &pgStatLocal.shmem->io.stats[index].stats[i];
+
+		LWLockAcquire(bktype_lock, LW_EXCLUSIVE);
+
+		/*
+		 * Use the lock in the first BackendType's PgStat_BktypeIO to protect
+		 * the reset timestamp as well.
+		 */
+		if (i == 0)
+			pgStatLocal.shmem->io.stats[index].stat_reset_timestamp = ts;
+
+		memset(bktype_shstats, 0, sizeof(*bktype_shstats));
+		LWLockRelease(bktype_lock);
+	}
+}
+
 void
 pgstat_io_reset_all_cb(TimestampTz ts)
 {
+	for (int i = 0; i < NumProcStatSlots; i++)
+		pgstat_reset_io_counter_internal(i, ts);
+
 	for (int i = 0; i < BACKEND_NUM_TYPES; i++)
 	{
 		LWLock	   *bktype_lock = &pgStatLocal.shmem->io.locks[i];
-		PgStat_BktypeIO *bktype_shstats = &pgStatLocal.shmem->io.stats.stats[i];
+		PgStat_BktypeIO *bktype_shstats = &pgStatLocal.shmem->io.global_stats.stats[i];
 
 		LWLockAcquire(bktype_lock, LW_EXCLUSIVE);
 
@@ -275,7 +321,7 @@ pgstat_io_reset_all_cb(TimestampTz ts)
 		 * the reset timestamp as well.
 		 */
 		if (i == 0)
-			pgStatLocal.shmem->io.stats.stat_reset_timestamp = ts;
+			pgStatLocal.shmem->io.global_stats.stat_reset_timestamp = ts;
 
 		memset(bktype_shstats, 0, sizeof(*bktype_shstats));
 		LWLockRelease(bktype_lock);
@@ -288,8 +334,10 @@ pgstat_io_snapshot_cb(void)
 	for (int i = 0; i < BACKEND_NUM_TYPES; i++)
 	{
 		LWLock	   *bktype_lock = &pgStatLocal.shmem->io.locks[i];
-		PgStat_BktypeIO *bktype_shstats = &pgStatLocal.shmem->io.stats.stats[i];
-		PgStat_BktypeIO *bktype_snap = &pgStatLocal.snapshot.io.stats[i];
+		PgStat_BktypeIO *bktype_global_shstats = &pgStatLocal.shmem->io.global_stats.stats[i];
+		PgStat_BktypeIO *bktype_global_snap = &pgStatLocal.snapshot.global_io.stats[i];
+		PgStat_BktypeIO *bktype_shstats = &pgStatLocal.shmem->io.stats[MyProcNumber].stats[i];
+		PgStat_BktypeIO *bktype_snap = &pgStatLocal.snapshot.my_io.stats[i];
 
 		LWLockAcquire(bktype_lock, LW_SHARED);
 
@@ -298,10 +346,15 @@ pgstat_io_snapshot_cb(void)
 		 * the reset timestamp as well.
 		 */
 		if (i == 0)
-			pgStatLocal.snapshot.io.stat_reset_timestamp =
-				pgStatLocal.shmem->io.stats.stat_reset_timestamp;
+		{
+			pgStatLocal.snapshot.global_io.stat_reset_timestamp =
+				pgStatLocal.shmem->io.global_stats.stat_reset_timestamp;
+			pgStatLocal.snapshot.my_io.stat_reset_timestamp =
+				pgStatLocal.shmem->io.stats[MyProcNumber].stat_reset_timestamp;
+		}
 
 		/* using struct assignment due to better type safety */
+		*bktype_global_snap = *bktype_global_shstats;
 		*bktype_snap = *bktype_shstats;
 		LWLockRelease(bktype_lock);
 	}
diff --git a/src/backend/utils/activity/pgstat_shmem.c b/src/backend/utils/activity/pgstat_shmem.c
index ec93bf6902..7cf1c64f9e 100644
--- a/src/backend/utils/activity/pgstat_shmem.c
+++ b/src/backend/utils/activity/pgstat_shmem.c
@@ -128,7 +128,7 @@ StatsShmemSize(void)
 {
 	Size		sz;
 
-	sz = MAXALIGN(sizeof(PgStat_ShmemControl));
+	sz = MAXALIGN(sizeof(PgStat_ShmemControl) + mul_size(sizeof(PgStat_IO), NumProcStatSlots));
 	sz = add_size(sz, pgstat_dsa_init_size());
 
 	/* Add shared memory for all the custom fixed-numbered statistics */
@@ -172,7 +172,7 @@ StatsShmemInit(void)
 		Assert(!found);
 
 		/* the allocation of pgStatLocal.shmem itself */
-		p += MAXALIGN(sizeof(PgStat_ShmemControl));
+		p += MAXALIGN(sizeof(PgStat_ShmemControl) + mul_size(sizeof(PgStat_IO), NumProcStatSlots));
 
 		/*
 		 * Create a small dsa allocation in plain shared memory. This is
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 3221137123..58e321f421 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1359,18 +1359,30 @@ pg_stat_get_io(PG_FUNCTION_ARGS)
 	ReturnSetInfo *rsinfo;
 	PgStat_IO  *backends_io_stats;
 	Datum		reset_time;
+	bool		is_global;
+
+	is_global = PG_GETARG_BOOL(0);
 
 	InitMaterializedSRF(fcinfo, 0);
 	rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
 
-	backends_io_stats = pgstat_fetch_stat_io();
+	if (is_global)
+		backends_io_stats = pgstat_fetch_global_stat_io();
+	else
+		backends_io_stats = pgstat_fetch_my_stat_io();
 
 	reset_time = TimestampTzGetDatum(backends_io_stats->stat_reset_timestamp);
 
 	for (int bktype = 0; bktype < BACKEND_NUM_TYPES; bktype++)
 	{
-		Datum		bktype_desc = CStringGetTextDatum(GetBackendTypeDesc(bktype));
-		PgStat_BktypeIO *bktype_stats = &backends_io_stats->stats[bktype];
+		Datum		bktype_desc;
+		PgStat_BktypeIO *bktype_stats;
+
+		if (!is_global && bktype != MyBackendType)
+			continue;
+
+		bktype_desc = CStringGetTextDatum(GetBackendTypeDesc(bktype));
+		bktype_stats = &backends_io_stats->stats[bktype];
 
 		/*
 		 * In Assert builds, we can afford an extra loop through all of the
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index 1980d492c3..fbee0db2eb 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -57,6 +57,6 @@
  */
 
 /*							yyyymmddN */
-#define CATALOG_VERSION_NO	202408301
+#define CATALOG_VERSION_NO	202409021
 
 #endif
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 85f42be1b3..1184341fc8 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5825,10 +5825,10 @@
 { oid => '6214', descr => 'statistics: per backend type IO statistics',
   proname => 'pg_stat_get_io', prorows => '30', proretset => 't',
   provolatile => 'v', proparallel => 'r', prorettype => 'record',
-  proargtypes => '',
-  proallargtypes => '{text,text,text,int8,float8,int8,float8,int8,float8,int8,float8,int8,int8,int8,int8,int8,float8,timestamptz}',
-  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{backend_type,object,context,reads,read_time,writes,write_time,writebacks,writeback_time,extends,extend_time,op_bytes,hits,evictions,reuses,fsyncs,fsync_time,stats_reset}',
+  proargtypes => 'bool',
+  proallargtypes => '{bool,text,text,text,int8,float8,int8,float8,int8,float8,int8,float8,int8,int8,int8,int8,int8,float8,timestamptz}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{is_global,backend_type,object,context,reads,read_time,writes,write_time,writebacks,writeback_time,extends,extend_time,op_bytes,hits,evictions,reuses,fsyncs,fsync_time,stats_reset}',
   prosrc => 'pg_stat_get_io' },
 
 { oid => '1136', descr => 'statistics: information about WAL activity',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index f63159c55c..9ec4053a3e 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -15,6 +15,7 @@
 #include "datatype/timestamp.h"
 #include "portability/instr_time.h"
 #include "postmaster/pgarch.h"	/* for MAX_XFN_CHARS */
+#include "storage/proc.h"
 #include "utils/backend_progress.h" /* for backward compatibility */
 #include "utils/backend_status.h"	/* for backward compatibility */
 #include "utils/relcache.h"
@@ -75,6 +76,8 @@
  */
 #define PGSTAT_KIND_EXPERIMENTAL	128
 
+#define  NumProcStatSlots (MaxBackends + NUM_AUXILIARY_PROCS)
+
 static inline bool
 pgstat_is_kind_builtin(PgStat_Kind kind)
 {
@@ -556,7 +559,8 @@ extern instr_time pgstat_prepare_io_time(bool track_io_guc);
 extern void pgstat_count_io_op_time(IOObject io_object, IOContext io_context,
 									IOOp io_op, instr_time start_time, uint32 cnt);
 
-extern PgStat_IO *pgstat_fetch_stat_io(void);
+extern PgStat_IO *pgstat_fetch_global_stat_io(void);
+extern PgStat_IO *pgstat_fetch_my_stat_io(void);
 extern const char *pgstat_get_io_context_name(IOContext io_context);
 extern const char *pgstat_get_io_object_name(IOObject io_object);
 
@@ -565,7 +569,7 @@ extern bool pgstat_tracks_io_object(BackendType bktype,
 									IOObject io_object, IOContext io_context);
 extern bool pgstat_tracks_io_op(BackendType bktype, IOObject io_object,
 								IOContext io_context, IOOp io_op);
-
+extern void pgstat_reset_io_counter_internal(int index, TimestampTz ts);
 
 /*
  * Functions in pgstat_database.c
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index fb132e439d..1eeb8aa107 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -347,11 +347,12 @@ typedef struct PgStatShared_Checkpointer
 typedef struct PgStatShared_IO
 {
 	/*
-	 * locks[i] protects stats.stats[i]. locks[0] also protects
-	 * stats.stat_reset_timestamp.
+	 * locks[i] protects global_stats.stats[i]. locks[0] also protects
+	 * global_stats.stat_reset_timestamp.
 	 */
 	LWLock		locks[BACKEND_NUM_TYPES];
-	PgStat_IO	stats;
+	PgStat_IO	global_stats;
+	PgStat_IO	stats[FLEXIBLE_ARRAY_MEMBER];
 } PgStatShared_IO;
 
 typedef struct PgStatShared_SLRU
@@ -444,7 +445,6 @@ typedef struct PgStat_ShmemControl
 	PgStatShared_Archiver archiver;
 	PgStatShared_BgWriter bgwriter;
 	PgStatShared_Checkpointer checkpointer;
-	PgStatShared_IO io;
 	PgStatShared_SLRU slru;
 	PgStatShared_Wal wal;
 
@@ -454,6 +454,8 @@ typedef struct PgStat_ShmemControl
 	 */
 	void	   *custom_data[PGSTAT_KIND_CUSTOM_SIZE];
 
+	/* has to be at the end due to FLEXIBLE_ARRAY_MEMBER */
+	PgStatShared_IO io;
 } PgStat_ShmemControl;
 
 
@@ -475,7 +477,8 @@ typedef struct PgStat_Snapshot
 
 	PgStat_CheckpointerStats checkpointer;
 
-	PgStat_IO	io;
+	PgStat_IO	my_io;
+	PgStat_IO	global_io;
 
 	PgStat_SLRUStats slru[SLRU_NUM_ELEMENTS];
 
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 862433ee52..30dcb80f7b 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1398,6 +1398,25 @@ pg_matviews| SELECT n.nspname AS schemaname,
      LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
      LEFT JOIN pg_tablespace t ON ((t.oid = c.reltablespace)))
   WHERE (c.relkind = 'm'::"char");
+pg_my_stat_io| SELECT backend_type,
+    object,
+    context,
+    reads,
+    read_time,
+    writes,
+    write_time,
+    writebacks,
+    writeback_time,
+    extends,
+    extend_time,
+    op_bytes,
+    hits,
+    evictions,
+    reuses,
+    fsyncs,
+    fsync_time,
+    stats_reset
+   FROM pg_stat_get_io(false) b(backend_type, object, context, reads, read_time, writes, write_time, writebacks, writeback_time, extends, extend_time, op_bytes, hits, evictions, reuses, fsyncs, fsync_time, stats_reset);
 pg_policies| SELECT n.nspname AS schemaname,
     c.relname AS tablename,
     pol.polname AS policyname,
@@ -1902,7 +1921,7 @@ pg_stat_io| SELECT backend_type,
     fsyncs,
     fsync_time,
     stats_reset
-   FROM pg_stat_get_io() b(backend_type, object, context, reads, read_time, writes, write_time, writebacks, writeback_time, extends, extend_time, op_bytes, hits, evictions, reuses, fsyncs, fsync_time, stats_reset);
+   FROM pg_stat_get_io(true) b(backend_type, object, context, reads, read_time, writes, write_time, writebacks, writeback_time, extends, extend_time, op_bytes, hits, evictions, reuses, fsyncs, fsync_time, stats_reset);
 pg_stat_progress_analyze| SELECT s.pid,
     s.datid,
     d.datname,
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 6e08898b18..c489e528e0 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1249,7 +1249,7 @@ SELECT pg_stat_get_subscription_stats(NULL);
  
 (1 row)
 
--- Test that the following operations are tracked in pg_stat_io:
+-- Test that the following operations are tracked in pg_[my]_stat_io:
 -- - reads of target blocks into shared buffers
 -- - writes of shared buffers to permanent storage
 -- - extends of relations using shared buffers
@@ -1261,9 +1261,14 @@ SELECT pg_stat_get_subscription_stats(NULL);
 -- extends.
 SELECT sum(extends) AS io_sum_shared_before_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT sum(extends) AS my_io_sum_shared_before_extends
+  FROM pg_my_stat_io WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_stat_io
   WHERE object = 'relation' \gset io_sum_shared_before_
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_my_stat_io
+  WHERE object = 'relation' \gset my_io_sum_shared_before_
 CREATE TABLE test_io_shared(a int);
 INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
 SELECT pg_stat_force_next_flush();
@@ -1280,8 +1285,16 @@ SELECT :io_sum_shared_after_extends > :io_sum_shared_before_extends;
  t
 (1 row)
 
+SELECT sum(extends) AS my_io_sum_shared_after_extends
+  FROM pg_my_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT :my_io_sum_shared_after_extends > :my_io_sum_shared_before_extends;
+ ?column? 
+----------
+ t
+(1 row)
+
 -- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
--- and fsyncs.
+-- and fsyncs in the global stats (not for the backend).
 -- See comment above for rationale for two explicit CHECKPOINTs.
 CHECKPOINT;
 CHECKPOINT;
@@ -1301,6 +1314,23 @@ SELECT current_setting('fsync') = 'off'
  t
 (1 row)
 
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_my_stat_io
+  WHERE object = 'relation' \gset my_io_sum_shared_after_
+SELECT :my_io_sum_shared_after_writes >= :my_io_sum_shared_before_writes;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT current_setting('fsync') = 'off'
+  OR (:my_io_sum_shared_after_fsyncs = :my_io_sum_shared_before_fsyncs
+      AND :my_io_sum_shared_after_fsyncs= 0);
+ ?column? 
+----------
+ t
+(1 row)
+
 -- Change the tablespace so that the table is rewritten directly, then SELECT
 -- from it to cause it to be read back into shared buffers.
 SELECT sum(reads) AS io_sum_shared_before_reads
@@ -1521,6 +1551,8 @@ SELECT pg_stat_have_stats('io', 0, 0);
 
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_pre_reset
   FROM pg_stat_io \gset
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_pre_reset
+  FROM pg_my_stat_io \gset
 SELECT pg_stat_reset_shared('io');
  pg_stat_reset_shared 
 ----------------------
@@ -1535,6 +1567,14 @@ SELECT :io_stats_post_reset < :io_stats_pre_reset;
  t
 (1 row)
 
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_reset
+  FROM pg_my_stat_io \gset
+SELECT :my_io_stats_post_reset < :my_io_stats_pre_reset;
+ ?column? 
+----------
+ t
+(1 row)
+
 -- test BRIN index doesn't block HOT update
 CREATE TABLE brin_hot (
   id  integer PRIMARY KEY,
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index d8ac0d06f4..c95cb71652 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -595,7 +595,7 @@ SELECT pg_stat_get_replication_slot(NULL);
 SELECT pg_stat_get_subscription_stats(NULL);
 
 
--- Test that the following operations are tracked in pg_stat_io:
+-- Test that the following operations are tracked in pg_[my]_stat_io:
 -- - reads of target blocks into shared buffers
 -- - writes of shared buffers to permanent storage
 -- - extends of relations using shared buffers
@@ -609,18 +609,26 @@ SELECT pg_stat_get_subscription_stats(NULL);
 -- extends.
 SELECT sum(extends) AS io_sum_shared_before_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT sum(extends) AS my_io_sum_shared_before_extends
+  FROM pg_my_stat_io WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_stat_io
   WHERE object = 'relation' \gset io_sum_shared_before_
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_my_stat_io
+  WHERE object = 'relation' \gset my_io_sum_shared_before_
 CREATE TABLE test_io_shared(a int);
 INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
 SELECT pg_stat_force_next_flush();
 SELECT sum(extends) AS io_sum_shared_after_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
 SELECT :io_sum_shared_after_extends > :io_sum_shared_before_extends;
+SELECT sum(extends) AS my_io_sum_shared_after_extends
+  FROM pg_my_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT :my_io_sum_shared_after_extends > :my_io_sum_shared_before_extends;
 
 -- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
--- and fsyncs.
+-- and fsyncs in the global stats (not for the backend).
 -- See comment above for rationale for two explicit CHECKPOINTs.
 CHECKPOINT;
 CHECKPOINT;
@@ -630,7 +638,13 @@ SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
 SELECT :io_sum_shared_after_writes > :io_sum_shared_before_writes;
 SELECT current_setting('fsync') = 'off'
   OR :io_sum_shared_after_fsyncs > :io_sum_shared_before_fsyncs;
-
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_my_stat_io
+  WHERE object = 'relation' \gset my_io_sum_shared_after_
+SELECT :my_io_sum_shared_after_writes >= :my_io_sum_shared_before_writes;
+SELECT current_setting('fsync') = 'off'
+  OR (:my_io_sum_shared_after_fsyncs = :my_io_sum_shared_before_fsyncs
+      AND :my_io_sum_shared_after_fsyncs= 0);
 -- Change the tablespace so that the table is rewritten directly, then SELECT
 -- from it to cause it to be read back into shared buffers.
 SELECT sum(reads) AS io_sum_shared_before_reads
@@ -762,10 +776,15 @@ SELECT :io_sum_bulkwrite_strategy_extends_after > :io_sum_bulkwrite_strategy_ext
 SELECT pg_stat_have_stats('io', 0, 0);
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_pre_reset
   FROM pg_stat_io \gset
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_pre_reset
+  FROM pg_my_stat_io \gset
 SELECT pg_stat_reset_shared('io');
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_post_reset
   FROM pg_stat_io \gset
 SELECT :io_stats_post_reset < :io_stats_pre_reset;
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_reset
+  FROM pg_my_stat_io \gset
+SELECT :my_io_stats_post_reset < :my_io_stats_pre_reset;
 
 
 -- test BRIN index doesn't block HOT update
-- 
2.34.1

v1-0002-Add-pg_stat_get_backend_io.patch (text/x-diff; charset=us-ascii)
From 3e4ea302e9ae554cae76e7a4dc4f90b4503cf842 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Wed, 28 Aug 2024 12:59:02 +0000
Subject: [PATCH v1 2/2] Add pg_stat_get_backend_io()

Add the pg_stat_get_backend_io() function to retrieve I/O statistics for
a particular backend pid. Note that this function behaves as if
stats_fetch_consistency were set to none.
---
 doc/src/sgml/monitoring.sgml           |  18 ++++
 src/backend/utils/activity/pgstat_io.c |   6 ++
 src/backend/utils/adt/pgstatfuncs.c    | 120 +++++++++++++++++++++++++
 src/include/catalog/pg_proc.dat        |   8 ++
 src/include/pgstat.h                   |   1 +
 src/test/regress/expected/stats.out    |  25 ++++++
 src/test/regress/sql/stats.sql         |  16 +++-
 7 files changed, 193 insertions(+), 1 deletion(-)
  10.8% doc/src/sgml/
  47.6% src/backend/utils/adt/
   9.2% src/include/catalog/
  15.3% src/test/regress/expected/
  14.4% src/test/regress/sql/

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 27d2548d61..ad89d9caa7 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4713,6 +4713,24 @@ description | Waiting for a newly initialized WAL file to reach durable storage
        </para></entry>
       </row>
 
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_stat_get_backend_io</primary>
+        </indexterm>
+        <function>pg_stat_get_backend_io</function> ( <type>integer</type> )
+        <returnvalue>setof record</returnvalue>
+       </para>
+       <para>
+        Returns I/O statistics about the backend with the specified
+        process ID. The output fields are exactly the same as the ones in the
+        <link linkend="monitoring-pg-stat-io-view"> <structname>pg_stat_io</structname></link>
+        view. This function behaves as if <varname>stats_fetch_consistency</varname>
+        is set to <literal>none</literal> (means each execution re-fetches
+        counters from shared memory).
+       </para></entry>
+      </row>
+
       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <indexterm>
diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index 43280f4892..8fe8c224a9 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -170,6 +170,12 @@ pgstat_fetch_my_stat_io(void)
 	return &pgStatLocal.snapshot.my_io;
 }
 
+PgStat_IO *
+pgstat_fetch_proc_stat_io(ProcNumber procNumber)
+{
+	return &pgStatLocal.shmem->io.stats[procNumber];
+}
+
 /*
  * Flush out locally pending IO statistics
  *
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 58e321f421..502b9662a1 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1474,6 +1474,126 @@ pg_stat_get_io(PG_FUNCTION_ARGS)
 	return (Datum) 0;
 }
 
+Datum
+pg_stat_get_backend_io(PG_FUNCTION_ARGS)
+{
+	ReturnSetInfo *rsinfo;
+	PgStat_IO  *backends_io_stats;
+	Datum		reset_time;
+	ProcNumber	procNumber;
+	PGPROC	   *proc;
+
+	int			backend_pid = PG_GETARG_INT32(0);
+
+	rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	InitMaterializedSRF(fcinfo, 0);
+
+	proc = BackendPidGetProc(backend_pid);
+
+	/* This backend_pid does not exist */
+
+	if (proc != NULL)
+	{
+		procNumber = GetNumberFromPGProc(proc);
+		backends_io_stats = pgstat_fetch_proc_stat_io(procNumber);
+		reset_time = TimestampTzGetDatum(backends_io_stats->stat_reset_timestamp);
+
+		for (int bktype = 0; bktype < BACKEND_NUM_TYPES; bktype++)
+		{
+			Datum		bktype_desc = CStringGetTextDatum(GetBackendTypeDesc(bktype));
+			PgStat_BktypeIO *bktype_stats = &backends_io_stats->stats[bktype];
+
+			/*
+			 * In Assert builds, we can afford an extra loop through all of
+			 * the counters checking that only expected stats are non-zero,
+			 * since it keeps the non-Assert code cleaner.
+			 */
+			Assert(pgstat_bktype_io_stats_valid(bktype_stats, bktype));
+
+			/*
+			 * For those BackendTypes without IO Operation stats, skip
+			 * representing them in the view altogether.
+			 */
+			if (!pgstat_tracks_io_bktype(bktype))
+				continue;
+
+			for (int io_obj = 0; io_obj < IOOBJECT_NUM_TYPES; io_obj++)
+			{
+				const char *obj_name = pgstat_get_io_object_name(io_obj);
+
+				for (int io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
+				{
+					const char *context_name = pgstat_get_io_context_name(io_context);
+
+					Datum		values[IO_NUM_COLUMNS] = {0};
+					bool		nulls[IO_NUM_COLUMNS] = {0};
+
+					/*
+					 * Some combinations of BackendType, IOObject, and
+					 * IOContext are not valid for any type of IOOp. In such
+					 * cases, omit the entire row from the view.
+					 */
+					if (!pgstat_tracks_io_object(bktype, io_obj, io_context))
+						continue;
+
+					values[IO_COL_BACKEND_TYPE] = bktype_desc;
+					values[IO_COL_CONTEXT] = CStringGetTextDatum(context_name);
+					values[IO_COL_OBJECT] = CStringGetTextDatum(obj_name);
+					values[IO_COL_RESET_TIME] = TimestampTzGetDatum(reset_time);
+
+					/*
+					 * Hard-code this to the value of BLCKSZ for now. Future
+					 * values could include XLOG_BLCKSZ, once WAL IO is
+					 * tracked, and constant multipliers, once
+					 * non-block-oriented IO (e.g. temporary file IO) is
+					 * tracked.
+					 */
+					values[IO_COL_CONVERSION] = Int64GetDatum(BLCKSZ);
+
+					for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
+					{
+						int			op_idx = pgstat_get_io_op_index(io_op);
+						int			time_idx = pgstat_get_io_time_index(io_op);
+
+						/*
+						 * Some combinations of BackendType and IOOp, of
+						 * IOContext and IOOp, and of IOObject and IOOp are
+						 * not tracked. Set these cells in the view NULL.
+						 */
+						if (pgstat_tracks_io_op(bktype, io_obj, io_context, io_op))
+						{
+							PgStat_Counter count =
+								bktype_stats->counts[io_obj][io_context][io_op];
+
+							values[op_idx] = Int64GetDatum(count);
+						}
+						else
+							nulls[op_idx] = true;
+
+						/* not every operation is timed */
+						if (time_idx == IO_COL_INVALID)
+							continue;
+
+						if (!nulls[op_idx])
+						{
+							PgStat_Counter time =
+								bktype_stats->times[io_obj][io_context][io_op];
+
+							values[time_idx] = Float8GetDatum(pg_stat_us_to_ms(time));
+						}
+						else
+							nulls[time_idx] = true;
+					}
+
+					tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+										 values, nulls);
+				}
+			}
+		}
+	}
+	return (Datum) 0;
+}
+
 /*
  * Returns statistics of WAL activity
  */
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 1184341fc8..8c1e2d6a46 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5830,6 +5830,14 @@
   proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
   proargnames => '{is_global,backend_type,object,context,reads,read_time,writes,write_time,writebacks,writeback_time,extends,extend_time,op_bytes,hits,evictions,reuses,fsyncs,fsync_time,stats_reset}',
   prosrc => 'pg_stat_get_io' },
+{ oid => '8896', descr => 'statistics: per backend type IO statistics',
+  proname => 'pg_stat_get_backend_io', prorows => '30', proretset => 't',
+  provolatile => 'v', proparallel => 'r', prorettype => 'record',
+  proargtypes => 'int4',
+  proallargtypes => '{int4,text,text,text,int8,float8,int8,float8,int8,float8,int8,float8,int8,int8,int8,int8,int8,float8,timestamptz}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{backend_pid,backend_type,object,context,reads,read_time,writes,write_time,writebacks,writeback_time,extends,extend_time,op_bytes,hits,evictions,reuses,fsyncs,fsync_time,stats_reset}',
+  prosrc => 'pg_stat_get_backend_io' },
 
 { oid => '1136', descr => 'statistics: information about WAL activity',
   proname => 'pg_stat_get_wal', proisstrict => 'f', provolatile => 's',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 9ec4053a3e..963fa36be9 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -561,6 +561,7 @@ extern void pgstat_count_io_op_time(IOObject io_object, IOContext io_context,
 
 extern PgStat_IO *pgstat_fetch_global_stat_io(void);
 extern PgStat_IO *pgstat_fetch_my_stat_io(void);
+extern PgStat_IO *pgstat_fetch_proc_stat_io(ProcNumber  procNumber);
 extern const char *pgstat_get_io_context_name(IOContext io_context);
 extern const char *pgstat_get_io_object_name(IOObject io_object);
 
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index c489e528e0..aced015c2f 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1263,12 +1263,18 @@ SELECT sum(extends) AS io_sum_shared_before_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(extends) AS my_io_sum_shared_before_extends
   FROM pg_my_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT sum(extends) AS backend_io_sum_shared_before_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_stat_io
   WHERE object = 'relation' \gset io_sum_shared_before_
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_my_stat_io
   WHERE object = 'relation' \gset my_io_sum_shared_before_
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE object = 'relation' \gset backend_io_sum_shared_before_
 CREATE TABLE test_io_shared(a int);
 INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
 SELECT pg_stat_force_next_flush();
@@ -1293,6 +1299,15 @@ SELECT :my_io_sum_shared_after_extends > :my_io_sum_shared_before_extends;
  t
 (1 row)
 
+SELECT sum(extends) AS backend_io_sum_shared_after_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
+SELECT :backend_io_sum_shared_after_extends > :backend_io_sum_shared_before_extends;
+ ?column? 
+----------
+ t
+(1 row)
+
 -- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
 -- and fsyncs in the global stats (not for the backend).
 -- See comment above for rationale for two explicit CHECKPOINTs.
@@ -1553,6 +1568,8 @@ SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) +
   FROM pg_stat_io \gset
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_pre_reset
   FROM pg_my_stat_io \gset
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS backend_io_stats_pre_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
 SELECT pg_stat_reset_shared('io');
  pg_stat_reset_shared 
 ----------------------
@@ -1575,6 +1592,14 @@ SELECT :my_io_stats_post_reset < :my_io_stats_pre_reset;
  t
 (1 row)
 
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS backend_io_stats_post_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
+SELECT :backend_io_stats_post_reset < :backend_io_stats_pre_reset;
+ ?column? 
+----------
+ t
+(1 row)
+
 -- test BRIN index doesn't block HOT update
 CREATE TABLE brin_hot (
   id  integer PRIMARY KEY,
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index c95cb71652..d05009e1f5 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -611,12 +611,18 @@ SELECT sum(extends) AS io_sum_shared_before_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(extends) AS my_io_sum_shared_before_extends
   FROM pg_my_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT sum(extends) AS backend_io_sum_shared_before_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_stat_io
   WHERE object = 'relation' \gset io_sum_shared_before_
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_my_stat_io
   WHERE object = 'relation' \gset my_io_sum_shared_before_
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE object = 'relation' \gset backend_io_sum_shared_before_
 CREATE TABLE test_io_shared(a int);
 INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
 SELECT pg_stat_force_next_flush();
@@ -626,6 +632,10 @@ SELECT :io_sum_shared_after_extends > :io_sum_shared_before_extends;
 SELECT sum(extends) AS my_io_sum_shared_after_extends
   FROM pg_my_stat_io WHERE context = 'normal' AND object = 'relation' \gset
 SELECT :my_io_sum_shared_after_extends > :my_io_sum_shared_before_extends;
+SELECT sum(extends) AS backend_io_sum_shared_after_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
+SELECT :backend_io_sum_shared_after_extends > :backend_io_sum_shared_before_extends;
 
 -- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
 -- and fsyncs in the global stats (not for the backend).
@@ -778,6 +788,8 @@ SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) +
   FROM pg_stat_io \gset
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_pre_reset
   FROM pg_my_stat_io \gset
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS backend_io_stats_pre_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
 SELECT pg_stat_reset_shared('io');
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_post_reset
   FROM pg_stat_io \gset
@@ -785,7 +797,9 @@ SELECT :io_stats_post_reset < :io_stats_pre_reset;
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_reset
   FROM pg_my_stat_io \gset
 SELECT :my_io_stats_post_reset < :my_io_stats_pre_reset;
-
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS backend_io_stats_post_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
+SELECT :backend_io_stats_post_reset < :backend_io_stats_pre_reset;
 
 -- test BRIN index doesn't block HOT update
 CREATE TABLE brin_hot (
-- 
2.34.1

#2 Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Bertrand Drouvot (#1)
Re: per backend I/O statistics

At Mon, 2 Sep 2024 14:55:52 +0000, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote in

Hi hackers,

Please find attached a patch to implement $SUBJECT.

While pg_stat_io provides cluster-wide I/O statistics, this patch adds a new
pg_my_stat_io view to display "my" backend I/O statistics and a new
pg_stat_get_backend_io() function to retrieve the I/O statistics for a given
backend pid.

By having the per backend level of granularity, one could for example identify
which running backend is responsible for most of the reads, most of the extends
and so on... The pg_my_stat_io view could also be useful to check the
impact on the I/O made by some operations, queries,... in the current session.

Some remarks:

- it is split in 2 sub patches: 0001 introducing the necessary changes to provide
the pg_my_stat_io view and 0002 to add the pg_stat_get_backend_io() function.
- the idea of having per backend I/O statistics has already been mentioned in
[1] by Andres.

Some implementation choices:

- The KIND_IO stats are still "fixed amount" ones as the maximum number of
backend is fixed.
- The statistics snapshot is made for the global stats (the aggregated ones) and
for my backend stats. The snapshot is not build for all the backend stats (that
could be memory expensive depending on the number of max connections and given
the fact that PgStat_IO is 16KB long).
- The above point means that pg_stat_get_backend_io() behaves as if
stats_fetch_consistency is set to none (each execution re-fetches counters
from shared memory).
- The above 2 points are also the reasons why the pg_my_stat_io view has been
added (as its results takes care of the stats_fetch_consistency setting). I think
that makes sense to rely on it in that case, while I'm not sure that would make
a lot of sense to retrieve other's backend I/O stats and taking care of
stats_fetch_consistency.

[1]: /messages/by-id/20230309003438.rectf7xo7pw5t5cj@awork3.anarazel.de

I'm not sure about the usefulness of having the stats only available
from the current session. Since they are stored in shared memory,
shouldn't we make them accessible to all backends? However, this would
introduce permission considerations and could become complex.

When I first looked at this patch, my initial thought was whether we
should let these stats stay "fixed." The reason why the current
PGSTAT_KIND_IO is fixed is that there is only one global statistics
storage for the entire database. If we have stats for a flexible
number of backends, it would need to be non-fixed, perhaps with the
entry for INVALID_PROC_NUMBER storing the global I/O stats, I
suppose. However, one concern with that approach would be the impact
on performance due to the frequent creation and deletion of stats
entries caused by high turnover of backends.

Just to be clear, the above comments are not meant to oppose the
current implementation approach. They are purely for the sake of
discussing comparisons with other possible approaches.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#3 Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#2)
Re: per backend I/O statistics

At Tue, 03 Sep 2024 15:37:49 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

When I first looked at this patch, my initial thought was whether we
should let these stats stay "fixed." The reason why the current
PGSTAT_KIND_IO is fixed is that there is only one global statistics
storage for the entire database. If we have stats for a flexible
number of backends, it would need to be non-fixed, perhaps with the
entry for INVALID_PROC_NUMBER storing the global I/O stats, I
suppose. However, one concern with that approach would be the impact
on performance due to the frequent creation and deletion of stats
entries caused by high turnover of backends.

As an additional benefit of this approach, a client could set a
connection variable, for example no_backend_iostats to true (or its
inverse variable to false), to restrict memory usage to only the
backends that actually need it.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#4 Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Kyotaro Horiguchi (#2)
Re: per backend I/O statistics

Hi,

On Tue, Sep 03, 2024 at 03:37:49PM +0900, Kyotaro Horiguchi wrote:

At Mon, 2 Sep 2024 14:55:52 +0000, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote in

Hi hackers,

Please find attached a patch to implement $SUBJECT.

While pg_stat_io provides cluster-wide I/O statistics, this patch adds a new
pg_my_stat_io view to display "my" backend I/O statistics and a new
pg_stat_get_backend_io() function to retrieve the I/O statistics for a given
backend pid.

I'm not sure about the usefulness of having the stats only available
from the current session. Since they are stored in shared memory,
shouldn't we make them accessible to all backends?

Thanks for the feedback!

The stats are accessible to all backends thanks to 0002 and the introduction
of the pg_stat_get_backend_io() function.

However, this would
introduce permission considerations and could become complex.

I'm not sure that the data exposed here is sensitive enough to warrant permission
restrictions.

When I first looked at this patch, my initial thought was whether we
should let these stats stay "fixed." The reason why the current
PGSTAT_KIND_IO is fixed is that there is only one global statistics
storage for the entire database. If we have stats for a flexible
number of backends, it would need to be non-fixed, perhaps with the
entry for INVALID_PROC_NUMBER storing the global I/O stats, I
suppose. However, one concern with that approach would be the impact
on performance due to the frequent creation and deletion of stats
entries caused by high turnover of backends.

The pros of using the fixed amount are:

- less code change (I think, as I did not write the non-fixed counterpart)
- probably better performance and less scalability impact (in case of a high
rate of backend creation/deletion)

The main con is probably allocating shared memory space that might not be used
(sizeof(PgStat_IO) is 16392 bytes, so that could be a concern for a high number
of unused connections: roughly 1.6MB for 100 slots). OTOH, if a high number of
connections is not used, it might be worth reducing the max_connections setting
anyway.

"Conceptually" speaking, we know what the maximum number of backends is, so I
think that using the fixed-amount approach makes sense (somehow I think it can
be compared to PGSTAT_KIND_SLRU, which relies on SLRU_NUM_ELEMENTS).

Just to be clear, the above comments are not meant to oppose the
current implementation approach. They are purely for the sake of
discussing comparisons with other possible approaches.

No problem at all, thanks for your feedback and sharing your thoughts!

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#5 Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Kyotaro Horiguchi (#3)
Re: per backend I/O statistics

Hi,

On Tue, Sep 03, 2024 at 04:07:58PM +0900, Kyotaro Horiguchi wrote:

At Tue, 03 Sep 2024 15:37:49 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

When I first looked at this patch, my initial thought was whether we
should let these stats stay "fixed." The reason why the current
PGSTAT_KIND_IO is fixed is that there is only one global statistics
storage for the entire database. If we have stats for a flexible
number of backends, it would need to be non-fixed, perhaps with the
entry for INVALID_PROC_NUMBER storing the global I/O stats, I
suppose. However, one concern with that approach would be the impact
on performance due to the frequent creation and deletion of stats
entries caused by high turnover of backends.

As an additional benefit of this approach, the client can set a
connection variable, for example, no_backend_iostats to true, or set
its inverse variable to false, to restrict memory usage to only the
required backends.

Thanks for the feedback!

If we were to add an on/off switch, I think I'd vote for a global one
instead. Indeed, I see this feature more as an "administrator" one, where
the administrator wants to be able to find out which session is responsible for
what (from an I/O point of view): like being able to answer "which session is
generating this massive amount of reads?".

If we allow each session to disable the feature, then the administrator
would lose this ability.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#6 Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Bertrand Drouvot (#4)
Re: per backend I/O statistics

On 2024-Sep-03, Bertrand Drouvot wrote:

Cons is probably allocating shared memory space that might not be used (
sizeof(PgStat_IO) is 16392 so that could be a concern for a high number of
unused connection). OTOH, if a high number of connections is not used that might
be worth to reduce the max_connections setting.

I was surprised by the size you mention so I went to look for the
definition of that struct:

typedef struct PgStat_IO
{
	TimestampTz stat_reset_timestamp;
	PgStat_BktypeIO stats[BACKEND_NUM_TYPES];
} PgStat_IO;

(It would be good to have more than zero comments here BTW)

So what's happening is that struct PgStat_IO stores the data for every
single process type, be it regular backends, backend-like auxiliary
processes, or any other potential postmaster children. So it seems to
me that storing one of these structs for "my backend stats" is wrong: I
think you should be using PgStat_BktypeIO instead (or maybe another
struct which has a reset timestamp plus a BktypeIO, if you care about the
reset time). That struct is only 1024 bytes long, so it seems much more
reasonable to have one of those per backend.

Another way to think about this might be to say that the B_BACKEND
element of the PgStat_Snapshot.io array should really be spread out to
MaxBackends separate elements. This would make it more difficult to
obtain a total value accumulating ops done by all backends (since it
requires summing the values of each backend), but it allows you to obtain
backend-specific values, including those of remote backends rather than
only your own, as Kyotaro suggests.
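
(As a side note, at the SQL level such a total could still be rebuilt by
summing the per-backend values, for instance with something like the sketch
below, which combines pg_stat_activity with the function from 0002; the query
is purely illustrative, not part of the patch.)

SELECT b.object, b.context, sum(b.reads) AS reads, sum(b.writes) AS writes
  FROM pg_stat_activity a,
       LATERAL pg_stat_get_backend_io(a.pid) b
 GROUP BY b.object, b.context;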

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/

#7 Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Alvaro Herrera (#6)
Re: per backend I/O statistics

Hi,

On Thu, Sep 05, 2024 at 03:03:32PM +0200, Alvaro Herrera wrote:

On 2024-Sep-03, Bertrand Drouvot wrote:

Cons is probably allocating shared memory space that might not be used (
sizeof(PgStat_IO) is 16392 so that could be a concern for a high number of
unused connection). OTOH, if a high number of connections is not used that might
be worth to reduce the max_connections setting.

I was surprised by the size you mention so I went to look for the
definition of that struct:

Thanks for looking at it!

typedef struct PgStat_IO
{
TimestampTz stat_reset_timestamp;
PgStat_BktypeIO stats[BACKEND_NUM_TYPES];
} PgStat_IO;

(It would be good to have more than zero comments here BTW)

So what's happening is that struct PgStat_IO stores the data for every
single process type, be it regular backends, backend-like auxiliary
processes, and all other potential postmaster children.

Yeap.

So it seems to
me that storing one of these struct for "my backend stats" is wrong: I
think you should be using PgStat_BktypeIO instead (or maybe another
struct which has a reset timestamp plus BktypeIO, if you care about the
reset time). That struct is only 1024 bytes long, so it seems much more
reasonable to have one of those per backend.

Yeah, that could be an area of improvement. Thanks, I'll look at it.
Currently the filtering is done when retrieving the per backend stats, but it
would be better to do it when storing the stats.

Another way to think about this might be to say that the B_BACKEND
element of the PgStat_Snapshot.io array should really be spread out to
MaxBackends separate elements. This would make it more difficult to
obtain a total value accumulating ops done by all backends (since it would
require summing the values of each backend), but it allows you to obtain
backend-specific values, including those of remote backends rather than
only your own, as Kyotaro suggests.

One backend can already see the stats of the other backends thanks to the
pg_stat_get_backend_io() function (which takes a backend pid as parameter)
introduced in the 0002 sub-patch.

I'll ensure that's still the case with the next version of the patch.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#8Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Bertrand Drouvot (#7)
2 attachment(s)
Re: per backend I/O statistics

Hi,

On Thu, Sep 05, 2024 at 02:14:47PM +0000, Bertrand Drouvot wrote:

Hi,

On Thu, Sep 05, 2024 at 03:03:32PM +0200, Alvaro Herrera wrote:

So it seems to
me that storing one of these structs for "my backend stats" is wrong: I
think you should be using PgStat_BktypeIO instead (or maybe another
struct which has a reset timestamp plus BktypeIO, if you care about the
reset time). That struct is only 1024 bytes long, so it seems much more
reasonable to have one of those per backend.

Yeah, that could be an area of improvement. Thanks, I'll look at it.
Currently the filtering is done when retrieving the per backend stats, but it
would be better to do it when storing the stats.

I ended up creating (in v2 attached):

"
typedef struct PgStat_Backend_IO
{
TimestampTz stat_reset_timestamp;
BackendType bktype;
PgStat_BktypeIO stats;
} PgStat_Backend_IO;
"

The stat_reset_timestamp is there so that one knows when a particular backend
had its I/O stats reset. Also, the backend type is part of the struct so that
the stats can be filtered correctly when we display them.

The struct size is 1040 bytes, which is much more reasonable than the size
needed for per backend I/O stats in v1 (about 16KB).
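
(For what it's worth, the numbers are consistent: assuming an 8-byte TimestampTz,
a 4-byte BackendType enum plus 4 bytes of padding, and the 1024-byte
PgStat_BktypeIO mentioned up-thread, that's 8 + 4 + 4 + 1024 = 1040 bytes per
backend, versus 8 + 16 * 1024 = 16392 bytes for the whole per-backend-type array
of PgStat_IO in v1.)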

One backend can already see the stats of the other backends thanks to the
pg_stat_get_backend_io() function (which takes a backend pid as parameter)
introduced in the 0002 sub-patch.

0002 still provides the pg_stat_get_backend_io() function so that one could
get the stats of other backends.

Example:

postgres=# select backend_type,object,context,reads,extends,hits from pg_stat_get_backend_io(3779502);
  backend_type  |    object     |  context  | reads | extends |  hits
----------------+---------------+-----------+-------+---------+--------
 client backend | relation      | bulkread  |     0 |         |      0
 client backend | relation      | bulkwrite |     0 |       0 |      0
 client backend | relation      | normal    |    73 |    2216 | 504674
 client backend | relation      | vacuum    |     0 |       0 |      0
 client backend | temp relation | normal    |     0 |       0 |      0
(5 rows)

could be an individual walsender:

postgres=# select pid, backend_type from pg_stat_activity where backend_type = 'walsender';
   pid   | backend_type
---------+--------------
 3779565 | walsender
(1 row)

postgres=# select backend_type,object,context,reads,hits from pg_stat_get_backend_io(3779565);
 backend_type |    object     |  context  | reads | hits
--------------+---------------+-----------+-------+------
 walsender    | relation      | bulkread  |     0 |    0
 walsender    | relation      | bulkwrite |     0 |    0
 walsender    | relation      | normal    |     6 |   48
 walsender    | relation      | vacuum    |     0 |    0
 walsender    | temp relation | normal    |     0 |    0
(5 rows)

and so on...

Remarks:

- As stated up-thread, the pg_stat_get_backend_io() function behaves as if
stats_fetch_consistency is set to none (each execution re-fetches counters
from shared memory; see the sketch after these remarks). Indeed, the snapshot is
not built in each backend to copy all the other backends' stats, as 1/ there is
no use case (there is no need to get other backends' I/O statistics while taking
care of the stats_fetch_consistency) and 2/ that could be memory expensive
depending on the number of max connections.

- If we agree on the current proposal then I'll do some refactoring in
pg_stat_get_backend_io(), pg_stat_get_my_io() and pg_stat_get_io() to avoid
duplicated code (it's not done yet to ease the review).
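
A quick way to see the consistency difference mentioned in the first remark
(a sketch; 3779502 stands for any running backend's pid, as in the example above):

BEGIN;
SET LOCAL stats_fetch_consistency = 'snapshot';
SELECT sum(reads) FROM pg_my_stat_io;                   -- should come from the transaction's stats snapshot
SELECT sum(reads) FROM pg_stat_get_backend_io(3779502); -- re-fetched from shared memory
SELECT sum(reads) FROM pg_stat_get_backend_io(3779502); -- may already differ if that backend flushed more I/O
COMMIT;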

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v2-0001-per-backend-I-O-statistics.patchtext/x-diff; charset=us-asciiDownload
From edc6da2601350edf8647b309ada4ecec4a86dd64 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Thu, 22 Aug 2024 15:16:50 +0000
Subject: [PATCH v2 1/2] per backend I/O statistics

While pg_stat_io provides cluster-wide I/O statistics, this commit adds a new
pg_my_stat_io view to display "my" backend I/O statistics. The KIND_IO stats
are still "fixed amount" ones as the maximum number of backend is fixed.

The statistics snapshot is made for the global stats (the aggregated ones) and
for my backend stats. The snapshot is not built in each backend to copy all
the other backends' stats, as 1/ there is no use case (there is no need to get
other backends' I/O statistics while taking care of the stats_fetch_consistency)
and 2/ that could be memory expensive depending on the number of max connections.

A subsequent commit will add a new pg_stat_get_backend_io() function to be
able to retrieve the I/O statistics for a given backend pid (each execution
re-fetches counters from shared memory because, as stated above, there are no
snapshots created in each backend to copy the other backends' stats).

A catalog version bump needs to be done.
---
 doc/src/sgml/config.sgml                  |   4 +-
 doc/src/sgml/monitoring.sgml              |  28 ++++++
 src/backend/catalog/system_views.sql      |  22 +++++
 src/backend/utils/activity/pgstat.c       |  11 ++-
 src/backend/utils/activity/pgstat_io.c    |  91 ++++++++++++++---
 src/backend/utils/activity/pgstat_shmem.c |   4 +-
 src/backend/utils/adt/pgstatfuncs.c       | 114 +++++++++++++++++++++-
 src/include/catalog/pg_proc.dat           |   9 ++
 src/include/pgstat.h                      |  14 ++-
 src/include/utils/pgstat_internal.h       |  14 ++-
 src/test/regress/expected/rules.out       |  19 ++++
 src/test/regress/expected/stats.out       |  44 ++++++++-
 src/test/regress/sql/stats.sql            |  25 ++++-
 13 files changed, 366 insertions(+), 33 deletions(-)
   8.5% doc/src/sgml/
  29.4% src/backend/utils/activity/
  23.9% src/backend/utils/adt/
   4.4% src/include/catalog/
   3.1% src/include/utils/
   3.1% src/include/
  14.4% src/test/regress/expected/
  10.0% src/test/regress/sql/

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 0aec11f443..2a59b97093 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8331,7 +8331,9 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
         displayed in <link linkend="monitoring-pg-stat-database-view">
         <structname>pg_stat_database</structname></link>,
         <link linkend="monitoring-pg-stat-io-view">
-        <structname>pg_stat_io</structname></link>, in the output of
+        <structname>pg_stat_io</structname></link>,
+        <link linkend="monitoring-pg-my-stat-io-view">
+        <structname>pg_my_stat_io</structname></link>, in the output of
         <xref linkend="sql-explain"/> when the <literal>BUFFERS</literal> option
         is used, in the output of <xref linkend="sql-vacuum"/> when
         the <literal>VERBOSE</literal> option is used, by autovacuum
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 933de6fe07..b28ca4e0ca 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -488,6 +488,16 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
      </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_my_stat_io</structname><indexterm><primary>pg_my_stat_io</primary></indexterm></entry>
+      <entry>
+       One row for each combination of context and target object containing
+       my backend I/O statistics.
+       See <link linkend="monitoring-pg-my-stat-io-view">
+       <structname>pg_my_stat_io</structname></link> for details.
+     </entry>
+     </row>
+
      <row>
       <entry><structname>pg_stat_replication_slots</structname><indexterm><primary>pg_stat_replication_slots</primary></indexterm></entry>
       <entry>One row per replication slot, showing statistics about the
@@ -2948,6 +2958,24 @@ description | Waiting for a newly initialized WAL file to reach durable storage
 
 
 
+ </sect2>
+
+ <sect2 id="monitoring-pg-my-stat-io-view">
+  <title><structname>pg_my_stat_io</structname></title>
+
+  <indexterm>
+   <primary>pg_my_stat_io</primary>
+  </indexterm>
+
+  <para>
+   The <structname>pg_my_stat_io</structname> view will contain one row for each
+   combination of target I/O object and I/O context, showing
+   my backend I/O statistics. Combinations which do not make sense are
+   omitted. The fields are exactly the same as the ones in the <link
+   linkend="monitoring-pg-stat-io-view"> <structname>pg_stat_io</structname></link>
+   view.
+  </para>
+
  </sect2>
 
  <sect2 id="monitoring-pg-stat-bgwriter-view">
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 7fd5d256a1..b4939b7bc6 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1168,6 +1168,28 @@ SELECT
        b.stats_reset
 FROM pg_stat_get_io() b;
 
+CREATE VIEW pg_my_stat_io AS
+SELECT
+       b.backend_type,
+       b.object,
+       b.context,
+       b.reads,
+       b.read_time,
+       b.writes,
+       b.write_time,
+       b.writebacks,
+       b.writeback_time,
+       b.extends,
+       b.extend_time,
+       b.op_bytes,
+       b.hits,
+       b.evictions,
+       b.reuses,
+       b.fsyncs,
+       b.fsync_time,
+       b.stats_reset
+FROM pg_stat_get_my_io() b;
+
 CREATE VIEW pg_stat_wal AS
     SELECT
         w.wal_records,
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index 178b5ef65a..53b6e0bcd5 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -406,10 +406,12 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 
 		.fixed_amount = true,
 
-		.snapshot_ctl_off = offsetof(PgStat_Snapshot, io),
+		.init_backend_cb = pgstat_io_init_backend_cb,
+		.snapshot_ctl_off = offsetof(PgStat_Snapshot, global_io),
 		.shared_ctl_off = offsetof(PgStat_ShmemControl, io),
-		.shared_data_off = offsetof(PgStatShared_IO, stats),
-		.shared_data_len = sizeof(((PgStatShared_IO *) 0)->stats),
+		/* [de-]serialize global_stats only */
+		.shared_data_off = offsetof(PgStatShared_IO, global_stats),
+		.shared_data_len = sizeof(((PgStatShared_IO *) 0)->global_stats),
 
 		.init_shmem_cb = pgstat_io_init_shmem_cb,
 		.reset_all_cb = pgstat_io_reset_all_cb,
@@ -581,6 +583,9 @@ pgstat_shutdown_hook(int code, Datum arg)
 
 	pgstat_report_stat(true);
 
+	/* reset the pgstat_io counter related to this proc number */
+	pgstat_reset_io_counter_internal(MyProcNumber, GetCurrentTimestamp());
+
 	/* there shouldn't be any pending changes left */
 	Assert(dlist_is_empty(&pgStatPending));
 	dlist_init(&pgStatPending);
diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index 8af55989ee..f2b36d8efa 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -31,6 +31,7 @@ typedef struct PgStat_PendingIO
 static PgStat_PendingIO PendingIOStats;
 bool		have_iostats = false;
 
+void		pgstat_reset_io_counter_internal(int index, TimestampTz ts);
 
 /*
  * Check that stats have not been counted for any combination of IOObject,
@@ -154,11 +155,19 @@ pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op,
 }
 
 PgStat_IO *
-pgstat_fetch_stat_io(void)
+pgstat_fetch_global_stat_io(void)
 {
 	pgstat_snapshot_fixed(PGSTAT_KIND_IO);
 
-	return &pgStatLocal.snapshot.io;
+	return &pgStatLocal.snapshot.global_io;
+}
+
+PgStat_Backend_IO *
+pgstat_fetch_my_stat_io(void)
+{
+	pgstat_snapshot_fixed(PGSTAT_KIND_IO);
+
+	return &pgStatLocal.snapshot.my_io;
 }
 
 /*
@@ -174,13 +183,16 @@ pgstat_flush_io(bool nowait)
 {
 	LWLock	   *bktype_lock;
 	PgStat_BktypeIO *bktype_shstats;
+	PgStat_BktypeIO *global_bktype_shstats;
 
 	if (!have_iostats)
 		return false;
 
 	bktype_lock = &pgStatLocal.shmem->io.locks[MyBackendType];
 	bktype_shstats =
-		&pgStatLocal.shmem->io.stats.stats[MyBackendType];
+		&pgStatLocal.shmem->io.stat[MyProcNumber].stats;
+	global_bktype_shstats =
+		&pgStatLocal.shmem->io.global_stats.stats[MyBackendType];
 
 	if (!nowait)
 		LWLockAcquire(bktype_lock, LW_EXCLUSIVE);
@@ -194,19 +206,28 @@ pgstat_flush_io(bool nowait)
 			for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
 			{
 				instr_time	time;
+				PgStat_Counter counter;
+
+				counter = PendingIOStats.counts[io_object][io_context][io_op];
 
-				bktype_shstats->counts[io_object][io_context][io_op] +=
-					PendingIOStats.counts[io_object][io_context][io_op];
+				bktype_shstats->counts[io_object][io_context][io_op] += counter;
+
+				global_bktype_shstats->counts[io_object][io_context][io_op] +=
+					counter;
 
 				time = PendingIOStats.pending_times[io_object][io_context][io_op];
 
 				bktype_shstats->times[io_object][io_context][io_op] +=
 					INSTR_TIME_GET_MICROSEC(time);
+
+				global_bktype_shstats->times[io_object][io_context][io_op] +=
+					INSTR_TIME_GET_MICROSEC(time);
 			}
 		}
 	}
 
 	Assert(pgstat_bktype_io_stats_valid(bktype_shstats, MyBackendType));
+	Assert(pgstat_bktype_io_stats_valid(global_bktype_shstats, MyBackendType));
 
 	LWLockRelease(bktype_lock);
 
@@ -260,13 +281,35 @@ pgstat_io_init_shmem_cb(void *stats)
 		LWLockInitialize(&stat_shmem->locks[i], LWTRANCHE_PGSTATS_DATA);
 }
 
+void
+pgstat_reset_io_counter_internal(int index, TimestampTz ts)
+{
+	LWLock	   *bktype_lock = &pgStatLocal.shmem->io.locks[0];
+	PgStat_BktypeIO *bktype_shstats = &pgStatLocal.shmem->io.stat[index].stats;
+
+
+	/*
+	 * Use the lock in the first BackendType's PgStat_BktypeIO to protect the
+	 * reset timestamp as well.
+	 */
+	LWLockAcquire(bktype_lock, LW_EXCLUSIVE);
+
+	pgStatLocal.shmem->io.stat[index].stat_reset_timestamp = ts;
+
+	memset(bktype_shstats, 0, sizeof(*bktype_shstats));
+	LWLockRelease(bktype_lock);
+}
+
 void
 pgstat_io_reset_all_cb(TimestampTz ts)
 {
+	for (int i = 0; i < NumProcStatSlots; i++)
+		pgstat_reset_io_counter_internal(i, ts);
+
 	for (int i = 0; i < BACKEND_NUM_TYPES; i++)
 	{
 		LWLock	   *bktype_lock = &pgStatLocal.shmem->io.locks[i];
-		PgStat_BktypeIO *bktype_shstats = &pgStatLocal.shmem->io.stats.stats[i];
+		PgStat_BktypeIO *bktype_shstats = &pgStatLocal.shmem->io.global_stats.stats[i];
 
 		LWLockAcquire(bktype_lock, LW_EXCLUSIVE);
 
@@ -275,7 +318,7 @@ pgstat_io_reset_all_cb(TimestampTz ts)
 		 * the reset timestamp as well.
 		 */
 		if (i == 0)
-			pgStatLocal.shmem->io.stats.stat_reset_timestamp = ts;
+			pgStatLocal.shmem->io.global_stats.stat_reset_timestamp = ts;
 
 		memset(bktype_shstats, 0, sizeof(*bktype_shstats));
 		LWLockRelease(bktype_lock);
@@ -285,11 +328,28 @@ pgstat_io_reset_all_cb(TimestampTz ts)
 void
 pgstat_io_snapshot_cb(void)
 {
+	/* First, our own stats */
+	PgStat_BktypeIO *bktype_shstats = &pgStatLocal.shmem->io.stat[MyProcNumber].stats;
+	PgStat_BktypeIO *bktype_snap = &pgStatLocal.snapshot.my_io.stats;
+	LWLock	   *mybktype_lock = &pgStatLocal.shmem->io.locks[0];
+
+	LWLockAcquire(mybktype_lock, LW_SHARED);
+
+	pgStatLocal.snapshot.my_io.stat_reset_timestamp =
+		pgStatLocal.shmem->io.stat[MyProcNumber].stat_reset_timestamp;
+
+	pgStatLocal.snapshot.my_io.bktype =
+		pgStatLocal.shmem->io.stat[MyProcNumber].bktype;
+
+	*bktype_snap = *bktype_shstats;
+	LWLockRelease(mybktype_lock);
+
+	/* Now, the global stats */
 	for (int i = 0; i < BACKEND_NUM_TYPES; i++)
 	{
 		LWLock	   *bktype_lock = &pgStatLocal.shmem->io.locks[i];
-		PgStat_BktypeIO *bktype_shstats = &pgStatLocal.shmem->io.stats.stats[i];
-		PgStat_BktypeIO *bktype_snap = &pgStatLocal.snapshot.io.stats[i];
+		PgStat_BktypeIO *bktype_global_shstats = &pgStatLocal.shmem->io.global_stats.stats[i];
+		PgStat_BktypeIO *bktype_global_snap = &pgStatLocal.snapshot.global_io.stats[i];
 
 		LWLockAcquire(bktype_lock, LW_SHARED);
 
@@ -298,11 +358,11 @@ pgstat_io_snapshot_cb(void)
 		 * the reset timestamp as well.
 		 */
 		if (i == 0)
-			pgStatLocal.snapshot.io.stat_reset_timestamp =
-				pgStatLocal.shmem->io.stats.stat_reset_timestamp;
+			pgStatLocal.snapshot.global_io.stat_reset_timestamp =
+				pgStatLocal.shmem->io.global_stats.stat_reset_timestamp;
 
 		/* using struct assignment due to better type safety */
-		*bktype_snap = *bktype_shstats;
+		*bktype_global_snap = *bktype_global_shstats;
 		LWLockRelease(bktype_lock);
 	}
 }
@@ -485,3 +545,10 @@ pgstat_tracks_io_op(BackendType bktype, IOObject io_object,
 
 	return true;
 }
+
+void
+pgstat_io_init_backend_cb(void)
+{
+	/* Initialize our backend type in the IO statistics */
+	pgStatLocal.shmem->io.stat[MyProcNumber].bktype = MyBackendType;
+}
diff --git a/src/backend/utils/activity/pgstat_shmem.c b/src/backend/utils/activity/pgstat_shmem.c
index ec93bf6902..3cfcba9609 100644
--- a/src/backend/utils/activity/pgstat_shmem.c
+++ b/src/backend/utils/activity/pgstat_shmem.c
@@ -128,7 +128,7 @@ StatsShmemSize(void)
 {
 	Size		sz;
 
-	sz = MAXALIGN(sizeof(PgStat_ShmemControl));
+	sz = MAXALIGN(sizeof(PgStat_ShmemControl) + mul_size(sizeof(PgStat_Backend_IO), NumProcStatSlots));
 	sz = add_size(sz, pgstat_dsa_init_size());
 
 	/* Add shared memory for all the custom fixed-numbered statistics */
@@ -172,7 +172,7 @@ StatsShmemInit(void)
 		Assert(!found);
 
 		/* the allocation of pgStatLocal.shmem itself */
-		p += MAXALIGN(sizeof(PgStat_ShmemControl));
+		p += MAXALIGN(sizeof(PgStat_ShmemControl) + mul_size(sizeof(PgStat_Backend_IO), NumProcStatSlots));
 
 		/*
 		 * Create a small dsa allocation in plain shared memory. This is
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 97dc09ac0d..e72dc5daa7 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1363,14 +1363,17 @@ pg_stat_get_io(PG_FUNCTION_ARGS)
 	InitMaterializedSRF(fcinfo, 0);
 	rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
 
-	backends_io_stats = pgstat_fetch_stat_io();
+	backends_io_stats = pgstat_fetch_global_stat_io();
 
 	reset_time = TimestampTzGetDatum(backends_io_stats->stat_reset_timestamp);
 
 	for (int bktype = 0; bktype < BACKEND_NUM_TYPES; bktype++)
 	{
-		Datum		bktype_desc = CStringGetTextDatum(GetBackendTypeDesc(bktype));
-		PgStat_BktypeIO *bktype_stats = &backends_io_stats->stats[bktype];
+		Datum		bktype_desc;
+		PgStat_BktypeIO *bktype_stats;
+
+		bktype_desc = CStringGetTextDatum(GetBackendTypeDesc(bktype));
+		bktype_stats = &backends_io_stats->stats[bktype];
 
 		/*
 		 * In Assert builds, we can afford an extra loop through all of the
@@ -1462,6 +1465,111 @@ pg_stat_get_io(PG_FUNCTION_ARGS)
 	return (Datum) 0;
 }
 
+Datum
+pg_stat_get_my_io(PG_FUNCTION_ARGS)
+{
+	ReturnSetInfo *rsinfo;
+	PgStat_Backend_IO *backend_io_stats;
+	Datum		bktype_desc;
+	PgStat_BktypeIO *bktype_stats;
+	BackendType bktype;
+	Datum		reset_time;
+
+	InitMaterializedSRF(fcinfo, 0);
+	rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+	backend_io_stats = pgstat_fetch_my_stat_io();
+
+	bktype = backend_io_stats->bktype;
+	bktype_desc = CStringGetTextDatum(GetBackendTypeDesc(bktype));
+	bktype_stats = &backend_io_stats->stats;
+	reset_time = TimestampTzGetDatum(backend_io_stats->stat_reset_timestamp);
+
+	/*
+	 * In Assert builds, we can afford an extra loop through all of the
+	 * counters checking that only expected stats are non-zero, since it keeps
+	 * the non-Assert code cleaner.
+	 */
+	Assert(pgstat_bktype_io_stats_valid(bktype_stats, bktype));
+
+	for (int io_obj = 0; io_obj < IOOBJECT_NUM_TYPES; io_obj++)
+	{
+		const char *obj_name = pgstat_get_io_object_name(io_obj);
+
+		for (int io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
+		{
+			const char *context_name = pgstat_get_io_context_name(io_context);
+
+			Datum		values[IO_NUM_COLUMNS] = {0};
+			bool		nulls[IO_NUM_COLUMNS] = {0};
+
+			/*
+			 * Some combinations of BackendType, IOObject, and IOContext are
+			 * not valid for any type of IOOp. In such cases, omit the entire
+			 * row from the view.
+			 */
+			if (!pgstat_tracks_io_object(bktype, io_obj, io_context))
+				continue;
+
+			values[IO_COL_BACKEND_TYPE] = bktype_desc;
+			values[IO_COL_CONTEXT] = CStringGetTextDatum(context_name);
+			values[IO_COL_OBJECT] = CStringGetTextDatum(obj_name);
+			if (backend_io_stats->stat_reset_timestamp != 0)
+				values[IO_COL_RESET_TIME] = reset_time;
+			else
+				nulls[IO_COL_RESET_TIME] = true;
+
+			/*
+			 * Hard-code this to the value of BLCKSZ for now. Future values
+			 * could include XLOG_BLCKSZ, once WAL IO is tracked, and constant
+			 * multipliers, once non-block-oriented IO (e.g. temporary file
+			 * IO) is tracked.
+			 */
+			values[IO_COL_CONVERSION] = Int64GetDatum(BLCKSZ);
+
+			for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
+			{
+				int			op_idx = pgstat_get_io_op_index(io_op);
+				int			time_idx = pgstat_get_io_time_index(io_op);
+
+				/*
+				 * Some combinations of BackendType and IOOp, of IOContext and
+				 * IOOp, and of IOObject and IOOp are not tracked. Set these
+				 * cells in the view NULL.
+				 */
+				if (pgstat_tracks_io_op(bktype, io_obj, io_context, io_op))
+				{
+					PgStat_Counter count =
+						bktype_stats->counts[io_obj][io_context][io_op];
+
+					values[op_idx] = Int64GetDatum(count);
+				}
+				else
+					nulls[op_idx] = true;
+
+				/* not every operation is timed */
+				if (time_idx == IO_COL_INVALID)
+					continue;
+
+				if (!nulls[op_idx])
+				{
+					PgStat_Counter time =
+						bktype_stats->times[io_obj][io_context][io_op];
+
+					values[time_idx] = Float8GetDatum(pg_stat_us_to_ms(time));
+				}
+				else
+					nulls[time_idx] = true;
+			}
+
+			tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+								 values, nulls);
+		}
+	}
+
+	return (Datum) 0;
+}
+
 /*
  * Returns statistics of WAL activity
  */
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index ff5436acac..62658e9a00 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5831,6 +5831,15 @@
   proargnames => '{backend_type,object,context,reads,read_time,writes,write_time,writebacks,writeback_time,extends,extend_time,op_bytes,hits,evictions,reuses,fsyncs,fsync_time,stats_reset}',
   prosrc => 'pg_stat_get_io' },
 
+{ oid => '8806', descr => 'statistics: My backend IO statistics',
+  proname => 'pg_stat_get_my_io', prorows => '5', proretset => 't',
+  provolatile => 'v', proparallel => 'r', prorettype => 'record',
+  proargtypes => '',
+  proallargtypes => '{text,text,text,int8,float8,int8,float8,int8,float8,int8,float8,int8,int8,int8,int8,int8,float8,timestamptz}',
+  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{backend_type,object,context,reads,read_time,writes,write_time,writebacks,writeback_time,extends,extend_time,op_bytes,hits,evictions,reuses,fsyncs,fsync_time,stats_reset}',
+  prosrc => 'pg_stat_get_my_io' },
+
 { oid => '1136', descr => 'statistics: information about WAL activity',
   proname => 'pg_stat_get_wal', proisstrict => 'f', provolatile => 's',
   proparallel => 'r', prorettype => 'record', proargtypes => '',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index be2c91168a..9d63b1a5b7 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -16,6 +16,7 @@
 #include "portability/instr_time.h"
 #include "postmaster/pgarch.h"	/* for MAX_XFN_CHARS */
 #include "replication/conflict.h"
+#include "storage/proc.h"
 #include "utils/backend_progress.h" /* for backward compatibility */
 #include "utils/backend_status.h"	/* for backward compatibility */
 #include "utils/relcache.h"
@@ -76,6 +77,8 @@
  */
 #define PGSTAT_KIND_EXPERIMENTAL	128
 
+#define  NumProcStatSlots (MaxBackends + NUM_AUXILIARY_PROCS)
+
 static inline bool
 pgstat_is_kind_builtin(PgStat_Kind kind)
 {
@@ -351,6 +354,12 @@ typedef struct PgStat_IO
 	PgStat_BktypeIO stats[BACKEND_NUM_TYPES];
 } PgStat_IO;
 
+typedef struct PgStat_Backend_IO
+{
+	TimestampTz stat_reset_timestamp;
+	BackendType bktype;
+	PgStat_BktypeIO stats;
+} PgStat_Backend_IO;
 
 typedef struct PgStat_StatDBEntry
 {
@@ -559,7 +568,8 @@ extern instr_time pgstat_prepare_io_time(bool track_io_guc);
 extern void pgstat_count_io_op_time(IOObject io_object, IOContext io_context,
 									IOOp io_op, instr_time start_time, uint32 cnt);
 
-extern PgStat_IO *pgstat_fetch_stat_io(void);
+extern PgStat_IO *pgstat_fetch_global_stat_io(void);
+extern PgStat_Backend_IO *pgstat_fetch_my_stat_io(void);
 extern const char *pgstat_get_io_context_name(IOContext io_context);
 extern const char *pgstat_get_io_object_name(IOObject io_object);
 
@@ -568,7 +578,7 @@ extern bool pgstat_tracks_io_object(BackendType bktype,
 									IOObject io_object, IOContext io_context);
 extern bool pgstat_tracks_io_op(BackendType bktype, IOObject io_object,
 								IOContext io_context, IOOp io_op);
-
+extern void pgstat_reset_io_counter_internal(int index, TimestampTz ts);
 
 /*
  * Functions in pgstat_database.c
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 25820cbf0a..3b2f2b28dd 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -353,11 +353,12 @@ typedef struct PgStatShared_Checkpointer
 typedef struct PgStatShared_IO
 {
 	/*
-	 * locks[i] protects stats.stats[i]. locks[0] also protects
-	 * stats.stat_reset_timestamp.
+	 * locks[i] protects global_stats.stats[i]. locks[0] also protects
+	 * global_stats.stat_reset_timestamp.
 	 */
 	LWLock		locks[BACKEND_NUM_TYPES];
-	PgStat_IO	stats;
+	PgStat_IO	global_stats;
+	PgStat_Backend_IO	stat[FLEXIBLE_ARRAY_MEMBER];
 } PgStatShared_IO;
 
 typedef struct PgStatShared_SLRU
@@ -450,7 +451,6 @@ typedef struct PgStat_ShmemControl
 	PgStatShared_Archiver archiver;
 	PgStatShared_BgWriter bgwriter;
 	PgStatShared_Checkpointer checkpointer;
-	PgStatShared_IO io;
 	PgStatShared_SLRU slru;
 	PgStatShared_Wal wal;
 
@@ -460,6 +460,8 @@ typedef struct PgStat_ShmemControl
 	 */
 	void	   *custom_data[PGSTAT_KIND_CUSTOM_SIZE];
 
+	/* has to be at the end due to FLEXIBLE_ARRAY_MEMBER */
+	PgStatShared_IO io;
 } PgStat_ShmemControl;
 
 
@@ -481,7 +483,8 @@ typedef struct PgStat_Snapshot
 
 	PgStat_CheckpointerStats checkpointer;
 
-	PgStat_IO	io;
+	PgStat_Backend_IO	my_io;
+	PgStat_IO	global_io;
 
 	PgStat_SLRUStats slru[SLRU_NUM_ELEMENTS];
 
@@ -682,6 +685,7 @@ extern bool pgstat_flush_wal(bool nowait);
 extern bool pgstat_have_pending_wal(void);
 
 extern void pgstat_wal_init_backend_cb(void);
+extern void pgstat_io_init_backend_cb(void);
 extern void pgstat_wal_init_shmem_cb(void *stats);
 extern void pgstat_wal_reset_all_cb(TimestampTz ts);
 extern void pgstat_wal_snapshot_cb(void);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index a1626f3fae..785298d7b0 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1398,6 +1398,25 @@ pg_matviews| SELECT n.nspname AS schemaname,
      LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
      LEFT JOIN pg_tablespace t ON ((t.oid = c.reltablespace)))
   WHERE (c.relkind = 'm'::"char");
+pg_my_stat_io| SELECT backend_type,
+    object,
+    context,
+    reads,
+    read_time,
+    writes,
+    write_time,
+    writebacks,
+    writeback_time,
+    extends,
+    extend_time,
+    op_bytes,
+    hits,
+    evictions,
+    reuses,
+    fsyncs,
+    fsync_time,
+    stats_reset
+   FROM pg_stat_get_my_io() b(backend_type, object, context, reads, read_time, writes, write_time, writebacks, writeback_time, extends, extend_time, op_bytes, hits, evictions, reuses, fsyncs, fsync_time, stats_reset);
 pg_policies| SELECT n.nspname AS schemaname,
     c.relname AS tablename,
     pol.polname AS policyname,
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 6e08898b18..c489e528e0 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1249,7 +1249,7 @@ SELECT pg_stat_get_subscription_stats(NULL);
  
 (1 row)
 
--- Test that the following operations are tracked in pg_stat_io:
+-- Test that the following operations are tracked in pg_[my]_stat_io:
 -- - reads of target blocks into shared buffers
 -- - writes of shared buffers to permanent storage
 -- - extends of relations using shared buffers
@@ -1261,9 +1261,14 @@ SELECT pg_stat_get_subscription_stats(NULL);
 -- extends.
 SELECT sum(extends) AS io_sum_shared_before_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT sum(extends) AS my_io_sum_shared_before_extends
+  FROM pg_my_stat_io WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_stat_io
   WHERE object = 'relation' \gset io_sum_shared_before_
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_my_stat_io
+  WHERE object = 'relation' \gset my_io_sum_shared_before_
 CREATE TABLE test_io_shared(a int);
 INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
 SELECT pg_stat_force_next_flush();
@@ -1280,8 +1285,16 @@ SELECT :io_sum_shared_after_extends > :io_sum_shared_before_extends;
  t
 (1 row)
 
+SELECT sum(extends) AS my_io_sum_shared_after_extends
+  FROM pg_my_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT :my_io_sum_shared_after_extends > :my_io_sum_shared_before_extends;
+ ?column? 
+----------
+ t
+(1 row)
+
 -- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
--- and fsyncs.
+-- and fsyncs in the global stats (not for the backend).
 -- See comment above for rationale for two explicit CHECKPOINTs.
 CHECKPOINT;
 CHECKPOINT;
@@ -1301,6 +1314,23 @@ SELECT current_setting('fsync') = 'off'
  t
 (1 row)
 
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_my_stat_io
+  WHERE object = 'relation' \gset my_io_sum_shared_after_
+SELECT :my_io_sum_shared_after_writes >= :my_io_sum_shared_before_writes;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT current_setting('fsync') = 'off'
+  OR (:my_io_sum_shared_after_fsyncs = :my_io_sum_shared_before_fsyncs
+      AND :my_io_sum_shared_after_fsyncs= 0);
+ ?column? 
+----------
+ t
+(1 row)
+
 -- Change the tablespace so that the table is rewritten directly, then SELECT
 -- from it to cause it to be read back into shared buffers.
 SELECT sum(reads) AS io_sum_shared_before_reads
@@ -1521,6 +1551,8 @@ SELECT pg_stat_have_stats('io', 0, 0);
 
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_pre_reset
   FROM pg_stat_io \gset
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_pre_reset
+  FROM pg_my_stat_io \gset
 SELECT pg_stat_reset_shared('io');
  pg_stat_reset_shared 
 ----------------------
@@ -1535,6 +1567,14 @@ SELECT :io_stats_post_reset < :io_stats_pre_reset;
  t
 (1 row)
 
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_reset
+  FROM pg_my_stat_io \gset
+SELECT :my_io_stats_post_reset < :my_io_stats_pre_reset;
+ ?column? 
+----------
+ t
+(1 row)
+
 -- test BRIN index doesn't block HOT update
 CREATE TABLE brin_hot (
   id  integer PRIMARY KEY,
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index d8ac0d06f4..c95cb71652 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -595,7 +595,7 @@ SELECT pg_stat_get_replication_slot(NULL);
 SELECT pg_stat_get_subscription_stats(NULL);
 
 
--- Test that the following operations are tracked in pg_stat_io:
+-- Test that the following operations are tracked in pg_[my]_stat_io:
 -- - reads of target blocks into shared buffers
 -- - writes of shared buffers to permanent storage
 -- - extends of relations using shared buffers
@@ -609,18 +609,26 @@ SELECT pg_stat_get_subscription_stats(NULL);
 -- extends.
 SELECT sum(extends) AS io_sum_shared_before_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT sum(extends) AS my_io_sum_shared_before_extends
+  FROM pg_my_stat_io WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_stat_io
   WHERE object = 'relation' \gset io_sum_shared_before_
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_my_stat_io
+  WHERE object = 'relation' \gset my_io_sum_shared_before_
 CREATE TABLE test_io_shared(a int);
 INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
 SELECT pg_stat_force_next_flush();
 SELECT sum(extends) AS io_sum_shared_after_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
 SELECT :io_sum_shared_after_extends > :io_sum_shared_before_extends;
+SELECT sum(extends) AS my_io_sum_shared_after_extends
+  FROM pg_my_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT :my_io_sum_shared_after_extends > :my_io_sum_shared_before_extends;
 
 -- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
--- and fsyncs.
+-- and fsyncs in the global stats (not for the backend).
 -- See comment above for rationale for two explicit CHECKPOINTs.
 CHECKPOINT;
 CHECKPOINT;
@@ -630,7 +638,13 @@ SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
 SELECT :io_sum_shared_after_writes > :io_sum_shared_before_writes;
 SELECT current_setting('fsync') = 'off'
   OR :io_sum_shared_after_fsyncs > :io_sum_shared_before_fsyncs;
-
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_my_stat_io
+  WHERE object = 'relation' \gset my_io_sum_shared_after_
+SELECT :my_io_sum_shared_after_writes >= :my_io_sum_shared_before_writes;
+SELECT current_setting('fsync') = 'off'
+  OR (:my_io_sum_shared_after_fsyncs = :my_io_sum_shared_before_fsyncs
+      AND :my_io_sum_shared_after_fsyncs= 0);
 -- Change the tablespace so that the table is rewritten directly, then SELECT
 -- from it to cause it to be read back into shared buffers.
 SELECT sum(reads) AS io_sum_shared_before_reads
@@ -762,10 +776,15 @@ SELECT :io_sum_bulkwrite_strategy_extends_after > :io_sum_bulkwrite_strategy_ext
 SELECT pg_stat_have_stats('io', 0, 0);
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_pre_reset
   FROM pg_stat_io \gset
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_pre_reset
+  FROM pg_my_stat_io \gset
 SELECT pg_stat_reset_shared('io');
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_post_reset
   FROM pg_stat_io \gset
 SELECT :io_stats_post_reset < :io_stats_pre_reset;
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_reset
+  FROM pg_my_stat_io \gset
+SELECT :my_io_stats_post_reset < :my_io_stats_pre_reset;
 
 
 -- test BRIN index doesn't block HOT update
-- 
2.34.1

v2-0002-Add-pg_stat_get_backend_io.patchtext/x-diff; charset=us-asciiDownload
From 4e9e50f43a83cbba7caa6824c74df1e0f536cc3e Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Wed, 28 Aug 2024 12:59:02 +0000
Subject: [PATCH v2 2/2] Add pg_stat_get_backend_io()

Adding the pg_stat_get_backend_io() function to retrieve I/O statistics for
a particular backend pid. Note this function behaves as if stats_fetch_consistency
is set to none.
---
 doc/src/sgml/monitoring.sgml           |  18 ++++
 src/backend/utils/activity/pgstat_io.c |   6 ++
 src/backend/utils/adt/pgstatfuncs.c    | 120 +++++++++++++++++++++++++
 src/include/catalog/pg_proc.dat        |   9 ++
 src/include/pgstat.h                   |   1 +
 src/test/regress/expected/stats.out    |  25 ++++++
 src/test/regress/sql/stats.sql         |  16 +++-
 7 files changed, 194 insertions(+), 1 deletion(-)
  10.9% doc/src/sgml/
  47.1% src/backend/utils/adt/
   9.2% src/include/catalog/
  15.4% src/test/regress/expected/
  14.4% src/test/regress/sql/

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index b28ca4e0ca..fb908172f8 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4770,6 +4770,24 @@ description | Waiting for a newly initialized WAL file to reach durable storage
        </para></entry>
       </row>
 
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_stat_get_backend_io</primary>
+        </indexterm>
+        <function>pg_stat_get_backend_io</function> ( <type>integer</type> )
+        <returnvalue>setof record</returnvalue>
+       </para>
+       <para>
+        Returns I/O statistics about the backend with the specified
+        process ID. The output fields are exactly the same as the ones in the
+        <link linkend="monitoring-pg-stat-io-view"> <structname>pg_stat_io</structname></link>
+        view. This function behaves as if <varname>stats_fetch_consistency</varname>
+        is set to <literal>none</literal> (means each execution re-fetches
+        counters from shared memory).
+       </para></entry>
+      </row>
+
       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <indexterm>
diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index f2b36d8efa..780ba48d1a 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -170,6 +170,12 @@ pgstat_fetch_my_stat_io(void)
 	return &pgStatLocal.snapshot.my_io;
 }
 
+PgStat_Backend_IO *
+pgstat_fetch_proc_stat_io(ProcNumber procNumber)
+{
+	return &pgStatLocal.shmem->io.stat[procNumber];
+}
+
 /*
  * Flush out locally pending IO statistics
  *
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index e72dc5daa7..ca22251acd 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1570,6 +1570,126 @@ pg_stat_get_my_io(PG_FUNCTION_ARGS)
 	return (Datum) 0;
 }
 
+Datum
+pg_stat_get_backend_io(PG_FUNCTION_ARGS)
+{
+	ReturnSetInfo *rsinfo;
+	PgStat_Backend_IO *backend_io_stats;
+	Datum		reset_time;
+	ProcNumber	procNumber;
+	PGPROC	   *proc;
+	BackendType bktype;
+	Datum		bktype_desc;
+	PgStat_BktypeIO *bktype_stats;
+
+	int			backend_pid = PG_GETARG_INT32(0);
+
+	rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	InitMaterializedSRF(fcinfo, 0);
+
+	proc = BackendPidGetProc(backend_pid);
+
+	/* Maybe an auxiliary process? */
+	if (proc == NULL)
+		proc = AuxiliaryPidGetProc(backend_pid);
+
+	/* This backend_pid does not exist */
+	if (proc != NULL)
+	{
+		procNumber = GetNumberFromPGProc(proc);
+		backend_io_stats = pgstat_fetch_proc_stat_io(procNumber);
+		bktype = backend_io_stats->bktype;
+		reset_time = TimestampTzGetDatum(backend_io_stats->stat_reset_timestamp);
+
+		bktype_desc = CStringGetTextDatum(GetBackendTypeDesc(bktype));
+		bktype_stats = &backend_io_stats->stats;
+
+		/*
+		 * In Assert builds, we can afford an extra loop through all of the
+		 * counters checking that only expected stats are non-zero, since it
+		 * keeps the non-Assert code cleaner.
+		 */
+		Assert(pgstat_bktype_io_stats_valid(bktype_stats, bktype));
+
+		for (int io_obj = 0; io_obj < IOOBJECT_NUM_TYPES; io_obj++)
+		{
+			const char *obj_name = pgstat_get_io_object_name(io_obj);
+
+			for (int io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
+			{
+				const char *context_name = pgstat_get_io_context_name(io_context);
+
+				Datum		values[IO_NUM_COLUMNS] = {0};
+				bool		nulls[IO_NUM_COLUMNS] = {0};
+
+				/*
+				 * Some combinations of BackendType, IOObject, and IOContext
+				 * are not valid for any type of IOOp. In such cases, omit the
+				 * entire row from the view.
+				 */
+				if (!pgstat_tracks_io_object(bktype, io_obj, io_context))
+					continue;
+
+				values[IO_COL_BACKEND_TYPE] = bktype_desc;
+				values[IO_COL_CONTEXT] = CStringGetTextDatum(context_name);
+				values[IO_COL_OBJECT] = CStringGetTextDatum(obj_name);
+				if (backend_io_stats->stat_reset_timestamp != 0)
+					values[IO_COL_RESET_TIME] = reset_time;
+				else
+					nulls[IO_COL_RESET_TIME] = true;
+
+				/*
+				 * Hard-code this to the value of BLCKSZ for now. Future
+				 * values could include XLOG_BLCKSZ, once WAL IO is tracked,
+				 * and constant multipliers, once non-block-oriented IO (e.g.
+				 * temporary file IO) is tracked.
+				 */
+				values[IO_COL_CONVERSION] = Int64GetDatum(BLCKSZ);
+
+				for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
+				{
+					int			op_idx = pgstat_get_io_op_index(io_op);
+					int			time_idx = pgstat_get_io_time_index(io_op);
+
+					/*
+					 * Some combinations of BackendType and IOOp, of IOContext
+					 * and IOOp, and of IOObject and IOOp are not tracked. Set
+					 * these cells in the view NULL.
+					 */
+					if (pgstat_tracks_io_op(bktype, io_obj, io_context, io_op))
+					{
+						PgStat_Counter count =
+							bktype_stats->counts[io_obj][io_context][io_op];
+
+						values[op_idx] = Int64GetDatum(count);
+					}
+					else
+						nulls[op_idx] = true;
+
+					/* not every operation is timed */
+					if (time_idx == IO_COL_INVALID)
+						continue;
+
+					if (!nulls[op_idx])
+					{
+						PgStat_Counter time =
+							bktype_stats->times[io_obj][io_context][io_op];
+
+						values[time_idx] = Float8GetDatum(pg_stat_us_to_ms(time));
+					}
+					else
+						nulls[time_idx] = true;
+				}
+
+				tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+									 values, nulls);
+			}
+		}
+	}
+
+	return (Datum) 0;
+}
+
 /*
  * Returns statistics of WAL activity
  */
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 62658e9a00..0a540a590f 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5840,6 +5840,15 @@
   proargnames => '{backend_type,object,context,reads,read_time,writes,write_time,writebacks,writeback_time,extends,extend_time,op_bytes,hits,evictions,reuses,fsyncs,fsync_time,stats_reset}',
   prosrc => 'pg_stat_get_my_io' },
 
+{ oid => '8896', descr => 'statistics: per backend type IO statistics',
+  proname => 'pg_stat_get_backend_io', prorows => '30', proretset => 't',
+  provolatile => 'v', proparallel => 'r', prorettype => 'record',
+  proargtypes => 'int4',
+  proallargtypes => '{int4,text,text,text,int8,float8,int8,float8,int8,float8,int8,float8,int8,int8,int8,int8,int8,float8,timestamptz}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{backend_pid,backend_type,object,context,reads,read_time,writes,write_time,writebacks,writeback_time,extends,extend_time,op_bytes,hits,evictions,reuses,fsyncs,fsync_time,stats_reset}',
+  prosrc => 'pg_stat_get_backend_io' },
+
 { oid => '1136', descr => 'statistics: information about WAL activity',
   proname => 'pg_stat_get_wal', proisstrict => 'f', provolatile => 's',
   proparallel => 'r', prorettype => 'record', proargtypes => '',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 9d63b1a5b7..2a2b4a2947 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -570,6 +570,7 @@ extern void pgstat_count_io_op_time(IOObject io_object, IOContext io_context,
 
 extern PgStat_IO *pgstat_fetch_global_stat_io(void);
 extern PgStat_Backend_IO *pgstat_fetch_my_stat_io(void);
+extern PgStat_Backend_IO *pgstat_fetch_proc_stat_io(ProcNumber procNumber);
 extern const char *pgstat_get_io_context_name(IOContext io_context);
 extern const char *pgstat_get_io_object_name(IOObject io_object);
 
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index c489e528e0..aced015c2f 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1263,12 +1263,18 @@ SELECT sum(extends) AS io_sum_shared_before_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(extends) AS my_io_sum_shared_before_extends
   FROM pg_my_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT sum(extends) AS backend_io_sum_shared_before_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_stat_io
   WHERE object = 'relation' \gset io_sum_shared_before_
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_my_stat_io
   WHERE object = 'relation' \gset my_io_sum_shared_before_
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE object = 'relation' \gset backend_io_sum_shared_before_
 CREATE TABLE test_io_shared(a int);
 INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
 SELECT pg_stat_force_next_flush();
@@ -1293,6 +1299,15 @@ SELECT :my_io_sum_shared_after_extends > :my_io_sum_shared_before_extends;
  t
 (1 row)
 
+SELECT sum(extends) AS backend_io_sum_shared_after_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
+SELECT :backend_io_sum_shared_after_extends > :backend_io_sum_shared_before_extends;
+ ?column? 
+----------
+ t
+(1 row)
+
 -- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
 -- and fsyncs in the global stats (not for the backend).
 -- See comment above for rationale for two explicit CHECKPOINTs.
@@ -1553,6 +1568,8 @@ SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) +
   FROM pg_stat_io \gset
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_pre_reset
   FROM pg_my_stat_io \gset
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS backend_io_stats_pre_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
 SELECT pg_stat_reset_shared('io');
  pg_stat_reset_shared 
 ----------------------
@@ -1575,6 +1592,14 @@ SELECT :my_io_stats_post_reset < :my_io_stats_pre_reset;
  t
 (1 row)
 
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS backend_io_stats_post_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
+SELECT :backend_io_stats_post_reset < :backend_io_stats_pre_reset;
+ ?column? 
+----------
+ t
+(1 row)
+
 -- test BRIN index doesn't block HOT update
 CREATE TABLE brin_hot (
   id  integer PRIMARY KEY,
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index c95cb71652..d05009e1f5 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -611,12 +611,18 @@ SELECT sum(extends) AS io_sum_shared_before_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(extends) AS my_io_sum_shared_before_extends
   FROM pg_my_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT sum(extends) AS backend_io_sum_shared_before_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_stat_io
   WHERE object = 'relation' \gset io_sum_shared_before_
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_my_stat_io
   WHERE object = 'relation' \gset my_io_sum_shared_before_
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE object = 'relation' \gset backend_io_sum_shared_before_
 CREATE TABLE test_io_shared(a int);
 INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
 SELECT pg_stat_force_next_flush();
@@ -626,6 +632,10 @@ SELECT :io_sum_shared_after_extends > :io_sum_shared_before_extends;
 SELECT sum(extends) AS my_io_sum_shared_after_extends
   FROM pg_my_stat_io WHERE context = 'normal' AND object = 'relation' \gset
 SELECT :my_io_sum_shared_after_extends > :my_io_sum_shared_before_extends;
+SELECT sum(extends) AS backend_io_sum_shared_after_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
+SELECT :backend_io_sum_shared_after_extends > :backend_io_sum_shared_before_extends;
 
 -- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
 -- and fsyncs in the global stats (not for the backend).
@@ -778,6 +788,8 @@ SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) +
   FROM pg_stat_io \gset
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_pre_reset
   FROM pg_my_stat_io \gset
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS backend_io_stats_pre_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
 SELECT pg_stat_reset_shared('io');
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_post_reset
   FROM pg_stat_io \gset
@@ -785,7 +797,9 @@ SELECT :io_stats_post_reset < :io_stats_pre_reset;
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_reset
   FROM pg_my_stat_io \gset
 SELECT :my_io_stats_post_reset < :my_io_stats_pre_reset;
-
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS backend_io_stats_post_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
+SELECT :backend_io_stats_post_reset < :backend_io_stats_pre_reset;
 
 -- test BRIN index doesn't block HOT update
 CREATE TABLE brin_hot (
-- 
2.34.1

#9Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Bertrand Drouvot (#8)
2 attachment(s)
Re: per backend I/O statistics

Hi,

On Fri, Sep 06, 2024 at 03:03:17PM +0000, Bertrand Drouvot wrote:

- If we agree on the current proposal then I'll do some refactoring in
pg_stat_get_backend_io(), pg_stat_get_my_io() and pg_stat_get_io() to avoid
duplicated code (it's not done yet to ease the review).

Please find attached v3, a mandatory rebase due to fc415edf8c.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v3-0001-per-backend-I-O-statistics.patchtext/x-diff; charset=us-asciiDownload
From 335949f1c408a99aed74c0231d5e436b5f319746 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Thu, 22 Aug 2024 15:16:50 +0000
Subject: [PATCH v3 1/2] per backend I/O statistics

While pg_stat_io provides cluster-wide I/O statistics, this commit adds a new
pg_my_stat_io view to display "my" backend I/O statistics. The KIND_IO stats
are still "fixed amount" ones as the maximum number of backend is fixed.

The statistics snapshot is made for the global stats (the aggregated ones) and
for my backend stats. The snapshot is not built in each backend to copy all
the other backends' stats, as 1/ there is no use case (there is no need to get
other backends' I/O statistics while taking care of the stats_fetch_consistency)
and 2/ that could be memory expensive depending on the number of max connections.

A subsequent commit will add a new pg_stat_get_backend_io() function to be
able to retrieve the I/O statistics for a given backend pid (each execution
re-fetches counters from shared memory because, as stated above, there are no
snapshots created in each backend to copy the other backends' stats).

A catalog version bump needs to be done.
---
 doc/src/sgml/config.sgml                  |   4 +-
 doc/src/sgml/monitoring.sgml              |  28 ++++++
 src/backend/catalog/system_views.sql      |  22 +++++
 src/backend/utils/activity/pgstat.c       |  11 ++-
 src/backend/utils/activity/pgstat_io.c    |  91 ++++++++++++++---
 src/backend/utils/activity/pgstat_shmem.c |   4 +-
 src/backend/utils/adt/pgstatfuncs.c       | 114 +++++++++++++++++++++-
 src/include/catalog/pg_proc.dat           |   9 ++
 src/include/pgstat.h                      |  14 ++-
 src/include/utils/pgstat_internal.h       |  14 ++-
 src/test/regress/expected/rules.out       |  19 ++++
 src/test/regress/expected/stats.out       |  44 ++++++++-
 src/test/regress/sql/stats.sql            |  25 ++++-
 13 files changed, 366 insertions(+), 33 deletions(-)
   8.5% doc/src/sgml/
  29.4% src/backend/utils/activity/
  23.9% src/backend/utils/adt/
   4.4% src/include/catalog/
   3.1% src/include/utils/
   3.1% src/include/
  14.4% src/test/regress/expected/
  10.0% src/test/regress/sql/

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 0aec11f443..2a59b97093 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8331,7 +8331,9 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
         displayed in <link linkend="monitoring-pg-stat-database-view">
         <structname>pg_stat_database</structname></link>,
         <link linkend="monitoring-pg-stat-io-view">
-        <structname>pg_stat_io</structname></link>, in the output of
+        <structname>pg_stat_io</structname></link>,
+        <link linkend="monitoring-pg-my-stat-io-view">
+        <structname>pg_my_stat_io</structname></link>, in the output of
         <xref linkend="sql-explain"/> when the <literal>BUFFERS</literal> option
         is used, in the output of <xref linkend="sql-vacuum"/> when
         the <literal>VERBOSE</literal> option is used, by autovacuum
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 933de6fe07..b28ca4e0ca 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -488,6 +488,16 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
      </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_my_stat_io</structname><indexterm><primary>pg_my_stat_io</primary></indexterm></entry>
+      <entry>
+       One row for each combination of context and target object containing
+       my backend I/O statistics.
+       See <link linkend="monitoring-pg-my-stat-io-view">
+       <structname>pg_my_stat_io</structname></link> for details.
+     </entry>
+     </row>
+
      <row>
       <entry><structname>pg_stat_replication_slots</structname><indexterm><primary>pg_stat_replication_slots</primary></indexterm></entry>
       <entry>One row per replication slot, showing statistics about the
@@ -2948,6 +2958,24 @@ description | Waiting for a newly initialized WAL file to reach durable storage
 
 
 
+ </sect2>
+
+ <sect2 id="monitoring-pg-my-stat-io-view">
+  <title><structname>pg_my_stat_io</structname></title>
+
+  <indexterm>
+   <primary>pg_my_stat_io</primary>
+  </indexterm>
+
+  <para>
+   The <structname>pg_my_stat_io</structname> view will contain one row for each
+   combination of target I/O object and I/O context, showing
+   my backend I/O statistics. Combinations which do not make sense are
+   omitted. The fields are exactly the same as the ones in the <link
+   linkend="monitoring-pg-stat-io-view"> <structname>pg_stat_io</structname></link>
+   view.
+  </para>
+
  </sect2>
 
  <sect2 id="monitoring-pg-stat-bgwriter-view">
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 7fd5d256a1..b4939b7bc6 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1168,6 +1168,28 @@ SELECT
        b.stats_reset
 FROM pg_stat_get_io() b;
 
+CREATE VIEW pg_my_stat_io AS
+SELECT
+       b.backend_type,
+       b.object,
+       b.context,
+       b.reads,
+       b.read_time,
+       b.writes,
+       b.write_time,
+       b.writebacks,
+       b.writeback_time,
+       b.extends,
+       b.extend_time,
+       b.op_bytes,
+       b.hits,
+       b.evictions,
+       b.reuses,
+       b.fsyncs,
+       b.fsync_time,
+       b.stats_reset
+FROM pg_stat_get_my_io() b;
+
 CREATE VIEW pg_stat_wal AS
     SELECT
         w.wal_records,
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index a7f2dfc744..deb30cd2fc 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -406,10 +406,12 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 
 		.fixed_amount = true,
 
-		.snapshot_ctl_off = offsetof(PgStat_Snapshot, io),
+		.init_backend_cb = pgstat_io_init_backend_cb,
+		.snapshot_ctl_off = offsetof(PgStat_Snapshot, global_io),
 		.shared_ctl_off = offsetof(PgStat_ShmemControl, io),
-		.shared_data_off = offsetof(PgStatShared_IO, stats),
-		.shared_data_len = sizeof(((PgStatShared_IO *) 0)->stats),
+		/* [de-]serialize global_stats only */
+		.shared_data_off = offsetof(PgStatShared_IO, global_stats),
+		.shared_data_len = sizeof(((PgStatShared_IO *) 0)->global_stats),
 
 		.flush_fixed_cb = pgstat_io_flush_cb,
 		.have_fixed_pending_cb = pgstat_io_have_pending_cb,
@@ -587,6 +589,9 @@ pgstat_shutdown_hook(int code, Datum arg)
 
 	pgstat_report_stat(true);
 
+	/* reset the pgstat_io counter related to this proc number */
+	pgstat_reset_io_counter_internal(MyProcNumber, GetCurrentTimestamp());
+
 	/* there shouldn't be any pending changes left */
 	Assert(dlist_is_empty(&pgStatPending));
 	dlist_init(&pgStatPending);
diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index cc2ffc78aa..611ca21720 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -31,6 +31,7 @@ typedef struct PgStat_PendingIO
 static PgStat_PendingIO PendingIOStats;
 static bool have_iostats = false;
 
+void		pgstat_reset_io_counter_internal(int index, TimestampTz ts);
 
 /*
  * Check that stats have not been counted for any combination of IOObject,
@@ -154,11 +155,19 @@ pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op,
 }
 
 PgStat_IO *
-pgstat_fetch_stat_io(void)
+pgstat_fetch_global_stat_io(void)
 {
 	pgstat_snapshot_fixed(PGSTAT_KIND_IO);
 
-	return &pgStatLocal.snapshot.io;
+	return &pgStatLocal.snapshot.global_io;
+}
+
+PgStat_Backend_IO *
+pgstat_fetch_my_stat_io(void)
+{
+	pgstat_snapshot_fixed(PGSTAT_KIND_IO);
+
+	return &pgStatLocal.snapshot.my_io;
 }
 
 /*
@@ -192,13 +201,16 @@ pgstat_io_flush_cb(bool nowait)
 {
 	LWLock	   *bktype_lock;
 	PgStat_BktypeIO *bktype_shstats;
+	PgStat_BktypeIO *global_bktype_shstats;
 
 	if (!have_iostats)
 		return false;
 
 	bktype_lock = &pgStatLocal.shmem->io.locks[MyBackendType];
 	bktype_shstats =
-		&pgStatLocal.shmem->io.stats.stats[MyBackendType];
+		&pgStatLocal.shmem->io.stat[MyProcNumber].stats;
+	global_bktype_shstats =
+		&pgStatLocal.shmem->io.global_stats.stats[MyBackendType];
 
 	if (!nowait)
 		LWLockAcquire(bktype_lock, LW_EXCLUSIVE);
@@ -212,19 +224,28 @@ pgstat_io_flush_cb(bool nowait)
 			for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
 			{
 				instr_time	time;
+				PgStat_Counter counter;
+
+				counter = PendingIOStats.counts[io_object][io_context][io_op];
 
-				bktype_shstats->counts[io_object][io_context][io_op] +=
-					PendingIOStats.counts[io_object][io_context][io_op];
+				bktype_shstats->counts[io_object][io_context][io_op] += counter;
+
+				global_bktype_shstats->counts[io_object][io_context][io_op] +=
+					counter;
 
 				time = PendingIOStats.pending_times[io_object][io_context][io_op];
 
 				bktype_shstats->times[io_object][io_context][io_op] +=
 					INSTR_TIME_GET_MICROSEC(time);
+
+				global_bktype_shstats->times[io_object][io_context][io_op] +=
+					INSTR_TIME_GET_MICROSEC(time);
 			}
 		}
 	}
 
 	Assert(pgstat_bktype_io_stats_valid(bktype_shstats, MyBackendType));
+	Assert(pgstat_bktype_io_stats_valid(global_bktype_shstats, MyBackendType));
 
 	LWLockRelease(bktype_lock);
 
@@ -278,13 +299,35 @@ pgstat_io_init_shmem_cb(void *stats)
 		LWLockInitialize(&stat_shmem->locks[i], LWTRANCHE_PGSTATS_DATA);
 }
 
+void
+pgstat_reset_io_counter_internal(int index, TimestampTz ts)
+{
+	LWLock	   *bktype_lock = &pgStatLocal.shmem->io.locks[0];
+	PgStat_BktypeIO *bktype_shstats = &pgStatLocal.shmem->io.stat[index].stats;
+
+
+	/*
+	 * Use the lock in the first BackendType's PgStat_BktypeIO to protect the
+	 * reset timestamp as well.
+	 */
+	LWLockAcquire(bktype_lock, LW_EXCLUSIVE);
+
+	pgStatLocal.shmem->io.stat[index].stat_reset_timestamp = ts;
+
+	memset(bktype_shstats, 0, sizeof(*bktype_shstats));
+	LWLockRelease(bktype_lock);
+}
+
 void
 pgstat_io_reset_all_cb(TimestampTz ts)
 {
+	for (int i = 0; i < NumProcStatSlots; i++)
+		pgstat_reset_io_counter_internal(i, ts);
+
 	for (int i = 0; i < BACKEND_NUM_TYPES; i++)
 	{
 		LWLock	   *bktype_lock = &pgStatLocal.shmem->io.locks[i];
-		PgStat_BktypeIO *bktype_shstats = &pgStatLocal.shmem->io.stats.stats[i];
+		PgStat_BktypeIO *bktype_shstats = &pgStatLocal.shmem->io.global_stats.stats[i];
 
 		LWLockAcquire(bktype_lock, LW_EXCLUSIVE);
 
@@ -293,7 +336,7 @@ pgstat_io_reset_all_cb(TimestampTz ts)
 		 * the reset timestamp as well.
 		 */
 		if (i == 0)
-			pgStatLocal.shmem->io.stats.stat_reset_timestamp = ts;
+			pgStatLocal.shmem->io.global_stats.stat_reset_timestamp = ts;
 
 		memset(bktype_shstats, 0, sizeof(*bktype_shstats));
 		LWLockRelease(bktype_lock);
@@ -303,11 +346,28 @@ pgstat_io_reset_all_cb(TimestampTz ts)
 void
 pgstat_io_snapshot_cb(void)
 {
+	/* First, our own stats */
+	PgStat_BktypeIO *bktype_shstats = &pgStatLocal.shmem->io.stat[MyProcNumber].stats;
+	PgStat_BktypeIO *bktype_snap = &pgStatLocal.snapshot.my_io.stats;
+	LWLock	   *mybktype_lock = &pgStatLocal.shmem->io.locks[0];
+
+	LWLockAcquire(mybktype_lock, LW_SHARED);
+
+	pgStatLocal.snapshot.my_io.stat_reset_timestamp =
+		pgStatLocal.shmem->io.stat[MyProcNumber].stat_reset_timestamp;
+
+	pgStatLocal.snapshot.my_io.bktype =
+		pgStatLocal.shmem->io.stat[MyProcNumber].bktype;
+
+	*bktype_snap = *bktype_shstats;
+	LWLockRelease(mybktype_lock);
+
+	/* Now, the global stats */
 	for (int i = 0; i < BACKEND_NUM_TYPES; i++)
 	{
 		LWLock	   *bktype_lock = &pgStatLocal.shmem->io.locks[i];
-		PgStat_BktypeIO *bktype_shstats = &pgStatLocal.shmem->io.stats.stats[i];
-		PgStat_BktypeIO *bktype_snap = &pgStatLocal.snapshot.io.stats[i];
+		PgStat_BktypeIO *bktype_global_shstats = &pgStatLocal.shmem->io.global_stats.stats[i];
+		PgStat_BktypeIO *bktype_global_snap = &pgStatLocal.snapshot.global_io.stats[i];
 
 		LWLockAcquire(bktype_lock, LW_SHARED);
 
@@ -316,11 +376,11 @@ pgstat_io_snapshot_cb(void)
 		 * the reset timestamp as well.
 		 */
 		if (i == 0)
-			pgStatLocal.snapshot.io.stat_reset_timestamp =
-				pgStatLocal.shmem->io.stats.stat_reset_timestamp;
+			pgStatLocal.snapshot.global_io.stat_reset_timestamp =
+				pgStatLocal.shmem->io.global_stats.stat_reset_timestamp;
 
 		/* using struct assignment due to better type safety */
-		*bktype_snap = *bktype_shstats;
+		*bktype_global_snap = *bktype_global_shstats;
 		LWLockRelease(bktype_lock);
 	}
 }
@@ -503,3 +563,10 @@ pgstat_tracks_io_op(BackendType bktype, IOObject io_object,
 
 	return true;
 }
+
+void
+pgstat_io_init_backend_cb(void)
+{
+	/* Initialize our backend type in the IO statistics */
+	pgStatLocal.shmem->io.stat[MyProcNumber].bktype = MyBackendType;
+}
diff --git a/src/backend/utils/activity/pgstat_shmem.c b/src/backend/utils/activity/pgstat_shmem.c
index ec93bf6902..3cfcba9609 100644
--- a/src/backend/utils/activity/pgstat_shmem.c
+++ b/src/backend/utils/activity/pgstat_shmem.c
@@ -128,7 +128,7 @@ StatsShmemSize(void)
 {
 	Size		sz;
 
-	sz = MAXALIGN(sizeof(PgStat_ShmemControl));
+	sz = MAXALIGN(sizeof(PgStat_ShmemControl) + mul_size(sizeof(PgStat_Backend_IO), NumProcStatSlots));
 	sz = add_size(sz, pgstat_dsa_init_size());
 
 	/* Add shared memory for all the custom fixed-numbered statistics */
@@ -172,7 +172,7 @@ StatsShmemInit(void)
 		Assert(!found);
 
 		/* the allocation of pgStatLocal.shmem itself */
-		p += MAXALIGN(sizeof(PgStat_ShmemControl));
+		p += MAXALIGN(sizeof(PgStat_ShmemControl) + mul_size(sizeof(PgStat_Backend_IO), NumProcStatSlots));
 
 		/*
 		 * Create a small dsa allocation in plain shared memory. This is
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 33c7b25560..4c00570396 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1363,14 +1363,17 @@ pg_stat_get_io(PG_FUNCTION_ARGS)
 	InitMaterializedSRF(fcinfo, 0);
 	rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
 
-	backends_io_stats = pgstat_fetch_stat_io();
+	backends_io_stats = pgstat_fetch_global_stat_io();
 
 	reset_time = TimestampTzGetDatum(backends_io_stats->stat_reset_timestamp);
 
 	for (int bktype = 0; bktype < BACKEND_NUM_TYPES; bktype++)
 	{
-		Datum		bktype_desc = CStringGetTextDatum(GetBackendTypeDesc(bktype));
-		PgStat_BktypeIO *bktype_stats = &backends_io_stats->stats[bktype];
+		Datum		bktype_desc;
+		PgStat_BktypeIO *bktype_stats;
+
+		bktype_desc = CStringGetTextDatum(GetBackendTypeDesc(bktype));
+		bktype_stats = &backends_io_stats->stats[bktype];
 
 		/*
 		 * In Assert builds, we can afford an extra loop through all of the
@@ -1462,6 +1465,111 @@ pg_stat_get_io(PG_FUNCTION_ARGS)
 	return (Datum) 0;
 }
 
+Datum
+pg_stat_get_my_io(PG_FUNCTION_ARGS)
+{
+	ReturnSetInfo *rsinfo;
+	PgStat_Backend_IO *backend_io_stats;
+	Datum		bktype_desc;
+	PgStat_BktypeIO *bktype_stats;
+	BackendType bktype;
+	Datum		reset_time;
+
+	InitMaterializedSRF(fcinfo, 0);
+	rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+	backend_io_stats = pgstat_fetch_my_stat_io();
+
+	bktype = backend_io_stats->bktype;
+	bktype_desc = CStringGetTextDatum(GetBackendTypeDesc(bktype));
+	bktype_stats = &backend_io_stats->stats;
+	reset_time = TimestampTzGetDatum(backend_io_stats->stat_reset_timestamp);
+
+	/*
+	 * In Assert builds, we can afford an extra loop through all of the
+	 * counters checking that only expected stats are non-zero, since it keeps
+	 * the non-Assert code cleaner.
+	 */
+	Assert(pgstat_bktype_io_stats_valid(bktype_stats, bktype));
+
+	for (int io_obj = 0; io_obj < IOOBJECT_NUM_TYPES; io_obj++)
+	{
+		const char *obj_name = pgstat_get_io_object_name(io_obj);
+
+		for (int io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
+		{
+			const char *context_name = pgstat_get_io_context_name(io_context);
+
+			Datum		values[IO_NUM_COLUMNS] = {0};
+			bool		nulls[IO_NUM_COLUMNS] = {0};
+
+			/*
+			 * Some combinations of BackendType, IOObject, and IOContext are
+			 * not valid for any type of IOOp. In such cases, omit the entire
+			 * row from the view.
+			 */
+			if (!pgstat_tracks_io_object(bktype, io_obj, io_context))
+				continue;
+
+			values[IO_COL_BACKEND_TYPE] = bktype_desc;
+			values[IO_COL_CONTEXT] = CStringGetTextDatum(context_name);
+			values[IO_COL_OBJECT] = CStringGetTextDatum(obj_name);
+			if (backend_io_stats->stat_reset_timestamp != 0)
+				values[IO_COL_RESET_TIME] = reset_time;
+			else
+				nulls[IO_COL_RESET_TIME] = true;
+
+			/*
+			 * Hard-code this to the value of BLCKSZ for now. Future values
+			 * could include XLOG_BLCKSZ, once WAL IO is tracked, and constant
+			 * multipliers, once non-block-oriented IO (e.g. temporary file
+			 * IO) is tracked.
+			 */
+			values[IO_COL_CONVERSION] = Int64GetDatum(BLCKSZ);
+
+			for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
+			{
+				int			op_idx = pgstat_get_io_op_index(io_op);
+				int			time_idx = pgstat_get_io_time_index(io_op);
+
+				/*
+				 * Some combinations of BackendType and IOOp, of IOContext and
+				 * IOOp, and of IOObject and IOOp are not tracked. Set these
+				 * cells in the view NULL.
+				 */
+				if (pgstat_tracks_io_op(bktype, io_obj, io_context, io_op))
+				{
+					PgStat_Counter count =
+						bktype_stats->counts[io_obj][io_context][io_op];
+
+					values[op_idx] = Int64GetDatum(count);
+				}
+				else
+					nulls[op_idx] = true;
+
+				/* not every operation is timed */
+				if (time_idx == IO_COL_INVALID)
+					continue;
+
+				if (!nulls[op_idx])
+				{
+					PgStat_Counter time =
+						bktype_stats->times[io_obj][io_context][io_op];
+
+					values[time_idx] = Float8GetDatum(pg_stat_us_to_ms(time));
+				}
+				else
+					nulls[time_idx] = true;
+			}
+
+			tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+								 values, nulls);
+		}
+	}
+
+	return (Datum) 0;
+}
+
 /*
  * Returns statistics of WAL activity
  */
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index ff5436acac..62658e9a00 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5831,6 +5831,15 @@
   proargnames => '{backend_type,object,context,reads,read_time,writes,write_time,writebacks,writeback_time,extends,extend_time,op_bytes,hits,evictions,reuses,fsyncs,fsync_time,stats_reset}',
   prosrc => 'pg_stat_get_io' },
 
+{ oid => '8806', descr => 'statistics: My backend IO statistics',
+  proname => 'pg_stat_get_my_io', prorows => '5', proretset => 't',
+  provolatile => 'v', proparallel => 'r', prorettype => 'record',
+  proargtypes => '',
+  proallargtypes => '{text,text,text,int8,float8,int8,float8,int8,float8,int8,float8,int8,int8,int8,int8,int8,float8,timestamptz}',
+  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{backend_type,object,context,reads,read_time,writes,write_time,writebacks,writeback_time,extends,extend_time,op_bytes,hits,evictions,reuses,fsyncs,fsync_time,stats_reset}',
+  prosrc => 'pg_stat_get_my_io' },
+
 { oid => '1136', descr => 'statistics: information about WAL activity',
   proname => 'pg_stat_get_wal', proisstrict => 'f', provolatile => 's',
   proparallel => 'r', prorettype => 'record', proargtypes => '',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index be2c91168a..9d63b1a5b7 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -16,6 +16,7 @@
 #include "portability/instr_time.h"
 #include "postmaster/pgarch.h"	/* for MAX_XFN_CHARS */
 #include "replication/conflict.h"
+#include "storage/proc.h"
 #include "utils/backend_progress.h" /* for backward compatibility */
 #include "utils/backend_status.h"	/* for backward compatibility */
 #include "utils/relcache.h"
@@ -76,6 +77,8 @@
  */
 #define PGSTAT_KIND_EXPERIMENTAL	128
 
+#define  NumProcStatSlots (MaxBackends + NUM_AUXILIARY_PROCS)
+
 static inline bool
 pgstat_is_kind_builtin(PgStat_Kind kind)
 {
@@ -351,6 +354,12 @@ typedef struct PgStat_IO
 	PgStat_BktypeIO stats[BACKEND_NUM_TYPES];
 } PgStat_IO;
 
+typedef struct PgStat_Backend_IO
+{
+	TimestampTz stat_reset_timestamp;
+	BackendType bktype;
+	PgStat_BktypeIO stats;
+} PgStat_Backend_IO;
 
 typedef struct PgStat_StatDBEntry
 {
@@ -559,7 +568,8 @@ extern instr_time pgstat_prepare_io_time(bool track_io_guc);
 extern void pgstat_count_io_op_time(IOObject io_object, IOContext io_context,
 									IOOp io_op, instr_time start_time, uint32 cnt);
 
-extern PgStat_IO *pgstat_fetch_stat_io(void);
+extern PgStat_IO *pgstat_fetch_global_stat_io(void);
+extern PgStat_Backend_IO *pgstat_fetch_my_stat_io(void);
 extern const char *pgstat_get_io_context_name(IOContext io_context);
 extern const char *pgstat_get_io_object_name(IOObject io_object);
 
@@ -568,7 +578,7 @@ extern bool pgstat_tracks_io_object(BackendType bktype,
 									IOObject io_object, IOContext io_context);
 extern bool pgstat_tracks_io_op(BackendType bktype, IOObject io_object,
 								IOContext io_context, IOOp io_op);
-
+extern void pgstat_reset_io_counter_internal(int index, TimestampTz ts);
 
 /*
  * Functions in pgstat_database.c
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index bba90e898d..ea4cc6a2c6 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -366,11 +366,12 @@ typedef struct PgStatShared_Checkpointer
 typedef struct PgStatShared_IO
 {
 	/*
-	 * locks[i] protects stats.stats[i]. locks[0] also protects
-	 * stats.stat_reset_timestamp.
+	 * locks[i] protects global_stats.stats[i]. locks[0] also protects
+	 * global_stats.stat_reset_timestamp.
 	 */
 	LWLock		locks[BACKEND_NUM_TYPES];
-	PgStat_IO	stats;
+	PgStat_IO	global_stats;
+	PgStat_Backend_IO	stat[FLEXIBLE_ARRAY_MEMBER];
 } PgStatShared_IO;
 
 typedef struct PgStatShared_SLRU
@@ -463,7 +464,6 @@ typedef struct PgStat_ShmemControl
 	PgStatShared_Archiver archiver;
 	PgStatShared_BgWriter bgwriter;
 	PgStatShared_Checkpointer checkpointer;
-	PgStatShared_IO io;
 	PgStatShared_SLRU slru;
 	PgStatShared_Wal wal;
 
@@ -473,6 +473,8 @@ typedef struct PgStat_ShmemControl
 	 */
 	void	   *custom_data[PGSTAT_KIND_CUSTOM_SIZE];
 
+	/* has to be at the end due to FLEXIBLE_ARRAY_MEMBER */
+	PgStatShared_IO io;
 } PgStat_ShmemControl;
 
 
@@ -494,7 +496,8 @@ typedef struct PgStat_Snapshot
 
 	PgStat_CheckpointerStats checkpointer;
 
-	PgStat_IO	io;
+	PgStat_Backend_IO	my_io;
+	PgStat_IO	global_io;
 
 	PgStat_SLRUStats slru[SLRU_NUM_ELEMENTS];
 
@@ -629,6 +632,7 @@ extern bool pgstat_io_flush_cb(bool nowait);
 extern void pgstat_io_init_shmem_cb(void *stats);
 extern void pgstat_io_reset_all_cb(TimestampTz ts);
 extern void pgstat_io_snapshot_cb(void);
+extern void pgstat_io_init_backend_cb(void);
 
 
 /*
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index a1626f3fae..785298d7b0 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1398,6 +1398,25 @@ pg_matviews| SELECT n.nspname AS schemaname,
      LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
      LEFT JOIN pg_tablespace t ON ((t.oid = c.reltablespace)))
   WHERE (c.relkind = 'm'::"char");
+pg_my_stat_io| SELECT backend_type,
+    object,
+    context,
+    reads,
+    read_time,
+    writes,
+    write_time,
+    writebacks,
+    writeback_time,
+    extends,
+    extend_time,
+    op_bytes,
+    hits,
+    evictions,
+    reuses,
+    fsyncs,
+    fsync_time,
+    stats_reset
+   FROM pg_stat_get_my_io() b(backend_type, object, context, reads, read_time, writes, write_time, writebacks, writeback_time, extends, extend_time, op_bytes, hits, evictions, reuses, fsyncs, fsync_time, stats_reset);
 pg_policies| SELECT n.nspname AS schemaname,
     c.relname AS tablename,
     pol.polname AS policyname,
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 6e08898b18..c489e528e0 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1249,7 +1249,7 @@ SELECT pg_stat_get_subscription_stats(NULL);
  
 (1 row)
 
--- Test that the following operations are tracked in pg_stat_io:
+-- Test that the following operations are tracked in pg_[my]_stat_io:
 -- - reads of target blocks into shared buffers
 -- - writes of shared buffers to permanent storage
 -- - extends of relations using shared buffers
@@ -1261,9 +1261,14 @@ SELECT pg_stat_get_subscription_stats(NULL);
 -- extends.
 SELECT sum(extends) AS io_sum_shared_before_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT sum(extends) AS my_io_sum_shared_before_extends
+  FROM pg_my_stat_io WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_stat_io
   WHERE object = 'relation' \gset io_sum_shared_before_
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_my_stat_io
+  WHERE object = 'relation' \gset my_io_sum_shared_before_
 CREATE TABLE test_io_shared(a int);
 INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
 SELECT pg_stat_force_next_flush();
@@ -1280,8 +1285,16 @@ SELECT :io_sum_shared_after_extends > :io_sum_shared_before_extends;
  t
 (1 row)
 
+SELECT sum(extends) AS my_io_sum_shared_after_extends
+  FROM pg_my_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT :my_io_sum_shared_after_extends > :my_io_sum_shared_before_extends;
+ ?column? 
+----------
+ t
+(1 row)
+
 -- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
--- and fsyncs.
+-- and fsyncs in the global stats (not for the backend).
 -- See comment above for rationale for two explicit CHECKPOINTs.
 CHECKPOINT;
 CHECKPOINT;
@@ -1301,6 +1314,23 @@ SELECT current_setting('fsync') = 'off'
  t
 (1 row)
 
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_my_stat_io
+  WHERE object = 'relation' \gset my_io_sum_shared_after_
+SELECT :my_io_sum_shared_after_writes >= :my_io_sum_shared_before_writes;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT current_setting('fsync') = 'off'
+  OR (:my_io_sum_shared_after_fsyncs = :my_io_sum_shared_before_fsyncs
+      AND :my_io_sum_shared_after_fsyncs= 0);
+ ?column? 
+----------
+ t
+(1 row)
+
 -- Change the tablespace so that the table is rewritten directly, then SELECT
 -- from it to cause it to be read back into shared buffers.
 SELECT sum(reads) AS io_sum_shared_before_reads
@@ -1521,6 +1551,8 @@ SELECT pg_stat_have_stats('io', 0, 0);
 
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_pre_reset
   FROM pg_stat_io \gset
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_pre_reset
+  FROM pg_my_stat_io \gset
 SELECT pg_stat_reset_shared('io');
  pg_stat_reset_shared 
 ----------------------
@@ -1535,6 +1567,14 @@ SELECT :io_stats_post_reset < :io_stats_pre_reset;
  t
 (1 row)
 
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_reset
+  FROM pg_my_stat_io \gset
+SELECT :my_io_stats_post_reset < :my_io_stats_pre_reset;
+ ?column? 
+----------
+ t
+(1 row)
+
 -- test BRIN index doesn't block HOT update
 CREATE TABLE brin_hot (
   id  integer PRIMARY KEY,
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index d8ac0d06f4..c95cb71652 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -595,7 +595,7 @@ SELECT pg_stat_get_replication_slot(NULL);
 SELECT pg_stat_get_subscription_stats(NULL);
 
 
--- Test that the following operations are tracked in pg_stat_io:
+-- Test that the following operations are tracked in pg_[my]_stat_io:
 -- - reads of target blocks into shared buffers
 -- - writes of shared buffers to permanent storage
 -- - extends of relations using shared buffers
@@ -609,18 +609,26 @@ SELECT pg_stat_get_subscription_stats(NULL);
 -- extends.
 SELECT sum(extends) AS io_sum_shared_before_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT sum(extends) AS my_io_sum_shared_before_extends
+  FROM pg_my_stat_io WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_stat_io
   WHERE object = 'relation' \gset io_sum_shared_before_
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_my_stat_io
+  WHERE object = 'relation' \gset my_io_sum_shared_before_
 CREATE TABLE test_io_shared(a int);
 INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
 SELECT pg_stat_force_next_flush();
 SELECT sum(extends) AS io_sum_shared_after_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
 SELECT :io_sum_shared_after_extends > :io_sum_shared_before_extends;
+SELECT sum(extends) AS my_io_sum_shared_after_extends
+  FROM pg_my_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT :my_io_sum_shared_after_extends > :my_io_sum_shared_before_extends;
 
 -- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
--- and fsyncs.
+-- and fsyncs in the global stats (not for the backend).
 -- See comment above for rationale for two explicit CHECKPOINTs.
 CHECKPOINT;
 CHECKPOINT;
@@ -630,7 +638,13 @@ SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
 SELECT :io_sum_shared_after_writes > :io_sum_shared_before_writes;
 SELECT current_setting('fsync') = 'off'
   OR :io_sum_shared_after_fsyncs > :io_sum_shared_before_fsyncs;
-
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_my_stat_io
+  WHERE object = 'relation' \gset my_io_sum_shared_after_
+SELECT :my_io_sum_shared_after_writes >= :my_io_sum_shared_before_writes;
+SELECT current_setting('fsync') = 'off'
+  OR (:my_io_sum_shared_after_fsyncs = :my_io_sum_shared_before_fsyncs
+      AND :my_io_sum_shared_after_fsyncs= 0);
 -- Change the tablespace so that the table is rewritten directly, then SELECT
 -- from it to cause it to be read back into shared buffers.
 SELECT sum(reads) AS io_sum_shared_before_reads
@@ -762,10 +776,15 @@ SELECT :io_sum_bulkwrite_strategy_extends_after > :io_sum_bulkwrite_strategy_ext
 SELECT pg_stat_have_stats('io', 0, 0);
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_pre_reset
   FROM pg_stat_io \gset
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_pre_reset
+  FROM pg_my_stat_io \gset
 SELECT pg_stat_reset_shared('io');
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_post_reset
   FROM pg_stat_io \gset
 SELECT :io_stats_post_reset < :io_stats_pre_reset;
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_reset
+  FROM pg_my_stat_io \gset
+SELECT :my_io_stats_post_reset < :my_io_stats_pre_reset;
 
 
 -- test BRIN index doesn't block HOT update
-- 
2.34.1

v3-0002-Add-pg_stat_get_backend_io.patch (text/x-diff; charset=us-ascii)
From b5719ca7a1c6257929b71c45ba8cb9da3bd53665 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Wed, 28 Aug 2024 12:59:02 +0000
Subject: [PATCH v3 2/2] Add pg_stat_get_backend_io()

Adding the pg_stat_get_backend_io() function to retrieve I/O statistics for
a particular backend pid. Note this function behaves as if stats_fetch_consistency
is set to none.
---
 doc/src/sgml/monitoring.sgml           |  18 ++++
 src/backend/utils/activity/pgstat_io.c |   6 ++
 src/backend/utils/adt/pgstatfuncs.c    | 120 +++++++++++++++++++++++++
 src/include/catalog/pg_proc.dat        |   9 ++
 src/include/pgstat.h                   |   1 +
 src/test/regress/expected/stats.out    |  25 ++++++
 src/test/regress/sql/stats.sql         |  16 +++-
 7 files changed, 194 insertions(+), 1 deletion(-)
  10.9% doc/src/sgml/
  47.1% src/backend/utils/adt/
   9.2% src/include/catalog/
  15.4% src/test/regress/expected/
  14.4% src/test/regress/sql/

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index b28ca4e0ca..fb908172f8 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4770,6 +4770,24 @@ description | Waiting for a newly initialized WAL file to reach durable storage
        </para></entry>
       </row>
 
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_stat_get_backend_io</primary>
+        </indexterm>
+        <function>pg_stat_get_backend_io</function> ( <type>integer</type> )
+        <returnvalue>setof record</returnvalue>
+       </para>
+       <para>
+        Returns I/O statistics about the backend with the specified
+        process ID. The output fields are exactly the same as the ones in the
+        <link linkend="monitoring-pg-stat-io-view"> <structname>pg_stat_io</structname></link>
+        view. This function behaves as if <varname>stats_fetch_consistency</varname>
+        is set to <literal>none</literal> (means each execution re-fetches
+        counters from shared memory).
+       </para></entry>
+      </row>
+
       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <indexterm>
diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index 611ca21720..cf6a49cf48 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -170,6 +170,12 @@ pgstat_fetch_my_stat_io(void)
 	return &pgStatLocal.snapshot.my_io;
 }
 
+PgStat_Backend_IO *
+pgstat_fetch_proc_stat_io(ProcNumber procNumber)
+{
+	return &pgStatLocal.shmem->io.stat[procNumber];
+}
+
 /*
  * Check if there any IO stats waiting for flush.
  */
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 4c00570396..795ff4e2f6 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1570,6 +1570,126 @@ pg_stat_get_my_io(PG_FUNCTION_ARGS)
 	return (Datum) 0;
 }
 
+Datum
+pg_stat_get_backend_io(PG_FUNCTION_ARGS)
+{
+	ReturnSetInfo *rsinfo;
+	PgStat_Backend_IO *backend_io_stats;
+	Datum		reset_time;
+	ProcNumber	procNumber;
+	PGPROC	   *proc;
+	BackendType bktype;
+	Datum		bktype_desc;
+	PgStat_BktypeIO *bktype_stats;
+
+	int			backend_pid = PG_GETARG_INT32(0);
+
+	rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	InitMaterializedSRF(fcinfo, 0);
+
+	proc = BackendPidGetProc(backend_pid);
+
+	/* Maybe an auxiliary process? */
+	if (proc == NULL)
+		proc = AuxiliaryPidGetProc(backend_pid);
+
+	/* This backend_pid does not exist */
+	if (proc != NULL)
+	{
+		procNumber = GetNumberFromPGProc(proc);
+		backend_io_stats = pgstat_fetch_proc_stat_io(procNumber);
+		bktype = backend_io_stats->bktype;
+		reset_time = TimestampTzGetDatum(backend_io_stats->stat_reset_timestamp);
+
+		bktype_desc = CStringGetTextDatum(GetBackendTypeDesc(bktype));
+		bktype_stats = &backend_io_stats->stats;
+
+		/*
+		 * In Assert builds, we can afford an extra loop through all of the
+		 * counters checking that only expected stats are non-zero, since it
+		 * keeps the non-Assert code cleaner.
+		 */
+		Assert(pgstat_bktype_io_stats_valid(bktype_stats, bktype));
+
+		for (int io_obj = 0; io_obj < IOOBJECT_NUM_TYPES; io_obj++)
+		{
+			const char *obj_name = pgstat_get_io_object_name(io_obj);
+
+			for (int io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
+			{
+				const char *context_name = pgstat_get_io_context_name(io_context);
+
+				Datum		values[IO_NUM_COLUMNS] = {0};
+				bool		nulls[IO_NUM_COLUMNS] = {0};
+
+				/*
+				 * Some combinations of BackendType, IOObject, and IOContext
+				 * are not valid for any type of IOOp. In such cases, omit the
+				 * entire row from the view.
+				 */
+				if (!pgstat_tracks_io_object(bktype, io_obj, io_context))
+					continue;
+
+				values[IO_COL_BACKEND_TYPE] = bktype_desc;
+				values[IO_COL_CONTEXT] = CStringGetTextDatum(context_name);
+				values[IO_COL_OBJECT] = CStringGetTextDatum(obj_name);
+				if (backend_io_stats->stat_reset_timestamp != 0)
+					values[IO_COL_RESET_TIME] = reset_time;
+				else
+					nulls[IO_COL_RESET_TIME] = true;
+
+				/*
+				 * Hard-code this to the value of BLCKSZ for now. Future
+				 * values could include XLOG_BLCKSZ, once WAL IO is tracked,
+				 * and constant multipliers, once non-block-oriented IO (e.g.
+				 * temporary file IO) is tracked.
+				 */
+				values[IO_COL_CONVERSION] = Int64GetDatum(BLCKSZ);
+
+				for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
+				{
+					int			op_idx = pgstat_get_io_op_index(io_op);
+					int			time_idx = pgstat_get_io_time_index(io_op);
+
+					/*
+					 * Some combinations of BackendType and IOOp, of IOContext
+					 * and IOOp, and of IOObject and IOOp are not tracked. Set
+					 * these cells in the view NULL.
+					 */
+					if (pgstat_tracks_io_op(bktype, io_obj, io_context, io_op))
+					{
+						PgStat_Counter count =
+							bktype_stats->counts[io_obj][io_context][io_op];
+
+						values[op_idx] = Int64GetDatum(count);
+					}
+					else
+						nulls[op_idx] = true;
+
+					/* not every operation is timed */
+					if (time_idx == IO_COL_INVALID)
+						continue;
+
+					if (!nulls[op_idx])
+					{
+						PgStat_Counter time =
+							bktype_stats->times[io_obj][io_context][io_op];
+
+						values[time_idx] = Float8GetDatum(pg_stat_us_to_ms(time));
+					}
+					else
+						nulls[time_idx] = true;
+				}
+
+				tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+									 values, nulls);
+			}
+		}
+	}
+
+	return (Datum) 0;
+}
+
 /*
  * Returns statistics of WAL activity
  */
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 62658e9a00..0a540a590f 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5840,6 +5840,15 @@
   proargnames => '{backend_type,object,context,reads,read_time,writes,write_time,writebacks,writeback_time,extends,extend_time,op_bytes,hits,evictions,reuses,fsyncs,fsync_time,stats_reset}',
   prosrc => 'pg_stat_get_my_io' },
 
+{ oid => '8896', descr => 'statistics: per backend type IO statistics',
+  proname => 'pg_stat_get_backend_io', prorows => '30', proretset => 't',
+  provolatile => 'v', proparallel => 'r', prorettype => 'record',
+  proargtypes => 'int4',
+  proallargtypes => '{int4,text,text,text,int8,float8,int8,float8,int8,float8,int8,float8,int8,int8,int8,int8,int8,float8,timestamptz}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{backend_pid,backend_type,object,context,reads,read_time,writes,write_time,writebacks,writeback_time,extends,extend_time,op_bytes,hits,evictions,reuses,fsyncs,fsync_time,stats_reset}',
+  prosrc => 'pg_stat_get_backend_io' },
+
 { oid => '1136', descr => 'statistics: information about WAL activity',
   proname => 'pg_stat_get_wal', proisstrict => 'f', provolatile => 's',
   proparallel => 'r', prorettype => 'record', proargtypes => '',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 9d63b1a5b7..2a2b4a2947 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -570,6 +570,7 @@ extern void pgstat_count_io_op_time(IOObject io_object, IOContext io_context,
 
 extern PgStat_IO *pgstat_fetch_global_stat_io(void);
 extern PgStat_Backend_IO *pgstat_fetch_my_stat_io(void);
+extern PgStat_Backend_IO *pgstat_fetch_proc_stat_io(ProcNumber procNumber);
 extern const char *pgstat_get_io_context_name(IOContext io_context);
 extern const char *pgstat_get_io_object_name(IOObject io_object);
 
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index c489e528e0..aced015c2f 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1263,12 +1263,18 @@ SELECT sum(extends) AS io_sum_shared_before_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(extends) AS my_io_sum_shared_before_extends
   FROM pg_my_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT sum(extends) AS backend_io_sum_shared_before_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_stat_io
   WHERE object = 'relation' \gset io_sum_shared_before_
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_my_stat_io
   WHERE object = 'relation' \gset my_io_sum_shared_before_
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE object = 'relation' \gset backend_io_sum_shared_before_
 CREATE TABLE test_io_shared(a int);
 INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
 SELECT pg_stat_force_next_flush();
@@ -1293,6 +1299,15 @@ SELECT :my_io_sum_shared_after_extends > :my_io_sum_shared_before_extends;
  t
 (1 row)
 
+SELECT sum(extends) AS backend_io_sum_shared_after_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
+SELECT :backend_io_sum_shared_after_extends > :backend_io_sum_shared_before_extends;
+ ?column? 
+----------
+ t
+(1 row)
+
 -- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
 -- and fsyncs in the global stats (not for the backend).
 -- See comment above for rationale for two explicit CHECKPOINTs.
@@ -1553,6 +1568,8 @@ SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) +
   FROM pg_stat_io \gset
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_pre_reset
   FROM pg_my_stat_io \gset
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS backend_io_stats_pre_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
 SELECT pg_stat_reset_shared('io');
  pg_stat_reset_shared 
 ----------------------
@@ -1575,6 +1592,14 @@ SELECT :my_io_stats_post_reset < :my_io_stats_pre_reset;
  t
 (1 row)
 
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS backend_io_stats_post_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
+SELECT :backend_io_stats_post_reset < :backend_io_stats_pre_reset;
+ ?column? 
+----------
+ t
+(1 row)
+
 -- test BRIN index doesn't block HOT update
 CREATE TABLE brin_hot (
   id  integer PRIMARY KEY,
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index c95cb71652..d05009e1f5 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -611,12 +611,18 @@ SELECT sum(extends) AS io_sum_shared_before_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(extends) AS my_io_sum_shared_before_extends
   FROM pg_my_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT sum(extends) AS backend_io_sum_shared_before_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_stat_io
   WHERE object = 'relation' \gset io_sum_shared_before_
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_my_stat_io
   WHERE object = 'relation' \gset my_io_sum_shared_before_
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE object = 'relation' \gset backend_io_sum_shared_before_
 CREATE TABLE test_io_shared(a int);
 INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
 SELECT pg_stat_force_next_flush();
@@ -626,6 +632,10 @@ SELECT :io_sum_shared_after_extends > :io_sum_shared_before_extends;
 SELECT sum(extends) AS my_io_sum_shared_after_extends
   FROM pg_my_stat_io WHERE context = 'normal' AND object = 'relation' \gset
 SELECT :my_io_sum_shared_after_extends > :my_io_sum_shared_before_extends;
+SELECT sum(extends) AS backend_io_sum_shared_after_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
+SELECT :backend_io_sum_shared_after_extends > :backend_io_sum_shared_before_extends;
 
 -- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
 -- and fsyncs in the global stats (not for the backend).
@@ -778,6 +788,8 @@ SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) +
   FROM pg_stat_io \gset
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_pre_reset
   FROM pg_my_stat_io \gset
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS backend_io_stats_pre_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
 SELECT pg_stat_reset_shared('io');
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_post_reset
   FROM pg_stat_io \gset
@@ -785,7 +797,9 @@ SELECT :io_stats_post_reset < :io_stats_pre_reset;
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_reset
   FROM pg_my_stat_io \gset
 SELECT :my_io_stats_post_reset < :my_io_stats_pre_reset;
-
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS backend_io_stats_post_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
+SELECT :backend_io_stats_post_reset < :backend_io_stats_pre_reset;
 
 -- test BRIN index doesn't block HOT update
 CREATE TABLE brin_hot (
-- 
2.34.1

#10Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Bertrand Drouvot (#8)
Re: per backend I/O statistics

Hi,

On Fri, Sep 06, 2024 at 03:03:17PM +0000, Bertrand Drouvot wrote:

> The struct size is 1040 Bytes and that's much more reasonable than the size
> needed for per backend I/O stats in v1 (about 16KB).

One could think that this results in a large increase in shared memory usage with
a high number of connections (since it is implemented as "fixed amount" stats
sized according to the maximum number of backends).

So out of curiosity, I did some measurements with:

dynamic_shared_memory_type=sysv
shared_memory_type=sysv
max_connections=20000

On my lab machine, ipcs -m gives me:

- on master

key shmid owner perms bytes nattch status
0x00113a04 51347487 postgres 600 1149394944 6
0x4bc9f2fa 51347488 postgres 600 4006976 6
0x46790800 51347489 postgres 600 1048576 2

- with v3 applied

key shmid owner perms bytes nattch status
0x00113a04 51347477 postgres 600 1170227200 6
0x08e04b0a 51347478 postgres 600 4006976 6
0x74688c9c 51347479 postgres 600 1048576 2

So, with 20000 max_connections (not advocating that this is a reasonable number
in all cases), that's a 1170227200 - 1149394944 = ~20 MB increase of shared
memory (as expected with 20K max_connections and the new struct size of 1040 bytes)
compared to master, which uses about 1096 MB. That means v3 produces about a 2%
shared memory increase with 20000 max_connections.

Out of curiosity, with max_connections=100000:

- on master:

------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x00113a04 52953134 postgres 600 5053915136 6
0x37abf5ce 52953135 postgres 600 20006976 6
0x71c07d5c 52953136 postgres 600 1048576 2

- with v3:

------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x00113a04 52953144 postgres 600 5157945344 6
0x7afb24de 52953145 postgres 600 20006976 6
0x2695af58 52953146 postgres 600 1048576 2

So, with 100000 max_connections (not advocating that this is a reasonable number
in all cases), that's a 5157945344 - 5053915136 = ~100 MB increase of shared
memory (as expected with 100K max_connections and the new struct size of 1040 bytes)
compared to master, which uses about 4800 MB. That means v3 produces about a 2%
shared memory increase with 100000 max_connections.
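
As a quick sanity check of that arithmetic (roughly one 1040-byte
PgStat_Backend_IO slot per backend slot, ignoring the handful of auxiliary
process slots), a back-of-the-envelope query along these lines (illustration
only, not part of the patch):

SELECT conns,
       pg_size_pretty(conns * 1040::bigint) AS expected_increase
FROM (VALUES (20000), (100000)) AS t(conns);

comes out to roughly 20 MB and 100 MB respectively, matching the deltas
measured above.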

Then, based on those numbers, I think that the "fixed amount" approach is a good
one as 1/ the shared memory increase is "relatively" small and 2/ it most
probably provides performance benefits compared to a "non fixed" approach.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#11Nazir Bilal Yavuz
byavuz81@gmail.com
In reply to: Bertrand Drouvot (#8)
Re: per backend I/O statistics

Hi,

Thanks for working on this!

Your patch applies and builds cleanly.

On Fri, 6 Sept 2024 at 18:03, Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

> - As stated up-thread, the pg_stat_get_backend_io() behaves as if
> stats_fetch_consistency is set to none (each execution re-fetches counters
> from shared memory). Indeed, the snapshot is not build in each backend to copy
> all the others backends stats, as 1/ there is no use case (there is no need to
> get others backends I/O statistics while taking care of the stats_fetch_consistency)
> and 2/ that could be memory expensive depending of the number of max connections.

I believe this is the correct approach.

I manually tested your patches, and they work as expected. Here is
some feedback:

- The stats_reset column is NULL in both pg_my_stat_io and
pg_stat_get_backend_io() until the first call to reset io statistics.
While I'm not sure if it's necessary, it might be useful to display
the more recent of the two times in the stats_reset column: the
statistics reset time or the backend creation time.

- The pgstat_reset_io_counter_internal() is called in the
pgstat_shutdown_hook(). This causes the stats_reset column to show the
termination time of the old backend when its proc num is reassigned to
a new backend.

--
Regards,
Nazir Bilal Yavuz
Microsoft

#12Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Nazir Bilal Yavuz (#11)
2 attachment(s)
Re: per backend I/O statistics

Hi,

On Fri, Sep 13, 2024 at 04:45:08PM +0300, Nazir Bilal Yavuz wrote:

> Hi,
>
> Thanks for working on this!
>
> Your patch applies and builds cleanly.

Thanks for looking at it!

> On Fri, 6 Sept 2024 at 18:03, Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
>
> > - As stated up-thread, the pg_stat_get_backend_io() behaves as if
> > stats_fetch_consistency is set to none (each execution re-fetches counters
> > from shared memory). Indeed, the snapshot is not build in each backend to copy
> > all the others backends stats, as 1/ there is no use case (there is no need to
> > get others backends I/O statistics while taking care of the stats_fetch_consistency)
> > and 2/ that could be memory expensive depending of the number of max connections.
>
> I believe this is the correct approach.

Thanks for sharing your thoughts.

> I manually tested your patches, and they work as expected. Here is
> some feedback:
>
> - The stats_reset column is NULL in both pg_my_stat_io and
> pg_stat_get_backend_io() until the first call to reset io statistics.
> While I'm not sure if it's necessary, it might be useful to display
> the more recent of the two times in the stats_reset column: the
> statistics reset time or the backend creation time.

I'm not sure about that as one can already get the backend "creation time"
through pg_stat_activity.backend_start.
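
For instance (purely illustrative):

SELECT pid, backend_type, backend_start
  FROM pg_stat_activity
 WHERE pid = pg_backend_pid();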

> - The pgstat_reset_io_counter_internal() is called in the
> pgstat_shutdown_hook(). This causes the stats_reset column to show the
> termination time of the old backend when its proc num is reassigned to
> a new backend.

doh! Nice catch, thanks!

Also, new backends that are not re-using a previous "existing" process slot
get the last reset time (if any). So the right place to fix this is in
pgstat_io_init_backend_cb(): done in v4 attached. v4 also sets the reset time to
0 in pgstat_shutdown_hook() (not strictly necessary, but it keeps things
"conceptually" right).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v4-0001-per-backend-I-O-statistics.patch (text/x-diff; charset=us-ascii)
From 2bc490060ce8ed6557ca5a586f36cd43e7480d8e Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Thu, 22 Aug 2024 15:16:50 +0000
Subject: [PATCH v4 1/2] per backend I/O statistics

While pg_stat_io provides cluster-wide I/O statistics, this commit adds a new
pg_my_stat_io view to display "my" backend I/O statistics. The KIND_IO stats
are still "fixed amount" ones as the maximum number of backend is fixed.

The statistics snapshot is made for the global stats (the aggregated ones) and
for my backend stats. The snapshot is not build in each backend to copy all
the others backends stats, as 1/ there is no use case (there is no need to get
others backends I/O statistics while taking care of the stats_fetch_consistency)
and 2/ that could be memory expensive depending of the number of max connections.

A subsequent commit will add a new pg_stat_get_backend_io() function to be
able to retrieve the I/O statistics for a given backend pid (each execution
re-fetches counters from shared memory because as stated above there is no
snapshots being created in each backend to copy the other backends stats).

XXX: Bump catalog version needs to be done.
---
 doc/src/sgml/config.sgml                  |   4 +-
 doc/src/sgml/monitoring.sgml              |  28 ++++++
 src/backend/catalog/system_views.sql      |  22 +++++
 src/backend/utils/activity/pgstat.c       |  11 ++-
 src/backend/utils/activity/pgstat_io.c    |  93 +++++++++++++++---
 src/backend/utils/activity/pgstat_shmem.c |   4 +-
 src/backend/utils/adt/pgstatfuncs.c       | 114 +++++++++++++++++++++-
 src/include/catalog/pg_proc.dat           |   9 ++
 src/include/pgstat.h                      |  14 ++-
 src/include/utils/pgstat_internal.h       |  14 ++-
 src/test/regress/expected/rules.out       |  19 ++++
 src/test/regress/expected/stats.out       |  44 ++++++++-
 src/test/regress/sql/stats.sql            |  25 ++++-
 13 files changed, 368 insertions(+), 33 deletions(-)
   8.4% doc/src/sgml/
  29.9% src/backend/utils/activity/
  23.7% src/backend/utils/adt/
   4.4% src/include/catalog/
   3.1% src/include/utils/
   3.1% src/include/
  14.3% src/test/regress/expected/
   9.9% src/test/regress/sql/

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 0aec11f443..2a59b97093 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8331,7 +8331,9 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
         displayed in <link linkend="monitoring-pg-stat-database-view">
         <structname>pg_stat_database</structname></link>,
         <link linkend="monitoring-pg-stat-io-view">
-        <structname>pg_stat_io</structname></link>, in the output of
+        <structname>pg_stat_io</structname></link>,
+        <link linkend="monitoring-pg-my-stat-io-view">
+        <structname>pg_my_stat_io</structname></link>, in the output of
         <xref linkend="sql-explain"/> when the <literal>BUFFERS</literal> option
         is used, in the output of <xref linkend="sql-vacuum"/> when
         the <literal>VERBOSE</literal> option is used, by autovacuum
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 933de6fe07..b28ca4e0ca 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -488,6 +488,16 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
      </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_my_stat_io</structname><indexterm><primary>pg_my_stat_io</primary></indexterm></entry>
+      <entry>
+       One row for each combination of context and target object containing
+       my backend I/O statistics.
+       See <link linkend="monitoring-pg-my-stat-io-view">
+       <structname>pg_my_stat_io</structname></link> for details.
+     </entry>
+     </row>
+
      <row>
       <entry><structname>pg_stat_replication_slots</structname><indexterm><primary>pg_stat_replication_slots</primary></indexterm></entry>
       <entry>One row per replication slot, showing statistics about the
@@ -2948,6 +2958,24 @@ description | Waiting for a newly initialized WAL file to reach durable storage
 
 
 
+ </sect2>
+
+ <sect2 id="monitoring-pg-my-stat-io-view">
+  <title><structname>pg_my_stat_io</structname></title>
+
+  <indexterm>
+   <primary>pg_my_stat_io</primary>
+  </indexterm>
+
+  <para>
+   The <structname>pg_my_stat_io</structname> view will contain one row for each
+   combination of target I/O object and I/O context, showing
+   my backend I/O statistics. Combinations which do not make sense are
+   omitted. The fields are exactly the same as the ones in the <link
+   linkend="monitoring-pg-stat-io-view"> <structname>pg_stat_io</structname></link>
+   view.
+  </para>
+
  </sect2>
 
  <sect2 id="monitoring-pg-stat-bgwriter-view">
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 7fd5d256a1..b4939b7bc6 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1168,6 +1168,28 @@ SELECT
        b.stats_reset
 FROM pg_stat_get_io() b;
 
+CREATE VIEW pg_my_stat_io AS
+SELECT
+       b.backend_type,
+       b.object,
+       b.context,
+       b.reads,
+       b.read_time,
+       b.writes,
+       b.write_time,
+       b.writebacks,
+       b.writeback_time,
+       b.extends,
+       b.extend_time,
+       b.op_bytes,
+       b.hits,
+       b.evictions,
+       b.reuses,
+       b.fsyncs,
+       b.fsync_time,
+       b.stats_reset
+FROM pg_stat_get_my_io() b;
+
 CREATE VIEW pg_stat_wal AS
     SELECT
         w.wal_records,
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index a7f2dfc744..fda4ca9a4a 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -406,10 +406,12 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 
 		.fixed_amount = true,
 
-		.snapshot_ctl_off = offsetof(PgStat_Snapshot, io),
+		.init_backend_cb = pgstat_io_init_backend_cb,
+		.snapshot_ctl_off = offsetof(PgStat_Snapshot, global_io),
 		.shared_ctl_off = offsetof(PgStat_ShmemControl, io),
-		.shared_data_off = offsetof(PgStatShared_IO, stats),
-		.shared_data_len = sizeof(((PgStatShared_IO *) 0)->stats),
+		/* [de-]serialize global_stats only */
+		.shared_data_off = offsetof(PgStatShared_IO, global_stats),
+		.shared_data_len = sizeof(((PgStatShared_IO *) 0)->global_stats),
 
 		.flush_fixed_cb = pgstat_io_flush_cb,
 		.have_fixed_pending_cb = pgstat_io_have_pending_cb,
@@ -587,6 +589,9 @@ pgstat_shutdown_hook(int code, Datum arg)
 
 	pgstat_report_stat(true);
 
+	/* reset the pgstat_io counter related to this proc number */
+	pgstat_reset_io_counter_internal(MyProcNumber, 0);
+
 	/* there shouldn't be any pending changes left */
 	Assert(dlist_is_empty(&pgStatPending));
 	dlist_init(&pgStatPending);
diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index cc2ffc78aa..d32121fc1f 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -31,6 +31,7 @@ typedef struct PgStat_PendingIO
 static PgStat_PendingIO PendingIOStats;
 static bool have_iostats = false;
 
+void		pgstat_reset_io_counter_internal(int index, TimestampTz ts);
 
 /*
  * Check that stats have not been counted for any combination of IOObject,
@@ -154,11 +155,19 @@ pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op,
 }
 
 PgStat_IO *
-pgstat_fetch_stat_io(void)
+pgstat_fetch_global_stat_io(void)
 {
 	pgstat_snapshot_fixed(PGSTAT_KIND_IO);
 
-	return &pgStatLocal.snapshot.io;
+	return &pgStatLocal.snapshot.global_io;
+}
+
+PgStat_Backend_IO *
+pgstat_fetch_my_stat_io(void)
+{
+	pgstat_snapshot_fixed(PGSTAT_KIND_IO);
+
+	return &pgStatLocal.snapshot.my_io;
 }
 
 /*
@@ -192,13 +201,16 @@ pgstat_io_flush_cb(bool nowait)
 {
 	LWLock	   *bktype_lock;
 	PgStat_BktypeIO *bktype_shstats;
+	PgStat_BktypeIO *global_bktype_shstats;
 
 	if (!have_iostats)
 		return false;
 
 	bktype_lock = &pgStatLocal.shmem->io.locks[MyBackendType];
 	bktype_shstats =
-		&pgStatLocal.shmem->io.stats.stats[MyBackendType];
+		&pgStatLocal.shmem->io.stat[MyProcNumber].stats;
+	global_bktype_shstats =
+		&pgStatLocal.shmem->io.global_stats.stats[MyBackendType];
 
 	if (!nowait)
 		LWLockAcquire(bktype_lock, LW_EXCLUSIVE);
@@ -212,19 +224,28 @@ pgstat_io_flush_cb(bool nowait)
 			for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
 			{
 				instr_time	time;
+				PgStat_Counter counter;
+
+				counter = PendingIOStats.counts[io_object][io_context][io_op];
 
-				bktype_shstats->counts[io_object][io_context][io_op] +=
-					PendingIOStats.counts[io_object][io_context][io_op];
+				bktype_shstats->counts[io_object][io_context][io_op] += counter;
+
+				global_bktype_shstats->counts[io_object][io_context][io_op] +=
+					counter;
 
 				time = PendingIOStats.pending_times[io_object][io_context][io_op];
 
 				bktype_shstats->times[io_object][io_context][io_op] +=
 					INSTR_TIME_GET_MICROSEC(time);
+
+				global_bktype_shstats->times[io_object][io_context][io_op] +=
+					INSTR_TIME_GET_MICROSEC(time);
 			}
 		}
 	}
 
 	Assert(pgstat_bktype_io_stats_valid(bktype_shstats, MyBackendType));
+	Assert(pgstat_bktype_io_stats_valid(global_bktype_shstats, MyBackendType));
 
 	LWLockRelease(bktype_lock);
 
@@ -278,13 +299,35 @@ pgstat_io_init_shmem_cb(void *stats)
 		LWLockInitialize(&stat_shmem->locks[i], LWTRANCHE_PGSTATS_DATA);
 }
 
+void
+pgstat_reset_io_counter_internal(int index, TimestampTz ts)
+{
+	LWLock	   *bktype_lock = &pgStatLocal.shmem->io.locks[0];
+	PgStat_BktypeIO *bktype_shstats = &pgStatLocal.shmem->io.stat[index].stats;
+
+
+	/*
+	 * Use the lock in the first BackendType's PgStat_BktypeIO to protect the
+	 * reset timestamp as well.
+	 */
+	LWLockAcquire(bktype_lock, LW_EXCLUSIVE);
+
+	pgStatLocal.shmem->io.stat[index].stat_reset_timestamp = ts;
+
+	memset(bktype_shstats, 0, sizeof(*bktype_shstats));
+	LWLockRelease(bktype_lock);
+}
+
 void
 pgstat_io_reset_all_cb(TimestampTz ts)
 {
+	for (int i = 0; i < NumProcStatSlots; i++)
+		pgstat_reset_io_counter_internal(i, ts);
+
 	for (int i = 0; i < BACKEND_NUM_TYPES; i++)
 	{
 		LWLock	   *bktype_lock = &pgStatLocal.shmem->io.locks[i];
-		PgStat_BktypeIO *bktype_shstats = &pgStatLocal.shmem->io.stats.stats[i];
+		PgStat_BktypeIO *bktype_shstats = &pgStatLocal.shmem->io.global_stats.stats[i];
 
 		LWLockAcquire(bktype_lock, LW_EXCLUSIVE);
 
@@ -293,7 +336,7 @@ pgstat_io_reset_all_cb(TimestampTz ts)
 		 * the reset timestamp as well.
 		 */
 		if (i == 0)
-			pgStatLocal.shmem->io.stats.stat_reset_timestamp = ts;
+			pgStatLocal.shmem->io.global_stats.stat_reset_timestamp = ts;
 
 		memset(bktype_shstats, 0, sizeof(*bktype_shstats));
 		LWLockRelease(bktype_lock);
@@ -303,11 +346,28 @@ pgstat_io_reset_all_cb(TimestampTz ts)
 void
 pgstat_io_snapshot_cb(void)
 {
+	/* First, our own stats */
+	PgStat_BktypeIO *bktype_shstats = &pgStatLocal.shmem->io.stat[MyProcNumber].stats;
+	PgStat_BktypeIO *bktype_snap = &pgStatLocal.snapshot.my_io.stats;
+	LWLock	   *mybktype_lock = &pgStatLocal.shmem->io.locks[0];
+
+	LWLockAcquire(mybktype_lock, LW_SHARED);
+
+	pgStatLocal.snapshot.my_io.stat_reset_timestamp =
+		pgStatLocal.shmem->io.stat[MyProcNumber].stat_reset_timestamp;
+
+	pgStatLocal.snapshot.my_io.bktype =
+		pgStatLocal.shmem->io.stat[MyProcNumber].bktype;
+
+	*bktype_snap = *bktype_shstats;
+	LWLockRelease(mybktype_lock);
+
+	/* Now, the global stats */
 	for (int i = 0; i < BACKEND_NUM_TYPES; i++)
 	{
 		LWLock	   *bktype_lock = &pgStatLocal.shmem->io.locks[i];
-		PgStat_BktypeIO *bktype_shstats = &pgStatLocal.shmem->io.stats.stats[i];
-		PgStat_BktypeIO *bktype_snap = &pgStatLocal.snapshot.io.stats[i];
+		PgStat_BktypeIO *bktype_global_shstats = &pgStatLocal.shmem->io.global_stats.stats[i];
+		PgStat_BktypeIO *bktype_global_snap = &pgStatLocal.snapshot.global_io.stats[i];
 
 		LWLockAcquire(bktype_lock, LW_SHARED);
 
@@ -316,11 +376,11 @@ pgstat_io_snapshot_cb(void)
 		 * the reset timestamp as well.
 		 */
 		if (i == 0)
-			pgStatLocal.snapshot.io.stat_reset_timestamp =
-				pgStatLocal.shmem->io.stats.stat_reset_timestamp;
+			pgStatLocal.snapshot.global_io.stat_reset_timestamp =
+				pgStatLocal.shmem->io.global_stats.stat_reset_timestamp;
 
 		/* using struct assignment due to better type safety */
-		*bktype_snap = *bktype_shstats;
+		*bktype_global_snap = *bktype_global_shstats;
 		LWLockRelease(bktype_lock);
 	}
 }
@@ -503,3 +563,12 @@ pgstat_tracks_io_op(BackendType bktype, IOObject io_object,
 
 	return true;
 }
+
+void
+pgstat_io_init_backend_cb(void)
+{
+	/* Initialize our backend type in the IO statistics */
+	pgStatLocal.shmem->io.stat[MyProcNumber].bktype = MyBackendType;
+	/* Set the stat_reset_timestamp to zero */
+	pgStatLocal.shmem->io.stat[MyProcNumber].stat_reset_timestamp = 0;
+}
diff --git a/src/backend/utils/activity/pgstat_shmem.c b/src/backend/utils/activity/pgstat_shmem.c
index ec93bf6902..3cfcba9609 100644
--- a/src/backend/utils/activity/pgstat_shmem.c
+++ b/src/backend/utils/activity/pgstat_shmem.c
@@ -128,7 +128,7 @@ StatsShmemSize(void)
 {
 	Size		sz;
 
-	sz = MAXALIGN(sizeof(PgStat_ShmemControl));
+	sz = MAXALIGN(sizeof(PgStat_ShmemControl) + mul_size(sizeof(PgStat_Backend_IO), NumProcStatSlots));
 	sz = add_size(sz, pgstat_dsa_init_size());
 
 	/* Add shared memory for all the custom fixed-numbered statistics */
@@ -172,7 +172,7 @@ StatsShmemInit(void)
 		Assert(!found);
 
 		/* the allocation of pgStatLocal.shmem itself */
-		p += MAXALIGN(sizeof(PgStat_ShmemControl));
+		p += MAXALIGN(sizeof(PgStat_ShmemControl) + mul_size(sizeof(PgStat_Backend_IO), NumProcStatSlots));
 
 		/*
 		 * Create a small dsa allocation in plain shared memory. This is
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 33c7b25560..4c00570396 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1363,14 +1363,17 @@ pg_stat_get_io(PG_FUNCTION_ARGS)
 	InitMaterializedSRF(fcinfo, 0);
 	rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
 
-	backends_io_stats = pgstat_fetch_stat_io();
+	backends_io_stats = pgstat_fetch_global_stat_io();
 
 	reset_time = TimestampTzGetDatum(backends_io_stats->stat_reset_timestamp);
 
 	for (int bktype = 0; bktype < BACKEND_NUM_TYPES; bktype++)
 	{
-		Datum		bktype_desc = CStringGetTextDatum(GetBackendTypeDesc(bktype));
-		PgStat_BktypeIO *bktype_stats = &backends_io_stats->stats[bktype];
+		Datum		bktype_desc;
+		PgStat_BktypeIO *bktype_stats;
+
+		bktype_desc = CStringGetTextDatum(GetBackendTypeDesc(bktype));
+		bktype_stats = &backends_io_stats->stats[bktype];
 
 		/*
 		 * In Assert builds, we can afford an extra loop through all of the
@@ -1462,6 +1465,111 @@ pg_stat_get_io(PG_FUNCTION_ARGS)
 	return (Datum) 0;
 }
 
+Datum
+pg_stat_get_my_io(PG_FUNCTION_ARGS)
+{
+	ReturnSetInfo *rsinfo;
+	PgStat_Backend_IO *backend_io_stats;
+	Datum		bktype_desc;
+	PgStat_BktypeIO *bktype_stats;
+	BackendType bktype;
+	Datum		reset_time;
+
+	InitMaterializedSRF(fcinfo, 0);
+	rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+	backend_io_stats = pgstat_fetch_my_stat_io();
+
+	bktype = backend_io_stats->bktype;
+	bktype_desc = CStringGetTextDatum(GetBackendTypeDesc(bktype));
+	bktype_stats = &backend_io_stats->stats;
+	reset_time = TimestampTzGetDatum(backend_io_stats->stat_reset_timestamp);
+
+	/*
+	 * In Assert builds, we can afford an extra loop through all of the
+	 * counters checking that only expected stats are non-zero, since it keeps
+	 * the non-Assert code cleaner.
+	 */
+	Assert(pgstat_bktype_io_stats_valid(bktype_stats, bktype));
+
+	for (int io_obj = 0; io_obj < IOOBJECT_NUM_TYPES; io_obj++)
+	{
+		const char *obj_name = pgstat_get_io_object_name(io_obj);
+
+		for (int io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
+		{
+			const char *context_name = pgstat_get_io_context_name(io_context);
+
+			Datum		values[IO_NUM_COLUMNS] = {0};
+			bool		nulls[IO_NUM_COLUMNS] = {0};
+
+			/*
+			 * Some combinations of BackendType, IOObject, and IOContext are
+			 * not valid for any type of IOOp. In such cases, omit the entire
+			 * row from the view.
+			 */
+			if (!pgstat_tracks_io_object(bktype, io_obj, io_context))
+				continue;
+
+			values[IO_COL_BACKEND_TYPE] = bktype_desc;
+			values[IO_COL_CONTEXT] = CStringGetTextDatum(context_name);
+			values[IO_COL_OBJECT] = CStringGetTextDatum(obj_name);
+			if (backend_io_stats->stat_reset_timestamp != 0)
+				values[IO_COL_RESET_TIME] = reset_time;
+			else
+				nulls[IO_COL_RESET_TIME] = true;
+
+			/*
+			 * Hard-code this to the value of BLCKSZ for now. Future values
+			 * could include XLOG_BLCKSZ, once WAL IO is tracked, and constant
+			 * multipliers, once non-block-oriented IO (e.g. temporary file
+			 * IO) is tracked.
+			 */
+			values[IO_COL_CONVERSION] = Int64GetDatum(BLCKSZ);
+
+			for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
+			{
+				int			op_idx = pgstat_get_io_op_index(io_op);
+				int			time_idx = pgstat_get_io_time_index(io_op);
+
+				/*
+				 * Some combinations of BackendType and IOOp, of IOContext and
+				 * IOOp, and of IOObject and IOOp are not tracked. Set these
+				 * cells in the view NULL.
+				 */
+				if (pgstat_tracks_io_op(bktype, io_obj, io_context, io_op))
+				{
+					PgStat_Counter count =
+						bktype_stats->counts[io_obj][io_context][io_op];
+
+					values[op_idx] = Int64GetDatum(count);
+				}
+				else
+					nulls[op_idx] = true;
+
+				/* not every operation is timed */
+				if (time_idx == IO_COL_INVALID)
+					continue;
+
+				if (!nulls[op_idx])
+				{
+					PgStat_Counter time =
+						bktype_stats->times[io_obj][io_context][io_op];
+
+					values[time_idx] = Float8GetDatum(pg_stat_us_to_ms(time));
+				}
+				else
+					nulls[time_idx] = true;
+			}
+
+			tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+								 values, nulls);
+		}
+	}
+
+	return (Datum) 0;
+}
+
 /*
  * Returns statistics of WAL activity
  */
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 53a081ed88..4d5af3d7cf 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5880,6 +5880,15 @@
   proargnames => '{backend_type,object,context,reads,read_time,writes,write_time,writebacks,writeback_time,extends,extend_time,op_bytes,hits,evictions,reuses,fsyncs,fsync_time,stats_reset}',
   prosrc => 'pg_stat_get_io' },
 
+{ oid => '8806', descr => 'statistics: My backend IO statistics',
+  proname => 'pg_stat_get_my_io', prorows => '5', proretset => 't',
+  provolatile => 'v', proparallel => 'r', prorettype => 'record',
+  proargtypes => '',
+  proallargtypes => '{text,text,text,int8,float8,int8,float8,int8,float8,int8,float8,int8,int8,int8,int8,int8,float8,timestamptz}',
+  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{backend_type,object,context,reads,read_time,writes,write_time,writebacks,writeback_time,extends,extend_time,op_bytes,hits,evictions,reuses,fsyncs,fsync_time,stats_reset}',
+  prosrc => 'pg_stat_get_my_io' },
+
 { oid => '1136', descr => 'statistics: information about WAL activity',
   proname => 'pg_stat_get_wal', proisstrict => 'f', provolatile => 's',
   proparallel => 'r', prorettype => 'record', proargtypes => '',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index be2c91168a..9d63b1a5b7 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -16,6 +16,7 @@
 #include "portability/instr_time.h"
 #include "postmaster/pgarch.h"	/* for MAX_XFN_CHARS */
 #include "replication/conflict.h"
+#include "storage/proc.h"
 #include "utils/backend_progress.h" /* for backward compatibility */
 #include "utils/backend_status.h"	/* for backward compatibility */
 #include "utils/relcache.h"
@@ -76,6 +77,8 @@
  */
 #define PGSTAT_KIND_EXPERIMENTAL	128
 
+#define  NumProcStatSlots (MaxBackends + NUM_AUXILIARY_PROCS)
+
 static inline bool
 pgstat_is_kind_builtin(PgStat_Kind kind)
 {
@@ -351,6 +354,12 @@ typedef struct PgStat_IO
 	PgStat_BktypeIO stats[BACKEND_NUM_TYPES];
 } PgStat_IO;
 
+typedef struct PgStat_Backend_IO
+{
+	TimestampTz stat_reset_timestamp;
+	BackendType bktype;
+	PgStat_BktypeIO stats;
+} PgStat_Backend_IO;
 
 typedef struct PgStat_StatDBEntry
 {
@@ -559,7 +568,8 @@ extern instr_time pgstat_prepare_io_time(bool track_io_guc);
 extern void pgstat_count_io_op_time(IOObject io_object, IOContext io_context,
 									IOOp io_op, instr_time start_time, uint32 cnt);
 
-extern PgStat_IO *pgstat_fetch_stat_io(void);
+extern PgStat_IO *pgstat_fetch_global_stat_io(void);
+extern PgStat_Backend_IO *pgstat_fetch_my_stat_io(void);
 extern const char *pgstat_get_io_context_name(IOContext io_context);
 extern const char *pgstat_get_io_object_name(IOObject io_object);
 
@@ -568,7 +578,7 @@ extern bool pgstat_tracks_io_object(BackendType bktype,
 									IOObject io_object, IOContext io_context);
 extern bool pgstat_tracks_io_op(BackendType bktype, IOObject io_object,
 								IOContext io_context, IOOp io_op);
-
+extern void pgstat_reset_io_counter_internal(int index, TimestampTz ts);
 
 /*
  * Functions in pgstat_database.c
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index bba90e898d..ea4cc6a2c6 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -366,11 +366,12 @@ typedef struct PgStatShared_Checkpointer
 typedef struct PgStatShared_IO
 {
 	/*
-	 * locks[i] protects stats.stats[i]. locks[0] also protects
-	 * stats.stat_reset_timestamp.
+	 * locks[i] protects global_stats.stats[i]. locks[0] also protects
+	 * global_stats.stat_reset_timestamp.
 	 */
 	LWLock		locks[BACKEND_NUM_TYPES];
-	PgStat_IO	stats;
+	PgStat_IO	global_stats;
+	PgStat_Backend_IO	stat[FLEXIBLE_ARRAY_MEMBER];
 } PgStatShared_IO;
 
 typedef struct PgStatShared_SLRU
@@ -463,7 +464,6 @@ typedef struct PgStat_ShmemControl
 	PgStatShared_Archiver archiver;
 	PgStatShared_BgWriter bgwriter;
 	PgStatShared_Checkpointer checkpointer;
-	PgStatShared_IO io;
 	PgStatShared_SLRU slru;
 	PgStatShared_Wal wal;
 
@@ -473,6 +473,8 @@ typedef struct PgStat_ShmemControl
 	 */
 	void	   *custom_data[PGSTAT_KIND_CUSTOM_SIZE];
 
+	/* has to be at the end due to FLEXIBLE_ARRAY_MEMBER */
+	PgStatShared_IO io;
 } PgStat_ShmemControl;
 
 
@@ -494,7 +496,8 @@ typedef struct PgStat_Snapshot
 
 	PgStat_CheckpointerStats checkpointer;
 
-	PgStat_IO	io;
+	PgStat_Backend_IO	my_io;
+	PgStat_IO	global_io;
 
 	PgStat_SLRUStats slru[SLRU_NUM_ELEMENTS];
 
@@ -629,6 +632,7 @@ extern bool pgstat_io_flush_cb(bool nowait);
 extern void pgstat_io_init_shmem_cb(void *stats);
 extern void pgstat_io_reset_all_cb(TimestampTz ts);
 extern void pgstat_io_snapshot_cb(void);
+extern void pgstat_io_init_backend_cb(void);
 
 
 /*
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index a1626f3fae..785298d7b0 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1398,6 +1398,25 @@ pg_matviews| SELECT n.nspname AS schemaname,
      LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
      LEFT JOIN pg_tablespace t ON ((t.oid = c.reltablespace)))
   WHERE (c.relkind = 'm'::"char");
+pg_my_stat_io| SELECT backend_type,
+    object,
+    context,
+    reads,
+    read_time,
+    writes,
+    write_time,
+    writebacks,
+    writeback_time,
+    extends,
+    extend_time,
+    op_bytes,
+    hits,
+    evictions,
+    reuses,
+    fsyncs,
+    fsync_time,
+    stats_reset
+   FROM pg_stat_get_my_io() b(backend_type, object, context, reads, read_time, writes, write_time, writebacks, writeback_time, extends, extend_time, op_bytes, hits, evictions, reuses, fsyncs, fsync_time, stats_reset);
 pg_policies| SELECT n.nspname AS schemaname,
     c.relname AS tablename,
     pol.polname AS policyname,
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 6e08898b18..c489e528e0 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1249,7 +1249,7 @@ SELECT pg_stat_get_subscription_stats(NULL);
  
 (1 row)
 
--- Test that the following operations are tracked in pg_stat_io:
+-- Test that the following operations are tracked in pg_[my]_stat_io:
 -- - reads of target blocks into shared buffers
 -- - writes of shared buffers to permanent storage
 -- - extends of relations using shared buffers
@@ -1261,9 +1261,14 @@ SELECT pg_stat_get_subscription_stats(NULL);
 -- extends.
 SELECT sum(extends) AS io_sum_shared_before_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT sum(extends) AS my_io_sum_shared_before_extends
+  FROM pg_my_stat_io WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_stat_io
   WHERE object = 'relation' \gset io_sum_shared_before_
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_my_stat_io
+  WHERE object = 'relation' \gset my_io_sum_shared_before_
 CREATE TABLE test_io_shared(a int);
 INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
 SELECT pg_stat_force_next_flush();
@@ -1280,8 +1285,16 @@ SELECT :io_sum_shared_after_extends > :io_sum_shared_before_extends;
  t
 (1 row)
 
+SELECT sum(extends) AS my_io_sum_shared_after_extends
+  FROM pg_my_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT :my_io_sum_shared_after_extends > :my_io_sum_shared_before_extends;
+ ?column? 
+----------
+ t
+(1 row)
+
 -- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
--- and fsyncs.
+-- and fsyncs in the global stats (not for the backend).
 -- See comment above for rationale for two explicit CHECKPOINTs.
 CHECKPOINT;
 CHECKPOINT;
@@ -1301,6 +1314,23 @@ SELECT current_setting('fsync') = 'off'
  t
 (1 row)
 
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_my_stat_io
+  WHERE object = 'relation' \gset my_io_sum_shared_after_
+SELECT :my_io_sum_shared_after_writes >= :my_io_sum_shared_before_writes;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT current_setting('fsync') = 'off'
+  OR (:my_io_sum_shared_after_fsyncs = :my_io_sum_shared_before_fsyncs
+      AND :my_io_sum_shared_after_fsyncs= 0);
+ ?column? 
+----------
+ t
+(1 row)
+
 -- Change the tablespace so that the table is rewritten directly, then SELECT
 -- from it to cause it to be read back into shared buffers.
 SELECT sum(reads) AS io_sum_shared_before_reads
@@ -1521,6 +1551,8 @@ SELECT pg_stat_have_stats('io', 0, 0);
 
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_pre_reset
   FROM pg_stat_io \gset
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_pre_reset
+  FROM pg_my_stat_io \gset
 SELECT pg_stat_reset_shared('io');
  pg_stat_reset_shared 
 ----------------------
@@ -1535,6 +1567,14 @@ SELECT :io_stats_post_reset < :io_stats_pre_reset;
  t
 (1 row)
 
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_reset
+  FROM pg_my_stat_io \gset
+SELECT :my_io_stats_post_reset < :my_io_stats_pre_reset;
+ ?column? 
+----------
+ t
+(1 row)
+
 -- test BRIN index doesn't block HOT update
 CREATE TABLE brin_hot (
   id  integer PRIMARY KEY,
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index d8ac0d06f4..c95cb71652 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -595,7 +595,7 @@ SELECT pg_stat_get_replication_slot(NULL);
 SELECT pg_stat_get_subscription_stats(NULL);
 
 
--- Test that the following operations are tracked in pg_stat_io:
+-- Test that the following operations are tracked in pg_[my]_stat_io:
 -- - reads of target blocks into shared buffers
 -- - writes of shared buffers to permanent storage
 -- - extends of relations using shared buffers
@@ -609,18 +609,26 @@ SELECT pg_stat_get_subscription_stats(NULL);
 -- extends.
 SELECT sum(extends) AS io_sum_shared_before_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT sum(extends) AS my_io_sum_shared_before_extends
+  FROM pg_my_stat_io WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_stat_io
   WHERE object = 'relation' \gset io_sum_shared_before_
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_my_stat_io
+  WHERE object = 'relation' \gset my_io_sum_shared_before_
 CREATE TABLE test_io_shared(a int);
 INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
 SELECT pg_stat_force_next_flush();
 SELECT sum(extends) AS io_sum_shared_after_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
 SELECT :io_sum_shared_after_extends > :io_sum_shared_before_extends;
+SELECT sum(extends) AS my_io_sum_shared_after_extends
+  FROM pg_my_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT :my_io_sum_shared_after_extends > :my_io_sum_shared_before_extends;
 
 -- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
--- and fsyncs.
+-- and fsyncs in the global stats (not for the backend).
 -- See comment above for rationale for two explicit CHECKPOINTs.
 CHECKPOINT;
 CHECKPOINT;
@@ -630,7 +638,13 @@ SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
 SELECT :io_sum_shared_after_writes > :io_sum_shared_before_writes;
 SELECT current_setting('fsync') = 'off'
   OR :io_sum_shared_after_fsyncs > :io_sum_shared_before_fsyncs;
-
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_my_stat_io
+  WHERE object = 'relation' \gset my_io_sum_shared_after_
+SELECT :my_io_sum_shared_after_writes >= :my_io_sum_shared_before_writes;
+SELECT current_setting('fsync') = 'off'
+  OR (:my_io_sum_shared_after_fsyncs = :my_io_sum_shared_before_fsyncs
+      AND :my_io_sum_shared_after_fsyncs= 0);
 -- Change the tablespace so that the table is rewritten directly, then SELECT
 -- from it to cause it to be read back into shared buffers.
 SELECT sum(reads) AS io_sum_shared_before_reads
@@ -762,10 +776,15 @@ SELECT :io_sum_bulkwrite_strategy_extends_after > :io_sum_bulkwrite_strategy_ext
 SELECT pg_stat_have_stats('io', 0, 0);
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_pre_reset
   FROM pg_stat_io \gset
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_pre_reset
+  FROM pg_my_stat_io \gset
 SELECT pg_stat_reset_shared('io');
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_post_reset
   FROM pg_stat_io \gset
 SELECT :io_stats_post_reset < :io_stats_pre_reset;
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_reset
+  FROM pg_my_stat_io \gset
+SELECT :my_io_stats_post_reset < :my_io_stats_pre_reset;
 
 
 -- test BRIN index doesn't block HOT update
-- 
2.34.1

v4-0002-Add-pg_stat_get_backend_io.patchtext/x-diff; charset=us-asciiDownload
From 1988e754cec0771c8fc207c266ddea9dda5e70aa Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Wed, 28 Aug 2024 12:59:02 +0000
Subject: [PATCH v4 2/2] Add pg_stat_get_backend_io()

Adding the pg_stat_get_backend_io() function to retrieve I/O statistics for
a particular backend pid. Note this function behaves as if stats_fetch_consistency
is set to none.
---
 doc/src/sgml/monitoring.sgml           |  18 ++++
 src/backend/utils/activity/pgstat_io.c |   6 ++
 src/backend/utils/adt/pgstatfuncs.c    | 120 +++++++++++++++++++++++++
 src/include/catalog/pg_proc.dat        |   9 ++
 src/include/pgstat.h                   |   1 +
 src/test/regress/expected/stats.out    |  25 ++++++
 src/test/regress/sql/stats.sql         |  16 +++-
 7 files changed, 194 insertions(+), 1 deletion(-)
  10.9% doc/src/sgml/
  47.1% src/backend/utils/adt/
   9.2% src/include/catalog/
  15.4% src/test/regress/expected/
  14.4% src/test/regress/sql/

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index b28ca4e0ca..fb908172f8 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4770,6 +4770,24 @@ description | Waiting for a newly initialized WAL file to reach durable storage
        </para></entry>
       </row>
 
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_stat_get_backend_io</primary>
+        </indexterm>
+        <function>pg_stat_get_backend_io</function> ( <type>integer</type> )
+        <returnvalue>setof record</returnvalue>
+       </para>
+       <para>
+        Returns I/O statistics about the backend with the specified
+        process ID. The output fields are exactly the same as the ones in the
+        <link linkend="monitoring-pg-stat-io-view"> <structname>pg_stat_io</structname></link>
+        view. This function behaves as if <varname>stats_fetch_consistency</varname>
+        is set to <literal>none</literal> (means each execution re-fetches
+        counters from shared memory).
+       </para></entry>
+      </row>
+
       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <indexterm>
diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index d32121fc1f..9764fd399b 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -170,6 +170,12 @@ pgstat_fetch_my_stat_io(void)
 	return &pgStatLocal.snapshot.my_io;
 }
 
+PgStat_Backend_IO *
+pgstat_fetch_proc_stat_io(ProcNumber procNumber)
+{
+	return &pgStatLocal.shmem->io.stat[procNumber];
+}
+
 /*
  * Check if there any IO stats waiting for flush.
  */
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 4c00570396..795ff4e2f6 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1570,6 +1570,126 @@ pg_stat_get_my_io(PG_FUNCTION_ARGS)
 	return (Datum) 0;
 }
 
+Datum
+pg_stat_get_backend_io(PG_FUNCTION_ARGS)
+{
+	ReturnSetInfo *rsinfo;
+	PgStat_Backend_IO *backend_io_stats;
+	Datum		reset_time;
+	ProcNumber	procNumber;
+	PGPROC	   *proc;
+	BackendType bktype;
+	Datum		bktype_desc;
+	PgStat_BktypeIO *bktype_stats;
+
+	int			backend_pid = PG_GETARG_INT32(0);
+
+	rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	InitMaterializedSRF(fcinfo, 0);
+
+	proc = BackendPidGetProc(backend_pid);
+
+	/* Maybe an auxiliary process? */
+	if (proc == NULL)
+		proc = AuxiliaryPidGetProc(backend_pid);
+
+	/* This backend_pid does not exist */
+	if (proc != NULL)
+	{
+		procNumber = GetNumberFromPGProc(proc);
+		backend_io_stats = pgstat_fetch_proc_stat_io(procNumber);
+		bktype = backend_io_stats->bktype;
+		reset_time = TimestampTzGetDatum(backend_io_stats->stat_reset_timestamp);
+
+		bktype_desc = CStringGetTextDatum(GetBackendTypeDesc(bktype));
+		bktype_stats = &backend_io_stats->stats;
+
+		/*
+		 * In Assert builds, we can afford an extra loop through all of the
+		 * counters checking that only expected stats are non-zero, since it
+		 * keeps the non-Assert code cleaner.
+		 */
+		Assert(pgstat_bktype_io_stats_valid(bktype_stats, bktype));
+
+		for (int io_obj = 0; io_obj < IOOBJECT_NUM_TYPES; io_obj++)
+		{
+			const char *obj_name = pgstat_get_io_object_name(io_obj);
+
+			for (int io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
+			{
+				const char *context_name = pgstat_get_io_context_name(io_context);
+
+				Datum		values[IO_NUM_COLUMNS] = {0};
+				bool		nulls[IO_NUM_COLUMNS] = {0};
+
+				/*
+				 * Some combinations of BackendType, IOObject, and IOContext
+				 * are not valid for any type of IOOp. In such cases, omit the
+				 * entire row from the view.
+				 */
+				if (!pgstat_tracks_io_object(bktype, io_obj, io_context))
+					continue;
+
+				values[IO_COL_BACKEND_TYPE] = bktype_desc;
+				values[IO_COL_CONTEXT] = CStringGetTextDatum(context_name);
+				values[IO_COL_OBJECT] = CStringGetTextDatum(obj_name);
+				if (backend_io_stats->stat_reset_timestamp != 0)
+					values[IO_COL_RESET_TIME] = reset_time;
+				else
+					nulls[IO_COL_RESET_TIME] = true;
+
+				/*
+				 * Hard-code this to the value of BLCKSZ for now. Future
+				 * values could include XLOG_BLCKSZ, once WAL IO is tracked,
+				 * and constant multipliers, once non-block-oriented IO (e.g.
+				 * temporary file IO) is tracked.
+				 */
+				values[IO_COL_CONVERSION] = Int64GetDatum(BLCKSZ);
+
+				for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
+				{
+					int			op_idx = pgstat_get_io_op_index(io_op);
+					int			time_idx = pgstat_get_io_time_index(io_op);
+
+					/*
+					 * Some combinations of BackendType and IOOp, of IOContext
+					 * and IOOp, and of IOObject and IOOp are not tracked. Set
+					 * these cells in the view NULL.
+					 */
+					if (pgstat_tracks_io_op(bktype, io_obj, io_context, io_op))
+					{
+						PgStat_Counter count =
+							bktype_stats->counts[io_obj][io_context][io_op];
+
+						values[op_idx] = Int64GetDatum(count);
+					}
+					else
+						nulls[op_idx] = true;
+
+					/* not every operation is timed */
+					if (time_idx == IO_COL_INVALID)
+						continue;
+
+					if (!nulls[op_idx])
+					{
+						PgStat_Counter time =
+							bktype_stats->times[io_obj][io_context][io_op];
+
+						values[time_idx] = Float8GetDatum(pg_stat_us_to_ms(time));
+					}
+					else
+						nulls[time_idx] = true;
+				}
+
+				tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+									 values, nulls);
+			}
+		}
+	}
+
+	return (Datum) 0;
+}
+
 /*
  * Returns statistics of WAL activity
  */
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 4d5af3d7cf..c4295f7c4d 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5889,6 +5889,15 @@
   proargnames => '{backend_type,object,context,reads,read_time,writes,write_time,writebacks,writeback_time,extends,extend_time,op_bytes,hits,evictions,reuses,fsyncs,fsync_time,stats_reset}',
   prosrc => 'pg_stat_get_my_io' },
 
+{ oid => '8896', descr => 'statistics: per backend type IO statistics',
+  proname => 'pg_stat_get_backend_io', prorows => '30', proretset => 't',
+  provolatile => 'v', proparallel => 'r', prorettype => 'record',
+  proargtypes => 'int4',
+  proallargtypes => '{int4,text,text,text,int8,float8,int8,float8,int8,float8,int8,float8,int8,int8,int8,int8,int8,float8,timestamptz}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{backend_pid,backend_type,object,context,reads,read_time,writes,write_time,writebacks,writeback_time,extends,extend_time,op_bytes,hits,evictions,reuses,fsyncs,fsync_time,stats_reset}',
+  prosrc => 'pg_stat_get_backend_io' },
+
 { oid => '1136', descr => 'statistics: information about WAL activity',
   proname => 'pg_stat_get_wal', proisstrict => 'f', provolatile => 's',
   proparallel => 'r', prorettype => 'record', proargtypes => '',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 9d63b1a5b7..2a2b4a2947 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -570,6 +570,7 @@ extern void pgstat_count_io_op_time(IOObject io_object, IOContext io_context,
 
 extern PgStat_IO *pgstat_fetch_global_stat_io(void);
 extern PgStat_Backend_IO *pgstat_fetch_my_stat_io(void);
+extern PgStat_Backend_IO *pgstat_fetch_proc_stat_io(ProcNumber procNumber);
 extern const char *pgstat_get_io_context_name(IOContext io_context);
 extern const char *pgstat_get_io_object_name(IOObject io_object);
 
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index c489e528e0..aced015c2f 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1263,12 +1263,18 @@ SELECT sum(extends) AS io_sum_shared_before_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(extends) AS my_io_sum_shared_before_extends
   FROM pg_my_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT sum(extends) AS backend_io_sum_shared_before_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_stat_io
   WHERE object = 'relation' \gset io_sum_shared_before_
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_my_stat_io
   WHERE object = 'relation' \gset my_io_sum_shared_before_
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE object = 'relation' \gset backend_io_sum_shared_before_
 CREATE TABLE test_io_shared(a int);
 INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
 SELECT pg_stat_force_next_flush();
@@ -1293,6 +1299,15 @@ SELECT :my_io_sum_shared_after_extends > :my_io_sum_shared_before_extends;
  t
 (1 row)
 
+SELECT sum(extends) AS backend_io_sum_shared_after_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
+SELECT :backend_io_sum_shared_after_extends > :backend_io_sum_shared_before_extends;
+ ?column? 
+----------
+ t
+(1 row)
+
 -- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
 -- and fsyncs in the global stats (not for the backend).
 -- See comment above for rationale for two explicit CHECKPOINTs.
@@ -1553,6 +1568,8 @@ SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) +
   FROM pg_stat_io \gset
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_pre_reset
   FROM pg_my_stat_io \gset
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS backend_io_stats_pre_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
 SELECT pg_stat_reset_shared('io');
  pg_stat_reset_shared 
 ----------------------
@@ -1575,6 +1592,14 @@ SELECT :my_io_stats_post_reset < :my_io_stats_pre_reset;
  t
 (1 row)
 
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS backend_io_stats_post_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
+SELECT :backend_io_stats_post_reset < :backend_io_stats_pre_reset;
+ ?column? 
+----------
+ t
+(1 row)
+
 -- test BRIN index doesn't block HOT update
 CREATE TABLE brin_hot (
   id  integer PRIMARY KEY,
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index c95cb71652..d05009e1f5 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -611,12 +611,18 @@ SELECT sum(extends) AS io_sum_shared_before_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(extends) AS my_io_sum_shared_before_extends
   FROM pg_my_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT sum(extends) AS backend_io_sum_shared_before_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_stat_io
   WHERE object = 'relation' \gset io_sum_shared_before_
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_my_stat_io
   WHERE object = 'relation' \gset my_io_sum_shared_before_
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE object = 'relation' \gset backend_io_sum_shared_before_
 CREATE TABLE test_io_shared(a int);
 INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
 SELECT pg_stat_force_next_flush();
@@ -626,6 +632,10 @@ SELECT :io_sum_shared_after_extends > :io_sum_shared_before_extends;
 SELECT sum(extends) AS my_io_sum_shared_after_extends
   FROM pg_my_stat_io WHERE context = 'normal' AND object = 'relation' \gset
 SELECT :my_io_sum_shared_after_extends > :my_io_sum_shared_before_extends;
+SELECT sum(extends) AS backend_io_sum_shared_after_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
+SELECT :backend_io_sum_shared_after_extends > :backend_io_sum_shared_before_extends;
 
 -- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
 -- and fsyncs in the global stats (not for the backend).
@@ -778,6 +788,8 @@ SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) +
   FROM pg_stat_io \gset
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_pre_reset
   FROM pg_my_stat_io \gset
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS backend_io_stats_pre_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
 SELECT pg_stat_reset_shared('io');
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_post_reset
   FROM pg_stat_io \gset
@@ -785,7 +797,9 @@ SELECT :io_stats_post_reset < :io_stats_pre_reset;
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_reset
   FROM pg_my_stat_io \gset
 SELECT :my_io_stats_post_reset < :my_io_stats_pre_reset;
-
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS backend_io_stats_post_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
+SELECT :backend_io_stats_post_reset < :backend_io_stats_pre_reset;
 
 -- test BRIN index doesn't block HOT update
 CREATE TABLE brin_hot (
-- 
2.34.1

#13Nazir Bilal Yavuz
byavuz81@gmail.com
In reply to: Bertrand Drouvot (#12)
Re: per backend I/O statistics

Hi,

On Fri, 13 Sept 2024 at 19:09, Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

On Fri, Sep 13, 2024 at 04:45:08PM +0300, Nazir Bilal Yavuz wrote:

- The pgstat_reset_io_counter_internal() is called in the
pgstat_shutdown_hook(). This causes the stats_reset column to show the
termination time of the old backend when its proc num is reassigned to
a new backend.

doh! Nice catch, thanks!

And also new backends that are not re-using a previous "existing" process slot
are getting the last reset time (if any). So the right place to fix this is in
pgstat_io_init_backend_cb(): done in v4 attached. v4 also sets the reset time to
0 in pgstat_shutdown_hook() (but that's not necessary though, that's just to be
right "conceptually" speaking).

Thanks, I confirm that it is fixed.

You mentioned some refactoring upthread:

On Fri, 6 Sept 2024 at 18:03, Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

- If we agree on the current proposal then I'll do some refactoring in
pg_stat_get_backend_io(), pg_stat_get_my_io() and pg_stat_get_io() to avoid
duplicated code (it's not done yet to ease the review).

Could we remove pg_stat_get_my_io() completely and use
pg_stat_get_backend_io() with the current backend's pid to get the
current backend's stats? If you meant the same thing, please don't
mind it.

--
Regards,
Nazir Bilal Yavuz
Microsoft

#14Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Nazir Bilal Yavuz (#13)
Re: per backend I/O statistics

Hi,

On Tue, Sep 17, 2024 at 02:52:01PM +0300, Nazir Bilal Yavuz wrote:

Hi,

On Fri, 13 Sept 2024 at 19:09, Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

On Fri, Sep 13, 2024 at 04:45:08PM +0300, Nazir Bilal Yavuz wrote:

- The pgstat_reset_io_counter_internal() is called in the
pgstat_shutdown_hook(). This causes the stats_reset column to show the
termination time of the old backend when its proc num is reassigned to
a new backend.

doh! Nice catch, thanks!

And also new backends that are not re-using a previous "existing" process slot
are getting the last reset time (if any). So the right place to fix this is in
pgstat_io_init_backend_cb(): done in v4 attached. v4 also sets the reset time to
0 in pgstat_shutdown_hook() (but that's not necessary though, that's just to be
right "conceptually" speaking).

Thanks, I confirm that it is fixed.

Thanks!

You mentioned some refactoring upthread:

On Fri, 6 Sept 2024 at 18:03, Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

- If we agree on the current proposal then I'll do some refactoring in
pg_stat_get_backend_io(), pg_stat_get_my_io() and pg_stat_get_io() to avoid
duplicated code (it's not done yet to ease the review).

Could we remove pg_stat_get_my_io() completely and use
pg_stat_get_backend_io() with the current backend's pid to get the
current backend's stats?

The reason why I keep pg_stat_get_my_io() is because (as mentioned in [1]), the
statistics snapshot is built for "my backend stats" (meaning it depends on the
stats_fetch_consistency value). I can see use cases for that.

OTOH, pg_stat_get_backend_io() behaves as if stats_fetch_consistency were set to
none (each execution re-fetches counters from shared memory) (see [2]). Indeed,
the snapshot is not built in each backend to copy all the other backends' stats,
as 1/ I think that there is no use case (there is no need to get other backends'
I/O statistics while taking care of stats_fetch_consistency) and 2/ that
could be memory expensive depending on the number of max connections.

So I think it's better to keep both functions as they behave differently.

Thoughts?

If you meant the same thing, please don't
mind it.

What I meant was to try to remove the duplicate code that we can find in
those 3 functions (say, by creating a new function that contains the duplicated
code and making use of this new function in the 3 others). I did not look at it
in detail yet.
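
To illustrate what I have in mind, here is just a sketch (the helper name and
signature below are hypothetical, nothing is implemented yet): the three SRFs
could delegate the per-(IOObject, IOContext, IOOp) row building to a common
helper, for example:

/*
 * Hypothetical helper, for illustration only: emit the pg_stat_io-style
 * rows for one backend type into the caller's tuplestore, so that
 * pg_stat_get_io(), pg_stat_get_my_io() and pg_stat_get_backend_io()
 * can share the row-building loop.
 */
static void
pg_stat_io_build_tuples(ReturnSetInfo *rsinfo,
						PgStat_BktypeIO *bktype_stats,
						BackendType bktype,
						TimestampTz stat_reset_timestamp);

pg_stat_get_io() would then call it once per backend type, while the two
per-backend functions would call it a single time with the targeted backend's
stats.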

[1]: /messages/by-id/ZtXR+CtkEVVE/LHF@ip-10-97-1-34.eu-west-3.compute.internal
[2]: /messages/by-id/ZtsZtaRza9bFFeF8@ip-10-97-1-34.eu-west-3.compute.internal

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#15Nazir Bilal Yavuz
byavuz81@gmail.com
In reply to: Bertrand Drouvot (#14)
Re: per backend I/O statistics

Hi,

On Tue, 17 Sept 2024 at 16:07, Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

On Tue, Sep 17, 2024 at 02:52:01PM +0300, Nazir Bilal Yavuz wrote:

Could we remove pg_stat_get_my_io() completely and use
pg_stat_get_backend_io() with the current backend's pid to get the
current backend's stats?

The reason why I keep pg_stat_get_my_io() is because (as mentioned in [1]), the
statistics snapshot is built for "my backend stats" (meaning it depends on the
stats_fetch_consistency value). I can see use cases for that.

OTOH, pg_stat_get_backend_io() behaves as if stats_fetch_consistency were set to
none (each execution re-fetches counters from shared memory) (see [2]). Indeed,
the snapshot is not built in each backend to copy all the other backends' stats,
as 1/ I think that there is no use case (there is no need to get other backends'
I/O statistics while taking care of stats_fetch_consistency) and 2/ that
could be memory expensive depending on the number of max connections.

So I think it's better to keep both functions as they behave differently.

Thoughts?

Yes, that is correct. Sorry, you already had explained it and I had
agreed with it but I forgot.

If you meant the same thing, please don't
mind it.

What I meant was to try to remove the duplicate code that we can find in
those 3 functions (say, by creating a new function that contains the duplicated
code and making use of this new function in the 3 others). I did not look at it
in detail yet.

I got it, thanks for the explanation.

--
Regards,
Nazir Bilal Yavuz
Microsoft

#16Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Nazir Bilal Yavuz (#15)
Re: per backend I/O statistics

Hi,

On Tue, Sep 17, 2024 at 04:47:51PM +0300, Nazir Bilal Yavuz wrote:

Hi,

On Tue, 17 Sept 2024 at 16:07, Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

So I think it's better to keep both functions as they behave differently.

Thoughts?

Yes, that is correct. Sorry, you already had explained it and I had
agreed with it but I forgot.

No problem at all! (I re-explained because I'm not always 100% sure that my
explanations are crystal clear ;-) )

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#17Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#5)
Re: per backend I/O statistics

On Wed, Sep 04, 2024 at 04:45:24AM +0000, Bertrand Drouvot wrote:

On Tue, Sep 03, 2024 at 04:07:58PM +0900, Kyotaro Horiguchi wrote:

As an additional benefit of this approach, the client can set a
connection variable, for example, no_backend_iostats to true, or set
its inverse variable to false, to restrict memory usage to only the
required backends.

Thanks for the feedback!

If we were to add an on/off switch button, I think I'd vote for a global one
instead. Indeed, I see this feature more like an "Administrator" one, where
the administrator wants to be able to find out which session is responsible for
what (from an I/O point of view): like being able to answer "which session is
generating this massive amount of reads"?

If we allow each session to disable the feature then the administrator
would lose this ability.

Hmm, I've been studying this patch, and I am not completely sure I
agree with this feeling of using fixed-numbered stats, actually, after
reading the whole thing and seeing the structure of the patch
(FLEXIBLE_ARRAY_MEMBER is a new way to handle the fact that we don't
know exactly how many slots we need for the fixed-numbered stats, as
MaxBackends may change). If we make this kind of stats
variable-numbered, does it have to actually involve many creations or
removals of the stats entries, though? One point is that the number
of entries to know about is capped by max_connections, which is a
PGC_POSTMASTER. That's the same kind of control as replication
slots. So one approach would be to reuse entries in the dshash and
use the number in the procarray as the hashing key. If a new
connection spawns and reuses a slot that was used in the past, then
reset all the existing fields and assign its PID.
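
To put something concrete on the table, here is an illustrative sketch only
(none of this is from the patch, the names are made up): the shared entry for
such a variable-numbered kind could be as simple as:

typedef struct PgStatShared_BackendIO
{
	PgStatShared_Common header;		/* common shared-entry header (lock, etc.) */
	PgStat_BktypeIO io;				/* this backend's I/O counters */
} PgStatShared_BackendIO;

with the backend's ProcNumber stored in the object part of the dshash key, and
the entry contents memset to zero when a new backend starts on a proc slot that
was used before.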

Another thing is the consistency of the data that we'd like to keep at
shutdown. If the connections have a balanced amount of stats shared
among them, doing decision-making based on them is kind of easy. But
that may cause confusion if the activity is unbalanced across the
sessions. We could also not flush them to disk as an option, but it
still seems more useful to me to save this data across restarts if one
takes frequent snapshots of the new system view reporting everything,
so as it is possible to get an idea of the deltas across the snapshots
for each connection slot.
--
Michael

#18Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#16)
Re: per backend I/O statistics

On Tue, Sep 17, 2024 at 01:56:34PM +0000, Bertrand Drouvot wrote:

No problem at all! (I re-explained because I'm not always 100% sure that my
explanations are crystal clear ;-) )

We've discussed a bit this patch offline, but after studying the patch
I doubt that this part is a good idea:

+	/* has to be at the end due to FLEXIBLE_ARRAY_MEMBER */
+	PgStatShared_IO io;
 } PgStat_ShmemControl;

We are going to be in trouble if we introduce a second member in this
structure that has a FLEXIBLE_ARRAY_MEMBER, because PgStat_ShmemControl
relies on the fact that all its members are at deterministic offset
positions in this structure. So this lacks flexibility. This choice
is caused by the fact that we don't exactly know the number of
backends, because that's controlled by the PGC_POSTMASTER GUC
max_connections, so the size of the structure would be undefined.
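
To spell out the constraint with a generic C example (not code from the patch):
a struct may contain at most one flexible array member and it has to be the
last member, so anything we would want to add after it would no longer have a
fixed offset.

struct example
{
	int			nslots;
	double		slots[FLEXIBLE_ARRAY_MEMBER];	/* must be the last member */
	/* int other; -- not allowed, nothing may follow a flexible array member */
};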

There is a parallel with replication slot statistics here, where we
save the replication slot data in the dshash based on their index
number in shmem. Hence, wouldn't it be better to do like replication
slot stats, where we use the dshash and a key based on the procnum of
each backend or auxiliary process (ProcNumber in procnumber.h)? If at
restart max_connections is lower than what was previously used, we
could discard entries that would not fit anymore into the charts.
This is probably not something that happens often, so I'm not really
worried about forcing the removal of these stats depending on how the
upper-bound of ProcNumber evolves.

So, using a new kind of ID and making this kind variable-numbered may
ease the implementation quite a bit, while avoiding any extensibility
issues with the shmem portion of the patch if these are
fixed-numbered. The reporting of these stats comes down to having a
parallel with pgstat_count_io_op_time(), but to make sure that the
stats are split by connection slot number rather than the current
split of pg_stat_io. All its callers are in localbuf.c, bufmgr.c and
md.c; living with some duplication in the code paths to gather the
stats may be OK.

pg_stat_get_my_io() is based on an O(n^3) loop. IOOBJECT_NUM_TYPES is
fortunately low, but that's still annoying.

This would rely on the fact that we would use the ProcNumber for the
dshash key, and this information is not provided in pg_stat_activity.
Perhaps we should add this information in pg_stat_activity so as it
would be easily possible to do joins with a SQL function that returns
a SRF with all the stats associated with a given connection slot
(auxiliary or backend process)? That would be a separate patch.
Perhaps that's even something that has popped up for the work with
threading (did not follow this part closely, TBH)?

The active PIDs of the live sessions are not stored in the active
stats, why not? Perhaps that's useless anyway if we expose the
ProcNumbers in pg_stat_activity and make the stats available with a
single function taking a ProcNumber as input. Just mentioning an
option to consider.
--
Michael

#19Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#17)
Re: per backend I/O statistics

Hi,

On Fri, Sep 20, 2024 at 12:53:43PM +0900, Michael Paquier wrote:

On Wed, Sep 04, 2024 at 04:45:24AM +0000, Bertrand Drouvot wrote:

On Tue, Sep 03, 2024 at 04:07:58PM +0900, Kyotaro Horiguchi wrote:

As an additional benefit of this approach, the client can set a
connection variable, for example, no_backend_iostats to true, or set
its inverse variable to false, to restrict memory usage to only the
required backends.

Thanks for the feedback!

If we were to add an on/off switch button, I think I'd vote for a global one
instead. Indeed, I see this feature more like an "Administrator" one, where
the administrator wants to be able to find out which session is responsible for
what (from an I/O point of view): like being able to answer "which session is
generating this massive amount of reads"?

If we allow each session to disable the feature then the administrator
would lose this ability.

Hmm, I've been studying this patch,

Thanks for looking at it!

and I am not completely sure I
agree with this feeling of using fixed-numbered stats, actually, after
reading the whole thing and seeing the structure of the patch
(FLEXIBLE_ARRAY_MEMBER is a new way to handle the fact that we don't
know exactly how many slots we need for the fixed-numbered stats, as
MaxBackends may change).

Right, that's a new way of dealing with an "unknown" number of slots (and it
has cons, as you mentioned in [1]).

If we make this
kind of stats variable-numbered, does it have to actually involve many
creations or removals of the stats entries, though? One point is that
the number of entries to know about is capped by max_connections,
which is a PGC_POSTMASTER. That's the same kind of control as
replication slots. So one approach would be to reuse entries in the
dshash and use the number in the procarray as the hashing key. If a
new connection spawns and reuses a slot that was used in the past,
then reset all the existing fields and assign its PID.

Yeah, like it's done currently with the "fixed-numbered" stats proposal. That
sounds reasonable to me, I'll look at this proposed approach and come back with
a new patch version, thanks!

Another thing is the consistency of the data that we'd like to keep at
shutdown. If the connections have a balanced amount of stats shared
among them, doing decision-making based on them is kind of easy. But
that may cause confusion if the activity is unbalanced across the
sessions. We could also not flush them to disk as an option, but it
still seems more useful to me to save this data across restarts if one
takes frequent snapshots of the new system view reporting everything,
so as it is possible to get an idea of the deltas across the snapshots
for each connection slot.

The idea implemented so far in this patch is that we still maintain an
aggregated version of the stats (visible through pg_stat_io) and that only the
aggregated stats are flushed to/read from disk (meaning we don't flush the
per-backend stats).

I think that it makes sense that way. The way I see it, the per-backend
I/O stats are more for current-activity instrumentation. So it's not clear to me
what the benefits would be of restoring the per-backend stats at startup, knowing
that: 1) we restored the aggregated stats and 2) the sessions that were responsible
for the restored stats are gone.

[1]: /messages/by-id/Zuz5iQ4AjcuOMx_w@paquier.xyz

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#20Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#18)
Re: per backend I/O statistics

Hi,

On Fri, Sep 20, 2024 at 01:26:49PM +0900, Michael Paquier wrote:

On Tue, Sep 17, 2024 at 01:56:34PM +0000, Bertrand Drouvot wrote:

No problem at all! (I re-explained because I'm not always 100% sure that my
explanations are crystal clear ;-) )

We've discussed a bit this patch offline, but after studying the patch
I doubt that this part is a good idea:

+	/* has to be at the end due to FLEXIBLE_ARRAY_MEMBER */
+	PgStatShared_IO io;
} PgStat_ShmemControl;

We are going to be in trouble if we introduce a second member in this
structure that has a FLEXIBLE_ARRAY_MEMBER, because PgStat_ShmemControl
relies on the fact that all its members are at deterministic offset
positions in this structure.

Agree that it would be an issue should we have to add a new FLEXIBLE_ARRAY_MEMBER.

So this lacks flexibility. This choice
is caused by the fact that we don't exactly know the number of
backends because that's controlled by the PGC_POSTMASTER GUC
max_connections so the size of the structure would be undefined.

Right.

There is a parallel with replication slot statistics here, where we
save the replication slot data in the dshash based on their index
number in shmem. Hence, wouldn't it be better to do like replication
slot stats, where we use the dshash and a key based on the procnum of
each backend or auxiliary process (ProcNumber in procnumber.h)? If at
restart max_connections is lower than what was previously used, we
could discard entries that would not fit anymore into the charts.
This is probably not something that happens often, so I'm not really
worried about forcing the removal of these stats depending on how the
upper-bound of ProcNumber evolves.

Yeah, I'll look at implementing the dshash based on their procnum and see
where it goes.

So, using a new kind of ID and making this kind variable-numbered may
ease the implementation quite a bit, while avoiding any extensibility
issues with the shmem portion of the patch if these are
fixed-numbered.

Agree.

This would rely on the fact that we would use the ProcNumber for the
dshash key, and this information is not provided in pg_stat_activity.
Perhaps we should add this information in pg_stat_activity so as it
would be easily possible to do joins with a SQL function that returns
a SRF with all the stats associated with a given connection slot
(auxiliary or backend process)?

I'm not sure that's needed. What has been done in the previous versions is
to get the stats based on the pid (see pg_stat_get_backend_io()) where the
procnumber is retrieved with something like GetNumberFromPGProc(BackendPidGetProc(pid)).

Do you see an issue with that approach?

The active PIDs of the live sessions are not stored in the active
stats, why not?

That was not needed. We can still retrieve the stats based on the pid thanks
to something like GetNumberFromPGProc(BackendPidGetProc(pid)) without having
to actually store the pid in the stats. I think that's fine because the pid
only matters at "display" time (pg_stat_get_backend_io()).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#21Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#20)
Re: per backend I/O statistics

On Mon, Oct 07, 2024 at 09:54:21AM +0000, Bertrand Drouvot wrote:

On Fri, Sep 20, 2024 at 01:26:49PM +0900, Michael Paquier wrote:

This would rely on the fact that we would use the ProcNumber for the
dshash key, and this information is not provided in pg_stat_activity.
Perhaps we should add this information in pg_stat_activity so as it
would be easily possible to do joins with a SQL function that returns
a SRF with all the stats associated with a given connection slot
(auxiliary or backend process)?

I'm not sure that's needed. What has been done in the previous versions is
to get the stats based on the pid (see pg_stat_get_backend_io()) where the
procnumber is retrieved with something like GetNumberFromPGProc(BackendPidGetProc(pid)).

Ah, I see. So you could just have the proc number in the key to
control the upper-bound on the number of possible stats entries in the
dshash.

Assuming that none of this data is persisted to the stats file at
shutdown and that the stats of a single entry are reset each time a
new backend reuses a previous proc slot, that would be OK by me, I
guess.

The active PIDs of the live sessions are not stored in the active
stats, why not?

That was not needed. We can still retrieve the stats based on the pid thanks
to something like GetNumberFromPGProc(BackendPidGetProc(pid)) without having
to actually store the pid in the stats. I think that's fine because the pid
only matters at "display" time (pg_stat_get_backend_io()).

Okay, per the above and the persistency of the stats.
--
Michael

#22Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#21)
Re: per backend I/O statistics

Hi,

On Tue, Oct 08, 2024 at 01:46:23PM +0900, Michael Paquier wrote:

On Mon, Oct 07, 2024 at 09:54:21AM +0000, Bertrand Drouvot wrote:

On Fri, Sep 20, 2024 at 01:26:49PM +0900, Michael Paquier wrote:

This would rely on the fact that we would use the ProcNumber for the
dshash key, and this information is not provided in pg_stat_activity.
Perhaps we should add this information in pg_stat_activity so as it
would be easily possible to do joins with a SQL function that returns
a SRF with all the stats associated with a given connection slot
(auxiliary or backend process)?

I'm not sure that's needed. What has been done in the previous versions is
to get the stats based on the pid (see pg_stat_get_backend_io()) where the
procnumber is retrieved with something like GetNumberFromPGProc(BackendPidGetProc(pid)).

Ah, I see. So you could just have the proc number in the key to
control the upper-bound on the number of possible stats entries in the
dshash.

Yes.

Assuming that none of this data is persisted to the stats file at
shutdown and that the stats of a single entry are reset each time a
new backend reuses a previous proc slot, that would be OK by me, I
guess.

The active PIDs of the live sessions are not stored in the active
stats, why not?

That was not needed. We can still retrieve the stats based on the pid thanks
to something like GetNumberFromPGProc(BackendPidGetProc(pid)) without having
to actually store the pid in the stats. I think that's fine because the pid
only matters at "display" time (pg_stat_get_backend_io()).

Okay, per the above and the persistency of the stats.

Great, I'll work on an updated patch version then.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#23Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Bertrand Drouvot (#22)
Re: per backend I/O statistics

Hi,

On Tue, Oct 08, 2024 at 04:28:39PM +0000, Bertrand Drouvot wrote:

On Fri, Sep 20, 2024 at 01:26:49PM +0900, Michael Paquier wrote:

Okay, per the above and the persistency of the stats.

Great, I'll work on an updated patch version then.

I spent some time on this during the last two days and I think we have 3 design
options.

=== GOALS ===

But first let's sum up the goals that I think we agreed on:

- Keep pg_stat_io as it is today: give the whole server picture and serialize
the stats to disk.

- Introduce per-backend IO stats and 2 new APIs to:

1. Provide the IO stats for "my backend" (through, say, pg_my_stat_io); this
would take care of stats_fetch_consistency.

2. Retrieve the IO stats for another backend (through, say, pg_stat_get_backend_io(pid))
that would _not_ take care of stats_fetch_consistency, as:

2.1/ I think that there is no use case (there is no need to get other backends'
I/O statistics while taking care of stats_fetch_consistency)

2.2/ It could be memory expensive to store a snapshot for all the backends
(depending on the number of backends created)

- There is no need to serialize the per-backend IO stats to disk (no point in
keeping stats for backends that no longer exist after a restart).

- The per-backend IO stats should be variable-numbered (not fixed), as per
up-thread discussion.

=== OPTIONS ===

So, based on this, I think that we could:

Option 1: "move" the existing PGSTAT_KIND_IO to variable-numbered and let this
KIND take care of the aggregated view (pg_stat_io) and the per-backend stats.

Option 2: leave PGSTAT_KIND_IO as it is and introduce a new PGSTAT_KIND_BACKEND_IO
that would be variable-numbered.

Option 3: Remove PGSTAT_KIND_IO, introduce a new PGSTAT_KIND_BACKEND_IO that
would be variable-numbered and store the "aggregated stats aka pg_stat_io" in
shared memory (not part of the variable-numbered hash). Per-backend stats
could be aggregated into "pg_stat_io" during the flush_pending_cb call for example.

=== BEST OPTION? ===

I would opt for Option 2 as:

- The stats system is currently not designed for Option 1 and our goals (for
example, shared_data_len is used to serialize but also to fetch the entries,
see pgstat_fetch_entry()), so that would need some hack to serialize only a part
of the entries while still being able to fetch them all.

- Mixing "fixed" and "variable" in the same KIND does not sound like a good idea
(though that might be possible with some hacks, I don't think that would be
easy to maintain).

- Having the per-backend stats as "variable" in their own dedicated kind looks
more reasonable and less error-prone.

- I don't think there is a stats design similar to option 3 currently, so I'm
not sure there is a need to develop something new while Option 2 could be done.

- Option 3 would need some hack for (at least) the "pg_stat_io" [de]serialization
part.

- Option 2 seems to offer more flexibility (as compared to Options 1 and 3).

Thoughts?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#24Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Bertrand Drouvot (#23)
Re: per backend I/O statistics

Hi,

On Thu, Oct 31, 2024 at 05:09:56AM +0000, Bertrand Drouvot wrote:

=== OPTIONS ===

So, based on this, I think that we could:

Option 1: "move" the existing PGSTAT_KIND_IO to variable-numbered and let this
KIND take care of the aggregated view (pg_stat_io) and the per-backend stats.

Option 2: leave PGSTAT_KIND_IO as it is and introduce a new PGSTAT_KIND_BACKEND_IO
that would be variable-numbered.

Option 3: Remove PGSTAT_KIND_IO, introduce a new PGSTAT_KIND_BACKEND_IO that
would be variable-numbered and store the "aggregated stats aka pg_stat_io" in
shared memory (not part of the variable-numbered hash). Per-backend stats
could be aggregated into "pg_stat_io" during the flush_pending_cb call for example.

=== BEST OPTION? ===

I would opt for Option 2 as:

- The stats system is currently not designed for Option 1 and our goals (for
example, shared_data_len is used to serialize but also to fetch the entries,
see pgstat_fetch_entry()), so that would need some hack to serialize only a part
of the entries while still being able to fetch them all.

- Mixing "fixed" and "variable" in the same KIND does not sound like a good idea
(though that might be possible with some hacks, I don't think that would be
easy to maintain).

- Having the per-backend stats as "variable" in their own dedicated kind looks
more reasonable and less error-prone.

- I don't think there is a stats design similar to option 3 currently, so I'm
not sure there is a need to develop something new while Option 2 could be done.

- Option 3 would need some hack for (at least) the "pg_stat_io" [de]serialization
part.

- Option 2 seems to offer more flexibility (as compared to Options 1 and 3).

Thoughts?

And why not add more per-backend stats in the future? (once the I/O part is done).

I think that's one more reason to go with option 2 (and implementing a brand new
PGSTAT_KIND_BACKEND kind).

Thoughts?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#25Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Bertrand Drouvot (#24)
Re: per backend I/O statistics

Hi,

On Mon, Nov 04, 2024 at 10:01:50AM +0000, Bertrand Drouvot wrote:

And why not add more per-backend stats in the future? (once the I/O part is done).

I think that's one more reason to go with option 2 (and implementing a brand new
PGSTAT_KIND_BACKEND kind).

I'm starting to work on option 2; I think it will be easier to discuss with
a patch proposal to look at.

If in the meantime anyone strongly disagrees with option 2 (meaning: implement a
brand new PGSTAT_KIND_BACKEND and keep PGSTAT_KIND_IO), please let me know.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#26Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#25)
Re: per backend I/O statistics

On Tue, Nov 05, 2024 at 05:37:15PM +0000, Bertrand Drouvot wrote:

I'm starting to work on option 2; I think it will be easier to discuss with
a patch proposal to look at.

If in the meantime anyone strongly disagrees with option 2 (meaning: implement a
brand new PGSTAT_KIND_BACKEND and keep PGSTAT_KIND_IO), please let me know.

Sorry for the late reply, catching up a bit.

As you are quoting in [1], you do not expect the backend-io stats and
the more global pg_stat_io to achieve the same level of consistency as
the backend stats would be gone at restart, and wiped out when a
backend shuts down. So, splitting them with a different stats kind
feels more natural because it would be possible to control how each
stat kind behaves depending on the code shutdown and reset paths
within their own callbacks rather than making the callbacks of
PGSTAT_KIND_IO more complex than they already are. And pg_stat_io is
a fixed-numbered stats kind because of the way it aggregates its stats
with a number of states defined at compile-time.

Is the structure you have in mind different than PgStat_BktypeIO?
Perhaps a split is better anyway with that in mind.

The amount of memory required to store the snapshots of backend-IO
does not worry me much, TBH, but you are worried about a high turnover
of connections that could cause a lot of bloat in the backend-IO
snapshots because of the persistency that these stats would have,
right? Actually, we already have cases with other stats kinds where
it is possible for a backend to hold references to some stats on an
object that has been dropped concurrently. At least, that's possible
when a backend shuts down. If possible, supporting snapshots would be
more consistent with the other stats.

Just to be clear, I am not in favor of making PgStat_HashKey larger
than it already is. With 8 bytes allocated for the object ID, the
chances of conflicts in the dshash for a single variable-numbered
stats kind play in our favor already even with a couple of million
entries.

[1]: /messages/by-id/ZyMRJIbUpNPoCXUe@ip-10-97-1-34.eu-west-3.compute.internal
--
Michael

#27Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#26)
Re: per backend I/O statistics

Hi,

On Wed, Nov 06, 2024 at 08:39:07AM +0900, Michael Paquier wrote:

On Tue, Nov 05, 2024 at 05:37:15PM +0000, Bertrand Drouvot wrote:

I'm starting working on option 2, I think it will be easier to discuss with
a patch proposal to look at.

If in the meantime, one strongly disagree with option 2 (means implement a brand
new PGSTAT_KIND_BACKEND and keep PGSTAT_KIND_IO), please let me know.

Sorry for the late reply, catching up a bit.

No problem at all, thanks for looking at it!

As you are quoting in [1], you do not expect the backend-io stats and
the more global pg_stat_io to achieve the same level of consistency as
the backend stats would be gone at restart, and wiped out when a
backend shuts down.

Yes.

So, splitting them with a different stats kind
feels more natural because it would be possible to control how each
stat kind behaves depending on the code shutdown and reset paths
within their own callbacks rather than making the callbacks of
PGSTAT_KIND_IO more complex than they already are.

Yeah, thanks for sharing your thoughts.

And pg_stat_io is
a fixed-numbered stats kind because of the way it aggregates its stats
with a number of states defined at compile-time.

Is the structure you have in mind different than PgStat_BktypeIO?

Very close.

Perhaps a split is better anyway with that in mind.

The in-progress patch (not shared yet) is using the following:

"
typedef struct PgStat_Backend
{
	TimestampTz stat_reset_timestamp;
	BackendType bktype;
	PgStat_BktypeIO stats;
} PgStat_Backend;
"

The bktype is used to be able to filter the stats correctly when we display them.

The amount of memory required to store the snapshots of backend-IO
does not worry me much, TBH, but you are worried about a high turnover
of connections that could cause a lot of bloat in the backend-IO
snapshots because of the persistency that these stats would have,
right?

Not only a high turnover but also a high number of entries created in the hash.

Furthermore, I don't see any use case for relying on stats_fetch_consistency
while querying other backends' stats.

If possible, supporting snapshots would be
more consistent with the other stats.

I have in mind to support the snapshots _only_ when querying our own stats. I can
measure the memory impact if we also use them when querying other backends' stats
(though I don't see a use case).

Just to be clear, I am not in favor of making PgStat_HashKey larger
than it already is.

That's not needed, the patch I'm working on stores the proc number in the
objid field of the key.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#28Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#27)
Re: per backend I/O statistics

On Wed, Nov 06, 2024 at 01:51:02PM +0000, Bertrand Drouvot wrote:

That's not needed, the patch I'm working on stores the proc number in the
objid field of the key.

Relying on the procnumber for the object ID gets a +1 here. That
provides an automatic cap on the maximum number of entries that can
exist at once for this new stats kind.
--
Michael

#29Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#28)
Re: per backend I/O statistics

Hi,

On Thu, Nov 07, 2024 at 09:50:59AM +0900, Michael Paquier wrote:

On Wed, Nov 06, 2024 at 01:51:02PM +0000, Bertrand Drouvot wrote:

That's not needed, the patch I'm working on stores the proc number in the
objid field of the key.

Relying on the procnumber for the object ID gets a +1 here.

Thanks!

That
provides an automatic cap on the maximum number of entries that can
exist at once for this new stats kind.

Yeah.

POC patch is now working, will wear a reviewer hat now and will share in one
or 2 days.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#30Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Bertrand Drouvot (#29)
4 attachment(s)
Re: per backend I/O statistics

Hi,

On Thu, Nov 07, 2024 at 04:32:44PM +0000, Bertrand Drouvot wrote:

Hi,

On Thu, Nov 07, 2024 at 09:50:59AM +0900, Michael Paquier wrote:

On Wed, Nov 06, 2024 at 01:51:02PM +0000, Bertrand Drouvot wrote:

That's not needed, the patch I'm working on stores the proc number in the
objid field of the key.

Relying on the procnumber for the object ID gets a +1 here.

Thanks!

That
provides an automatic cap on the maximum number of entries that can
exist at once for this new stats kind.

Yeah.

POC patch is now working, will wear a reviewer hat now and will share in one
or 2 days.

Please find attached v5, which implements the per-backend IO stats as a
variable-numbered stats kind (as per the up-thread discussion).

It is split into 4 sub-patches:

==== 0001 (the largest one)

Introduces a new statistics KIND. It is named PGSTAT_KIND_PER_BACKEND as it could
be used in the future to store other per-backend statistics (beyond the I/O ones).
The new KIND is a variable-numbered one and has an automatic cap on the
maximum number of entries (as its hash key contains the proc number).

There is no need to serialize the per-backend I/O stats to disk (no point in
keeping stats for backends that no longer exist after a restart), so a
new "to_serialize" field is added to the PgStat_KindInfo struct.

It adds a new pg_my_stat_io view to display "my" backend I/O statistics.

It also adds a new function pg_stat_reset_single_backend_io_counters() to be
able to reset the I/O stats for a given backend pid.

It contains doc updates and dedicated tests.
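
As a quick usage illustration (a sketch, assuming 0001 is applied; the
pg_my_stat_io columns follow pg_stat_io):

"
-- this backend's I/O on relations so far
SELECT object, context, reads, writes, extends
  FROM pg_my_stat_io
 WHERE object = 'relation'
 ORDER BY context;

-- reset the I/O counters of this backend only
SELECT pg_stat_reset_single_backend_io_counters(pg_backend_pid());
"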

A few remarks:

1. one assertion in pgstat_drop_entry_internal() is not necessarily true anymore
with this new stats kind, so an extra bool parameter is added to take care of it.

2. a new struct "PgStat_Backend" is created and contains the backend type.
The backend type is used for filtering purposes when the stats are displayed.

3. when the stats are reset, we need to keep the backend type, hence the change
in shared_stat_reset_contents().

4. the pending stats are updated in the existing pgstat_count_io_op_n() and
pgstat_count_io_op_time() functions.

5. this new kind has its own flush callback: pgstat_per_backend_flush_cb().
But there is no need to maintain 2 callbacks for the I/O stats, so 0002 will
merge both and remove the one from the fixed stats. The reason it's not done
in 0001 is to ease the review.

==== 0002

Merge both IO stats flush callbacks; there is no need to keep two of them.

Merging both saves an O(N^3) loop over IOOBJECT_NUM_TYPES, IOCONTEXT_NUM_TYPES
and IOOP_NUM_TYPES.

The patch removes:

1. pgstat_io_flush_cb()
2. PendingIOStats (as the same information is already stored in the pending
entries)
3. have_iostats (the patch relies on the pending entries instead)

==== 0003

Don't include other backends' stats in the snapshot.

When stats_fetch_consistency is set to 'snapshot', don't include other backends'
stats in the snapshot. There is no use case for it, and skipping them saves memory.

==== 0004

Adding the pg_stat_get_backend_io() function to retrieve I/O statistics for
a particular backend pid.

Note that this function does not return any rows if stats_fetch_consistency
is set to 'snapshot' and the pid of interest is not our own pid (there is no use
case for retrieving another backend's stats with stats_fetch_consistency set to
'snapshot').
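
For example (a sketch, assuming 0004 is applied; 12345 stands for some other
live backend's pid and is purely hypothetical):

"
SET stats_fetch_consistency = 'snapshot';
-- another backend's pid: no rows are returned
SELECT count(*) FROM pg_stat_get_backend_io(12345);                 -- returns 0
-- our own pid still returns rows
SELECT count(*) > 0 FROM pg_stat_get_backend_io(pg_backend_pid());  -- returns t
"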

The patch:

1. replaces pg_stat_get_my_io() with pg_stat_get_backend_io()
2. adds pgstat_fetch_proc_stat_io()
3. adds doc
4. adds tests

=== Remarks

R1. The key field is still "objid" even if it stores the proc number. I think
that's fine and there is no need to rename it, as its related comment states:

"
/* object ID (table, function, etc.), or identifier. */
"

R2. pg_stat_get_io() and pg_stat_get_backend_io() could probably be merged
(duplicated code). That's not mandatory to provide the new per-backend I/O stats
feature, so let's keep it as future work to ease the review.

R3. There is no use case for retrieving other backends' IO stats with
stats_fetch_consistency set to 'snapshot'. Avoiding this behavior allows us to
save memory (which could be non-negligible, as pgstat_build_snapshot() would add
_all_ the other backends' stats to the snapshot). I chose to document it and to
not return any rows: I think that's better than returning some data (that would
not "ensure" snapshot consistency) or an error.

Looking forward to your feedback,

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v5-0001-per-backend-I-O-statistics.patchtext/x-diff; charset=us-asciiDownload
From 89de6e71540912ca294be38fea0ed5636b718521 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 28 Oct 2024 12:50:32 +0000
Subject: [PATCH v5 1/4] per backend I/O statistics

While pg_stat_io provides cluster-wide I/O statistics, this commit adds a new
statistics kind and a new pg_my_stat_io view to display "my" backend I/O statistics.

The new KIND is named PGSTAT_KIND_PER_BACKEND as it could be used in the future
to store other statistics (than the I/O ones) per backend. The new KIND is
a variable-numbered one and has an automatic cap on the maximum number of
entries (as its hash key contains the proc number).

There is no need to serialize the per backend I/O stats to disk (no point to
see stats for backends that do not exist anymore after a re-start), so a
new "to_serialize" field is added in the PgStat_KindInfo struct.

Also adding a new function pg_stat_reset_single_backend_io_counters() to be
able to reset the I/O stats for a given backend pid.

A subsequent commit will add a new pg_stat_get_backend_io() function to be
able to retrieve the I/O statistics for a given backend pid.

XXX: Bump catalog version needs to be done.
---
 doc/src/sgml/config.sgml                      |   4 +-
 doc/src/sgml/monitoring.sgml                  |  43 ++++++
 src/backend/catalog/system_functions.sql      |   2 +
 src/backend/catalog/system_views.sql          |  22 ++++
 src/backend/utils/activity/backend_status.c   |   3 +
 src/backend/utils/activity/pgstat.c           |  34 ++++-
 src/backend/utils/activity/pgstat_io.c        | 114 ++++++++++++++--
 src/backend/utils/activity/pgstat_shmem.c     |  30 ++++-
 src/backend/utils/adt/pgstatfuncs.c           | 124 ++++++++++++++++++
 src/include/catalog/pg_proc.dat               |  14 ++
 src/include/pgstat.h                          |  27 +++-
 src/include/utils/pgstat_internal.h           |  13 ++
 .../injection_points/injection_stats.c        |   1 +
 src/test/regress/expected/rules.out           |  19 +++
 src/test/regress/expected/stats.out           |  60 ++++++++-
 src/test/regress/sql/stats.sql                |  31 ++++-
 src/tools/pgindent/typedefs.list              |   2 +
 17 files changed, 517 insertions(+), 26 deletions(-)
  10.2% doc/src/sgml/
  29.7% src/backend/utils/activity/
  19.2% src/backend/utils/adt/
   5.1% src/include/catalog/
   7.2% src/include/
  14.7% src/test/regress/expected/
  10.4% src/test/regress/sql/
   3.1% src/

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d54f904956..fbabf371a6 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8361,7 +8361,9 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
         displayed in <link linkend="monitoring-pg-stat-database-view">
         <structname>pg_stat_database</structname></link>,
         <link linkend="monitoring-pg-stat-io-view">
-        <structname>pg_stat_io</structname></link>, in the output of
+        <structname>pg_stat_io</structname></link>,
+        <link linkend="monitoring-pg-my-stat-io-view">
+        <structname>pg_my_stat_io</structname></link>, in the output of
         <xref linkend="sql-explain"/> when the <literal>BUFFERS</literal> option
         is used, in the output of <xref linkend="sql-vacuum"/> when
         the <literal>VERBOSE</literal> option is used, by autovacuum
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 331315f8d3..fc6aded3da 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -488,6 +488,16 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
      </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_my_stat_io</structname><indexterm><primary>pg_my_stat_io</primary></indexterm></entry>
+      <entry>
+       One row for each combination of context and target object containing
+       my backend I/O statistics.
+       See <link linkend="monitoring-pg-my-stat-io-view">
+       <structname>pg_my_stat_io</structname></link> for details.
+     </entry>
+     </row>
+
      <row>
       <entry><structname>pg_stat_replication_slots</structname><indexterm><primary>pg_stat_replication_slots</primary></indexterm></entry>
       <entry>One row per replication slot, showing statistics about the
@@ -2946,7 +2956,23 @@ description | Waiting for a newly initialized WAL file to reach durable storage
    </para>
   </note>
 
+ </sect2>
 
+ <sect2 id="monitoring-pg-my-stat-io-view">
+  <title><structname>pg_my_stat_io</structname></title>
+
+  <indexterm>
+   <primary>pg_my_stat_io</primary>
+  </indexterm>
+
+  <para>
+   The <structname>pg_my_stat_io</structname> view will contain one row for each
+   combination of target I/O object and I/O context, showing
+   my backend I/O statistics. Combinations which do not make sense are
+   omitted. The fields are exactly the same as the ones in the <link
+   linkend="monitoring-pg-stat-io-view"> <structname>pg_stat_io</structname></link>
+   view.
+  </para>
 
  </sect2>
 
@@ -4971,6 +4997,23 @@ description | Waiting for a newly initialized WAL file to reach durable storage
        </para></entry>
       </row>
 
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_stat_reset_single_backend_io_counters</primary>
+        </indexterm>
+        <function>pg_stat_reset_single_backend_io_counters</function> ( <type>int4</type> )
+        <returnvalue>void</returnvalue>
+       </para>
+       <para>
+        Resets I/O statistics for a single backend to zero.
+       </para>
+       <para>
+        This function is restricted to superusers by default, but other users
+        can be granted EXECUTE to run the function.
+       </para></entry>
+      </row>
+
       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <indexterm>
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index c51dfca802..8a1ae44fca 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -711,6 +711,8 @@ REVOKE EXECUTE ON FUNCTION pg_stat_reset_single_table_counters(oid) FROM public;
 
 REVOKE EXECUTE ON FUNCTION pg_stat_reset_single_function_counters(oid) FROM public;
 
+REVOKE EXECUTE ON FUNCTION pg_stat_reset_single_backend_io_counters(int4) FROM public;
+
 REVOKE EXECUTE ON FUNCTION pg_stat_reset_replication_slot(text) FROM public;
 
 REVOKE EXECUTE ON FUNCTION pg_stat_have_stats(text, oid, int8) FROM public;
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 3456b821bc..09af4a40a8 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1170,6 +1170,28 @@ SELECT
        b.stats_reset
 FROM pg_stat_get_io() b;
 
+CREATE VIEW pg_my_stat_io AS
+SELECT
+       b.backend_type,
+       b.object,
+       b.context,
+       b.reads,
+       b.read_time,
+       b.writes,
+       b.write_time,
+       b.writebacks,
+       b.writeback_time,
+       b.extends,
+       b.extend_time,
+       b.op_bytes,
+       b.hits,
+       b.evictions,
+       b.reuses,
+       b.fsyncs,
+       b.fsync_time,
+       b.stats_reset
+FROM pg_stat_get_my_io() b;
+
 CREATE VIEW pg_stat_wal AS
     SELECT
         w.wal_records,
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index bdb3a296ca..aa683e150c 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -249,6 +249,9 @@ pgstat_beinit(void)
 	Assert(MyProcNumber >= 0 && MyProcNumber < NumBackendStatSlots);
 	MyBEEntry = &BackendStatusArray[MyProcNumber];
 
+	/* Create the per backend stat entry */
+	pgstat_create_backend_stat(MyProcNumber);
+
 	/* Set up a process-exit hook to clean up */
 	on_shmem_exit(pgstat_beshutdown_hook, 0);
 }
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index ea8c5691e8..aacf61c9a4 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -281,6 +281,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.name = "database",
 
 		.fixed_amount = false,
+		.to_serialize = true,
 		/* so pg_stat_database entries can be seen in all databases */
 		.accessed_across_databases = true,
 
@@ -297,6 +298,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.name = "relation",
 
 		.fixed_amount = false,
+		.to_serialize = true,
 
 		.shared_size = sizeof(PgStatShared_Relation),
 		.shared_data_off = offsetof(PgStatShared_Relation, stats),
@@ -311,6 +313,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.name = "function",
 
 		.fixed_amount = false,
+		.to_serialize = true,
 
 		.shared_size = sizeof(PgStatShared_Function),
 		.shared_data_off = offsetof(PgStatShared_Function, stats),
@@ -324,6 +327,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.name = "replslot",
 
 		.fixed_amount = false,
+		.to_serialize = true,
 
 		.accessed_across_databases = true,
 
@@ -340,6 +344,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.name = "subscription",
 
 		.fixed_amount = false,
+		.to_serialize = true,
 		/* so pg_stat_subscription_stats entries can be seen in all databases */
 		.accessed_across_databases = true,
 
@@ -352,6 +357,22 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.reset_timestamp_cb = pgstat_subscription_reset_timestamp_cb,
 	},
 
+	[PGSTAT_KIND_PER_BACKEND] = {
+		.name = "perbackend",
+
+		.fixed_amount = false,
+		.to_serialize = false,
+
+		.accessed_across_databases = true,
+
+		.shared_size = sizeof(PgStatShared_Backend),
+		.shared_data_off = offsetof(PgStatShared_Backend, stats),
+		.shared_data_len = sizeof(((PgStatShared_Backend *) 0)->stats),
+		.pending_size = sizeof(PgStat_PendingIO),
+
+		.flush_pending_cb = pgstat_per_backend_flush_cb,
+		.reset_timestamp_cb = pgstat_backend_reset_timestamp_cb,
+	},
 
 	/* stats for fixed-numbered (mostly 1) objects */
 
@@ -359,6 +380,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.name = "archiver",
 
 		.fixed_amount = true,
+		.to_serialize = true,
 
 		.snapshot_ctl_off = offsetof(PgStat_Snapshot, archiver),
 		.shared_ctl_off = offsetof(PgStat_ShmemControl, archiver),
@@ -374,6 +396,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.name = "bgwriter",
 
 		.fixed_amount = true,
+		.to_serialize = true,
 
 		.snapshot_ctl_off = offsetof(PgStat_Snapshot, bgwriter),
 		.shared_ctl_off = offsetof(PgStat_ShmemControl, bgwriter),
@@ -389,6 +412,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.name = "checkpointer",
 
 		.fixed_amount = true,
+		.to_serialize = true,
 
 		.snapshot_ctl_off = offsetof(PgStat_Snapshot, checkpointer),
 		.shared_ctl_off = offsetof(PgStat_ShmemControl, checkpointer),
@@ -404,6 +428,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.name = "io",
 
 		.fixed_amount = true,
+		.to_serialize = true,
 
 		.snapshot_ctl_off = offsetof(PgStat_Snapshot, io),
 		.shared_ctl_off = offsetof(PgStat_ShmemControl, io),
@@ -421,6 +446,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.name = "slru",
 
 		.fixed_amount = true,
+		.to_serialize = true,
 
 		.snapshot_ctl_off = offsetof(PgStat_Snapshot, slru),
 		.shared_ctl_off = offsetof(PgStat_ShmemControl, slru),
@@ -438,6 +464,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.name = "wal",
 
 		.fixed_amount = true,
+		.to_serialize = true,
 
 		.snapshot_ctl_off = offsetof(PgStat_Snapshot, wal),
 		.shared_ctl_off = offsetof(PgStat_ShmemControl, wal),
@@ -515,7 +542,7 @@ pgstat_discard_stats(void)
 
 	/*
 	 * Reset stats contents. This will set reset timestamps of fixed-numbered
-	 * stats to the current time (no variable stats exist).
+	 * stats to the current time (the per-backend variable stat exists too).
 	 */
 	pgstat_reset_after_failure();
 }
@@ -756,7 +783,7 @@ pgstat_report_stat(bool force)
 
 	partial_flush = false;
 
-	/* flush database / relation / function / ... stats */
+	/* flush database / relation / function / backend / ... stats */
 	partial_flush |= pgstat_flush_pending_entries(nowait);
 
 	/* flush of fixed-numbered stats */
@@ -1660,6 +1687,9 @@ pgstat_write_statsfile(XLogRecPtr redo)
 
 		kind_info = pgstat_get_kind_info(ps->key.kind);
 
+		if (!kind_info->to_serialize)
+			continue;
+
 		/* if not dropped the valid-entry refcount should exist */
 		Assert(pg_atomic_read_u32(&ps->refcount) > 0);
 
diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index cc2ffc78aa..a2bb11adc9 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -21,13 +21,6 @@
 #include "utils/pgstat_internal.h"
 
 
-typedef struct PgStat_PendingIO
-{
-	PgStat_Counter counts[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
-	instr_time	pending_times[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
-} PgStat_PendingIO;
-
-
 static PgStat_PendingIO PendingIOStats;
 static bool have_iostats = false;
 
@@ -82,12 +75,17 @@ pgstat_count_io_op(IOObject io_object, IOContext io_context, IOOp io_op)
 void
 pgstat_count_io_op_n(IOObject io_object, IOContext io_context, IOOp io_op, uint32 cnt)
 {
+	PgStat_PendingIO *entry_ref;
+
 	Assert((unsigned int) io_object < IOOBJECT_NUM_TYPES);
 	Assert((unsigned int) io_context < IOCONTEXT_NUM_TYPES);
 	Assert((unsigned int) io_op < IOOP_NUM_TYPES);
 	Assert(pgstat_tracks_io_op(MyBackendType, io_object, io_context, io_op));
 
+	entry_ref = pgstat_prep_per_backend_pending(MyProcNumber);
+
 	PendingIOStats.counts[io_object][io_context][io_op] += cnt;
+	entry_ref->counts[io_object][io_context][io_op] += cnt;
 
 	have_iostats = true;
 }
@@ -122,6 +120,10 @@ void
 pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op,
 						instr_time start_time, uint32 cnt)
 {
+	PgStat_PendingIO *entry_ref;
+
+	entry_ref = pgstat_prep_per_backend_pending(MyProcNumber);
+
 	if (track_io_timing)
 	{
 		instr_time	io_time;
@@ -148,6 +150,8 @@ pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op,
 
 		INSTR_TIME_ADD(PendingIOStats.pending_times[io_object][io_context][io_op],
 					   io_time);
+		INSTR_TIME_ADD(entry_ref->pending_times[io_object][io_context][io_op],
+					   io_time);
 	}
 
 	pgstat_count_io_op_n(io_object, io_context, io_op, cnt);
@@ -161,6 +165,13 @@ pgstat_fetch_stat_io(void)
 	return &pgStatLocal.snapshot.io;
 }
 
+PgStat_Backend *
+pgstat_fetch_my_stat_io(void)
+{
+	return (PgStat_Backend *)
+		pgstat_fetch_entry(PGSTAT_KIND_PER_BACKEND, InvalidOid, MyProcNumber);
+}
+
 /*
  * Check if there any IO stats waiting for flush.
  */
@@ -171,12 +182,18 @@ pgstat_io_have_pending_cb(void)
 }
 
 /*
- * Simpler wrapper of pgstat_io_flush_cb()
+ * Simpler wrapper of pgstat_io_flush_cb() and pgstat_per_backend_flush_cb().
  */
 void
 pgstat_flush_io(bool nowait)
 {
+	PgStat_EntryRef *entry_ref;
+
+	entry_ref = pgstat_get_entry_ref(PGSTAT_KIND_PER_BACKEND, InvalidOid,
+									 MyProcNumber, false, NULL);
+
 	(void) pgstat_io_flush_cb(nowait);
+	(void) pgstat_per_backend_flush_cb(entry_ref, nowait);
 }
 
 /*
@@ -235,6 +252,49 @@ pgstat_io_flush_cb(bool nowait)
 	return false;
 }
 
+/*
+ * Flush out locally pending backend statistics
+ *
+ * If no stats have been recorded, this function returns false.
+ */
+bool
+pgstat_per_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
+{
+	PgStatShared_Backend *shbackendioent;
+	PgStat_PendingIO *pendingent;
+	PgStat_BktypeIO *bktype_shstats;
+
+	if (!pgstat_lock_entry(entry_ref, nowait))
+		return false;
+
+	shbackendioent = (PgStatShared_Backend *) entry_ref->shared_stats;
+	bktype_shstats = &shbackendioent->stats.stats;
+	pendingent = (PgStat_PendingIO *) entry_ref->pending;
+
+	for (int io_object = 0; io_object < IOOBJECT_NUM_TYPES; io_object++)
+	{
+		for (int io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
+		{
+			for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
+			{
+				instr_time	time;
+
+				bktype_shstats->counts[io_object][io_context][io_op] +=
+					pendingent->counts[io_object][io_context][io_op];
+
+				time = pendingent->pending_times[io_object][io_context][io_op];
+
+				bktype_shstats->times[io_object][io_context][io_op] +=
+					INSTR_TIME_GET_MICROSEC(time);
+			}
+		}
+	}
+
+	pgstat_unlock_entry(entry_ref);
+
+	return true;
+}
+
 const char *
 pgstat_get_io_context_name(IOContext io_context)
 {
@@ -325,6 +385,38 @@ pgstat_io_snapshot_cb(void)
 	}
 }
 
+void
+pgstat_create_backend_stat(ProcNumber procnum)
+{
+	PgStat_EntryRef *entry_ref;
+	PgStatShared_Backend *shstatent;
+
+	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_PER_BACKEND, InvalidOid,
+										  procnum, NULL);
+
+	shstatent = (PgStatShared_Backend *) entry_ref->shared_stats;
+
+	/*
+	 * NB: need to accept that there might be stats from an older backend,
+	 * e.g. if we previously used this proc number.
+	 */
+	memset(&shstatent->stats, 0, sizeof(shstatent->stats));
+
+	/* set the backend type */
+	shstatent->stats.bktype = MyBackendType;
+}
+
+PgStat_PendingIO *
+pgstat_prep_per_backend_pending(ProcNumber procnum)
+{
+	PgStat_EntryRef *entry_ref;
+
+	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_PER_BACKEND, InvalidOid,
+										  procnum, NULL);
+
+	return entry_ref->pending;
+}
+
 /*
 * IO statistics are not collected for all BackendTypes.
 *
@@ -503,3 +595,9 @@ pgstat_tracks_io_op(BackendType bktype, IOObject io_object,
 
 	return true;
 }
+
+void
+pgstat_backend_reset_timestamp_cb(PgStatShared_Common *header, TimestampTz ts)
+{
+	((PgStatShared_Backend *) header)->stats.stat_reset_timestamp = ts;
+}
diff --git a/src/backend/utils/activity/pgstat_shmem.c b/src/backend/utils/activity/pgstat_shmem.c
index c1b7ff76b1..b871c1d7d0 100644
--- a/src/backend/utils/activity/pgstat_shmem.c
+++ b/src/backend/utils/activity/pgstat_shmem.c
@@ -813,13 +813,24 @@ pgstat_free_entry(PgStatShared_HashEntry *shent, dshash_seq_status *hstat)
  */
 static bool
 pgstat_drop_entry_internal(PgStatShared_HashEntry *shent,
-						   dshash_seq_status *hstat)
+						   dshash_seq_status *hstat, bool ignore_aux_proc)
 {
 	Assert(shent->body != InvalidDsaPointer);
 
 	/* should already have released local reference */
 	if (pgStatEntryRefHash)
-		Assert(!pgstat_entry_ref_hash_lookup(pgStatEntryRefHash, shent->key));
+	{
+		/*
+		 * The following assertion is not correct when coming from
+		 * pgstat_discard_stats() and for per backend stats linked to an
+		 * auxiliary process.
+		 */
+		if (ignore_aux_proc && shent->key.kind == PGSTAT_KIND_PER_BACKEND &&
+			MyProcNumber >= MaxBackends)
+			return false;
+		else
+			Assert(!pgstat_entry_ref_hash_lookup(pgStatEntryRefHash, shent->key));
+	}
 
 	/*
 	 * Signal that the entry is dropped - this will eventually cause other
@@ -882,7 +893,7 @@ pgstat_drop_database_and_contents(Oid dboid)
 		if (p->key.dboid != dboid)
 			continue;
 
-		if (!pgstat_drop_entry_internal(p, &hstat))
+		if (!pgstat_drop_entry_internal(p, &hstat, false))
 		{
 			/*
 			 * Even statistics for a dropped database might currently be
@@ -941,7 +952,7 @@ pgstat_drop_entry(PgStat_Kind kind, Oid dboid, uint64 objid)
 	shent = dshash_find(pgStatLocal.shared_hash, &key, true);
 	if (shent)
 	{
-		freed = pgstat_drop_entry_internal(shent, NULL);
+		freed = pgstat_drop_entry_internal(shent, NULL, false);
 
 		/*
 		 * Database stats contain other stats. Drop those as well when
@@ -969,7 +980,7 @@ pgstat_drop_all_entries(void)
 		if (ps->dropped)
 			continue;
 
-		if (!pgstat_drop_entry_internal(ps, &hstat))
+		if (!pgstat_drop_entry_internal(ps, &hstat, true))
 			not_freed_count++;
 	}
 	dshash_seq_term(&hstat);
@@ -982,11 +993,20 @@ static void
 shared_stat_reset_contents(PgStat_Kind kind, PgStatShared_Common *header,
 						   TimestampTz ts)
 {
+	BackendType bktype;
 	const PgStat_KindInfo *kind_info = pgstat_get_kind_info(kind);
 
+	/* save the bktype */
+	if (kind == PGSTAT_KIND_PER_BACKEND)
+		bktype = ((PgStatShared_Backend *) header)->stats.bktype;
+
 	memset(pgstat_get_entry_data(kind, header), 0,
 		   pgstat_get_entry_len(kind));
 
+	/* restore the bktype */
+	if (kind == PGSTAT_KIND_PER_BACKEND)
+		((PgStatShared_Backend *) header)->stats.bktype = bktype;
+
 	if (kind_info->reset_timestamp_cb)
 		kind_info->reset_timestamp_cb(header, ts);
 }
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index f7b50e0b5a..fd16986156 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1474,6 +1474,111 @@ pg_stat_get_io(PG_FUNCTION_ARGS)
 	return (Datum) 0;
 }
 
+Datum
+pg_stat_get_my_io(PG_FUNCTION_ARGS)
+{
+	ReturnSetInfo *rsinfo;
+	PgStat_Backend *backend_stats;
+	Datum		bktype_desc;
+	PgStat_BktypeIO *bktype_stats;
+	BackendType bktype;
+	Datum		reset_time;
+
+	InitMaterializedSRF(fcinfo, 0);
+	rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+	backend_stats = pgstat_fetch_my_stat_io();
+
+	bktype = backend_stats->bktype;
+	bktype_desc = CStringGetTextDatum(GetBackendTypeDesc(bktype));
+	bktype_stats = &backend_stats->stats;
+	reset_time = TimestampTzGetDatum(backend_stats->stat_reset_timestamp);
+
+	/*
+	 * In Assert builds, we can afford an extra loop through all of the
+	 * counters checking that only expected stats are non-zero, since it keeps
+	 * the non-Assert code cleaner.
+	 */
+	Assert(pgstat_bktype_io_stats_valid(bktype_stats, bktype));
+
+	for (int io_obj = 0; io_obj < IOOBJECT_NUM_TYPES; io_obj++)
+	{
+		const char *obj_name = pgstat_get_io_object_name(io_obj);
+
+		for (int io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
+		{
+			const char *context_name = pgstat_get_io_context_name(io_context);
+
+			Datum		values[IO_NUM_COLUMNS] = {0};
+			bool		nulls[IO_NUM_COLUMNS] = {0};
+
+			/*
+			 * Some combinations of BackendType, IOObject, and IOContext are
+			 * not valid for any type of IOOp. In such cases, omit the entire
+			 * row from the view.
+			 */
+			if (!pgstat_tracks_io_object(bktype, io_obj, io_context))
+				continue;
+
+			values[IO_COL_BACKEND_TYPE] = bktype_desc;
+			values[IO_COL_CONTEXT] = CStringGetTextDatum(context_name);
+			values[IO_COL_OBJECT] = CStringGetTextDatum(obj_name);
+			if (backend_stats->stat_reset_timestamp != 0)
+				values[IO_COL_RESET_TIME] = reset_time;
+			else
+				nulls[IO_COL_RESET_TIME] = true;
+
+			/*
+			 * Hard-code this to the value of BLCKSZ for now. Future values
+			 * could include XLOG_BLCKSZ, once WAL IO is tracked, and constant
+			 * multipliers, once non-block-oriented IO (e.g. temporary file
+			 * IO) is tracked.
+			 */
+			values[IO_COL_CONVERSION] = Int64GetDatum(BLCKSZ);
+
+			for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
+			{
+				int			op_idx = pgstat_get_io_op_index(io_op);
+				int			time_idx = pgstat_get_io_time_index(io_op);
+
+				/*
+				 * Some combinations of BackendType and IOOp, of IOContext and
+				 * IOOp, and of IOObject and IOOp are not tracked. Set these
+				 * cells in the view NULL.
+				 */
+				if (pgstat_tracks_io_op(bktype, io_obj, io_context, io_op))
+				{
+					PgStat_Counter count =
+						bktype_stats->counts[io_obj][io_context][io_op];
+
+					values[op_idx] = Int64GetDatum(count);
+				}
+				else
+					nulls[op_idx] = true;
+
+				/* not every operation is timed */
+				if (time_idx == IO_COL_INVALID)
+					continue;
+
+				if (!nulls[op_idx])
+				{
+					PgStat_Counter time =
+						bktype_stats->times[io_obj][io_context][io_op];
+
+					values[time_idx] = Float8GetDatum(pg_stat_us_to_ms(time));
+				}
+				else
+					nulls[time_idx] = true;
+			}
+
+			tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+								 values, nulls);
+		}
+	}
+
+	return (Datum) 0;
+}
+
 /*
  * Returns statistics of WAL activity
  */
@@ -1722,6 +1827,7 @@ pg_stat_reset_shared(PG_FUNCTION_ARGS)
 		pgstat_reset_of_kind(PGSTAT_KIND_BGWRITER);
 		pgstat_reset_of_kind(PGSTAT_KIND_CHECKPOINTER);
 		pgstat_reset_of_kind(PGSTAT_KIND_IO);
+		pgstat_reset_of_kind(PGSTAT_KIND_PER_BACKEND);
 		XLogPrefetchResetStats();
 		pgstat_reset_of_kind(PGSTAT_KIND_SLRU);
 		pgstat_reset_of_kind(PGSTAT_KIND_WAL);
@@ -1779,6 +1885,24 @@ pg_stat_reset_single_function_counters(PG_FUNCTION_ARGS)
 	PG_RETURN_VOID();
 }
 
+Datum
+pg_stat_reset_single_backend_io_counters(PG_FUNCTION_ARGS)
+{
+	PGPROC	   *proc;
+	int			backend_pid = PG_GETARG_INT32(0);
+
+	proc = BackendPidGetProc(backend_pid);
+
+	/* Maybe an auxiliary process? */
+	if (proc == NULL)
+		proc = AuxiliaryPidGetProc(backend_pid);
+
+	if (proc)
+		pgstat_reset(PGSTAT_KIND_PER_BACKEND, InvalidOid, GetNumberFromPGProc(proc));
+
+	PG_RETURN_VOID();
+}
+
 /* Reset SLRU counters (a specific one or all of them). */
 Datum
 pg_stat_reset_slru(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index f23321a41f..89eb89efe4 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5903,6 +5903,15 @@
   proargnames => '{backend_type,object,context,reads,read_time,writes,write_time,writebacks,writeback_time,extends,extend_time,op_bytes,hits,evictions,reuses,fsyncs,fsync_time,stats_reset}',
   prosrc => 'pg_stat_get_io' },
 
+{ oid => '8806', descr => 'statistics: my backend IO statistics',
+  proname => 'pg_stat_get_my_io', prorows => '5', proretset => 't',
+  provolatile => 'v', proparallel => 'r', prorettype => 'record',
+  proargtypes => '',
+  proallargtypes => '{text,text,text,int8,float8,int8,float8,int8,float8,int8,float8,int8,int8,int8,int8,int8,float8,timestamptz}',
+  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{backend_type,object,context,reads,read_time,writes,write_time,writebacks,writeback_time,extends,extend_time,op_bytes,hits,evictions,reuses,fsyncs,fsync_time,stats_reset}',
+  prosrc => 'pg_stat_get_my_io' },
+
 { oid => '1136', descr => 'statistics: information about WAL activity',
   proname => 'pg_stat_get_wal', proisstrict => 'f', provolatile => 's',
   proparallel => 'r', prorettype => 'record', proargtypes => '',
@@ -6042,6 +6051,11 @@
   proname => 'pg_stat_reset_single_function_counters', provolatile => 'v',
   prorettype => 'void', proargtypes => 'oid',
   prosrc => 'pg_stat_reset_single_function_counters' },
+{ oid => '9987',
+  descr => 'statistics: reset collected IO statistics for a single backend',
+  proname => 'pg_stat_reset_single_backend_io_counters', provolatile => 'v',
+  prorettype => 'void', proargtypes => 'int4',
+  prosrc => 'pg_stat_reset_single_backend_io_counters' },
 { oid => '2307',
   descr => 'statistics: reset collected statistics for a single SLRU',
   proname => 'pg_stat_reset_slru', proisstrict => 'f', provolatile => 'v',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index df53fa2d4f..13b9c71759 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -49,14 +49,15 @@
 #define PGSTAT_KIND_FUNCTION	3	/* per-function statistics */
 #define PGSTAT_KIND_REPLSLOT	4	/* per-slot statistics */
 #define PGSTAT_KIND_SUBSCRIPTION	5	/* per-subscription statistics */
+#define PGSTAT_KIND_PER_BACKEND	6
 
 /* stats for fixed-numbered objects */
-#define PGSTAT_KIND_ARCHIVER	6
-#define PGSTAT_KIND_BGWRITER	7
-#define PGSTAT_KIND_CHECKPOINTER	8
-#define PGSTAT_KIND_IO	9
-#define PGSTAT_KIND_SLRU	10
-#define PGSTAT_KIND_WAL	11
+#define PGSTAT_KIND_ARCHIVER	7
+#define PGSTAT_KIND_BGWRITER	8
+#define PGSTAT_KIND_CHECKPOINTER	9
+#define PGSTAT_KIND_IO	10
+#define PGSTAT_KIND_SLRU	11
+#define PGSTAT_KIND_WAL	12
 
 #define PGSTAT_KIND_BUILTIN_MIN PGSTAT_KIND_DATABASE
 #define PGSTAT_KIND_BUILTIN_MAX PGSTAT_KIND_WAL
@@ -347,12 +348,24 @@ typedef struct PgStat_BktypeIO
 	PgStat_Counter times[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
 } PgStat_BktypeIO;
 
+typedef struct PgStat_PendingIO
+{
+	PgStat_Counter counts[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
+	instr_time	pending_times[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
+} PgStat_PendingIO;
+
 typedef struct PgStat_IO
 {
 	TimestampTz stat_reset_timestamp;
 	PgStat_BktypeIO stats[BACKEND_NUM_TYPES];
 } PgStat_IO;
 
+typedef struct PgStat_Backend
+{
+	TimestampTz stat_reset_timestamp;
+	BackendType bktype;
+	PgStat_BktypeIO stats;
+} PgStat_Backend;
 
 typedef struct PgStat_StatDBEntry
 {
@@ -562,6 +575,7 @@ extern void pgstat_count_io_op_time(IOObject io_object, IOContext io_context,
 									IOOp io_op, instr_time start_time, uint32 cnt);
 
 extern PgStat_IO *pgstat_fetch_stat_io(void);
+extern PgStat_Backend *pgstat_fetch_my_stat_io(void);
 extern const char *pgstat_get_io_context_name(IOContext io_context);
 extern const char *pgstat_get_io_object_name(IOObject io_object);
 
@@ -570,6 +584,7 @@ extern bool pgstat_tracks_io_object(BackendType bktype,
 									IOObject io_object, IOContext io_context);
 extern bool pgstat_tracks_io_op(BackendType bktype, IOObject io_object,
 								IOContext io_context, IOOp io_op);
+extern void pgstat_create_backend_stat(ProcNumber procnum);
 
 
 /*
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 61b2e1f96b..44a00d4779 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -194,6 +194,11 @@ typedef struct PgStat_KindInfo
 	 */
 	bool		accessed_across_databases:1;
 
+	/*
+	 * Do serialize or not this kind of stats.
+	 */
+	bool		to_serialize:1;
+
 	/*
 	 * The size of an entry in the shared stats hash table (pointed to by
 	 * PgStatShared_HashEntry->body).  For fixed-numbered statistics, this is
@@ -428,6 +433,11 @@ typedef struct PgStatShared_ReplSlot
 	PgStat_StatReplSlotEntry stats;
 } PgStatShared_ReplSlot;
 
+typedef struct PgStatShared_Backend
+{
+	PgStatShared_Common header;
+	PgStat_Backend stats;
+} PgStatShared_Backend;
 
 /*
  * Central shared memory entry for the cumulative stats system.
@@ -630,9 +640,12 @@ extern void pgstat_flush_io(bool nowait);
 
 extern bool pgstat_io_have_pending_cb(void);
 extern bool pgstat_io_flush_cb(bool nowait);
+extern bool pgstat_per_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait);
 extern void pgstat_io_init_shmem_cb(void *stats);
 extern void pgstat_io_reset_all_cb(TimestampTz ts);
 extern void pgstat_io_snapshot_cb(void);
+extern PgStat_PendingIO *pgstat_prep_per_backend_pending(ProcNumber procnum);
+extern void pgstat_backend_reset_timestamp_cb(PgStatShared_Common *header, TimestampTz ts);
 
 
 /*
diff --git a/src/test/modules/injection_points/injection_stats.c b/src/test/modules/injection_points/injection_stats.c
index d89d055913..ca4b6b6e9f 100644
--- a/src/test/modules/injection_points/injection_stats.c
+++ b/src/test/modules/injection_points/injection_stats.c
@@ -39,6 +39,7 @@ static bool injection_stats_flush_cb(PgStat_EntryRef *entry_ref, bool nowait);
 static const PgStat_KindInfo injection_stats = {
 	.name = "injection_points",
 	.fixed_amount = false,		/* Bounded by the number of points */
+	.to_serialize = true,
 
 	/* Injection points are system-wide */
 	.accessed_across_databases = true,
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 2b47013f11..f92516b047 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1398,6 +1398,25 @@ pg_matviews| SELECT n.nspname AS schemaname,
      LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
      LEFT JOIN pg_tablespace t ON ((t.oid = c.reltablespace)))
   WHERE (c.relkind = 'm'::"char");
+pg_my_stat_io| SELECT backend_type,
+    object,
+    context,
+    reads,
+    read_time,
+    writes,
+    write_time,
+    writebacks,
+    writeback_time,
+    extends,
+    extend_time,
+    op_bytes,
+    hits,
+    evictions,
+    reuses,
+    fsyncs,
+    fsync_time,
+    stats_reset
+   FROM pg_stat_get_my_io() b(backend_type, object, context, reads, read_time, writes, write_time, writebacks, writeback_time, extends, extend_time, op_bytes, hits, evictions, reuses, fsyncs, fsync_time, stats_reset);
 pg_policies| SELECT n.nspname AS schemaname,
     c.relname AS tablename,
     pol.polname AS policyname,
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 56771f83ed..846e693483 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1249,7 +1249,7 @@ SELECT pg_stat_get_subscription_stats(NULL);
  
 (1 row)
 
--- Test that the following operations are tracked in pg_stat_io:
+-- Test that the following operations are tracked in pg_[my_]stat_io:
 -- - reads of target blocks into shared buffers
 -- - writes of shared buffers to permanent storage
 -- - extends of relations using shared buffers
@@ -1261,9 +1261,14 @@ SELECT pg_stat_get_subscription_stats(NULL);
 -- extends.
 SELECT sum(extends) AS io_sum_shared_before_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT sum(extends) AS my_io_sum_shared_before_extends
+  FROM pg_my_stat_io WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_stat_io
   WHERE object = 'relation' \gset io_sum_shared_before_
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_my_stat_io
+  WHERE object = 'relation' \gset my_io_sum_shared_before_
 CREATE TABLE test_io_shared(a int);
 INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
 SELECT pg_stat_force_next_flush();
@@ -1280,8 +1285,16 @@ SELECT :io_sum_shared_after_extends > :io_sum_shared_before_extends;
  t
 (1 row)
 
+SELECT sum(extends) AS my_io_sum_shared_after_extends
+  FROM pg_my_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT :my_io_sum_shared_after_extends > :my_io_sum_shared_before_extends;
+ ?column? 
+----------
+ t
+(1 row)
+
 -- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
--- and fsyncs.
+-- and fsyncs in the global stats (not for the backend).
 -- See comment above for rationale for two explicit CHECKPOINTs.
 CHECKPOINT;
 CHECKPOINT;
@@ -1301,6 +1314,23 @@ SELECT current_setting('fsync') = 'off'
  t
 (1 row)
 
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_my_stat_io
+  WHERE object = 'relation' \gset my_io_sum_shared_after_
+SELECT :my_io_sum_shared_after_writes >= :my_io_sum_shared_before_writes;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT current_setting('fsync') = 'off'
+  OR (:my_io_sum_shared_after_fsyncs = :my_io_sum_shared_before_fsyncs
+      AND :my_io_sum_shared_after_fsyncs= 0);
+ ?column? 
+----------
+ t
+(1 row)
+
 -- Change the tablespace so that the table is rewritten directly, then SELECT
 -- from it to cause it to be read back into shared buffers.
 SELECT sum(reads) AS io_sum_shared_before_reads
@@ -1521,6 +1551,8 @@ SELECT pg_stat_have_stats('io', 0, 0);
 
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_pre_reset
   FROM pg_stat_io \gset
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_pre_reset
+  FROM pg_my_stat_io \gset
 SELECT pg_stat_reset_shared('io');
  pg_stat_reset_shared 
 ----------------------
@@ -1535,6 +1567,30 @@ SELECT :io_stats_post_reset < :io_stats_pre_reset;
  t
 (1 row)
 
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_reset
+  FROM pg_my_stat_io \gset
+-- pg_stat_reset_shared() did not reset backend IO stats
+SELECT :my_io_stats_pre_reset <= :my_io_stats_post_reset;
+ ?column? 
+----------
+ t
+(1 row)
+
+-- but pg_stat_reset_single_backend_io_counters() does
+SELECT pg_stat_reset_single_backend_io_counters(pg_backend_pid());
+ pg_stat_reset_single_backend_io_counters 
+------------------------------------------
+ 
+(1 row)
+
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_backend_reset
+  FROM pg_my_stat_io \gset
+SELECT :my_io_stats_pre_reset > :my_io_stats_post_backend_reset;
+ ?column? 
+----------
+ t
+(1 row)
+
 -- test BRIN index doesn't block HOT update
 CREATE TABLE brin_hot (
   id  integer PRIMARY KEY,
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 7147cc2f89..9cb14b7182 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -595,7 +595,7 @@ SELECT pg_stat_get_replication_slot(NULL);
 SELECT pg_stat_get_subscription_stats(NULL);
 
 
--- Test that the following operations are tracked in pg_stat_io:
+-- Test that the following operations are tracked in pg_[my_]stat_io:
 -- - reads of target blocks into shared buffers
 -- - writes of shared buffers to permanent storage
 -- - extends of relations using shared buffers
@@ -609,18 +609,26 @@ SELECT pg_stat_get_subscription_stats(NULL);
 -- extends.
 SELECT sum(extends) AS io_sum_shared_before_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT sum(extends) AS my_io_sum_shared_before_extends
+  FROM pg_my_stat_io WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_stat_io
   WHERE object = 'relation' \gset io_sum_shared_before_
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_my_stat_io
+  WHERE object = 'relation' \gset my_io_sum_shared_before_
 CREATE TABLE test_io_shared(a int);
 INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
 SELECT pg_stat_force_next_flush();
 SELECT sum(extends) AS io_sum_shared_after_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
 SELECT :io_sum_shared_after_extends > :io_sum_shared_before_extends;
+SELECT sum(extends) AS my_io_sum_shared_after_extends
+  FROM pg_my_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT :my_io_sum_shared_after_extends > :my_io_sum_shared_before_extends;
 
 -- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
--- and fsyncs.
+-- and fsyncs in the global stats (not for the backend).
 -- See comment above for rationale for two explicit CHECKPOINTs.
 CHECKPOINT;
 CHECKPOINT;
@@ -631,6 +639,14 @@ SELECT :io_sum_shared_after_writes > :io_sum_shared_before_writes;
 SELECT current_setting('fsync') = 'off'
   OR :io_sum_shared_after_fsyncs > :io_sum_shared_before_fsyncs;
 
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_my_stat_io
+  WHERE object = 'relation' \gset my_io_sum_shared_after_
+SELECT :my_io_sum_shared_after_writes >= :my_io_sum_shared_before_writes;
+SELECT current_setting('fsync') = 'off'
+  OR (:my_io_sum_shared_after_fsyncs = :my_io_sum_shared_before_fsyncs
+      AND :my_io_sum_shared_after_fsyncs= 0);
+
 -- Change the tablespace so that the table is rewritten directly, then SELECT
 -- from it to cause it to be read back into shared buffers.
 SELECT sum(reads) AS io_sum_shared_before_reads
@@ -762,10 +778,21 @@ SELECT :io_sum_bulkwrite_strategy_extends_after > :io_sum_bulkwrite_strategy_ext
 SELECT pg_stat_have_stats('io', 0, 0);
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_pre_reset
   FROM pg_stat_io \gset
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_pre_reset
+  FROM pg_my_stat_io \gset
 SELECT pg_stat_reset_shared('io');
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_post_reset
   FROM pg_stat_io \gset
 SELECT :io_stats_post_reset < :io_stats_pre_reset;
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_reset
+  FROM pg_my_stat_io \gset
+-- pg_stat_reset_shared() did not reset backend IO stats
+SELECT :my_io_stats_pre_reset <= :my_io_stats_post_reset;
+-- but pg_stat_reset_single_backend_io_counters() does
+SELECT pg_stat_reset_single_backend_io_counters(pg_backend_pid());
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_backend_reset
+  FROM pg_my_stat_io \gset
+SELECT :my_io_stats_pre_reset > :my_io_stats_post_backend_reset;
 
 
 -- test BRIN index doesn't block HOT update
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 1847bbfa95..4188d056ca 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2117,6 +2117,7 @@ PgFdwSamplingMethod
 PgFdwScanState
 PgIfAddrCallback
 PgStatShared_Archiver
+PgStatShared_Backend
 PgStatShared_BgWriter
 PgStatShared_Checkpointer
 PgStatShared_Common
@@ -2132,6 +2133,7 @@ PgStatShared_SLRU
 PgStatShared_Subscription
 PgStatShared_Wal
 PgStat_ArchiverStats
+PgStat_Backend
 PgStat_BackendSubEntry
 PgStat_BgWriterStats
 PgStat_BktypeIO
-- 
2.34.1

v5-0002-Merge-both-IO-stats-flush-callbacks.patch (text/x-diff; charset=us-ascii)
From edd45c3aa0731a9645dbcfe0475e40c642e92269 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Wed, 6 Nov 2024 16:19:44 +0000
Subject: [PATCH v5 2/4] Merge both IO stats flush callbacks

There is no need to keep both callbacks.

Merging both allows saving an O(N^3) loop over IOOBJECT_NUM_TYPES,
IOCONTEXT_NUM_TYPES and IOOP_NUM_TYPES.
---
 src/backend/utils/activity/pgstat.c    |  2 -
 src/backend/utils/activity/pgstat_io.c | 97 ++++++--------------------
 2 files changed, 23 insertions(+), 76 deletions(-)
 100.0% src/backend/utils/activity/

diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index aacf61c9a4..d051db5c10 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -435,8 +435,6 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.shared_data_off = offsetof(PgStatShared_IO, stats),
 		.shared_data_len = sizeof(((PgStatShared_IO *) 0)->stats),
 
-		.flush_fixed_cb = pgstat_io_flush_cb,
-		.have_fixed_pending_cb = pgstat_io_have_pending_cb,
 		.init_shmem_cb = pgstat_io_init_shmem_cb,
 		.reset_all_cb = pgstat_io_reset_all_cb,
 		.snapshot_cb = pgstat_io_snapshot_cb,
diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index a2bb11adc9..1384f8103e 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -21,10 +21,6 @@
 #include "utils/pgstat_internal.h"
 
 
-static PgStat_PendingIO PendingIOStats;
-static bool have_iostats = false;
-
-
 /*
  * Check that stats have not been counted for any combination of IOObject,
  * IOContext, and IOOp which are not tracked for the passed-in BackendType. If
@@ -83,11 +79,7 @@ pgstat_count_io_op_n(IOObject io_object, IOContext io_context, IOOp io_op, uint3
 	Assert(pgstat_tracks_io_op(MyBackendType, io_object, io_context, io_op));
 
 	entry_ref = pgstat_prep_per_backend_pending(MyProcNumber);
-
-	PendingIOStats.counts[io_object][io_context][io_op] += cnt;
 	entry_ref->counts[io_object][io_context][io_op] += cnt;
-
-	have_iostats = true;
 }
 
 /*
@@ -148,8 +140,6 @@ pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op,
 				INSTR_TIME_ADD(pgBufferUsage.local_blk_read_time, io_time);
 		}
 
-		INSTR_TIME_ADD(PendingIOStats.pending_times[io_object][io_context][io_op],
-					   io_time);
 		INSTR_TIME_ADD(entry_ref->pending_times[io_object][io_context][io_op],
 					   io_time);
 	}
@@ -173,16 +163,7 @@ pgstat_fetch_my_stat_io(void)
 }
 
 /*
- * Check if there any IO stats waiting for flush.
- */
-bool
-pgstat_io_have_pending_cb(void)
-{
-	return have_iostats;
-}
-
-/*
- * Simpler wrapper of pgstat_io_flush_cb() and pgstat_per_backend_flush_cb().
+ * Simpler wrapper of pgstat_per_backend_flush_cb().
  */
 void
 pgstat_flush_io(bool nowait)
@@ -192,81 +173,39 @@ pgstat_flush_io(bool nowait)
 	entry_ref = pgstat_get_entry_ref(PGSTAT_KIND_PER_BACKEND, InvalidOid,
 									 MyProcNumber, false, NULL);
 
-	(void) pgstat_io_flush_cb(nowait);
 	(void) pgstat_per_backend_flush_cb(entry_ref, nowait);
 }
 
 /*
- * Flush out locally pending IO statistics
+ * Flush out locally pending backend statistics
  *
  * If no stats have been recorded, this function returns false.
- *
- * If nowait is true, this function returns true if the lock could not be
- * acquired. Otherwise, return false.
  */
 bool
-pgstat_io_flush_cb(bool nowait)
+pgstat_per_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
 {
-	LWLock	   *bktype_lock;
+	PgStatShared_Backend *shbackendioent;
+	PgStat_PendingIO *pendingent;
 	PgStat_BktypeIO *bktype_shstats;
+	LWLock	   *bktype_lock;
+	PgStat_BktypeIO *bktype_global_shstats;
 
-	if (!have_iostats)
+	if (!pgstat_lock_entry(entry_ref, nowait))
 		return false;
 
+	/* global IO stats */
 	bktype_lock = &pgStatLocal.shmem->io.locks[MyBackendType];
-	bktype_shstats =
-		&pgStatLocal.shmem->io.stats.stats[MyBackendType];
+	bktype_global_shstats = &pgStatLocal.shmem->io.stats.stats[MyBackendType];
 
 	if (!nowait)
 		LWLockAcquire(bktype_lock, LW_EXCLUSIVE);
 	else if (!LWLockConditionalAcquire(bktype_lock, LW_EXCLUSIVE))
-		return true;
-
-	for (int io_object = 0; io_object < IOOBJECT_NUM_TYPES; io_object++)
 	{
-		for (int io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
-		{
-			for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
-			{
-				instr_time	time;
-
-				bktype_shstats->counts[io_object][io_context][io_op] +=
-					PendingIOStats.counts[io_object][io_context][io_op];
-
-				time = PendingIOStats.pending_times[io_object][io_context][io_op];
-
-				bktype_shstats->times[io_object][io_context][io_op] +=
-					INSTR_TIME_GET_MICROSEC(time);
-			}
-		}
-	}
-
-	Assert(pgstat_bktype_io_stats_valid(bktype_shstats, MyBackendType));
-
-	LWLockRelease(bktype_lock);
-
-	memset(&PendingIOStats, 0, sizeof(PendingIOStats));
-
-	have_iostats = false;
-
-	return false;
-}
-
-/*
- * Flush out locally pending backend statistics
- *
- * If no stats have been recorded, this function returns false.
- */
-bool
-pgstat_per_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
-{
-	PgStatShared_Backend *shbackendioent;
-	PgStat_PendingIO *pendingent;
-	PgStat_BktypeIO *bktype_shstats;
-
-	if (!pgstat_lock_entry(entry_ref, nowait))
+		pgstat_unlock_entry(entry_ref);
 		return false;
+	}
 
+	/* per backend IO stats */
 	shbackendioent = (PgStatShared_Backend *) entry_ref->shared_stats;
 	bktype_shstats = &shbackendioent->stats.stats;
 	pendingent = (PgStat_PendingIO *) entry_ref->pending;
@@ -279,6 +218,7 @@ pgstat_per_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
 			{
 				instr_time	time;
 
+				/* per backend IO stats */
 				bktype_shstats->counts[io_object][io_context][io_op] +=
 					pendingent->counts[io_object][io_context][io_op];
 
@@ -286,10 +226,19 @@ pgstat_per_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
 
 				bktype_shstats->times[io_object][io_context][io_op] +=
 					INSTR_TIME_GET_MICROSEC(time);
+
+				/* global IO stats */
+				bktype_global_shstats->counts[io_object][io_context][io_op] +=
+					pendingent->counts[io_object][io_context][io_op];
+
+				bktype_global_shstats->times[io_object][io_context][io_op] +=
+					INSTR_TIME_GET_MICROSEC(time);
 			}
 		}
 	}
 
+	LWLockRelease(bktype_lock);
+
 	pgstat_unlock_entry(entry_ref);
 
 	return true;
-- 
2.34.1

v5-0003-Don-t-include-other-backend-s-stats-in-the-snapsh.patch (text/x-diff; charset=us-ascii)
From 2d710a44af4d2758a0f33703893712693dfaf0af Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Thu, 7 Nov 2024 12:10:37 +0000
Subject: [PATCH v5 3/4] Don't include other backend's stats in the snapshot

When stats_fetch_consistency is set to 'snapshot', don't include other backends'
stats in the snapshot. There is no use case for them, so this saves memory.
---
 src/backend/utils/activity/pgstat.c | 4 ++++
 1 file changed, 4 insertions(+)
 100.0% src/backend/utils/activity/

diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index d051db5c10..4982dec8a9 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -1181,6 +1181,10 @@ pgstat_build_snapshot(void)
 			!kind_info->accessed_across_databases)
 			continue;
 
+		/* there is no need to include other backend's stats */
+		if (kind == PGSTAT_KIND_PER_BACKEND && p->key.objid != MyProcNumber)
+			continue;
+
 		if (p->dropped)
 			continue;
 
-- 
2.34.1

v5-0004-Add-pg_stat_get_backend_io.patch (text/x-diff; charset=us-ascii)
From 2bf372f079e9cbecd82ce431622f7e683f659596 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Thu, 7 Nov 2024 14:27:05 +0000
Subject: [PATCH v5 4/4] Add pg_stat_get_backend_io()

Adding the pg_stat_get_backend_io() function to retrieve I/O statistics for
a particular backend pid.

Note that this function does not return any rows if stats_fetch_consistency
is set to 'snapshot' and the pid of interest is not our own pid (there is no use
case for retrieving other backends' stats with stats_fetch_consistency set to
'snapshot').
---
 doc/src/sgml/monitoring.sgml           | 17 ++++++++
 src/backend/catalog/system_views.sql   |  2 +-
 src/backend/utils/activity/pgstat_io.c | 18 ++++++++
 src/backend/utils/adt/pgstatfuncs.c    | 27 +++++++++++-
 src/include/catalog/pg_proc.dat        | 14 +++----
 src/include/pgstat.h                   |  1 +
 src/test/regress/expected/rules.out    |  2 +-
 src/test/regress/expected/stats.out    | 57 ++++++++++++++++++++++++++
 src/test/regress/sql/stats.sql         | 33 +++++++++++++++
 9 files changed, 160 insertions(+), 11 deletions(-)
  10.4% doc/src/sgml/
   5.8% src/backend/utils/activity/
   8.6% src/backend/utils/adt/
  16.0% src/include/catalog/
  32.9% src/test/regress/expected/
  24.1% src/test/regress/sql/

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index fc6aded3da..a6672548b8 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4814,6 +4814,23 @@ description | Waiting for a newly initialized WAL file to reach durable storage
        </para></entry>
       </row>
 
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_stat_get_backend_io</primary>
+        </indexterm>
+        <function>pg_stat_get_backend_io</function> ( <type>integer</type> )
+        <returnvalue>setof record</returnvalue>
+       </para>
+       <para>
+        Returns I/O statistics about the backend with the specified
+        process ID. The output fields are exactly the same as the ones in the
+        <link linkend="monitoring-pg-stat-io-view"> <structname>pg_stat_io</structname></link>
+        view. This function does not return any rows if <varname>stats_fetch_consistency</varname>
+        is set to <literal>snapshot</literal> and the process ID is not our own.
+       </para></entry>
+      </row>
+
       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <indexterm>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 09af4a40a8..6f17f505f1 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1190,7 +1190,7 @@ SELECT
        b.fsyncs,
        b.fsync_time,
        b.stats_reset
-FROM pg_stat_get_my_io() b;
+FROM pg_stat_get_backend_io(NULL) b;
 
 CREATE VIEW pg_stat_wal AS
     SELECT
diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index 1384f8103e..1063c6bec3 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -162,6 +162,24 @@ pgstat_fetch_my_stat_io(void)
 		pgstat_fetch_entry(PGSTAT_KIND_PER_BACKEND, InvalidOid, MyProcNumber);
 }
 
+/*
+ * Returns other backend's IO stats or NULL if pgstat_fetch_consistency is set
+ * to 'snapshot'.
+ */
+PgStat_Backend *
+pgstat_fetch_proc_stat_io(ProcNumber procNumber)
+{
+	PgStat_Backend *backend_entry;
+
+	if (pgstat_fetch_consistency == PGSTAT_FETCH_CONSISTENCY_SNAPSHOT)
+		return NULL;
+
+	backend_entry = (PgStat_Backend *) pgstat_fetch_entry(PGSTAT_KIND_PER_BACKEND,
+														  InvalidOid, procNumber);
+
+	return backend_entry;
+}
+
 /*
  * Simpler wrapper of pgstat_per_backend_flush_cb().
  */
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index fd16986156..e2041be237 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1475,7 +1475,7 @@ pg_stat_get_io(PG_FUNCTION_ARGS)
 }
 
 Datum
-pg_stat_get_my_io(PG_FUNCTION_ARGS)
+pg_stat_get_backend_io(PG_FUNCTION_ARGS)
 {
 	ReturnSetInfo *rsinfo;
 	PgStat_Backend *backend_stats;
@@ -1487,7 +1487,30 @@ pg_stat_get_my_io(PG_FUNCTION_ARGS)
 	InitMaterializedSRF(fcinfo, 0);
 	rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
 
-	backend_stats = pgstat_fetch_my_stat_io();
+	if (PG_ARGISNULL(0) || PG_GETARG_INT32(0) == MyProcPid)
+		backend_stats = pgstat_fetch_my_stat_io();
+	else
+	{
+		PGPROC	   *proc;
+		ProcNumber	procNumber;
+		int			pid = PG_GETARG_INT32(0);
+
+		proc = BackendPidGetProc(pid);
+
+		/* maybe an auxiliary process? */
+		if (proc == NULL)
+			proc = AuxiliaryPidGetProc(pid);
+
+		if (proc != NULL)
+		{
+			procNumber = GetNumberFromPGProc(proc);
+			backend_stats = pgstat_fetch_proc_stat_io(procNumber);
+			if (!backend_stats)
+				return (Datum) 0;
+		}
+		else
+			return (Datum) 0;
+	}
 
 	bktype = backend_stats->bktype;
 	bktype_desc = CStringGetTextDatum(GetBackendTypeDesc(bktype));
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 89eb89efe4..fafee5a947 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5903,14 +5903,14 @@
   proargnames => '{backend_type,object,context,reads,read_time,writes,write_time,writebacks,writeback_time,extends,extend_time,op_bytes,hits,evictions,reuses,fsyncs,fsync_time,stats_reset}',
   prosrc => 'pg_stat_get_io' },
 
-{ oid => '8806', descr => 'statistics: my backend IO statistics',
-  proname => 'pg_stat_get_my_io', prorows => '5', proretset => 't',
+{ oid => '8806', descr => 'statistics: per backend IO statistics',
+  proname => 'pg_stat_get_backend_io', prorows => '5', proretset => 't',
   provolatile => 'v', proparallel => 'r', prorettype => 'record',
-  proargtypes => '',
-  proallargtypes => '{text,text,text,int8,float8,int8,float8,int8,float8,int8,float8,int8,int8,int8,int8,int8,float8,timestamptz}',
-  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{backend_type,object,context,reads,read_time,writes,write_time,writebacks,writeback_time,extends,extend_time,op_bytes,hits,evictions,reuses,fsyncs,fsync_time,stats_reset}',
-  prosrc => 'pg_stat_get_my_io' },
+  proargtypes => 'int4', proisstrict => 'f',
+  proallargtypes => '{int4,text,text,text,int8,float8,int8,float8,int8,float8,int8,float8,int8,int8,int8,int8,int8,float8,timestamptz}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{backend_pid,backend_type,object,context,reads,read_time,writes,write_time,writebacks,writeback_time,extends,extend_time,op_bytes,hits,evictions,reuses,fsyncs,fsync_time,stats_reset}',
+  prosrc => 'pg_stat_get_backend_io' },
 
 { oid => '1136', descr => 'statistics: information about WAL activity',
   proname => 'pg_stat_get_wal', proisstrict => 'f', provolatile => 's',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 13b9c71759..54c9f8b45c 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -576,6 +576,7 @@ extern void pgstat_count_io_op_time(IOObject io_object, IOContext io_context,
 
 extern PgStat_IO *pgstat_fetch_stat_io(void);
 extern PgStat_Backend *pgstat_fetch_my_stat_io(void);
+extern PgStat_Backend *pgstat_fetch_proc_stat_io(ProcNumber procNumber);
 extern const char *pgstat_get_io_context_name(IOContext io_context);
 extern const char *pgstat_get_io_object_name(IOObject io_object);
 
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index f92516b047..3cc0cfbcb5 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1416,7 +1416,7 @@ pg_my_stat_io| SELECT backend_type,
     fsyncs,
     fsync_time,
     stats_reset
-   FROM pg_stat_get_my_io() b(backend_type, object, context, reads, read_time, writes, write_time, writebacks, writeback_time, extends, extend_time, op_bytes, hits, evictions, reuses, fsyncs, fsync_time, stats_reset);
+   FROM pg_stat_get_backend_io(NULL::integer) b(backend_type, object, context, reads, read_time, writes, write_time, writebacks, writeback_time, extends, extend_time, op_bytes, hits, evictions, reuses, fsyncs, fsync_time, stats_reset);
 pg_policies| SELECT n.nspname AS schemaname,
     c.relname AS tablename,
     pol.polname AS policyname,
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 846e693483..846ffb59f2 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1263,12 +1263,18 @@ SELECT sum(extends) AS io_sum_shared_before_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(extends) AS my_io_sum_shared_before_extends
   FROM pg_my_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT sum(extends) AS backend_io_sum_shared_before_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_stat_io
   WHERE object = 'relation' \gset io_sum_shared_before_
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_my_stat_io
   WHERE object = 'relation' \gset my_io_sum_shared_before_
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE object = 'relation' \gset backend_io_sum_shared_before_
 CREATE TABLE test_io_shared(a int);
 INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
 SELECT pg_stat_force_next_flush();
@@ -1293,6 +1299,15 @@ SELECT :my_io_sum_shared_after_extends > :my_io_sum_shared_before_extends;
  t
 (1 row)
 
+SELECT sum(extends) AS backend_io_sum_shared_after_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
+SELECT :backend_io_sum_shared_after_extends > :backend_io_sum_shared_before_extends;
+ ?column? 
+----------
+ t
+(1 row)
+
 -- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
 -- and fsyncs in the global stats (not for the backend).
 -- See comment above for rationale for two explicit CHECKPOINTs.
@@ -1331,6 +1346,48 @@ SELECT current_setting('fsync') = 'off'
  t
 (1 row)
 
+-- Check the stats_fetch_consistency behavior on per backend I/O stats
+SELECT pid AS checkpointer_pid FROM pg_stat_activity
+  WHERE backend_type = 'checkpointer' \gset
+BEGIN;
+SET LOCAL stats_fetch_consistency = snapshot;
+SELECT sum(extends) AS my_io_sum_shared_before_extends
+  FROM pg_my_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT sum(extends) AS backend_io_sum_shared_before_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
+INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush 
+--------------------------
+ 
+(1 row)
+
+SELECT sum(extends) AS my_io_sum_shared_after_extends
+  FROM pg_my_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT :my_io_sum_shared_after_extends = :my_io_sum_shared_before_extends;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT sum(extends) AS backend_io_sum_shared_after_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
+SELECT :backend_io_sum_shared_after_extends = :backend_io_sum_shared_before_extends;
+ ?column? 
+----------
+ t
+(1 row)
+
+-- Don't return any rows if querying other backend's stats with snapshot set
+SELECT count(1) = 0 FROM pg_stat_get_backend_io(:checkpointer_pid);
+ ?column? 
+----------
+ t
+(1 row)
+
+ROLLBACK;
 -- Change the tablespace so that the table is rewritten directly, then SELECT
 -- from it to cause it to be read back into shared buffers.
 SELECT sum(reads) AS io_sum_shared_before_reads
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 9cb14b7182..e597c4d597 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -611,12 +611,18 @@ SELECT sum(extends) AS io_sum_shared_before_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(extends) AS my_io_sum_shared_before_extends
   FROM pg_my_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT sum(extends) AS backend_io_sum_shared_before_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_stat_io
   WHERE object = 'relation' \gset io_sum_shared_before_
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_my_stat_io
   WHERE object = 'relation' \gset my_io_sum_shared_before_
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE object = 'relation' \gset backend_io_sum_shared_before_
 CREATE TABLE test_io_shared(a int);
 INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
 SELECT pg_stat_force_next_flush();
@@ -626,6 +632,10 @@ SELECT :io_sum_shared_after_extends > :io_sum_shared_before_extends;
 SELECT sum(extends) AS my_io_sum_shared_after_extends
   FROM pg_my_stat_io WHERE context = 'normal' AND object = 'relation' \gset
 SELECT :my_io_sum_shared_after_extends > :my_io_sum_shared_before_extends;
+SELECT sum(extends) AS backend_io_sum_shared_after_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
+SELECT :backend_io_sum_shared_after_extends > :backend_io_sum_shared_before_extends;
 
 -- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
 -- and fsyncs in the global stats (not for the backend).
@@ -647,6 +657,29 @@ SELECT current_setting('fsync') = 'off'
   OR (:my_io_sum_shared_after_fsyncs = :my_io_sum_shared_before_fsyncs
       AND :my_io_sum_shared_after_fsyncs= 0);
 
+-- Check the stats_fetch_consistency behavior on per backend I/O stats
+SELECT pid AS checkpointer_pid FROM pg_stat_activity
+  WHERE backend_type = 'checkpointer' \gset
+BEGIN;
+SET LOCAL stats_fetch_consistency = snapshot;
+SELECT sum(extends) AS my_io_sum_shared_before_extends
+  FROM pg_my_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT sum(extends) AS backend_io_sum_shared_before_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
+INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
+SELECT pg_stat_force_next_flush();
+SELECT sum(extends) AS my_io_sum_shared_after_extends
+  FROM pg_my_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT :my_io_sum_shared_after_extends = :my_io_sum_shared_before_extends;
+SELECT sum(extends) AS backend_io_sum_shared_after_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
+SELECT :backend_io_sum_shared_after_extends = :backend_io_sum_shared_before_extends;
+-- Don't return any rows if querying other backend's stats with snapshot set
+SELECT count(1) = 0 FROM pg_stat_get_backend_io(:checkpointer_pid);
+ROLLBACK;
+
 -- Change the tablespace so that the table is rewritten directly, then SELECT
 -- from it to cause it to be read back into shared buffers.
 SELECT sum(reads) AS io_sum_shared_before_reads
-- 
2.34.1

#31Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#30)
Re: per backend I/O statistics

On Fri, Nov 08, 2024 at 02:09:30PM +0000, Bertrand Drouvot wrote:

==== 0001 (the largest one)

Introduces a new statistics KIND. It is named PGSTAT_KIND_PER_BACKEND as it could
be used in the future to store other statistics (than the I/O ones) per backend.
The new KIND is a variable-numbered one and has an automatic cap on the
maximum number of entries (as its hash key contains the proc number).

There is no need to serialize the per backend I/O stats to disk (no point to
see stats for backends that do not exist anymore after a re-start), so a
new "to_serialize" field is added in the PgStat_KindInfo struct.

It adds a new pg_my_stat_io view to display "my" backend I/O statistics.

It also adds a new function pg_stat_reset_single_backend_io_counters() to be
able to reset the I/O stats for a given backend pid.

It contains doc updates and dedicated tests.
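
For reference, here is a minimal usage sketch of the view and of the reset
function described above, with the names as they appear in this patch version
(the column list may still change during review):

  -- I/O done by the current backend, for shared relation buffers
  SELECT object, context, reads, writes, extends, fsyncs
    FROM pg_my_stat_io
    WHERE context = 'normal' AND object = 'relation';

  -- reset the I/O counters of one backend, identified by its PID
  SELECT pg_stat_reset_single_backend_io_counters(pg_backend_pid());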

A few remarks:

1. one assertion in pgstat_drop_entry_internal() is not necessary true anymore
with this new stat kind. So, adding an extra bool as parameter to take care of it.

Why? I have to admit that the addition of this argument at this level
specific to a new stats kind feels really weird and it looks like a
layer violation, while this happens only when resetting the whole
pgstats state because we have loaded a portion of the stats but the
file loaded was in such a state that we don't have a consistent
picture.

2. a new struct "PgStat_Backend" is created and does contain the backend type.
The backend type is used for filtering purpose when the stats are displayed.

Is that necessary? We can guess that from the PID with a join in
pg_stat_activity. pg_stat_get_backend_io() from 0004 returns a set of
tuples for a specific PID, as well..

+	/* save the bktype */
+	if (kind == PGSTAT_KIND_PER_BACKEND)
+		bktype = ((PgStatShared_Backend *) header)->stats.bktype;
[...]
+	/* restore the bktype */
+	if (kind == PGSTAT_KIND_PER_BACKEND)
+		((PgStatShared_Backend *) header)->stats.bktype = bktype;

Including the backend type results in these blips in
shared_stat_reset_contents() which should not have anything related
to stats kinds and should remain neutral, as well.

pgstat_prep_per_backend_pending() is used only in pgstat_io.c, so I
guess that it should be static in this file rather than declared in
pgstat.h?

+typedef struct PgStat_PendingIO

Perhaps this part should use a separate structure named
"BackendPendingIO"? The definition of the structure has to be in
pgstat.h as this is the pending_size of the new stats kind. It looks
like it would be cleaner to keep PgStat_PendingIO local to
pgstat_io.c, and define PgStat_PendingIO based on
PgStat_BackendPendingIO?

+	/*
+	 * Do serialize or not this kind of stats.
+	 */
+	bool		to_serialize:1;

Not sure that "serialize" is the best term that applies here. For
pgstats entries, serialization refers to the matter of writing their
entries with a "serialized" name because they have an undefined number
when stored locally after a reload. I'd suggest to split this concept
into its own patch, rename the flag as "write_to_file" (or what you
think is suited), and also apply the flag in the fixed-numbered loop
done in pgstat_write_statsfile() before going through the dshash.

+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_stat_reset_single_backend_io_counters</primary>
+        </indexterm>
+        <function>pg_stat_reset_single_backend_io_counters</function> ( <type>int4</type> )
+        <returnvalue>void</returnvalue>

This should document that the input argument is a PID.

Is pg_my_stat_io() the best name ever? I'd suggest to just merge 0004
with 0001.

Structurally, it may be cleaner to keep all the callbacks and the
backend-I/O specific logic into a separate file, perhaps
pgstat_io_backend.c or pgstat_backend_io?

==== 0002

Merge both IO stats flush callback. There is no need to keep both callbacks.

Merging both allows saving an O(N^3) loop over IOOBJECT_NUM_TYPES,
IOCONTEXT_NUM_TYPES and IOOP_NUM_TYPES.

Not sure to be a fan of that, TBH, still I get the performance
argument of the matter. Each stats kind has its own flush path, and
this assumes that we'll never going to diverge in terms of the
information maintained for each backend and the existing pg_stat_io.
Perhaps there could be a divergence at some point?

=== Remarks

R1. The key field is still "objid" even if it stores the proc number. I think
that's fine and there is no need to rename it as it's related comment states:

Let's not change the object ID and the hash key. The approach you are
using here with a variable-numbered stats kinds and a lookup with the
procnumber is sensible here.

R2. pg_stat_get_io() and pg_stat_get_backend_io() could probably be merged (
duplicated code). That's not mandatory to provide the new per backend I/O stats
feature. So let's keep it as future work to ease the review.

Not sure about that.

R3. There is no use case of retrieving other backend's IO stats with
stats_fetch_consistency set to 'snapshot'. Avoiding this behavior allows us to
save memory (that could be non-negligible as pgstat_build_snapshot() would add
_all_ the other backend's stats in the snapshot). I choose to document it and to
not return any rows: I think that's better than returning some data (that would
not "ensure" the snapshot consistency) or an error.

I'm pretty sure that there is a sensible argument about being able to
get a snapshot of the I/O stat of another backend and consider that a
feature. For example, keeping an eye on the activity of the
checkpointer while doing a scan at a certain point in time? With a
GUC, we could have both behaviors and control which one we want.
--
Michael

#32Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#31)
Re: per backend I/O statistics

Hi,

On Thu, Nov 14, 2024 at 03:31:51PM +0900, Michael Paquier wrote:

On Fri, Nov 08, 2024 at 02:09:30PM +0000, Bertrand Drouvot wrote:

1. one assertion in pgstat_drop_entry_internal() is not necessary true anymore
with this new stat kind. So, adding an extra bool as parameter to take care of it.

Why? I have to admit that the addition of this argument at this level
specific to a new stats kind feels really weird and it looks like a
layer violation, while this happens only when resetting the whole
pgstats state because we have loaded a portion of the stats but the
file loaded was in such a state that we don't have a consistent
picture.

Yes. I agree that looks weird and I don't like it that much. But the assertion
is not true anymore. If you:

- change the arguments to false in the pgstat_drop_entry_internal() call in
pgstat_drop_all_entries()
- start the engine
- kill -9 postgres
- restart the engine

You'll see the assert failing due to the startup process. I don't think it is
doing something wrong though, just populating its backend stats. Do you have
another view on it?

2. a new struct "PgStat_Backend" is created and does contain the backend type.
The backend type is used for filtering purpose when the stats are displayed.

Is that necessary? We can guess that from the PID with a join in
pg_stat_activity. pg_stat_get_backend_io() from 0004 returns a set of
tuples for a specific PID, as well..

How would that work for someone doing "select * from pg_stat_get_backend_io(<pid>)"?

Do you mean copy/paste part of the code from pg_stat_get_activity() into
pg_stat_get_backend_io() to get the backend type? That sounds like an idea,
I'll have a look at it.

+	/* save the bktype */
+	if (kind == PGSTAT_KIND_PER_BACKEND)
+		bktype = ((PgStatShared_Backend *) header)->stats.bktype;
[...]
+	/* restore the bktype */
+	if (kind == PGSTAT_KIND_PER_BACKEND)
+		((PgStatShared_Backend *) header)->stats.bktype = bktype;

Including the backend type results in these blips in
shared_stat_reset_contents() which should not have anything related
to stats kinds and should remain neutral, as well.

Yeah, we can simply get rid of it if we remove the backend type in PgStat_Backend.

pgstat_prep_per_backend_pending() is used only in pgstat_io.c, so I
guess that it should be static in this file rather than declared in
pgstat.h?

Good catch, thanks!

+typedef struct PgStat_PendingIO

Perhaps this part should use a separate structure named
"BackendPendingIO"? The definition of the structure has to be in
pgstat.h as this is the pending_size of the new stats kind. It looks
like it would be cleaner to keep PgStat_PendingIO local to
pgstat_io.c, and define PgStat_PendingIO based on
PgStat_BackendPendingIO?

I see what you mean; what about simply "PgStat_BackendPending" in pgstat.h?

+	/*
+	 * Do serialize or not this kind of stats.
+	 */
+	bool		to_serialize:1;

Not sure that "serialize" is the best term that applies here. For
pgstats entries, serialization refers to the matter of writing their
entries with a "serialized" name because they have an undefined number
when stored locally after a reload. I'd suggest to split this concept
into its own patch, rename the flag as "write_to_file" (or what you
think is suited), and also apply the flag in the fixed-numbered loop
done in pgstat_write_statsfile() before going through the dshash.

Makes sense to create its own patch, will have a look.

+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_stat_reset_single_backend_io_counters</primary>
+        </indexterm>
+        <function>pg_stat_reset_single_backend_io_counters</function> ( <type>int4</type> )
+        <returnvalue>void</returnvalue>

This should document that the input argument is a PID.

Yeap, will add.

Is pg_my_stat_io() the best name ever?

Not 100% sure, but there is already "pg_my_temp_schema" so I thought that
pg_my_stat_io would not be that bad. Open to suggestions though.

I'd suggest to just merge 0004 with 0001.

Sure.

Structurally, it may be cleaner to keep all the callbacks and the
backend-I/O specific logic into a separate file, perhaps
pgstat_io_backend.c or pgstat_backend_io?

Yeah, I was wondering the same. What about pgstat_backend.c (that would contain
only one "section" dedicated to IO currently)?

==== 0002

Merge both IO stats flush callback. There is no need to keep both callbacks.

Merging both allows saving an O(N^3) loop over IOOBJECT_NUM_TYPES,
IOCONTEXT_NUM_TYPES and IOOP_NUM_TYPES.

Not sure to be a fan of that, TBH, still I get the performance
argument of the matter. Each stats kind has its own flush path, and
this assumes that we'll never going to diverge in terms of the
information maintained for each backend and the existing pg_stat_io.
Perhaps there could be a divergence at some point?

Yeah, perhaps; so in case of divergence let's split them "again"? I mean, for the
moment I don't see any reason to keep both, and merging brings efficiency benefits
during flush.

=== Remarks

R1. The key field is still "objid" even if it stores the proc number. I think
that's fine and there is no need to rename it as it's related comment states:

Let's not change the object ID and the hash key. The approach you are
using here with a variable-numbered stats kinds and a lookup with the
procnumber is sensible here.

Agree, thanks for confirming.

R2. pg_stat_get_io() and pg_stat_get_backend_io() could probably be merged (
duplicated code). That's not mandatory to provide the new per backend I/O stats
feature. So let's keep it as future work to ease the review.

Not sure about that.

I don't have a strong opinion about this one, so I'm ok to not merge those.

R3. There is no use case of retrieving other backend's IO stats with
stats_fetch_consistency set to 'snapshot'. Avoiding this behavior allows us to
save memory (that could be non-negligible as pgstat_build_snapshot() would add
_all_ the other backend's stats in the snapshot). I choose to document it and to
not return any rows: I think that's better than returning some data (that would
not "ensure" the snapshot consistency) or an error.

I'm pretty sure that there is a sensible argument about being able to
get a snapshot of the I/O stat of another backend and consider that a
feature. For example, keeping an eye on the activity of the
checkpointer while doing a scan at a certain point in time?

hm, not sure how setting stats_fetch_consistency to 'snapshot' could help here:

- the checkpointer could write stuff for reasons completely unrelated to "your"
backend

- if the idea is to 1. record the checkpointer stats, 2. do stuff in the backend and
3. check the checkpointer stats again (sketched below), then I don't think setting
stats_fetch_consistency to snapshot is needed at all. In fact it's quite the opposite,
as 1. and 3. would report the same values with stats_fetch_consistency set to
'snapshot' (if done in the same transaction).
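
To make that concrete, here is a rough psql sketch of that 1-2-3 workflow with
the default stats_fetch_consistency, relying on pg_stat_get_backend_io() from
0004 (test_io_shared is just the table used in the regression tests):

  SELECT pid AS checkpointer_pid FROM pg_stat_activity
    WHERE backend_type = 'checkpointer' \gset
  -- 1. record the checkpointer's current I/O counters
  SELECT sum(writes) AS ckpt_writes_before
    FROM pg_stat_get_backend_io(:checkpointer_pid) \gset
  -- 2. do stuff in the backend, then force some checkpointer activity
  INSERT INTO test_io_shared SELECT generate_series(1, 100);
  CHECKPOINT;
  -- 3. check the checkpointer's counters again
  SELECT sum(writes) AS ckpt_writes_after
    FROM pg_stat_get_backend_io(:checkpointer_pid) \gset
  SELECT :ckpt_writes_after >= :ckpt_writes_before;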

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#33Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#32)
Re: per backend I/O statistics

On Thu, Nov 14, 2024 at 01:30:11PM +0000, Bertrand Drouvot wrote:

- change the arguments to false in the pgstat_drop_entry_internal() call in
pgstat_drop_all_entries()
- start the engine
- kill -9 postgres
- restart the engine

You'll see the assert failing due to the startup process. I don't think it is
doing something wrong though, just populating its backend stats. Do you have
another view on it?

It feels sad that we have to plug in the internals at this level for
this particular case. Perhaps there is something to do with more
callbacks. Or perhaps there is just no point in tracking the stats of
auxiliary processes because we already have this data in the existing
pg_stat_io?

How would that work for someone doing "select * from pg_stat_get_backend_io(<pid>)"?

Do you mean copy/paste part of the code from pg_stat_get_activity() into
pg_stat_get_backend_io() to get the backend type? That sounds like an idea,
I'll have a look at it.

I was under the impression that we should keep PgStat_Backend for the
reset_timestamp part, but drop BackendType as it can be guessed from
pg_stat_activity itself. For example, a LATERAL join could grab the current
live stats of these backends.
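
One possible shape for such a query, as a sketch only (assuming the
pg_stat_get_backend_io() signature from 0004 and the usual pg_stat_activity
columns):

  SELECT a.pid, a.backend_type, b.object, b.context, b.reads, b.writes
    FROM pg_stat_activity a,
         LATERAL pg_stat_get_backend_io(a.pid) AS b
    WHERE a.backend_type = 'client backend';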

+typedef struct PgStat_PendingIO

Perhaps this part should use a separate structure named
"BackendPendingIO"? The definition of the structure has to be in
pgstat.h as this is the pending_size of the new stats kind. It looks
like it would be cleaner to keep PgStat_PendingIO local to
pgstat_io.c, and define PgStat_PendingIO based on
PgStat_BackendPendingIO?

I see what you mean; what about simply "PgStat_BackendPending" in pgstat.h?

That should be OK.

Structurally, it may be cleaner to keep all the callbacks and the
backend-I/O specific logic into a separate file, perhaps
pgstat_io_backend.c or pgstat_backend_io?

Yeah, I was wondering the same. What about pgstat_backend.c (that would contain
only one "section" dedicated to IO currently)?

pgstat_backend.c looks good to me. Could there be other stats than
just IO, actually? Perhaps not, just asking..

- if the idea is to 1. record checkpointer stats, 2. do stuff in the backend and
3. check again the checkpointer stats then I don't think setting stats_fetch_consistency
to snapshot is needed at all. In fact it's quite the opposite as 1. and 3. would
report the same values with stats_fetch_consistency set to 'snapshot' (if done
in the same transaction).

Hmm. Not sure. It looks like you're right to treat this in a
separate patch as it would exist for different reasons than the
original proposal.
--
Michael

#34Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#33)
Re: per backend I/O statistics

Hi,

On Tue, Nov 19, 2024 at 10:47:53AM +0900, Michael Paquier wrote:

On Thu, Nov 14, 2024 at 01:30:11PM +0000, Bertrand Drouvot wrote:

- change the arguments to false in the pgstat_drop_entry_internal() call in
pgstat_drop_all_entries()
- start the engine
- kill -9 postgres
- restart the engine

You'll see the assert failing due to the startup process. I don't think it is
doing something wrong though, just populating its backend stats. Do you have
another view on it?

It feels sad that we have to plug in the internals at this level for
this particular case. Perhaps there is something to do with more
callbacks. Or perhaps there is just no point in tracking the stats of
auxiliary processes because we already have this data in the existing
pg_stat_io?

We can not discard the "per backend" stats collection for auxiliary processes
because those are the source for pg_stat_io too (once the 2 flush callbacks are
merged).

I have another idea: after a bit more investigation, it turns out that
only the startup process triggers the failed assertion.

So, for the startup process only, what about?

- don't call pgstat_create_backend_stat() in pgstat_beinit()...
- but call it in StartupXLOG() instead (after the stats are discarded or restored).

That way we could get rid of the changes linked to the assertion and still handle
the startup process particular case. Thoughts?

How would that work for someone doing "select * from pg_stat_get_backend_io(<pid>)"?

Do you mean copy/paste part of the code from pg_stat_get_activity() into
pg_stat_get_backend_io() to get the backend type? That sounds like an idea,
I'll have a look at it.

I was under the impression that we should keep PgStat_Backend for the
reset_timestamp part, but drop BackendType as it can be guessed from
pg_stat_activity itself.

Removing BackendType sounds doable, I'll look at it.

Structurally, it may be cleaner to keep all the callbacks and the
backend-I/O specific logic into a separate file, perhaps
pgstat_io_backend.c or pgstat_backend_io?

Yeah, I was wondering the same. What about pgstat_backend.c (that would contain
only one "section" dedicated to IO currently)?

pgstat_backend.c looks good to me. Could there be other stats than
just IO, actually? Perhaps not, just asking..

Yeah that's the reason why I suggested "pgstat_backend.c". I would not be
surprised if we add more "per backend" stats (that are not I/O related) in the
future as the main machinery would be there.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#35Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#34)
Re: per backend I/O statistics

On Tue, Nov 19, 2024 at 04:28:55PM +0000, Bertrand Drouvot wrote:

So, for the startup process only, what about?

- don't call pgstat_create_backend_stat() in pgstat_beinit()...
- but call it in StartupXLOG() instead (after the stats are discarded or restored).

That way we could get rid of the changes linked to the assertion and still handle
the startup process particular case. Thoughts?

Hmm. That may prove to be a good idea in the long-term. The startup
process is a specific path kicked in at a very early stage, so it is
also a bit weird that we'd try to insert statistics while we are going
to reset them anyway a bit later. That may also be relevant for
custom statistics, actually, especially if some calls happen in some
hook paths taken before the reset is done. This could happen for
injection point stats when loading it with shared_preload_libraries,
actually, if you load, attach or run a callback at this early stage.
The existing sanity checks are a safety net that we should not change
or adapt for conditions specific to one stats kind, IMO. Perhaps this had
better be done as a patch of its own, on top of the rest.

Yeah that's the reason why I suggested "pgstat_backend.c". I would not be
surprised if we add more "per backend" stats (that are not I/O related) in the
future as the main machinery would be there.

Okay.
--
Michael

#36Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#31)
Re: per backend I/O statistics

Hi,

On Thu, Nov 14, 2024 at 03:31:51PM +0900, Michael Paquier wrote:

+	/*
+	 * Do serialize or not this kind of stats.
+	 */
+	bool		to_serialize:1;

Not sure that "serialize" is the best term that applies here. For
pgstats entries, serialization refers to the matter of writing their
entries with a "serialized" name because they have an undefined number
when stored locally after a reload. I'd suggest to split this concept
into its own patch, rename the flag as "write_to_file"

Yeah, also this could be useful for custom statistics. So I created a dedicated
thread and a patch proposal (see [1]).

[1]: /messages/by-id/Zz3skBqzBncSFIuY@ip-10-97-1-34.eu-west-3.compute.internal

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#37Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#35)
Re: per backend I/O statistics

Hi,

On Wed, Nov 20, 2024 at 09:01:26AM +0900, Michael Paquier wrote:

On Tue, Nov 19, 2024 at 04:28:55PM +0000, Bertrand Drouvot wrote:

So, for the startup process only, what about?

- don't call pgstat_create_backend_stat() in pgstat_beinit()...
- but call it in StartupXLOG() instead (after the stats are discarded or restored).

That way we could get rid of the changes linked to the assertion and still handle
the startup process particular case. Thoughts?

Hmm. That may prove to be a good idea in the long-term. The startup
process is a specific path kicked in at a very early stage, so it is
also a bit weird that we'd try to insert statistics while we are going
to reset them anyway a bit later.

Exactly.

That may also be relevant for
custom statistics, actually, especially if some calls happen in some
hook paths taken before the reset is done. This could happen for
injection point stats when loading it with shared_preload_libraries,
actually, if you load, attach or run a callback at this early stage.

Right. I did not have in mind to go that far here (for the per backend stats
needs). My idea was "just" to move the new pgstat_create_backend_stat() (which
is related to per backend stats only) call at the right place in StartupXLOG()
for the startup process only. As that's the startup process that will reset
or restore the stats I think that makes sense.

It looks like what you have in mind is much more generic, so what about:

- Focus on this thread first and then move the call as proposed above
- Think about a more generic idea later on (once the per-backend I/O stats are
in).

Thoughts?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#38Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#37)
Re: per backend I/O statistics

On Wed, Nov 20, 2024 at 02:20:18PM +0000, Bertrand Drouvot wrote:

Right. I did not have in mind to go that far here (for the per backend stats
needs). My idea was "just" to move the new pgstat_create_backend_stat() (which
is related to per backend stats only) call at the right place in StartupXLOG()
for the startup process only. As that's the startup process that will reset
or restore the stats I think that makes sense.

It looks like what you have in mind is much more generic, so what about:

- Focus on this thread first and then move the call as proposed above
- Think about a more generic idea later on (once the per-backend I/O stats are
in).

Moving pgstat_create_backend_stat() may be OK in the long-term, at
least that would document why we need to care about the startup
process.

Still, moving only the call feels incomplete, so how about adding a
boolean field in PgStat_ShmemControl that defaults to false when the
shmem area is initialized in StatsShmemInit(), then switched to true
once we know that the stats have been restored or reset by the startup
process. Something like that should work to control the dshash
inserts, I guess?
--
Michael

#39Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#36)
Re: per backend I/O statistics

On Wed, Nov 20, 2024 at 02:10:23PM +0000, Bertrand Drouvot wrote:

Yeah, also this could be useful for custom statistics. So I created a dedicated
thread and a patch proposal (see [1]).

[1]: /messages/by-id/Zz3skBqzBncSFIuY@ip-10-97-1-34.eu-west-3.compute.internal

Thanks for putting that on its own thread. Will look at it.
--
Michael

#40Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#38)
Re: per backend I/O statistics

Hi,

On Thu, Nov 21, 2024 at 07:18:24AM +0900, Michael Paquier wrote:

On Wed, Nov 20, 2024 at 02:20:18PM +0000, Bertrand Drouvot wrote:

Right. I did not have in mind to go that far here (for the per backend stats
needs). My idea was "just" to move the new pgstat_create_backend_stat() (which
is related to per backend stats only) call at the right place in StartupXLOG()
for the startup process only. As that's the startup process that will reset
or restore the stats I think that makes sense.

It looks like that what you have in mind is much more generic, what about:

- Focus on this thread first and then move the call as proposed above
- Think about a more generic idea later on (on the per-backend I/O stats is
in).

Moving pgstat_create_backend_stat() may be OK in the long-term, at
least that would document why we need to care about the startup
process.

Yeap.

Still, moving only the call feels incomplete, so how about adding a
boolean field in PgStat_ShmemControl that defaults to false when the
shmem area is initialized in StatsShmemInit(), then switched to true
once we know that the stats have been restored or reset by the startup
process. Something like that should work to control the dshash
inserts, I guess?

I did some tests (on a primary and on a standby) and currently there is no call
to pgstat_get_entry_ref() at all before we reach the stats reset or restore
in StartupXLOG(). The first ones appear after "we really open for business"
(quoting process_pm_child_exit()).

Now, with the per-backend I/O stats patch in place, there are 3 processes that
could call pgstat_get_entry_ref() before we reach the stats reset or restore in
StartupXLOG() (observed by setting a breakpoint on the startup process before
stats are reset or restored):

- checkpointer
- background writer
- startup process

All calls are for the new backend stat kind. It also makes sense to see non
"regular" backends as the "regulars" are not allowed to connect yet.

Result: they could collect some stats for nothing, since a reset or restore will
happen afterwards.

Now, if we want to prevent new entries from being created before the stats are
reset or restored:

- the end result will be the same (meaning: starting the instance with reset
or restored stats). The only difference would be that those processes would not
collect stats for "nothing" during this small time window.

- it looks like that would lead to some non-negligible refactoring: I started
the coding and observed the potential impact. The reason is that the code (at least
the parts I looked at) does not expect to receive a NULL value from pgstat_get_entry_ref()
(when asking for creation), so we would need to take care of all the cases
(all stats kinds, all backend types) for safety reasons.

So, given that:

- the end result would be the same
- the code changes would be non negligible (unless we have a better idea than
pgstat_get_entry_ref() returning a NULL value).

I think it would be valid to try to implement it (to avoid having those processes
collect stats for nothing during this small time window), but I am not sure
that's worth it (as the end result would be the same and the code change
non-negligible).

Thoughts?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#41Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#40)
Re: per backend I/O statistics

On Thu, Nov 21, 2024 at 05:23:42PM +0000, Bertrand Drouvot wrote:

So, given that:

- the end result would be the same
- the code changes would be non negligible (unless we have a better idea than
pgstat_get_entry_ref() returning a NULL value).

Hmm. created_entry only matters for pgstat_init_function_usage().
All the other callers of pgstat_prep_pending_entry() pass a NULL
value. This makes me wonder if there is an argument for reworking a
bit that. There's a fat comment about the reason why, still, that
would make the get interface for variable-numbered stats a lot
leaner.. Not sure what to think about that.
--
Michael

#42Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#41)
Re: per backend I/O statistics

Hi,

On Fri, Nov 22, 2024 at 10:36:29AM +0900, Michael Paquier wrote:

On Thu, Nov 21, 2024 at 05:23:42PM +0000, Bertrand Drouvot wrote:

So, given that:

- the end result would be the same
- the code changes would be non negligible (unless we have a better idea than
pgstat_get_entry_ref() returning a NULL value).

Hmm. created_entry only matters for pgstat_init_function_usage().
All the other callers of pgstat_prep_pending_entry() pass a NULL
value.

I meant to say all the calls that pass "create" as true in pgstat_get_entry_ref().

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#43Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#42)
Re: per backend I/O statistics

On Fri, Nov 22, 2024 at 07:49:58AM +0000, Bertrand Drouvot wrote:

On Fri, Nov 22, 2024 at 10:36:29AM +0900, Michael Paquier wrote:

Hmm. created_entry only matters for pgstat_init_function_usage().
All the other callers of pgstat_prep_pending_entry() pass a NULL
value.

I meant to say all the calls that passe "create" as true in pgstat_get_entry_ref().

Ah, OK, I think that I see your point here.

I am wondering how much this would matter for custom stats as well,
but we're not there yet; we need at least one release out, with folks
trying new things with these APIs and variable-numbered kinds. Allowing
pgstat_prep_pending_entry() to return NULL even if "create" is true
may be a good thing in the end, because that's the only way I can see,
based on the current APIs, to say "Sorry, but the stats have not been
loaded yet, so you cannot try to do anything related to the dshash".

From my point of view, having a kind of barrier would be cleaner in the
long run, but it's true that it may not be mandatory either. pg_stat_io
is currently OK to be called for auxiliary processes because it uses
fixed-numbered stats in shmem, so their stats are always there. And it
means we already have early calls that add stats which then get
overwritten once the stats are loaded from the on-disk file (am I
getting this part right?).

Anyway, do we really require that for the sake of this thread? We
know that there's only one of each auxiliary process at a time, and
they keep a footprint in pg_stat_io already. So we could just limit
ourselves to live database backends, WAL senders and autovacuum
workers, i.e. everything that's not auxiliary and is spawned on request?
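
As a rough sketch in pg_stat_activity terms (not meant to be exhaustive),
that would cover processes like:

SELECT pid, backend_type
  FROM pg_stat_activity
 WHERE backend_type IN ('client backend', 'autovacuum worker', 'walsender');
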
--
Michael

#44Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#43)
Re: per backend I/O statistics

Hi,

On Mon, Nov 25, 2024 at 10:06:44AM +0900, Michael Paquier wrote:

On Fri, Nov 22, 2024 at 07:49:58AM +0000, Bertrand Drouvot wrote:

On Fri, Nov 22, 2024 at 10:36:29AM +0900, Michael Paquier wrote:

Hmm. created_entry only matters for pgstat_init_function_usage().
All the other callers of pgstat_prep_pending_entry() pass a NULL
value.

I meant to say all the calls that passe "create" as true in pgstat_get_entry_ref().

Ah, OK, I think that I see your point here.

I am wondering how much this would matter as well for custom stats,
but we're not there yet without at least one release out and folks try
new things with these APIs and variable-numbered kinds.

Not sure here, could custom stats start incrementing before the database system
is ready to accept connections?

pgstat_prep_pending_entry() to return NULL even if "create" is true
may be a good thing, at the end, because that's the only way I can see
based on the current APIs where we could say "Sorry, but the stats
have not been loaded yet, so you cannot try to do anything related to
the dshash".

Yeah, same here.

From my view having a kind of barrier would be cleaner in the long
run, but it's true that it may not be mandatory, as well. pg_stat_io
is currently OK to be called because the stats are loaded for
auxiliary processes because it uses fixed-numbered stats in shmem.
And it means we already have early calls that add stats getting
overwritten once the stats are loaded from the on-disk file (Am I
getting this part right?).

Yeah, we can already see that, for example, the background writer could enter
pgstat_io_flush_cb() before the stats are reset or restored.

Anyway, do we really require that for the sake of this thread? We
know that there's only one of each auxiliary process at a time, and
they keep a footprint in pg_stat_io already. So we could just limit
outselves to live database backends, WAL senders and autovacuum
workers, everything that's not auxiliary and spawned on request?

I think that's a fair starting point, and we will not lose any information
doing so (as you said, there is only one of each auxiliary process at a time,
so one can already see their stats in pg_stat_io).
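
For instance, the checkpointer's I/O remains visible as its single pg_stat_io
row, with something like:

SELECT backend_type, object, context, writes, fsyncs
  FROM pg_stat_io
 WHERE backend_type = 'checkpointer';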

The only con that I can see is that we will not be able to merge the flush cb,
but I don't think that's a blocker (the flushes are done in shared memory, so
the impact on performance should not be that much of an issue).

I'll come back with a new version implementing the above.

[1]: /messages/by-id/Zz9sno+JJbWqdXhQ@ip-10-97-1-34.eu-west-3.compute.internal

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#45Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#44)
Re: per backend I/O statistics

On Mon, Nov 25, 2024 at 07:12:56AM +0000, Bertrand Drouvot wrote:

Not sure here, could custom stats start incrementing before the database system
is ready to accept connections?

In theory, that could be possible. Like pg_stat_io currently, I am
ready to assume that it likely won't matter much.

The only cons that I can see is that we will not be able to merge the flush cb
but I don't think that's a blocker (the flush are done in shared memory so the
impact on performance should not be that much of an issue).

The backend and I/O stats could begin diverging as a result of a new
implementation detail, and the performance of the flushes doesn't worry
me, knowing the frequency at which they happen on a live system.
--
Michael

#46Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#45)
1 attachment(s)
Re: per backend I/O statistics

Hi,

On Mon, Nov 25, 2024 at 04:18:54PM +0900, Michael Paquier wrote:

On Mon, Nov 25, 2024 at 07:12:56AM +0000, Bertrand Drouvot wrote:

Not sure here, could custom stats start incrementing before the database system
is ready to accept connections?

In theory, that could be possible. Like pg_stat_io currently, I am
ready to assume that it likely won't matter much.

Yeah right, agree.

The only cons that I can see is that we will not be able to merge the flush cb
but I don't think that's a blocker (the flush are done in shared memory so the
impact on performance should not be that much of an issue).

The backend and I/O stats could begin diverging as a result of a new
implementation detail, and the performance of the flushes don't worry
me knowing at which frequency they happen on a live system.

Same here.

Please find attached v6 that enables per-backend I/O stats for the
B_AUTOVAC_WORKER, B_BACKEND, B_BG_WORKER, B_STANDALONE_BACKEND, B_SLOTSYNC_WORKER
and B_WAL_SENDER backend types.
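
As a usage sketch with v6 applied, one can for example look at the I/O generated
by a given WAL sender with something like:

SELECT a.pid, io.object, io.context, io.reads, io.writes
  FROM pg_stat_activity a,
       LATERAL pg_stat_get_backend_io(a.pid) io
 WHERE a.backend_type = 'walsender';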

It also takes care of most of the comments that you have made in [1], meaning
that it:

- removes the backend type from PgStat_Backend and looks for the backend type
at "display" time.
- creates PgStat_BackendPendingIO, with PgStat_PendingIO now referring to it (I
used PgStat_BackendPendingIO and not PgStat_BackendPending because that is what
it is, after all).
- adds the missing comment related to the PID in the doc.
- merges 0004 with 0001 (so that pg_stat_get_backend_io() is now part of 0001).
- creates its own pgstat_backend.c file.

=== Remarks

R1: as compared to v5, v6 removes the per-backend I/O stats reset from
pg_stat_reset_shared(). I think it makes more sense that way, since we are
adding pg_stat_reset_single_backend_io_counters(). The per-backend I/O stats
then behave like the subscription stats as far as the reset is concerned.
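
Said differently, with v6 applied:

-- resets only this backend's I/O counters, pg_stat_io is left alone
SELECT pg_stat_reset_single_backend_io_counters(pg_backend_pid());
-- while this resets pg_stat_io and leaves the per-backend counters alone
SELECT pg_stat_reset_shared('io');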

R2: as we can't merge the flush cb anymore, only the patches related to
stats_fetch_consistency/'snapshot' are missing in v6 (as compared to v5).
I propose to re-submit them and re-start the discussion once 0001 goes in.

[1]: /messages/by-id/ZzWZV9LdyZ9aFSWs@paquier.xyz

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v6-0001-per-backend-I-O-statistics.patchtext/x-diff; charset=us-asciiDownload
From 21d773ae2b64571b09a77d6de4c955338cf979ae Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 28 Oct 2024 12:50:32 +0000
Subject: [PATCH v6] per backend I/O statistics

While pg_stat_io provides cluster-wide I/O statistics, this commit adds the
ability to track and display per backend I/O statistics.

It adds a new statistics kind, a new pg_my_stat_io view to display "my" backend
I/O statistics, and 2 new functions:

- pg_stat_reset_single_backend_io_counters() to be able to reset the I/O stats
for a given backend pid.
- pg_stat_get_backend_io() to retrieve I/O statistics for a given backend pid.

The new KIND is named PGSTAT_KIND_PER_BACKEND as it could be used in the future
to store other statistics (than the I/O ones) per backend. The new KIND is
a variable-numbered one and has an automatic cap on the maximum number of
entries (as its hash key contains the proc number).

There is no need to write the per backend I/O stats to disk (no point to
see stats for backends that do not exist anymore after a re-start), so using
"write_to_file = false".

Note that per backend I/O statistics are not collected for the checkpointer,
the background writer, the startup process and the autovacuum launcher as those
are already visible in pg_stat_io and there is only one of those.

XXX: Bump catalog version needs to be done.
---
 doc/src/sgml/config.sgml                    |   4 +-
 doc/src/sgml/monitoring.sgml                |  63 ++++++
 src/backend/catalog/system_functions.sql    |   2 +
 src/backend/catalog/system_views.sql        |  22 +++
 src/backend/utils/activity/Makefile         |   1 +
 src/backend/utils/activity/backend_status.c |   4 +
 src/backend/utils/activity/meson.build      |   1 +
 src/backend/utils/activity/pgstat.c         |  19 +-
 src/backend/utils/activity/pgstat_backend.c | 176 +++++++++++++++++
 src/backend/utils/activity/pgstat_io.c      |  37 +++-
 src/backend/utils/adt/pgstatfuncs.c         | 208 ++++++++++++++++++++
 src/include/catalog/pg_proc.dat             |  14 ++
 src/include/pgstat.h                        |  32 ++-
 src/include/utils/pgstat_internal.h         |  12 ++
 src/test/regress/expected/rules.out         |  19 ++
 src/test/regress/expected/stats.out         |  85 +++++++-
 src/test/regress/sql/stats.sql              |  47 ++++-
 src/tools/pgindent/typedefs.list            |   3 +
 18 files changed, 729 insertions(+), 20 deletions(-)
  11.6% doc/src/sgml/
  27.0% src/backend/utils/activity/
  22.4% src/backend/utils/adt/
   4.0% src/include/catalog/
   5.9% src/include/
  14.9% src/test/regress/expected/
  11.4% src/test/regress/sql/

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index a84e60c09b..20a0862661 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8365,7 +8365,9 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
         displayed in <link linkend="monitoring-pg-stat-database-view">
         <structname>pg_stat_database</structname></link>,
         <link linkend="monitoring-pg-stat-io-view">
-        <structname>pg_stat_io</structname></link>, in the output of
+        <structname>pg_stat_io</structname></link>,
+        <link linkend="monitoring-pg-my-stat-io-view">
+        <structname>pg_my_stat_io</structname></link>, in the output of
         <xref linkend="sql-explain"/> when the <literal>BUFFERS</literal> option
         is used, in the output of <xref linkend="sql-vacuum"/> when
         the <literal>VERBOSE</literal> option is used, by autovacuum
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 840d7f8161..8e7a7a0f36 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -488,6 +488,16 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
      </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_my_stat_io</structname><indexterm><primary>pg_my_stat_io</primary></indexterm></entry>
+      <entry>
+       One row for each combination of context and target object containing
+       my backend I/O statistics.
+       See <link linkend="monitoring-pg-my-stat-io-view">
+       <structname>pg_my_stat_io</structname></link> for details.
+     </entry>
+     </row>
+
      <row>
       <entry><structname>pg_stat_replication_slots</structname><indexterm><primary>pg_stat_replication_slots</primary></indexterm></entry>
       <entry>One row per replication slot, showing statistics about the
@@ -2946,7 +2956,23 @@ description | Waiting for a newly initialized WAL file to reach durable storage
    </para>
   </note>
 
+ </sect2>
+
+ <sect2 id="monitoring-pg-my-stat-io-view">
+  <title><structname>pg_my_stat_io</structname></title>
+
+  <indexterm>
+   <primary>pg_my_stat_io</primary>
+  </indexterm>
 
+  <para>
+   The <structname>pg_my_stat_io</structname> view will contain one row for each
+   combination of target I/O object and I/O context, showing
+   my backend I/O statistics. Combinations which do not make sense are
+   omitted. The fields are exactly the same as the ones in the <link
+   linkend="monitoring-pg-stat-io-view"> <structname>pg_stat_io</structname></link>
+   view.
+  </para>
 
  </sect2>
 
@@ -4790,6 +4816,25 @@ description | Waiting for a newly initialized WAL file to reach durable storage
        </para></entry>
       </row>
 
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_stat_get_backend_io</primary>
+        </indexterm>
+        <function>pg_stat_get_backend_io</function> ( <type>integer</type> )
+        <returnvalue>setof record</returnvalue>
+       </para>
+       <para>
+        Returns I/O statistics about the backend with the specified
+        process ID. The output fields are exactly the same as the ones in the
+        <link linkend="monitoring-pg-stat-io-view"> <structname>pg_stat_io</structname></link>
+        view. The function does not return I/O statistics for the checkpointer,
+        the background writer, the startup process and the autovacuum launcher
+        as they are already visible in the <link linkend="monitoring-pg-stat-io-view"> <structname>pg_stat_io</structname></link>
+        view and there is only one of those.
+       </para></entry>
+      </row>
+
       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <indexterm>
@@ -4971,6 +5016,24 @@ description | Waiting for a newly initialized WAL file to reach durable storage
        </para></entry>
       </row>
 
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_stat_reset_single_backend_io_counters</primary>
+        </indexterm>
+        <function>pg_stat_reset_single_backend_io_counters</function> ( <type>integer</type> )
+        <returnvalue>void</returnvalue>
+       </para>
+       <para>
+        Resets I/O statistics for a single backend with the specified process ID
+        to zero.
+       </para>
+       <para>
+        This function is restricted to superusers by default, but other users
+        can be granted EXECUTE to run the function.
+       </para></entry>
+      </row>
+
       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <indexterm>
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index c51dfca802..1b21b242cd 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -711,6 +711,8 @@ REVOKE EXECUTE ON FUNCTION pg_stat_reset_single_table_counters(oid) FROM public;
 
 REVOKE EXECUTE ON FUNCTION pg_stat_reset_single_function_counters(oid) FROM public;
 
+REVOKE EXECUTE ON FUNCTION pg_stat_reset_single_backend_io_counters(integer) FROM public;
+
 REVOKE EXECUTE ON FUNCTION pg_stat_reset_replication_slot(text) FROM public;
 
 REVOKE EXECUTE ON FUNCTION pg_stat_have_stats(text, oid, int8) FROM public;
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index da9a8fe99f..fce3b791bb 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1172,6 +1172,28 @@ SELECT
        b.stats_reset
 FROM pg_stat_get_io() b;
 
+CREATE VIEW pg_my_stat_io AS
+SELECT
+       b.backend_type,
+       b.object,
+       b.context,
+       b.reads,
+       b.read_time,
+       b.writes,
+       b.write_time,
+       b.writebacks,
+       b.writeback_time,
+       b.extends,
+       b.extend_time,
+       b.op_bytes,
+       b.hits,
+       b.evictions,
+       b.reuses,
+       b.fsyncs,
+       b.fsync_time,
+       b.stats_reset
+FROM pg_stat_get_backend_io(NULL) b;
+
 CREATE VIEW pg_stat_wal AS
     SELECT
         w.wal_records,
diff --git a/src/backend/utils/activity/Makefile b/src/backend/utils/activity/Makefile
index b9fd66ea17..24b64a2742 100644
--- a/src/backend/utils/activity/Makefile
+++ b/src/backend/utils/activity/Makefile
@@ -20,6 +20,7 @@ OBJS = \
 	backend_status.o \
 	pgstat.o \
 	pgstat_archiver.o \
+	pgstat_backend.o \
 	pgstat_bgwriter.o \
 	pgstat_checkpointer.o \
 	pgstat_database.o \
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index bdb3a296ca..249439b990 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -249,6 +249,10 @@ pgstat_beinit(void)
 	Assert(MyProcNumber >= 0 && MyProcNumber < NumBackendStatSlots);
 	MyBEEntry = &BackendStatusArray[MyProcNumber];
 
+	/* Create the per-backend statistics entry */
+	if (pgstat_tracks_per_backend_bktype(MyBackendType))
+		pgstat_create_backend_stat(MyProcNumber);
+
 	/* Set up a process-exit hook to clean up */
 	on_shmem_exit(pgstat_beshutdown_hook, 0);
 }
diff --git a/src/backend/utils/activity/meson.build b/src/backend/utils/activity/meson.build
index f73c22905c..380d3dd70c 100644
--- a/src/backend/utils/activity/meson.build
+++ b/src/backend/utils/activity/meson.build
@@ -5,6 +5,7 @@ backend_sources += files(
   'backend_status.c',
   'pgstat.c',
   'pgstat_archiver.c',
+  'pgstat_backend.c',
   'pgstat_bgwriter.c',
   'pgstat_checkpointer.c',
   'pgstat_database.c',
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index 6f8d237826..1449d6a3f4 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -77,6 +77,7 @@
  *
  * Each statistics kind is handled in a dedicated file:
  * - pgstat_archiver.c
+ * - pgstat_backend.c
  * - pgstat_bgwriter.c
  * - pgstat_checkpointer.c
  * - pgstat_database.c
@@ -358,6 +359,22 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.reset_timestamp_cb = pgstat_subscription_reset_timestamp_cb,
 	},
 
+	[PGSTAT_KIND_PER_BACKEND] = {
+		.name = "per-backend",
+
+		.fixed_amount = false,
+		.write_to_file = false,
+
+		.accessed_across_databases = true,
+
+		.shared_size = sizeof(PgStatShared_Backend),
+		.shared_data_off = offsetof(PgStatShared_Backend, stats),
+		.shared_data_len = sizeof(((PgStatShared_Backend *) 0)->stats),
+		.pending_size = sizeof(PgStat_BackendPendingIO),
+
+		.flush_pending_cb = pgstat_per_backend_flush_cb,
+		.reset_timestamp_cb = pgstat_backend_reset_timestamp_cb,
+	},
 
 	/* stats for fixed-numbered (mostly 1) objects */
 
@@ -768,7 +785,7 @@ pgstat_report_stat(bool force)
 
 	partial_flush = false;
 
-	/* flush database / relation / function / ... stats */
+	/* flush database / relation / function / backend / ... stats */
 	partial_flush |= pgstat_flush_pending_entries(nowait);
 
 	/* flush of fixed-numbered stats */
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
new file mode 100644
index 0000000000..f079f15ff2
--- /dev/null
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -0,0 +1,176 @@
+/* -------------------------------------------------------------------------
+ *
+ * pgstat_backend.c
+ *	  Implementation of per-backend statistics.
+ *
+ * This file contains the implementation of per-backend statistics. It is kept
+ * separate from pgstat.c to enforce the line between the statistics access /
+ * storage implementation and the details about individual types of statistics.
+ *
+ * Copyright (c) 2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/utils/activity/pgstat_backend.c
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "utils/pgstat_internal.h"
+
+/*
+ * Returns my backend IO stats.
+ */
+PgStat_Backend *
+pgstat_fetch_my_stat_io(void)
+{
+	return (PgStat_Backend *)
+		pgstat_fetch_entry(PGSTAT_KIND_PER_BACKEND, InvalidOid, MyProcNumber);
+}
+
+/*
+ * Returns other backend's IO stats.
+ */
+PgStat_Backend *
+pgstat_fetch_proc_stat_io(ProcNumber procNumber)
+{
+	PgStat_Backend *backend_entry;
+
+	backend_entry = (PgStat_Backend *) pgstat_fetch_entry(PGSTAT_KIND_PER_BACKEND,
+														  InvalidOid, procNumber);
+
+	return backend_entry;
+}
+
+/*
+ * Flush out locally pending backend statistics
+ *
+ * If no stats have been recorded, this function returns false.
+ */
+bool
+pgstat_per_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
+{
+	PgStatShared_Backend *shbackendioent;
+	PgStat_BackendPendingIO *pendingent;
+	PgStat_BktypeIO *bktype_shstats;
+
+	if (!pgstat_lock_entry(entry_ref, nowait))
+		return false;
+
+	shbackendioent = (PgStatShared_Backend *) entry_ref->shared_stats;
+	bktype_shstats = &shbackendioent->stats.stats;
+	pendingent = (PgStat_BackendPendingIO *) entry_ref->pending;
+
+	for (int io_object = 0; io_object < IOOBJECT_NUM_TYPES; io_object++)
+	{
+		for (int io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
+		{
+			for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
+			{
+				instr_time	time;
+
+				bktype_shstats->counts[io_object][io_context][io_op] +=
+					pendingent->counts[io_object][io_context][io_op];
+
+				time = pendingent->pending_times[io_object][io_context][io_op];
+
+				bktype_shstats->times[io_object][io_context][io_op] +=
+					INSTR_TIME_GET_MICROSEC(time);
+			}
+		}
+	}
+
+	pgstat_unlock_entry(entry_ref);
+
+	return true;
+}
+
+/*
+ * Create the per-backend statistics entry for procnum.
+ */
+void
+pgstat_create_backend_stat(ProcNumber procnum)
+{
+	PgStat_EntryRef *entry_ref;
+	PgStatShared_Backend *shstatent;
+
+	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_PER_BACKEND, InvalidOid,
+										  procnum, NULL);
+
+	shstatent = (PgStatShared_Backend *) entry_ref->shared_stats;
+
+	/*
+	 * NB: need to accept that there might be stats from an older backend,
+	 * e.g. if we previously used this proc number.
+	 */
+	memset(&shstatent->stats, 0, sizeof(shstatent->stats));
+}
+
+/*
+ * Find or create a local PgStat_BackendPendingIO entry for procnum.
+ */
+PgStat_BackendPendingIO *
+pgstat_prep_per_backend_pending(ProcNumber procnum)
+{
+	PgStat_EntryRef *entry_ref;
+
+	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_PER_BACKEND, InvalidOid,
+										  procnum, NULL);
+
+	return entry_ref->pending;
+}
+
+/*
+ * per-backend statistics are not collected for all BackendTypes.
+ *
+ * The following BackendTypes do not participate in the per-backend stats
+ * subsystem:
+ * - The same and for the same reasons as in pgstat_tracks_io_bktype().
+ * - B_BG_WRITER, B_CHECKPOINTER, B_STARTUP and B_AUTOVAC_LAUNCHER because their
+ * I/O stats are already visible in pg_stat_io and there is only one of those.
+ *
+ * Function returns true if BackendType participates in the per-backend stats
+ * subsystem for IO and false if it does not.
+ *
+ * When adding a new BackendType, also consider adding relevant restrictions to
+ * pgstat_tracks_io_object() and pgstat_tracks_io_op().
+ */
+bool
+pgstat_tracks_per_backend_bktype(BackendType bktype)
+{
+	/*
+	 * List every type so that new backend types trigger a warning about
+	 * needing to adjust this switch.
+	 */
+	switch (bktype)
+	{
+		case B_INVALID:
+		case B_AUTOVAC_LAUNCHER:
+		case B_DEAD_END_BACKEND:
+		case B_ARCHIVER:
+		case B_LOGGER:
+		case B_WAL_RECEIVER:
+		case B_WAL_WRITER:
+		case B_WAL_SUMMARIZER:
+		case B_BG_WRITER:
+		case B_CHECKPOINTER:
+		case B_STARTUP:
+			return false;
+
+		case B_AUTOVAC_WORKER:
+		case B_BACKEND:
+		case B_BG_WORKER:
+		case B_STANDALONE_BACKEND:
+		case B_SLOTSYNC_WORKER:
+		case B_WAL_SENDER:
+			return true;
+	}
+
+	return false;
+}
+
+void
+pgstat_backend_reset_timestamp_cb(PgStatShared_Common *header, TimestampTz ts)
+{
+	((PgStatShared_Backend *) header)->stats.stat_reset_timestamp = ts;
+}
diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index f9883af2b3..1af5b75df6 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -20,13 +20,7 @@
 #include "storage/bufmgr.h"
 #include "utils/pgstat_internal.h"
 
-
-typedef struct PgStat_PendingIO
-{
-	PgStat_Counter counts[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
-	instr_time	pending_times[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
-} PgStat_PendingIO;
-
+typedef PgStat_BackendPendingIO PgStat_PendingIO;
 
 static PgStat_PendingIO PendingIOStats;
 static bool have_iostats = false;
@@ -87,6 +81,14 @@ pgstat_count_io_op_n(IOObject io_object, IOContext io_context, IOOp io_op, uint3
 	Assert((unsigned int) io_op < IOOP_NUM_TYPES);
 	Assert(pgstat_tracks_io_op(MyBackendType, io_object, io_context, io_op));
 
+	if (pgstat_tracks_per_backend_bktype(MyBackendType))
+	{
+		PgStat_PendingIO *entry_ref;
+
+		entry_ref = pgstat_prep_per_backend_pending(MyProcNumber);
+		entry_ref->counts[io_object][io_context][io_op] += cnt;
+	}
+
 	PendingIOStats.counts[io_object][io_context][io_op] += cnt;
 
 	have_iostats = true;
@@ -122,6 +124,7 @@ void
 pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op,
 						instr_time start_time, uint32 cnt)
 {
+
 	if (track_io_timing)
 	{
 		instr_time	io_time;
@@ -148,6 +151,15 @@ pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op,
 
 		INSTR_TIME_ADD(PendingIOStats.pending_times[io_object][io_context][io_op],
 					   io_time);
+
+		if (pgstat_tracks_per_backend_bktype(MyBackendType))
+		{
+			PgStat_PendingIO *entry_ref;
+
+			entry_ref = pgstat_prep_per_backend_pending(MyProcNumber);
+			INSTR_TIME_ADD(entry_ref->pending_times[io_object][io_context][io_op],
+						   io_time);
+		}
 	}
 
 	pgstat_count_io_op_n(io_object, io_context, io_op, cnt);
@@ -171,12 +183,21 @@ pgstat_io_have_pending_cb(void)
 }
 
 /*
- * Simpler wrapper of pgstat_io_flush_cb()
+ * Simpler wrapper of pgstat_io_flush_cb() and pgstat_per_backend_flush_cb().
  */
 void
 pgstat_flush_io(bool nowait)
 {
+	PgStat_EntryRef *entry_ref;
+
 	(void) pgstat_io_flush_cb(nowait);
+
+	if (pgstat_tracks_per_backend_bktype(MyBackendType))
+	{
+		entry_ref = pgstat_get_entry_ref(PGSTAT_KIND_PER_BACKEND, InvalidOid,
+										 MyProcNumber, false, NULL);
+		(void) pgstat_per_backend_flush_cb(entry_ref, nowait);
+	}
 }
 
 /*
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 60a397dc56..0302970246 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1480,6 +1480,165 @@ pg_stat_get_io(PG_FUNCTION_ARGS)
 	return (Datum) 0;
 }
 
+Datum
+pg_stat_get_backend_io(PG_FUNCTION_ARGS)
+{
+	ReturnSetInfo *rsinfo;
+	PgStat_Backend *backend_stats;
+	Datum		bktype_desc;
+	PgStat_BktypeIO *bktype_stats;
+	BackendType bktype;
+	Datum		reset_time;
+	int			num_backends = pgstat_fetch_stat_numbackends();
+	int			curr_backend;
+	int			pid;
+
+	/* just to keep compiler quiet */
+	bktype = B_INVALID;
+
+	InitMaterializedSRF(fcinfo, 0);
+	rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+	if (PG_ARGISNULL(0) || PG_GETARG_INT32(0) == MyProcPid)
+	{
+		backend_stats = pgstat_fetch_my_stat_io();
+		pid = MyProcPid;
+		bktype = MyBackendType;
+	}
+	else
+	{
+		PGPROC	   *proc;
+		ProcNumber	procNumber;
+
+		pid = PG_GETARG_INT32(0);
+		proc = BackendPidGetProc(pid);
+
+		/*
+		 * Maybe an auxiliary process? That should not be possible, due to
+		 * pgstat_tracks_per_backend_bktype() though.
+		 */
+		if (proc == NULL)
+			proc = AuxiliaryPidGetProc(pid);
+
+		if (proc != NULL)
+		{
+			procNumber = GetNumberFromPGProc(proc);
+			backend_stats = pgstat_fetch_proc_stat_io(procNumber);
+
+			if (!backend_stats)
+				return (Datum) 0;
+		}
+		else
+			return (Datum) 0;
+	}
+
+	/* Look for the backend type */
+	for (curr_backend = 1; curr_backend <= num_backends; curr_backend++)
+	{
+		LocalPgBackendStatus *local_beentry;
+		PgBackendStatus *beentry;
+
+		/* Get the next one in the list */
+		local_beentry = pgstat_get_local_beentry_by_index(curr_backend);
+		beentry = &local_beentry->backendStatus;
+
+		/* looking for specific PID, ignore all the others */
+		if (beentry->st_procpid != pid)
+			continue;
+
+		bktype = beentry->st_backendType;
+		break;
+	}
+
+	bktype_desc = CStringGetTextDatum(GetBackendTypeDesc(bktype));
+	bktype_stats = &backend_stats->stats;
+	reset_time = TimestampTzGetDatum(backend_stats->stat_reset_timestamp);
+
+	/*
+	 * In Assert builds, we can afford an extra loop through all of the
+	 * counters checking that only expected stats are non-zero, since it keeps
+	 * the non-Assert code cleaner.
+	 */
+	Assert(pgstat_bktype_io_stats_valid(bktype_stats, bktype));
+
+	for (int io_obj = 0; io_obj < IOOBJECT_NUM_TYPES; io_obj++)
+	{
+		const char *obj_name = pgstat_get_io_object_name(io_obj);
+
+		for (int io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
+		{
+			const char *context_name = pgstat_get_io_context_name(io_context);
+
+			Datum		values[IO_NUM_COLUMNS] = {0};
+			bool		nulls[IO_NUM_COLUMNS] = {0};
+
+			/*
+			 * Some combinations of BackendType, IOObject, and IOContext are
+			 * not valid for any type of IOOp. In such cases, omit the entire
+			 * row from the view.
+			 */
+			if (!pgstat_tracks_io_object(bktype, io_obj, io_context))
+				continue;
+
+			values[IO_COL_BACKEND_TYPE] = bktype_desc;
+			values[IO_COL_CONTEXT] = CStringGetTextDatum(context_name);
+			values[IO_COL_OBJECT] = CStringGetTextDatum(obj_name);
+			if (backend_stats->stat_reset_timestamp != 0)
+				values[IO_COL_RESET_TIME] = reset_time;
+			else
+				nulls[IO_COL_RESET_TIME] = true;
+
+			/*
+			 * Hard-code this to the value of BLCKSZ for now. Future values
+			 * could include XLOG_BLCKSZ, once WAL IO is tracked, and constant
+			 * multipliers, once non-block-oriented IO (e.g. temporary file
+			 * IO) is tracked.
+			 */
+			values[IO_COL_CONVERSION] = Int64GetDatum(BLCKSZ);
+
+			for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
+			{
+				int			op_idx = pgstat_get_io_op_index(io_op);
+				int			time_idx = pgstat_get_io_time_index(io_op);
+
+				/*
+				 * Some combinations of BackendType and IOOp, of IOContext and
+				 * IOOp, and of IOObject and IOOp are not tracked. Set these
+				 * cells in the view NULL.
+				 */
+				if (pgstat_tracks_io_op(bktype, io_obj, io_context, io_op))
+				{
+					PgStat_Counter count =
+						bktype_stats->counts[io_obj][io_context][io_op];
+
+					values[op_idx] = Int64GetDatum(count);
+				}
+				else
+					nulls[op_idx] = true;
+
+				/* not every operation is timed */
+				if (time_idx == IO_COL_INVALID)
+					continue;
+
+				if (!nulls[op_idx])
+				{
+					PgStat_Counter time =
+						bktype_stats->times[io_obj][io_context][io_op];
+
+					values[time_idx] = Float8GetDatum(pg_stat_us_to_ms(time));
+				}
+				else
+					nulls[time_idx] = true;
+			}
+
+			tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+								 values, nulls);
+		}
+	}
+
+	return (Datum) 0;
+}
+
 /*
  * Returns statistics of WAL activity
  */
@@ -1785,6 +1944,55 @@ pg_stat_reset_single_function_counters(PG_FUNCTION_ARGS)
 	PG_RETURN_VOID();
 }
 
+Datum
+pg_stat_reset_single_backend_io_counters(PG_FUNCTION_ARGS)
+{
+	PGPROC	   *proc;
+	BackendType bktype;
+	int			backend_pid = PG_GETARG_INT32(0);
+	int			num_backends = pgstat_fetch_stat_numbackends();
+	int			curr_backend;
+
+	proc = BackendPidGetProc(backend_pid);
+
+	/*
+	 * Maybe an auxiliary process? That should not be possible, due to
+	 * pgstat_tracks_per_backend_bktype() though.
+	 */
+	if (proc == NULL)
+		proc = AuxiliaryPidGetProc(backend_pid);
+
+	/* Check that IO stats are collected for this backend type */
+	if (proc)
+	{
+		for (curr_backend = 1; curr_backend <= num_backends; curr_backend++)
+		{
+			LocalPgBackendStatus *local_beentry;
+			PgBackendStatus *beentry;
+
+			/* Get the next one in the list */
+			local_beentry = pgstat_get_local_beentry_by_index(curr_backend);
+			beentry = &local_beentry->backendStatus;
+
+			/* looking for specific PID, ignore all the others */
+			if (beentry->st_procpid != backend_pid)
+				continue;
+
+			bktype = beentry->st_backendType;
+
+			if (!pgstat_tracks_per_backend_bktype(bktype))
+				PG_RETURN_VOID();
+			else
+			{
+				pgstat_reset(PGSTAT_KIND_PER_BACKEND, InvalidOid, GetNumberFromPGProc(proc));
+				break;
+			}
+		}
+	}
+
+	PG_RETURN_VOID();
+}
+
 /* Reset SLRU counters (a specific one or all of them). */
 Datum
 pg_stat_reset_slru(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cbbe8acd38..6e5c2082a7 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5913,6 +5913,15 @@
   proargnames => '{backend_type,object,context,reads,read_time,writes,write_time,writebacks,writeback_time,extends,extend_time,op_bytes,hits,evictions,reuses,fsyncs,fsync_time,stats_reset}',
   prosrc => 'pg_stat_get_io' },
 
+{ oid => '8806', descr => 'statistics: per backend IO statistics',
+  proname => 'pg_stat_get_backend_io', prorows => '5', proretset => 't',
+  provolatile => 'v', proparallel => 'r', prorettype => 'record',
+  proargtypes => 'int4', proisstrict => 'f',
+  proallargtypes => '{int4,text,text,text,int8,float8,int8,float8,int8,float8,int8,float8,int8,int8,int8,int8,int8,float8,timestamptz}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{backend_pid,backend_type,object,context,reads,read_time,writes,write_time,writebacks,writeback_time,extends,extend_time,op_bytes,hits,evictions,reuses,fsyncs,fsync_time,stats_reset}',
+  prosrc => 'pg_stat_get_backend_io' },
+
 { oid => '1136', descr => 'statistics: information about WAL activity',
   proname => 'pg_stat_get_wal', proisstrict => 'f', provolatile => 's',
   proparallel => 'r', prorettype => 'record', proargtypes => '',
@@ -6052,6 +6061,11 @@
   proname => 'pg_stat_reset_single_function_counters', provolatile => 'v',
   prorettype => 'void', proargtypes => 'oid',
   prosrc => 'pg_stat_reset_single_function_counters' },
+{ oid => '9987',
+  descr => 'statistics: reset collected IO statistics for a single backend',
+  proname => 'pg_stat_reset_single_backend_io_counters', provolatile => 'v',
+  prorettype => 'void', proargtypes => 'int4',
+  prosrc => 'pg_stat_reset_single_backend_io_counters' },
 { oid => '2307',
   descr => 'statistics: reset collected statistics for a single SLRU',
   proname => 'pg_stat_reset_slru', proisstrict => 'f', provolatile => 'v',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 59c28b4aca..c5151c8e6f 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -49,14 +49,15 @@
 #define PGSTAT_KIND_FUNCTION	3	/* per-function statistics */
 #define PGSTAT_KIND_REPLSLOT	4	/* per-slot statistics */
 #define PGSTAT_KIND_SUBSCRIPTION	5	/* per-subscription statistics */
+#define PGSTAT_KIND_PER_BACKEND	6
 
 /* stats for fixed-numbered objects */
-#define PGSTAT_KIND_ARCHIVER	6
-#define PGSTAT_KIND_BGWRITER	7
-#define PGSTAT_KIND_CHECKPOINTER	8
-#define PGSTAT_KIND_IO	9
-#define PGSTAT_KIND_SLRU	10
-#define PGSTAT_KIND_WAL	11
+#define PGSTAT_KIND_ARCHIVER	7
+#define PGSTAT_KIND_BGWRITER	8
+#define PGSTAT_KIND_CHECKPOINTER	9
+#define PGSTAT_KIND_IO	10
+#define PGSTAT_KIND_SLRU	11
+#define PGSTAT_KIND_WAL	12
 
 #define PGSTAT_KIND_BUILTIN_MIN PGSTAT_KIND_DATABASE
 #define PGSTAT_KIND_BUILTIN_MAX PGSTAT_KIND_WAL
@@ -347,12 +348,23 @@ typedef struct PgStat_BktypeIO
 	PgStat_Counter times[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
 } PgStat_BktypeIO;
 
+typedef struct PgStat_BackendPendingIO
+{
+	PgStat_Counter counts[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
+	instr_time	pending_times[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
+} PgStat_BackendPendingIO;
+
 typedef struct PgStat_IO
 {
 	TimestampTz stat_reset_timestamp;
 	PgStat_BktypeIO stats[BACKEND_NUM_TYPES];
 } PgStat_IO;
 
+typedef struct PgStat_Backend
+{
+	TimestampTz stat_reset_timestamp;
+	PgStat_BktypeIO stats;
+} PgStat_Backend;
 
 typedef struct PgStat_StatDBEntry
 {
@@ -534,6 +546,14 @@ extern bool pgstat_have_entry(PgStat_Kind kind, Oid dboid, uint64 objid);
 extern void pgstat_report_archiver(const char *xlog, bool failed);
 extern PgStat_ArchiverStats *pgstat_fetch_stat_archiver(void);
 
+/*
+ * Functions in pgstat_backend.c
+ */
+
+extern PgStat_Backend *pgstat_fetch_my_stat_io(void);
+extern PgStat_Backend *pgstat_fetch_proc_stat_io(ProcNumber procNumber);
+extern bool pgstat_tracks_per_backend_bktype(BackendType bktype);
+extern void pgstat_create_backend_stat(ProcNumber procnum);
 
 /*
  * Functions in pgstat_bgwriter.c
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 7338bc1e28..625bb55864 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -450,6 +450,11 @@ typedef struct PgStatShared_ReplSlot
 	PgStat_StatReplSlotEntry stats;
 } PgStatShared_ReplSlot;
 
+typedef struct PgStatShared_Backend
+{
+	PgStatShared_Common header;
+	PgStat_Backend stats;
+} PgStatShared_Backend;
 
 /*
  * Central shared memory entry for the cumulative stats system.
@@ -604,6 +609,13 @@ extern void pgstat_archiver_init_shmem_cb(void *stats);
 extern void pgstat_archiver_reset_all_cb(TimestampTz ts);
 extern void pgstat_archiver_snapshot_cb(void);
 
+/*
+ * Functions in pgstat_backend.c
+ */
+
+extern PgStat_BackendPendingIO *pgstat_prep_per_backend_pending(ProcNumber procnum);
+extern bool pgstat_per_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait);
+extern void pgstat_backend_reset_timestamp_cb(PgStatShared_Common *header, TimestampTz ts);
 
 /*
  * Functions in pgstat_bgwriter.c
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 3014d047fe..3922675bbf 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1398,6 +1398,25 @@ pg_matviews| SELECT n.nspname AS schemaname,
      LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
      LEFT JOIN pg_tablespace t ON ((t.oid = c.reltablespace)))
   WHERE (c.relkind = 'm'::"char");
+pg_my_stat_io| SELECT backend_type,
+    object,
+    context,
+    reads,
+    read_time,
+    writes,
+    write_time,
+    writebacks,
+    writeback_time,
+    extends,
+    extend_time,
+    op_bytes,
+    hits,
+    evictions,
+    reuses,
+    fsyncs,
+    fsync_time,
+    stats_reset
+   FROM pg_stat_get_backend_io(NULL::integer) b(backend_type, object, context, reads, read_time, writes, write_time, writebacks, writeback_time, extends, extend_time, op_bytes, hits, evictions, reuses, fsyncs, fsync_time, stats_reset);
 pg_policies| SELECT n.nspname AS schemaname,
     c.relname AS tablename,
     pol.polname AS policyname,
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 56771f83ed..7f2fd873b9 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1249,7 +1249,7 @@ SELECT pg_stat_get_subscription_stats(NULL);
  
 (1 row)
 
--- Test that the following operations are tracked in pg_stat_io:
+-- Test that the following operations are tracked in pg_[my_]stat_io:
 -- - reads of target blocks into shared buffers
 -- - writes of shared buffers to permanent storage
 -- - extends of relations using shared buffers
@@ -1259,11 +1259,24 @@ SELECT pg_stat_get_subscription_stats(NULL);
 -- be sure of the state of shared buffers at the point the test is run.
 -- Create a regular table and insert some data to generate IOCONTEXT_NORMAL
 -- extends.
+SELECT pid AS checkpointer_pid FROM pg_stat_activity
+  WHERE backend_type = 'checkpointer' \gset
 SELECT sum(extends) AS io_sum_shared_before_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT sum(extends) AS my_io_sum_shared_before_extends
+  FROM pg_my_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT sum(extends) AS backend_io_sum_shared_before_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_stat_io
   WHERE object = 'relation' \gset io_sum_shared_before_
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_my_stat_io
+  WHERE object = 'relation' \gset my_io_sum_shared_before_
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE object = 'relation' \gset backend_io_sum_shared_before_
 CREATE TABLE test_io_shared(a int);
 INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
 SELECT pg_stat_force_next_flush();
@@ -1280,8 +1293,25 @@ SELECT :io_sum_shared_after_extends > :io_sum_shared_before_extends;
  t
 (1 row)
 
+SELECT sum(extends) AS my_io_sum_shared_after_extends
+  FROM pg_my_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT :my_io_sum_shared_after_extends > :my_io_sum_shared_before_extends;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT sum(extends) AS backend_io_sum_shared_after_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
+SELECT :backend_io_sum_shared_after_extends > :backend_io_sum_shared_before_extends;
+ ?column? 
+----------
+ t
+(1 row)
+
 -- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
--- and fsyncs.
+-- and fsyncs in the global stats (not for the backend).
 -- See comment above for rationale for two explicit CHECKPOINTs.
 CHECKPOINT;
 CHECKPOINT;
@@ -1301,6 +1331,31 @@ SELECT current_setting('fsync') = 'off'
  t
 (1 row)
 
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_my_stat_io
+  WHERE object = 'relation' \gset my_io_sum_shared_after_
+SELECT :my_io_sum_shared_after_writes >= :my_io_sum_shared_before_writes;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT current_setting('fsync') = 'off'
+  OR (:my_io_sum_shared_after_fsyncs = :my_io_sum_shared_before_fsyncs
+      AND :my_io_sum_shared_after_fsyncs= 0);
+ ?column? 
+----------
+ t
+(1 row)
+
+-- Don't return any rows if querying other backend's stats that are excluded
+-- from the per-backend stats collection (like the checkpointer).
+SELECT count(1) = 0 FROM pg_stat_get_backend_io(:checkpointer_pid);
+ ?column? 
+----------
+ t
+(1 row)
+
 -- Change the tablespace so that the table is rewritten directly, then SELECT
 -- from it to cause it to be read back into shared buffers.
 SELECT sum(reads) AS io_sum_shared_before_reads
@@ -1521,6 +1576,8 @@ SELECT pg_stat_have_stats('io', 0, 0);
 
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_pre_reset
   FROM pg_stat_io \gset
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_pre_reset
+  FROM pg_my_stat_io \gset
 SELECT pg_stat_reset_shared('io');
  pg_stat_reset_shared 
 ----------------------
@@ -1535,6 +1592,30 @@ SELECT :io_stats_post_reset < :io_stats_pre_reset;
  t
 (1 row)
 
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_reset
+  FROM pg_my_stat_io \gset
+-- pg_stat_reset_shared() did not reset backend IO stats
+SELECT :my_io_stats_pre_reset <= :my_io_stats_post_reset;
+ ?column? 
+----------
+ t
+(1 row)
+
+-- but pg_stat_reset_single_backend_io_counters() does
+SELECT pg_stat_reset_single_backend_io_counters(pg_backend_pid());
+ pg_stat_reset_single_backend_io_counters 
+------------------------------------------
+ 
+(1 row)
+
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_backend_reset
+  FROM pg_my_stat_io \gset
+SELECT :my_io_stats_pre_reset > :my_io_stats_post_backend_reset;
+ ?column? 
+----------
+ t
+(1 row)
+
 -- test BRIN index doesn't block HOT update
 CREATE TABLE brin_hot (
   id  integer PRIMARY KEY,
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 7147cc2f89..497b2b438a 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -595,7 +595,7 @@ SELECT pg_stat_get_replication_slot(NULL);
 SELECT pg_stat_get_subscription_stats(NULL);
 
 
--- Test that the following operations are tracked in pg_stat_io:
+-- Test that the following operations are tracked in pg_[my_]stat_io:
 -- - reads of target blocks into shared buffers
 -- - writes of shared buffers to permanent storage
 -- - extends of relations using shared buffers
@@ -607,20 +607,40 @@ SELECT pg_stat_get_subscription_stats(NULL);
 
 -- Create a regular table and insert some data to generate IOCONTEXT_NORMAL
 -- extends.
+SELECT pid AS checkpointer_pid FROM pg_stat_activity
+  WHERE backend_type = 'checkpointer' \gset
 SELECT sum(extends) AS io_sum_shared_before_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT sum(extends) AS my_io_sum_shared_before_extends
+  FROM pg_my_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT sum(extends) AS backend_io_sum_shared_before_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_stat_io
   WHERE object = 'relation' \gset io_sum_shared_before_
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_my_stat_io
+  WHERE object = 'relation' \gset my_io_sum_shared_before_
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE object = 'relation' \gset backend_io_sum_shared_before_
 CREATE TABLE test_io_shared(a int);
 INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
 SELECT pg_stat_force_next_flush();
 SELECT sum(extends) AS io_sum_shared_after_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
 SELECT :io_sum_shared_after_extends > :io_sum_shared_before_extends;
+SELECT sum(extends) AS my_io_sum_shared_after_extends
+  FROM pg_my_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT :my_io_sum_shared_after_extends > :my_io_sum_shared_before_extends;
+SELECT sum(extends) AS backend_io_sum_shared_after_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
+SELECT :backend_io_sum_shared_after_extends > :backend_io_sum_shared_before_extends;
 
 -- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
--- and fsyncs.
+-- and fsyncs in the global stats (not for the backend).
 -- See comment above for rationale for two explicit CHECKPOINTs.
 CHECKPOINT;
 CHECKPOINT;
@@ -631,6 +651,18 @@ SELECT :io_sum_shared_after_writes > :io_sum_shared_before_writes;
 SELECT current_setting('fsync') = 'off'
   OR :io_sum_shared_after_fsyncs > :io_sum_shared_before_fsyncs;
 
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_my_stat_io
+  WHERE object = 'relation' \gset my_io_sum_shared_after_
+SELECT :my_io_sum_shared_after_writes >= :my_io_sum_shared_before_writes;
+SELECT current_setting('fsync') = 'off'
+  OR (:my_io_sum_shared_after_fsyncs = :my_io_sum_shared_before_fsyncs
+      AND :my_io_sum_shared_after_fsyncs= 0);
+
+-- Don't return any rows if querying other backend's stats that are excluded
+-- from the per-backend stats collection (like the checkpointer).
+SELECT count(1) = 0 FROM pg_stat_get_backend_io(:checkpointer_pid);
+
 -- Change the tablespace so that the table is rewritten directly, then SELECT
 -- from it to cause it to be read back into shared buffers.
 SELECT sum(reads) AS io_sum_shared_before_reads
@@ -762,10 +794,21 @@ SELECT :io_sum_bulkwrite_strategy_extends_after > :io_sum_bulkwrite_strategy_ext
 SELECT pg_stat_have_stats('io', 0, 0);
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_pre_reset
   FROM pg_stat_io \gset
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_pre_reset
+  FROM pg_my_stat_io \gset
 SELECT pg_stat_reset_shared('io');
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_post_reset
   FROM pg_stat_io \gset
 SELECT :io_stats_post_reset < :io_stats_pre_reset;
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_reset
+  FROM pg_my_stat_io \gset
+-- pg_stat_reset_shared() did not reset backend IO stats
+SELECT :my_io_stats_pre_reset <= :my_io_stats_post_reset;
+-- but pg_stat_reset_single_backend_io_counters() does
+SELECT pg_stat_reset_single_backend_io_counters(pg_backend_pid());
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_backend_reset
+  FROM pg_my_stat_io \gset
+SELECT :my_io_stats_pre_reset > :my_io_stats_post_backend_reset;
 
 
 -- test BRIN index doesn't block HOT update
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b54428b38c..ea676361a5 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2120,6 +2120,7 @@ PgFdwSamplingMethod
 PgFdwScanState
 PgIfAddrCallback
 PgStatShared_Archiver
+PgStatShared_Backend
 PgStatShared_BgWriter
 PgStatShared_Checkpointer
 PgStatShared_Common
@@ -2135,6 +2136,8 @@ PgStatShared_SLRU
 PgStatShared_Subscription
 PgStatShared_Wal
 PgStat_ArchiverStats
+PgStat_Backend
+PgStat_BackendPendingIO
 PgStat_BackendSubEntry
 PgStat_BgWriterStats
 PgStat_BktypeIO
-- 
2.34.1

#47Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#46)
Re: per backend I/O statistics

On Mon, Nov 25, 2024 at 03:47:59PM +0000, Bertrand Drouvot wrote:

It also takes care of most of the comments that you have made in [1], meaning
that it:

- removes the backend type from PgStat_Backend and look for the backend type
at "display" time.
- creates PgStat_BackendPendingIO and PgStat_PendingIO now refers to it (I
used PgStat_BackendPendingIO and not PgStat_BackendPending because this is what
it is after all).
- adds the missing comment related to the PID in the doc.
- merges 0004 with 0001 (so that pg_stat_get_backend_io() is now part of 0001).
- creates its own pgstat_backend.c file.

I have begun studying the patch, and I have one question.

+void
+pgstat_create_backend_stat(ProcNumber procnum)
[...]
+   /* Create the per-backend statistics entry */
+   if (pgstat_tracks_per_backend_bktype(MyBackendType))
+       pgstat_create_backend_stat(MyProcNumber);

Perhaps that's a very stupid question, but I was looking at this part
of the patch, and wondered why we don't use init_backend_cb? I
vaguely recalled that MyProcNumber and MyBackendType would be set
before we go through the pgstat initialization step. MyDatabaseId is
filled after the pgstat initialization, but we don't care about this
information for such backend stats.
--
Michael

#48Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#47)
Re: per backend I/O statistics

Hi,

On Wed, Nov 27, 2024 at 03:33:38PM +0900, Michael Paquier wrote:

On Mon, Nov 25, 2024 at 03:47:59PM +0000, Bertrand Drouvot wrote:

It also takes care of most of the comments that you have made in [1], meaning
that it:

- removes the backend type from PgStat_Backend and look for the backend type
at "display" time.
- creates PgStat_BackendPendingIO and PgStat_PendingIO now refers to it (I
used PgStat_BackendPendingIO and not PgStat_BackendPending because this is what
it is after all).
- adds the missing comment related to the PID in the doc.
- merges 0004 with 0001 (so that pg_stat_get_backend_io() is now part of 0001).
- creates its own pgstat_backend.c file.

I have begun studying the patch, and I have one question.

+void
+pgstat_create_backend_stat(ProcNumber procnum)
[...]
+   /* Create the per-backend statistics entry */
+   if (pgstat_tracks_per_backend_bktype(MyBackendType))
+       pgstat_create_backend_stat(MyProcNumber);

Perhaps that's a very stupid question, but I was looking at this part
of the patch, and wondered why we don't use init_backend_cb? I
vaguely recalled that MyProcNumber and MyBackendType would be set
before we go through the pgstat initialization step. MyDatabaseId is
filled after the pgstat initialization, but we don't care about this
information for such backend stats.

Using init_backend_cb (and moving the pgstat_tracks_per_backend_bktype() check
in it) is currently producing a failed assertion:

TRAP: failed Assert("pgstat_is_initialized && !pgstat_is_shutdown"), File: "pgstat.c", Line: 1567, PID: 2080000
postgres: postgres postgres [local] initializing(ExceptionalCondition+0xbb)[0x5fd0c69afaed]
postgres: postgres postgres [local] initializing(pgstat_assert_is_up+0x3f)[0x5fd0c67d1b77]
postgres: postgres postgres [local] initializing(pgstat_get_entry_ref+0x95)[0x5fd0c67db068]
postgres: postgres postgres [local] initializing(pgstat_prep_pending_entry+0xaa)[0x5fd0c67d1181]
postgres: postgres postgres [local] initializing(pgstat_create_backend_stat+0x96)[0x5fd0c67d3986]

But even if we, say, move the "pgstat_is_initialized = true" assignment before
the init_backend_cb() call in pgstat_initialize() (not saying that makes sense,
just trying it out), then we are back to the restore/reset issue, but this time
during initdb:

TRAP: failed Assert("!pgstat_entry_ref_hash_lookup(pgStatEntryRefHash, shent->key)"), File: "pgstat_shmem.c", Line: 852, PID: 2109086
/home/postgres/postgresql/pg_installed/pg18/bin/postgres(ExceptionalCondition+0xbb)[0x621723499c5a]
/home/postgres/postgresql/pg_installed/pg18/bin/postgres(+0x70dd2b)[0x6217232c5d2b]
/home/postgres/postgresql/pg_installed/pg18/bin/postgres(pgstat_drop_all_entries+0x61)[0x6217232c60b5]
/home/postgres/postgresql/pg_installed/pg18/bin/postgres(+0x705356)[0x6217232bd356]
/home/postgres/postgresql/pg_installed/pg18/bin/postgres(+0x70462a)[0x6217232bc62a]
/home/postgres/postgresql/pg_installed/pg18/bin/postgres(pgstat_restore_stats+0x1c)[0x6217232b9f01]
/home/postgres/postgresql/pg_installed/pg18/bin/postgres(StartupXLOG+0x693)[0x621722da9697]
/home/postgres/postgresql/pg_installed/pg18/bin/postgres(InitPostgres+0x1d8)[0x6217234b40df]
/home/postgres/postgresql/pg_installed/pg18/bin/postgres(BootstrapModeMain+0x532)[0x621722dd79bf]

where "PID: 2109086" is a B_STANDALONE_BACKEND.

This is due to the fact that, during the initdb bootstrap, init_backend_cb() is
called before StartupXLOG().

One option could be to move B_STANDALONE_BACKEND into the "false" section of
pgstat_tracks_per_backend_bktype() and ensure that we set pgstat_is_initialized
to true before init_backend_cb() is called (but I don't think that makes much
sense).

I'd vote to just keep the pgstat_create_backend_stat() call in pgstat_beinit().

That said, we probably should document that init_backend_cb() should not call
pgstat_get_entry_ref(), but that's probably worth another thread.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#49Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#48)
Re: per backend I/O statistics

On Wed, Nov 27, 2024 at 04:00:14PM +0000, Bertrand Drouvot wrote:

I'd vote to just keep the pgstat_create_backend_stat() call in pgstat_beinit().

That said, we probably should document that init_backend_cb() should not call
pgstat_get_entry_ref(), but that's probably worth another thread.

Hmm, yeah. I have considered this point and your approach does not
seem that bad to me after thinking much more about it, and you are
putting the call in pgstat_beinit(), which is about initializing the
stats for the backend. So I'm OK with what the patch does here in
terms of the location where pgstat_create_backend_stat() is called.
--
Michael

#50Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#46)
Re: per backend I/O statistics

On Mon, Nov 25, 2024 at 03:47:59PM +0000, Bertrand Drouvot wrote:

=== Remarks

R1: as compared to v5, v6 removes the per-backend I/O stats reset from
pg_stat_reset_shared(). I think it makes more sense that way, since we are
adding pg_stat_reset_single_backend_io_counters(). The per-backend I/O stats
then behave like the subscription stats as far as resets are concerned.

Makes sense to me to not include that in pg_stat_reset_shared().

R2: as we can't merge the flush cb anymore, only the patches related to
the stats_fetch_consistency/'snapshot' handling are missing in v6 (as compared
to v5). I propose to re-submit them and re-start the discussion once 0001 goes in.

Yeah, thanks. I should think more about this part, but I'm still kind
of unconvinced. Let's do things step by step. For now, I have looked
at v6.

+        view. The function does not return I/O statistics for the checkpointer,
+        the background writer, the startup process and the autovacuum launcher
+        as they are already visible in the <link linkend="monitoring-pg-stat-io-view"> <structname>pg_stat_io</structname></link>
+        view and there is only one of those.

This last sentence seems unnecessary? The function is named
"backend", and well, all these processes are not backends.

+    /*
+     * Maybe an auxiliary process? That should not be possible, due to
+     * pgstat_tracks_per_backend_bktype() though.
+     */
+    if (proc == NULL)
+        proc = AuxiliaryPidGetProc(backend_pid);
[...]
+        /*
+         * Maybe an auxiliary process? That should not be possible, due to
+         * pgstat_tracks_per_backend_bktype() though.
+         */
+        if (proc == NULL)
+            proc = AuxiliaryPidGetProc(pid);

This does not seem right. Shouldn't we return immediately if
BackendPidGetProc() finds nothing matching with the PID?

+	/* Look for the backend type */
+	for (curr_backend = 1; curr_backend <= num_backends; curr_backend++)
+	{
+		LocalPgBackendStatus *local_beentry;
+		PgBackendStatus *beentry;
+
+		/* Get the next one in the list */
+		local_beentry = pgstat_get_local_beentry_by_index(curr_backend);
+		beentry = &local_beentry->backendStatus;
+
+		/* looking for specific PID, ignore all the others */
+		if (beentry->st_procpid != pid)
+			continue;
+
+		bktype = beentry->st_backendType;
+		break;
+	}

Sounds to me like the backend type is not strictly required in this
function call if pg_stat_activity can already tell us that?

+ (void) pgstat_per_backend_flush_cb(entry_ref, nowait);

I'd recommend not calling the callback directly; use a wrapper
function instead if need be.

pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op,
instr_time start_time, uint32 cnt)
{
+
if (track_io_timing)

Noise diff.

 /*
- * Simpler wrapper of pgstat_io_flush_cb()
+ * Simpler wrapper of pgstat_io_flush_cb() and pgstat_per_backend_flush_cb().
  */
 void
 pgstat_flush_io(bool nowait)

This is also called in the checkpointer, the bgwriter and the
walwriter via pgstat_report_wal(), which is kind of useless. Perhaps
just use a different, separate function instead and use that where it
makes sense (see also the argument upthread that backend stats
may not be only IO-related...).

Sounds to me that PgStat_BackendPendingIO should be
PgStat_BackendPendingStats?

+{ oid => '8806', descr => 'statistics: per backend IO statistics',
+  proname => 'pg_stat_get_backend_io', prorows => '5', proretset => 't',

Similarly, s/pg_stat_get_backend_io/pg_stat_get_backend_stats/?

+  descr => 'statistics: reset collected IO statistics for a single backend',
+  proname => 'pg_stat_reset_single_backend_io_counters', provolatile => 'v', 

And here, pg_stat_reset_backend_stats?

#define PGSTAT_KIND_SUBSCRIPTION 5 /* per-subscription statistics */
+#define PGSTAT_KIND_PER_BACKEND 6

Missing one comment here.

FWIW, I'm so-so about the addition of pg_my_stat_io, knowing that
pg_stat_get_backend_io(NULL/pg_backend_pid()) does the same job. I
would just add a note in the docs with a query showing how to use it
with pg_stat_activity. An example with LATERAL, doing the same work:
select a.pid, s.* from pg_stat_activity as a,
lateral pg_stat_get_backend_io(a.pid) as s
where pid = pg_backend_pid();
--
Michael

#51Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#50)
1 attachment(s)
Re: per backend I/O statistics

Hi,

On Thu, Dec 12, 2024 at 01:52:03PM +0900, Michael Paquier wrote:

On Mon, Nov 25, 2024 at 03:47:59PM +0000, Bertrand Drouvot wrote:

+        view. The function does not return I/O statistics for the checkpointer,
+        the background writer, the startup process and the autovacuum launcher
+        as they are already visible in the <link linkend="monitoring-pg-stat-io-view"> <structname>pg_stat_io</structname></link>
+        view and there is only one of those.

This last sentence seems unnecessary? The function is named
"backend", and well, all these processes are not backends.

Yeah, but you can find their pid through pg_stat_activity, for which the pid field
description is "Process ID of this backend". Also, "backend_type" can be found in
both pg_stat_activity and pg_stat_io, so I think this last sentence could help
avoid any confusion.
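
To illustrate (a rough sketch along the lines of what the regression tests in the
patch do), one can grab such a pid from pg_stat_activity and pass it to the new
function, which then simply returns no rows for an excluded process type:

SELECT pid AS checkpointer_pid FROM pg_stat_activity
  WHERE backend_type = 'checkpointer' \gset
-- no rows expected, as the checkpointer is excluded from the per-backend stats
SELECT * FROM pg_stat_get_backend_io(:checkpointer_pid);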

+    /*
+     * Maybe an auxiliary process? That should not be possible, due to
+     * pgstat_tracks_per_backend_bktype() though.
+     */
+    if (proc == NULL)
+        proc = AuxiliaryPidGetProc(backend_pid);
[...]
+        /*
+         * Maybe an auxiliary process? That should not be possible, due to
+         * pgstat_tracks_per_backend_bktype() though.
+         */
+        if (proc == NULL)
+            proc = AuxiliaryPidGetProc(pid);

This does not seem right. Shouldn't we return immediately if
BackendPidGetProc() finds nothing matching with the PID?

Yeah, that would work. I was keeping the AuxiliaryPidGetProc() calls just in case
we want to add the aux processes back in the future. Replaced them with a comment
in the attached v7 instead.

+	/* Look for the backend type */
+	for (curr_backend = 1; curr_backend <= num_backends; curr_backend++)
+	{
+		LocalPgBackendStatus *local_beentry;
+		PgBackendStatus *beentry;
+
+		/* Get the next one in the list */
+		local_beentry = pgstat_get_local_beentry_by_index(curr_backend);
+		beentry = &local_beentry->backendStatus;
+
+		/* looking for specific PID, ignore all the others */
+		if (beentry->st_procpid != pid)
+			continue;
+
+		bktype = beentry->st_backendType;
+		break;
+	}

Sounds to me like the backend type is not strictly required in this
function call if pg_stat_activity can already tell us that?

Yeah, but it's needed in pg_stat_get_backend_io() for the stats filtering (to
display only those linked to this backend type), later in the function here:

+    /*
+     * Some combinations of BackendType, IOObject, and IOContext are
+     * not valid for any type of IOOp. In such cases, omit the entire
+     * row from the view.
+     */
+    if (!pgstat_tracks_io_object(bktype, io_obj, io_context))
+          continue;

OTOH, now that we get rid of the AuxiliaryPidGetProc() call in
pg_stat_reset_single_backend_io_counters() we can also remove the need to
look for the backend type in this function.

+ (void) pgstat_per_backend_flush_cb(entry_ref, nowait);

I'd recommend not calling the callback directly; use a wrapper
function instead if need be.

Makes sense; I created a dedicated callback in pgstat_backend.c. Bonus point: it
avoids unnecessary pgstat_tracks_per_backend_bktype() checks in:

pgstat_report_bgwriter()
pgstat_report_checkpointer()
pgstat_report_wal()

as there is no attempt to call the callback anymore in those places.

/*
- * Simpler wrapper of pgstat_io_flush_cb()
+ * Simpler wrapper of pgstat_io_flush_cb() and pgstat_per_backend_flush_cb().
*/
void
pgstat_flush_io(bool nowait)

This is also called in the checkpointer, the bgwriter and the
walwriter via pgstat_report_wal(), which is kind of useless. Perhaps
just use a different, separate function instead and use that where it
makes sense (see also the argument upthread that backend stats
may not be only IO-related...).

Yeah, done that way.

BTW, not related to this particular patch, but I realized that pgstat_flush_io()
is called for the walwriter. Indeed, it's coming from the pgstat_report_wal()
call in WalWriterMain(). That cannot report any I/O stats activity (as the
walwriter is not part of the I/O stats tracking, see pgstat_tracks_io_bktype()).

So it looks like we could move pgstat_flush_io() outside of pgstat_report_wal()
and add the pgstat_flush_io() calls only where they need to be made (and so, not
in WalWriterMain()).

Maybe a dedicated thread is worth it for that, thoughts?

Sounds to me that PgStat_BackendPendingIO should be
PgStat_BackendPendingStats?

Yeah, I can do that, as that's what is being used for pending_size, for
example.

OTOH, it's clear that this one in pgstat_io.c has to be "linked" to an "IO"
related one:

"
typedef PgStat_BackendPendingStats PgStat_PendingIO;
"

The right way would probably be to do something like:

"
typedef struct PgStat_BackendPendingStats
{
	PgStat_BackendPendingIO pendingio;
} PgStat_BackendPendingStats;
"

But I'm not sure that's worth it until we need to add more pending stats
per backend, thoughts?

+{ oid => '8806', descr => 'statistics: per backend IO statistics',
+  proname => 'pg_stat_get_backend_io', prorows => '5', proretset => 't',

Similarly, s/pg_stat_get_backend_io/pg_stat_get_backend_stats/?

I think that's fine to keep pg_stat_get_backend_io(). If we add more per-backend
stats in the future then we could add "dedicated" get functions too and a generic
one retrieving all of them.

+  descr => 'statistics: reset collected IO statistics for a single backend',
+  proname => 'pg_stat_reset_single_backend_io_counters', provolatile => 'v', 

And here, pg_stat_reset_backend_stats?

Same as above, we could imagine that in the future the backend would get multiple
stats and that one would want to reset only the I/O ones, for example.

#define PGSTAT_KIND_SUBSCRIPTION 5 /* per-subscription statistics */
+#define PGSTAT_KIND_PER_BACKEND 6

Missing one comment here.

Yeap.

FWIW, I'm so-so about the addition of pg_my_stat_io, knowing that
pg_stat_get_backend_io(NULL/pg_backend_pid()) does the same job.

Okay, I don't have a strong opinion about that. Removed in v7.

I
would just add a note in the docs with a query showing how to use it
with pg_stat_activity. An example with LATERAL, doing the same work:
select a.pid, s.* from pg_stat_activity as a,
lateral pg_stat_get_backend_io(a.pid) as s
where pid = pg_backend_pid();

I'm not sure it's worth it. I think it's clear that to get our own stats we
need to provide our own backend pid. For example, pg_stat_get_activity() does
not provide such an example using pg_stat_activity or something like
pg_stat_get_activity(pg_backend_pid()).
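
For one's own backend that is simply (same pattern as in the regression tests,
just a sketch):

SELECT * FROM pg_stat_get_backend_io(pg_backend_pid());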

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v7-0001-per-backend-I-O-statistics.patchtext/x-diff; charset=us-asciiDownload
From 6662a78036e92035bd4899e4a92af42f3aaae944 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 28 Oct 2024 12:50:32 +0000
Subject: [PATCH v7] per backend I/O statistics

While pg_stat_io provides cluster-wide I/O statistics, this commit adds the
ability to track and display per backend I/O statistics.

It adds a new statistics kind and 2 new functions:

- pg_stat_reset_single_backend_io_counters() to be able to reset the I/O stats
for a given backend pid.
- pg_stat_get_backend_io() to retrieve I/O statistics for a given backend pid.

The new KIND is named PGSTAT_KIND_PER_BACKEND as it could be used in the future
to store other statistics (than the I/O ones) per backend. The new KIND is
a variable-numbered one and has an automatic cap on the maximum number of
entries (as its hash key contains the proc number).

There is no need to write the per backend I/O stats to disk (no point to
see stats for backends that do not exist anymore after a re-start), so using
"write_to_file = false".

Note that per backend I/O statistics are not collected for the checkpointer,
the background writer, the startup process and the autovacuum launcher as those
are already visible in pg_stat_io and there is only one of those.

XXX: Bump catalog version needs to be done.
---
 doc/src/sgml/config.sgml                     |   8 +-
 doc/src/sgml/monitoring.sgml                 |  37 ++++
 src/backend/catalog/system_functions.sql     |   2 +
 src/backend/utils/activity/Makefile          |   1 +
 src/backend/utils/activity/backend_status.c  |   4 +
 src/backend/utils/activity/meson.build       |   1 +
 src/backend/utils/activity/pgstat.c          |  19 +-
 src/backend/utils/activity/pgstat_backend.c  | 182 +++++++++++++++++++
 src/backend/utils/activity/pgstat_io.c       |  25 ++-
 src/backend/utils/activity/pgstat_relation.c |   2 +
 src/backend/utils/adt/pgstatfuncs.c          | 166 +++++++++++++++++
 src/include/catalog/pg_proc.dat              |  14 ++
 src/include/pgstat.h                         |  31 +++-
 src/include/utils/pgstat_internal.h          |  14 ++
 src/test/regress/expected/stats.out          |  72 +++++++-
 src/test/regress/sql/stats.sql               |  38 +++-
 src/tools/pgindent/typedefs.list             |   3 +
 17 files changed, 598 insertions(+), 21 deletions(-)
  10.0% doc/src/sgml/
  30.9% src/backend/utils/activity/
  21.8% src/backend/utils/adt/
   4.6% src/include/catalog/
   7.1% src/include/
  13.0% src/test/regress/expected/
  11.6% src/test/regress/sql/

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e0c8325a39..8afca9b110 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8403,9 +8403,11 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
         displayed in <link linkend="monitoring-pg-stat-database-view">
         <structname>pg_stat_database</structname></link>,
         <link linkend="monitoring-pg-stat-io-view">
-        <structname>pg_stat_io</structname></link>, in the output of
-        <xref linkend="sql-explain"/> when the <literal>BUFFERS</literal> option
-        is used, in the output of <xref linkend="sql-vacuum"/> when
+        <structname>pg_stat_io</structname></link>, in the output of the
+        <link linkend="pg-stat-get-backend-io">
+        <function>pg_stat_get_backend_io()</function></link> function, in the
+        output of <xref linkend="sql-explain"/> when the <literal>BUFFERS</literal>
+        option is used, in the output of <xref linkend="sql-vacuum"/> when
         the <literal>VERBOSE</literal> option is used, by autovacuum
         for auto-vacuums and auto-analyzes, when <xref
         linkend="guc-log-autovacuum-min-duration"/> is set and by
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 840d7f8161..fb13c462d1 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4790,6 +4790,25 @@ description | Waiting for a newly initialized WAL file to reach durable storage
        </para></entry>
       </row>
 
+      <row>
+       <entry id="pg-stat-get-backend-io" role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_stat_get_backend_io</primary>
+        </indexterm>
+        <function>pg_stat_get_backend_io</function> ( <type>integer</type> )
+        <returnvalue>setof record</returnvalue>
+       </para>
+       <para>
+        Returns I/O statistics about the backend with the specified
+        process ID. The output fields are exactly the same as the ones in the
+        <link linkend="monitoring-pg-stat-io-view"> <structname>pg_stat_io</structname></link>
+        view. The function does not return I/O statistics for the checkpointer,
+        the background writer, the startup process and the autovacuum launcher
+        as they are already visible in the <link linkend="monitoring-pg-stat-io-view"> <structname>pg_stat_io</structname></link>
+        view and there is only one of those.
+       </para></entry>
+      </row>
+
       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <indexterm>
@@ -4971,6 +4990,24 @@ description | Waiting for a newly initialized WAL file to reach durable storage
        </para></entry>
       </row>
 
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_stat_reset_single_backend_io_counters</primary>
+        </indexterm>
+        <function>pg_stat_reset_single_backend_io_counters</function> ( <type>integer</type> )
+        <returnvalue>void</returnvalue>
+       </para>
+       <para>
+        Resets I/O statistics for a single backend with the specified process ID
+        to zero.
+       </para>
+       <para>
+        This function is restricted to superusers by default, but other users
+        can be granted EXECUTE to run the function.
+       </para></entry>
+      </row>
+
       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <indexterm>
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index c51dfca802..1b21b242cd 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -711,6 +711,8 @@ REVOKE EXECUTE ON FUNCTION pg_stat_reset_single_table_counters(oid) FROM public;
 
 REVOKE EXECUTE ON FUNCTION pg_stat_reset_single_function_counters(oid) FROM public;
 
+REVOKE EXECUTE ON FUNCTION pg_stat_reset_single_backend_io_counters(integer) FROM public;
+
 REVOKE EXECUTE ON FUNCTION pg_stat_reset_replication_slot(text) FROM public;
 
 REVOKE EXECUTE ON FUNCTION pg_stat_have_stats(text, oid, int8) FROM public;
diff --git a/src/backend/utils/activity/Makefile b/src/backend/utils/activity/Makefile
index b9fd66ea17..24b64a2742 100644
--- a/src/backend/utils/activity/Makefile
+++ b/src/backend/utils/activity/Makefile
@@ -20,6 +20,7 @@ OBJS = \
 	backend_status.o \
 	pgstat.o \
 	pgstat_archiver.o \
+	pgstat_backend.o \
 	pgstat_bgwriter.o \
 	pgstat_checkpointer.o \
 	pgstat_database.o \
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index 22c6dc378c..29779ff99a 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -249,6 +249,10 @@ pgstat_beinit(void)
 	Assert(MyProcNumber >= 0 && MyProcNumber < NumBackendStatSlots);
 	MyBEEntry = &BackendStatusArray[MyProcNumber];
 
+	/* Create the per-backend statistics entry */
+	if (pgstat_tracks_per_backend_bktype(MyBackendType))
+		pgstat_create_backend_stat(MyProcNumber);
+
 	/* Set up a process-exit hook to clean up */
 	on_shmem_exit(pgstat_beshutdown_hook, 0);
 }
diff --git a/src/backend/utils/activity/meson.build b/src/backend/utils/activity/meson.build
index f73c22905c..380d3dd70c 100644
--- a/src/backend/utils/activity/meson.build
+++ b/src/backend/utils/activity/meson.build
@@ -5,6 +5,7 @@ backend_sources += files(
   'backend_status.c',
   'pgstat.c',
   'pgstat_archiver.c',
+  'pgstat_backend.c',
   'pgstat_bgwriter.c',
   'pgstat_checkpointer.c',
   'pgstat_database.c',
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index 18b7d9b47d..738ee778bb 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -77,6 +77,7 @@
  *
  * Each statistics kind is handled in a dedicated file:
  * - pgstat_archiver.c
+ * - pgstat_backend.c
  * - pgstat_bgwriter.c
  * - pgstat_checkpointer.c
  * - pgstat_database.c
@@ -358,6 +359,22 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.reset_timestamp_cb = pgstat_subscription_reset_timestamp_cb,
 	},
 
+	[PGSTAT_KIND_PER_BACKEND] = {
+		.name = "per-backend",
+
+		.fixed_amount = false,
+		.write_to_file = false,
+
+		.accessed_across_databases = true,
+
+		.shared_size = sizeof(PgStatShared_Backend),
+		.shared_data_off = offsetof(PgStatShared_Backend, stats),
+		.shared_data_len = sizeof(((PgStatShared_Backend *) 0)->stats),
+		.pending_size = sizeof(PgStat_BackendPendingIO),
+
+		.flush_pending_cb = pgstat_per_backend_flush_cb,
+		.reset_timestamp_cb = pgstat_backend_reset_timestamp_cb,
+	},
 
 	/* stats for fixed-numbered (mostly 1) objects */
 
@@ -768,7 +785,7 @@ pgstat_report_stat(bool force)
 
 	partial_flush = false;
 
-	/* flush database / relation / function / ... stats */
+	/* flush database / relation / function / backend / ... stats */
 	partial_flush |= pgstat_flush_pending_entries(nowait);
 
 	/* flush of fixed-numbered stats */
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
new file mode 100644
index 0000000000..dc7bf385c2
--- /dev/null
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -0,0 +1,182 @@
+/* -------------------------------------------------------------------------
+ *
+ * pgstat_backend.c
+ *	  Implementation of per-backend statistics.
+ *
+ * This file contains the implementation of per-backend statistics. It is kept
+ * separate from pgstat.c to enforce the line between the statistics access /
+ * storage implementation and the details about individual types of statistics.
+ *
+ * Copyright (c) 2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/utils/activity/pgstat_backend.c
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "utils/pgstat_internal.h"
+
+/*
+ * Returns backend's IO stats.
+ */
+PgStat_Backend *
+pgstat_fetch_proc_stat_io(ProcNumber procNumber)
+{
+	PgStat_Backend *backend_entry;
+
+	backend_entry = (PgStat_Backend *) pgstat_fetch_entry(PGSTAT_KIND_PER_BACKEND,
+														  InvalidOid, procNumber);
+
+	return backend_entry;
+}
+
+/*
+ * Flush out locally pending backend statistics
+ *
+ * If no stats have been recorded, this function returns false.
+ */
+bool
+pgstat_per_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
+{
+	PgStatShared_Backend *shbackendioent;
+	PgStat_BackendPendingIO *pendingent;
+	PgStat_BktypeIO *bktype_shstats;
+
+	if (!pgstat_lock_entry(entry_ref, nowait))
+		return false;
+
+	shbackendioent = (PgStatShared_Backend *) entry_ref->shared_stats;
+	bktype_shstats = &shbackendioent->stats.stats;
+	pendingent = (PgStat_BackendPendingIO *) entry_ref->pending;
+
+	for (int io_object = 0; io_object < IOOBJECT_NUM_TYPES; io_object++)
+	{
+		for (int io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
+		{
+			for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
+			{
+				instr_time	time;
+
+				bktype_shstats->counts[io_object][io_context][io_op] +=
+					pendingent->counts[io_object][io_context][io_op];
+
+				time = pendingent->pending_times[io_object][io_context][io_op];
+
+				bktype_shstats->times[io_object][io_context][io_op] +=
+					INSTR_TIME_GET_MICROSEC(time);
+			}
+		}
+	}
+
+	pgstat_unlock_entry(entry_ref);
+
+	return true;
+}
+
+/*
+ * Simpler wrapper of pgstat_per_backend_flush_cb()
+ */
+void
+pgstat_flush_per_backend(bool nowait)
+{
+	if (pgstat_tracks_per_backend_bktype(MyBackendType))
+	{
+		PgStat_EntryRef *entry_ref;
+
+		entry_ref = pgstat_get_entry_ref(PGSTAT_KIND_PER_BACKEND, InvalidOid,
+										 MyProcNumber, false, NULL);
+		(void) pgstat_per_backend_flush_cb(entry_ref, nowait);
+	}
+}
+
+/*
+ * Create the per-backend statistics entry for procnum.
+ */
+void
+pgstat_create_backend_stat(ProcNumber procnum)
+{
+	PgStat_EntryRef *entry_ref;
+	PgStatShared_Backend *shstatent;
+
+	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_PER_BACKEND, InvalidOid,
+										  procnum, NULL);
+
+	shstatent = (PgStatShared_Backend *) entry_ref->shared_stats;
+
+	/*
+	 * NB: need to accept that there might be stats from an older backend,
+	 * e.g. if we previously used this proc number.
+	 */
+	memset(&shstatent->stats, 0, sizeof(shstatent->stats));
+}
+
+/*
+ * Find or create a local PgStat_BackendPendingIO entry for procnum.
+ */
+PgStat_BackendPendingIO *
+pgstat_prep_per_backend_pending(ProcNumber procnum)
+{
+	PgStat_EntryRef *entry_ref;
+
+	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_PER_BACKEND, InvalidOid,
+										  procnum, NULL);
+
+	return entry_ref->pending;
+}
+
+/*
+ * per-backend statistics are not collected for all BackendTypes.
+ *
+ * The following BackendTypes do not participate in the per-backend stats
+ * subsystem:
+ * - The same and for the same reasons as in pgstat_tracks_io_bktype().
+ * - B_BG_WRITER, B_CHECKPOINTER, B_STARTUP and B_AUTOVAC_LAUNCHER because their
+ * I/O stats are already visible in pg_stat_io and there is only one of those.
+ *
+ * Function returns true if BackendType participates in the per-backend stats
+ * subsystem for IO and false if it does not.
+ *
+ * When adding a new BackendType, also consider adding relevant restrictions to
+ * pgstat_tracks_io_object() and pgstat_tracks_io_op().
+ */
+bool
+pgstat_tracks_per_backend_bktype(BackendType bktype)
+{
+	/*
+	 * List every type so that new backend types trigger a warning about
+	 * needing to adjust this switch.
+	 */
+	switch (bktype)
+	{
+		case B_INVALID:
+		case B_AUTOVAC_LAUNCHER:
+		case B_DEAD_END_BACKEND:
+		case B_ARCHIVER:
+		case B_LOGGER:
+		case B_WAL_RECEIVER:
+		case B_WAL_WRITER:
+		case B_WAL_SUMMARIZER:
+		case B_BG_WRITER:
+		case B_CHECKPOINTER:
+		case B_STARTUP:
+			return false;
+
+		case B_AUTOVAC_WORKER:
+		case B_BACKEND:
+		case B_BG_WORKER:
+		case B_STANDALONE_BACKEND:
+		case B_SLOTSYNC_WORKER:
+		case B_WAL_SENDER:
+			return true;
+	}
+
+	return false;
+}
+
+void
+pgstat_backend_reset_timestamp_cb(PgStatShared_Common *header, TimestampTz ts)
+{
+	((PgStatShared_Backend *) header)->stats.stat_reset_timestamp = ts;
+}
diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index f9883af2b3..93d34696b9 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -20,13 +20,7 @@
 #include "storage/bufmgr.h"
 #include "utils/pgstat_internal.h"
 
-
-typedef struct PgStat_PendingIO
-{
-	PgStat_Counter counts[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
-	instr_time	pending_times[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
-} PgStat_PendingIO;
-
+typedef PgStat_BackendPendingIO PgStat_PendingIO;
 
 static PgStat_PendingIO PendingIOStats;
 static bool have_iostats = false;
@@ -87,6 +81,14 @@ pgstat_count_io_op_n(IOObject io_object, IOContext io_context, IOOp io_op, uint3
 	Assert((unsigned int) io_op < IOOP_NUM_TYPES);
 	Assert(pgstat_tracks_io_op(MyBackendType, io_object, io_context, io_op));
 
+	if (pgstat_tracks_per_backend_bktype(MyBackendType))
+	{
+		PgStat_PendingIO *entry_ref;
+
+		entry_ref = pgstat_prep_per_backend_pending(MyProcNumber);
+		entry_ref->counts[io_object][io_context][io_op] += cnt;
+	}
+
 	PendingIOStats.counts[io_object][io_context][io_op] += cnt;
 
 	have_iostats = true;
@@ -148,6 +150,15 @@ pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op,
 
 		INSTR_TIME_ADD(PendingIOStats.pending_times[io_object][io_context][io_op],
 					   io_time);
+
+		if (pgstat_tracks_per_backend_bktype(MyBackendType))
+		{
+			PgStat_PendingIO *entry_ref;
+
+			entry_ref = pgstat_prep_per_backend_pending(MyProcNumber);
+			INSTR_TIME_ADD(entry_ref->pending_times[io_object][io_context][io_op],
+						   io_time);
+		}
 	}
 
 	pgstat_count_io_op_n(io_object, io_context, io_op, cnt);
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index faba8b64d2..c083c5bf01 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -264,6 +264,7 @@ pgstat_report_vacuum(Oid tableoid, bool shared,
 	 * VACUUM command has processed all tables and committed.
 	 */
 	pgstat_flush_io(false);
+	pgstat_flush_per_backend(false);
 }
 
 /*
@@ -350,6 +351,7 @@ pgstat_report_analyze(Relation rel,
 
 	/* see pgstat_report_vacuum() */
 	pgstat_flush_io(false);
+	pgstat_flush_per_backend(false);
 }
 
 /*
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index cdf37403e9..ce4d22e16c 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1474,6 +1474,151 @@ pg_stat_get_io(PG_FUNCTION_ARGS)
 	return (Datum) 0;
 }
 
+Datum
+pg_stat_get_backend_io(PG_FUNCTION_ARGS)
+{
+	ReturnSetInfo *rsinfo;
+	PgStat_Backend *backend_stats;
+	Datum		bktype_desc;
+	PgStat_BktypeIO *bktype_stats;
+	BackendType bktype;
+	Datum		reset_time;
+	int			num_backends = pgstat_fetch_stat_numbackends();
+	int			curr_backend;
+	int			pid;
+	PGPROC	   *proc;
+	ProcNumber	procNumber;
+
+	InitMaterializedSRF(fcinfo, 0);
+	rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+	pid = PG_GETARG_INT32(0);
+	proc = BackendPidGetProc(pid);
+
+	/*
+	 * Could be an auxiliary process but would not report any stats due to
+	 * pgstat_tracks_per_backend_bktype() anyway. So don't need an extra call
+	 * to AuxiliaryPidGetProc().
+	 */
+	if (!proc)
+		return (Datum) 0;
+
+	procNumber = GetNumberFromPGProc(proc);
+	backend_stats = pgstat_fetch_proc_stat_io(procNumber);
+
+	if (!backend_stats)
+		return (Datum) 0;
+
+	/* just to keep compiler quiet */
+	bktype = B_INVALID;
+
+	/* Look for the backend type */
+	for (curr_backend = 1; curr_backend <= num_backends; curr_backend++)
+	{
+		LocalPgBackendStatus *local_beentry;
+		PgBackendStatus *beentry;
+
+		/* Get the next one in the list */
+		local_beentry = pgstat_get_local_beentry_by_index(curr_backend);
+		beentry = &local_beentry->backendStatus;
+
+		/* looking for specific PID, ignore all the others */
+		if (beentry->st_procpid != pid)
+			continue;
+
+		bktype = beentry->st_backendType;
+		break;
+	}
+
+	bktype_desc = CStringGetTextDatum(GetBackendTypeDesc(bktype));
+	bktype_stats = &backend_stats->stats;
+	reset_time = TimestampTzGetDatum(backend_stats->stat_reset_timestamp);
+
+	/*
+	 * In Assert builds, we can afford an extra loop through all of the
+	 * counters checking that only expected stats are non-zero, since it keeps
+	 * the non-Assert code cleaner.
+	 */
+	Assert(pgstat_bktype_io_stats_valid(bktype_stats, bktype));
+
+	for (int io_obj = 0; io_obj < IOOBJECT_NUM_TYPES; io_obj++)
+	{
+		const char *obj_name = pgstat_get_io_object_name(io_obj);
+
+		for (int io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
+		{
+			const char *context_name = pgstat_get_io_context_name(io_context);
+
+			Datum		values[IO_NUM_COLUMNS] = {0};
+			bool		nulls[IO_NUM_COLUMNS] = {0};
+
+			/*
+			 * Some combinations of BackendType, IOObject, and IOContext are
+			 * not valid for any type of IOOp. In such cases, omit the entire
+			 * row from the view.
+			 */
+			if (!pgstat_tracks_io_object(bktype, io_obj, io_context))
+				continue;
+
+			values[IO_COL_BACKEND_TYPE] = bktype_desc;
+			values[IO_COL_CONTEXT] = CStringGetTextDatum(context_name);
+			values[IO_COL_OBJECT] = CStringGetTextDatum(obj_name);
+			if (backend_stats->stat_reset_timestamp != 0)
+				values[IO_COL_RESET_TIME] = reset_time;
+			else
+				nulls[IO_COL_RESET_TIME] = true;
+
+			/*
+			 * Hard-code this to the value of BLCKSZ for now. Future values
+			 * could include XLOG_BLCKSZ, once WAL IO is tracked, and constant
+			 * multipliers, once non-block-oriented IO (e.g. temporary file
+			 * IO) is tracked.
+			 */
+			values[IO_COL_CONVERSION] = Int64GetDatum(BLCKSZ);
+
+			for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
+			{
+				int			op_idx = pgstat_get_io_op_index(io_op);
+				int			time_idx = pgstat_get_io_time_index(io_op);
+
+				/*
+				 * Some combinations of BackendType and IOOp, of IOContext and
+				 * IOOp, and of IOObject and IOOp are not tracked. Set these
+				 * cells in the view NULL.
+				 */
+				if (pgstat_tracks_io_op(bktype, io_obj, io_context, io_op))
+				{
+					PgStat_Counter count =
+						bktype_stats->counts[io_obj][io_context][io_op];
+
+					values[op_idx] = Int64GetDatum(count);
+				}
+				else
+					nulls[op_idx] = true;
+
+				/* not every operation is timed */
+				if (time_idx == IO_COL_INVALID)
+					continue;
+
+				if (!nulls[op_idx])
+				{
+					PgStat_Counter time =
+						bktype_stats->times[io_obj][io_context][io_op];
+
+					values[time_idx] = Float8GetDatum(pg_stat_us_to_ms(time));
+				}
+				else
+					nulls[time_idx] = true;
+			}
+
+			tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+								 values, nulls);
+		}
+	}
+
+	return (Datum) 0;
+}
+
 /*
  * Returns statistics of WAL activity
  */
@@ -1779,6 +1924,27 @@ pg_stat_reset_single_function_counters(PG_FUNCTION_ARGS)
 	PG_RETURN_VOID();
 }
 
+Datum
+pg_stat_reset_single_backend_io_counters(PG_FUNCTION_ARGS)
+{
+	PGPROC	   *proc;
+	int			backend_pid = PG_GETARG_INT32(0);
+
+	proc = BackendPidGetProc(backend_pid);
+
+	/*
+	 * Could be an auxiliary process but would not report any stats due to
+	 * pgstat_tracks_per_backend_bktype() anyway. So don't need an extra call
+	 * to AuxiliaryPidGetProc().
+	 */
+	if (!proc)
+		PG_RETURN_VOID();
+
+	pgstat_reset(PGSTAT_KIND_PER_BACKEND, InvalidOid, GetNumberFromPGProc(proc));
+
+	PG_RETURN_VOID();
+}
+
 /* Reset SLRU counters (a specific one or all of them). */
 Datum
 pg_stat_reset_slru(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 0f22c21723..55dec76795 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5913,6 +5913,15 @@
   proargnames => '{backend_type,object,context,reads,read_time,writes,write_time,writebacks,writeback_time,extends,extend_time,op_bytes,hits,evictions,reuses,fsyncs,fsync_time,stats_reset}',
   prosrc => 'pg_stat_get_io' },
 
+{ oid => '8806', descr => 'statistics: per backend IO statistics',
+  proname => 'pg_stat_get_backend_io', prorows => '5', proretset => 't',
+  provolatile => 'v', proparallel => 'r', prorettype => 'record',
+  proargtypes => 'int4',
+  proallargtypes => '{int4,text,text,text,int8,float8,int8,float8,int8,float8,int8,float8,int8,int8,int8,int8,int8,float8,timestamptz}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{backend_pid,backend_type,object,context,reads,read_time,writes,write_time,writebacks,writeback_time,extends,extend_time,op_bytes,hits,evictions,reuses,fsyncs,fsync_time,stats_reset}',
+  prosrc => 'pg_stat_get_backend_io' },
+
 { oid => '1136', descr => 'statistics: information about WAL activity',
   proname => 'pg_stat_get_wal', proisstrict => 'f', provolatile => 's',
   proparallel => 'r', prorettype => 'record', proargtypes => '',
@@ -6052,6 +6061,11 @@
   proname => 'pg_stat_reset_single_function_counters', provolatile => 'v',
   prorettype => 'void', proargtypes => 'oid',
   prosrc => 'pg_stat_reset_single_function_counters' },
+{ oid => '9987',
+  descr => 'statistics: reset collected IO statistics for a single backend',
+  proname => 'pg_stat_reset_single_backend_io_counters', provolatile => 'v',
+  prorettype => 'void', proargtypes => 'int4',
+  prosrc => 'pg_stat_reset_single_backend_io_counters' },
 { oid => '2307',
   descr => 'statistics: reset collected statistics for a single SLRU',
   proname => 'pg_stat_reset_slru', proisstrict => 'f', provolatile => 'v',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 59c28b4aca..b7e8dcb1fd 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -49,14 +49,15 @@
 #define PGSTAT_KIND_FUNCTION	3	/* per-function statistics */
 #define PGSTAT_KIND_REPLSLOT	4	/* per-slot statistics */
 #define PGSTAT_KIND_SUBSCRIPTION	5	/* per-subscription statistics */
+#define PGSTAT_KIND_PER_BACKEND	6	/* per-backend statistics */
 
 /* stats for fixed-numbered objects */
-#define PGSTAT_KIND_ARCHIVER	6
-#define PGSTAT_KIND_BGWRITER	7
-#define PGSTAT_KIND_CHECKPOINTER	8
-#define PGSTAT_KIND_IO	9
-#define PGSTAT_KIND_SLRU	10
-#define PGSTAT_KIND_WAL	11
+#define PGSTAT_KIND_ARCHIVER	7
+#define PGSTAT_KIND_BGWRITER	8
+#define PGSTAT_KIND_CHECKPOINTER	9
+#define PGSTAT_KIND_IO	10
+#define PGSTAT_KIND_SLRU	11
+#define PGSTAT_KIND_WAL	12
 
 #define PGSTAT_KIND_BUILTIN_MIN PGSTAT_KIND_DATABASE
 #define PGSTAT_KIND_BUILTIN_MAX PGSTAT_KIND_WAL
@@ -347,12 +348,23 @@ typedef struct PgStat_BktypeIO
 	PgStat_Counter times[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
 } PgStat_BktypeIO;
 
+typedef struct PgStat_BackendPendingIO
+{
+	PgStat_Counter counts[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
+	instr_time	pending_times[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
+} PgStat_BackendPendingIO;
+
 typedef struct PgStat_IO
 {
 	TimestampTz stat_reset_timestamp;
 	PgStat_BktypeIO stats[BACKEND_NUM_TYPES];
 } PgStat_IO;
 
+typedef struct PgStat_Backend
+{
+	TimestampTz stat_reset_timestamp;
+	PgStat_BktypeIO stats;
+} PgStat_Backend;
 
 typedef struct PgStat_StatDBEntry
 {
@@ -534,6 +546,13 @@ extern bool pgstat_have_entry(PgStat_Kind kind, Oid dboid, uint64 objid);
 extern void pgstat_report_archiver(const char *xlog, bool failed);
 extern PgStat_ArchiverStats *pgstat_fetch_stat_archiver(void);
 
+/*
+ * Functions in pgstat_backend.c
+ */
+
+extern PgStat_Backend *pgstat_fetch_proc_stat_io(ProcNumber procNumber);
+extern bool pgstat_tracks_per_backend_bktype(BackendType bktype);
+extern void pgstat_create_backend_stat(ProcNumber procnum);
 
 /*
  * Functions in pgstat_bgwriter.c
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 7338bc1e28..141e16d1d3 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -450,6 +450,11 @@ typedef struct PgStatShared_ReplSlot
 	PgStat_StatReplSlotEntry stats;
 } PgStatShared_ReplSlot;
 
+typedef struct PgStatShared_Backend
+{
+	PgStatShared_Common header;
+	PgStat_Backend stats;
+} PgStatShared_Backend;
 
 /*
  * Central shared memory entry for the cumulative stats system.
@@ -604,6 +609,15 @@ extern void pgstat_archiver_init_shmem_cb(void *stats);
 extern void pgstat_archiver_reset_all_cb(TimestampTz ts);
 extern void pgstat_archiver_snapshot_cb(void);
 
+/*
+ * Functions in pgstat_backend.c
+ */
+
+extern void pgstat_flush_per_backend(bool nowait);
+
+extern PgStat_BackendPendingIO *pgstat_prep_per_backend_pending(ProcNumber procnum);
+extern bool pgstat_per_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait);
+extern void pgstat_backend_reset_timestamp_cb(PgStatShared_Common *header, TimestampTz ts);
 
 /*
  * Functions in pgstat_bgwriter.c
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 56771f83ed..ba8c3cfc88 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1249,7 +1249,7 @@ SELECT pg_stat_get_subscription_stats(NULL);
  
 (1 row)
 
--- Test that the following operations are tracked in pg_stat_io:
+-- Test that the following operations are tracked in pg_stat_io and in per-backend stats:
 -- - reads of target blocks into shared buffers
 -- - writes of shared buffers to permanent storage
 -- - extends of relations using shared buffers
@@ -1259,11 +1259,19 @@ SELECT pg_stat_get_subscription_stats(NULL);
 -- be sure of the state of shared buffers at the point the test is run.
 -- Create a regular table and insert some data to generate IOCONTEXT_NORMAL
 -- extends.
+SELECT pid AS checkpointer_pid FROM pg_stat_activity
+  WHERE backend_type = 'checkpointer' \gset
 SELECT sum(extends) AS io_sum_shared_before_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT sum(extends) AS my_io_sum_shared_before_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_stat_io
   WHERE object = 'relation' \gset io_sum_shared_before_
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE object = 'relation' \gset my_io_sum_shared_before_
 CREATE TABLE test_io_shared(a int);
 INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
 SELECT pg_stat_force_next_flush();
@@ -1280,8 +1288,17 @@ SELECT :io_sum_shared_after_extends > :io_sum_shared_before_extends;
  t
 (1 row)
 
+SELECT sum(extends) AS my_io_sum_shared_after_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
+SELECT :my_io_sum_shared_after_extends > :my_io_sum_shared_before_extends;
+ ?column? 
+----------
+ t
+(1 row)
+
 -- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
--- and fsyncs.
+-- and fsyncs in the global stats (not for the backend).
 -- See comment above for rationale for two explicit CHECKPOINTs.
 CHECKPOINT;
 CHECKPOINT;
@@ -1301,6 +1318,31 @@ SELECT current_setting('fsync') = 'off'
  t
 (1 row)
 
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE object = 'relation' \gset my_io_sum_shared_after_
+SELECT :my_io_sum_shared_after_writes >= :my_io_sum_shared_before_writes;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT current_setting('fsync') = 'off'
+  OR (:my_io_sum_shared_after_fsyncs = :my_io_sum_shared_before_fsyncs
+      AND :my_io_sum_shared_after_fsyncs= 0);
+ ?column? 
+----------
+ t
+(1 row)
+
+-- Don't return any rows if querying other backend's stats that are excluded
+-- from the per-backend stats collection (like the checkpointer).
+SELECT count(1) = 0 FROM pg_stat_get_backend_io(:checkpointer_pid);
+ ?column? 
+----------
+ t
+(1 row)
+
 -- Change the tablespace so that the table is rewritten directly, then SELECT
 -- from it to cause it to be read back into shared buffers.
 SELECT sum(reads) AS io_sum_shared_before_reads
@@ -1521,6 +1563,8 @@ SELECT pg_stat_have_stats('io', 0, 0);
 
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_pre_reset
   FROM pg_stat_io \gset
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_pre_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
 SELECT pg_stat_reset_shared('io');
  pg_stat_reset_shared 
 ----------------------
@@ -1535,6 +1579,30 @@ SELECT :io_stats_post_reset < :io_stats_pre_reset;
  t
 (1 row)
 
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
+-- pg_stat_reset_shared() did not reset backend IO stats
+SELECT :my_io_stats_pre_reset <= :my_io_stats_post_reset;
+ ?column? 
+----------
+ t
+(1 row)
+
+-- but pg_stat_reset_single_backend_io_counters() does
+SELECT pg_stat_reset_single_backend_io_counters(pg_backend_pid());
+ pg_stat_reset_single_backend_io_counters 
+------------------------------------------
+ 
+(1 row)
+
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_backend_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
+SELECT :my_io_stats_pre_reset > :my_io_stats_post_backend_reset;
+ ?column? 
+----------
+ t
+(1 row)
+
 -- test BRIN index doesn't block HOT update
 CREATE TABLE brin_hot (
   id  integer PRIMARY KEY,
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 7147cc2f89..f17799402f 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -595,7 +595,7 @@ SELECT pg_stat_get_replication_slot(NULL);
 SELECT pg_stat_get_subscription_stats(NULL);
 
 
--- Test that the following operations are tracked in pg_stat_io:
+-- Test that the following operations are tracked in pg_stat_io and in per-backend stats:
 -- - reads of target blocks into shared buffers
 -- - writes of shared buffers to permanent storage
 -- - extends of relations using shared buffers
@@ -607,20 +607,32 @@ SELECT pg_stat_get_subscription_stats(NULL);
 
 -- Create a regular table and insert some data to generate IOCONTEXT_NORMAL
 -- extends.
+SELECT pid AS checkpointer_pid FROM pg_stat_activity
+  WHERE backend_type = 'checkpointer' \gset
 SELECT sum(extends) AS io_sum_shared_before_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT sum(extends) AS my_io_sum_shared_before_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_stat_io
   WHERE object = 'relation' \gset io_sum_shared_before_
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE object = 'relation' \gset my_io_sum_shared_before_
 CREATE TABLE test_io_shared(a int);
 INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
 SELECT pg_stat_force_next_flush();
 SELECT sum(extends) AS io_sum_shared_after_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
 SELECT :io_sum_shared_after_extends > :io_sum_shared_before_extends;
+SELECT sum(extends) AS my_io_sum_shared_after_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
+SELECT :my_io_sum_shared_after_extends > :my_io_sum_shared_before_extends;
 
 -- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
--- and fsyncs.
+-- and fsyncs in the global stats (not for the backend).
 -- See comment above for rationale for two explicit CHECKPOINTs.
 CHECKPOINT;
 CHECKPOINT;
@@ -630,6 +642,17 @@ SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
 SELECT :io_sum_shared_after_writes > :io_sum_shared_before_writes;
 SELECT current_setting('fsync') = 'off'
   OR :io_sum_shared_after_fsyncs > :io_sum_shared_before_fsyncs;
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE object = 'relation' \gset my_io_sum_shared_after_
+SELECT :my_io_sum_shared_after_writes >= :my_io_sum_shared_before_writes;
+SELECT current_setting('fsync') = 'off'
+  OR (:my_io_sum_shared_after_fsyncs = :my_io_sum_shared_before_fsyncs
+      AND :my_io_sum_shared_after_fsyncs= 0);
+
+-- Don't return any rows if querying other backend's stats that are excluded
+-- from the per-backend stats collection (like the checkpointer).
+SELECT count(1) = 0 FROM pg_stat_get_backend_io(:checkpointer_pid);
 
 -- Change the tablespace so that the table is rewritten directly, then SELECT
 -- from it to cause it to be read back into shared buffers.
@@ -762,10 +785,21 @@ SELECT :io_sum_bulkwrite_strategy_extends_after > :io_sum_bulkwrite_strategy_ext
 SELECT pg_stat_have_stats('io', 0, 0);
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_pre_reset
   FROM pg_stat_io \gset
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_pre_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
 SELECT pg_stat_reset_shared('io');
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_post_reset
   FROM pg_stat_io \gset
 SELECT :io_stats_post_reset < :io_stats_pre_reset;
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
+-- pg_stat_reset_shared() did not reset backend IO stats
+SELECT :my_io_stats_pre_reset <= :my_io_stats_post_reset;
+-- but pg_stat_reset_single_backend_io_counters() does
+SELECT pg_stat_reset_single_backend_io_counters(pg_backend_pid());
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_backend_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
+SELECT :my_io_stats_pre_reset > :my_io_stats_post_backend_reset;
 
 
 -- test BRIN index doesn't block HOT update
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index ce33e55bf1..398dd92527 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2121,6 +2121,7 @@ PgFdwSamplingMethod
 PgFdwScanState
 PgIfAddrCallback
 PgStatShared_Archiver
+PgStatShared_Backend
 PgStatShared_BgWriter
 PgStatShared_Checkpointer
 PgStatShared_Common
@@ -2136,6 +2137,8 @@ PgStatShared_SLRU
 PgStatShared_Subscription
 PgStatShared_Wal
 PgStat_ArchiverStats
+PgStat_Backend
+PgStat_BackendPendingIO
 PgStat_BackendSubEntry
 PgStat_BgWriterStats
 PgStat_BktypeIO
-- 
2.34.1

#52Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#51)
Re: per backend I/O statistics

On Thu, Dec 12, 2024 at 02:02:38PM +0000, Bertrand Drouvot wrote:

On Thu, Dec 12, 2024 at 01:52:03PM +0900, Michael Paquier wrote:
Yeah, but it's needed in pg_stat_get_backend_io() for the stats filtering (to
display only those linked to this backend type), later in the function here:

+    /*
+     * Some combinations of BackendType, IOObject, and IOContext are
+     * not valid for any type of IOOp. In such cases, omit the entire
+     * row from the view.
+     */
+    if (!pgstat_tracks_io_object(bktype, io_obj, io_context))
+          continue;

OTOH, now that we get rid of the AuxiliaryPidGetProc() call in
pg_stat_reset_single_backend_io_counters() we can also remove the need to
look for the backend type in this function.

Ah, you need that for the pgstat_tracks_io_object() filtering. I'm
wondering if we should think about making that cheaper at some point.
It becomes O(N^2) when coupling a call of pg_stat_get_backend_io() with a
scan of pg_stat_activity, but perhaps it does not matter much anyway...

Anyway, isn't it possible that this lookup loop ends up finding
nothing, depending on concurrent updates of other beentries? It sounds
to me that this warrants an early exit in the function.

BTW, not related to this particular patch but I realized that pgstat_flush_io()
is called for the walwriter. Indeed, it's coming from the pgstat_report_wal()
call in WalWriterMain(). That can not report any I/O stats activity (as the
walwriter is not part of the I/O stats tracking, see pgstat_tracks_io_bktype()).

So it looks like, we could move pgstat_flush_io() outside of pgstat_report_wal()
and add the pgstat_flush_io() calls only where they need to be made (and so, not
in WalWriterMain()).

Perhaps, yes. pgstat_tracks_io_bktype() has always discarded
walwriters since pgstat_io.c exists.

But I'm not sure that's worth it until we need to add more pending stats
per backend, thoughts?

Okay with your argument here.

+{ oid => '8806', descr => 'statistics: per backend IO statistics',
+  proname => 'pg_stat_get_backend_io', prorows => '5', proretset => 't',

Similarly, s/pg_stat_get_backend_io/pg_stat_get_backend_stats/?

I think that's fine to keep pg_stat_get_backend_io(). If we add more per-backend
stats in the future then we could add "dedicated" get functions too and a generic
one retrieving all of them.

Okay about this layer, discarding my remark. You have a point there:
other stats associated with a single backend may be OK if not
returned as an SRF, and they'll most likely return a different set of
attributes.

+  descr => 'statistics: reset collected IO statistics for a single backend',
+  proname => 'pg_stat_reset_single_backend_io_counters', provolatile => 'v', 

And here, pg_stat_reset_backend_stats?

Same as above, we could imagine that in the future the backend would get multiple
stats and that one would want to reset only the I/O ones, for example.

Disagreed about this part. It is slightly simpler to do a full reset
of the stats in a single entry. If another subset of stats is added
to the backend-level entries, we could always introduce a new function
that has more control over what subset of a single backend entry is
reset. And I'm pretty sure that we are going to need the function
that does the full reset anyway.

I
would just add a note in the docs with a query showing how to use it
with pg_stat_activity. An example with LATERAL, doing the same work:
select a.pid, s.* from pg_stat_activity as a,
lateral pg_stat_get_backend_io(a.pid) as s
where pid = pg_backend_pid();

I'm not sure it's worth it. I think it's clear that to get our own stats we
need to provide our own backend pid. For example, pg_stat_get_activity() does
not provide such an example using pg_stat_activity or something like
pg_stat_get_activity(pg_backend_pid()).

Okay.

As far as I can see, the patch relies entirely on write_to_file to
prevent any entries from being flushed out. It means that we leave
entries in the dshash that may sit idle for as long as the server is up
once a pgproc slot has been used at least once. This scales depending on
max_connections. It also means that we skip the sanity check about
dropped entries at shutdown, which may be a good thing to do because
we don't need to loop through them when writing the stats file. Hmm.
Could it be better to be more aggressive with the handling of these
stats, marking them as dropped when their backend exits and cleaning up
the dshash, without relying on the write flag to make sure that all
the entries are discarded at shutdown? The point is that we do
shutdown in a controlled manner, with all backends exiting before the
checkpointer writes the stats file after the shutdown checkpoint is
completed. The patch handles things so that entries are reset when a
procnum is reused, leaving past stats around until that happens. We
should perhaps aim for more consistency with the way beentry is
refreshed and be more proactive with the backend entry drop or reset
at backend shutdown (pgstat_beshutdown_hook?), so that what is in the
dshash reflects exactly what's in shared memory for each PGPROC and
beentry.

Not sure that the "_per_" added in the various references of the patch
is good to keep, like in pgstat_tracks_per_backend_bktype. It could
be removed, I guess, also doing a PGSTAT_KIND_PER_BACKEND =>
PGSTAT_KIND_BACKEND rename?
--
Michael

#53Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#52)
1 attachment(s)
Re: per backend I/O statistics

Hi,

On Fri, Dec 13, 2024 at 11:02:53AM +0900, Michael Paquier wrote:

On Thu, Dec 12, 2024 at 02:02:38PM +0000, Bertrand Drouvot wrote:

Anyway, isn't it possible that this lookup loop ends up finding
nothing, depending on concurrent updates of other beentries? It sounds
to me that this warrants an early exit in the function.

Right, done that way in the attached.

Perhaps, yes. pgstat_tracks_io_bktype() has always discarded
walwriters since pgstat_io.c exists.

Yeap. The comment on top of pgstat_tracks_io_bktype() says that it's not done
"for now". I think that we could update the code as proposed until it's done.

+  descr => 'statistics: reset collected IO statistics for a single backend',
+  proname => 'pg_stat_reset_single_backend_io_counters', provolatile => 'v', 

And here, pg_stat_reset_backend_stats?

Same as above, we could imagine that in the future the backend would get multiple
stats and that one would want to reset only the I/O ones, for example.

Disagreed about this part. It is slightly simpler to do a full reset
of the stats in a single entry. If another subset of stats is added
to the backend-level entries, we could always introduce a new function
that has more control over what subset of a single backend entry is
reset. And I'm pretty sure that we are going to need the function
that does the full reset anyway.

Yeah, I would have added it once a new stats subset gets added. It's fine
by me to have it now though, so done that way.
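
So, with v8, one can do for example (just a sketch, assuming the function keeps
the single pid argument of v7's pg_stat_reset_single_backend_io_counters()):

SELECT pg_stat_reset_backend_stats(pg_backend_pid());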

As far as I can see, the patch relies entirely on write_to_file to
prevent any entries from being flushed out.

Yes.

It means that we leave
entries in the dshash that may sit idle for as long as the server is up
once a pgproc slot has been used at least once. This scales depending on
max_connections. It also means that we skip the sanity check about
dropped entries at shutdown, which may be a good thing to do because
we don't need to loop through them when writing the stats file.

Agree.

Hmm.
Could it be better to be more aggressive with the handling of these
stats, marking them as dropped when their backend exits and cleaning up
the dshash, without relying on the write flag to make sure that all
the entries are discarded at shutdown?

Yeah, we can do it to be consistent with the other stats kinds; done.

The point is that we do
shutdown in a controlled manner, with all backends exiting before the
checkpointer writes the stats file after the shutdown checkpoint is
completed. The patch handles things so as entries are reset when a
procnum is reused, leaving past stats around until that happens. We
should perhaps aim for more consistency with the way beentry is
refreshed and be more proactive with the backend entry drop or reset
at backend shutdown (pgstat_beshutdown_hook?), so that what is in the
dshash reflects exactly what's in shared memory for each PGPROC and
beentry.

That can't be done in pgstat_beshutdown_hook() because pgstat_shutdown_hook() is called
before it and resets pgStatLocal.shared_hash during pgstat_detach_shmem().

So I did it in pgstat_shutdown_hook() instead.

Not sure that the "_per_" added in the various references of the patch
are good to keep, like pgstat_tracks_per_backend_bktype. These could
be removed, I guess, doing also a PGSTAT_KIND_PER_BACKEND =>
PGSTAT_KIND_BACKEND?

Yeah makes sense, that's consistent with other kinds: done.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v8-0001-per-backend-I-O-statistics.patchtext/x-diff; charset=us-asciiDownload
From 1b03df28ceea635a2687baee4f8d1b3a3c1ae728 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 28 Oct 2024 12:50:32 +0000
Subject: [PATCH v8] per backend I/O statistics

While pg_stat_io provides cluster-wide I/O statistics, this commit adds the
ability to track and display per backend I/O statistics.

It adds a new statistics kind and 2 new functions:

- pg_stat_reset_backend_stats() to be able to reset the stats for a given
backend pid.
- pg_stat_get_backend_io() to retrieve I/O statistics for a given backend pid.

The new KIND is named PGSTAT_KIND_BACKEND as it could be used in the future
to store other statistics (than the I/O ones) per backend. The new KIND is
a variable-numbered one and has an automatic cap on the maximum number of
entries (as its hash key contains the proc number).

There is no need to write the per backend I/O stats to disk (no point to
see stats for backends that do not exist anymore after a re-start), so using
"write_to_file = false".

Note that per backend I/O statistics are not collected for the checkpointer,
the background writer, the startup process and the autovacuum launcher as those
are already visible in pg_stat_io and there is only one of those.

XXX: Bump catalog version needs to be done.
---
 doc/src/sgml/config.sgml                     |   8 +-
 doc/src/sgml/monitoring.sgml                 |  37 ++++
 src/backend/catalog/system_functions.sql     |   2 +
 src/backend/utils/activity/Makefile          |   1 +
 src/backend/utils/activity/backend_status.c  |   4 +
 src/backend/utils/activity/meson.build       |   1 +
 src/backend/utils/activity/pgstat.c          |  22 ++-
 src/backend/utils/activity/pgstat_backend.c  | 182 +++++++++++++++++++
 src/backend/utils/activity/pgstat_io.c       |  25 ++-
 src/backend/utils/activity/pgstat_relation.c |   2 +
 src/backend/utils/adt/pgstatfuncs.c          | 169 +++++++++++++++++
 src/include/catalog/pg_proc.dat              |  14 ++
 src/include/pgstat.h                         |  31 +++-
 src/include/utils/pgstat_internal.h          |  14 ++
 src/test/regress/expected/stats.out          |  72 +++++++-
 src/test/regress/sql/stats.sql               |  38 +++-
 src/tools/pgindent/typedefs.list             |   3 +
 17 files changed, 604 insertions(+), 21 deletions(-)
   9.9% doc/src/sgml/
  31.2% src/backend/utils/activity/
  22.1% src/backend/utils/adt/
   4.4% src/include/catalog/
   7.1% src/include/
  12.9% src/test/regress/expected/
  11.5% src/test/regress/sql/

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e0c8325a39..8afca9b110 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8403,9 +8403,11 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
         displayed in <link linkend="monitoring-pg-stat-database-view">
         <structname>pg_stat_database</structname></link>,
         <link linkend="monitoring-pg-stat-io-view">
-        <structname>pg_stat_io</structname></link>, in the output of
-        <xref linkend="sql-explain"/> when the <literal>BUFFERS</literal> option
-        is used, in the output of <xref linkend="sql-vacuum"/> when
+        <structname>pg_stat_io</structname></link>, in the output of the
+        <link linkend="pg-stat-get-backend-io">
+        <function>pg_stat_get_backend_io()</function></link> function, in the
+        output of <xref linkend="sql-explain"/> when the <literal>BUFFERS</literal>
+        option is used, in the output of <xref linkend="sql-vacuum"/> when
         the <literal>VERBOSE</literal> option is used, by autovacuum
         for auto-vacuums and auto-analyzes, when <xref
         linkend="guc-log-autovacuum-min-duration"/> is set and by
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 840d7f8161..4a9464da61 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4790,6 +4790,25 @@ description | Waiting for a newly initialized WAL file to reach durable storage
        </para></entry>
       </row>
 
+      <row>
+       <entry id="pg-stat-get-backend-io" role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_stat_get_backend_io</primary>
+        </indexterm>
+        <function>pg_stat_get_backend_io</function> ( <type>integer</type> )
+        <returnvalue>setof record</returnvalue>
+       </para>
+       <para>
+        Returns I/O statistics about the backend with the specified
+        process ID. The output fields are exactly the same as the ones in the
+        <link linkend="monitoring-pg-stat-io-view"> <structname>pg_stat_io</structname></link>
+        view. The function does not return I/O statistics for the checkpointer,
+        the background writer, the startup process and the autovacuum launcher
+        as they are already visible in the <link linkend="monitoring-pg-stat-io-view"> <structname>pg_stat_io</structname></link>
+        view and there is only one of those.
+       </para></entry>
+      </row>
+
       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <indexterm>
@@ -4971,6 +4990,24 @@ description | Waiting for a newly initialized WAL file to reach durable storage
        </para></entry>
       </row>
 
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_stat_reset_backend_stats</primary>
+        </indexterm>
+        <function>pg_stat_reset_backend_stats</function> ( <type>integer</type> )
+        <returnvalue>void</returnvalue>
+       </para>
+       <para>
+        Resets statistics for a single backend with the specified process ID
+        to zero.
+       </para>
+       <para>
+        This function is restricted to superusers by default, but other users
+        can be granted EXECUTE to run the function.
+       </para></entry>
+      </row>
+
       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <indexterm>
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index c51dfca802..14eb99cd47 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -711,6 +711,8 @@ REVOKE EXECUTE ON FUNCTION pg_stat_reset_single_table_counters(oid) FROM public;
 
 REVOKE EXECUTE ON FUNCTION pg_stat_reset_single_function_counters(oid) FROM public;
 
+REVOKE EXECUTE ON FUNCTION pg_stat_reset_backend_stats(integer) FROM public;
+
 REVOKE EXECUTE ON FUNCTION pg_stat_reset_replication_slot(text) FROM public;
 
 REVOKE EXECUTE ON FUNCTION pg_stat_have_stats(text, oid, int8) FROM public;
diff --git a/src/backend/utils/activity/Makefile b/src/backend/utils/activity/Makefile
index b9fd66ea17..24b64a2742 100644
--- a/src/backend/utils/activity/Makefile
+++ b/src/backend/utils/activity/Makefile
@@ -20,6 +20,7 @@ OBJS = \
 	backend_status.o \
 	pgstat.o \
 	pgstat_archiver.o \
+	pgstat_backend.o \
 	pgstat_bgwriter.o \
 	pgstat_checkpointer.o \
 	pgstat_database.o \
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index 22c6dc378c..7f74011a00 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -249,6 +249,10 @@ pgstat_beinit(void)
 	Assert(MyProcNumber >= 0 && MyProcNumber < NumBackendStatSlots);
 	MyBEEntry = &BackendStatusArray[MyProcNumber];
 
+	/* Create the backend statistics entry */
+	if (pgstat_tracks_backend_bktype(MyBackendType))
+		pgstat_create_backend_stat(MyProcNumber);
+
 	/* Set up a process-exit hook to clean up */
 	on_shmem_exit(pgstat_beshutdown_hook, 0);
 }
diff --git a/src/backend/utils/activity/meson.build b/src/backend/utils/activity/meson.build
index f73c22905c..380d3dd70c 100644
--- a/src/backend/utils/activity/meson.build
+++ b/src/backend/utils/activity/meson.build
@@ -5,6 +5,7 @@ backend_sources += files(
   'backend_status.c',
   'pgstat.c',
   'pgstat_archiver.c',
+  'pgstat_backend.c',
   'pgstat_bgwriter.c',
   'pgstat_checkpointer.c',
   'pgstat_database.c',
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index 18b7d9b47d..563f2443f8 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -77,6 +77,7 @@
  *
  * Each statistics kind is handled in a dedicated file:
  * - pgstat_archiver.c
+ * - pgstat_backend.c
  * - pgstat_bgwriter.c
  * - pgstat_checkpointer.c
  * - pgstat_database.c
@@ -358,6 +359,22 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.reset_timestamp_cb = pgstat_subscription_reset_timestamp_cb,
 	},
 
+	[PGSTAT_KIND_BACKEND] = {
+		.name = "backend",
+
+		.fixed_amount = false,
+		.write_to_file = false,
+
+		.accessed_across_databases = true,
+
+		.shared_size = sizeof(PgStatShared_Backend),
+		.shared_data_off = offsetof(PgStatShared_Backend, stats),
+		.shared_data_len = sizeof(((PgStatShared_Backend *) 0)->stats),
+		.pending_size = sizeof(PgStat_BackendPendingIO),
+
+		.flush_pending_cb = pgstat_backend_flush_cb,
+		.reset_timestamp_cb = pgstat_backend_reset_timestamp_cb,
+	},
 
 	/* stats for fixed-numbered (mostly 1) objects */
 
@@ -602,6 +619,9 @@ pgstat_shutdown_hook(int code, Datum arg)
 	Assert(dlist_is_empty(&pgStatPending));
 	dlist_init(&pgStatPending);
 
+	/* drop the backend stats entry */
+	pgstat_drop_entry(PGSTAT_KIND_BACKEND, InvalidOid, MyProcNumber);
+
 	pgstat_detach_shmem();
 
 #ifdef USE_ASSERT_CHECKING
@@ -768,7 +788,7 @@ pgstat_report_stat(bool force)
 
 	partial_flush = false;
 
-	/* flush database / relation / function / ... stats */
+	/* flush database / relation / function / backend / ... stats */
 	partial_flush |= pgstat_flush_pending_entries(nowait);
 
 	/* flush of fixed-numbered stats */
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
new file mode 100644
index 0000000000..e2d83024c2
--- /dev/null
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -0,0 +1,182 @@
+/* -------------------------------------------------------------------------
+ *
+ * pgstat_backend.c
+ *	  Implementation of backend statistics.
+ *
+ * This file contains the implementation of backend statistics. It is kept
+ * separate from pgstat.c to enforce the line between the statistics access /
+ * storage implementation and the details about individual types of statistics.
+ *
+ * Copyright (c) 2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/utils/activity/pgstat_backend.c
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "utils/pgstat_internal.h"
+
+/*
+ * Returns backend's IO stats.
+ */
+PgStat_Backend *
+pgstat_fetch_proc_stat_io(ProcNumber procNumber)
+{
+	PgStat_Backend *backend_entry;
+
+	backend_entry = (PgStat_Backend *) pgstat_fetch_entry(PGSTAT_KIND_BACKEND,
+														  InvalidOid, procNumber);
+
+	return backend_entry;
+}
+
+/*
+ * Flush out locally pending backend statistics
+ *
+ * If no stats have been recorded, this function returns false.
+ */
+bool
+pgstat_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
+{
+	PgStatShared_Backend *shbackendioent;
+	PgStat_BackendPendingIO *pendingent;
+	PgStat_BktypeIO *bktype_shstats;
+
+	if (!pgstat_lock_entry(entry_ref, nowait))
+		return false;
+
+	shbackendioent = (PgStatShared_Backend *) entry_ref->shared_stats;
+	bktype_shstats = &shbackendioent->stats.stats;
+	pendingent = (PgStat_BackendPendingIO *) entry_ref->pending;
+
+	for (int io_object = 0; io_object < IOOBJECT_NUM_TYPES; io_object++)
+	{
+		for (int io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
+		{
+			for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
+			{
+				instr_time	time;
+
+				bktype_shstats->counts[io_object][io_context][io_op] +=
+					pendingent->counts[io_object][io_context][io_op];
+
+				time = pendingent->pending_times[io_object][io_context][io_op];
+
+				bktype_shstats->times[io_object][io_context][io_op] +=
+					INSTR_TIME_GET_MICROSEC(time);
+			}
+		}
+	}
+
+	pgstat_unlock_entry(entry_ref);
+
+	return true;
+}
+
+/*
+ * Simpler wrapper of pgstat_backend_flush_cb()
+ */
+void
+pgstat_flush_backend(bool nowait)
+{
+	if (pgstat_tracks_backend_bktype(MyBackendType))
+	{
+		PgStat_EntryRef *entry_ref;
+
+		entry_ref = pgstat_get_entry_ref(PGSTAT_KIND_BACKEND, InvalidOid,
+										 MyProcNumber, false, NULL);
+		(void) pgstat_backend_flush_cb(entry_ref, nowait);
+	}
+}
+
+/*
+ * Create the backend statistics entry for procnum.
+ */
+void
+pgstat_create_backend_stat(ProcNumber procnum)
+{
+	PgStat_EntryRef *entry_ref;
+	PgStatShared_Backend *shstatent;
+
+	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_BACKEND, InvalidOid,
+										  procnum, NULL);
+
+	shstatent = (PgStatShared_Backend *) entry_ref->shared_stats;
+
+	/*
+	 * NB: need to accept that there might be stats from an older backend,
+	 * e.g. if we previously used this proc number.
+	 */
+	memset(&shstatent->stats, 0, sizeof(shstatent->stats));
+}
+
+/*
+ * Find or create a local PgStat_BackendPendingIO entry for procnum.
+ */
+PgStat_BackendPendingIO *
+pgstat_prep_backend_pending(ProcNumber procnum)
+{
+	PgStat_EntryRef *entry_ref;
+
+	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_BACKEND, InvalidOid,
+										  procnum, NULL);
+
+	return entry_ref->pending;
+}
+
+/*
+ * Backend statistics are not collected for all BackendTypes.
+ *
+ * The following BackendTypes do not participate in the backend stats
+ * subsystem:
+ * - The same and for the same reasons as in pgstat_tracks_io_bktype().
+ * - B_BG_WRITER, B_CHECKPOINTER, B_STARTUP and B_AUTOVAC_LAUNCHER because their
+ * I/O stats are already visible in pg_stat_io and there is only one of those.
+ *
+ * Function returns true if BackendType participates in the backend stats
+ * subsystem for IO and false if it does not.
+ *
+ * When adding a new BackendType, also consider adding relevant restrictions to
+ * pgstat_tracks_io_object() and pgstat_tracks_io_op().
+ */
+bool
+pgstat_tracks_backend_bktype(BackendType bktype)
+{
+	/*
+	 * List every type so that new backend types trigger a warning about
+	 * needing to adjust this switch.
+	 */
+	switch (bktype)
+	{
+		case B_INVALID:
+		case B_AUTOVAC_LAUNCHER:
+		case B_DEAD_END_BACKEND:
+		case B_ARCHIVER:
+		case B_LOGGER:
+		case B_WAL_RECEIVER:
+		case B_WAL_WRITER:
+		case B_WAL_SUMMARIZER:
+		case B_BG_WRITER:
+		case B_CHECKPOINTER:
+		case B_STARTUP:
+			return false;
+
+		case B_AUTOVAC_WORKER:
+		case B_BACKEND:
+		case B_BG_WORKER:
+		case B_STANDALONE_BACKEND:
+		case B_SLOTSYNC_WORKER:
+		case B_WAL_SENDER:
+			return true;
+	}
+
+	return false;
+}
+
+void
+pgstat_backend_reset_timestamp_cb(PgStatShared_Common *header, TimestampTz ts)
+{
+	((PgStatShared_Backend *) header)->stats.stat_reset_timestamp = ts;
+}
diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index f9883af2b3..e0c206a453 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -20,13 +20,7 @@
 #include "storage/bufmgr.h"
 #include "utils/pgstat_internal.h"
 
-
-typedef struct PgStat_PendingIO
-{
-	PgStat_Counter counts[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
-	instr_time	pending_times[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
-} PgStat_PendingIO;
-
+typedef PgStat_BackendPendingIO PgStat_PendingIO;
 
 static PgStat_PendingIO PendingIOStats;
 static bool have_iostats = false;
@@ -87,6 +81,14 @@ pgstat_count_io_op_n(IOObject io_object, IOContext io_context, IOOp io_op, uint3
 	Assert((unsigned int) io_op < IOOP_NUM_TYPES);
 	Assert(pgstat_tracks_io_op(MyBackendType, io_object, io_context, io_op));
 
+	if (pgstat_tracks_backend_bktype(MyBackendType))
+	{
+		PgStat_PendingIO *entry_ref;
+
+		entry_ref = pgstat_prep_backend_pending(MyProcNumber);
+		entry_ref->counts[io_object][io_context][io_op] += cnt;
+	}
+
 	PendingIOStats.counts[io_object][io_context][io_op] += cnt;
 
 	have_iostats = true;
@@ -148,6 +150,15 @@ pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op,
 
 		INSTR_TIME_ADD(PendingIOStats.pending_times[io_object][io_context][io_op],
 					   io_time);
+
+		if (pgstat_tracks_backend_bktype(MyBackendType))
+		{
+			PgStat_PendingIO *entry_ref;
+
+			entry_ref = pgstat_prep_backend_pending(MyProcNumber);
+			INSTR_TIME_ADD(entry_ref->pending_times[io_object][io_context][io_op],
+						   io_time);
+		}
 	}
 
 	pgstat_count_io_op_n(io_object, io_context, io_op, cnt);
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index faba8b64d2..85e65557bb 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -264,6 +264,7 @@ pgstat_report_vacuum(Oid tableoid, bool shared,
 	 * VACUUM command has processed all tables and committed.
 	 */
 	pgstat_flush_io(false);
+	pgstat_flush_backend(false);
 }
 
 /*
@@ -350,6 +351,7 @@ pgstat_report_analyze(Relation rel,
 
 	/* see pgstat_report_vacuum() */
 	pgstat_flush_io(false);
+	pgstat_flush_backend(false);
 }
 
 /*
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index cdf37403e9..b939551d36 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1474,6 +1474,154 @@ pg_stat_get_io(PG_FUNCTION_ARGS)
 	return (Datum) 0;
 }
 
+Datum
+pg_stat_get_backend_io(PG_FUNCTION_ARGS)
+{
+	ReturnSetInfo *rsinfo;
+	PgStat_Backend *backend_stats;
+	Datum		bktype_desc;
+	PgStat_BktypeIO *bktype_stats;
+	BackendType bktype;
+	Datum		reset_time;
+	int			num_backends = pgstat_fetch_stat_numbackends();
+	int			curr_backend;
+	int			pid;
+	PGPROC	   *proc;
+	ProcNumber	procNumber;
+
+	InitMaterializedSRF(fcinfo, 0);
+	rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+	pid = PG_GETARG_INT32(0);
+	proc = BackendPidGetProc(pid);
+
+	/*
+	 * Could be an auxiliary process but would not report any stats due to
+	 * pgstat_tracks_backend_bktype() anyway. So don't need an extra call to
+	 * AuxiliaryPidGetProc().
+	 */
+	if (!proc)
+		return (Datum) 0;
+
+	procNumber = GetNumberFromPGProc(proc);
+	backend_stats = pgstat_fetch_proc_stat_io(procNumber);
+
+	if (!backend_stats)
+		return (Datum) 0;
+
+	bktype = B_INVALID;
+
+	/* Look for the backend type */
+	for (curr_backend = 1; curr_backend <= num_backends; curr_backend++)
+	{
+		LocalPgBackendStatus *local_beentry;
+		PgBackendStatus *beentry;
+
+		/* Get the next one in the list */
+		local_beentry = pgstat_get_local_beentry_by_index(curr_backend);
+		beentry = &local_beentry->backendStatus;
+
+		/* Looking for specific PID, ignore all the others */
+		if (beentry->st_procpid != pid)
+			continue;
+
+		bktype = beentry->st_backendType;
+		break;
+	}
+
+	/* Backend is gone */
+	if (bktype == B_INVALID)
+		return (Datum) 0;
+
+	bktype_desc = CStringGetTextDatum(GetBackendTypeDesc(bktype));
+	bktype_stats = &backend_stats->stats;
+	reset_time = TimestampTzGetDatum(backend_stats->stat_reset_timestamp);
+
+	/*
+	 * In Assert builds, we can afford an extra loop through all of the
+	 * counters checking that only expected stats are non-zero, since it keeps
+	 * the non-Assert code cleaner.
+	 */
+	Assert(pgstat_bktype_io_stats_valid(bktype_stats, bktype));
+
+	for (int io_obj = 0; io_obj < IOOBJECT_NUM_TYPES; io_obj++)
+	{
+		const char *obj_name = pgstat_get_io_object_name(io_obj);
+
+		for (int io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
+		{
+			const char *context_name = pgstat_get_io_context_name(io_context);
+
+			Datum		values[IO_NUM_COLUMNS] = {0};
+			bool		nulls[IO_NUM_COLUMNS] = {0};
+
+			/*
+			 * Some combinations of BackendType, IOObject, and IOContext are
+			 * not valid for any type of IOOp. In such cases, omit the entire
+			 * row from the view.
+			 */
+			if (!pgstat_tracks_io_object(bktype, io_obj, io_context))
+				continue;
+
+			values[IO_COL_BACKEND_TYPE] = bktype_desc;
+			values[IO_COL_CONTEXT] = CStringGetTextDatum(context_name);
+			values[IO_COL_OBJECT] = CStringGetTextDatum(obj_name);
+			if (backend_stats->stat_reset_timestamp != 0)
+				values[IO_COL_RESET_TIME] = reset_time;
+			else
+				nulls[IO_COL_RESET_TIME] = true;
+
+			/*
+			 * Hard-code this to the value of BLCKSZ for now. Future values
+			 * could include XLOG_BLCKSZ, once WAL IO is tracked, and constant
+			 * multipliers, once non-block-oriented IO (e.g. temporary file
+			 * IO) is tracked.
+			 */
+			values[IO_COL_CONVERSION] = Int64GetDatum(BLCKSZ);
+
+			for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
+			{
+				int			op_idx = pgstat_get_io_op_index(io_op);
+				int			time_idx = pgstat_get_io_time_index(io_op);
+
+				/*
+				 * Some combinations of BackendType and IOOp, of IOContext and
+				 * IOOp, and of IOObject and IOOp are not tracked. Set these
+				 * cells in the view NULL.
+				 */
+				if (pgstat_tracks_io_op(bktype, io_obj, io_context, io_op))
+				{
+					PgStat_Counter count =
+						bktype_stats->counts[io_obj][io_context][io_op];
+
+					values[op_idx] = Int64GetDatum(count);
+				}
+				else
+					nulls[op_idx] = true;
+
+				/* not every operation is timed */
+				if (time_idx == IO_COL_INVALID)
+					continue;
+
+				if (!nulls[op_idx])
+				{
+					PgStat_Counter time =
+						bktype_stats->times[io_obj][io_context][io_op];
+
+					values[time_idx] = Float8GetDatum(pg_stat_us_to_ms(time));
+				}
+				else
+					nulls[time_idx] = true;
+			}
+
+			tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+								 values, nulls);
+		}
+	}
+
+	return (Datum) 0;
+}
+
 /*
  * Returns statistics of WAL activity
  */
@@ -1779,6 +1927,27 @@ pg_stat_reset_single_function_counters(PG_FUNCTION_ARGS)
 	PG_RETURN_VOID();
 }
 
+Datum
+pg_stat_reset_backend_stats(PG_FUNCTION_ARGS)
+{
+	PGPROC	   *proc;
+	int			backend_pid = PG_GETARG_INT32(0);
+
+	proc = BackendPidGetProc(backend_pid);
+
+	/*
+	 * Could be an auxiliary process but would not report any stats due to
+	 * pgstat_tracks_backend_bktype() anyway. So don't need an extra call to
+	 * AuxiliaryPidGetProc().
+	 */
+	if (!proc)
+		PG_RETURN_VOID();
+
+	pgstat_reset(PGSTAT_KIND_BACKEND, InvalidOid, GetNumberFromPGProc(proc));
+
+	PG_RETURN_VOID();
+}
+
 /* Reset SLRU counters (a specific one or all of them). */
 Datum
 pg_stat_reset_slru(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 0f22c21723..437157ffa3 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5913,6 +5913,15 @@
   proargnames => '{backend_type,object,context,reads,read_time,writes,write_time,writebacks,writeback_time,extends,extend_time,op_bytes,hits,evictions,reuses,fsyncs,fsync_time,stats_reset}',
   prosrc => 'pg_stat_get_io' },
 
+{ oid => '8806', descr => 'statistics: backend IO statistics',
+  proname => 'pg_stat_get_backend_io', prorows => '5', proretset => 't',
+  provolatile => 'v', proparallel => 'r', prorettype => 'record',
+  proargtypes => 'int4',
+  proallargtypes => '{int4,text,text,text,int8,float8,int8,float8,int8,float8,int8,float8,int8,int8,int8,int8,int8,float8,timestamptz}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{backend_pid,backend_type,object,context,reads,read_time,writes,write_time,writebacks,writeback_time,extends,extend_time,op_bytes,hits,evictions,reuses,fsyncs,fsync_time,stats_reset}',
+  prosrc => 'pg_stat_get_backend_io' },
+
 { oid => '1136', descr => 'statistics: information about WAL activity',
   proname => 'pg_stat_get_wal', proisstrict => 'f', provolatile => 's',
   proparallel => 'r', prorettype => 'record', proargtypes => '',
@@ -6052,6 +6061,11 @@
   proname => 'pg_stat_reset_single_function_counters', provolatile => 'v',
   prorettype => 'void', proargtypes => 'oid',
   prosrc => 'pg_stat_reset_single_function_counters' },
+{ oid => '9987',
+  descr => 'statistics: reset statistics for a single backend',
+  proname => 'pg_stat_reset_backend_stats', provolatile => 'v',
+  prorettype => 'void', proargtypes => 'int4',
+  prosrc => 'pg_stat_reset_backend_stats' },
 { oid => '2307',
   descr => 'statistics: reset collected statistics for a single SLRU',
   proname => 'pg_stat_reset_slru', proisstrict => 'f', provolatile => 'v',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index ebfeef2f46..479773cfd2 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -49,14 +49,15 @@
 #define PGSTAT_KIND_FUNCTION	3	/* per-function statistics */
 #define PGSTAT_KIND_REPLSLOT	4	/* per-slot statistics */
 #define PGSTAT_KIND_SUBSCRIPTION	5	/* per-subscription statistics */
+#define PGSTAT_KIND_BACKEND	6	/* per-backend statistics */
 
 /* stats for fixed-numbered objects */
-#define PGSTAT_KIND_ARCHIVER	6
-#define PGSTAT_KIND_BGWRITER	7
-#define PGSTAT_KIND_CHECKPOINTER	8
-#define PGSTAT_KIND_IO	9
-#define PGSTAT_KIND_SLRU	10
-#define PGSTAT_KIND_WAL	11
+#define PGSTAT_KIND_ARCHIVER	7
+#define PGSTAT_KIND_BGWRITER	8
+#define PGSTAT_KIND_CHECKPOINTER	9
+#define PGSTAT_KIND_IO	10
+#define PGSTAT_KIND_SLRU	11
+#define PGSTAT_KIND_WAL	12
 
 #define PGSTAT_KIND_BUILTIN_MIN PGSTAT_KIND_DATABASE
 #define PGSTAT_KIND_BUILTIN_MAX PGSTAT_KIND_WAL
@@ -362,12 +363,23 @@ typedef struct PgStat_BktypeIO
 	PgStat_Counter times[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
 } PgStat_BktypeIO;
 
+typedef struct PgStat_BackendPendingIO
+{
+	PgStat_Counter counts[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
+	instr_time	pending_times[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
+} PgStat_BackendPendingIO;
+
 typedef struct PgStat_IO
 {
 	TimestampTz stat_reset_timestamp;
 	PgStat_BktypeIO stats[BACKEND_NUM_TYPES];
 } PgStat_IO;
 
+typedef struct PgStat_Backend
+{
+	TimestampTz stat_reset_timestamp;
+	PgStat_BktypeIO stats;
+} PgStat_Backend;
 
 typedef struct PgStat_StatDBEntry
 {
@@ -549,6 +561,13 @@ extern bool pgstat_have_entry(PgStat_Kind kind, Oid dboid, uint64 objid);
 extern void pgstat_report_archiver(const char *xlog, bool failed);
 extern PgStat_ArchiverStats *pgstat_fetch_stat_archiver(void);
 
+/*
+ * Functions in pgstat_backend.c
+ */
+
+extern PgStat_Backend *pgstat_fetch_proc_stat_io(ProcNumber procNumber);
+extern bool pgstat_tracks_backend_bktype(BackendType bktype);
+extern void pgstat_create_backend_stat(ProcNumber procnum);
 
 /*
  * Functions in pgstat_bgwriter.c
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 7338bc1e28..811ed9b005 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -450,6 +450,11 @@ typedef struct PgStatShared_ReplSlot
 	PgStat_StatReplSlotEntry stats;
 } PgStatShared_ReplSlot;
 
+typedef struct PgStatShared_Backend
+{
+	PgStatShared_Common header;
+	PgStat_Backend stats;
+} PgStatShared_Backend;
 
 /*
  * Central shared memory entry for the cumulative stats system.
@@ -604,6 +609,15 @@ extern void pgstat_archiver_init_shmem_cb(void *stats);
 extern void pgstat_archiver_reset_all_cb(TimestampTz ts);
 extern void pgstat_archiver_snapshot_cb(void);
 
+/*
+ * Functions in pgstat_backend.c
+ */
+
+extern void pgstat_flush_backend(bool nowait);
+
+extern PgStat_BackendPendingIO *pgstat_prep_backend_pending(ProcNumber procnum);
+extern bool pgstat_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait);
+extern void pgstat_backend_reset_timestamp_cb(PgStatShared_Common *header, TimestampTz ts);
 
 /*
  * Functions in pgstat_bgwriter.c
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 56771f83ed..3447e7b75d 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1249,7 +1249,7 @@ SELECT pg_stat_get_subscription_stats(NULL);
  
 (1 row)
 
--- Test that the following operations are tracked in pg_stat_io:
+-- Test that the following operations are tracked in pg_stat_io and in backend stats:
 -- - reads of target blocks into shared buffers
 -- - writes of shared buffers to permanent storage
 -- - extends of relations using shared buffers
@@ -1259,11 +1259,19 @@ SELECT pg_stat_get_subscription_stats(NULL);
 -- be sure of the state of shared buffers at the point the test is run.
 -- Create a regular table and insert some data to generate IOCONTEXT_NORMAL
 -- extends.
+SELECT pid AS checkpointer_pid FROM pg_stat_activity
+  WHERE backend_type = 'checkpointer' \gset
 SELECT sum(extends) AS io_sum_shared_before_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT sum(extends) AS my_io_sum_shared_before_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_stat_io
   WHERE object = 'relation' \gset io_sum_shared_before_
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE object = 'relation' \gset my_io_sum_shared_before_
 CREATE TABLE test_io_shared(a int);
 INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
 SELECT pg_stat_force_next_flush();
@@ -1280,8 +1288,17 @@ SELECT :io_sum_shared_after_extends > :io_sum_shared_before_extends;
  t
 (1 row)
 
+SELECT sum(extends) AS my_io_sum_shared_after_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
+SELECT :my_io_sum_shared_after_extends > :my_io_sum_shared_before_extends;
+ ?column? 
+----------
+ t
+(1 row)
+
 -- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
--- and fsyncs.
+-- and fsyncs in the global stats (not for the backend).
 -- See comment above for rationale for two explicit CHECKPOINTs.
 CHECKPOINT;
 CHECKPOINT;
@@ -1301,6 +1318,31 @@ SELECT current_setting('fsync') = 'off'
  t
 (1 row)
 
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE object = 'relation' \gset my_io_sum_shared_after_
+SELECT :my_io_sum_shared_after_writes >= :my_io_sum_shared_before_writes;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT current_setting('fsync') = 'off'
+  OR (:my_io_sum_shared_after_fsyncs = :my_io_sum_shared_before_fsyncs
+      AND :my_io_sum_shared_after_fsyncs= 0);
+ ?column? 
+----------
+ t
+(1 row)
+
+-- Don't return any rows if querying other backend's stats that are excluded
+-- from the backend stats collection (like the checkpointer).
+SELECT count(1) = 0 FROM pg_stat_get_backend_io(:checkpointer_pid);
+ ?column? 
+----------
+ t
+(1 row)
+
 -- Change the tablespace so that the table is rewritten directly, then SELECT
 -- from it to cause it to be read back into shared buffers.
 SELECT sum(reads) AS io_sum_shared_before_reads
@@ -1521,6 +1563,8 @@ SELECT pg_stat_have_stats('io', 0, 0);
 
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_pre_reset
   FROM pg_stat_io \gset
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_pre_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
 SELECT pg_stat_reset_shared('io');
  pg_stat_reset_shared 
 ----------------------
@@ -1535,6 +1579,30 @@ SELECT :io_stats_post_reset < :io_stats_pre_reset;
  t
 (1 row)
 
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
+-- pg_stat_reset_shared() did not reset backend IO stats
+SELECT :my_io_stats_pre_reset <= :my_io_stats_post_reset;
+ ?column? 
+----------
+ t
+(1 row)
+
+-- but pg_stat_reset_backend_stats() does
+SELECT pg_stat_reset_backend_stats(pg_backend_pid());
+ pg_stat_reset_backend_stats 
+-----------------------------
+ 
+(1 row)
+
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_backend_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
+SELECT :my_io_stats_pre_reset > :my_io_stats_post_backend_reset;
+ ?column? 
+----------
+ t
+(1 row)
+
 -- test BRIN index doesn't block HOT update
 CREATE TABLE brin_hot (
   id  integer PRIMARY KEY,
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 7147cc2f89..9c925005be 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -595,7 +595,7 @@ SELECT pg_stat_get_replication_slot(NULL);
 SELECT pg_stat_get_subscription_stats(NULL);
 
 
--- Test that the following operations are tracked in pg_stat_io:
+-- Test that the following operations are tracked in pg_stat_io and in backend stats:
 -- - reads of target blocks into shared buffers
 -- - writes of shared buffers to permanent storage
 -- - extends of relations using shared buffers
@@ -607,20 +607,32 @@ SELECT pg_stat_get_subscription_stats(NULL);
 
 -- Create a regular table and insert some data to generate IOCONTEXT_NORMAL
 -- extends.
+SELECT pid AS checkpointer_pid FROM pg_stat_activity
+  WHERE backend_type = 'checkpointer' \gset
 SELECT sum(extends) AS io_sum_shared_before_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT sum(extends) AS my_io_sum_shared_before_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_stat_io
   WHERE object = 'relation' \gset io_sum_shared_before_
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE object = 'relation' \gset my_io_sum_shared_before_
 CREATE TABLE test_io_shared(a int);
 INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
 SELECT pg_stat_force_next_flush();
 SELECT sum(extends) AS io_sum_shared_after_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
 SELECT :io_sum_shared_after_extends > :io_sum_shared_before_extends;
+SELECT sum(extends) AS my_io_sum_shared_after_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
+SELECT :my_io_sum_shared_after_extends > :my_io_sum_shared_before_extends;
 
 -- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
--- and fsyncs.
+-- and fsyncs in the global stats (not for the backend).
 -- See comment above for rationale for two explicit CHECKPOINTs.
 CHECKPOINT;
 CHECKPOINT;
@@ -630,6 +642,17 @@ SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
 SELECT :io_sum_shared_after_writes > :io_sum_shared_before_writes;
 SELECT current_setting('fsync') = 'off'
   OR :io_sum_shared_after_fsyncs > :io_sum_shared_before_fsyncs;
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE object = 'relation' \gset my_io_sum_shared_after_
+SELECT :my_io_sum_shared_after_writes >= :my_io_sum_shared_before_writes;
+SELECT current_setting('fsync') = 'off'
+  OR (:my_io_sum_shared_after_fsyncs = :my_io_sum_shared_before_fsyncs
+      AND :my_io_sum_shared_after_fsyncs= 0);
+
+-- Don't return any rows if querying other backend's stats that are excluded
+-- from the backend stats collection (like the checkpointer).
+SELECT count(1) = 0 FROM pg_stat_get_backend_io(:checkpointer_pid);
 
 -- Change the tablespace so that the table is rewritten directly, then SELECT
 -- from it to cause it to be read back into shared buffers.
@@ -762,10 +785,21 @@ SELECT :io_sum_bulkwrite_strategy_extends_after > :io_sum_bulkwrite_strategy_ext
 SELECT pg_stat_have_stats('io', 0, 0);
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_pre_reset
   FROM pg_stat_io \gset
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_pre_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
 SELECT pg_stat_reset_shared('io');
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_post_reset
   FROM pg_stat_io \gset
 SELECT :io_stats_post_reset < :io_stats_pre_reset;
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
+-- pg_stat_reset_shared() did not reset backend IO stats
+SELECT :my_io_stats_pre_reset <= :my_io_stats_post_reset;
+-- but pg_stat_reset_backend_stats() does
+SELECT pg_stat_reset_backend_stats(pg_backend_pid());
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_backend_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
+SELECT :my_io_stats_pre_reset > :my_io_stats_post_backend_reset;
 
 
 -- test BRIN index doesn't block HOT update
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index ce33e55bf1..398dd92527 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2121,6 +2121,7 @@ PgFdwSamplingMethod
 PgFdwScanState
 PgIfAddrCallback
 PgStatShared_Archiver
+PgStatShared_Backend
 PgStatShared_BgWriter
 PgStatShared_Checkpointer
 PgStatShared_Common
@@ -2136,6 +2137,8 @@ PgStatShared_SLRU
 PgStatShared_Subscription
 PgStatShared_Wal
 PgStat_ArchiverStats
+PgStat_Backend
+PgStat_BackendPendingIO
 PgStat_BackendSubEntry
 PgStat_BgWriterStats
 PgStat_BktypeIO
-- 
2.34.1

#54Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#53)
Re: per backend I/O statistics

On Fri, Dec 13, 2024 at 09:20:13AM +0000, Bertrand Drouvot wrote:

Yeah makes sense, that's consistent with other kinds: done.

This looks to be taking shape. I don't have many more comments.

Not feeling so sure about the value brought by the backend_type
returned in pg_stat_get_backend_io(), but well..

+	/* drop the backend stats entry */
+	pgstat_drop_entry(PGSTAT_KIND_BACKEND, InvalidOid, MyProcNumber);

Oh, I've missed something. Shouldn't pgstat_request_entry_refs_gc()
be called when this returns false?

The creation of the dshash entry is a bit too early, I think. How
about delaying it more so that we don't create entries that could be
useless if we fail the last steps of authentication? One spot would
be to delay the creation of the new entry at the end of
pgstat_bestart(), where we know that we are done with authentication
and that the backend is ready to report back to the client connected
to it. It is true that some subsystems could produce stats as of the
early transactions they generate, which is why pgstat_initialize() is
done that early in BaseInit(), but that's not really relevant for this
case?

I'm still feeling a bit uneasy about the drop done in
pgstat_shutdown_hook(); it would be nice to make sure that this
happens in a path that would run just after we're done with the
creation of the entry to limit windows where we have an entry but no
way to drop it, or vice-versa, so that the shutdown assertions would
never trigger. Perhaps there's an argument for an entirely separate
callback that would run before pgstat is plugged off, like a new
before_shmem_exit() callback registered after the entry is created?
--
Michael

#55Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#54)
1 attachment(s)
Re: per backend I/O statistics

Hi,

On Mon, Dec 16, 2024 at 05:07:52PM +0900, Michael Paquier wrote:

On Fri, Dec 13, 2024 at 09:20:13AM +0000, Bertrand Drouvot wrote:

Not feeling so sure about the value brought by the backend_type
returned in pg_stat_get_backend_io(), but well..

It's not necessary per se, but it ensures that pg_stat_get_backend_io() does not
return rows full of "invalid" combinations.

For example:

With the filtering in place, on a "client backend" we would get:

postgres=# select * from pg_stat_get_backend_io(pg_backend_pid());
backend_type | object | context | reads | read_time | writes | write_time | writebacks | writeback_time | extends | extend_time | op_bytes | hits | evictions | reuses | fsyncs | fsync_time | stats_reset
----------------+---------------+-----------+-------+-----------+--------+------------+------------+----------------+---------+-------------+----------+------+-----------+--------+--------+------------+-------------
client backend | relation | bulkread | 0 | 0 | 0 | 0 | 0 | 0 | | | 8192 | 0 | 0 | 0 | | |
client backend | relation | bulkwrite | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8192 | 0 | 0 | 0 | | |
client backend | relation | normal | 86 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8192 | 1604 | 0 | | 0 | 0 |
client backend | relation | vacuum | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8192 | 0 | 0 | 0 | | |
client backend | temp relation | normal | 0 | 0 | 0 | 0 | | | 0 | 0 | 8192 | 0 | 0 | | | |
(5 rows)

and for example for the walsender:

postgres=# select * from pg_stat_get_backend_io(3982910);
backend_type | object | context | reads | read_time | writes | write_time | writebacks | writeback_time | extends | extend_time | op_bytes | hits | evictions | reuses | fsyncs | fsync_time | stats_reset
--------------+---------------+-----------+-------+-----------+--------+------------+------------+----------------+---------+-------------+----------+------+-----------+--------+--------+------------+-------------
walsender | relation | bulkread | 0 | 0 | 0 | 0 | 0 | 0 | | | 8192 | 0 | 0 | 0 | | |
walsender | relation | bulkwrite | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8192 | 0 | 0 | 0 | | |
walsender | relation | normal | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8192 | 54 | 0 | | 0 | 0 |
walsender | relation | vacuum | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8192 | 0 | 0 | 0 | | |
walsender | temp relation | normal | 0 | 0 | 0 | 0 | | | 0 | 0 | 8192 | 0 | 0 | | | |
(5 rows)

While, without the filtering, we would get:

postgres=# select * from pg_stat_get_backend_io(pg_backend_pid());
backend_type | object | context | reads | read_time | writes | write_time | writebacks | writeback_time | extends | extend_time | op_bytes | hits | evictions | reuses | fsyncs | fsync_time | stats_reset
----------------+---------------+-----------+-------+-----------+--------+------------+------------+----------------+---------+-------------+----------+------+-----------+--------+--------+------------+-------------
client backend | relation | bulkread | 0 | 0 | 0 | 0 | 0 | 0 | | | 8192 | 0 | 0 | 0 | | |
client backend | relation | bulkwrite | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8192 | 0 | 0 | 0 | | |
client backend | relation | normal | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8192 | 50 | 0 | | 0 | 0 |
client backend | relation | vacuum | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8192 | 0 | 0 | 0 | | |
client backend | temp relation | bulkread | | | | | | | | | 8192 | | | | | |
client backend | temp relation | bulkwrite | | | | | | | | | 8192 | | | | | |
client backend | temp relation | normal | 0 | 0 | 0 | 0 | | | 0 | 0 | 8192 | 0 | 0 | | | |
client backend | temp relation | vacuum | | | | | | | | | 8192 | | | | | |
(8 rows)

and for a walsender:

postgres=# select * from pg_stat_get_backend_io(3981588);
backend_type | object | context | reads | read_time | writes | write_time | writebacks | writeback_time | extends | extend_time | op_bytes | hits | evictions | reuses | fsyncs | fsync_time | stats_reset
--------------+---------------+-----------+-------+-----------+--------+------------+------------+----------------+---------+-------------+----------+------+-----------+--------+--------+------------+-------------
walsender | relation | bulkread | 0 | 0 | 0 | 0 | 0 | 0 | | | 8192 | 0 | 0 | 0 | | |
walsender | relation | bulkwrite | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8192 | 0 | 0 | 0 | | |
walsender | relation | normal | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8192 | 48 | 0 | | 0 | 0 |
walsender | relation | vacuum | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 8192 | 0 | 0 | 0 | | |
walsender | temp relation | bulkread | | | | | | | | | 8192 | | | | | |
walsender | temp relation | bulkwrite | | | | | | | | | 8192 | | | | | |
walsender | temp relation | normal | 0 | 0 | 0 | 0 | | | 0 | 0 | 8192 | 0 | 0 | | | |
walsender | temp relation | vacuum | | | | | | | | | 8192 | | | | | |
(8 rows)

It would not be possible to remove those 3 "extra" rows with a join on the
backend type in pg_stat_activity, because only the C code knows which
combinations are valid (through pgstat_tracks_io_object()).
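
For illustration, a hypothetical SQL-level filter such as the one below would
not help: the join can only match the backend type, it cannot know which
(object, context) combinations are tracked for that type, so the extra rows
would still be returned:

  SELECT io.*
    FROM pg_stat_get_backend_io(pg_backend_pid()) AS io
    JOIN pg_stat_activity a
      ON a.pid = pg_backend_pid()
     AND a.backend_type = io.backend_type;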

+	/* drop the backend stats entry */
+	pgstat_drop_entry(PGSTAT_KIND_BACKEND, InvalidOid, MyProcNumber);

Oh, I've missed something. Shouldn't pgstat_request_entry_refs_gc()
be called when this returns false?

I'm not sure that pgstat_drop_entry() could return false here for a
PGSTAT_KIND_BACKEND entry, but I agree that it's better to call
pgstat_request_entry_refs_gc() if that happens. Done in the attached v9.

The creation of the dshash entry is a bit too early, I think. How
about delaying it more so that we don't create entries that could be
useless if we fail the last steps of authentication? One spot would
be to delay the creation of the new entry at the end of
pgstat_bestart(), where we know that we are done with authentication
and that the backend is ready to report back to the client connected
to it. It is true that some subsystems could produce stats as of the
early transactions they generate, which is why pgstat_initialize() is
done that early in BaseInit(), but that's not really relevant for this
case?

I think it makes sense to move the stats entry creation into pgstat_bestart();
done that way in v9.

I'm still feeling a bit uneasy about the drop done in
pgstat_shutdown_hook(); it would be nice to make sure that this
happens in a path that would run just after we're done with the
creation of the entry to limit windows where we have an entry but no
way to drop it, or vice-versa,

I don't believe that's possible, as MyProcNumber can't be "reused" (and is
guaranteed to remain valid) until ProcKill() is done (which happens after
pgstat_shutdown_hook()).

Perhaps there's an argument for an entirely separate
callback that would run before pgstat is plugged off, like a new
before_shmem_exit() callback registered after the entry is created?

As ProcKill() is run in shmem_exit() (and so after the before_shmem_exit()
callbacks), I think the way we currently drop the backend stats entry is fine
(unless I'm missing something about your concern).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v9-0001-per-backend-I-O-statistics.patchtext/x-diff; charset=us-asciiDownload
From 5bd09f37d6fdb3db38eff8ce30677d3989fcf64c Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 28 Oct 2024 12:50:32 +0000
Subject: [PATCH v9] per backend I/O statistics

While pg_stat_io provides cluster-wide I/O statistics, this commit adds the
ability to track and display per backend I/O statistics.

It adds a new statistics kind and 2 new functions:

- pg_stat_reset_backend_stats() to be able to reset the stats for a given
backend pid.
- pg_stat_get_backend_io() to retrieve I/O statistics for a given backend pid.

The new KIND is named PGSTAT_KIND_BACKEND as it could be used in the future
to store other statistics (than the I/O ones) per backend. The new KIND is
a variable-numbered one and has an automatic cap on the maximum number of
entries (as its hash key contains the proc number).

There is no need to write the per backend I/O stats to disk (no point to
see stats for backends that do not exist anymore after a re-start), so using
"write_to_file = false".

Note that per backend I/O statistics are not collected for the checkpointer,
the background writer, the startup process and the autovacuum launcher as those
are already visible in pg_stat_io and there is only one of those.

XXX: Bump catalog version needs to be done.
---
 doc/src/sgml/config.sgml                     |   8 +-
 doc/src/sgml/monitoring.sgml                 |  37 ++++
 src/backend/catalog/system_functions.sql     |   2 +
 src/backend/utils/activity/Makefile          |   1 +
 src/backend/utils/activity/backend_status.c  |   4 +
 src/backend/utils/activity/meson.build       |   1 +
 src/backend/utils/activity/pgstat.c          |  23 ++-
 src/backend/utils/activity/pgstat_backend.c  | 182 +++++++++++++++++++
 src/backend/utils/activity/pgstat_io.c       |  25 ++-
 src/backend/utils/activity/pgstat_relation.c |   2 +
 src/backend/utils/adt/pgstatfuncs.c          | 169 +++++++++++++++++
 src/include/catalog/pg_proc.dat              |  14 ++
 src/include/pgstat.h                         |  31 +++-
 src/include/utils/pgstat_internal.h          |  14 ++
 src/test/regress/expected/stats.out          |  72 +++++++-
 src/test/regress/sql/stats.sql               |  38 +++-
 src/tools/pgindent/typedefs.list             |   3 +
 17 files changed, 605 insertions(+), 21 deletions(-)
   9.9% doc/src/sgml/
  31.3% src/backend/utils/activity/
  22.0% src/backend/utils/adt/
   4.4% src/include/catalog/
   7.0% src/include/
  12.8% src/test/regress/expected/
  11.5% src/test/regress/sql/

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e0c8325a39..8afca9b110 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8403,9 +8403,11 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
         displayed in <link linkend="monitoring-pg-stat-database-view">
         <structname>pg_stat_database</structname></link>,
         <link linkend="monitoring-pg-stat-io-view">
-        <structname>pg_stat_io</structname></link>, in the output of
-        <xref linkend="sql-explain"/> when the <literal>BUFFERS</literal> option
-        is used, in the output of <xref linkend="sql-vacuum"/> when
+        <structname>pg_stat_io</structname></link>, in the output of the
+        <link linkend="pg-stat-get-backend-io">
+        <function>pg_stat_get_backend_io()</function></link> function, in the
+        output of <xref linkend="sql-explain"/> when the <literal>BUFFERS</literal>
+        option is used, in the output of <xref linkend="sql-vacuum"/> when
         the <literal>VERBOSE</literal> option is used, by autovacuum
         for auto-vacuums and auto-analyzes, when <xref
         linkend="guc-log-autovacuum-min-duration"/> is set and by
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 840d7f8161..4a9464da61 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4790,6 +4790,25 @@ description | Waiting for a newly initialized WAL file to reach durable storage
        </para></entry>
       </row>
 
+      <row>
+       <entry id="pg-stat-get-backend-io" role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_stat_get_backend_io</primary>
+        </indexterm>
+        <function>pg_stat_get_backend_io</function> ( <type>integer</type> )
+        <returnvalue>setof record</returnvalue>
+       </para>
+       <para>
+        Returns I/O statistics about the backend with the specified
+        process ID. The output fields are exactly the same as the ones in the
+        <link linkend="monitoring-pg-stat-io-view"> <structname>pg_stat_io</structname></link>
+        view. The function does not return I/O statistics for the checkpointer,
+        the background writer, the startup process and the autovacuum launcher
+        as they are already visible in the <link linkend="monitoring-pg-stat-io-view"> <structname>pg_stat_io</structname></link>
+        view and there is only one of those.
+       </para></entry>
+      </row>
+
       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <indexterm>
@@ -4971,6 +4990,24 @@ description | Waiting for a newly initialized WAL file to reach durable storage
        </para></entry>
       </row>
 
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_stat_reset_backend_stats</primary>
+        </indexterm>
+        <function>pg_stat_reset_backend_stats</function> ( <type>integer</type> )
+        <returnvalue>void</returnvalue>
+       </para>
+       <para>
+        Resets statistics for a single backend with the specified process ID
+        to zero.
+       </para>
+       <para>
+        This function is restricted to superusers by default, but other users
+        can be granted EXECUTE to run the function.
+       </para></entry>
+      </row>
+
       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <indexterm>
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index c51dfca802..14eb99cd47 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -711,6 +711,8 @@ REVOKE EXECUTE ON FUNCTION pg_stat_reset_single_table_counters(oid) FROM public;
 
 REVOKE EXECUTE ON FUNCTION pg_stat_reset_single_function_counters(oid) FROM public;
 
+REVOKE EXECUTE ON FUNCTION pg_stat_reset_backend_stats(integer) FROM public;
+
 REVOKE EXECUTE ON FUNCTION pg_stat_reset_replication_slot(text) FROM public;
 
 REVOKE EXECUTE ON FUNCTION pg_stat_have_stats(text, oid, int8) FROM public;
diff --git a/src/backend/utils/activity/Makefile b/src/backend/utils/activity/Makefile
index b9fd66ea17..24b64a2742 100644
--- a/src/backend/utils/activity/Makefile
+++ b/src/backend/utils/activity/Makefile
@@ -20,6 +20,7 @@ OBJS = \
 	backend_status.o \
 	pgstat.o \
 	pgstat_archiver.o \
+	pgstat_backend.o \
 	pgstat_bgwriter.o \
 	pgstat_checkpointer.o \
 	pgstat_database.o \
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index 22c6dc378c..e9c3b87ac2 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -426,6 +426,10 @@ pgstat_bestart(void)
 
 	PGSTAT_END_WRITE_ACTIVITY(vbeentry);
 
+	/* Create the backend statistics entry */
+	if (pgstat_tracks_backend_bktype(MyBackendType))
+		pgstat_create_backend_stat(MyProcNumber);
+
 	/* Update app name to current GUC setting */
 	if (application_name)
 		pgstat_report_appname(application_name);
diff --git a/src/backend/utils/activity/meson.build b/src/backend/utils/activity/meson.build
index f73c22905c..380d3dd70c 100644
--- a/src/backend/utils/activity/meson.build
+++ b/src/backend/utils/activity/meson.build
@@ -5,6 +5,7 @@ backend_sources += files(
   'backend_status.c',
   'pgstat.c',
   'pgstat_archiver.c',
+  'pgstat_backend.c',
   'pgstat_bgwriter.c',
   'pgstat_checkpointer.c',
   'pgstat_database.c',
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index 18b7d9b47d..9dd9d60b00 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -77,6 +77,7 @@
  *
  * Each statistics kind is handled in a dedicated file:
  * - pgstat_archiver.c
+ * - pgstat_backend.c
  * - pgstat_bgwriter.c
  * - pgstat_checkpointer.c
  * - pgstat_database.c
@@ -358,6 +359,22 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.reset_timestamp_cb = pgstat_subscription_reset_timestamp_cb,
 	},
 
+	[PGSTAT_KIND_BACKEND] = {
+		.name = "backend",
+
+		.fixed_amount = false,
+		.write_to_file = false,
+
+		.accessed_across_databases = true,
+
+		.shared_size = sizeof(PgStatShared_Backend),
+		.shared_data_off = offsetof(PgStatShared_Backend, stats),
+		.shared_data_len = sizeof(((PgStatShared_Backend *) 0)->stats),
+		.pending_size = sizeof(PgStat_BackendPendingIO),
+
+		.flush_pending_cb = pgstat_backend_flush_cb,
+		.reset_timestamp_cb = pgstat_backend_reset_timestamp_cb,
+	},
 
 	/* stats for fixed-numbered (mostly 1) objects */
 
@@ -602,6 +619,10 @@ pgstat_shutdown_hook(int code, Datum arg)
 	Assert(dlist_is_empty(&pgStatPending));
 	dlist_init(&pgStatPending);
 
+	/* drop the backend stats entry */
+	if (!pgstat_drop_entry(PGSTAT_KIND_BACKEND, InvalidOid, MyProcNumber))
+		pgstat_request_entry_refs_gc();
+
 	pgstat_detach_shmem();
 
 #ifdef USE_ASSERT_CHECKING
@@ -768,7 +789,7 @@ pgstat_report_stat(bool force)
 
 	partial_flush = false;
 
-	/* flush database / relation / function / ... stats */
+	/* flush database / relation / function / backend / ... stats */
 	partial_flush |= pgstat_flush_pending_entries(nowait);
 
 	/* flush of fixed-numbered stats */
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
new file mode 100644
index 0000000000..e2d83024c2
--- /dev/null
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -0,0 +1,182 @@
+/* -------------------------------------------------------------------------
+ *
+ * pgstat_backend.c
+ *	  Implementation of backend statistics.
+ *
+ * This file contains the implementation of backend statistics. It is kept
+ * separate from pgstat.c to enforce the line between the statistics access /
+ * storage implementation and the details about individual types of statistics.
+ *
+ * Copyright (c) 2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/utils/activity/pgstat_backend.c
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "utils/pgstat_internal.h"
+
+/*
+ * Returns backend's IO stats.
+ */
+PgStat_Backend *
+pgstat_fetch_proc_stat_io(ProcNumber procNumber)
+{
+	PgStat_Backend *backend_entry;
+
+	backend_entry = (PgStat_Backend *) pgstat_fetch_entry(PGSTAT_KIND_BACKEND,
+														  InvalidOid, procNumber);
+
+	return backend_entry;
+}
+
+/*
+ * Flush out locally pending backend statistics
+ *
+ * If no stats have been recorded, this function returns false.
+ */
+bool
+pgstat_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
+{
+	PgStatShared_Backend *shbackendioent;
+	PgStat_BackendPendingIO *pendingent;
+	PgStat_BktypeIO *bktype_shstats;
+
+	if (!pgstat_lock_entry(entry_ref, nowait))
+		return false;
+
+	shbackendioent = (PgStatShared_Backend *) entry_ref->shared_stats;
+	bktype_shstats = &shbackendioent->stats.stats;
+	pendingent = (PgStat_BackendPendingIO *) entry_ref->pending;
+
+	for (int io_object = 0; io_object < IOOBJECT_NUM_TYPES; io_object++)
+	{
+		for (int io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
+		{
+			for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
+			{
+				instr_time	time;
+
+				bktype_shstats->counts[io_object][io_context][io_op] +=
+					pendingent->counts[io_object][io_context][io_op];
+
+				time = pendingent->pending_times[io_object][io_context][io_op];
+
+				bktype_shstats->times[io_object][io_context][io_op] +=
+					INSTR_TIME_GET_MICROSEC(time);
+			}
+		}
+	}
+
+	pgstat_unlock_entry(entry_ref);
+
+	return true;
+}
+
+/*
+ * Simpler wrapper of pgstat_backend_flush_cb()
+ */
+void
+pgstat_flush_backend(bool nowait)
+{
+	if (pgstat_tracks_backend_bktype(MyBackendType))
+	{
+		PgStat_EntryRef *entry_ref;
+
+		entry_ref = pgstat_get_entry_ref(PGSTAT_KIND_BACKEND, InvalidOid,
+										 MyProcNumber, false, NULL);
+		(void) pgstat_backend_flush_cb(entry_ref, nowait);
+	}
+}
+
+/*
+ * Create the backend statistics entry for procnum.
+ */
+void
+pgstat_create_backend_stat(ProcNumber procnum)
+{
+	PgStat_EntryRef *entry_ref;
+	PgStatShared_Backend *shstatent;
+
+	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_BACKEND, InvalidOid,
+										  procnum, NULL);
+
+	shstatent = (PgStatShared_Backend *) entry_ref->shared_stats;
+
+	/*
+	 * NB: need to accept that there might be stats from an older backend,
+	 * e.g. if we previously used this proc number.
+	 */
+	memset(&shstatent->stats, 0, sizeof(shstatent->stats));
+}
+
+/*
+ * Find or create a local PgStat_BackendPendingIO entry for procnum.
+ */
+PgStat_BackendPendingIO *
+pgstat_prep_backend_pending(ProcNumber procnum)
+{
+	PgStat_EntryRef *entry_ref;
+
+	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_BACKEND, InvalidOid,
+										  procnum, NULL);
+
+	return entry_ref->pending;
+}
+
+/*
+ * Backend statistics are not collected for all BackendTypes.
+ *
+ * The following BackendTypes do not participate in the backend stats
+ * subsystem:
+ * - The same and for the same reasons as in pgstat_tracks_io_bktype().
+ * - B_BG_WRITER, B_CHECKPOINTER, B_STARTUP and B_AUTOVAC_LAUNCHER because their
+ * I/O stats are already visible in pg_stat_io and there is only one of those.
+ *
+ * Function returns true if BackendType participates in the backend stats
+ * subsystem for IO and false if it does not.
+ *
+ * When adding a new BackendType, also consider adding relevant restrictions to
+ * pgstat_tracks_io_object() and pgstat_tracks_io_op().
+ */
+bool
+pgstat_tracks_backend_bktype(BackendType bktype)
+{
+	/*
+	 * List every type so that new backend types trigger a warning about
+	 * needing to adjust this switch.
+	 */
+	switch (bktype)
+	{
+		case B_INVALID:
+		case B_AUTOVAC_LAUNCHER:
+		case B_DEAD_END_BACKEND:
+		case B_ARCHIVER:
+		case B_LOGGER:
+		case B_WAL_RECEIVER:
+		case B_WAL_WRITER:
+		case B_WAL_SUMMARIZER:
+		case B_BG_WRITER:
+		case B_CHECKPOINTER:
+		case B_STARTUP:
+			return false;
+
+		case B_AUTOVAC_WORKER:
+		case B_BACKEND:
+		case B_BG_WORKER:
+		case B_STANDALONE_BACKEND:
+		case B_SLOTSYNC_WORKER:
+		case B_WAL_SENDER:
+			return true;
+	}
+
+	return false;
+}
+
+void
+pgstat_backend_reset_timestamp_cb(PgStatShared_Common *header, TimestampTz ts)
+{
+	((PgStatShared_Backend *) header)->stats.stat_reset_timestamp = ts;
+}
diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index f9883af2b3..e0c206a453 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -20,13 +20,7 @@
 #include "storage/bufmgr.h"
 #include "utils/pgstat_internal.h"
 
-
-typedef struct PgStat_PendingIO
-{
-	PgStat_Counter counts[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
-	instr_time	pending_times[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
-} PgStat_PendingIO;
-
+typedef PgStat_BackendPendingIO PgStat_PendingIO;
 
 static PgStat_PendingIO PendingIOStats;
 static bool have_iostats = false;
@@ -87,6 +81,14 @@ pgstat_count_io_op_n(IOObject io_object, IOContext io_context, IOOp io_op, uint3
 	Assert((unsigned int) io_op < IOOP_NUM_TYPES);
 	Assert(pgstat_tracks_io_op(MyBackendType, io_object, io_context, io_op));
 
+	if (pgstat_tracks_backend_bktype(MyBackendType))
+	{
+		PgStat_PendingIO *entry_ref;
+
+		entry_ref = pgstat_prep_backend_pending(MyProcNumber);
+		entry_ref->counts[io_object][io_context][io_op] += cnt;
+	}
+
 	PendingIOStats.counts[io_object][io_context][io_op] += cnt;
 
 	have_iostats = true;
@@ -148,6 +150,15 @@ pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op,
 
 		INSTR_TIME_ADD(PendingIOStats.pending_times[io_object][io_context][io_op],
 					   io_time);
+
+		if (pgstat_tracks_backend_bktype(MyBackendType))
+		{
+			PgStat_PendingIO *entry_ref;
+
+			entry_ref = pgstat_prep_backend_pending(MyProcNumber);
+			INSTR_TIME_ADD(entry_ref->pending_times[io_object][io_context][io_op],
+						   io_time);
+		}
 	}
 
 	pgstat_count_io_op_n(io_object, io_context, io_op, cnt);
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index faba8b64d2..85e65557bb 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -264,6 +264,7 @@ pgstat_report_vacuum(Oid tableoid, bool shared,
 	 * VACUUM command has processed all tables and committed.
 	 */
 	pgstat_flush_io(false);
+	pgstat_flush_backend(false);
 }
 
 /*
@@ -350,6 +351,7 @@ pgstat_report_analyze(Relation rel,
 
 	/* see pgstat_report_vacuum() */
 	pgstat_flush_io(false);
+	pgstat_flush_backend(false);
 }
 
 /*
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index cdf37403e9..b939551d36 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1474,6 +1474,154 @@ pg_stat_get_io(PG_FUNCTION_ARGS)
 	return (Datum) 0;
 }
 
+Datum
+pg_stat_get_backend_io(PG_FUNCTION_ARGS)
+{
+	ReturnSetInfo *rsinfo;
+	PgStat_Backend *backend_stats;
+	Datum		bktype_desc;
+	PgStat_BktypeIO *bktype_stats;
+	BackendType bktype;
+	Datum		reset_time;
+	int			num_backends = pgstat_fetch_stat_numbackends();
+	int			curr_backend;
+	int			pid;
+	PGPROC	   *proc;
+	ProcNumber	procNumber;
+
+	InitMaterializedSRF(fcinfo, 0);
+	rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+	pid = PG_GETARG_INT32(0);
+	proc = BackendPidGetProc(pid);
+
+	/*
+	 * Could be an auxiliary process but would not report any stats due to
+	 * pgstat_tracks_backend_bktype() anyway. So don't need an extra call to
+	 * AuxiliaryPidGetProc().
+	 */
+	if (!proc)
+		return (Datum) 0;
+
+	procNumber = GetNumberFromPGProc(proc);
+	backend_stats = pgstat_fetch_proc_stat_io(procNumber);
+
+	if (!backend_stats)
+		return (Datum) 0;
+
+	bktype = B_INVALID;
+
+	/* Look for the backend type */
+	for (curr_backend = 1; curr_backend <= num_backends; curr_backend++)
+	{
+		LocalPgBackendStatus *local_beentry;
+		PgBackendStatus *beentry;
+
+		/* Get the next one in the list */
+		local_beentry = pgstat_get_local_beentry_by_index(curr_backend);
+		beentry = &local_beentry->backendStatus;
+
+		/* Looking for specific PID, ignore all the others */
+		if (beentry->st_procpid != pid)
+			continue;
+
+		bktype = beentry->st_backendType;
+		break;
+	}
+
+	/* Backend is gone */
+	if (bktype == B_INVALID)
+		return (Datum) 0;
+
+	bktype_desc = CStringGetTextDatum(GetBackendTypeDesc(bktype));
+	bktype_stats = &backend_stats->stats;
+	reset_time = TimestampTzGetDatum(backend_stats->stat_reset_timestamp);
+
+	/*
+	 * In Assert builds, we can afford an extra loop through all of the
+	 * counters checking that only expected stats are non-zero, since it keeps
+	 * the non-Assert code cleaner.
+	 */
+	Assert(pgstat_bktype_io_stats_valid(bktype_stats, bktype));
+
+	for (int io_obj = 0; io_obj < IOOBJECT_NUM_TYPES; io_obj++)
+	{
+		const char *obj_name = pgstat_get_io_object_name(io_obj);
+
+		for (int io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
+		{
+			const char *context_name = pgstat_get_io_context_name(io_context);
+
+			Datum		values[IO_NUM_COLUMNS] = {0};
+			bool		nulls[IO_NUM_COLUMNS] = {0};
+
+			/*
+			 * Some combinations of BackendType, IOObject, and IOContext are
+			 * not valid for any type of IOOp. In such cases, omit the entire
+			 * row from the view.
+			 */
+			if (!pgstat_tracks_io_object(bktype, io_obj, io_context))
+				continue;
+
+			values[IO_COL_BACKEND_TYPE] = bktype_desc;
+			values[IO_COL_CONTEXT] = CStringGetTextDatum(context_name);
+			values[IO_COL_OBJECT] = CStringGetTextDatum(obj_name);
+			if (backend_stats->stat_reset_timestamp != 0)
+				values[IO_COL_RESET_TIME] = reset_time;
+			else
+				nulls[IO_COL_RESET_TIME] = true;
+
+			/*
+			 * Hard-code this to the value of BLCKSZ for now. Future values
+			 * could include XLOG_BLCKSZ, once WAL IO is tracked, and constant
+			 * multipliers, once non-block-oriented IO (e.g. temporary file
+			 * IO) is tracked.
+			 */
+			values[IO_COL_CONVERSION] = Int64GetDatum(BLCKSZ);
+
+			for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
+			{
+				int			op_idx = pgstat_get_io_op_index(io_op);
+				int			time_idx = pgstat_get_io_time_index(io_op);
+
+				/*
+				 * Some combinations of BackendType and IOOp, of IOContext and
+				 * IOOp, and of IOObject and IOOp are not tracked. Set these
+				 * cells in the view NULL.
+				 */
+				if (pgstat_tracks_io_op(bktype, io_obj, io_context, io_op))
+				{
+					PgStat_Counter count =
+						bktype_stats->counts[io_obj][io_context][io_op];
+
+					values[op_idx] = Int64GetDatum(count);
+				}
+				else
+					nulls[op_idx] = true;
+
+				/* not every operation is timed */
+				if (time_idx == IO_COL_INVALID)
+					continue;
+
+				if (!nulls[op_idx])
+				{
+					PgStat_Counter time =
+						bktype_stats->times[io_obj][io_context][io_op];
+
+					values[time_idx] = Float8GetDatum(pg_stat_us_to_ms(time));
+				}
+				else
+					nulls[time_idx] = true;
+			}
+
+			tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+								 values, nulls);
+		}
+	}
+
+	return (Datum) 0;
+}
+
 /*
  * Returns statistics of WAL activity
  */
@@ -1779,6 +1927,27 @@ pg_stat_reset_single_function_counters(PG_FUNCTION_ARGS)
 	PG_RETURN_VOID();
 }
 
+Datum
+pg_stat_reset_backend_stats(PG_FUNCTION_ARGS)
+{
+	PGPROC	   *proc;
+	int			backend_pid = PG_GETARG_INT32(0);
+
+	proc = BackendPidGetProc(backend_pid);
+
+	/*
+	 * Could be an auxiliary process but would not report any stats due to
+	 * pgstat_tracks_backend_bktype() anyway. So don't need an extra call to
+	 * AuxiliaryPidGetProc().
+	 */
+	if (!proc)
+		PG_RETURN_VOID();
+
+	pgstat_reset(PGSTAT_KIND_BACKEND, InvalidOid, GetNumberFromPGProc(proc));
+
+	PG_RETURN_VOID();
+}
+
 /* Reset SLRU counters (a specific one or all of them). */
 Datum
 pg_stat_reset_slru(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 0f22c21723..437157ffa3 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5913,6 +5913,15 @@
   proargnames => '{backend_type,object,context,reads,read_time,writes,write_time,writebacks,writeback_time,extends,extend_time,op_bytes,hits,evictions,reuses,fsyncs,fsync_time,stats_reset}',
   prosrc => 'pg_stat_get_io' },
 
+{ oid => '8806', descr => 'statistics: backend IO statistics',
+  proname => 'pg_stat_get_backend_io', prorows => '5', proretset => 't',
+  provolatile => 'v', proparallel => 'r', prorettype => 'record',
+  proargtypes => 'int4',
+  proallargtypes => '{int4,text,text,text,int8,float8,int8,float8,int8,float8,int8,float8,int8,int8,int8,int8,int8,float8,timestamptz}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{backend_pid,backend_type,object,context,reads,read_time,writes,write_time,writebacks,writeback_time,extends,extend_time,op_bytes,hits,evictions,reuses,fsyncs,fsync_time,stats_reset}',
+  prosrc => 'pg_stat_get_backend_io' },
+
 { oid => '1136', descr => 'statistics: information about WAL activity',
   proname => 'pg_stat_get_wal', proisstrict => 'f', provolatile => 's',
   proparallel => 'r', prorettype => 'record', proargtypes => '',
@@ -6052,6 +6061,11 @@
   proname => 'pg_stat_reset_single_function_counters', provolatile => 'v',
   prorettype => 'void', proargtypes => 'oid',
   prosrc => 'pg_stat_reset_single_function_counters' },
+{ oid => '9987',
+  descr => 'statistics: reset statistics for a single backend',
+  proname => 'pg_stat_reset_backend_stats', provolatile => 'v',
+  prorettype => 'void', proargtypes => 'int4',
+  prosrc => 'pg_stat_reset_backend_stats' },
 { oid => '2307',
   descr => 'statistics: reset collected statistics for a single SLRU',
   proname => 'pg_stat_reset_slru', proisstrict => 'f', provolatile => 'v',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index ebfeef2f46..479773cfd2 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -49,14 +49,15 @@
 #define PGSTAT_KIND_FUNCTION	3	/* per-function statistics */
 #define PGSTAT_KIND_REPLSLOT	4	/* per-slot statistics */
 #define PGSTAT_KIND_SUBSCRIPTION	5	/* per-subscription statistics */
+#define PGSTAT_KIND_BACKEND	6	/* per-backend statistics */
 
 /* stats for fixed-numbered objects */
-#define PGSTAT_KIND_ARCHIVER	6
-#define PGSTAT_KIND_BGWRITER	7
-#define PGSTAT_KIND_CHECKPOINTER	8
-#define PGSTAT_KIND_IO	9
-#define PGSTAT_KIND_SLRU	10
-#define PGSTAT_KIND_WAL	11
+#define PGSTAT_KIND_ARCHIVER	7
+#define PGSTAT_KIND_BGWRITER	8
+#define PGSTAT_KIND_CHECKPOINTER	9
+#define PGSTAT_KIND_IO	10
+#define PGSTAT_KIND_SLRU	11
+#define PGSTAT_KIND_WAL	12
 
 #define PGSTAT_KIND_BUILTIN_MIN PGSTAT_KIND_DATABASE
 #define PGSTAT_KIND_BUILTIN_MAX PGSTAT_KIND_WAL
@@ -362,12 +363,23 @@ typedef struct PgStat_BktypeIO
 	PgStat_Counter times[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
 } PgStat_BktypeIO;
 
+typedef struct PgStat_BackendPendingIO
+{
+	PgStat_Counter counts[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
+	instr_time	pending_times[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
+} PgStat_BackendPendingIO;
+
 typedef struct PgStat_IO
 {
 	TimestampTz stat_reset_timestamp;
 	PgStat_BktypeIO stats[BACKEND_NUM_TYPES];
 } PgStat_IO;
 
+typedef struct PgStat_Backend
+{
+	TimestampTz stat_reset_timestamp;
+	PgStat_BktypeIO stats;
+} PgStat_Backend;
 
 typedef struct PgStat_StatDBEntry
 {
@@ -549,6 +561,13 @@ extern bool pgstat_have_entry(PgStat_Kind kind, Oid dboid, uint64 objid);
 extern void pgstat_report_archiver(const char *xlog, bool failed);
 extern PgStat_ArchiverStats *pgstat_fetch_stat_archiver(void);
 
+/*
+ * Functions in pgstat_backend.c
+ */
+
+extern PgStat_Backend *pgstat_fetch_proc_stat_io(ProcNumber procNumber);
+extern bool pgstat_tracks_backend_bktype(BackendType bktype);
+extern void pgstat_create_backend_stat(ProcNumber procnum);
 
 /*
  * Functions in pgstat_bgwriter.c
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 7338bc1e28..811ed9b005 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -450,6 +450,11 @@ typedef struct PgStatShared_ReplSlot
 	PgStat_StatReplSlotEntry stats;
 } PgStatShared_ReplSlot;
 
+typedef struct PgStatShared_Backend
+{
+	PgStatShared_Common header;
+	PgStat_Backend stats;
+} PgStatShared_Backend;
 
 /*
  * Central shared memory entry for the cumulative stats system.
@@ -604,6 +609,15 @@ extern void pgstat_archiver_init_shmem_cb(void *stats);
 extern void pgstat_archiver_reset_all_cb(TimestampTz ts);
 extern void pgstat_archiver_snapshot_cb(void);
 
+/*
+ * Functions in pgstat_backend.c
+ */
+
+extern void pgstat_flush_backend(bool nowait);
+
+extern PgStat_BackendPendingIO *pgstat_prep_backend_pending(ProcNumber procnum);
+extern bool pgstat_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait);
+extern void pgstat_backend_reset_timestamp_cb(PgStatShared_Common *header, TimestampTz ts);
 
 /*
  * Functions in pgstat_bgwriter.c
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 56771f83ed..3447e7b75d 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1249,7 +1249,7 @@ SELECT pg_stat_get_subscription_stats(NULL);
  
 (1 row)
 
--- Test that the following operations are tracked in pg_stat_io:
+-- Test that the following operations are tracked in pg_stat_io and in backend stats:
 -- - reads of target blocks into shared buffers
 -- - writes of shared buffers to permanent storage
 -- - extends of relations using shared buffers
@@ -1259,11 +1259,19 @@ SELECT pg_stat_get_subscription_stats(NULL);
 -- be sure of the state of shared buffers at the point the test is run.
 -- Create a regular table and insert some data to generate IOCONTEXT_NORMAL
 -- extends.
+SELECT pid AS checkpointer_pid FROM pg_stat_activity
+  WHERE backend_type = 'checkpointer' \gset
 SELECT sum(extends) AS io_sum_shared_before_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT sum(extends) AS my_io_sum_shared_before_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_stat_io
   WHERE object = 'relation' \gset io_sum_shared_before_
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE object = 'relation' \gset my_io_sum_shared_before_
 CREATE TABLE test_io_shared(a int);
 INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
 SELECT pg_stat_force_next_flush();
@@ -1280,8 +1288,17 @@ SELECT :io_sum_shared_after_extends > :io_sum_shared_before_extends;
  t
 (1 row)
 
+SELECT sum(extends) AS my_io_sum_shared_after_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
+SELECT :my_io_sum_shared_after_extends > :my_io_sum_shared_before_extends;
+ ?column? 
+----------
+ t
+(1 row)
+
 -- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
--- and fsyncs.
+-- and fsyncs in the global stats (not for the backend).
 -- See comment above for rationale for two explicit CHECKPOINTs.
 CHECKPOINT;
 CHECKPOINT;
@@ -1301,6 +1318,31 @@ SELECT current_setting('fsync') = 'off'
  t
 (1 row)
 
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE object = 'relation' \gset my_io_sum_shared_after_
+SELECT :my_io_sum_shared_after_writes >= :my_io_sum_shared_before_writes;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT current_setting('fsync') = 'off'
+  OR (:my_io_sum_shared_after_fsyncs = :my_io_sum_shared_before_fsyncs
+      AND :my_io_sum_shared_after_fsyncs= 0);
+ ?column? 
+----------
+ t
+(1 row)
+
+-- Don't return any rows if querying other backend's stats that are excluded
+-- from the backend stats collection (like the checkpointer).
+SELECT count(1) = 0 FROM pg_stat_get_backend_io(:checkpointer_pid);
+ ?column? 
+----------
+ t
+(1 row)
+
 -- Change the tablespace so that the table is rewritten directly, then SELECT
 -- from it to cause it to be read back into shared buffers.
 SELECT sum(reads) AS io_sum_shared_before_reads
@@ -1521,6 +1563,8 @@ SELECT pg_stat_have_stats('io', 0, 0);
 
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_pre_reset
   FROM pg_stat_io \gset
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_pre_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
 SELECT pg_stat_reset_shared('io');
  pg_stat_reset_shared 
 ----------------------
@@ -1535,6 +1579,30 @@ SELECT :io_stats_post_reset < :io_stats_pre_reset;
  t
 (1 row)
 
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
+-- pg_stat_reset_shared() did not reset backend IO stats
+SELECT :my_io_stats_pre_reset <= :my_io_stats_post_reset;
+ ?column? 
+----------
+ t
+(1 row)
+
+-- but pg_stat_reset_backend_stats() does
+SELECT pg_stat_reset_backend_stats(pg_backend_pid());
+ pg_stat_reset_backend_stats 
+-----------------------------
+ 
+(1 row)
+
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_backend_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
+SELECT :my_io_stats_pre_reset > :my_io_stats_post_backend_reset;
+ ?column? 
+----------
+ t
+(1 row)
+
 -- test BRIN index doesn't block HOT update
 CREATE TABLE brin_hot (
   id  integer PRIMARY KEY,
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 7147cc2f89..9c925005be 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -595,7 +595,7 @@ SELECT pg_stat_get_replication_slot(NULL);
 SELECT pg_stat_get_subscription_stats(NULL);
 
 
--- Test that the following operations are tracked in pg_stat_io:
+-- Test that the following operations are tracked in pg_stat_io and in backend stats:
 -- - reads of target blocks into shared buffers
 -- - writes of shared buffers to permanent storage
 -- - extends of relations using shared buffers
@@ -607,20 +607,32 @@ SELECT pg_stat_get_subscription_stats(NULL);
 
 -- Create a regular table and insert some data to generate IOCONTEXT_NORMAL
 -- extends.
+SELECT pid AS checkpointer_pid FROM pg_stat_activity
+  WHERE backend_type = 'checkpointer' \gset
 SELECT sum(extends) AS io_sum_shared_before_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT sum(extends) AS my_io_sum_shared_before_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_stat_io
   WHERE object = 'relation' \gset io_sum_shared_before_
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE object = 'relation' \gset my_io_sum_shared_before_
 CREATE TABLE test_io_shared(a int);
 INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
 SELECT pg_stat_force_next_flush();
 SELECT sum(extends) AS io_sum_shared_after_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
 SELECT :io_sum_shared_after_extends > :io_sum_shared_before_extends;
+SELECT sum(extends) AS my_io_sum_shared_after_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
+SELECT :my_io_sum_shared_after_extends > :my_io_sum_shared_before_extends;
 
 -- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
--- and fsyncs.
+-- and fsyncs in the global stats (not for the backend).
 -- See comment above for rationale for two explicit CHECKPOINTs.
 CHECKPOINT;
 CHECKPOINT;
@@ -630,6 +642,17 @@ SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
 SELECT :io_sum_shared_after_writes > :io_sum_shared_before_writes;
 SELECT current_setting('fsync') = 'off'
   OR :io_sum_shared_after_fsyncs > :io_sum_shared_before_fsyncs;
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE object = 'relation' \gset my_io_sum_shared_after_
+SELECT :my_io_sum_shared_after_writes >= :my_io_sum_shared_before_writes;
+SELECT current_setting('fsync') = 'off'
+  OR (:my_io_sum_shared_after_fsyncs = :my_io_sum_shared_before_fsyncs
+      AND :my_io_sum_shared_after_fsyncs= 0);
+
+-- Don't return any rows if querying other backend's stats that are excluded
+-- from the backend stats collection (like the checkpointer).
+SELECT count(1) = 0 FROM pg_stat_get_backend_io(:checkpointer_pid);
 
 -- Change the tablespace so that the table is rewritten directly, then SELECT
 -- from it to cause it to be read back into shared buffers.
@@ -762,10 +785,21 @@ SELECT :io_sum_bulkwrite_strategy_extends_after > :io_sum_bulkwrite_strategy_ext
 SELECT pg_stat_have_stats('io', 0, 0);
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_pre_reset
   FROM pg_stat_io \gset
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_pre_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
 SELECT pg_stat_reset_shared('io');
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_post_reset
   FROM pg_stat_io \gset
 SELECT :io_stats_post_reset < :io_stats_pre_reset;
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
+-- pg_stat_reset_shared() did not reset backend IO stats
+SELECT :my_io_stats_pre_reset <= :my_io_stats_post_reset;
+-- but pg_stat_reset_backend_stats() does
+SELECT pg_stat_reset_backend_stats(pg_backend_pid());
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_backend_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
+SELECT :my_io_stats_pre_reset > :my_io_stats_post_backend_reset;
 
 
 -- test BRIN index doesn't block HOT update
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index ce33e55bf1..398dd92527 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2121,6 +2121,7 @@ PgFdwSamplingMethod
 PgFdwScanState
 PgIfAddrCallback
 PgStatShared_Archiver
+PgStatShared_Backend
 PgStatShared_BgWriter
 PgStatShared_Checkpointer
 PgStatShared_Common
@@ -2136,6 +2137,8 @@ PgStatShared_SLRU
 PgStatShared_Subscription
 PgStatShared_Wal
 PgStat_ArchiverStats
+PgStat_Backend
+PgStat_BackendPendingIO
 PgStat_BackendSubEntry
 PgStat_BgWriterStats
 PgStat_BktypeIO
-- 
2.34.1

#56 Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#55)
2 attachment(s)
Re: per backend I/O statistics

On Mon, Dec 16, 2024 at 03:42:04PM +0000, Bertrand Drouvot wrote:

It's not necessary per se, but it ensures that pg_stat_get_backend_io()
does not return rows full of "invalid" combinations.

Okay. I am not planning to fight more over this point.

I'm not sure that would be possible for a PGSTAT_KIND_BACKEND kind to return
false here but I agree that's better to call pgstat_request_entry_refs_gc()
if that's the case. Done in v9 attached.

It seems to me that it could be possible if a different backend keeps
a reference to this entry.

Perhaps there's an argument for an entirely separate
callback that would run before pgstat is plugged off, like a new
before_shmem_exit() callback registered after the entry is created?

As ProcKill() is run in shmem_exit() (and so after before_shmem_exit()),
I think the way we currently drop the backend stats entry is fine (unless
I am missing something about your concern).

Looking closely at this one, I think that you're right as long as the
shutdown callback is registered before the entry is created. So I'm
OK because the drop is a no-op if the entry cannot be found, and we
would not try a pgstat_request_entry_refs_gc().

Anyway, I have put my hands on v9.  A couple of notes while hacking
through it.  See v9-0002 for the parts I have modified; it applies on
top of your v9-0001.

You may have noticed fee2b3ea2ecdg, which led me to fix two comments,
as these paths are not related only to database objects.

Added more documentation, tweaked the comments quite a bit and the docs
a bit less (no need for the two linkends), and applied an indentation
run.

pg_stat_get_backend_io() can be simpler, and does not need the loop
with the extra LocalPgBackendStatus.  It is possible to call
pgstat_get_beentry_by_proc_number() once and retrieve the PID and the
bktype for the two sanity checks we want to do on them.
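
Roughly, something like this inside pg_stat_get_backend_io() should be
enough (only a sketch to illustrate the idea; see v9-0002 for the
actual change):

	PgBackendStatus *beentry;

	beentry = pgstat_get_beentry_by_proc_number(procNumber);

	/* Backend gone, or the proc number reused by a different PID? */
	if (!beentry || beentry->st_procpid != pid)
		return (Datum) 0;

	bktype = beentry->st_backendType;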

s/pgstat_fetch_proc_stat_io/pgstat_fetch_stat_backend/.
s/pgstat_create_backend_stat/pgstat_create_backend/.
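
For clarity, the prototypes after the renames would become (arguments
unchanged from v9-0001):

extern PgStat_Backend *pgstat_fetch_stat_backend(ProcNumber procNumber);
extern void pgstat_create_backend(ProcNumber procnum);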

I've wondered for quite a while about PgStat_BackendPendingIO and
PgStat_PendingIO, and concluded that both should be defined in
pgstat.h, with the former defined in terms of the latter, to keep the
dependency between the two in one place.
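
Concretely, the idea is something like this in pgstat.h (a sketch; the
field layout is unchanged from v9-0001):

typedef struct PgStat_PendingIO
{
	PgStat_Counter counts[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
	instr_time	pending_times[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
} PgStat_PendingIO;

/* The per-backend pending entry uses the same layout. */
typedef PgStat_PendingIO PgStat_BackendPendingIO;
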
--
Michael

Attachments:

v9-0001-per-backend-I-O-statistics.patch (text/x-diff; charset=us-ascii)
From d0df4be6cd052fc91970ec99933268f9b55940b7 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 28 Oct 2024 12:50:32 +0000
Subject: [PATCH v9 1/2] per backend I/O statistics

While pg_stat_io provides cluster-wide I/O statistics, this commit adds the
ability to track and display per backend I/O statistics.

It adds a new statistics kind and 2 new functions:

- pg_stat_reset_backend_stats() to be able to reset the stats for a given
backend pid.
- pg_stat_get_backend_io() to retrieve I/O statistics for a given backend pid.

The new KIND is named PGSTAT_KIND_BACKEND as it could be used in the future
to store other statistics (than the I/O ones) per backend. The new KIND is
a variable-numbered one and has an automatic cap on the maximum number of
entries (as its hash key contains the proc number).

There is no need to write the per backend I/O stats to disk (no point to
see stats for backends that do not exist anymore after a re-start), so using
"write_to_file = false".

Note that per backend I/O statistics are not collected for the checkpointer,
the background writer, the startup process and the autovacuum launcher as those
are already visible in pg_stat_io and there is only one of those.

XXX: Bump catalog version needs to be done.
---
 src/include/catalog/pg_proc.dat              |  14 ++
 src/include/pgstat.h                         |  31 +++-
 src/include/utils/pgstat_internal.h          |  14 ++
 src/backend/catalog/system_functions.sql     |   2 +
 src/backend/utils/activity/Makefile          |   1 +
 src/backend/utils/activity/backend_status.c  |   4 +
 src/backend/utils/activity/meson.build       |   1 +
 src/backend/utils/activity/pgstat.c          |  21 +++
 src/backend/utils/activity/pgstat_backend.c  | 182 +++++++++++++++++++
 src/backend/utils/activity/pgstat_io.c       |  25 ++-
 src/backend/utils/activity/pgstat_relation.c |   2 +
 src/backend/utils/adt/pgstatfuncs.c          | 169 +++++++++++++++++
 src/test/regress/expected/stats.out          |  72 +++++++-
 src/test/regress/sql/stats.sql               |  38 +++-
 doc/src/sgml/config.sgml                     |   8 +-
 doc/src/sgml/monitoring.sgml                 |  37 ++++
 src/tools/pgindent/typedefs.list             |   3 +
 17 files changed, 604 insertions(+), 20 deletions(-)
 create mode 100644 src/backend/utils/activity/pgstat_backend.c

diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 0f22c21723..437157ffa3 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5913,6 +5913,15 @@
   proargnames => '{backend_type,object,context,reads,read_time,writes,write_time,writebacks,writeback_time,extends,extend_time,op_bytes,hits,evictions,reuses,fsyncs,fsync_time,stats_reset}',
   prosrc => 'pg_stat_get_io' },
 
+{ oid => '8806', descr => 'statistics: backend IO statistics',
+  proname => 'pg_stat_get_backend_io', prorows => '5', proretset => 't',
+  provolatile => 'v', proparallel => 'r', prorettype => 'record',
+  proargtypes => 'int4',
+  proallargtypes => '{int4,text,text,text,int8,float8,int8,float8,int8,float8,int8,float8,int8,int8,int8,int8,int8,float8,timestamptz}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{backend_pid,backend_type,object,context,reads,read_time,writes,write_time,writebacks,writeback_time,extends,extend_time,op_bytes,hits,evictions,reuses,fsyncs,fsync_time,stats_reset}',
+  prosrc => 'pg_stat_get_backend_io' },
+
 { oid => '1136', descr => 'statistics: information about WAL activity',
   proname => 'pg_stat_get_wal', proisstrict => 'f', provolatile => 's',
   proparallel => 'r', prorettype => 'record', proargtypes => '',
@@ -6052,6 +6061,11 @@
   proname => 'pg_stat_reset_single_function_counters', provolatile => 'v',
   prorettype => 'void', proargtypes => 'oid',
   prosrc => 'pg_stat_reset_single_function_counters' },
+{ oid => '9987',
+  descr => 'statistics: reset statistics for a single backend',
+  proname => 'pg_stat_reset_backend_stats', provolatile => 'v',
+  prorettype => 'void', proargtypes => 'int4',
+  prosrc => 'pg_stat_reset_backend_stats' },
 { oid => '2307',
   descr => 'statistics: reset collected statistics for a single SLRU',
   proname => 'pg_stat_reset_slru', proisstrict => 'f', provolatile => 'v',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index ebfeef2f46..479773cfd2 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -49,14 +49,15 @@
 #define PGSTAT_KIND_FUNCTION	3	/* per-function statistics */
 #define PGSTAT_KIND_REPLSLOT	4	/* per-slot statistics */
 #define PGSTAT_KIND_SUBSCRIPTION	5	/* per-subscription statistics */
+#define PGSTAT_KIND_BACKEND	6	/* per-backend statistics */
 
 /* stats for fixed-numbered objects */
-#define PGSTAT_KIND_ARCHIVER	6
-#define PGSTAT_KIND_BGWRITER	7
-#define PGSTAT_KIND_CHECKPOINTER	8
-#define PGSTAT_KIND_IO	9
-#define PGSTAT_KIND_SLRU	10
-#define PGSTAT_KIND_WAL	11
+#define PGSTAT_KIND_ARCHIVER	7
+#define PGSTAT_KIND_BGWRITER	8
+#define PGSTAT_KIND_CHECKPOINTER	9
+#define PGSTAT_KIND_IO	10
+#define PGSTAT_KIND_SLRU	11
+#define PGSTAT_KIND_WAL	12
 
 #define PGSTAT_KIND_BUILTIN_MIN PGSTAT_KIND_DATABASE
 #define PGSTAT_KIND_BUILTIN_MAX PGSTAT_KIND_WAL
@@ -362,12 +363,23 @@ typedef struct PgStat_BktypeIO
 	PgStat_Counter times[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
 } PgStat_BktypeIO;
 
+typedef struct PgStat_BackendPendingIO
+{
+	PgStat_Counter counts[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
+	instr_time	pending_times[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
+} PgStat_BackendPendingIO;
+
 typedef struct PgStat_IO
 {
 	TimestampTz stat_reset_timestamp;
 	PgStat_BktypeIO stats[BACKEND_NUM_TYPES];
 } PgStat_IO;
 
+typedef struct PgStat_Backend
+{
+	TimestampTz stat_reset_timestamp;
+	PgStat_BktypeIO stats;
+} PgStat_Backend;
 
 typedef struct PgStat_StatDBEntry
 {
@@ -549,6 +561,13 @@ extern bool pgstat_have_entry(PgStat_Kind kind, Oid dboid, uint64 objid);
 extern void pgstat_report_archiver(const char *xlog, bool failed);
 extern PgStat_ArchiverStats *pgstat_fetch_stat_archiver(void);
 
+/*
+ * Functions in pgstat_backend.c
+ */
+
+extern PgStat_Backend *pgstat_fetch_proc_stat_io(ProcNumber procNumber);
+extern bool pgstat_tracks_backend_bktype(BackendType bktype);
+extern void pgstat_create_backend_stat(ProcNumber procnum);
 
 /*
  * Functions in pgstat_bgwriter.c
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 7338bc1e28..811ed9b005 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -450,6 +450,11 @@ typedef struct PgStatShared_ReplSlot
 	PgStat_StatReplSlotEntry stats;
 } PgStatShared_ReplSlot;
 
+typedef struct PgStatShared_Backend
+{
+	PgStatShared_Common header;
+	PgStat_Backend stats;
+} PgStatShared_Backend;
 
 /*
  * Central shared memory entry for the cumulative stats system.
@@ -604,6 +609,15 @@ extern void pgstat_archiver_init_shmem_cb(void *stats);
 extern void pgstat_archiver_reset_all_cb(TimestampTz ts);
 extern void pgstat_archiver_snapshot_cb(void);
 
+/*
+ * Functions in pgstat_backend.c
+ */
+
+extern void pgstat_flush_backend(bool nowait);
+
+extern PgStat_BackendPendingIO *pgstat_prep_backend_pending(ProcNumber procnum);
+extern bool pgstat_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait);
+extern void pgstat_backend_reset_timestamp_cb(PgStatShared_Common *header, TimestampTz ts);
 
 /*
  * Functions in pgstat_bgwriter.c
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index c51dfca802..14eb99cd47 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -711,6 +711,8 @@ REVOKE EXECUTE ON FUNCTION pg_stat_reset_single_table_counters(oid) FROM public;
 
 REVOKE EXECUTE ON FUNCTION pg_stat_reset_single_function_counters(oid) FROM public;
 
+REVOKE EXECUTE ON FUNCTION pg_stat_reset_backend_stats(integer) FROM public;
+
 REVOKE EXECUTE ON FUNCTION pg_stat_reset_replication_slot(text) FROM public;
 
 REVOKE EXECUTE ON FUNCTION pg_stat_have_stats(text, oid, int8) FROM public;
diff --git a/src/backend/utils/activity/Makefile b/src/backend/utils/activity/Makefile
index b9fd66ea17..24b64a2742 100644
--- a/src/backend/utils/activity/Makefile
+++ b/src/backend/utils/activity/Makefile
@@ -20,6 +20,7 @@ OBJS = \
 	backend_status.o \
 	pgstat.o \
 	pgstat_archiver.o \
+	pgstat_backend.o \
 	pgstat_bgwriter.o \
 	pgstat_checkpointer.o \
 	pgstat_database.o \
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index 22c6dc378c..e9c3b87ac2 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -426,6 +426,10 @@ pgstat_bestart(void)
 
 	PGSTAT_END_WRITE_ACTIVITY(vbeentry);
 
+	/* Create the backend statistics entry */
+	if (pgstat_tracks_backend_bktype(MyBackendType))
+		pgstat_create_backend_stat(MyProcNumber);
+
 	/* Update app name to current GUC setting */
 	if (application_name)
 		pgstat_report_appname(application_name);
diff --git a/src/backend/utils/activity/meson.build b/src/backend/utils/activity/meson.build
index f73c22905c..380d3dd70c 100644
--- a/src/backend/utils/activity/meson.build
+++ b/src/backend/utils/activity/meson.build
@@ -5,6 +5,7 @@ backend_sources += files(
   'backend_status.c',
   'pgstat.c',
   'pgstat_archiver.c',
+  'pgstat_backend.c',
   'pgstat_bgwriter.c',
   'pgstat_checkpointer.c',
   'pgstat_database.c',
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index b4e357c8a4..b72c779b2c 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -77,6 +77,7 @@
  *
  * Each statistics kind is handled in a dedicated file:
  * - pgstat_archiver.c
+ * - pgstat_backend.c
  * - pgstat_bgwriter.c
  * - pgstat_checkpointer.c
  * - pgstat_database.c
@@ -358,6 +359,22 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.reset_timestamp_cb = pgstat_subscription_reset_timestamp_cb,
 	},
 
+	[PGSTAT_KIND_BACKEND] = {
+		.name = "backend",
+
+		.fixed_amount = false,
+		.write_to_file = false,
+
+		.accessed_across_databases = true,
+
+		.shared_size = sizeof(PgStatShared_Backend),
+		.shared_data_off = offsetof(PgStatShared_Backend, stats),
+		.shared_data_len = sizeof(((PgStatShared_Backend *) 0)->stats),
+		.pending_size = sizeof(PgStat_BackendPendingIO),
+
+		.flush_pending_cb = pgstat_backend_flush_cb,
+		.reset_timestamp_cb = pgstat_backend_reset_timestamp_cb,
+	},
 
 	/* stats for fixed-numbered (mostly 1) objects */
 
@@ -602,6 +619,10 @@ pgstat_shutdown_hook(int code, Datum arg)
 	Assert(dlist_is_empty(&pgStatPending));
 	dlist_init(&pgStatPending);
 
+	/* drop the backend stats entry */
+	if (!pgstat_drop_entry(PGSTAT_KIND_BACKEND, InvalidOid, MyProcNumber))
+		pgstat_request_entry_refs_gc();
+
 	pgstat_detach_shmem();
 
 #ifdef USE_ASSERT_CHECKING
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
new file mode 100644
index 0000000000..e2d83024c2
--- /dev/null
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -0,0 +1,182 @@
+/* -------------------------------------------------------------------------
+ *
+ * pgstat_backend.c
+ *	  Implementation of backend statistics.
+ *
+ * This file contains the implementation of backend statistics. It is kept
+ * separate from pgstat.c to enforce the line between the statistics access /
+ * storage implementation and the details about individual types of statistics.
+ *
+ * Copyright (c) 2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/utils/activity/pgstat_backend.c
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "utils/pgstat_internal.h"
+
+/*
+ * Returns backend's IO stats.
+ */
+PgStat_Backend *
+pgstat_fetch_proc_stat_io(ProcNumber procNumber)
+{
+	PgStat_Backend *backend_entry;
+
+	backend_entry = (PgStat_Backend *) pgstat_fetch_entry(PGSTAT_KIND_BACKEND,
+														  InvalidOid, procNumber);
+
+	return backend_entry;
+}
+
+/*
+ * Flush out locally pending backend statistics
+ *
+ * If no stats have been recorded, this function returns false.
+ */
+bool
+pgstat_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
+{
+	PgStatShared_Backend *shbackendioent;
+	PgStat_BackendPendingIO *pendingent;
+	PgStat_BktypeIO *bktype_shstats;
+
+	if (!pgstat_lock_entry(entry_ref, nowait))
+		return false;
+
+	shbackendioent = (PgStatShared_Backend *) entry_ref->shared_stats;
+	bktype_shstats = &shbackendioent->stats.stats;
+	pendingent = (PgStat_BackendPendingIO *) entry_ref->pending;
+
+	for (int io_object = 0; io_object < IOOBJECT_NUM_TYPES; io_object++)
+	{
+		for (int io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
+		{
+			for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
+			{
+				instr_time	time;
+
+				bktype_shstats->counts[io_object][io_context][io_op] +=
+					pendingent->counts[io_object][io_context][io_op];
+
+				time = pendingent->pending_times[io_object][io_context][io_op];
+
+				bktype_shstats->times[io_object][io_context][io_op] +=
+					INSTR_TIME_GET_MICROSEC(time);
+			}
+		}
+	}
+
+	pgstat_unlock_entry(entry_ref);
+
+	return true;
+}
+
+/*
+ * Simpler wrapper of pgstat_backend_flush_cb()
+ */
+void
+pgstat_flush_backend(bool nowait)
+{
+	if (pgstat_tracks_backend_bktype(MyBackendType))
+	{
+		PgStat_EntryRef *entry_ref;
+
+		entry_ref = pgstat_get_entry_ref(PGSTAT_KIND_BACKEND, InvalidOid,
+										 MyProcNumber, false, NULL);
+		(void) pgstat_backend_flush_cb(entry_ref, nowait);
+	}
+}
+
+/*
+ * Create the backend statistics entry for procnum.
+ */
+void
+pgstat_create_backend_stat(ProcNumber procnum)
+{
+	PgStat_EntryRef *entry_ref;
+	PgStatShared_Backend *shstatent;
+
+	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_BACKEND, InvalidOid,
+										  procnum, NULL);
+
+	shstatent = (PgStatShared_Backend *) entry_ref->shared_stats;
+
+	/*
+	 * NB: need to accept that there might be stats from an older backend,
+	 * e.g. if we previously used this proc number.
+	 */
+	memset(&shstatent->stats, 0, sizeof(shstatent->stats));
+}
+
+/*
+ * Find or create a local PgStat_BackendPendingIO entry for procnum.
+ */
+PgStat_BackendPendingIO *
+pgstat_prep_backend_pending(ProcNumber procnum)
+{
+	PgStat_EntryRef *entry_ref;
+
+	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_BACKEND, InvalidOid,
+										  procnum, NULL);
+
+	return entry_ref->pending;
+}
+
+/*
+ * Backend statistics are not collected for all BackendTypes.
+ *
+ * The following BackendTypes do not participate in the backend stats
+ * subsystem:
+ * - The same and for the same reasons as in pgstat_tracks_io_bktype().
+ * - B_BG_WRITER, B_CHECKPOINTER, B_STARTUP and B_AUTOVAC_LAUNCHER because their
+ * I/O stats are already visible in pg_stat_io and there is only one of those.
+ *
+ * Function returns true if BackendType participates in the backend stats
+ * subsystem for IO and false if it does not.
+ *
+ * When adding a new BackendType, also consider adding relevant restrictions to
+ * pgstat_tracks_io_object() and pgstat_tracks_io_op().
+ */
+bool
+pgstat_tracks_backend_bktype(BackendType bktype)
+{
+	/*
+	 * List every type so that new backend types trigger a warning about
+	 * needing to adjust this switch.
+	 */
+	switch (bktype)
+	{
+		case B_INVALID:
+		case B_AUTOVAC_LAUNCHER:
+		case B_DEAD_END_BACKEND:
+		case B_ARCHIVER:
+		case B_LOGGER:
+		case B_WAL_RECEIVER:
+		case B_WAL_WRITER:
+		case B_WAL_SUMMARIZER:
+		case B_BG_WRITER:
+		case B_CHECKPOINTER:
+		case B_STARTUP:
+			return false;
+
+		case B_AUTOVAC_WORKER:
+		case B_BACKEND:
+		case B_BG_WORKER:
+		case B_STANDALONE_BACKEND:
+		case B_SLOTSYNC_WORKER:
+		case B_WAL_SENDER:
+			return true;
+	}
+
+	return false;
+}
+
+void
+pgstat_backend_reset_timestamp_cb(PgStatShared_Common *header, TimestampTz ts)
+{
+	((PgStatShared_Backend *) header)->stats.stat_reset_timestamp = ts;
+}
diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index f9883af2b3..e0c206a453 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -20,13 +20,7 @@
 #include "storage/bufmgr.h"
 #include "utils/pgstat_internal.h"
 
-
-typedef struct PgStat_PendingIO
-{
-	PgStat_Counter counts[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
-	instr_time	pending_times[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
-} PgStat_PendingIO;
-
+typedef PgStat_BackendPendingIO PgStat_PendingIO;
 
 static PgStat_PendingIO PendingIOStats;
 static bool have_iostats = false;
@@ -87,6 +81,14 @@ pgstat_count_io_op_n(IOObject io_object, IOContext io_context, IOOp io_op, uint3
 	Assert((unsigned int) io_op < IOOP_NUM_TYPES);
 	Assert(pgstat_tracks_io_op(MyBackendType, io_object, io_context, io_op));
 
+	if (pgstat_tracks_backend_bktype(MyBackendType))
+	{
+		PgStat_PendingIO *entry_ref;
+
+		entry_ref = pgstat_prep_backend_pending(MyProcNumber);
+		entry_ref->counts[io_object][io_context][io_op] += cnt;
+	}
+
 	PendingIOStats.counts[io_object][io_context][io_op] += cnt;
 
 	have_iostats = true;
@@ -148,6 +150,15 @@ pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op,
 
 		INSTR_TIME_ADD(PendingIOStats.pending_times[io_object][io_context][io_op],
 					   io_time);
+
+		if (pgstat_tracks_backend_bktype(MyBackendType))
+		{
+			PgStat_PendingIO *entry_ref;
+
+			entry_ref = pgstat_prep_backend_pending(MyProcNumber);
+			INSTR_TIME_ADD(entry_ref->pending_times[io_object][io_context][io_op],
+						   io_time);
+		}
 	}
 
 	pgstat_count_io_op_n(io_object, io_context, io_op, cnt);
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index faba8b64d2..85e65557bb 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -264,6 +264,7 @@ pgstat_report_vacuum(Oid tableoid, bool shared,
 	 * VACUUM command has processed all tables and committed.
 	 */
 	pgstat_flush_io(false);
+	pgstat_flush_backend(false);
 }
 
 /*
@@ -350,6 +351,7 @@ pgstat_report_analyze(Relation rel,
 
 	/* see pgstat_report_vacuum() */
 	pgstat_flush_io(false);
+	pgstat_flush_backend(false);
 }
 
 /*
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index cdf37403e9..b939551d36 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1474,6 +1474,154 @@ pg_stat_get_io(PG_FUNCTION_ARGS)
 	return (Datum) 0;
 }
 
+Datum
+pg_stat_get_backend_io(PG_FUNCTION_ARGS)
+{
+	ReturnSetInfo *rsinfo;
+	PgStat_Backend *backend_stats;
+	Datum		bktype_desc;
+	PgStat_BktypeIO *bktype_stats;
+	BackendType bktype;
+	Datum		reset_time;
+	int			num_backends = pgstat_fetch_stat_numbackends();
+	int			curr_backend;
+	int			pid;
+	PGPROC	   *proc;
+	ProcNumber	procNumber;
+
+	InitMaterializedSRF(fcinfo, 0);
+	rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+	pid = PG_GETARG_INT32(0);
+	proc = BackendPidGetProc(pid);
+
+	/*
+	 * Could be an auxiliary process but would not report any stats due to
+	 * pgstat_tracks_backend_bktype() anyway. So don't need an extra call to
+	 * AuxiliaryPidGetProc().
+	 */
+	if (!proc)
+		return (Datum) 0;
+
+	procNumber = GetNumberFromPGProc(proc);
+	backend_stats = pgstat_fetch_proc_stat_io(procNumber);
+
+	if (!backend_stats)
+		return (Datum) 0;
+
+	bktype = B_INVALID;
+
+	/* Look for the backend type */
+	for (curr_backend = 1; curr_backend <= num_backends; curr_backend++)
+	{
+		LocalPgBackendStatus *local_beentry;
+		PgBackendStatus *beentry;
+
+		/* Get the next one in the list */
+		local_beentry = pgstat_get_local_beentry_by_index(curr_backend);
+		beentry = &local_beentry->backendStatus;
+
+		/* Looking for specific PID, ignore all the others */
+		if (beentry->st_procpid != pid)
+			continue;
+
+		bktype = beentry->st_backendType;
+		break;
+	}
+
+	/* Backend is gone */
+	if (bktype == B_INVALID)
+		return (Datum) 0;
+
+	bktype_desc = CStringGetTextDatum(GetBackendTypeDesc(bktype));
+	bktype_stats = &backend_stats->stats;
+	reset_time = TimestampTzGetDatum(backend_stats->stat_reset_timestamp);
+
+	/*
+	 * In Assert builds, we can afford an extra loop through all of the
+	 * counters checking that only expected stats are non-zero, since it keeps
+	 * the non-Assert code cleaner.
+	 */
+	Assert(pgstat_bktype_io_stats_valid(bktype_stats, bktype));
+
+	for (int io_obj = 0; io_obj < IOOBJECT_NUM_TYPES; io_obj++)
+	{
+		const char *obj_name = pgstat_get_io_object_name(io_obj);
+
+		for (int io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
+		{
+			const char *context_name = pgstat_get_io_context_name(io_context);
+
+			Datum		values[IO_NUM_COLUMNS] = {0};
+			bool		nulls[IO_NUM_COLUMNS] = {0};
+
+			/*
+			 * Some combinations of BackendType, IOObject, and IOContext are
+			 * not valid for any type of IOOp. In such cases, omit the entire
+			 * row from the view.
+			 */
+			if (!pgstat_tracks_io_object(bktype, io_obj, io_context))
+				continue;
+
+			values[IO_COL_BACKEND_TYPE] = bktype_desc;
+			values[IO_COL_CONTEXT] = CStringGetTextDatum(context_name);
+			values[IO_COL_OBJECT] = CStringGetTextDatum(obj_name);
+			if (backend_stats->stat_reset_timestamp != 0)
+				values[IO_COL_RESET_TIME] = reset_time;
+			else
+				nulls[IO_COL_RESET_TIME] = true;
+
+			/*
+			 * Hard-code this to the value of BLCKSZ for now. Future values
+			 * could include XLOG_BLCKSZ, once WAL IO is tracked, and constant
+			 * multipliers, once non-block-oriented IO (e.g. temporary file
+			 * IO) is tracked.
+			 */
+			values[IO_COL_CONVERSION] = Int64GetDatum(BLCKSZ);
+
+			for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
+			{
+				int			op_idx = pgstat_get_io_op_index(io_op);
+				int			time_idx = pgstat_get_io_time_index(io_op);
+
+				/*
+				 * Some combinations of BackendType and IOOp, of IOContext and
+				 * IOOp, and of IOObject and IOOp are not tracked. Set these
+				 * cells in the view NULL.
+				 */
+				if (pgstat_tracks_io_op(bktype, io_obj, io_context, io_op))
+				{
+					PgStat_Counter count =
+						bktype_stats->counts[io_obj][io_context][io_op];
+
+					values[op_idx] = Int64GetDatum(count);
+				}
+				else
+					nulls[op_idx] = true;
+
+				/* not every operation is timed */
+				if (time_idx == IO_COL_INVALID)
+					continue;
+
+				if (!nulls[op_idx])
+				{
+					PgStat_Counter time =
+						bktype_stats->times[io_obj][io_context][io_op];
+
+					values[time_idx] = Float8GetDatum(pg_stat_us_to_ms(time));
+				}
+				else
+					nulls[time_idx] = true;
+			}
+
+			tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+								 values, nulls);
+		}
+	}
+
+	return (Datum) 0;
+}
+
 /*
  * Returns statistics of WAL activity
  */
@@ -1779,6 +1927,27 @@ pg_stat_reset_single_function_counters(PG_FUNCTION_ARGS)
 	PG_RETURN_VOID();
 }
 
+Datum
+pg_stat_reset_backend_stats(PG_FUNCTION_ARGS)
+{
+	PGPROC	   *proc;
+	int			backend_pid = PG_GETARG_INT32(0);
+
+	proc = BackendPidGetProc(backend_pid);
+
+	/*
+	 * Could be an auxiliary process but would not report any stats due to
+	 * pgstat_tracks_backend_bktype() anyway. So don't need an extra call to
+	 * AuxiliaryPidGetProc().
+	 */
+	if (!proc)
+		PG_RETURN_VOID();
+
+	pgstat_reset(PGSTAT_KIND_BACKEND, InvalidOid, GetNumberFromPGProc(proc));
+
+	PG_RETURN_VOID();
+}
+
 /* Reset SLRU counters (a specific one or all of them). */
 Datum
 pg_stat_reset_slru(PG_FUNCTION_ARGS)
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 56771f83ed..3447e7b75d 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1249,7 +1249,7 @@ SELECT pg_stat_get_subscription_stats(NULL);
  
 (1 row)
 
--- Test that the following operations are tracked in pg_stat_io:
+-- Test that the following operations are tracked in pg_stat_io and in backend stats:
 -- - reads of target blocks into shared buffers
 -- - writes of shared buffers to permanent storage
 -- - extends of relations using shared buffers
@@ -1259,11 +1259,19 @@ SELECT pg_stat_get_subscription_stats(NULL);
 -- be sure of the state of shared buffers at the point the test is run.
 -- Create a regular table and insert some data to generate IOCONTEXT_NORMAL
 -- extends.
+SELECT pid AS checkpointer_pid FROM pg_stat_activity
+  WHERE backend_type = 'checkpointer' \gset
 SELECT sum(extends) AS io_sum_shared_before_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT sum(extends) AS my_io_sum_shared_before_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_stat_io
   WHERE object = 'relation' \gset io_sum_shared_before_
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE object = 'relation' \gset my_io_sum_shared_before_
 CREATE TABLE test_io_shared(a int);
 INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
 SELECT pg_stat_force_next_flush();
@@ -1280,8 +1288,17 @@ SELECT :io_sum_shared_after_extends > :io_sum_shared_before_extends;
  t
 (1 row)
 
+SELECT sum(extends) AS my_io_sum_shared_after_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
+SELECT :my_io_sum_shared_after_extends > :my_io_sum_shared_before_extends;
+ ?column? 
+----------
+ t
+(1 row)
+
 -- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
--- and fsyncs.
+-- and fsyncs in the global stats (not for the backend).
 -- See comment above for rationale for two explicit CHECKPOINTs.
 CHECKPOINT;
 CHECKPOINT;
@@ -1301,6 +1318,31 @@ SELECT current_setting('fsync') = 'off'
  t
 (1 row)
 
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE object = 'relation' \gset my_io_sum_shared_after_
+SELECT :my_io_sum_shared_after_writes >= :my_io_sum_shared_before_writes;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT current_setting('fsync') = 'off'
+  OR (:my_io_sum_shared_after_fsyncs = :my_io_sum_shared_before_fsyncs
+      AND :my_io_sum_shared_after_fsyncs= 0);
+ ?column? 
+----------
+ t
+(1 row)
+
+-- Don't return any rows if querying other backend's stats that are excluded
+-- from the backend stats collection (like the checkpointer).
+SELECT count(1) = 0 FROM pg_stat_get_backend_io(:checkpointer_pid);
+ ?column? 
+----------
+ t
+(1 row)
+
 -- Change the tablespace so that the table is rewritten directly, then SELECT
 -- from it to cause it to be read back into shared buffers.
 SELECT sum(reads) AS io_sum_shared_before_reads
@@ -1521,6 +1563,8 @@ SELECT pg_stat_have_stats('io', 0, 0);
 
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_pre_reset
   FROM pg_stat_io \gset
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_pre_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
 SELECT pg_stat_reset_shared('io');
  pg_stat_reset_shared 
 ----------------------
@@ -1535,6 +1579,30 @@ SELECT :io_stats_post_reset < :io_stats_pre_reset;
  t
 (1 row)
 
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
+-- pg_stat_reset_shared() did not reset backend IO stats
+SELECT :my_io_stats_pre_reset <= :my_io_stats_post_reset;
+ ?column? 
+----------
+ t
+(1 row)
+
+-- but pg_stat_reset_backend_stats() does
+SELECT pg_stat_reset_backend_stats(pg_backend_pid());
+ pg_stat_reset_backend_stats 
+-----------------------------
+ 
+(1 row)
+
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_backend_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
+SELECT :my_io_stats_pre_reset > :my_io_stats_post_backend_reset;
+ ?column? 
+----------
+ t
+(1 row)
+
 -- test BRIN index doesn't block HOT update
 CREATE TABLE brin_hot (
   id  integer PRIMARY KEY,
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 7147cc2f89..9c925005be 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -595,7 +595,7 @@ SELECT pg_stat_get_replication_slot(NULL);
 SELECT pg_stat_get_subscription_stats(NULL);
 
 
--- Test that the following operations are tracked in pg_stat_io:
+-- Test that the following operations are tracked in pg_stat_io and in backend stats:
 -- - reads of target blocks into shared buffers
 -- - writes of shared buffers to permanent storage
 -- - extends of relations using shared buffers
@@ -607,20 +607,32 @@ SELECT pg_stat_get_subscription_stats(NULL);
 
 -- Create a regular table and insert some data to generate IOCONTEXT_NORMAL
 -- extends.
+SELECT pid AS checkpointer_pid FROM pg_stat_activity
+  WHERE backend_type = 'checkpointer' \gset
 SELECT sum(extends) AS io_sum_shared_before_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT sum(extends) AS my_io_sum_shared_before_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_stat_io
   WHERE object = 'relation' \gset io_sum_shared_before_
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE object = 'relation' \gset my_io_sum_shared_before_
 CREATE TABLE test_io_shared(a int);
 INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
 SELECT pg_stat_force_next_flush();
 SELECT sum(extends) AS io_sum_shared_after_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
 SELECT :io_sum_shared_after_extends > :io_sum_shared_before_extends;
+SELECT sum(extends) AS my_io_sum_shared_after_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
+SELECT :my_io_sum_shared_after_extends > :my_io_sum_shared_before_extends;
 
 -- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
--- and fsyncs.
+-- and fsyncs in the global stats (not for the backend).
 -- See comment above for rationale for two explicit CHECKPOINTs.
 CHECKPOINT;
 CHECKPOINT;
@@ -630,6 +642,17 @@ SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
 SELECT :io_sum_shared_after_writes > :io_sum_shared_before_writes;
 SELECT current_setting('fsync') = 'off'
   OR :io_sum_shared_after_fsyncs > :io_sum_shared_before_fsyncs;
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE object = 'relation' \gset my_io_sum_shared_after_
+SELECT :my_io_sum_shared_after_writes >= :my_io_sum_shared_before_writes;
+SELECT current_setting('fsync') = 'off'
+  OR (:my_io_sum_shared_after_fsyncs = :my_io_sum_shared_before_fsyncs
+      AND :my_io_sum_shared_after_fsyncs= 0);
+
+-- Don't return any rows if querying other backend's stats that are excluded
+-- from the backend stats collection (like the checkpointer).
+SELECT count(1) = 0 FROM pg_stat_get_backend_io(:checkpointer_pid);
 
 -- Change the tablespace so that the table is rewritten directly, then SELECT
 -- from it to cause it to be read back into shared buffers.
@@ -762,10 +785,21 @@ SELECT :io_sum_bulkwrite_strategy_extends_after > :io_sum_bulkwrite_strategy_ext
 SELECT pg_stat_have_stats('io', 0, 0);
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_pre_reset
   FROM pg_stat_io \gset
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_pre_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
 SELECT pg_stat_reset_shared('io');
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_post_reset
   FROM pg_stat_io \gset
 SELECT :io_stats_post_reset < :io_stats_pre_reset;
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
+-- pg_stat_reset_shared() did not reset backend IO stats
+SELECT :my_io_stats_pre_reset <= :my_io_stats_post_reset;
+-- but pg_stat_reset_backend_stats() does
+SELECT pg_stat_reset_backend_stats(pg_backend_pid());
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_backend_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
+SELECT :my_io_stats_pre_reset > :my_io_stats_post_backend_reset;
 
 
 -- test BRIN index doesn't block HOT update
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 24bd504c21..fbdd6ce574 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8403,9 +8403,11 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
         displayed in <link linkend="monitoring-pg-stat-database-view">
         <structname>pg_stat_database</structname></link>,
         <link linkend="monitoring-pg-stat-io-view">
-        <structname>pg_stat_io</structname></link>, in the output of
-        <xref linkend="sql-explain"/> when the <literal>BUFFERS</literal> option
-        is used, in the output of <xref linkend="sql-vacuum"/> when
+        <structname>pg_stat_io</structname></link>, in the output of the
+        <link linkend="pg-stat-get-backend-io">
+        <function>pg_stat_get_backend_io()</function></link> function, in the
+        output of <xref linkend="sql-explain"/> when the <literal>BUFFERS</literal>
+        option is used, in the output of <xref linkend="sql-vacuum"/> when
         the <literal>VERBOSE</literal> option is used, by autovacuum
         for auto-vacuums and auto-analyzes, when <xref
         linkend="guc-log-autovacuum-min-duration"/> is set and by
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 840d7f8161..4a9464da61 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4790,6 +4790,25 @@ description | Waiting for a newly initialized WAL file to reach durable storage
        </para></entry>
       </row>
 
+      <row>
+       <entry id="pg-stat-get-backend-io" role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_stat_get_backend_io</primary>
+        </indexterm>
+        <function>pg_stat_get_backend_io</function> ( <type>integer</type> )
+        <returnvalue>setof record</returnvalue>
+       </para>
+       <para>
+        Returns I/O statistics about the backend with the specified
+        process ID. The output fields are exactly the same as the ones in the
+        <link linkend="monitoring-pg-stat-io-view"> <structname>pg_stat_io</structname></link>
+        view. The function does not return I/O statistics for the checkpointer,
+        the background writer, the startup process and the autovacuum launcher
+        as they are already visible in the <link linkend="monitoring-pg-stat-io-view"> <structname>pg_stat_io</structname></link>
+        view and there is only one of those.
+       </para></entry>
+      </row>
+
       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <indexterm>
@@ -4971,6 +4990,24 @@ description | Waiting for a newly initialized WAL file to reach durable storage
        </para></entry>
       </row>
 
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_stat_reset_backend_stats</primary>
+        </indexterm>
+        <function>pg_stat_reset_backend_stats</function> ( <type>integer</type> )
+        <returnvalue>void</returnvalue>
+       </para>
+       <para>
+        Resets statistics for a single backend with the specified process ID
+        to zero.
+       </para>
+       <para>
+        This function is restricted to superusers by default, but other users
+        can be granted EXECUTE to run the function.
+       </para></entry>
+      </row>
+
       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <indexterm>
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index ce33e55bf1..398dd92527 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2121,6 +2121,7 @@ PgFdwSamplingMethod
 PgFdwScanState
 PgIfAddrCallback
 PgStatShared_Archiver
+PgStatShared_Backend
 PgStatShared_BgWriter
 PgStatShared_Checkpointer
 PgStatShared_Common
@@ -2136,6 +2137,8 @@ PgStatShared_SLRU
 PgStatShared_Subscription
 PgStatShared_Wal
 PgStat_ArchiverStats
+PgStat_Backend
+PgStat_BackendPendingIO
 PgStat_BackendSubEntry
 PgStat_BgWriterStats
 PgStat_BktypeIO
-- 
2.45.2

v9-0002-Tweaks-on-top-of-v9.patchtext/x-diff; charset=us-asciiDownload
From 8c7fd33e98ec958308979dba9cb7897787ddc48e Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Tue, 17 Dec 2024 15:15:23 +0900
Subject: [PATCH v9 2/2] Tweaks on top of v9

Other notes for commit:
Requires a bump of CATALOG_VERSION_NO.
Requires a bump of PGSTAT_FILE_FORMAT_ID.
---
 src/include/catalog/pg_proc.dat             |  3 +-
 src/include/pgstat.h                        | 11 +++--
 src/backend/utils/activity/backend_status.c |  2 +-
 src/backend/utils/activity/pgstat_backend.c | 38 ++++++++-------
 src/backend/utils/activity/pgstat_io.c      |  2 -
 src/backend/utils/adt/pgstatfuncs.c         | 53 +++++++++------------
 src/test/regress/expected/stats.out         | 28 +++++++----
 src/test/regress/sql/stats.sql              | 12 +++--
 doc/src/sgml/monitoring.sgml                | 10 ++--
 9 files changed, 86 insertions(+), 73 deletions(-)

diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 437157ffa3..2dcc2d42da 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6061,8 +6061,7 @@
   proname => 'pg_stat_reset_single_function_counters', provolatile => 'v',
   prorettype => 'void', proargtypes => 'oid',
   prosrc => 'pg_stat_reset_single_function_counters' },
-{ oid => '9987',
-  descr => 'statistics: reset statistics for a single backend',
+{ oid => '8807', descr => 'statistics: reset statistics for a single backend',
   proname => 'pg_stat_reset_backend_stats', provolatile => 'v',
   prorettype => 'void', proargtypes => 'int4',
   prosrc => 'pg_stat_reset_backend_stats' },
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 479773cfd2..56ed1b8bb3 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -363,11 +363,11 @@ typedef struct PgStat_BktypeIO
 	PgStat_Counter times[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
 } PgStat_BktypeIO;
 
-typedef struct PgStat_BackendPendingIO
+typedef struct PgStat_PendingIO
 {
 	PgStat_Counter counts[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
 	instr_time	pending_times[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
-} PgStat_BackendPendingIO;
+} PgStat_PendingIO;
 
 typedef struct PgStat_IO
 {
@@ -375,6 +375,9 @@ typedef struct PgStat_IO
 	PgStat_BktypeIO stats[BACKEND_NUM_TYPES];
 } PgStat_IO;
 
+/* Backend statistics store the same amount of IO data as PGSTAT_KIND_IO */
+typedef PgStat_PendingIO PgStat_BackendPendingIO;
+
 typedef struct PgStat_Backend
 {
 	TimestampTz stat_reset_timestamp;
@@ -565,9 +568,9 @@ extern PgStat_ArchiverStats *pgstat_fetch_stat_archiver(void);
  * Functions in pgstat_backend.c
  */
 
-extern PgStat_Backend *pgstat_fetch_proc_stat_io(ProcNumber procNumber);
+extern PgStat_Backend *pgstat_fetch_stat_backend(ProcNumber procNumber);
 extern bool pgstat_tracks_backend_bktype(BackendType bktype);
-extern void pgstat_create_backend_stat(ProcNumber procnum);
+extern void pgstat_create_backend(ProcNumber procnum);
 
 /*
  * Functions in pgstat_bgwriter.c
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index e9c3b87ac2..bf33e33a4e 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -428,7 +428,7 @@ pgstat_bestart(void)
 
 	/* Create the backend statistics entry */
 	if (pgstat_tracks_backend_bktype(MyBackendType))
-		pgstat_create_backend_stat(MyProcNumber);
+		pgstat_create_backend(MyProcNumber);
 
 	/* Update app name to current GUC setting */
 	if (application_name)
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
index e2d83024c2..6b2c9baa8c 100644
--- a/src/backend/utils/activity/pgstat_backend.c
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -3,11 +3,17 @@
  * pgstat_backend.c
  *	  Implementation of backend statistics.
  *
- * This file contains the implementation of backend statistics. It is kept
+ * This file contains the implementation of backend statistics.  It is kept
  * separate from pgstat.c to enforce the line between the statistics access /
- * storage implementation and the details about individual types of statistics.
+ * storage implementation and the details about individual types of
+ * statistics.
  *
- * Copyright (c) 2024, PostgreSQL Global Development Group
+ * This statistics kind uses a proc number as object ID for the hash table
+ * of pgstats.  Entries are created each time a process is spawned, and are
+ * dropped when the process exits.  These are not written to the pgstats file
+ * on disk.
+ *
+ * Copyright (c) 2001-2024, PostgreSQL Global Development Group
  *
  * IDENTIFICATION
  *	  src/backend/utils/activity/pgstat_backend.c
@@ -19,10 +25,10 @@
 #include "utils/pgstat_internal.h"
 
 /*
- * Returns backend's IO stats.
+ * Returns statistics of a backend by proc number.
  */
 PgStat_Backend *
-pgstat_fetch_proc_stat_io(ProcNumber procNumber)
+pgstat_fetch_stat_backend(ProcNumber procNumber)
 {
 	PgStat_Backend *backend_entry;
 
@@ -81,21 +87,21 @@ pgstat_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
 void
 pgstat_flush_backend(bool nowait)
 {
-	if (pgstat_tracks_backend_bktype(MyBackendType))
-	{
-		PgStat_EntryRef *entry_ref;
+	PgStat_EntryRef *entry_ref;
 
-		entry_ref = pgstat_get_entry_ref(PGSTAT_KIND_BACKEND, InvalidOid,
-										 MyProcNumber, false, NULL);
-		(void) pgstat_backend_flush_cb(entry_ref, nowait);
-	}
+	if (!pgstat_tracks_backend_bktype(MyBackendType))
+		return;
+
+	entry_ref = pgstat_get_entry_ref(PGSTAT_KIND_BACKEND, InvalidOid,
+									 MyProcNumber, false, NULL);
+	(void) pgstat_backend_flush_cb(entry_ref, nowait);
 }
 
 /*
- * Create the backend statistics entry for procnum.
+ * Create backend statistics entry for proc number.
  */
 void
-pgstat_create_backend_stat(ProcNumber procnum)
+pgstat_create_backend(ProcNumber procnum)
 {
 	PgStat_EntryRef *entry_ref;
 	PgStatShared_Backend *shstatent;
@@ -113,7 +119,7 @@ pgstat_create_backend_stat(ProcNumber procnum)
 }
 
 /*
- * Find or create a local PgStat_BackendPendingIO entry for procnum.
+ * Find or create a local PgStat_BackendPendingIO entry for proc number.
  */
 PgStat_BackendPendingIO *
 pgstat_prep_backend_pending(ProcNumber procnum)
@@ -136,7 +142,7 @@ pgstat_prep_backend_pending(ProcNumber procnum)
  * I/O stats are already visible in pg_stat_io and there is only one of those.
  *
  * Function returns true if BackendType participates in the backend stats
- * subsystem for IO and false if it does not.
+ * subsystem and false if it does not.
  *
  * When adding a new BackendType, also consider adding relevant restrictions to
  * pgstat_tracks_io_object() and pgstat_tracks_io_op().
diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index e0c206a453..011a3326da 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -20,8 +20,6 @@
 #include "storage/bufmgr.h"
 #include "utils/pgstat_internal.h"
 
-typedef PgStat_BackendPendingIO PgStat_PendingIO;
-
 static PgStat_PendingIO PendingIOStats;
 static bool have_iostats = false;
 
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index b939551d36..640e687604 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1474,20 +1474,22 @@ pg_stat_get_io(PG_FUNCTION_ARGS)
 	return (Datum) 0;
 }
 
+/*
+ * Returns I/O statistics for a backend with given PID.
+ */
 Datum
 pg_stat_get_backend_io(PG_FUNCTION_ARGS)
 {
 	ReturnSetInfo *rsinfo;
-	PgStat_Backend *backend_stats;
 	Datum		bktype_desc;
-	PgStat_BktypeIO *bktype_stats;
 	BackendType bktype;
 	Datum		reset_time;
-	int			num_backends = pgstat_fetch_stat_numbackends();
-	int			curr_backend;
 	int			pid;
 	PGPROC	   *proc;
 	ProcNumber	procNumber;
+	PgStat_Backend *backend_stats;
+	PgStat_BktypeIO *bktype_stats;
+	PgBackendStatus *beentry;
 
 	InitMaterializedSRF(fcinfo, 0);
 	rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
@@ -1496,40 +1498,28 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
 	proc = BackendPidGetProc(pid);
 
 	/*
-	 * Could be an auxiliary process but would not report any stats due to
-	 * pgstat_tracks_backend_bktype() anyway. So don't need an extra call to
-	 * AuxiliaryPidGetProc().
+	 * This could be an auxiliary process but these do not report backend
+	 * statistics due to pgstat_tracks_backend_bktype(), so there is no need
+	 * for an extra call to AuxiliaryPidGetProc().
 	 */
 	if (!proc)
 		return (Datum) 0;
 
 	procNumber = GetNumberFromPGProc(proc);
-	backend_stats = pgstat_fetch_proc_stat_io(procNumber);
+
+	backend_stats = pgstat_fetch_stat_backend(procNumber);
 
 	if (!backend_stats)
 		return (Datum) 0;
 
-	bktype = B_INVALID;
+	beentry = pgstat_get_beentry_by_proc_number(procNumber);
+	bktype = beentry->st_backendType;
 
-	/* Look for the backend type */
-	for (curr_backend = 1; curr_backend <= num_backends; curr_backend++)
-	{
-		LocalPgBackendStatus *local_beentry;
-		PgBackendStatus *beentry;
+	/* If PID does not match, leave */
+	if (beentry->st_procpid != pid)
+		return (Datum) 0;
 
-		/* Get the next one in the list */
-		local_beentry = pgstat_get_local_beentry_by_index(curr_backend);
-		beentry = &local_beentry->backendStatus;
-
-		/* Looking for specific PID, ignore all the others */
-		if (beentry->st_procpid != pid)
-			continue;
-
-		bktype = beentry->st_backendType;
-		break;
-	}
-
-	/* Backend is gone */
+	/* Backend may be gone, so recheck in case */
 	if (bktype == B_INVALID)
 		return (Datum) 0;
 
@@ -1927,6 +1917,9 @@ pg_stat_reset_single_function_counters(PG_FUNCTION_ARGS)
 	PG_RETURN_VOID();
 }
 
+/*
+ * Reset statistics of backend with given PID.
+ */
 Datum
 pg_stat_reset_backend_stats(PG_FUNCTION_ARGS)
 {
@@ -1936,9 +1929,9 @@ pg_stat_reset_backend_stats(PG_FUNCTION_ARGS)
 	proc = BackendPidGetProc(backend_pid);
 
 	/*
-	 * Could be an auxiliary process but would not report any stats due to
-	 * pgstat_tracks_backend_bktype() anyway. So don't need an extra call to
-	 * AuxiliaryPidGetProc().
+	 * This could be an auxiliary process but these do not report backend
+	 * statistics due to pgstat_tracks_backend_bktype(), so there is no need
+	 * for an extra call to AuxiliaryPidGetProc().
 	 */
 	if (!proc)
 		PG_RETURN_VOID();
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 3447e7b75d..150b6dcf74 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1249,7 +1249,8 @@ SELECT pg_stat_get_subscription_stats(NULL);
  
 (1 row)
 
--- Test that the following operations are tracked in pg_stat_io and in backend stats:
+-- Test that the following operations are tracked in pg_stat_io and in
+-- backend stats:
 -- - reads of target blocks into shared buffers
 -- - writes of shared buffers to permanent storage
 -- - extends of relations using shared buffers
@@ -1335,14 +1336,6 @@ SELECT current_setting('fsync') = 'off'
  t
 (1 row)
 
--- Don't return any rows if querying other backend's stats that are excluded
--- from the backend stats collection (like the checkpointer).
-SELECT count(1) = 0 FROM pg_stat_get_backend_io(:checkpointer_pid);
- ?column? 
-----------
- t
-(1 row)
-
 -- Change the tablespace so that the table is rewritten directly, then SELECT
 -- from it to cause it to be read back into shared buffers.
 SELECT sum(reads) AS io_sum_shared_before_reads
@@ -1603,6 +1596,23 @@ SELECT :my_io_stats_pre_reset > :my_io_stats_post_backend_reset;
  t
 (1 row)
 
+-- Check invalid input for pg_stat_get_backend_io()
+SELECT pg_stat_get_backend_io(NULL);
+ pg_stat_get_backend_io 
+------------------------
+(0 rows)
+
+SELECT pg_stat_get_backend_io(0);
+ pg_stat_get_backend_io 
+------------------------
+(0 rows)
+
+-- Auxiliary processes return no data.
+SELECT pg_stat_get_backend_io(:checkpointer_pid);
+ pg_stat_get_backend_io 
+------------------------
+(0 rows)
+
 -- test BRIN index doesn't block HOT update
 CREATE TABLE brin_hot (
   id  integer PRIMARY KEY,
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 9c925005be..1e7d0ff665 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -595,7 +595,8 @@ SELECT pg_stat_get_replication_slot(NULL);
 SELECT pg_stat_get_subscription_stats(NULL);
 
 
--- Test that the following operations are tracked in pg_stat_io and in backend stats:
+-- Test that the following operations are tracked in pg_stat_io and in
+-- backend stats:
 -- - reads of target blocks into shared buffers
 -- - writes of shared buffers to permanent storage
 -- - extends of relations using shared buffers
@@ -650,10 +651,6 @@ SELECT current_setting('fsync') = 'off'
   OR (:my_io_sum_shared_after_fsyncs = :my_io_sum_shared_before_fsyncs
       AND :my_io_sum_shared_after_fsyncs= 0);
 
--- Don't return any rows if querying other backend's stats that are excluded
--- from the backend stats collection (like the checkpointer).
-SELECT count(1) = 0 FROM pg_stat_get_backend_io(:checkpointer_pid);
-
 -- Change the tablespace so that the table is rewritten directly, then SELECT
 -- from it to cause it to be read back into shared buffers.
 SELECT sum(reads) AS io_sum_shared_before_reads
@@ -801,6 +798,11 @@ SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) +
   FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
 SELECT :my_io_stats_pre_reset > :my_io_stats_post_backend_reset;
 
+-- Check invalid input for pg_stat_get_backend_io()
+SELECT pg_stat_get_backend_io(NULL);
+SELECT pg_stat_get_backend_io(0);
+-- Auxiliary processes return no data.
+SELECT pg_stat_get_backend_io(:checkpointer_pid);
 
 -- test BRIN index doesn't block HOT update
 CREATE TABLE brin_hot (
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 4a9464da61..d0d176cc54 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4801,11 +4801,13 @@ description | Waiting for a newly initialized WAL file to reach durable storage
        <para>
         Returns I/O statistics about the backend with the specified
         process ID. The output fields are exactly the same as the ones in the
-        <link linkend="monitoring-pg-stat-io-view"> <structname>pg_stat_io</structname></link>
-        view. The function does not return I/O statistics for the checkpointer,
+        <structname>pg_stat_io</structname> view.
+       </para>
+       <para>
+        The function does not return I/O statistics for the checkpointer,
         the background writer, the startup process and the autovacuum launcher
-        as they are already visible in the <link linkend="monitoring-pg-stat-io-view"> <structname>pg_stat_io</structname></link>
-        view and there is only one of those.
+        as they are already visible in the <structname>pg_stat_io</structname>
+        view and there is only one of each.
        </para></entry>
       </row>
 
-- 
2.45.2

#57Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#56)
Re: per backend I/O statistics

Hi,

On Tue, Dec 17, 2024 at 03:26:02PM +0900, Michael Paquier wrote:

On Mon, Dec 16, 2024 at 03:42:04PM +0000, Bertrand Drouvot wrote:

Perhaps there's an argument for an entirely separate
callback that would run before pgstat is plugged off, like a new
before_shmem_exit() callback registered after the entry is created?

As the ProcKill() is run in shmem_exit() (and so after before_shmem_exit()),
I think that the way we currently drop the backend stats entry is fine (unless
I miss something about your concern).

Looking closely at this one, I think that you're right as long as the
shutdown callback is registered before the entry is created.

Yeah, which is the case.

Anyway, I have put my hands on v9. A couple of notes, while hacking
through it. See v9-0002 for the parts I have modified, that applies
on top of your v9-0001.

Thanks!

You may have noticed fee2b3ea2ecdg, which led me to fix two comments
as these paths are not related to only database objects.

Yeap ;-)

Added more documentation, tweaked quite a bit the comments, a bit less
the docs (no need for the two linkends) and applied an indentation.

Looks good to me.

pg_stat_get_backend_io() can be simpler, and does not need the loop
with the extra LocalPgBackendStatus. It is possible to call once
pgstat_get_beentry_by_proc_number() and retrieve the PID and the
bktype for the two sanity checks we want to do on them.

Looked at the changes you've done and agree that's simpler.

s/pgstat_fetch_proc_stat_io/pgstat_fetch_stat_backend/.

Not sure about this one, as it is called in pg_stat_get_backend_io().

s/pgstat_create_backend_stat/pgstat_create_backend/.

LGTM.

I've been wondering for quite a bit about PgStat_BackendPendingIO and
PgStat_PendingIO, and concluded to define both in pgstat.h, with the
former being defined based on the latter to keep the dependency
between both at the same place.

That's fine by me.

Also:

=== 1

the changes in pgstat_flush_backend() make sense to me.

=== 2

- * Create the backend statistics entry for procnum.
+ * Create backend statistics entry for proc number.

Yeah, "proc number" looks better as already used in multiple places.

=== 3

the changes in stats.sql look good to me.

Having said that, v9-0002 looks good to me (except the pgstat_fetch_proc_stat_io
renaming).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#58Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#57)
Re: per backend I/O statistics

On Tue, Dec 17, 2024 at 08:13:59AM +0000, Bertrand Drouvot wrote:

Having said that, v9-0002 looks good to me (except the pgstat_fetch_proc_stat_io
renaming).

This routine returns a PgStat_Backend from a pgstats entry of type
PGSTAT_KIND_BACKEND, so the name in v9-0001 would no longer be accurate
once more types of stats data are attached to this structure. The name
in v9-0002 is more consistent long-term.
--
Michael

#59Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#58)
Re: per backend I/O statistics

Hi,

On Tue, Dec 17, 2024 at 06:13:44PM +0900, Michael Paquier wrote:

On Tue, Dec 17, 2024 at 08:13:59AM +0000, Bertrand Drouvot wrote:

Having said that, v9-0002 looks good to me (except the pgstat_fetch_proc_stat_io
renaming).

This routine returns a PgStat_Backend from a pgstats entry of type
PGSTAT_KIND_BACKEND, so the name in v9-0001 would not be true once
more types of stats data are attached to this structure.

Fair enough, let's go with the name change done in v9-0002.

The name in
v9-0002 is more consistent long-term.

Agree, we may need to add parameters to it but we'll see when the time comes.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#60Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#59)
2 attachment(s)
Re: per backend I/O statistics

On Tue, Dec 17, 2024 at 09:35:37AM +0000, Bertrand Drouvot wrote:

Agree, we may need to add parameters to it but we'll see when the time comes.

Another thing that's been itching me: the loop in
pg_stat_get_backend_io() also exists in pg_stat_get_io(), and looks
worth refactoring into a single routine. The only difference between
the two is the reset timestamp, and both callers can pass a value for
it based on their own stats kind entry. This shaves a bit more code
off your patch, even if the check on pgstat_tracks_io_bktype() and
the Assert are not part of the inner routine that fills the tuplestore
with the I/O data. See pg_stat_fill_io_data() in v10-0002. If you have
a better name for this routine, feel free.

What do you think about this refactoring? This should come in first,
of course, so that the final patch introducing the backend stats is
easier to parse.
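
To make the shape concrete, here is a minimal sketch of what such a
helper could look like (the name comes from v10-0002, but the signature
and body below are an illustration only, not the actual patch; the
per-IOOp column filling would stay identical to what pg_stat_get_io()
does today):

static void
pg_stat_fill_io_data(ReturnSetInfo *rsinfo, BackendType bktype,
                     PgStat_BktypeIO *bktype_stats,
                     TimestampTz stat_reset_timestamp)
{
    Datum       bktype_desc = CStringGetTextDatum(GetBackendTypeDesc(bktype));

    for (int io_obj = 0; io_obj < IOOBJECT_NUM_TYPES; io_obj++)
    {
        const char *obj_name = pgstat_get_io_object_name(io_obj);

        for (int io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
        {
            const char *context_name = pgstat_get_io_context_name(io_context);
            Datum       values[IO_NUM_COLUMNS] = {0};
            bool        nulls[IO_NUM_COLUMNS] = {0};

            /* skip combinations never tracked for this backend type */
            if (!pgstat_tracks_io_object(bktype, io_obj, io_context))
                continue;

            values[IO_COL_BACKEND_TYPE] = bktype_desc;
            values[IO_COL_OBJECT] = CStringGetTextDatum(obj_name);
            values[IO_COL_CONTEXT] = CStringGetTextDatum(context_name);
            values[IO_COL_CONVERSION] = Int64GetDatum(BLCKSZ);
            if (stat_reset_timestamp != 0)
                values[IO_COL_RESET_TIME] =
                    TimestampTzGetDatum(stat_reset_timestamp);
            else
                nulls[IO_COL_RESET_TIME] = true;

            /* ... per-IOOp counts and times, exactly as in pg_stat_get_io() ... */

            tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
                                 values, nulls);
        }
    }
}

Both SRFs would then call it once per backend type, passing their own
PgStat_BktypeIO pointer and reset timestamp.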
--
Michael

Attachments:

v10-0001-per-backend-I-O-statistics.patchtext/x-diff; charset=us-asciiDownload
From 75a2549543edf418b6e282acab3f8ce272c5c42a Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 28 Oct 2024 12:50:32 +0000
Subject: [PATCH v10 1/2] per backend I/O statistics

While pg_stat_io provides cluster-wide I/O statistics, this commit adds the
ability to track and display per backend I/O statistics.

It adds a new statistics kind and two new functions:

- pg_stat_reset_backend_stats() to reset the stats for a given backend pid.
- pg_stat_get_backend_io() to retrieve I/O statistics for a given backend pid.

The new KIND is named PGSTAT_KIND_BACKEND, as it could be used in the future
to store per-backend statistics other than the I/O ones. The new KIND is a
variable-numbered one, with an automatic cap on the maximum number of
entries (as its hash key contains the proc number).

There is no need to write the per backend I/O stats to disk (there is no
point in keeping stats for backends that no longer exist after a restart),
hence "write_to_file = false".

Note that per backend I/O statistics are not collected for the checkpointer,
the background writer, the startup process and the autovacuum launcher, as
those are already visible in pg_stat_io and there is only one of each.

XXX: Bump catalog version needs to be done.
---
 src/include/catalog/pg_proc.dat              |  14 ++
 src/include/pgstat.h                         |  31 +++-
 src/include/utils/pgstat_internal.h          |  14 ++
 src/backend/catalog/system_functions.sql     |   2 +
 src/backend/utils/activity/Makefile          |   1 +
 src/backend/utils/activity/backend_status.c  |   4 +
 src/backend/utils/activity/meson.build       |   1 +
 src/backend/utils/activity/pgstat.c          |  21 +++
 src/backend/utils/activity/pgstat_backend.c  | 182 +++++++++++++++++++
 src/backend/utils/activity/pgstat_io.c       |  25 ++-
 src/backend/utils/activity/pgstat_relation.c |   2 +
 src/backend/utils/adt/pgstatfuncs.c          | 169 +++++++++++++++++
 src/test/regress/expected/stats.out          |  72 +++++++-
 src/test/regress/sql/stats.sql               |  38 +++-
 doc/src/sgml/config.sgml                     |   8 +-
 doc/src/sgml/monitoring.sgml                 |  37 ++++
 src/tools/pgindent/typedefs.list             |   3 +
 17 files changed, 604 insertions(+), 20 deletions(-)
 create mode 100644 src/backend/utils/activity/pgstat_backend.c

diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 0f22c21723..437157ffa3 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5913,6 +5913,15 @@
   proargnames => '{backend_type,object,context,reads,read_time,writes,write_time,writebacks,writeback_time,extends,extend_time,op_bytes,hits,evictions,reuses,fsyncs,fsync_time,stats_reset}',
   prosrc => 'pg_stat_get_io' },
 
+{ oid => '8806', descr => 'statistics: backend IO statistics',
+  proname => 'pg_stat_get_backend_io', prorows => '5', proretset => 't',
+  provolatile => 'v', proparallel => 'r', prorettype => 'record',
+  proargtypes => 'int4',
+  proallargtypes => '{int4,text,text,text,int8,float8,int8,float8,int8,float8,int8,float8,int8,int8,int8,int8,int8,float8,timestamptz}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{backend_pid,backend_type,object,context,reads,read_time,writes,write_time,writebacks,writeback_time,extends,extend_time,op_bytes,hits,evictions,reuses,fsyncs,fsync_time,stats_reset}',
+  prosrc => 'pg_stat_get_backend_io' },
+
 { oid => '1136', descr => 'statistics: information about WAL activity',
   proname => 'pg_stat_get_wal', proisstrict => 'f', provolatile => 's',
   proparallel => 'r', prorettype => 'record', proargtypes => '',
@@ -6052,6 +6061,11 @@
   proname => 'pg_stat_reset_single_function_counters', provolatile => 'v',
   prorettype => 'void', proargtypes => 'oid',
   prosrc => 'pg_stat_reset_single_function_counters' },
+{ oid => '9987',
+  descr => 'statistics: reset statistics for a single backend',
+  proname => 'pg_stat_reset_backend_stats', provolatile => 'v',
+  prorettype => 'void', proargtypes => 'int4',
+  prosrc => 'pg_stat_reset_backend_stats' },
 { oid => '2307',
   descr => 'statistics: reset collected statistics for a single SLRU',
   proname => 'pg_stat_reset_slru', proisstrict => 'f', provolatile => 'v',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index ebfeef2f46..479773cfd2 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -49,14 +49,15 @@
 #define PGSTAT_KIND_FUNCTION	3	/* per-function statistics */
 #define PGSTAT_KIND_REPLSLOT	4	/* per-slot statistics */
 #define PGSTAT_KIND_SUBSCRIPTION	5	/* per-subscription statistics */
+#define PGSTAT_KIND_BACKEND	6	/* per-backend statistics */
 
 /* stats for fixed-numbered objects */
-#define PGSTAT_KIND_ARCHIVER	6
-#define PGSTAT_KIND_BGWRITER	7
-#define PGSTAT_KIND_CHECKPOINTER	8
-#define PGSTAT_KIND_IO	9
-#define PGSTAT_KIND_SLRU	10
-#define PGSTAT_KIND_WAL	11
+#define PGSTAT_KIND_ARCHIVER	7
+#define PGSTAT_KIND_BGWRITER	8
+#define PGSTAT_KIND_CHECKPOINTER	9
+#define PGSTAT_KIND_IO	10
+#define PGSTAT_KIND_SLRU	11
+#define PGSTAT_KIND_WAL	12
 
 #define PGSTAT_KIND_BUILTIN_MIN PGSTAT_KIND_DATABASE
 #define PGSTAT_KIND_BUILTIN_MAX PGSTAT_KIND_WAL
@@ -362,12 +363,23 @@ typedef struct PgStat_BktypeIO
 	PgStat_Counter times[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
 } PgStat_BktypeIO;
 
+typedef struct PgStat_BackendPendingIO
+{
+	PgStat_Counter counts[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
+	instr_time	pending_times[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
+} PgStat_BackendPendingIO;
+
 typedef struct PgStat_IO
 {
 	TimestampTz stat_reset_timestamp;
 	PgStat_BktypeIO stats[BACKEND_NUM_TYPES];
 } PgStat_IO;
 
+typedef struct PgStat_Backend
+{
+	TimestampTz stat_reset_timestamp;
+	PgStat_BktypeIO stats;
+} PgStat_Backend;
 
 typedef struct PgStat_StatDBEntry
 {
@@ -549,6 +561,13 @@ extern bool pgstat_have_entry(PgStat_Kind kind, Oid dboid, uint64 objid);
 extern void pgstat_report_archiver(const char *xlog, bool failed);
 extern PgStat_ArchiverStats *pgstat_fetch_stat_archiver(void);
 
+/*
+ * Functions in pgstat_backend.c
+ */
+
+extern PgStat_Backend *pgstat_fetch_proc_stat_io(ProcNumber procNumber);
+extern bool pgstat_tracks_backend_bktype(BackendType bktype);
+extern void pgstat_create_backend_stat(ProcNumber procnum);
 
 /*
  * Functions in pgstat_bgwriter.c
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 7338bc1e28..811ed9b005 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -450,6 +450,11 @@ typedef struct PgStatShared_ReplSlot
 	PgStat_StatReplSlotEntry stats;
 } PgStatShared_ReplSlot;
 
+typedef struct PgStatShared_Backend
+{
+	PgStatShared_Common header;
+	PgStat_Backend stats;
+} PgStatShared_Backend;
 
 /*
  * Central shared memory entry for the cumulative stats system.
@@ -604,6 +609,15 @@ extern void pgstat_archiver_init_shmem_cb(void *stats);
 extern void pgstat_archiver_reset_all_cb(TimestampTz ts);
 extern void pgstat_archiver_snapshot_cb(void);
 
+/*
+ * Functions in pgstat_backend.c
+ */
+
+extern void pgstat_flush_backend(bool nowait);
+
+extern PgStat_BackendPendingIO *pgstat_prep_backend_pending(ProcNumber procnum);
+extern bool pgstat_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait);
+extern void pgstat_backend_reset_timestamp_cb(PgStatShared_Common *header, TimestampTz ts);
 
 /*
  * Functions in pgstat_bgwriter.c
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index c51dfca802..14eb99cd47 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -711,6 +711,8 @@ REVOKE EXECUTE ON FUNCTION pg_stat_reset_single_table_counters(oid) FROM public;
 
 REVOKE EXECUTE ON FUNCTION pg_stat_reset_single_function_counters(oid) FROM public;
 
+REVOKE EXECUTE ON FUNCTION pg_stat_reset_backend_stats(integer) FROM public;
+
 REVOKE EXECUTE ON FUNCTION pg_stat_reset_replication_slot(text) FROM public;
 
 REVOKE EXECUTE ON FUNCTION pg_stat_have_stats(text, oid, int8) FROM public;
diff --git a/src/backend/utils/activity/Makefile b/src/backend/utils/activity/Makefile
index b9fd66ea17..24b64a2742 100644
--- a/src/backend/utils/activity/Makefile
+++ b/src/backend/utils/activity/Makefile
@@ -20,6 +20,7 @@ OBJS = \
 	backend_status.o \
 	pgstat.o \
 	pgstat_archiver.o \
+	pgstat_backend.o \
 	pgstat_bgwriter.o \
 	pgstat_checkpointer.o \
 	pgstat_database.o \
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index 22c6dc378c..e9c3b87ac2 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -426,6 +426,10 @@ pgstat_bestart(void)
 
 	PGSTAT_END_WRITE_ACTIVITY(vbeentry);
 
+	/* Create the backend statistics entry */
+	if (pgstat_tracks_backend_bktype(MyBackendType))
+		pgstat_create_backend_stat(MyProcNumber);
+
 	/* Update app name to current GUC setting */
 	if (application_name)
 		pgstat_report_appname(application_name);
diff --git a/src/backend/utils/activity/meson.build b/src/backend/utils/activity/meson.build
index f73c22905c..380d3dd70c 100644
--- a/src/backend/utils/activity/meson.build
+++ b/src/backend/utils/activity/meson.build
@@ -5,6 +5,7 @@ backend_sources += files(
   'backend_status.c',
   'pgstat.c',
   'pgstat_archiver.c',
+  'pgstat_backend.c',
   'pgstat_bgwriter.c',
   'pgstat_checkpointer.c',
   'pgstat_database.c',
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index b4e357c8a4..b72c779b2c 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -77,6 +77,7 @@
  *
  * Each statistics kind is handled in a dedicated file:
  * - pgstat_archiver.c
+ * - pgstat_backend.c
  * - pgstat_bgwriter.c
  * - pgstat_checkpointer.c
  * - pgstat_database.c
@@ -358,6 +359,22 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.reset_timestamp_cb = pgstat_subscription_reset_timestamp_cb,
 	},
 
+	[PGSTAT_KIND_BACKEND] = {
+		.name = "backend",
+
+		.fixed_amount = false,
+		.write_to_file = false,
+
+		.accessed_across_databases = true,
+
+		.shared_size = sizeof(PgStatShared_Backend),
+		.shared_data_off = offsetof(PgStatShared_Backend, stats),
+		.shared_data_len = sizeof(((PgStatShared_Backend *) 0)->stats),
+		.pending_size = sizeof(PgStat_BackendPendingIO),
+
+		.flush_pending_cb = pgstat_backend_flush_cb,
+		.reset_timestamp_cb = pgstat_backend_reset_timestamp_cb,
+	},
 
 	/* stats for fixed-numbered (mostly 1) objects */
 
@@ -602,6 +619,10 @@ pgstat_shutdown_hook(int code, Datum arg)
 	Assert(dlist_is_empty(&pgStatPending));
 	dlist_init(&pgStatPending);
 
+	/* drop the backend stats entry */
+	if (!pgstat_drop_entry(PGSTAT_KIND_BACKEND, InvalidOid, MyProcNumber))
+		pgstat_request_entry_refs_gc();
+
 	pgstat_detach_shmem();
 
 #ifdef USE_ASSERT_CHECKING
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
new file mode 100644
index 0000000000..e2d83024c2
--- /dev/null
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -0,0 +1,182 @@
+/* -------------------------------------------------------------------------
+ *
+ * pgstat_backend.c
+ *	  Implementation of backend statistics.
+ *
+ * This file contains the implementation of backend statistics. It is kept
+ * separate from pgstat.c to enforce the line between the statistics access /
+ * storage implementation and the details about individual types of statistics.
+ *
+ * Copyright (c) 2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/utils/activity/pgstat_backend.c
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "utils/pgstat_internal.h"
+
+/*
+ * Returns backend's IO stats.
+ */
+PgStat_Backend *
+pgstat_fetch_proc_stat_io(ProcNumber procNumber)
+{
+	PgStat_Backend *backend_entry;
+
+	backend_entry = (PgStat_Backend *) pgstat_fetch_entry(PGSTAT_KIND_BACKEND,
+														  InvalidOid, procNumber);
+
+	return backend_entry;
+}
+
+/*
+ * Flush out locally pending backend statistics
+ *
+ * If no stats have been recorded, this function returns false.
+ */
+bool
+pgstat_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
+{
+	PgStatShared_Backend *shbackendioent;
+	PgStat_BackendPendingIO *pendingent;
+	PgStat_BktypeIO *bktype_shstats;
+
+	if (!pgstat_lock_entry(entry_ref, nowait))
+		return false;
+
+	shbackendioent = (PgStatShared_Backend *) entry_ref->shared_stats;
+	bktype_shstats = &shbackendioent->stats.stats;
+	pendingent = (PgStat_BackendPendingIO *) entry_ref->pending;
+
+	for (int io_object = 0; io_object < IOOBJECT_NUM_TYPES; io_object++)
+	{
+		for (int io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
+		{
+			for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
+			{
+				instr_time	time;
+
+				bktype_shstats->counts[io_object][io_context][io_op] +=
+					pendingent->counts[io_object][io_context][io_op];
+
+				time = pendingent->pending_times[io_object][io_context][io_op];
+
+				bktype_shstats->times[io_object][io_context][io_op] +=
+					INSTR_TIME_GET_MICROSEC(time);
+			}
+		}
+	}
+
+	pgstat_unlock_entry(entry_ref);
+
+	return true;
+}
+
+/*
+ * Simpler wrapper of pgstat_backend_flush_cb()
+ */
+void
+pgstat_flush_backend(bool nowait)
+{
+	if (pgstat_tracks_backend_bktype(MyBackendType))
+	{
+		PgStat_EntryRef *entry_ref;
+
+		entry_ref = pgstat_get_entry_ref(PGSTAT_KIND_BACKEND, InvalidOid,
+										 MyProcNumber, false, NULL);
+		(void) pgstat_backend_flush_cb(entry_ref, nowait);
+	}
+}
+
+/*
+ * Create the backend statistics entry for procnum.
+ */
+void
+pgstat_create_backend_stat(ProcNumber procnum)
+{
+	PgStat_EntryRef *entry_ref;
+	PgStatShared_Backend *shstatent;
+
+	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_BACKEND, InvalidOid,
+										  procnum, NULL);
+
+	shstatent = (PgStatShared_Backend *) entry_ref->shared_stats;
+
+	/*
+	 * NB: need to accept that there might be stats from an older backend,
+	 * e.g. if we previously used this proc number.
+	 */
+	memset(&shstatent->stats, 0, sizeof(shstatent->stats));
+}
+
+/*
+ * Find or create a local PgStat_BackendPendingIO entry for procnum.
+ */
+PgStat_BackendPendingIO *
+pgstat_prep_backend_pending(ProcNumber procnum)
+{
+	PgStat_EntryRef *entry_ref;
+
+	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_BACKEND, InvalidOid,
+										  procnum, NULL);
+
+	return entry_ref->pending;
+}
+
+/*
+ * Backend statistics are not collected for all BackendTypes.
+ *
+ * The following BackendTypes do not participate in the backend stats
+ * subsystem:
+ * - The same and for the same reasons as in pgstat_tracks_io_bktype().
+ * - B_BG_WRITER, B_CHECKPOINTER, B_STARTUP and B_AUTOVAC_LAUNCHER because their
+ * I/O stats are already visible in pg_stat_io and there is only one of those.
+ *
+ * Function returns true if BackendType participates in the backend stats
+ * subsystem for IO and false if it does not.
+ *
+ * When adding a new BackendType, also consider adding relevant restrictions to
+ * pgstat_tracks_io_object() and pgstat_tracks_io_op().
+ */
+bool
+pgstat_tracks_backend_bktype(BackendType bktype)
+{
+	/*
+	 * List every type so that new backend types trigger a warning about
+	 * needing to adjust this switch.
+	 */
+	switch (bktype)
+	{
+		case B_INVALID:
+		case B_AUTOVAC_LAUNCHER:
+		case B_DEAD_END_BACKEND:
+		case B_ARCHIVER:
+		case B_LOGGER:
+		case B_WAL_RECEIVER:
+		case B_WAL_WRITER:
+		case B_WAL_SUMMARIZER:
+		case B_BG_WRITER:
+		case B_CHECKPOINTER:
+		case B_STARTUP:
+			return false;
+
+		case B_AUTOVAC_WORKER:
+		case B_BACKEND:
+		case B_BG_WORKER:
+		case B_STANDALONE_BACKEND:
+		case B_SLOTSYNC_WORKER:
+		case B_WAL_SENDER:
+			return true;
+	}
+
+	return false;
+}
+
+void
+pgstat_backend_reset_timestamp_cb(PgStatShared_Common *header, TimestampTz ts)
+{
+	((PgStatShared_Backend *) header)->stats.stat_reset_timestamp = ts;
+}
diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index f9883af2b3..e0c206a453 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -20,13 +20,7 @@
 #include "storage/bufmgr.h"
 #include "utils/pgstat_internal.h"
 
-
-typedef struct PgStat_PendingIO
-{
-	PgStat_Counter counts[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
-	instr_time	pending_times[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
-} PgStat_PendingIO;
-
+typedef PgStat_BackendPendingIO PgStat_PendingIO;
 
 static PgStat_PendingIO PendingIOStats;
 static bool have_iostats = false;
@@ -87,6 +81,14 @@ pgstat_count_io_op_n(IOObject io_object, IOContext io_context, IOOp io_op, uint3
 	Assert((unsigned int) io_op < IOOP_NUM_TYPES);
 	Assert(pgstat_tracks_io_op(MyBackendType, io_object, io_context, io_op));
 
+	if (pgstat_tracks_backend_bktype(MyBackendType))
+	{
+		PgStat_PendingIO *entry_ref;
+
+		entry_ref = pgstat_prep_backend_pending(MyProcNumber);
+		entry_ref->counts[io_object][io_context][io_op] += cnt;
+	}
+
 	PendingIOStats.counts[io_object][io_context][io_op] += cnt;
 
 	have_iostats = true;
@@ -148,6 +150,15 @@ pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op,
 
 		INSTR_TIME_ADD(PendingIOStats.pending_times[io_object][io_context][io_op],
 					   io_time);
+
+		if (pgstat_tracks_backend_bktype(MyBackendType))
+		{
+			PgStat_PendingIO *entry_ref;
+
+			entry_ref = pgstat_prep_backend_pending(MyProcNumber);
+			INSTR_TIME_ADD(entry_ref->pending_times[io_object][io_context][io_op],
+						   io_time);
+		}
 	}
 
 	pgstat_count_io_op_n(io_object, io_context, io_op, cnt);
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index faba8b64d2..85e65557bb 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -264,6 +264,7 @@ pgstat_report_vacuum(Oid tableoid, bool shared,
 	 * VACUUM command has processed all tables and committed.
 	 */
 	pgstat_flush_io(false);
+	pgstat_flush_backend(false);
 }
 
 /*
@@ -350,6 +351,7 @@ pgstat_report_analyze(Relation rel,
 
 	/* see pgstat_report_vacuum() */
 	pgstat_flush_io(false);
+	pgstat_flush_backend(false);
 }
 
 /*
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index cdf37403e9..b939551d36 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1474,6 +1474,154 @@ pg_stat_get_io(PG_FUNCTION_ARGS)
 	return (Datum) 0;
 }
 
+Datum
+pg_stat_get_backend_io(PG_FUNCTION_ARGS)
+{
+	ReturnSetInfo *rsinfo;
+	PgStat_Backend *backend_stats;
+	Datum		bktype_desc;
+	PgStat_BktypeIO *bktype_stats;
+	BackendType bktype;
+	Datum		reset_time;
+	int			num_backends = pgstat_fetch_stat_numbackends();
+	int			curr_backend;
+	int			pid;
+	PGPROC	   *proc;
+	ProcNumber	procNumber;
+
+	InitMaterializedSRF(fcinfo, 0);
+	rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+	pid = PG_GETARG_INT32(0);
+	proc = BackendPidGetProc(pid);
+
+	/*
+	 * Could be an auxiliary process but would not report any stats due to
+	 * pgstat_tracks_backend_bktype() anyway. So don't need an extra call to
+	 * AuxiliaryPidGetProc().
+	 */
+	if (!proc)
+		return (Datum) 0;
+
+	procNumber = GetNumberFromPGProc(proc);
+	backend_stats = pgstat_fetch_proc_stat_io(procNumber);
+
+	if (!backend_stats)
+		return (Datum) 0;
+
+	bktype = B_INVALID;
+
+	/* Look for the backend type */
+	for (curr_backend = 1; curr_backend <= num_backends; curr_backend++)
+	{
+		LocalPgBackendStatus *local_beentry;
+		PgBackendStatus *beentry;
+
+		/* Get the next one in the list */
+		local_beentry = pgstat_get_local_beentry_by_index(curr_backend);
+		beentry = &local_beentry->backendStatus;
+
+		/* Looking for specific PID, ignore all the others */
+		if (beentry->st_procpid != pid)
+			continue;
+
+		bktype = beentry->st_backendType;
+		break;
+	}
+
+	/* Backend is gone */
+	if (bktype == B_INVALID)
+		return (Datum) 0;
+
+	bktype_desc = CStringGetTextDatum(GetBackendTypeDesc(bktype));
+	bktype_stats = &backend_stats->stats;
+	reset_time = TimestampTzGetDatum(backend_stats->stat_reset_timestamp);
+
+	/*
+	 * In Assert builds, we can afford an extra loop through all of the
+	 * counters checking that only expected stats are non-zero, since it keeps
+	 * the non-Assert code cleaner.
+	 */
+	Assert(pgstat_bktype_io_stats_valid(bktype_stats, bktype));
+
+	for (int io_obj = 0; io_obj < IOOBJECT_NUM_TYPES; io_obj++)
+	{
+		const char *obj_name = pgstat_get_io_object_name(io_obj);
+
+		for (int io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
+		{
+			const char *context_name = pgstat_get_io_context_name(io_context);
+
+			Datum		values[IO_NUM_COLUMNS] = {0};
+			bool		nulls[IO_NUM_COLUMNS] = {0};
+
+			/*
+			 * Some combinations of BackendType, IOObject, and IOContext are
+			 * not valid for any type of IOOp. In such cases, omit the entire
+			 * row from the view.
+			 */
+			if (!pgstat_tracks_io_object(bktype, io_obj, io_context))
+				continue;
+
+			values[IO_COL_BACKEND_TYPE] = bktype_desc;
+			values[IO_COL_CONTEXT] = CStringGetTextDatum(context_name);
+			values[IO_COL_OBJECT] = CStringGetTextDatum(obj_name);
+			if (backend_stats->stat_reset_timestamp != 0)
+				values[IO_COL_RESET_TIME] = reset_time;
+			else
+				nulls[IO_COL_RESET_TIME] = true;
+
+			/*
+			 * Hard-code this to the value of BLCKSZ for now. Future values
+			 * could include XLOG_BLCKSZ, once WAL IO is tracked, and constant
+			 * multipliers, once non-block-oriented IO (e.g. temporary file
+			 * IO) is tracked.
+			 */
+			values[IO_COL_CONVERSION] = Int64GetDatum(BLCKSZ);
+
+			for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
+			{
+				int			op_idx = pgstat_get_io_op_index(io_op);
+				int			time_idx = pgstat_get_io_time_index(io_op);
+
+				/*
+				 * Some combinations of BackendType and IOOp, of IOContext and
+				 * IOOp, and of IOObject and IOOp are not tracked. Set these
+				 * cells in the view NULL.
+				 */
+				if (pgstat_tracks_io_op(bktype, io_obj, io_context, io_op))
+				{
+					PgStat_Counter count =
+						bktype_stats->counts[io_obj][io_context][io_op];
+
+					values[op_idx] = Int64GetDatum(count);
+				}
+				else
+					nulls[op_idx] = true;
+
+				/* not every operation is timed */
+				if (time_idx == IO_COL_INVALID)
+					continue;
+
+				if (!nulls[op_idx])
+				{
+					PgStat_Counter time =
+						bktype_stats->times[io_obj][io_context][io_op];
+
+					values[time_idx] = Float8GetDatum(pg_stat_us_to_ms(time));
+				}
+				else
+					nulls[time_idx] = true;
+			}
+
+			tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+								 values, nulls);
+		}
+	}
+
+	return (Datum) 0;
+}
+
 /*
  * Returns statistics of WAL activity
  */
@@ -1779,6 +1927,27 @@ pg_stat_reset_single_function_counters(PG_FUNCTION_ARGS)
 	PG_RETURN_VOID();
 }
 
+Datum
+pg_stat_reset_backend_stats(PG_FUNCTION_ARGS)
+{
+	PGPROC	   *proc;
+	int			backend_pid = PG_GETARG_INT32(0);
+
+	proc = BackendPidGetProc(backend_pid);
+
+	/*
+	 * Could be an auxiliary process but would not report any stats due to
+	 * pgstat_tracks_backend_bktype() anyway. So don't need an extra call to
+	 * AuxiliaryPidGetProc().
+	 */
+	if (!proc)
+		PG_RETURN_VOID();
+
+	pgstat_reset(PGSTAT_KIND_BACKEND, InvalidOid, GetNumberFromPGProc(proc));
+
+	PG_RETURN_VOID();
+}
+
 /* Reset SLRU counters (a specific one or all of them). */
 Datum
 pg_stat_reset_slru(PG_FUNCTION_ARGS)
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 56771f83ed..3447e7b75d 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1249,7 +1249,7 @@ SELECT pg_stat_get_subscription_stats(NULL);
  
 (1 row)
 
--- Test that the following operations are tracked in pg_stat_io:
+-- Test that the following operations are tracked in pg_stat_io and in backend stats:
 -- - reads of target blocks into shared buffers
 -- - writes of shared buffers to permanent storage
 -- - extends of relations using shared buffers
@@ -1259,11 +1259,19 @@ SELECT pg_stat_get_subscription_stats(NULL);
 -- be sure of the state of shared buffers at the point the test is run.
 -- Create a regular table and insert some data to generate IOCONTEXT_NORMAL
 -- extends.
+SELECT pid AS checkpointer_pid FROM pg_stat_activity
+  WHERE backend_type = 'checkpointer' \gset
 SELECT sum(extends) AS io_sum_shared_before_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT sum(extends) AS my_io_sum_shared_before_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_stat_io
   WHERE object = 'relation' \gset io_sum_shared_before_
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE object = 'relation' \gset my_io_sum_shared_before_
 CREATE TABLE test_io_shared(a int);
 INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
 SELECT pg_stat_force_next_flush();
@@ -1280,8 +1288,17 @@ SELECT :io_sum_shared_after_extends > :io_sum_shared_before_extends;
  t
 (1 row)
 
+SELECT sum(extends) AS my_io_sum_shared_after_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
+SELECT :my_io_sum_shared_after_extends > :my_io_sum_shared_before_extends;
+ ?column? 
+----------
+ t
+(1 row)
+
 -- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
--- and fsyncs.
+-- and fsyncs in the global stats (not for the backend).
 -- See comment above for rationale for two explicit CHECKPOINTs.
 CHECKPOINT;
 CHECKPOINT;
@@ -1301,6 +1318,31 @@ SELECT current_setting('fsync') = 'off'
  t
 (1 row)
 
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE object = 'relation' \gset my_io_sum_shared_after_
+SELECT :my_io_sum_shared_after_writes >= :my_io_sum_shared_before_writes;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT current_setting('fsync') = 'off'
+  OR (:my_io_sum_shared_after_fsyncs = :my_io_sum_shared_before_fsyncs
+      AND :my_io_sum_shared_after_fsyncs= 0);
+ ?column? 
+----------
+ t
+(1 row)
+
+-- Don't return any rows if querying other backend's stats that are excluded
+-- from the backend stats collection (like the checkpointer).
+SELECT count(1) = 0 FROM pg_stat_get_backend_io(:checkpointer_pid);
+ ?column? 
+----------
+ t
+(1 row)
+
 -- Change the tablespace so that the table is rewritten directly, then SELECT
 -- from it to cause it to be read back into shared buffers.
 SELECT sum(reads) AS io_sum_shared_before_reads
@@ -1521,6 +1563,8 @@ SELECT pg_stat_have_stats('io', 0, 0);
 
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_pre_reset
   FROM pg_stat_io \gset
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_pre_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
 SELECT pg_stat_reset_shared('io');
  pg_stat_reset_shared 
 ----------------------
@@ -1535,6 +1579,30 @@ SELECT :io_stats_post_reset < :io_stats_pre_reset;
  t
 (1 row)
 
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
+-- pg_stat_reset_shared() did not reset backend IO stats
+SELECT :my_io_stats_pre_reset <= :my_io_stats_post_reset;
+ ?column? 
+----------
+ t
+(1 row)
+
+-- but pg_stat_reset_backend_stats() does
+SELECT pg_stat_reset_backend_stats(pg_backend_pid());
+ pg_stat_reset_backend_stats 
+-----------------------------
+ 
+(1 row)
+
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_backend_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
+SELECT :my_io_stats_pre_reset > :my_io_stats_post_backend_reset;
+ ?column? 
+----------
+ t
+(1 row)
+
 -- test BRIN index doesn't block HOT update
 CREATE TABLE brin_hot (
   id  integer PRIMARY KEY,
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 7147cc2f89..9c925005be 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -595,7 +595,7 @@ SELECT pg_stat_get_replication_slot(NULL);
 SELECT pg_stat_get_subscription_stats(NULL);
 
 
--- Test that the following operations are tracked in pg_stat_io:
+-- Test that the following operations are tracked in pg_stat_io and in backend stats:
 -- - reads of target blocks into shared buffers
 -- - writes of shared buffers to permanent storage
 -- - extends of relations using shared buffers
@@ -607,20 +607,32 @@ SELECT pg_stat_get_subscription_stats(NULL);
 
 -- Create a regular table and insert some data to generate IOCONTEXT_NORMAL
 -- extends.
+SELECT pid AS checkpointer_pid FROM pg_stat_activity
+  WHERE backend_type = 'checkpointer' \gset
 SELECT sum(extends) AS io_sum_shared_before_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT sum(extends) AS my_io_sum_shared_before_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_stat_io
   WHERE object = 'relation' \gset io_sum_shared_before_
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE object = 'relation' \gset my_io_sum_shared_before_
 CREATE TABLE test_io_shared(a int);
 INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
 SELECT pg_stat_force_next_flush();
 SELECT sum(extends) AS io_sum_shared_after_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
 SELECT :io_sum_shared_after_extends > :io_sum_shared_before_extends;
+SELECT sum(extends) AS my_io_sum_shared_after_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
+SELECT :my_io_sum_shared_after_extends > :my_io_sum_shared_before_extends;
 
 -- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
--- and fsyncs.
+-- and fsyncs in the global stats (not for the backend).
 -- See comment above for rationale for two explicit CHECKPOINTs.
 CHECKPOINT;
 CHECKPOINT;
@@ -630,6 +642,17 @@ SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
 SELECT :io_sum_shared_after_writes > :io_sum_shared_before_writes;
 SELECT current_setting('fsync') = 'off'
   OR :io_sum_shared_after_fsyncs > :io_sum_shared_before_fsyncs;
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE object = 'relation' \gset my_io_sum_shared_after_
+SELECT :my_io_sum_shared_after_writes >= :my_io_sum_shared_before_writes;
+SELECT current_setting('fsync') = 'off'
+  OR (:my_io_sum_shared_after_fsyncs = :my_io_sum_shared_before_fsyncs
+      AND :my_io_sum_shared_after_fsyncs= 0);
+
+-- Don't return any rows if querying other backend's stats that are excluded
+-- from the backend stats collection (like the checkpointer).
+SELECT count(1) = 0 FROM pg_stat_get_backend_io(:checkpointer_pid);
 
 -- Change the tablespace so that the table is rewritten directly, then SELECT
 -- from it to cause it to be read back into shared buffers.
@@ -762,10 +785,21 @@ SELECT :io_sum_bulkwrite_strategy_extends_after > :io_sum_bulkwrite_strategy_ext
 SELECT pg_stat_have_stats('io', 0, 0);
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_pre_reset
   FROM pg_stat_io \gset
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_pre_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
 SELECT pg_stat_reset_shared('io');
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_post_reset
   FROM pg_stat_io \gset
 SELECT :io_stats_post_reset < :io_stats_pre_reset;
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
+-- pg_stat_reset_shared() did not reset backend IO stats
+SELECT :my_io_stats_pre_reset <= :my_io_stats_post_reset;
+-- but pg_stat_reset_backend_stats() does
+SELECT pg_stat_reset_backend_stats(pg_backend_pid());
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_backend_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
+SELECT :my_io_stats_pre_reset > :my_io_stats_post_backend_reset;
 
 
 -- test BRIN index doesn't block HOT update
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 24bd504c21..fbdd6ce574 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8403,9 +8403,11 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
         displayed in <link linkend="monitoring-pg-stat-database-view">
         <structname>pg_stat_database</structname></link>,
         <link linkend="monitoring-pg-stat-io-view">
-        <structname>pg_stat_io</structname></link>, in the output of
-        <xref linkend="sql-explain"/> when the <literal>BUFFERS</literal> option
-        is used, in the output of <xref linkend="sql-vacuum"/> when
+        <structname>pg_stat_io</structname></link>, in the output of the
+        <link linkend="pg-stat-get-backend-io">
+        <function>pg_stat_get_backend_io()</function></link> function, in the
+        output of <xref linkend="sql-explain"/> when the <literal>BUFFERS</literal>
+        option is used, in the output of <xref linkend="sql-vacuum"/> when
         the <literal>VERBOSE</literal> option is used, by autovacuum
         for auto-vacuums and auto-analyzes, when <xref
         linkend="guc-log-autovacuum-min-duration"/> is set and by
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 840d7f8161..4a9464da61 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4790,6 +4790,25 @@ description | Waiting for a newly initialized WAL file to reach durable storage
        </para></entry>
       </row>
 
+      <row>
+       <entry id="pg-stat-get-backend-io" role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_stat_get_backend_io</primary>
+        </indexterm>
+        <function>pg_stat_get_backend_io</function> ( <type>integer</type> )
+        <returnvalue>setof record</returnvalue>
+       </para>
+       <para>
+        Returns I/O statistics about the backend with the specified
+        process ID. The output fields are exactly the same as the ones in the
+        <link linkend="monitoring-pg-stat-io-view"> <structname>pg_stat_io</structname></link>
+        view. The function does not return I/O statistics for the checkpointer,
+        the background writer, the startup process and the autovacuum launcher
+        as they are already visible in the <link linkend="monitoring-pg-stat-io-view"> <structname>pg_stat_io</structname></link>
+        view and there is only one of those.
+       </para></entry>
+      </row>
+
       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <indexterm>
@@ -4971,6 +4990,24 @@ description | Waiting for a newly initialized WAL file to reach durable storage
        </para></entry>
       </row>
 
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_stat_reset_backend_stats</primary>
+        </indexterm>
+        <function>pg_stat_reset_backend_stats</function> ( <type>integer</type> )
+        <returnvalue>void</returnvalue>
+       </para>
+       <para>
+        Resets statistics for a single backend with the specified process ID
+        to zero.
+       </para>
+       <para>
+        This function is restricted to superusers by default, but other users
+        can be granted EXECUTE to run the function.
+       </para></entry>
+      </row>
+
       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <indexterm>
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index ce33e55bf1..398dd92527 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2121,6 +2121,7 @@ PgFdwSamplingMethod
 PgFdwScanState
 PgIfAddrCallback
 PgStatShared_Archiver
+PgStatShared_Backend
 PgStatShared_BgWriter
 PgStatShared_Checkpointer
 PgStatShared_Common
@@ -2136,6 +2137,8 @@ PgStatShared_SLRU
 PgStatShared_Subscription
 PgStatShared_Wal
 PgStat_ArchiverStats
+PgStat_Backend
+PgStat_BackendPendingIO
 PgStat_BackendSubEntry
 PgStat_BgWriterStats
 PgStat_BktypeIO
-- 
2.45.2

v10-0002-Tweaks-on-top-of-v9.patch (text/x-diff; charset=us-ascii)
From 0e3c2c066698acaaad69824a619e78f2cc16a358 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Wed, 18 Dec 2024 13:48:34 +0900
Subject: [PATCH v10 2/2] Tweaks on top of v9

Other notes for commit:
Requires a bump of CATALOG_VERSION_NO.
Requires a bump of PGSTAT_FILE_FORMAT_ID.
---
 src/include/catalog/pg_proc.dat             |   3 +-
 src/include/pgstat.h                        |  11 +-
 src/backend/utils/activity/backend_status.c |   2 +-
 src/backend/utils/activity/pgstat_backend.c |  38 ++-
 src/backend/utils/activity/pgstat_io.c      |   2 -
 src/backend/utils/adt/pgstatfuncs.c         | 305 ++++++++------------
 src/test/regress/expected/stats.out         |  28 +-
 src/test/regress/sql/stats.sql              |  12 +-
 doc/src/sgml/monitoring.sgml                |  10 +-
 9 files changed, 184 insertions(+), 227 deletions(-)

diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 437157ffa3..2dcc2d42da 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6061,8 +6061,7 @@
   proname => 'pg_stat_reset_single_function_counters', provolatile => 'v',
   prorettype => 'void', proargtypes => 'oid',
   prosrc => 'pg_stat_reset_single_function_counters' },
-{ oid => '9987',
-  descr => 'statistics: reset statistics for a single backend',
+{ oid => '8807', descr => 'statistics: reset statistics for a single backend',
   proname => 'pg_stat_reset_backend_stats', provolatile => 'v',
   prorettype => 'void', proargtypes => 'int4',
   prosrc => 'pg_stat_reset_backend_stats' },
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 479773cfd2..56ed1b8bb3 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -363,11 +363,11 @@ typedef struct PgStat_BktypeIO
 	PgStat_Counter times[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
 } PgStat_BktypeIO;
 
-typedef struct PgStat_BackendPendingIO
+typedef struct PgStat_PendingIO
 {
 	PgStat_Counter counts[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
 	instr_time	pending_times[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
-} PgStat_BackendPendingIO;
+} PgStat_PendingIO;
 
 typedef struct PgStat_IO
 {
@@ -375,6 +375,9 @@ typedef struct PgStat_IO
 	PgStat_BktypeIO stats[BACKEND_NUM_TYPES];
 } PgStat_IO;
 
+/* Backend statistics store the same amount of IO data as PGSTAT_KIND_IO */
+typedef PgStat_PendingIO PgStat_BackendPendingIO;
+
 typedef struct PgStat_Backend
 {
 	TimestampTz stat_reset_timestamp;
@@ -565,9 +568,9 @@ extern PgStat_ArchiverStats *pgstat_fetch_stat_archiver(void);
  * Functions in pgstat_backend.c
  */
 
-extern PgStat_Backend *pgstat_fetch_proc_stat_io(ProcNumber procNumber);
+extern PgStat_Backend *pgstat_fetch_stat_backend(ProcNumber procNumber);
 extern bool pgstat_tracks_backend_bktype(BackendType bktype);
-extern void pgstat_create_backend_stat(ProcNumber procnum);
+extern void pgstat_create_backend(ProcNumber procnum);
 
 /*
  * Functions in pgstat_bgwriter.c
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index e9c3b87ac2..bf33e33a4e 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -428,7 +428,7 @@ pgstat_bestart(void)
 
 	/* Create the backend statistics entry */
 	if (pgstat_tracks_backend_bktype(MyBackendType))
-		pgstat_create_backend_stat(MyProcNumber);
+		pgstat_create_backend(MyProcNumber);
 
 	/* Update app name to current GUC setting */
 	if (application_name)
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
index e2d83024c2..6b2c9baa8c 100644
--- a/src/backend/utils/activity/pgstat_backend.c
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -3,11 +3,17 @@
  * pgstat_backend.c
  *	  Implementation of backend statistics.
  *
- * This file contains the implementation of backend statistics. It is kept
+ * This file contains the implementation of backend statistics.  It is kept
  * separate from pgstat.c to enforce the line between the statistics access /
- * storage implementation and the details about individual types of statistics.
+ * storage implementation and the details about individual types of
+ * statistics.
  *
- * Copyright (c) 2024, PostgreSQL Global Development Group
+ * This statistics kind uses a proc number as object ID for the hash table
+ * of pgstats.  Entries are created each time a process is spawned, and are
+ * dropped when the process exits.  These are not written to the pgstats file
+ * on disk.
+ *
+ * Copyright (c) 2001-2024, PostgreSQL Global Development Group
  *
  * IDENTIFICATION
  *	  src/backend/utils/activity/pgstat_backend.c
@@ -19,10 +25,10 @@
 #include "utils/pgstat_internal.h"
 
 /*
- * Returns backend's IO stats.
+ * Returns statistics of a backend by proc number.
  */
 PgStat_Backend *
-pgstat_fetch_proc_stat_io(ProcNumber procNumber)
+pgstat_fetch_stat_backend(ProcNumber procNumber)
 {
 	PgStat_Backend *backend_entry;
 
@@ -81,21 +87,21 @@ pgstat_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
 void
 pgstat_flush_backend(bool nowait)
 {
-	if (pgstat_tracks_backend_bktype(MyBackendType))
-	{
-		PgStat_EntryRef *entry_ref;
+	PgStat_EntryRef *entry_ref;
 
-		entry_ref = pgstat_get_entry_ref(PGSTAT_KIND_BACKEND, InvalidOid,
-										 MyProcNumber, false, NULL);
-		(void) pgstat_backend_flush_cb(entry_ref, nowait);
-	}
+	if (!pgstat_tracks_backend_bktype(MyBackendType))
+		return;
+
+	entry_ref = pgstat_get_entry_ref(PGSTAT_KIND_BACKEND, InvalidOid,
+									 MyProcNumber, false, NULL);
+	(void) pgstat_backend_flush_cb(entry_ref, nowait);
 }
 
 /*
- * Create the backend statistics entry for procnum.
+ * Create backend statistics entry for proc number.
  */
 void
-pgstat_create_backend_stat(ProcNumber procnum)
+pgstat_create_backend(ProcNumber procnum)
 {
 	PgStat_EntryRef *entry_ref;
 	PgStatShared_Backend *shstatent;
@@ -113,7 +119,7 @@ pgstat_create_backend_stat(ProcNumber procnum)
 }
 
 /*
- * Find or create a local PgStat_BackendPendingIO entry for procnum.
+ * Find or create a local PgStat_BackendPendingIO entry for proc number.
  */
 PgStat_BackendPendingIO *
 pgstat_prep_backend_pending(ProcNumber procnum)
@@ -136,7 +142,7 @@ pgstat_prep_backend_pending(ProcNumber procnum)
  * I/O stats are already visible in pg_stat_io and there is only one of those.
  *
  * Function returns true if BackendType participates in the backend stats
- * subsystem for IO and false if it does not.
+ * subsystem and false if it does not.
  *
  * When adding a new BackendType, also consider adding relevant restrictions to
  * pgstat_tracks_io_object() and pgstat_tracks_io_op().
diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index e0c206a453..011a3326da 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -20,8 +20,6 @@
 #include "storage/bufmgr.h"
 #include "utils/pgstat_internal.h"
 
-typedef PgStat_BackendPendingIO PgStat_PendingIO;
-
 static PgStat_PendingIO PendingIOStats;
 static bool have_iostats = false;
 
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index b939551d36..18c74807a4 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1274,8 +1274,9 @@ pg_stat_get_buf_alloc(PG_FUNCTION_ARGS)
 }
 
 /*
-* When adding a new column to the pg_stat_io view, add a new enum value
-* here above IO_NUM_COLUMNS.
+* When adding a new column to the pg_stat_io view and the
+* pg_stat_get_backend_io() function, add a new enum value here above
+* IO_NUM_COLUMNS.
 */
 typedef enum io_stat_col
 {
@@ -1365,184 +1366,20 @@ pg_stat_us_to_ms(PgStat_Counter val_ms)
 	return val_ms * (double) 0.001;
 }
 
-Datum
-pg_stat_get_io(PG_FUNCTION_ARGS)
+/*
+ * pg_stat_fill_io_data
+ *
+ * Helper routine for pg_stat_get_io() and pg_stat_get_backend_io(),
+ * filling in a result tuplestore with one tuple for each object and each
+ * context supported by the caller, based on the contents of bktype_stats.
+ */
+static void
+pg_stat_fill_io_data(ReturnSetInfo *rsinfo,
+					 PgStat_BktypeIO *bktype_stats,
+					 BackendType bktype,
+					 TimestampTz stat_reset_timestamp)
 {
-	ReturnSetInfo *rsinfo;
-	PgStat_IO  *backends_io_stats;
-	Datum		reset_time;
-
-	InitMaterializedSRF(fcinfo, 0);
-	rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
-
-	backends_io_stats = pgstat_fetch_stat_io();
-
-	reset_time = TimestampTzGetDatum(backends_io_stats->stat_reset_timestamp);
-
-	for (int bktype = 0; bktype < BACKEND_NUM_TYPES; bktype++)
-	{
-		Datum		bktype_desc = CStringGetTextDatum(GetBackendTypeDesc(bktype));
-		PgStat_BktypeIO *bktype_stats = &backends_io_stats->stats[bktype];
-
-		/*
-		 * In Assert builds, we can afford an extra loop through all of the
-		 * counters checking that only expected stats are non-zero, since it
-		 * keeps the non-Assert code cleaner.
-		 */
-		Assert(pgstat_bktype_io_stats_valid(bktype_stats, bktype));
-
-		/*
-		 * For those BackendTypes without IO Operation stats, skip
-		 * representing them in the view altogether.
-		 */
-		if (!pgstat_tracks_io_bktype(bktype))
-			continue;
-
-		for (int io_obj = 0; io_obj < IOOBJECT_NUM_TYPES; io_obj++)
-		{
-			const char *obj_name = pgstat_get_io_object_name(io_obj);
-
-			for (int io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
-			{
-				const char *context_name = pgstat_get_io_context_name(io_context);
-
-				Datum		values[IO_NUM_COLUMNS] = {0};
-				bool		nulls[IO_NUM_COLUMNS] = {0};
-
-				/*
-				 * Some combinations of BackendType, IOObject, and IOContext
-				 * are not valid for any type of IOOp. In such cases, omit the
-				 * entire row from the view.
-				 */
-				if (!pgstat_tracks_io_object(bktype, io_obj, io_context))
-					continue;
-
-				values[IO_COL_BACKEND_TYPE] = bktype_desc;
-				values[IO_COL_CONTEXT] = CStringGetTextDatum(context_name);
-				values[IO_COL_OBJECT] = CStringGetTextDatum(obj_name);
-				values[IO_COL_RESET_TIME] = reset_time;
-
-				/*
-				 * Hard-code this to the value of BLCKSZ for now. Future
-				 * values could include XLOG_BLCKSZ, once WAL IO is tracked,
-				 * and constant multipliers, once non-block-oriented IO (e.g.
-				 * temporary file IO) is tracked.
-				 */
-				values[IO_COL_CONVERSION] = Int64GetDatum(BLCKSZ);
-
-				for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
-				{
-					int			op_idx = pgstat_get_io_op_index(io_op);
-					int			time_idx = pgstat_get_io_time_index(io_op);
-
-					/*
-					 * Some combinations of BackendType and IOOp, of IOContext
-					 * and IOOp, and of IOObject and IOOp are not tracked. Set
-					 * these cells in the view NULL.
-					 */
-					if (pgstat_tracks_io_op(bktype, io_obj, io_context, io_op))
-					{
-						PgStat_Counter count =
-							bktype_stats->counts[io_obj][io_context][io_op];
-
-						values[op_idx] = Int64GetDatum(count);
-					}
-					else
-						nulls[op_idx] = true;
-
-					/* not every operation is timed */
-					if (time_idx == IO_COL_INVALID)
-						continue;
-
-					if (!nulls[op_idx])
-					{
-						PgStat_Counter time =
-							bktype_stats->times[io_obj][io_context][io_op];
-
-						values[time_idx] = Float8GetDatum(pg_stat_us_to_ms(time));
-					}
-					else
-						nulls[time_idx] = true;
-				}
-
-				tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
-									 values, nulls);
-			}
-		}
-	}
-
-	return (Datum) 0;
-}
-
-Datum
-pg_stat_get_backend_io(PG_FUNCTION_ARGS)
-{
-	ReturnSetInfo *rsinfo;
-	PgStat_Backend *backend_stats;
-	Datum		bktype_desc;
-	PgStat_BktypeIO *bktype_stats;
-	BackendType bktype;
-	Datum		reset_time;
-	int			num_backends = pgstat_fetch_stat_numbackends();
-	int			curr_backend;
-	int			pid;
-	PGPROC	   *proc;
-	ProcNumber	procNumber;
-
-	InitMaterializedSRF(fcinfo, 0);
-	rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
-
-	pid = PG_GETARG_INT32(0);
-	proc = BackendPidGetProc(pid);
-
-	/*
-	 * Could be an auxiliary process but would not report any stats due to
-	 * pgstat_tracks_backend_bktype() anyway. So don't need an extra call to
-	 * AuxiliaryPidGetProc().
-	 */
-	if (!proc)
-		return (Datum) 0;
-
-	procNumber = GetNumberFromPGProc(proc);
-	backend_stats = pgstat_fetch_proc_stat_io(procNumber);
-
-	if (!backend_stats)
-		return (Datum) 0;
-
-	bktype = B_INVALID;
-
-	/* Look for the backend type */
-	for (curr_backend = 1; curr_backend <= num_backends; curr_backend++)
-	{
-		LocalPgBackendStatus *local_beentry;
-		PgBackendStatus *beentry;
-
-		/* Get the next one in the list */
-		local_beentry = pgstat_get_local_beentry_by_index(curr_backend);
-		beentry = &local_beentry->backendStatus;
-
-		/* Looking for specific PID, ignore all the others */
-		if (beentry->st_procpid != pid)
-			continue;
-
-		bktype = beentry->st_backendType;
-		break;
-	}
-
-	/* Backend is gone */
-	if (bktype == B_INVALID)
-		return (Datum) 0;
-
-	bktype_desc = CStringGetTextDatum(GetBackendTypeDesc(bktype));
-	bktype_stats = &backend_stats->stats;
-	reset_time = TimestampTzGetDatum(backend_stats->stat_reset_timestamp);
-
-	/*
-	 * In Assert builds, we can afford an extra loop through all of the
-	 * counters checking that only expected stats are non-zero, since it keeps
-	 * the non-Assert code cleaner.
-	 */
-	Assert(pgstat_bktype_io_stats_valid(bktype_stats, bktype));
+	Datum		bktype_desc = CStringGetTextDatum(GetBackendTypeDesc(bktype));
 
 	for (int io_obj = 0; io_obj < IOOBJECT_NUM_TYPES; io_obj++)
 	{
@@ -1566,8 +1403,8 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
 			values[IO_COL_BACKEND_TYPE] = bktype_desc;
 			values[IO_COL_CONTEXT] = CStringGetTextDatum(context_name);
 			values[IO_COL_OBJECT] = CStringGetTextDatum(obj_name);
-			if (backend_stats->stat_reset_timestamp != 0)
-				values[IO_COL_RESET_TIME] = reset_time;
+			if (stat_reset_timestamp != 0)
+				values[IO_COL_RESET_TIME] = TimestampTzGetDatum(stat_reset_timestamp);
 			else
 				nulls[IO_COL_RESET_TIME] = true;
 
@@ -1618,7 +1455,104 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
 								 values, nulls);
 		}
 	}
+}
 
+Datum
+pg_stat_get_io(PG_FUNCTION_ARGS)
+{
+	ReturnSetInfo *rsinfo;
+	PgStat_IO  *backends_io_stats;
+
+	InitMaterializedSRF(fcinfo, 0);
+	rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+	backends_io_stats = pgstat_fetch_stat_io();
+
+	for (int bktype = 0; bktype < BACKEND_NUM_TYPES; bktype++)
+	{
+		PgStat_BktypeIO *bktype_stats = &backends_io_stats->stats[bktype];
+
+		/*
+		 * In Assert builds, we can afford an extra loop through all of the
+		 * counters checking that only expected stats are non-zero, since it
+		 * keeps the non-Assert code cleaner.
+		 */
+		Assert(pgstat_bktype_io_stats_valid(bktype_stats, bktype));
+
+		/*
+		 * For those BackendTypes without IO Operation stats, skip
+		 * representing them in the view altogether.
+		 */
+		if (!pgstat_tracks_io_bktype(bktype))
+			continue;
+
+		/* Save tuples with data from this PgStat_BktypeIO. */
+		pg_stat_fill_io_data(rsinfo, bktype_stats, bktype,
+							 backends_io_stats->stat_reset_timestamp);
+	}
+
+	return (Datum) 0;
+}
+
+/*
+ * Returns I/O statistics for a backend with given PID.
+ */
+Datum
+pg_stat_get_backend_io(PG_FUNCTION_ARGS)
+{
+	ReturnSetInfo *rsinfo;
+	BackendType bktype;
+	int			pid;
+	PGPROC	   *proc;
+	ProcNumber	procNumber;
+	PgStat_Backend *backend_stats;
+	PgStat_BktypeIO *bktype_stats;
+	PgBackendStatus *beentry;
+
+	InitMaterializedSRF(fcinfo, 0);
+	rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+	pid = PG_GETARG_INT32(0);
+	proc = BackendPidGetProc(pid);
+
+	/*
+	 * This could be an auxiliary process but these do not report backend
+	 * statistics due to pgstat_tracks_backend_bktype(), so there is no need
+	 * for an extra call to AuxiliaryPidGetProc().
+	 */
+	if (!proc)
+		return (Datum) 0;
+
+	procNumber = GetNumberFromPGProc(proc);
+
+	backend_stats = pgstat_fetch_stat_backend(procNumber);
+
+	if (!backend_stats)
+		return (Datum) 0;
+
+	beentry = pgstat_get_beentry_by_proc_number(procNumber);
+	bktype = beentry->st_backendType;
+
+	/* If PID does not match, leave */
+	if (beentry->st_procpid != pid)
+		return (Datum) 0;
+
+	/* Backend may be gone, so recheck in case */
+	if (bktype == B_INVALID)
+		return (Datum) 0;
+
+	bktype_stats = &backend_stats->stats;
+
+	/*
+	 * In Assert builds, we can afford an extra loop through all of the
+	 * counters checking that only expected stats are non-zero, since it keeps
+	 * the non-Assert code cleaner.
+	 */
+	Assert(pgstat_bktype_io_stats_valid(bktype_stats, bktype));
+
+	/* Save tuples with data from this PgStat_BktypeIO */
+	pg_stat_fill_io_data(rsinfo, bktype_stats, bktype,
+						 backend_stats->stat_reset_timestamp);
 	return (Datum) 0;
 }
 
@@ -1927,6 +1861,9 @@ pg_stat_reset_single_function_counters(PG_FUNCTION_ARGS)
 	PG_RETURN_VOID();
 }
 
+/*
+ * Reset statistics of backend with given PID.
+ */
 Datum
 pg_stat_reset_backend_stats(PG_FUNCTION_ARGS)
 {
@@ -1936,9 +1873,9 @@ pg_stat_reset_backend_stats(PG_FUNCTION_ARGS)
 	proc = BackendPidGetProc(backend_pid);
 
 	/*
-	 * Could be an auxiliary process but would not report any stats due to
-	 * pgstat_tracks_backend_bktype() anyway. So don't need an extra call to
-	 * AuxiliaryPidGetProc().
+	 * This could be an auxiliary process but these do not report backend
+	 * statistics due to pgstat_tracks_backend_bktype(), so there is no need
+	 * for an extra call to AuxiliaryPidGetProc().
 	 */
 	if (!proc)
 		PG_RETURN_VOID();
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 3447e7b75d..150b6dcf74 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1249,7 +1249,8 @@ SELECT pg_stat_get_subscription_stats(NULL);
  
 (1 row)
 
--- Test that the following operations are tracked in pg_stat_io and in backend stats:
+-- Test that the following operations are tracked in pg_stat_io and in
+-- backend stats:
 -- - reads of target blocks into shared buffers
 -- - writes of shared buffers to permanent storage
 -- - extends of relations using shared buffers
@@ -1335,14 +1336,6 @@ SELECT current_setting('fsync') = 'off'
  t
 (1 row)
 
--- Don't return any rows if querying other backend's stats that are excluded
--- from the backend stats collection (like the checkpointer).
-SELECT count(1) = 0 FROM pg_stat_get_backend_io(:checkpointer_pid);
- ?column? 
-----------
- t
-(1 row)
-
 -- Change the tablespace so that the table is rewritten directly, then SELECT
 -- from it to cause it to be read back into shared buffers.
 SELECT sum(reads) AS io_sum_shared_before_reads
@@ -1603,6 +1596,23 @@ SELECT :my_io_stats_pre_reset > :my_io_stats_post_backend_reset;
  t
 (1 row)
 
+-- Check invalid input for pg_stat_get_backend_io()
+SELECT pg_stat_get_backend_io(NULL);
+ pg_stat_get_backend_io 
+------------------------
+(0 rows)
+
+SELECT pg_stat_get_backend_io(0);
+ pg_stat_get_backend_io 
+------------------------
+(0 rows)
+
+-- Auxiliary processes return no data.
+SELECT pg_stat_get_backend_io(:checkpointer_pid);
+ pg_stat_get_backend_io 
+------------------------
+(0 rows)
+
 -- test BRIN index doesn't block HOT update
 CREATE TABLE brin_hot (
   id  integer PRIMARY KEY,
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 9c925005be..1e7d0ff665 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -595,7 +595,8 @@ SELECT pg_stat_get_replication_slot(NULL);
 SELECT pg_stat_get_subscription_stats(NULL);
 
 
--- Test that the following operations are tracked in pg_stat_io and in backend stats:
+-- Test that the following operations are tracked in pg_stat_io and in
+-- backend stats:
 -- - reads of target blocks into shared buffers
 -- - writes of shared buffers to permanent storage
 -- - extends of relations using shared buffers
@@ -650,10 +651,6 @@ SELECT current_setting('fsync') = 'off'
   OR (:my_io_sum_shared_after_fsyncs = :my_io_sum_shared_before_fsyncs
       AND :my_io_sum_shared_after_fsyncs= 0);
 
--- Don't return any rows if querying other backend's stats that are excluded
--- from the backend stats collection (like the checkpointer).
-SELECT count(1) = 0 FROM pg_stat_get_backend_io(:checkpointer_pid);
-
 -- Change the tablespace so that the table is rewritten directly, then SELECT
 -- from it to cause it to be read back into shared buffers.
 SELECT sum(reads) AS io_sum_shared_before_reads
@@ -801,6 +798,11 @@ SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) +
   FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
 SELECT :my_io_stats_pre_reset > :my_io_stats_post_backend_reset;
 
+-- Check invalid input for pg_stat_get_backend_io()
+SELECT pg_stat_get_backend_io(NULL);
+SELECT pg_stat_get_backend_io(0);
+-- Auxiliary processes return no data.
+SELECT pg_stat_get_backend_io(:checkpointer_pid);
 
 -- test BRIN index doesn't block HOT update
 CREATE TABLE brin_hot (
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 4a9464da61..d0d176cc54 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4801,11 +4801,13 @@ description | Waiting for a newly initialized WAL file to reach durable storage
        <para>
         Returns I/O statistics about the backend with the specified
         process ID. The output fields are exactly the same as the ones in the
-        <link linkend="monitoring-pg-stat-io-view"> <structname>pg_stat_io</structname></link>
-        view. The function does not return I/O statistics for the checkpointer,
+        <structname>pg_stat_io</structname> view.
+       </para>
+       <para>
+        The function does not return I/O statistics for the checkpointer,
         the background writer, the startup process and the autovacuum launcher
-        as they are already visible in the <link linkend="monitoring-pg-stat-io-view"> <structname>pg_stat_io</structname></link>
-        view and there is only one of those.
+        as they are already visible in the <structname>pg_stat_io</structname>
+        view and there is only one of each.
        </para></entry>
       </row>
 
-- 
2.45.2

#61Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#60)
2 attachment(s)
Re: per backend I/O statistics

Hi,

On Wed, Dec 18, 2024 at 01:57:53PM +0900, Michael Paquier wrote:

On Tue, Dec 17, 2024 at 09:35:37AM +0000, Bertrand Drouvot wrote:

Agree, we may need to add parameters to it but we'll see when the time comes.

Another thing that's been itching me: the loop in
pg_stat_get_backend_io() that exists as well in pg_stat_get_io() looks
worth refactoring in a single routine. The only difference between
both is the reset timestamp, still both of them can pass a value for
it depending on their stats kind entry.

Yeah, I also had something like this in mind (see R2 in [1]). So, +1 for it.

This shaves a bit more code
in your own patch, even if the check on pgstat_tracks_io_bktype() and
the Assert are not in the inner routine that fills the tuplestore with
the I/O data.

Yeah, that's fine by me.

See pg_stat_fill_io_data() in v10-0002. If you have a
better name for this routine, feel free..

I think I prefer pg_stat_io_build_tuples(), so I used that name in the attached
v11.

What do you think about this refactoring?

That makes complete sense to me (as per my first comment above).

This should come in first,
of course, so as the final patch introducing the backend stats is
easier to parse.

Yeah, it's in 0001 attached, with a few changes:

=== 1

s/Save tuples with data from this PgStat_BktypeIO./save tuples with data from this PgStat_BktypeIO/

to be consistent with the surrounding single-line comments.

=== 2

+  if (stat_reset_timestamp != 0)
+    values[IO_COL_RESET_TIME] = TimestampTzGetDatum(stat_reset_timestamp);
+  else
+    nulls[IO_COL_RESET_TIME] = true;

This is not strictly necessary until the per-backend I/O stats are introduced,
but I added it in 0001 to make 0002 easier to parse.
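
For example, once 0002 is applied, a quick check like the one below (just a
sketch, assuming the function's output columns stay exactly aligned with
pg_stat_io, including stats_reset) should show a NULL reset time for the
current backend until pg_stat_reset_backend_stats() has been called for it:

  -- per-backend reset time starts out NULL, since stat_reset_timestamp is
  -- zero until a per-backend reset happens
  SELECT DISTINCT stats_reset
    FROM pg_stat_get_backend_io(pg_backend_pid());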

=== 3

"In Assert builds, we can afford an extra loop through all of the"

I think this comment is now confusing, since the extra loop is done in
pg_stat_io_build_tuples() and not where the comment is written. One option
could have been to move the assert and the comment into pg_stat_io_build_tuples(),
but I think it's better to keep the assert before the pgstat_tracks_io_bktype()
call in pg_stat_get_io(), so I modified the comment a bit instead.

0002 is the per-backend I/O stats patch, with your tweaks.
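
For the archives, the kind of usage 0002 enables looks like this (again a
sketch, assuming the output columns stay the same as in the pg_stat_io view):

  -- I/O done so far by a given backend (here, the current one), per object/context
  SELECT backend_type, object, context, reads, writes, extends, fsyncs
    FROM pg_stat_get_backend_io(pg_backend_pid())
   ORDER BY object, context;

  -- and reset that backend's counters (superuser by default, or GRANT EXECUTE)
  SELECT pg_stat_reset_backend_stats(pg_backend_pid());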

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v11-0001-Refactor-pg_stat_get_io-to-extract-tuple-buildin.patch (text/x-diff; charset=us-ascii)
From 636d82991ec029c9d792d54cfdb2c665063424cf Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Wed, 18 Dec 2024 05:45:49 +0000
Subject: [PATCH v11 1/2] Refactor pg_stat_get_io() to extract tuple building
 logic

Adding pg_stat_io_build_tuples(), a helper for pg_stat_get_io() filling in a
result tuplestore. The tuple building logic is now isolated in its own function,
so that it could be reused elsewhere (like in the following backend I/O stats
commit).
---
 src/backend/utils/adt/pgstatfuncs.c | 174 ++++++++++++++++------------
 1 file changed, 97 insertions(+), 77 deletions(-)
 100.0% src/backend/utils/adt/

diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index cdf37403e9..efd35f700a 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1365,29 +1365,117 @@ pg_stat_us_to_ms(PgStat_Counter val_ms)
 	return val_ms * (double) 0.001;
 }
 
+/*
+ * pg_stat_io_build_tuples
+ *
+ * Helper routine for pg_stat_get_io() filling in a result tuplestore with one
+ * tuple for each object and each context supported by the caller, based on the
+ * contents of bktype_stats.
+ */
+static void
+pg_stat_io_build_tuples(ReturnSetInfo *rsinfo,
+						PgStat_BktypeIO *bktype_stats,
+						BackendType bktype,
+						TimestampTz stat_reset_timestamp)
+{
+	Datum		bktype_desc = CStringGetTextDatum(GetBackendTypeDesc(bktype));
+
+	for (int io_obj = 0; io_obj < IOOBJECT_NUM_TYPES; io_obj++)
+	{
+		const char *obj_name = pgstat_get_io_object_name(io_obj);
+
+		for (int io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
+		{
+			const char *context_name = pgstat_get_io_context_name(io_context);
+
+			Datum		values[IO_NUM_COLUMNS] = {0};
+			bool		nulls[IO_NUM_COLUMNS] = {0};
+
+			/*
+			 * Some combinations of BackendType, IOObject, and IOContext are
+			 * not valid for any type of IOOp. In such cases, omit the entire
+			 * row from the view.
+			 */
+			if (!pgstat_tracks_io_object(bktype, io_obj, io_context))
+				continue;
+
+			values[IO_COL_BACKEND_TYPE] = bktype_desc;
+			values[IO_COL_CONTEXT] = CStringGetTextDatum(context_name);
+			values[IO_COL_OBJECT] = CStringGetTextDatum(obj_name);
+			if (stat_reset_timestamp != 0)
+				values[IO_COL_RESET_TIME] = TimestampTzGetDatum(stat_reset_timestamp);
+			else
+				nulls[IO_COL_RESET_TIME] = true;
+
+			/*
+			 * Hard-code this to the value of BLCKSZ for now. Future values
+			 * could include XLOG_BLCKSZ, once WAL IO is tracked, and constant
+			 * multipliers, once non-block-oriented IO (e.g. temporary file
+			 * IO) is tracked.
+			 */
+			values[IO_COL_CONVERSION] = Int64GetDatum(BLCKSZ);
+
+			for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
+			{
+				int			op_idx = pgstat_get_io_op_index(io_op);
+				int			time_idx = pgstat_get_io_time_index(io_op);
+
+				/*
+				 * Some combinations of BackendType and IOOp, of IOContext and
+				 * IOOp, and of IOObject and IOOp are not tracked. Set these
+				 * cells in the view NULL.
+				 */
+				if (pgstat_tracks_io_op(bktype, io_obj, io_context, io_op))
+				{
+					PgStat_Counter count =
+						bktype_stats->counts[io_obj][io_context][io_op];
+
+					values[op_idx] = Int64GetDatum(count);
+				}
+				else
+					nulls[op_idx] = true;
+
+				/* not every operation is timed */
+				if (time_idx == IO_COL_INVALID)
+					continue;
+
+				if (!nulls[op_idx])
+				{
+					PgStat_Counter time =
+						bktype_stats->times[io_obj][io_context][io_op];
+
+					values[time_idx] = Float8GetDatum(pg_stat_us_to_ms(time));
+				}
+				else
+					nulls[time_idx] = true;
+			}
+
+			tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
+								 values, nulls);
+		}
+	}
+}
+
 Datum
 pg_stat_get_io(PG_FUNCTION_ARGS)
 {
 	ReturnSetInfo *rsinfo;
 	PgStat_IO  *backends_io_stats;
-	Datum		reset_time;
 
 	InitMaterializedSRF(fcinfo, 0);
 	rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
 
 	backends_io_stats = pgstat_fetch_stat_io();
 
-	reset_time = TimestampTzGetDatum(backends_io_stats->stat_reset_timestamp);
-
 	for (int bktype = 0; bktype < BACKEND_NUM_TYPES; bktype++)
 	{
-		Datum		bktype_desc = CStringGetTextDatum(GetBackendTypeDesc(bktype));
 		PgStat_BktypeIO *bktype_stats = &backends_io_stats->stats[bktype];
 
 		/*
 		 * In Assert builds, we can afford an extra loop through all of the
-		 * counters checking that only expected stats are non-zero, since it
-		 * keeps the non-Assert code cleaner.
+		 * counters (in pg_stat_io_build_tuples()), checking that only
+		 * expected stats are non-zero, since it keeps the non-Assert code
+		 * cleaner.
 		 */
 		Assert(pgstat_bktype_io_stats_valid(bktype_stats, bktype));
 
@@ -1398,77 +1486,9 @@ pg_stat_get_io(PG_FUNCTION_ARGS)
 		if (!pgstat_tracks_io_bktype(bktype))
 			continue;
 
-		for (int io_obj = 0; io_obj < IOOBJECT_NUM_TYPES; io_obj++)
-		{
-			const char *obj_name = pgstat_get_io_object_name(io_obj);
-
-			for (int io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
-			{
-				const char *context_name = pgstat_get_io_context_name(io_context);
-
-				Datum		values[IO_NUM_COLUMNS] = {0};
-				bool		nulls[IO_NUM_COLUMNS] = {0};
-
-				/*
-				 * Some combinations of BackendType, IOObject, and IOContext
-				 * are not valid for any type of IOOp. In such cases, omit the
-				 * entire row from the view.
-				 */
-				if (!pgstat_tracks_io_object(bktype, io_obj, io_context))
-					continue;
-
-				values[IO_COL_BACKEND_TYPE] = bktype_desc;
-				values[IO_COL_CONTEXT] = CStringGetTextDatum(context_name);
-				values[IO_COL_OBJECT] = CStringGetTextDatum(obj_name);
-				values[IO_COL_RESET_TIME] = reset_time;
-
-				/*
-				 * Hard-code this to the value of BLCKSZ for now. Future
-				 * values could include XLOG_BLCKSZ, once WAL IO is tracked,
-				 * and constant multipliers, once non-block-oriented IO (e.g.
-				 * temporary file IO) is tracked.
-				 */
-				values[IO_COL_CONVERSION] = Int64GetDatum(BLCKSZ);
-
-				for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
-				{
-					int			op_idx = pgstat_get_io_op_index(io_op);
-					int			time_idx = pgstat_get_io_time_index(io_op);
-
-					/*
-					 * Some combinations of BackendType and IOOp, of IOContext
-					 * and IOOp, and of IOObject and IOOp are not tracked. Set
-					 * these cells in the view NULL.
-					 */
-					if (pgstat_tracks_io_op(bktype, io_obj, io_context, io_op))
-					{
-						PgStat_Counter count =
-							bktype_stats->counts[io_obj][io_context][io_op];
-
-						values[op_idx] = Int64GetDatum(count);
-					}
-					else
-						nulls[op_idx] = true;
-
-					/* not every operation is timed */
-					if (time_idx == IO_COL_INVALID)
-						continue;
-
-					if (!nulls[op_idx])
-					{
-						PgStat_Counter time =
-							bktype_stats->times[io_obj][io_context][io_op];
-
-						values[time_idx] = Float8GetDatum(pg_stat_us_to_ms(time));
-					}
-					else
-						nulls[time_idx] = true;
-				}
-
-				tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
-									 values, nulls);
-			}
-		}
+		/* save tuples with data from this PgStat_BktypeIO */
+		pg_stat_io_build_tuples(rsinfo, bktype_stats, bktype,
+								backends_io_stats->stat_reset_timestamp);
 	}
 
 	return (Datum) 0;
-- 
2.34.1

v11-0002-per-backend-I-O-statistics.patch (text/x-diff; charset=us-ascii)
From 6447dbe922e1db379dfd65f8041bc55cf3a3c573 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Wed, 18 Dec 2024 06:39:02 +0000
Subject: [PATCH v11 2/2] per backend I/O statistics

While pg_stat_io provides cluster-wide I/O statistics, this commit adds the
ability to track and display per backend I/O statistics.

It adds a new statistics kind and 2 new functions:

- pg_stat_reset_backend_stats() to be able to reset the stats for a given
backend pid.
- pg_stat_get_backend_io() to retrieve I/O statistics for a given backend pid.

The new KIND is named PGSTAT_KIND_BACKEND as it could be used in the future
to store other per-backend statistics than the I/O ones. The new KIND is
a variable-numbered one and has an automatic cap on the maximum number of
entries (as its hash key contains the proc number).

There is no need to write the per-backend I/O stats to disk (there is no point
in keeping stats for backends that no longer exist after a restart), so
"write_to_file = false" is used.

Note that per-backend I/O statistics are not collected for the checkpointer,
the background writer, the startup process and the autovacuum launcher, as those
are already visible in pg_stat_io and there is only one of each.

XXX: Requires a bump of CATALOG_VERSION_NO.
XXX: Requires a bump of PGSTAT_FILE_FORMAT_ID.
---
 doc/src/sgml/config.sgml                     |   8 +-
 doc/src/sgml/monitoring.sgml                 |  39 ++++
 src/backend/catalog/system_functions.sql     |   2 +
 src/backend/utils/activity/Makefile          |   1 +
 src/backend/utils/activity/backend_status.c  |   4 +
 src/backend/utils/activity/meson.build       |   1 +
 src/backend/utils/activity/pgstat.c          |  21 +++
 src/backend/utils/activity/pgstat_backend.c  | 188 +++++++++++++++++++
 src/backend/utils/activity/pgstat_io.c       |  25 ++-
 src/backend/utils/activity/pgstat_relation.c |   2 +
 src/backend/utils/adt/pgstatfuncs.c          |  97 +++++++++-
 src/include/catalog/pg_proc.dat              |  13 ++
 src/include/pgstat.h                         |  34 +++-
 src/include/utils/pgstat_internal.h          |  14 ++
 src/test/regress/expected/stats.out          |  82 +++++++-
 src/test/regress/sql/stats.sql               |  42 ++++-
 src/tools/pgindent/typedefs.list             |   3 +
 17 files changed, 549 insertions(+), 27 deletions(-)
  10.3% doc/src/sgml/
  34.1% src/backend/utils/activity/
  14.7% src/backend/utils/adt/
   4.7% src/include/catalog/
   8.1% src/include/
  14.6% src/test/regress/expected/
  12.4% src/test/regress/sql/

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 24bd504c21..fbdd6ce574 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8403,9 +8403,11 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
         displayed in <link linkend="monitoring-pg-stat-database-view">
         <structname>pg_stat_database</structname></link>,
         <link linkend="monitoring-pg-stat-io-view">
-        <structname>pg_stat_io</structname></link>, in the output of
-        <xref linkend="sql-explain"/> when the <literal>BUFFERS</literal> option
-        is used, in the output of <xref linkend="sql-vacuum"/> when
+        <structname>pg_stat_io</structname></link>, in the output of the
+        <link linkend="pg-stat-get-backend-io">
+        <function>pg_stat_get_backend_io()</function></link> function, in the
+        output of <xref linkend="sql-explain"/> when the <literal>BUFFERS</literal>
+        option is used, in the output of <xref linkend="sql-vacuum"/> when
         the <literal>VERBOSE</literal> option is used, by autovacuum
         for auto-vacuums and auto-analyzes, when <xref
         linkend="guc-log-autovacuum-min-duration"/> is set and by
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 840d7f8161..d0d176cc54 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4790,6 +4790,27 @@ description | Waiting for a newly initialized WAL file to reach durable storage
        </para></entry>
       </row>
 
+      <row>
+       <entry id="pg-stat-get-backend-io" role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_stat_get_backend_io</primary>
+        </indexterm>
+        <function>pg_stat_get_backend_io</function> ( <type>integer</type> )
+        <returnvalue>setof record</returnvalue>
+       </para>
+       <para>
+        Returns I/O statistics about the backend with the specified
+        process ID. The output fields are exactly the same as the ones in the
+        <structname>pg_stat_io</structname> view.
+       </para>
+       <para>
+        The function does not return I/O statistics for the checkpointer,
+        the background writer, the startup process and the autovacuum launcher
+        as they are already visible in the <structname>pg_stat_io</structname>
+        view and there is only one of each.
+       </para></entry>
+      </row>
+
       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <indexterm>
@@ -4971,6 +4992,24 @@ description | Waiting for a newly initialized WAL file to reach durable storage
        </para></entry>
       </row>
 
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_stat_reset_backend_stats</primary>
+        </indexterm>
+        <function>pg_stat_reset_backend_stats</function> ( <type>integer</type> )
+        <returnvalue>void</returnvalue>
+       </para>
+       <para>
+        Resets statistics for a single backend with the specified process ID
+        to zero.
+       </para>
+       <para>
+        This function is restricted to superusers by default, but other users
+        can be granted EXECUTE to run the function.
+       </para></entry>
+      </row>
+
       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <indexterm>
diff --git a/src/backend/catalog/system_functions.sql b/src/backend/catalog/system_functions.sql
index c51dfca802..14eb99cd47 100644
--- a/src/backend/catalog/system_functions.sql
+++ b/src/backend/catalog/system_functions.sql
@@ -711,6 +711,8 @@ REVOKE EXECUTE ON FUNCTION pg_stat_reset_single_table_counters(oid) FROM public;
 
 REVOKE EXECUTE ON FUNCTION pg_stat_reset_single_function_counters(oid) FROM public;
 
+REVOKE EXECUTE ON FUNCTION pg_stat_reset_backend_stats(integer) FROM public;
+
 REVOKE EXECUTE ON FUNCTION pg_stat_reset_replication_slot(text) FROM public;
 
 REVOKE EXECUTE ON FUNCTION pg_stat_have_stats(text, oid, int8) FROM public;
diff --git a/src/backend/utils/activity/Makefile b/src/backend/utils/activity/Makefile
index b9fd66ea17..24b64a2742 100644
--- a/src/backend/utils/activity/Makefile
+++ b/src/backend/utils/activity/Makefile
@@ -20,6 +20,7 @@ OBJS = \
 	backend_status.o \
 	pgstat.o \
 	pgstat_archiver.o \
+	pgstat_backend.o \
 	pgstat_bgwriter.o \
 	pgstat_checkpointer.o \
 	pgstat_database.o \
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index 22c6dc378c..bf33e33a4e 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -426,6 +426,10 @@ pgstat_bestart(void)
 
 	PGSTAT_END_WRITE_ACTIVITY(vbeentry);
 
+	/* Create the backend statistics entry */
+	if (pgstat_tracks_backend_bktype(MyBackendType))
+		pgstat_create_backend(MyProcNumber);
+
 	/* Update app name to current GUC setting */
 	if (application_name)
 		pgstat_report_appname(application_name);
diff --git a/src/backend/utils/activity/meson.build b/src/backend/utils/activity/meson.build
index f73c22905c..380d3dd70c 100644
--- a/src/backend/utils/activity/meson.build
+++ b/src/backend/utils/activity/meson.build
@@ -5,6 +5,7 @@ backend_sources += files(
   'backend_status.c',
   'pgstat.c',
   'pgstat_archiver.c',
+  'pgstat_backend.c',
   'pgstat_bgwriter.c',
   'pgstat_checkpointer.c',
   'pgstat_database.c',
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index b4e357c8a4..b72c779b2c 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -77,6 +77,7 @@
  *
  * Each statistics kind is handled in a dedicated file:
  * - pgstat_archiver.c
+ * - pgstat_backend.c
  * - pgstat_bgwriter.c
  * - pgstat_checkpointer.c
  * - pgstat_database.c
@@ -358,6 +359,22 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.reset_timestamp_cb = pgstat_subscription_reset_timestamp_cb,
 	},
 
+	[PGSTAT_KIND_BACKEND] = {
+		.name = "backend",
+
+		.fixed_amount = false,
+		.write_to_file = false,
+
+		.accessed_across_databases = true,
+
+		.shared_size = sizeof(PgStatShared_Backend),
+		.shared_data_off = offsetof(PgStatShared_Backend, stats),
+		.shared_data_len = sizeof(((PgStatShared_Backend *) 0)->stats),
+		.pending_size = sizeof(PgStat_BackendPendingIO),
+
+		.flush_pending_cb = pgstat_backend_flush_cb,
+		.reset_timestamp_cb = pgstat_backend_reset_timestamp_cb,
+	},
 
 	/* stats for fixed-numbered (mostly 1) objects */
 
@@ -602,6 +619,10 @@ pgstat_shutdown_hook(int code, Datum arg)
 	Assert(dlist_is_empty(&pgStatPending));
 	dlist_init(&pgStatPending);
 
+	/* drop the backend stats entry */
+	if (!pgstat_drop_entry(PGSTAT_KIND_BACKEND, InvalidOid, MyProcNumber))
+		pgstat_request_entry_refs_gc();
+
 	pgstat_detach_shmem();
 
 #ifdef USE_ASSERT_CHECKING
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
new file mode 100644
index 0000000000..6b2c9baa8c
--- /dev/null
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -0,0 +1,188 @@
+/* -------------------------------------------------------------------------
+ *
+ * pgstat_backend.c
+ *	  Implementation of backend statistics.
+ *
+ * This file contains the implementation of backend statistics.  It is kept
+ * separate from pgstat.c to enforce the line between the statistics access /
+ * storage implementation and the details about individual types of
+ * statistics.
+ *
+ * This statistics kind uses a proc number as object ID for the hash table
+ * of pgstats.  Entries are created each time a process is spawned, and are
+ * dropped when the process exits.  These are not written to the pgstats file
+ * on disk.
+ *
+ * Copyright (c) 2001-2024, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/utils/activity/pgstat_backend.c
+ * -------------------------------------------------------------------------
+ */
+
+#include "postgres.h"
+
+#include "utils/pgstat_internal.h"
+
+/*
+ * Returns statistics of a backend by proc number.
+ */
+PgStat_Backend *
+pgstat_fetch_stat_backend(ProcNumber procNumber)
+{
+	PgStat_Backend *backend_entry;
+
+	backend_entry = (PgStat_Backend *) pgstat_fetch_entry(PGSTAT_KIND_BACKEND,
+														  InvalidOid, procNumber);
+
+	return backend_entry;
+}
+
+/*
+ * Flush out locally pending backend statistics
+ *
+ * If no stats have been recorded, this function returns false.
+ */
+bool
+pgstat_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
+{
+	PgStatShared_Backend *shbackendioent;
+	PgStat_BackendPendingIO *pendingent;
+	PgStat_BktypeIO *bktype_shstats;
+
+	if (!pgstat_lock_entry(entry_ref, nowait))
+		return false;
+
+	shbackendioent = (PgStatShared_Backend *) entry_ref->shared_stats;
+	bktype_shstats = &shbackendioent->stats.stats;
+	pendingent = (PgStat_BackendPendingIO *) entry_ref->pending;
+
+	for (int io_object = 0; io_object < IOOBJECT_NUM_TYPES; io_object++)
+	{
+		for (int io_context = 0; io_context < IOCONTEXT_NUM_TYPES; io_context++)
+		{
+			for (int io_op = 0; io_op < IOOP_NUM_TYPES; io_op++)
+			{
+				instr_time	time;
+
+				bktype_shstats->counts[io_object][io_context][io_op] +=
+					pendingent->counts[io_object][io_context][io_op];
+
+				time = pendingent->pending_times[io_object][io_context][io_op];
+
+				bktype_shstats->times[io_object][io_context][io_op] +=
+					INSTR_TIME_GET_MICROSEC(time);
+			}
+		}
+	}
+
+	pgstat_unlock_entry(entry_ref);
+
+	return true;
+}
+
+/*
+ * Simpler wrapper of pgstat_backend_flush_cb()
+ */
+void
+pgstat_flush_backend(bool nowait)
+{
+	PgStat_EntryRef *entry_ref;
+
+	if (!pgstat_tracks_backend_bktype(MyBackendType))
+		return;
+
+	entry_ref = pgstat_get_entry_ref(PGSTAT_KIND_BACKEND, InvalidOid,
+									 MyProcNumber, false, NULL);
+	(void) pgstat_backend_flush_cb(entry_ref, nowait);
+}
+
+/*
+ * Create backend statistics entry for proc number.
+ */
+void
+pgstat_create_backend(ProcNumber procnum)
+{
+	PgStat_EntryRef *entry_ref;
+	PgStatShared_Backend *shstatent;
+
+	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_BACKEND, InvalidOid,
+										  procnum, NULL);
+
+	shstatent = (PgStatShared_Backend *) entry_ref->shared_stats;
+
+	/*
+	 * NB: need to accept that there might be stats from an older backend,
+	 * e.g. if we previously used this proc number.
+	 */
+	memset(&shstatent->stats, 0, sizeof(shstatent->stats));
+}
+
+/*
+ * Find or create a local PgStat_BackendPendingIO entry for proc number.
+ */
+PgStat_BackendPendingIO *
+pgstat_prep_backend_pending(ProcNumber procnum)
+{
+	PgStat_EntryRef *entry_ref;
+
+	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_BACKEND, InvalidOid,
+										  procnum, NULL);
+
+	return entry_ref->pending;
+}
+
+/*
+ * Backend statistics are not collected for all BackendTypes.
+ *
+ * The following BackendTypes do not participate in the backend stats
+ * subsystem:
+ * - The same and for the same reasons as in pgstat_tracks_io_bktype().
+ * - B_BG_WRITER, B_CHECKPOINTER, B_STARTUP and B_AUTOVAC_LAUNCHER because their
+ * I/O stats are already visible in pg_stat_io and there is only one of those.
+ *
+ * Function returns true if BackendType participates in the backend stats
+ * subsystem and false if it does not.
+ *
+ * When adding a new BackendType, also consider adding relevant restrictions to
+ * pgstat_tracks_io_object() and pgstat_tracks_io_op().
+ */
+bool
+pgstat_tracks_backend_bktype(BackendType bktype)
+{
+	/*
+	 * List every type so that new backend types trigger a warning about
+	 * needing to adjust this switch.
+	 */
+	switch (bktype)
+	{
+		case B_INVALID:
+		case B_AUTOVAC_LAUNCHER:
+		case B_DEAD_END_BACKEND:
+		case B_ARCHIVER:
+		case B_LOGGER:
+		case B_WAL_RECEIVER:
+		case B_WAL_WRITER:
+		case B_WAL_SUMMARIZER:
+		case B_BG_WRITER:
+		case B_CHECKPOINTER:
+		case B_STARTUP:
+			return false;
+
+		case B_AUTOVAC_WORKER:
+		case B_BACKEND:
+		case B_BG_WORKER:
+		case B_STANDALONE_BACKEND:
+		case B_SLOTSYNC_WORKER:
+		case B_WAL_SENDER:
+			return true;
+	}
+
+	return false;
+}
+
+void
+pgstat_backend_reset_timestamp_cb(PgStatShared_Common *header, TimestampTz ts)
+{
+	((PgStatShared_Backend *) header)->stats.stat_reset_timestamp = ts;
+}
diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index f9883af2b3..011a3326da 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -20,14 +20,6 @@
 #include "storage/bufmgr.h"
 #include "utils/pgstat_internal.h"
 
-
-typedef struct PgStat_PendingIO
-{
-	PgStat_Counter counts[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
-	instr_time	pending_times[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
-} PgStat_PendingIO;
-
-
 static PgStat_PendingIO PendingIOStats;
 static bool have_iostats = false;
 
@@ -87,6 +79,14 @@ pgstat_count_io_op_n(IOObject io_object, IOContext io_context, IOOp io_op, uint3
 	Assert((unsigned int) io_op < IOOP_NUM_TYPES);
 	Assert(pgstat_tracks_io_op(MyBackendType, io_object, io_context, io_op));
 
+	if (pgstat_tracks_backend_bktype(MyBackendType))
+	{
+		PgStat_PendingIO *entry_ref;
+
+		entry_ref = pgstat_prep_backend_pending(MyProcNumber);
+		entry_ref->counts[io_object][io_context][io_op] += cnt;
+	}
+
 	PendingIOStats.counts[io_object][io_context][io_op] += cnt;
 
 	have_iostats = true;
@@ -148,6 +148,15 @@ pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op,
 
 		INSTR_TIME_ADD(PendingIOStats.pending_times[io_object][io_context][io_op],
 					   io_time);
+
+		if (pgstat_tracks_backend_bktype(MyBackendType))
+		{
+			PgStat_PendingIO *entry_ref;
+
+			entry_ref = pgstat_prep_backend_pending(MyProcNumber);
+			INSTR_TIME_ADD(entry_ref->pending_times[io_object][io_context][io_op],
+						   io_time);
+		}
 	}
 
 	pgstat_count_io_op_n(io_object, io_context, io_op, cnt);
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index faba8b64d2..85e65557bb 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -264,6 +264,7 @@ pgstat_report_vacuum(Oid tableoid, bool shared,
 	 * VACUUM command has processed all tables and committed.
 	 */
 	pgstat_flush_io(false);
+	pgstat_flush_backend(false);
 }
 
 /*
@@ -350,6 +351,7 @@ pgstat_report_analyze(Relation rel,
 
 	/* see pgstat_report_vacuum() */
 	pgstat_flush_io(false);
+	pgstat_flush_backend(false);
 }
 
 /*
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index efd35f700a..8387e11a93 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1274,8 +1274,9 @@ pg_stat_get_buf_alloc(PG_FUNCTION_ARGS)
 }
 
 /*
-* When adding a new column to the pg_stat_io view, add a new enum value
-* here above IO_NUM_COLUMNS.
+* When adding a new column to the pg_stat_io view and the
+* pg_stat_get_backend_io() function, add a new enum value here above
+* IO_NUM_COLUMNS.
 */
 typedef enum io_stat_col
 {
@@ -1368,9 +1369,9 @@ pg_stat_us_to_ms(PgStat_Counter val_ms)
 /*
  * pg_stat_io_build_tuples
  *
- * Helper routine for pg_stat_get_io() filling in a result tuplestore with one
- * tuple for each object and each context supported by the caller, based on the
- * contents of bktype_stats.
+ * Helper routine for pg_stat_get_io() and pg_stat_get_backend_io(),
+ * filling in a result tuplestore with one tuple for each object and each
+ * context supported by the caller, based on the contents of bktype_stats.
  */
 static void
 pg_stat_io_build_tuples(ReturnSetInfo *rsinfo,
@@ -1494,6 +1495,68 @@ pg_stat_get_io(PG_FUNCTION_ARGS)
 	return (Datum) 0;
 }
 
+/*
+ * Returns I/O statistics for a backend with given PID.
+ */
+Datum
+pg_stat_get_backend_io(PG_FUNCTION_ARGS)
+{
+	ReturnSetInfo *rsinfo;
+	BackendType bktype;
+	int			pid;
+	PGPROC	   *proc;
+	ProcNumber	procNumber;
+	PgStat_Backend *backend_stats;
+	PgStat_BktypeIO *bktype_stats;
+	PgBackendStatus *beentry;
+
+	InitMaterializedSRF(fcinfo, 0);
+	rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+
+	pid = PG_GETARG_INT32(0);
+	proc = BackendPidGetProc(pid);
+
+	/*
+	 * This could be an auxiliary process but these do not report backend
+	 * statistics due to pgstat_tracks_backend_bktype(), so there is no need
+	 * for an extra call to AuxiliaryPidGetProc().
+	 */
+	if (!proc)
+		return (Datum) 0;
+
+	procNumber = GetNumberFromPGProc(proc);
+
+	backend_stats = pgstat_fetch_stat_backend(procNumber);
+
+	if (!backend_stats)
+		return (Datum) 0;
+
+	beentry = pgstat_get_beentry_by_proc_number(procNumber);
+	bktype = beentry->st_backendType;
+
+	/* if PID does not match, leave */
+	if (beentry->st_procpid != pid)
+		return (Datum) 0;
+
+	/* backend may be gone, so recheck in case */
+	if (bktype == B_INVALID)
+		return (Datum) 0;
+
+	bktype_stats = &backend_stats->stats;
+
+	/*
+	 * In Assert builds, we can afford an extra loop through all of the
+	 * counters (in pg_stat_io_build_tuples()), checking that only expected
+	 * stats are non-zero, since it keeps the non-Assert code cleaner.
+	 */
+	Assert(pgstat_bktype_io_stats_valid(bktype_stats, bktype));
+
+	/* save tuples with data from this PgStat_BktypeIO */
+	pg_stat_io_build_tuples(rsinfo, bktype_stats, bktype,
+							backend_stats->stat_reset_timestamp);
+	return (Datum) 0;
+}
+
 /*
  * Returns statistics of WAL activity
  */
@@ -1799,6 +1862,30 @@ pg_stat_reset_single_function_counters(PG_FUNCTION_ARGS)
 	PG_RETURN_VOID();
 }
 
+/*
+ * Reset statistics of backend with given PID.
+ */
+Datum
+pg_stat_reset_backend_stats(PG_FUNCTION_ARGS)
+{
+	PGPROC	   *proc;
+	int			backend_pid = PG_GETARG_INT32(0);
+
+	proc = BackendPidGetProc(backend_pid);
+
+	/*
+	 * This could be an auxiliary process but these do not report backend
+	 * statistics due to pgstat_tracks_backend_bktype(), so there is no need
+	 * for an extra call to AuxiliaryPidGetProc().
+	 */
+	if (!proc)
+		PG_RETURN_VOID();
+
+	pgstat_reset(PGSTAT_KIND_BACKEND, InvalidOid, GetNumberFromPGProc(proc));
+
+	PG_RETURN_VOID();
+}
+
 /* Reset SLRU counters (a specific one or all of them). */
 Datum
 pg_stat_reset_slru(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 0f22c21723..2dcc2d42da 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5913,6 +5913,15 @@
   proargnames => '{backend_type,object,context,reads,read_time,writes,write_time,writebacks,writeback_time,extends,extend_time,op_bytes,hits,evictions,reuses,fsyncs,fsync_time,stats_reset}',
   prosrc => 'pg_stat_get_io' },
 
+{ oid => '8806', descr => 'statistics: backend IO statistics',
+  proname => 'pg_stat_get_backend_io', prorows => '5', proretset => 't',
+  provolatile => 'v', proparallel => 'r', prorettype => 'record',
+  proargtypes => 'int4',
+  proallargtypes => '{int4,text,text,text,int8,float8,int8,float8,int8,float8,int8,float8,int8,int8,int8,int8,int8,float8,timestamptz}',
+  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{backend_pid,backend_type,object,context,reads,read_time,writes,write_time,writebacks,writeback_time,extends,extend_time,op_bytes,hits,evictions,reuses,fsyncs,fsync_time,stats_reset}',
+  prosrc => 'pg_stat_get_backend_io' },
+
 { oid => '1136', descr => 'statistics: information about WAL activity',
   proname => 'pg_stat_get_wal', proisstrict => 'f', provolatile => 's',
   proparallel => 'r', prorettype => 'record', proargtypes => '',
@@ -6052,6 +6061,10 @@
   proname => 'pg_stat_reset_single_function_counters', provolatile => 'v',
   prorettype => 'void', proargtypes => 'oid',
   prosrc => 'pg_stat_reset_single_function_counters' },
+{ oid => '8807', descr => 'statistics: reset statistics for a single backend',
+  proname => 'pg_stat_reset_backend_stats', provolatile => 'v',
+  prorettype => 'void', proargtypes => 'int4',
+  prosrc => 'pg_stat_reset_backend_stats' },
 { oid => '2307',
   descr => 'statistics: reset collected statistics for a single SLRU',
   proname => 'pg_stat_reset_slru', proisstrict => 'f', provolatile => 'v',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index ebfeef2f46..56ed1b8bb3 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -49,14 +49,15 @@
 #define PGSTAT_KIND_FUNCTION	3	/* per-function statistics */
 #define PGSTAT_KIND_REPLSLOT	4	/* per-slot statistics */
 #define PGSTAT_KIND_SUBSCRIPTION	5	/* per-subscription statistics */
+#define PGSTAT_KIND_BACKEND	6	/* per-backend statistics */
 
 /* stats for fixed-numbered objects */
-#define PGSTAT_KIND_ARCHIVER	6
-#define PGSTAT_KIND_BGWRITER	7
-#define PGSTAT_KIND_CHECKPOINTER	8
-#define PGSTAT_KIND_IO	9
-#define PGSTAT_KIND_SLRU	10
-#define PGSTAT_KIND_WAL	11
+#define PGSTAT_KIND_ARCHIVER	7
+#define PGSTAT_KIND_BGWRITER	8
+#define PGSTAT_KIND_CHECKPOINTER	9
+#define PGSTAT_KIND_IO	10
+#define PGSTAT_KIND_SLRU	11
+#define PGSTAT_KIND_WAL	12
 
 #define PGSTAT_KIND_BUILTIN_MIN PGSTAT_KIND_DATABASE
 #define PGSTAT_KIND_BUILTIN_MAX PGSTAT_KIND_WAL
@@ -362,12 +363,26 @@ typedef struct PgStat_BktypeIO
 	PgStat_Counter times[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
 } PgStat_BktypeIO;
 
+typedef struct PgStat_PendingIO
+{
+	PgStat_Counter counts[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
+	instr_time	pending_times[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES];
+} PgStat_PendingIO;
+
 typedef struct PgStat_IO
 {
 	TimestampTz stat_reset_timestamp;
 	PgStat_BktypeIO stats[BACKEND_NUM_TYPES];
 } PgStat_IO;
 
+/* Backend statistics store the same amount of IO data as PGSTAT_KIND_IO */
+typedef PgStat_PendingIO PgStat_BackendPendingIO;
+
+typedef struct PgStat_Backend
+{
+	TimestampTz stat_reset_timestamp;
+	PgStat_BktypeIO stats;
+} PgStat_Backend;
 
 typedef struct PgStat_StatDBEntry
 {
@@ -549,6 +564,13 @@ extern bool pgstat_have_entry(PgStat_Kind kind, Oid dboid, uint64 objid);
 extern void pgstat_report_archiver(const char *xlog, bool failed);
 extern PgStat_ArchiverStats *pgstat_fetch_stat_archiver(void);
 
+/*
+ * Functions in pgstat_backend.c
+ */
+
+extern PgStat_Backend *pgstat_fetch_stat_backend(ProcNumber procNumber);
+extern bool pgstat_tracks_backend_bktype(BackendType bktype);
+extern void pgstat_create_backend(ProcNumber procnum);
 
 /*
  * Functions in pgstat_bgwriter.c
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 7338bc1e28..811ed9b005 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -450,6 +450,11 @@ typedef struct PgStatShared_ReplSlot
 	PgStat_StatReplSlotEntry stats;
 } PgStatShared_ReplSlot;
 
+typedef struct PgStatShared_Backend
+{
+	PgStatShared_Common header;
+	PgStat_Backend stats;
+} PgStatShared_Backend;
 
 /*
  * Central shared memory entry for the cumulative stats system.
@@ -604,6 +609,15 @@ extern void pgstat_archiver_init_shmem_cb(void *stats);
 extern void pgstat_archiver_reset_all_cb(TimestampTz ts);
 extern void pgstat_archiver_snapshot_cb(void);
 
+/*
+ * Functions in pgstat_backend.c
+ */
+
+extern void pgstat_flush_backend(bool nowait);
+
+extern PgStat_BackendPendingIO *pgstat_prep_backend_pending(ProcNumber procnum);
+extern bool pgstat_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait);
+extern void pgstat_backend_reset_timestamp_cb(PgStatShared_Common *header, TimestampTz ts);
 
 /*
  * Functions in pgstat_bgwriter.c
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 56771f83ed..150b6dcf74 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1249,7 +1249,8 @@ SELECT pg_stat_get_subscription_stats(NULL);
  
 (1 row)
 
--- Test that the following operations are tracked in pg_stat_io:
+-- Test that the following operations are tracked in pg_stat_io and in
+-- backend stats:
 -- - reads of target blocks into shared buffers
 -- - writes of shared buffers to permanent storage
 -- - extends of relations using shared buffers
@@ -1259,11 +1260,19 @@ SELECT pg_stat_get_subscription_stats(NULL);
 -- be sure of the state of shared buffers at the point the test is run.
 -- Create a regular table and insert some data to generate IOCONTEXT_NORMAL
 -- extends.
+SELECT pid AS checkpointer_pid FROM pg_stat_activity
+  WHERE backend_type = 'checkpointer' \gset
 SELECT sum(extends) AS io_sum_shared_before_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT sum(extends) AS my_io_sum_shared_before_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_stat_io
   WHERE object = 'relation' \gset io_sum_shared_before_
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE object = 'relation' \gset my_io_sum_shared_before_
 CREATE TABLE test_io_shared(a int);
 INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
 SELECT pg_stat_force_next_flush();
@@ -1280,8 +1289,17 @@ SELECT :io_sum_shared_after_extends > :io_sum_shared_before_extends;
  t
 (1 row)
 
+SELECT sum(extends) AS my_io_sum_shared_after_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
+SELECT :my_io_sum_shared_after_extends > :my_io_sum_shared_before_extends;
+ ?column? 
+----------
+ t
+(1 row)
+
 -- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
--- and fsyncs.
+-- and fsyncs in the global stats (not for the backend).
 -- See comment above for rationale for two explicit CHECKPOINTs.
 CHECKPOINT;
 CHECKPOINT;
@@ -1301,6 +1319,23 @@ SELECT current_setting('fsync') = 'off'
  t
 (1 row)
 
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE object = 'relation' \gset my_io_sum_shared_after_
+SELECT :my_io_sum_shared_after_writes >= :my_io_sum_shared_before_writes;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT current_setting('fsync') = 'off'
+  OR (:my_io_sum_shared_after_fsyncs = :my_io_sum_shared_before_fsyncs
+      AND :my_io_sum_shared_after_fsyncs= 0);
+ ?column? 
+----------
+ t
+(1 row)
+
 -- Change the tablespace so that the table is rewritten directly, then SELECT
 -- from it to cause it to be read back into shared buffers.
 SELECT sum(reads) AS io_sum_shared_before_reads
@@ -1521,6 +1556,8 @@ SELECT pg_stat_have_stats('io', 0, 0);
 
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_pre_reset
   FROM pg_stat_io \gset
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_pre_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
 SELECT pg_stat_reset_shared('io');
  pg_stat_reset_shared 
 ----------------------
@@ -1535,6 +1572,47 @@ SELECT :io_stats_post_reset < :io_stats_pre_reset;
  t
 (1 row)
 
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
+-- pg_stat_reset_shared() did not reset backend IO stats
+SELECT :my_io_stats_pre_reset <= :my_io_stats_post_reset;
+ ?column? 
+----------
+ t
+(1 row)
+
+-- but pg_stat_reset_backend_stats() does
+SELECT pg_stat_reset_backend_stats(pg_backend_pid());
+ pg_stat_reset_backend_stats 
+-----------------------------
+ 
+(1 row)
+
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_backend_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
+SELECT :my_io_stats_pre_reset > :my_io_stats_post_backend_reset;
+ ?column? 
+----------
+ t
+(1 row)
+
+-- Check invalid input for pg_stat_get_backend_io()
+SELECT pg_stat_get_backend_io(NULL);
+ pg_stat_get_backend_io 
+------------------------
+(0 rows)
+
+SELECT pg_stat_get_backend_io(0);
+ pg_stat_get_backend_io 
+------------------------
+(0 rows)
+
+-- Auxiliary processes return no data.
+SELECT pg_stat_get_backend_io(:checkpointer_pid);
+ pg_stat_get_backend_io 
+------------------------
+(0 rows)
+
 -- test BRIN index doesn't block HOT update
 CREATE TABLE brin_hot (
   id  integer PRIMARY KEY,
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 7147cc2f89..1e7d0ff665 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -595,7 +595,8 @@ SELECT pg_stat_get_replication_slot(NULL);
 SELECT pg_stat_get_subscription_stats(NULL);
 
 
--- Test that the following operations are tracked in pg_stat_io:
+-- Test that the following operations are tracked in pg_stat_io and in
+-- backend stats:
 -- - reads of target blocks into shared buffers
 -- - writes of shared buffers to permanent storage
 -- - extends of relations using shared buffers
@@ -607,20 +608,32 @@ SELECT pg_stat_get_subscription_stats(NULL);
 
 -- Create a regular table and insert some data to generate IOCONTEXT_NORMAL
 -- extends.
+SELECT pid AS checkpointer_pid FROM pg_stat_activity
+  WHERE backend_type = 'checkpointer' \gset
 SELECT sum(extends) AS io_sum_shared_before_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
+SELECT sum(extends) AS my_io_sum_shared_before_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
 SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   FROM pg_stat_io
   WHERE object = 'relation' \gset io_sum_shared_before_
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE object = 'relation' \gset my_io_sum_shared_before_
 CREATE TABLE test_io_shared(a int);
 INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
 SELECT pg_stat_force_next_flush();
 SELECT sum(extends) AS io_sum_shared_after_extends
   FROM pg_stat_io WHERE context = 'normal' AND object = 'relation' \gset
 SELECT :io_sum_shared_after_extends > :io_sum_shared_before_extends;
+SELECT sum(extends) AS my_io_sum_shared_after_extends
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE context = 'normal' AND object = 'relation' \gset
+SELECT :my_io_sum_shared_after_extends > :my_io_sum_shared_before_extends;
 
 -- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
--- and fsyncs.
+-- and fsyncs in the global stats (not for the backend).
 -- See comment above for rationale for two explicit CHECKPOINTs.
 CHECKPOINT;
 CHECKPOINT;
@@ -630,6 +643,13 @@ SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
 SELECT :io_sum_shared_after_writes > :io_sum_shared_before_writes;
 SELECT current_setting('fsync') = 'off'
   OR :io_sum_shared_after_fsyncs > :io_sum_shared_before_fsyncs;
+SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
+  FROM pg_stat_get_backend_io(pg_backend_pid())
+  WHERE object = 'relation' \gset my_io_sum_shared_after_
+SELECT :my_io_sum_shared_after_writes >= :my_io_sum_shared_before_writes;
+SELECT current_setting('fsync') = 'off'
+  OR (:my_io_sum_shared_after_fsyncs = :my_io_sum_shared_before_fsyncs
+      AND :my_io_sum_shared_after_fsyncs= 0);
 
 -- Change the tablespace so that the table is rewritten directly, then SELECT
 -- from it to cause it to be read back into shared buffers.
@@ -762,11 +782,27 @@ SELECT :io_sum_bulkwrite_strategy_extends_after > :io_sum_bulkwrite_strategy_ext
 SELECT pg_stat_have_stats('io', 0, 0);
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_pre_reset
   FROM pg_stat_io \gset
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_pre_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
 SELECT pg_stat_reset_shared('io');
 SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS io_stats_post_reset
   FROM pg_stat_io \gset
 SELECT :io_stats_post_reset < :io_stats_pre_reset;
-
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
+-- pg_stat_reset_shared() did not reset backend IO stats
+SELECT :my_io_stats_pre_reset <= :my_io_stats_post_reset;
+-- but pg_stat_reset_backend_stats() does
+SELECT pg_stat_reset_backend_stats(pg_backend_pid());
+SELECT sum(evictions) + sum(reuses) + sum(extends) + sum(fsyncs) + sum(reads) + sum(writes) + sum(writebacks) + sum(hits) AS my_io_stats_post_backend_reset
+  FROM pg_stat_get_backend_io(pg_backend_pid()) \gset
+SELECT :my_io_stats_pre_reset > :my_io_stats_post_backend_reset;
+
+-- Check invalid input for pg_stat_get_backend_io()
+SELECT pg_stat_get_backend_io(NULL);
+SELECT pg_stat_get_backend_io(0);
+-- Auxiliary processes return no data.
+SELECT pg_stat_get_backend_io(:checkpointer_pid);
 
 -- test BRIN index doesn't block HOT update
 CREATE TABLE brin_hot (
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index ce33e55bf1..398dd92527 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2121,6 +2121,7 @@ PgFdwSamplingMethod
 PgFdwScanState
 PgIfAddrCallback
 PgStatShared_Archiver
+PgStatShared_Backend
 PgStatShared_BgWriter
 PgStatShared_Checkpointer
 PgStatShared_Common
@@ -2136,6 +2137,8 @@ PgStatShared_SLRU
 PgStatShared_Subscription
 PgStatShared_Wal
 PgStat_ArchiverStats
+PgStat_Backend
+PgStat_BackendPendingIO
 PgStat_BackendSubEntry
 PgStat_BgWriterStats
 PgStat_BktypeIO
-- 
2.34.1

#62Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Bertrand Drouvot (#61)
Re: per backend I/O statistics

On Wed, Dec 18, 2024 at 08:11:55AM +0000, Bertrand Drouvot wrote:

Yeah, I also had something like this in mind (see R2 in [1]). So, +1 for it.

[1]: /messages/by-id/Zy4bmvgHqGjcK1pI@ip-10-97-1-34.eu-west-3.compute.internal

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#63Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#61)
Re: per backend I/O statistics

On Wed, Dec 18, 2024 at 08:11:55AM +0000, Bertrand Drouvot wrote:

I think I prefer pg_stat_io_build_tuples() and used that name in v11
attached.

I'll rely on your naming sense rather than mine, then :D

I think that this comment is now confusing since the extra loop would be
done in pg_stat_io_build_tuples() and not where the comment is written. One
option could have been to move the assert and the comment into pg_stat_io_build_tuples(),
but I think it's better to have the assert before the pgstat_tracks_io_bktype()
call in pg_stat_get_io(), so I modified the comment a bit instead.

0002 is the per-backend I/O stats with your tweaks.

While doing more tests with backends exiting concurrently with their
stats scanned, I have detected one path in pg_stat_get_backend_io()
after calling pgstat_get_beentry_by_proc_number() where we should also
check that it does not return NULL, or we would crash on a pointer
dereference when reading the backend type.

Fixed that, bumped the two version counters, and done.
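
For reference, the shape of the extra check is roughly the following (a sketch
of the idea, not the committed hunk verbatim):

'''
	PgBackendStatus *beentry;

	/*
	 * The backend may have exited since BackendPidGetProc() succeeded, in
	 * which case there is no backend entry to read the backend type from.
	 */
	beentry = pgstat_get_beentry_by_proc_number(procNumber);
	if (!beentry)
		return (Datum) 0;

	bktype = beentry->st_backendType;

	/* if the PID does not match, the proc slot has been reused; leave */
	if (beentry->st_procpid != pid)
		return (Datum) 0;
'''
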
--
Michael

#64Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#63)
Re: per backend I/O statistics

Hi,

On Thu, Dec 19, 2024 at 01:21:54PM +0900, Michael Paquier wrote:

While doing more tests with backends exiting concurrently with their
stats scanned, I have detected one path in pg_stat_get_backend_io()
after calling pgstat_get_beentry_by_proc_number() where we should also
check that it does not return NULL, or we would crash on a pointer
dereference when reading the backend type.

Fixed that,

Oh right, indeed all the other pgstat_get_beentry_by_proc_number() callers are
checking for a NULL return value.

bumped the two version counters, and done.

Thanks!

I think I'll start a dedicated thread to discuss the stats_fetch_consistency/'snapshot'
point (will be easier to follow than resuming the discussion in this thread).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#65Alexander Lakhin
exclusion@gmail.com
In reply to: Michael Paquier (#63)
Re: per backend I/O statistics

Hello Michael,

19.12.2024 06:21, Michael Paquier wrote:

Fixed that, bumped the two version counters, and done.

Could you, please, look at recent failures produced by grassquit (which
has fsync = on in its config), on an added test case? For instance, [1]:
--- /home/bf/bf-build/grassquit/HEAD/pgsql/src/test/regress/expected/stats.out 2024-12-19 04:44:08.779311933 +0000
+++ /home/bf/bf-build/grassquit/HEAD/pgsql.build/testrun/recovery/027_stream_regress/data/results/stats.out 2024-12-19 
16:37:41.351784840 +0000
@@ -1333,7 +1333,7 @@
        AND :my_io_sum_shared_after_fsyncs= 0);
   ?column?
  ----------
- t
+ f
  (1 row)

The complete query is:
SELECT current_setting('fsync') = 'off'
  OR (:my_io_sum_shared_after_fsyncs = :my_io_sum_shared_before_fsyncs
      AND :my_io_sum_shared_after_fsyncs= 0);

And the corresponding query in 027_stream_regress_primary.log is:
2024-12-19 16:37:39.907 UTC [4027467][client backend][15/1980:0] LOG:  statement: SELECT current_setting('fsync') = 'off'
      OR (1 = 1
          AND 1= 0);

(I can reproduce this locally with an asan-enabled build too.)

[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grassquit&dt=2024-12-19%2016%3A28%3A58

Best regards,
Alexander

#66Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Alexander Lakhin (#65)
1 attachment(s)
Re: per backend I/O statistics

Hi,

On Fri, Dec 20, 2024 at 08:00:00AM +0200, Alexander Lakhin wrote:

Hello Michael,

19.12.2024 06:21, Michael Paquier wrote:

Fixed that, bumped the two version counters, and done.

Could you, please, look at recent failures produced by grassquit (which
has fsync = on in its config), on an added test case? For instance, [1]:
--- /home/bf/bf-build/grassquit/HEAD/pgsql/src/test/regress/expected/stats.out 2024-12-19 04:44:08.779311933 +0000
+++ /home/bf/bf-build/grassquit/HEAD/pgsql.build/testrun/recovery/027_stream_regress/data/results/stats.out
2024-12-19 16:37:41.351784840 +0000
@@ -1333,7 +1333,7 @@
       AND :my_io_sum_shared_after_fsyncs= 0);
  ?column?
 ----------
- t
+ f
 (1 row)

The complete query is:
SELECT current_setting('fsync') = 'off'
  OR (:my_io_sum_shared_after_fsyncs = :my_io_sum_shared_before_fsyncs
      AND :my_io_sum_shared_after_fsyncs= 0);

And the corresponding query in 027_stream_regress_primary.log is:
2024-12-19 16:37:39.907 UTC [4027467][client backend][15/1980:0] LOG:  statement: SELECT current_setting('fsync') = 'off'
      OR (1 = 1
          AND 1= 0);

(I can reproduce this locally with an asan-enabled build too.)

[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grassquit&dt=2024-12-19%2016%3A28%3A58

Thanks for the report! I was not able to reproduce (even with an asan-enabled build)
but I think the test is wrong. Indeed the backend could fsync too during the test
(see register_dirty_segment() and the case where the request queue is full).
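
For illustration, an abridged sketch of that path in register_dirty_segment()
(src/backend/storage/smgr/md.c; simplified, details may vary across versions):
when the sync request cannot be forwarded to the checkpointer, the backend
fsyncs the segment itself and that fsync shows up in its own IOOP_FSYNC
counters.

'''
	/* abridged sketch of register_dirty_segment(), not verbatim */
	if (!RegisterSyncRequest(&tag, SYNC_REQUEST, false /* retryOnError */ ))
	{
		instr_time	io_start;

		/* request queue full: the backend has to do the fsync itself */
		ereport(DEBUG1,
				(errmsg_internal("could not forward fsync request because request queue is full")));

		io_start = pgstat_prepare_io_time(track_io_timing);

		if (FileSync(seg->mdfd_vfd, WAIT_EVENT_DATA_FILE_SYNC) < 0)
			ereport(data_sync_elevel(ERROR),
					(errcode_for_file_access(),
					 errmsg("could not fsync file \"%s\": %m",
							FilePathName(seg->mdfd_vfd))));

		pgstat_count_io_op_time(IOOBJECT_RELATION, IOCONTEXT_NORMAL,
								IOOP_FSYNC, io_start, 1);
	}
'''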

I think the test should look like the attached instead, thoughts?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v1-0001-Fix-per-backend-IO-stats-regression-test.patchtext/x-diff; charset=us-asciiDownload
From a83def4ca95c154e824378452cb15f100887a56c Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Fri, 20 Dec 2024 08:59:53 +0000
Subject: [PATCH v1] Fix per-backend IO stats regression test

The tests added in 9aea73fc61 did not take into account that the backend can
fsync in some circumstances.
---
 src/test/regress/expected/stats.out | 3 +--
 src/test/regress/sql/stats.sql      | 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)
  50.0% src/test/regress/expected/
  50.0% src/test/regress/sql/

diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 150b6dcf74..d18fc7c081 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -1329,8 +1329,7 @@ SELECT :my_io_sum_shared_after_writes >= :my_io_sum_shared_before_writes;
 (1 row)
 
 SELECT current_setting('fsync') = 'off'
-  OR (:my_io_sum_shared_after_fsyncs = :my_io_sum_shared_before_fsyncs
-      AND :my_io_sum_shared_after_fsyncs= 0);
+  OR :my_io_sum_shared_after_fsyncs >= :my_io_sum_shared_before_fsyncs;
  ?column? 
 ----------
  t
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 1e7d0ff665..fc0d2fde31 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -648,8 +648,7 @@ SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
   WHERE object = 'relation' \gset my_io_sum_shared_after_
 SELECT :my_io_sum_shared_after_writes >= :my_io_sum_shared_before_writes;
 SELECT current_setting('fsync') = 'off'
-  OR (:my_io_sum_shared_after_fsyncs = :my_io_sum_shared_before_fsyncs
-      AND :my_io_sum_shared_after_fsyncs= 0);
+  OR :my_io_sum_shared_after_fsyncs >= :my_io_sum_shared_before_fsyncs;
 
 -- Change the tablespace so that the table is rewritten directly, then SELECT
 -- from it to cause it to be read back into shared buffers.
-- 
2.34.1

#67Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#66)
Re: per backend I/O statistics

On Fri, Dec 20, 2024 at 09:09:00AM +0000, Bertrand Drouvot wrote:

Thanks for the report! I was not able to reproduce (even with an asan-enabled build)
but I think the test is wrong. Indeed the backend could fsync too during the test
(see register_dirty_segment() and the case where the request queue is full).

I think the test should look like the attached instead, thoughts?

Hmm. I cannot reproduce that here either. FWIW, I use these settings
by default in my local builds, similarly to the CI which did not
complain:
ASAN_OPTIONS="print_stacktrace=1:disable_coredump=0:abort_on_error=1:detect_leaks=0"
UBSAN_OPTIONS="print_stacktrace=1:disable_coredump=0:abort_on_error=1:verbosity=2"
CFLAGS+="-fsanitize=undefined "
LDFLAGS+="-fsanitize=undefined "

grassquit uses something a bit different, which doesn't allow me to see
a problem with 027 even with fsync enabled:
ASAN_OPTIONS="print_stacktrace=1:disable_coredump=0:abort_on_error=1:detect_leaks=0:detect_stack_use_after_return=0"
CFLAGS+="-fsanitize=address -fno-sanitize-recover=all "
LDFLAGS+="-fsanitize=address -fno-sanitize-recover=all "

Anyway, I was arriving at the same conclusion as you, because this is
just telling us that there could be "some" sync activity. As
rewritten, this would just check that the after-the-fact counter is
never lower than the before-the-fact counter, which is still better
than removing the query. I was thinking about doing the latter, i.e. removing
the query, but I'm also OK with what you have here, keeping the
query with a relaxed check. I'll go adjust that in a bit.
--
Michael

#68Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Bertrand Drouvot (#64)
Re: per backend I/O statistics

Hi,

On Thu, Dec 19, 2024 at 06:12:04AM +0000, Bertrand Drouvot wrote:

On Thu, Dec 19, 2024 at 01:21:54PM +0900, Michael Paquier wrote:

bumped the two version counters, and done.

The existing structure could be expanded in the
future to add more information about other statistics related to
backends, depending on requirements or ideas.

BTW, now that the per backend I/O statistics is done, I'll start working on per
backend wal statistics.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#69Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#68)
Re: per backend I/O statistics

On Fri, Dec 20, 2024 at 09:57:19AM +0000, Bertrand Drouvot wrote:

BTW, now that the per backend I/O statistics is done, I'll start working on per
backend wal statistics.

I think that this is a good idea. It does not actually overlap the
proposal in [1] as the stats persistency is not the same. What this
thread has taught me is that you could just plug a PgStat_WalStats into
the stats of the new backend-level structure and retrieve them with a
new function that returns a single tuple with the WAL stats
attributes. That should be rather straight-forward to achieve.
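
To illustrate the idea only (a hypothetical sketch, not a patch; the WAL field
and the accessor name below are made up), extending the PgStat_Backend struct
from the patch upthread could look like:

'''
typedef struct PgStat_Backend
{
	TimestampTz stat_reset_timestamp;
	PgStat_BktypeIO stats;			/* per-backend I/O part, as upthread */
	PgStat_WalStats wal_stats;		/* hypothetical per-backend WAL part */
} PgStat_Backend;

/* hypothetical accessor returning one tuple with the WAL attributes */
extern Datum pg_stat_get_backend_wal(PG_FUNCTION_ARGS);
'''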

[1]: https://commitfest.postgresql.org/51/4950/
--
Michael

#70Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#69)
Re: per backend I/O statistics

Hi,

On Tue, Dec 24, 2024 at 02:35:16PM +0900, Michael Paquier wrote:

On Fri, Dec 20, 2024 at 09:57:19AM +0000, Bertrand Drouvot wrote:

BTW, now that the per backend I/O statistics is done, I'll start working on per
backend wal statistics.

I think that this is a good idea.

Thanks for sharing your thoughts.

It does not actually overlap the
proposal in [1] as the stats persistency is not the same. What this
thread has taught me is that you could just plug a PgStat_WalStats into
the stats of the new backend-level structure and retrieve them with a
new function that returns a single tuple with the WAL stats
attributes. That should be rather straight-forward to achieve.

I started to look at it and should be able to share a patch next week.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#71Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Bertrand Drouvot (#70)
Re: per backend I/O statistics

Hi,

On Fri, Jan 03, 2025 at 10:48:41AM +0000, Bertrand Drouvot wrote:

I started to look at it and should be able to share a patch next week.

A dedicated thread and a patch have been created, see [1].

[1]: /messages/by-id/Z3zqc4o09dM/Ezyz@ip-10-97-1-34.eu-west-3.compute.internal

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#72Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#71)
Re: per backend I/O statistics

On Tue, Jan 07, 2025 at 08:54:59AM +0000, Bertrand Drouvot wrote:

A dedicated thread and a patch have been created, see [1].

[1]: /messages/by-id/Z3zqc4o09dM/Ezyz@ip-10-97-1-34.eu-west-3.compute.internal

Hmm. I can see that you have some refactoring pieces along the lines of
some stuff I wanted to structure for the sake of this thread, but I
discarded them to simplify the final result, and it
did not change much.

Will look at all that on this other thread, no need to start a
separate thread just for the sake of the refactoring bits.
--
Michael

#73Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Bertrand Drouvot (#64)
Re: per backend I/O statistics

Hi,

On Thu, Dec 19, 2024 at 06:12:04AM +0000, Bertrand Drouvot wrote:

I think I'll start a dedicated thread to discuss the stats_fetch_consistency/'snapshot'
point (will be easier to follow than resuming the discussion in this thread).

I gave more thoughts on it and did some research in the mailing list.

I've read those comments, from [1]:

"
We had a lot of discussion around what consistency model we want, and Tom
was adamant that there needs to be a mode that behaves like the current
consistency model (which is what snapshot behaves like, with very minor
differences)
"

and from [2]:

"
One thing we definitely need to add documentation for is the
stats_fetch_consistency GUC. I think we should change its default to 'cache',
because that still gives the ability to "self-join", without the cost of the
current method.
"

So now, I think that it is not a good idea for the per backend stats to
behave differently than the other stats kinds. I think we should get rid of
the "snapshot" for all of them or for none of them. Reading the above comments,
it looks like it has already been discussed and that the outcome is to keep
it for all of them.

Regards,

[1]: /messages/by-id/20220404210610.6dgbwzpbiveyl5y3@alap3.anarazel.de
[2]: /messages/by-id/20220329072651.imjqtzxwk6xycojv@alap3.anarazel.de

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#74Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#73)
Re: per backend I/O statistics

On Fri, Jan 10, 2025 at 01:56:01PM +0000, Bertrand Drouvot wrote:

So now, I think that it is not a good idea for the per backend stats to
behave differently than the other stats kinds. I think we should get rid of
the "snapshot" for all of them or for none of them. Reading the above comments,
it looks like it has already been discussed and that the outcome is to keep
it for all of them.

Thanks a lot for diving into these details! I did not recall that
there was a strict argument for the current behavior, even if I was
pretty sure that introducing an exception for the backend stats was
not quite right compared to the other stats kinds. The consistency
line prevails here, then.
--
Michael

#75Nazir Bilal Yavuz
byavuz81@gmail.com
In reply to: Michael Paquier (#63)
Re: per backend I/O statistics

Hi,

On Thu, 19 Dec 2024 at 07:22, Michael Paquier <michael@paquier.xyz> wrote:

Fixed that, bumped the two version counters, and done.

I encountered a problem while trying to add WAL stats to pg_stat_io
and I wanted to hear your thoughts.

Right now, pgstat_prep_backend_pending() is called in both
pgstat_count_io_op() and pgstat_count_io_op_time() to create a local
PgStat_BackendPending entry. In that process,
pgstat_prep_pending_entry() -> MemoryContextAllocZero() is called. The
problem is that MemoryContextAllocZero() can not be called in the
critical sections.

For example, here is what happens in the walsender backend:

'''
... ->
exec_replication_command() ->
SendBaseBackup() ->
... ->
XLogInsertRecord() ->
START_CRIT_SECTION() /* Now we are in the critical section */ ->
... ->
XLogWrite() ->
pgstat_count_io_op_time() for the pg_pwrite() IO ->
pgstat_prep_backend_pending() ->
pgstat_prep_pending_entry() ->
MemoryContextAllocZero() ->
Failed at Assert("CritSectionCount == 0 || (context)->allowInCritSection")
'''

With this commit it may not be possible to count IOs in the critical
sections. I think the problem happens only if the local
PgStat_BackendPending entry is being created for the first time for
this backend in the critical section.
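
For reference, the assertion that fires is the allocator's critical-section
guard; simplified, it looks like this (sketch based on
src/backend/utils/mmgr/mcxt.c):

'''
/* simplified sketch of the guard used by palloc & friends */
#ifdef USE_ASSERT_CHECKING
#define AssertNotInCriticalSection(context) \
	Assert(CritSectionCount == 0 || (context)->allowInCritSection)
#else
#define AssertNotInCriticalSection(context)
#endif
'''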

--
Regards,
Nazir Bilal Yavuz
Microsoft

#76Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Nazir Bilal Yavuz (#75)
Re: per backend I/O statistics

Hi,

On Wed, Jan 15, 2025 at 11:03:54AM +0300, Nazir Bilal Yavuz wrote:

Hi,

On Thu, 19 Dec 2024 at 07:22, Michael Paquier <michael@paquier.xyz> wrote:

Fixed that, bumped the two version counters, and done.

I encountered a problem while trying to add WAL stats to pg_stat_io
and I wanted to hear your thoughts.

Right now, pgstat_prep_backend_pending() is called in both
pgstat_count_io_op() and pgstat_count_io_op_time() to create a local
PgStat_BackendPending entry. In that process,
pgstat_prep_pending_entry() -> MemoryContextAllocZero() is called. The
problem is that MemoryContextAllocZero() can not be called in the
critical sections.

For example, here is what happens in the walsender backend:

'''
... ->
exec_replication_command() ->
SendBaseBackup() ->
... ->
XLogInsertRecord() ->
START_CRIT_SECTION() /* Now we are in the critical section */ ->
... ->
XLogWrite() ->
pgstat_count_io_op_time() for the pg_pwrite() IO ->
pgstat_prep_backend_pending() ->
pgstat_prep_pending_entry() ->
MemoryContextAllocZero() ->
Failed at Assert("CritSectionCount == 0 || (context)->allowInCritSection")
'''

With this commit it may not be possible to count IOs in the critical
sections. I think the problem happens only if the local
PgStat_BackendPending entry is being created for the first time for
this backend in the critical section.

Yeah, I encountered the exact same thing and mentioned it in [1] (see R1.).

In [1] I did propose to use a new PendingBackendWalStats variable to "bypass" the
pgstat_prep_backend_pending() usage.

Michael mentioned in [2] that it is not really consistent with the rest (which I
agree with) and that "we should rethink a bit the way pending entries are
retrieved". I did not think about it yet but that might be the way to
go, thoughts?

[1]: /messages/by-id/Z3zqc4o09dM/Ezyz@ip-10-97-1-34.eu-west-3.compute.internal
[2]: /messages/by-id/Z4dRlNuhSQ3hPPv2@paquier.xyz

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#77Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#76)
Re: per backend I/O statistics

On Wed, Jan 15, 2025 at 08:30:20AM +0000, Bertrand Drouvot wrote:

On Wed, Jan 15, 2025 at 11:03:54AM +0300, Nazir Bilal Yavuz wrote:

With this commit it may not be possible to count IOs in the critical
sections. I think the problem happens only if the local
PgStat_BackendPending entry is being created for the first time for
this backend in the critical section.

Yeah, I encountered the exact same thing and mentioned it in [1] (see R1.).

In [1] I did propose to use a new PendingBackendWalStats variable to "bypass" the
pgstat_prep_backend_pending() usage.

Michael mentioned in [2] that it is not really consistent with the rest (which I
agree with) and that "we should rethink a bit the way pending entries are
retrieved". I did not think about it yet but that might be the way to
go, thoughts?

[1]: /messages/by-id/Z3zqc4o09dM/Ezyz@ip-10-97-1-34.eu-west-3.compute.internal
[2]: /messages/by-id/Z4dRlNuhSQ3hPPv2@paquier.xyz

My problem is that this is not only related to backend stats, but to
all variable-numbered stats kinds that require this behavior. One
other case where I've seen this being an issue is injection point
stats, for example when enabling these stats for all callbacks, some of
which are run in critical sections.

A generic solution to the problem would be to allow
pgStatPendingContext to have MemoryContextAllowInCriticalSection() set
temporarily for the allocation of the pending data.
Perhaps not for all stats kinds, just for these where we're OK with
the potential memory footprint so we could have a flag in
PgStat_KindInfo. Giving to all stats kinds the responsibility to
allocate a pending entry beforehand outside a critical section is
another option, but that means going through different tweaks for all
stats kinds that require them.
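
A minimal sketch of that first option, assuming a hypothetical
allow_alloc_in_crit_section flag in PgStat_KindInfo (nothing of this exists
as-is):

'''
	/* hypothetical, inside pgstat_prep_pending_entry() around the allocation */
	const PgStat_KindInfo *kind_info = pgstat_get_kind_info(kind);

	if (kind_info->allow_alloc_in_crit_section)		/* hypothetical flag */
		MemoryContextAllowInCriticalSection(pgStatPendingContext, true);

	entry_ref->pending = MemoryContextAllocZero(pgStatPendingContext,
												kind_info->pending_size);

	if (kind_info->allow_alloc_in_crit_section)
		MemoryContextAllowInCriticalSection(pgStatPendingContext, false);
'''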

Andres, do you have any thoughts about that? Not sure what would be
your view on this matter. MemoryContextAllowInCriticalSection() for
stats kinds that allow the case would be OK for me I guess, we should
not be talking about a lot of memory.
--
Michael

#78Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#77)
Re: per backend I/O statistics

Hi,

On Wed, Jan 15, 2025 at 06:27:22PM +0900, Michael Paquier wrote:

On Wed, Jan 15, 2025 at 08:30:20AM +0000, Bertrand Drouvot wrote:

On Wed, Jan 15, 2025 at 11:03:54AM +0300, Nazir Bilal Yavuz wrote:

With this commit it may not be possible to count IOs in the critical
sections. I think the problem happens only if the local
PgStat_BackendPending entry is being created for the first time for
this backend in the critical section.

Yeah, I encountered the exact same thing and mentioned it in [1] (see R1.).

In [1] I did propose to use a new PendingBackendWalStats variable to "bypass" the
pgstat_prep_backend_pending() usage.

Michael mentioned in [2] that it is not really consistent with the rest (which I
agree with) and that "we should rethink a bit the way pending entries are
retrieved". I did not think about it yet but that might be the way to
go, thoughts?

[1]: /messages/by-id/Z3zqc4o09dM/Ezyz@ip-10-97-1-34.eu-west-3.compute.internal
[2]: /messages/by-id/Z4dRlNuhSQ3hPPv2@paquier.xyz

My problem is that this is not only related to backend stats, but to
all variable-numbered stats kinds that require this behavior.

Yeah, agree that it's better to have a "generic" solution.

A generic solution to the problem would be to allow
pgStatPendingContext to have MemoryContextAllowInCriticalSection() set
temporarily for the allocation of the pending data.
Perhaps not for all stats kinds, just for these where we're OK with
the potential memory footprint so we could have a flag in
PgStat_KindInfo. Giving to all stats kinds the responsibility to
allocate a pending entry beforehand outside a critical section is
another option, but that means going through different tweaks for all
stats kinds that require them.

Andres, do you have any thoughts about that? Not sure what would be
your view on this matter. MemoryContextAllowInCriticalSection() for
stats kinds that allow the case would be OK for me I guess,

Same here, that looks ok to me too.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#79Nazir Bilal Yavuz
byavuz81@gmail.com
In reply to: Michael Paquier (#77)
Re: per backend I/O statistics

Hi,

On Wed, 15 Jan 2025 at 12:27, Michael Paquier <michael@paquier.xyz> wrote:

On Wed, Jan 15, 2025 at 08:30:20AM +0000, Bertrand Drouvot wrote:

On Wed, Jan 15, 2025 at 11:03:54AM +0300, Nazir Bilal Yavuz wrote:

With this commit it may not be possible to count IOs in the critical
sections. I think the problem happens only if the local
PgStat_BackendPending entry is being created for the first time for
this backend in the critical section.

Yeah, I encountered the exact same thing and mentioned it in [1] (see R1.).

In [1] I did propose to use a new PendingBackendWalStats variable to "bypass" the
pgstat_prep_backend_pending() usage.

Michael mentioned in [2] that it is not really consistent with the rest (which I
agree with) and that "we should rethink a bit the way pending entries are
retrieved". I did not think about it yet but that might be the way to
go, thoughts?

[1]: /messages/by-id/Z3zqc4o09dM/Ezyz@ip-10-97-1-34.eu-west-3.compute.internal
[2]: /messages/by-id/Z4dRlNuhSQ3hPPv2@paquier.xyz

Thanks! I must have missed these emails.

My problem is that this is not only related to backend stats, but to
all variable-numbered stats kinds that require this behavior. One
other case where I've seen this being an issue is injection point
stats, for example when enabling these stats for all callbacks, some of
which are run in critical sections.

A generic solution to the problem would be to allow
pgStatPendingContext to have MemoryContextAllowInCriticalSection() set
temporarily for the allocation of the pending data.
Perhaps not for all stats kinds, just for these where we're OK with
the potential memory footprint so we could have a flag in
PgStat_KindInfo. Giving to all stats kinds the responsibility to
allocate a pending entry beforehand outside a critical section is
another option, but that means going through different tweaks for all
stats kinds that require them.

I think allowing only pgStatPendingContext to have
MemoryContextAllowInCriticalSection() is not enough. We need to allow
at least pgStatSharedRefContext as well to have
MemoryContextAllowInCriticalSection() as it can be allocated too.

'''
pgstat_prep_pending_entry() ->
pgstat_get_entry_ref() ->
pgstat_get_entry_ref_cached() ->
MemoryContextAlloc(pgStatSharedRefContext, sizeof(PgStat_EntryRef))
'''

Also, I encountered another problem. I did not write it in the first
email because I thought I fixed it but it seems I did not.

While doing the initdb, we are restoring stats with the
pgstat_restore_stats() and we do not expect any pending stats. The
problem goes like that:

'''
pgstat_restore_stats() ->
pgstat_read_statsfile() ->
pgstat_reset_after_failure() ->
pgstat_drop_all_entries() ->
pgstat_drop_entry_internal() ->
We have an assertion there which checks if there is a pending stat entry:

/* should already have released local reference */
if (pgStatEntryRefHash)
Assert(!pgstat_entry_ref_hash_lookup(pgStatEntryRefHash, shent->key));
'''

And we have this at the same time:

'''
BootstrapModeMain() ->
InitPostgres() ->
StartupXLOG() ->
ReadCheckpointRecord() ->
InitWalRecovery() ->
... ->
XLogReadAhead() ->
XLogDecodeNextRecord() ->
ReadPageInternal() ->
state->routine.page_read = XLogPageRead() then WAL read happens
'''

So, this assert fails because we have pending stats for the
PGSTAT_KIND_BACKEND. My fix (see [1] below) was simply to not restore stats
while bootstrapping, but it did not work, because we reset all
fixed-numbered stats there ('pgstat_reset_after_failure() ->
kind_info->reset_all_cb()'), so if we skip it we end up with some NULL
stats, for example reset timestamps. Some of the tests do not expect
them to be NULL at the start, like below:

SELECT stats_reset AS archiver_reset_ts FROM pg_stat_archiver \gset
SELECT pg_stat_reset_shared('archiver');
SELECT stats_reset > :'archiver_reset_ts'::timestamptz FROM pg_stat_archiver;

This test fails because archiver_reset_ts is NULL when the server is started.

I was a bit surprised that Bertrand did not encounter the same problem
while working on the 'per backend WAL statistics' patch. Then I found
the reason: this problem happens only for WAL read and WAL init IOs,
which are starting to be tracked in my patch. That said, I could not
decide which thread to write to about this problem, the migrating WAL
stats thread or this thread. Since this thread is active and other
stats may cause the same problem, I decided to write here. Please let
me know if you think I should post this to the migrating WAL stats
thread instead.

[1]
diff --git a/src/backend/access/transam/xlog.c
b/src/backend/access/transam/xlog.c
index bf3dbda901d..0538859cc7f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5681,7 +5681,7 @@ StartupXLOG(void)
      */
     if (didCrash)
         pgstat_discard_stats();
-    else
+    else if (!IsBootstrapProcessingMode())
         pgstat_restore_stats(checkPoint.redo);

--
Regards,
Nazir Bilal Yavuz
Microsoft

#80Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Nazir Bilal Yavuz (#79)
Re: per backend I/O statistics

Hi,

On Wed, Jan 15, 2025 at 05:20:57PM +0300, Nazir Bilal Yavuz wrote:

While doing the initdb, we are restoring stats with the
pgstat_restore_stats() and we do not expect any pending stats. The
problem goes like that:

I was a bit surprised that Bertrand did not encounter the same problem
while working on the 'per backend WAL statistics' patch. Then I found
the reason: this problem happens only for WAL read and WAL init IOs,
which are starting to be tracked in my patch. That said, I could not
decide which thread to write to about this problem, the migrating WAL
stats thread or this thread. Since this thread is active and other
stats may cause the same problem, I decided to write here. Please let
me know if you think I should post this to the migrating WAL stats
thread instead.

Thanks for the report! As it looks like you hit the issue for "WAL read and
WAL init IOs which are starting to be tracked in my patch", I think it would
make more sense to mention the issue in the "migrating WAL stats thread" so that
one could try to reproduce it (with a patch producing the issue at hand).

So it seems to me that the "migrating WAL stats thread" is a better place to
discuss it.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#81Michael Paquier
michael@paquier.xyz
In reply to: Nazir Bilal Yavuz (#79)
Re: per backend I/O statistics

On Wed, Jan 15, 2025 at 05:20:57PM +0300, Nazir Bilal Yavuz wrote:

I think allowing only pgStatPendingContext to have
MemoryContextAllowInCriticalSection() is not enough. We need to allow
at least pgStatSharedRefContext as well to have
MemoryContextAllowInCriticalSection() as it can be allocated too.

'''
pgstat_prep_pending_entry() ->
pgstat_get_entry_ref() ->
pgstat_get_entry_ref_cached() ->
MemoryContextAlloc(pgStatSharedRefContext, sizeof(PgStat_EntryRef))
'''

Yep, I was pretty sure that we have a bit more going on. Last time I
began digging into the issue I was loading injection_points in
shared_preload_libraries with stats enabled to see how much I could
break, and this was one pattern once I've forced the pending part. I
didn't get through the whole exercise.

While doing the initdb, we are restoring stats with the
pgstat_restore_stats() and we do not expect any pending stats. The
problem goes like that:

'''
pgstat_restore_stats() ->
pgstat_read_statsfile() ->
pgstat_reset_after_failure() ->
pgstat_drop_all_entries() ->
pgstat_drop_entry_internal() ->
We have an assertion there which checks if there is a pending stat entry:

/* should already have released local reference */
if (pgStatEntryRefHash)
Assert(!pgstat_entry_ref_hash_lookup(pgStatEntryRefHash, shent->key));
'''

You mean that this is not something that happens on HEAD, but as an
effect of trying to add WAL stats to pg_stat_io, right?

I was a bit surprised that Bertrand did not encounter the same problem
while working on the 'per backend WAL statistics' patch. Then I found
the reason: this problem happens only for WAL read and WAL init IOs,
which are starting to be tracked in my patch. That said, I could not
decide which thread to write to about this problem, the migrating WAL
stats thread or this thread. Since this thread is active and other
stats may cause the same problem, I decided to write here. Please let
me know if you think I should post this to the migrating WAL stats
thread instead.

I'd rather treat that on a separate thread (please add me in CC if I'm
not there yet!). I am OK to discuss any issues related to backend
statistics here if HEAD is impacted based on how the feature is
currently limited, though.

Another, simpler, idea would be to force more calls to
pgstat_prep_backend_pending() where this is problematic, like before
entering a critical section when generating a WAL record. I am not
sure that it would take care of everything, as you mention; that would
leave some holes depending on the kind of interactions with the
pgstats dshash we do in these paths.
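
For illustration only, a rough sketch of what I mean, reusing the existing
helpers (untested, and I have not checked that this covers all the paths):

'''
/* e.g. at the top of XLogBeginInsert(), while still outside any critical
 * section: create the pending backend entry early, so that the
 * pgstat_count_io_op() calls done later in critical sections find it. */
if (CritSectionCount == 0 &&
    pgstat_tracks_backend_bktype(MyBackendType))
    (void) pgstat_prep_backend_pending(MyProcNumber);
'''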
--
Michael

#82Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#81)
Re: per backend I/O statistics

Hi,

On Thu, Jan 16, 2025 at 09:55:10AM +0900, Michael Paquier wrote:

On Wed, Jan 15, 2025 at 05:20:57PM +0300, Nazir Bilal Yavuz wrote:

I think allowing only pgStatPendingContext to have
MemoryContextAllowInCriticalSection() is not enough. We need to allow
at least pgStatSharedRefContext as well to have
MemoryContextAllowInCriticalSection() as it can be allocated too.

'''
pgstat_prep_pending_entry() ->
pgstat_get_entry_ref() ->
pgstat_get_entry_ref_cached() ->
MemoryContextAlloc(pgStatSharedRefContext, sizeof(PgStat_EntryRef))
'''

Yep, I was pretty sure that we have a bit more going on. Last time I
began digging into the issue I was loading injection_points in
shared_preload_libraries with stats enabled to see how much I could
break, and this was one pattern once I've forced the pending part. I
didn't get through the whole exercise.

I'll look at it and come back with a proposal as part of [1].

[1]: /messages/by-id/Z4dRlNuhSQ3hPPv2@paquier.xyz

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#83Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Bertrand Drouvot (#82)
Re: per backend I/O statistics

Hi,

On Thu, Jan 16, 2025 at 06:48:58AM +0000, Bertrand Drouvot wrote:

Hi,

On Thu, Jan 16, 2025 at 09:55:10AM +0900, Michael Paquier wrote:

On Wed, Jan 15, 2025 at 05:20:57PM +0300, Nazir Bilal Yavuz wrote:

I think allowing only pgStatPendingContext to have
MemoryContextAllowInCriticalSection() is not enough. We need to allow
at least pgStatSharedRefContext as well to have
MemoryContextAllowInCriticalSection() as it can be allocated too.

'''
pgstat_prep_pending_entry() ->
pgstat_get_entry_ref() ->
pgstat_get_entry_ref_cached() ->
MemoryContextAlloc(pgStatSharedRefContext, sizeof(PgStat_EntryRef))
'''

Yep, I was pretty sure that we have a bit more going on. Last time I
began digging into the issue I was loading injection_points in
shared_preload_libraries with stats enabled to see how much I could
break, and this was one pattern once I've forced the pending part. I
didn't get through the whole exercise.

I'll look at it and come back with a proposal as part of [1].

I looked at it and shared the outcome in [1]. It looks like it's more complicated
as we could also hit a failed assertion in MemoryContextCreate(), like:

TRAP: failed Assert("CritSectionCount == 0"), File: "mcxt.c", Line: 1107, PID: 3295726
pg18/bin/postgres(ExceptionalCondition+0xbb)[0x59668bee1f6d]
pg18/bin/postgres(MemoryContextCreate+0x46)[0x59668bf2a8fe]
pg18/bin/postgres(AllocSetContextCreateInternal+0x1df)[0x59668bf1bb11]
pg18/bin/postgres(pgstat_prep_pending_entry+0x86)[0x59668bcff8cc]
pg18/bin/postgres(pgstat_prep_backend_pending+0x2b)[0x59668bd024a9]
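
(For context: the trap comes from the lazy creation of the pending memory
context in pgstat_prep_pending_entry(); a trimmed sketch of that path, as far
as I can see:)

'''
/* pgstat_prep_pending_entry(), trimmed: this lazy context creation is what
 * hits Assert(CritSectionCount == 0) in MemoryContextCreate() when the
 * first pending entry is requested from inside a critical section. */
if (unlikely(!pgStatPendingContext))
    pgStatPendingContext =
        AllocSetContextCreate(TopMemoryContext,
                              "PgStat Pending",
                              ALLOCSET_SMALL_SIZES);
'''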

I propose that as of now we discuss the issue in [1] instead of here.

[1]: /messages/by-id/Z4ks4+wnAz8eEyX9@ip-10-97-1-34.eu-west-3.compute.internal

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#84Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#77)
Re: per backend I/O statistics

Hi,

On 2025-01-15 18:27:22 +0900, Michael Paquier wrote:

On Wed, Jan 15, 2025 at 08:30:20AM +0000, Bertrand Drouvot wrote:

On Wed, Jan 15, 2025 at 11:03:54AM +0300, Nazir Bilal Yavuz wrote:

With this commit it may not be possible to count IOs in the critical
sections. I think the problem happens only if the local
PgStat_BackendPending entry is being created for the first time for
this backend in the critical section.

Yeah, I encountered the exact same thing and mentioned it in [1] (see R1.).

In [1] I did propose to use a new PendingBackendWalStats variable to "bypass" the
pgstat_prep_backend_pending() usage.

Michael mentioned in [2] that this is not really consistent with the rest (which I
agree with) and that "we should rethink a bit the way pending entries are
retrieved". I have not thought about it yet, but that might be the way to
go, thoughts?

[1]: /messages/by-id/Z3zqc4o09dM/Ezyz@ip-10-97-1-34.eu-west-3.compute.internal
[2]: /messages/by-id/Z4dRlNuhSQ3hPPv2@paquier.xyz

My problem is that this is not only related to backend stats, but to
all variable-numbered stats kinds that require this behavior.

The issue here imo is that recently IO stats were turned into a
variable-numbered stat, where they previously weren't. It seems like a bad idea
that now we regularly do a somewhat sizable memory allocation for backend
stats. For table stats, allocating the pending memory is somewhat of a
necessity - there
could be a lot of different tables after all. And later transactions won't
necessarily access the same set of tables, so we can't just keep the memory
for pending around - but that's not true for IO.

One other case where I've seen this being an issue is injection point
stats, for example when enabling these stats for all callbacks while some of
them are run in critical sections.

That seems somewhat fundamental - you could have arbitrary numbers of
injection points and thus need arbitrary amounts of memory. You can't just do
that in a critical section. And it's not just about a potential memory
allocation for pending stats, the DSM allocation can't be done in a critical
section either.

A generic solution to the problem would be to allow
pgStatPendingContext to have MemoryContextAllowInCriticalSection() set
temporarily for the allocation of the pending data.
Perhaps not for all stats kinds, just for those where we're OK with
the potential memory footprint, so we could have a flag in
PgStat_KindInfo.

That seems like an *incredibly* bad idea. That restriction exists for a
reason.

Andres, do you have any thoughts about that? Not sure what would be
your view on this matter. MemoryContextAllowInCriticalSection() for
stats kinds that allow the case would be OK for me I guess, we should
not be talking about a lot of memory.

My view is that for IO stats no memory allocation should be required - that
used to be the case and should be the case again. And for something like
injection points (not really convinced that it's a good thing to have stats
for, but ...), allocating the shared memory stats object outside of the
critical section and not having any pending stats seems to be the right
approach.
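
(i.e. keep the per-backend pending counters in plain static memory, something
along the lines of the sketch below, so that counting in a critical section
never needs to allocate:)

'''
/* rough sketch only: a static pending area, no allocation at count time */
static PgStat_BackendPending PendingBackendStats;
'''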

Greetings,

Andres Freund

#85Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#84)
Re: per backend I/O statistics

On Thu, Jan 16, 2025 at 11:28:43AM -0500, Andres Freund wrote:

On 2025-01-15 18:27:22 +0900, Michael Paquier wrote:

My problem is that this is not only related to backend stats, but to
all variable-numbered stats kinds that require this behavior.

The issue here imo is that recently IO stats were turned into a
variable-numbered stat, where they previously weren't. It seems like a bad idea
that now we regularly do a somewhat sizable memory allocation for backend
stats. For table stats, allocating the pending memory is somewhat of a
necessity - there
could be a lot of different tables after all. And later transactions won't
necessarily access the same set of tables, so we can't just keep the memory
for pending around - but that's not true for IO.

Yep, for the reason that backend stats require that, with one entry in
the pgstats dshash for each backend, as these are sized depending on
max_connections & co, using a proc number for the hash key.

Andres, do you have any thoughts about that? Not sure what would be
your view on this matter. MemoryContextAllowInCriticalSection() for
stats kinds that allow the case would be OK for me I guess, we should
not be talking about a lot of memory.

My view is that for IO stats no memory allocation should be required - that
used to be the case and should be the case again. And for something like
injection points (not really convinced that it's a good thing to have stats
for, but ...), allocating the shared memory stats object outside of the
critical section and not having any pending stats seems to be the right
approach.

Not sure I completely agree with that, as this pushes the responsibility
of dealing with the critical section handling to each stats kind,
but I won't fight over it in this case, either. It would not be the
first time we have specific logic to do allocations beforehand when
dealing with critical sections (38c579b08988 is the most recent
example coming to mind, there are many others).

Anyway, let's just switch pgstat_backend.c so that we use a static
PgStat_BackendPending (pending name) to store the pending IO stats,
which could also be reused for the WAL bits. It does not seem that
complicated as far as I can see: remove pgstat_prep_backend_pending
and change pgstat_flush_backend_entry_io to use the static pending
data. The patch set posted here is doing that within its
pgstat_flush_backend_entry_wal():
/messages/by-id/Z4DrFhdlO6Xf13Xd@ip-10-97-1-34.eu-west-3.compute.internal

I could tweak that around the beginning of next week with a proposed
patch. Bertrand, perhaps you'd prefer to hack on this one?
--
Michael

#86Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#85)
Re: per backend I/O statistics

Hi,

On Fri, Jan 17, 2025 at 09:08:02AM +0900, Michael Paquier wrote:

Anyway, let's just switch pgstat_backend.c so that we use a static
PgStat_BackendPending (pending name) to store the pending IO stats,
which could also be reused for the WAL bits. It does not seem that
complicated as far as I can see: remove pgstat_prep_backend_pending
and change pgstat_flush_backend_entry_io to use the static pending
data. The patch set posted here is doing that within its
pgstat_flush_backend_entry_wal():
/messages/by-id/Z4DrFhdlO6Xf13Xd@ip-10-97-1-34.eu-west-3.compute.internal

I could tweak that around the beginning of next week with a proposed
patch. Bertrand, perhaps you'd prefer to hack on this one?

Yeah, I had in mind to work on it (along the exact same lines as you describe
above). I'll work on it.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#87Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#86)
Re: per backend I/O statistics

On Fri, Jan 17, 2025 at 06:06:35AM +0000, Bertrand Drouvot wrote:

On Fri, Jan 17, 2025 at 09:08:02AM +0900, Michael Paquier wrote:

I could tweak that around the beginning of next week with a proposed
patch. Bertrand, perhaps you'd prefer to hack on this one?

Yeah, I had in mind to work on it (along the exact same lines as you describe
above). I'll work on it.

Thanks.
--
Michael

#88Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#87)
1 attachment(s)
Re: per backend I/O statistics

Hi,

On Fri, Jan 17, 2025 at 03:12:36PM +0900, Michael Paquier wrote:

On Fri, Jan 17, 2025 at 06:06:35AM +0000, Bertrand Drouvot wrote:

On Fri, Jan 17, 2025 at 09:08:02AM +0900, Michael Paquier wrote:

I could tweak that around the beginning of next week with a proposed
patch. Bertrand, perhaps you'd prefer to hack on this one?

Yeah, I had in mind to work on it (along the exact same lines as you describe
above). I'll work on it.

Please find attached a patch implementing the ideas above, meaning:

- It creates a new PendingBackendStats variable
- It uses this variable to increment and flush per backend pending IO statistics

That way we get rid of the memory allocation for pending IO statistics.

One remark: a special case has been added in pgstat_flush_pending_entries(). The
reason is that while the per backend stats are "variable-numbered" stats, it could
be that their pending stats are not part of pgStatPending. This is currently the
case (with this patch) as the per backend pending IO stats are not tracked with
pgstat_prep_backend_pending() and friends anymore.

Thoughts?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v1-0001-Rework-per-backend-pending-stats.patchtext/x-diff; charset=us-asciiDownload
From 48d715a4a8b667628bb8ccf573e0003eb8f54633 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Fri, 17 Jan 2025 06:44:47 +0000
Subject: [PATCH v1] Rework per backend pending stats

9aea73fc61d added per backend pending statistics but not all of them are well
suited to memory allocation and have to be outside of critical sections.

For those (the only current one is I/O statistics but WAL statistics is under
discussion), let's rely on a new PendingBackendStats instead.
---
 src/backend/utils/activity/pgstat.c         | 27 ++++++++++++++++++-
 src/backend/utils/activity/pgstat_backend.c | 29 +++++++++++++++------
 src/backend/utils/activity/pgstat_io.c      | 14 +++-------
 src/include/pgstat.h                        | 12 +++++++++
 4 files changed, 62 insertions(+), 20 deletions(-)
  89.0% src/backend/utils/activity/
  10.9% src/include/

diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index 34520535d54..27aa0491001 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -1369,6 +1369,7 @@ static bool
 pgstat_flush_pending_entries(bool nowait)
 {
 	bool		have_pending = false;
+	bool		backend_have_pending = false;
 	dlist_node *cur = NULL;
 
 	/*
@@ -1416,9 +1417,33 @@ pgstat_flush_pending_entries(bool nowait)
 		cur = next;
 	}
 
+	/*
+	 * There is a special case for some pending stats that are tracked in
+	 * PendingBackendStats. It's possible that those have not been flushed
+	 * above, hence the extra check here.
+	 */
+	if (!pg_memory_is_all_zeros(&PendingBackendStats,
+								sizeof(struct PgStat_BackendPending)))
+	{
+		PgStat_EntryRef *entry_ref;
+
+		entry_ref = pgstat_get_entry_ref(PGSTAT_KIND_BACKEND, InvalidOid,
+										 MyProcNumber, false, NULL);
+
+		if (entry_ref)
+		{
+			PgStat_HashKey key = entry_ref->shared_entry->key;
+			PgStat_Kind kind = key.kind;
+			const PgStat_KindInfo *kind_info = pgstat_get_kind_info(kind);
+
+			if (!kind_info->flush_pending_cb(entry_ref, nowait))
+				backend_have_pending = true;
+		}
+	}
+
 	Assert(dlist_is_empty(&pgStatPending) == !have_pending);
 
-	return have_pending;
+	return (have_pending || backend_have_pending);
 }
 
 
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
index 79e4d0a3053..ca5cec348b2 100644
--- a/src/backend/utils/activity/pgstat_backend.c
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -22,8 +22,11 @@
 
 #include "postgres.h"
 
+#include "utils/memutils.h"
 #include "utils/pgstat_internal.h"
 
+PgStat_BackendPending PendingBackendStats = {0};
+
 /*
  * Returns statistics of a backend by proc number.
  */
@@ -46,14 +49,20 @@ static void
 pgstat_flush_backend_entry_io(PgStat_EntryRef *entry_ref)
 {
 	PgStatShared_Backend *shbackendent;
-	PgStat_BackendPending *pendingent;
 	PgStat_BktypeIO *bktype_shstats;
-	PgStat_PendingIO *pending_io;
+	PgStat_PendingIO pending_io;
+
+	/*
+	 * This function can be called even if nothing at all has happened. In
+	 * this case, avoid unnecessarily modifying the stats entry.
+	 */
+	if (pg_memory_is_all_zeros(&PendingBackendStats.pending_io,
+							   sizeof(struct PgStat_PendingIO)))
+		return;
 
 	shbackendent = (PgStatShared_Backend *) entry_ref->shared_stats;
-	pendingent = (PgStat_BackendPending *) entry_ref->pending;
 	bktype_shstats = &shbackendent->stats.io_stats;
-	pending_io = &pendingent->pending_io;
+	pending_io = PendingBackendStats.pending_io;
 
 	for (int io_object = 0; io_object < IOOBJECT_NUM_TYPES; io_object++)
 	{
@@ -64,17 +73,21 @@ pgstat_flush_backend_entry_io(PgStat_EntryRef *entry_ref)
 				instr_time	time;
 
 				bktype_shstats->counts[io_object][io_context][io_op] +=
-					pending_io->counts[io_object][io_context][io_op];
+					pending_io.counts[io_object][io_context][io_op];
 				bktype_shstats->bytes[io_object][io_context][io_op] +=
-					pending_io->bytes[io_object][io_context][io_op];
-
-				time = pending_io->pending_times[io_object][io_context][io_op];
+					pending_io.bytes[io_object][io_context][io_op];
+				time = pending_io.pending_times[io_object][io_context][io_op];
 
 				bktype_shstats->times[io_object][io_context][io_op] +=
 					INSTR_TIME_GET_MICROSEC(time);
 			}
 		}
 	}
+
+	/*
+	 * Clear out the statistics buffer, so it can be re-used.
+	 */
+	MemSet(&PendingBackendStats.pending_io, 0, sizeof(PgStat_PendingIO));
 }
 
 /*
diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index 027aad8b24e..b8f19515ae1 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -75,11 +75,8 @@ pgstat_count_io_op(IOObject io_object, IOContext io_context, IOOp io_op,
 
 	if (pgstat_tracks_backend_bktype(MyBackendType))
 	{
-		PgStat_BackendPending *entry_ref;
-
-		entry_ref = pgstat_prep_backend_pending(MyProcNumber);
-		entry_ref->pending_io.counts[io_object][io_context][io_op] += cnt;
-		entry_ref->pending_io.bytes[io_object][io_context][io_op] += bytes;
+		PendingBackendStats.pending_io.counts[io_object][io_context][io_op] += cnt;
+		PendingBackendStats.pending_io.bytes[io_object][io_context][io_op] += bytes;
 	}
 
 	PendingIOStats.counts[io_object][io_context][io_op] += cnt;
@@ -146,13 +143,8 @@ pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op,
 					   io_time);
 
 		if (pgstat_tracks_backend_bktype(MyBackendType))
-		{
-			PgStat_BackendPending *entry_ref;
-
-			entry_ref = pgstat_prep_backend_pending(MyProcNumber);
-			INSTR_TIME_ADD(entry_ref->pending_io.pending_times[io_object][io_context][io_op],
+			INSTR_TIME_ADD(PendingBackendStats.pending_io.pending_times[io_object][io_context][io_op],
 						   io_time);
-		}
 	}
 
 	pgstat_count_io_op(io_object, io_context, io_op, cnt, bytes);
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 2d40fe3e70f..3eb7b6614fe 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -826,5 +826,17 @@ extern PGDLLIMPORT SessionEndType pgStatSessionEndCause;
 /* updated directly by backends and background processes */
 extern PGDLLIMPORT PgStat_PendingWalStats PendingWalStats;
 
+/*
+ * Variables in pgstat_backend.c
+ */
+
+/* updated directly by backends */
+
+/*
+ * Some pending statistics are not well suited to memory allocation and have
+ * to be outside of critical sections. For those, let's rely on this variable
+ * instead.
+ */
+extern PGDLLIMPORT PgStat_BackendPending PendingBackendStats;
 
 #endif							/* PGSTAT_H */
-- 
2.34.1

#89Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#88)
Re: per backend I/O statistics

On Fri, Jan 17, 2025 at 10:23:48AM +0000, Bertrand Drouvot wrote:

Please find attached a patch implementing the ideas above, meaning:

- It creates a new PendingBackendStats variable
- It uses this variable to increment and flush per backend pending IO statistics

That way we get rid of the memory allocation for pending IO statistics.

One remark: a special case has been added in pgstat_flush_pending_entries(). The
reason is that while the per backend stats are "variable-numbered" stats, it could
be that their pending stats are not part of pgStatPending. This is currently the
case (with this patch) as the per backend pending IO stats are not tracked with
pgstat_prep_backend_pending() and friends anymore.

+    /*
+     * There is a special case for some pending stats that are tracked in
+     * PendingBackendStats. It's possible that those have not been flushed
+     * above, hence the extra check here.
+     */
+    if (!pg_memory_is_all_zeros(&PendingBackendStats,
+                                sizeof(struct PgStat_BackendPending)))
+    {
+        PgStat_EntryRef *entry_ref;
+
+        entry_ref = pgstat_get_entry_ref(PGSTAT_KIND_BACKEND, InvalidOid,
+                                         MyProcNumber, false, NULL);

Hmm. Such special complexities in pgstat.c are annoying. There is
a stupid thing I am wondering here. For the WAL stats, why couldn't
we place some calls of pgstat_prep_backend_pending() in strategic
places like XLogBeginInsert() to force all the allocation steps of the
pending entry to happen before we would enter the critical sections
when doing a WAL insertion? As far as I can see, there is a special
case with 2PC where XLogBeginInsert() could be called in a critical
section, but that seems to be the only one at quick glance.
--
Michael

#90Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#89)
2 attachment(s)
Re: per backend I/O statistics

On Sat, Jan 18, 2025 at 05:53:31PM +0900, Michael Paquier wrote:

Hmm. Such special complexities in pgstat.c are annoying. There is
a stupid thing I am wondering here. For the WAL stats, why couldn't
we place some calls of pgstat_prep_backend_pending() in strategic
places like XLogBeginInsert() to force all the allocation steps of the
pending entry to happen before we would enter the critical sections
when doing a WAL insertion? As far as I can see, there is a special
case with 2PC where XLogBeginInsert() could be called in a critical
section, but that seems to be the only one at quick glance.

I've looked first at this idea over the week-end with a quick hack,
and it did not finish well.

And then it struck me that with a bit of redesign of the callbacks, so
that the existing flush_fixed_cb and have_fixed_pending_cb are usable
with variable-numbered stats, we should be able to avoid the exception
your first version of the patch introduced. Attached is a patch
to achieve what I have in mind, which is more generic than your
previous patch. I've built my stuff as 0002 on top of your 0001.

One key choice I have made is to hide PendingBackendStats within
pgstat_backend.c so that it is possible to enforce some sanity checks
there.
--
Michael

Attachments:

v2-0001-Rework-per-backend-pending-stats.patchtext/x-diff; charset=us-asciiDownload
From 90ffa6d32a9cd0034a2fd97c31991199f2089698 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Fri, 17 Jan 2025 06:44:47 +0000
Subject: [PATCH v2 1/2] Rework per backend pending stats

9aea73fc61d added per backend pending statistics but not all of them are well
suited to memory allocation and have to be outside of critical sections.

For those (the only current one is I/O statistics but WAL statistics is under
discussion), let's rely on a new PendingBackendStats instead.
---
 src/include/pgstat.h                        | 12 +++++++++
 src/backend/utils/activity/pgstat.c         | 27 ++++++++++++++++++-
 src/backend/utils/activity/pgstat_backend.c | 29 +++++++++++++++------
 src/backend/utils/activity/pgstat_io.c      | 14 +++-------
 4 files changed, 62 insertions(+), 20 deletions(-)

diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 2d40fe3e70..3eb7b6614f 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -826,5 +826,17 @@ extern PGDLLIMPORT SessionEndType pgStatSessionEndCause;
 /* updated directly by backends and background processes */
 extern PGDLLIMPORT PgStat_PendingWalStats PendingWalStats;
 
+/*
+ * Variables in pgstat_backend.c
+ */
+
+/* updated directly by backends */
+
+/*
+ * Some pending statistics are not well suited to memory allocation and have
+ * to be outside of critical sections. For those, let's rely on this variable
+ * instead.
+ */
+extern PGDLLIMPORT PgStat_BackendPending PendingBackendStats;
 
 #endif							/* PGSTAT_H */
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index 34520535d5..27aa049100 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -1369,6 +1369,7 @@ static bool
 pgstat_flush_pending_entries(bool nowait)
 {
 	bool		have_pending = false;
+	bool		backend_have_pending = false;
 	dlist_node *cur = NULL;
 
 	/*
@@ -1416,9 +1417,33 @@ pgstat_flush_pending_entries(bool nowait)
 		cur = next;
 	}
 
+	/*
+	 * There is a special case for some pending stats that are tracked in
+	 * PendingBackendStats. It's possible that those have not been flushed
+	 * above, hence the extra check here.
+	 */
+	if (!pg_memory_is_all_zeros(&PendingBackendStats,
+								sizeof(struct PgStat_BackendPending)))
+	{
+		PgStat_EntryRef *entry_ref;
+
+		entry_ref = pgstat_get_entry_ref(PGSTAT_KIND_BACKEND, InvalidOid,
+										 MyProcNumber, false, NULL);
+
+		if (entry_ref)
+		{
+			PgStat_HashKey key = entry_ref->shared_entry->key;
+			PgStat_Kind kind = key.kind;
+			const PgStat_KindInfo *kind_info = pgstat_get_kind_info(kind);
+
+			if (!kind_info->flush_pending_cb(entry_ref, nowait))
+				backend_have_pending = true;
+		}
+	}
+
 	Assert(dlist_is_empty(&pgStatPending) == !have_pending);
 
-	return have_pending;
+	return (have_pending || backend_have_pending);
 }
 
 
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
index 79e4d0a305..ca5cec348b 100644
--- a/src/backend/utils/activity/pgstat_backend.c
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -22,8 +22,11 @@
 
 #include "postgres.h"
 
+#include "utils/memutils.h"
 #include "utils/pgstat_internal.h"
 
+PgStat_BackendPending PendingBackendStats = {0};
+
 /*
  * Returns statistics of a backend by proc number.
  */
@@ -46,14 +49,20 @@ static void
 pgstat_flush_backend_entry_io(PgStat_EntryRef *entry_ref)
 {
 	PgStatShared_Backend *shbackendent;
-	PgStat_BackendPending *pendingent;
 	PgStat_BktypeIO *bktype_shstats;
-	PgStat_PendingIO *pending_io;
+	PgStat_PendingIO pending_io;
+
+	/*
+	 * This function can be called even if nothing at all has happened. In
+	 * this case, avoid unnecessarily modifying the stats entry.
+	 */
+	if (pg_memory_is_all_zeros(&PendingBackendStats.pending_io,
+							   sizeof(struct PgStat_PendingIO)))
+		return;
 
 	shbackendent = (PgStatShared_Backend *) entry_ref->shared_stats;
-	pendingent = (PgStat_BackendPending *) entry_ref->pending;
 	bktype_shstats = &shbackendent->stats.io_stats;
-	pending_io = &pendingent->pending_io;
+	pending_io = PendingBackendStats.pending_io;
 
 	for (int io_object = 0; io_object < IOOBJECT_NUM_TYPES; io_object++)
 	{
@@ -64,17 +73,21 @@ pgstat_flush_backend_entry_io(PgStat_EntryRef *entry_ref)
 				instr_time	time;
 
 				bktype_shstats->counts[io_object][io_context][io_op] +=
-					pending_io->counts[io_object][io_context][io_op];
+					pending_io.counts[io_object][io_context][io_op];
 				bktype_shstats->bytes[io_object][io_context][io_op] +=
-					pending_io->bytes[io_object][io_context][io_op];
-
-				time = pending_io->pending_times[io_object][io_context][io_op];
+					pending_io.bytes[io_object][io_context][io_op];
+				time = pending_io.pending_times[io_object][io_context][io_op];
 
 				bktype_shstats->times[io_object][io_context][io_op] +=
 					INSTR_TIME_GET_MICROSEC(time);
 			}
 		}
 	}
+
+	/*
+	 * Clear out the statistics buffer, so it can be re-used.
+	 */
+	MemSet(&PendingBackendStats.pending_io, 0, sizeof(PgStat_PendingIO));
 }
 
 /*
diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index 027aad8b24..b8f19515ae 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -75,11 +75,8 @@ pgstat_count_io_op(IOObject io_object, IOContext io_context, IOOp io_op,
 
 	if (pgstat_tracks_backend_bktype(MyBackendType))
 	{
-		PgStat_BackendPending *entry_ref;
-
-		entry_ref = pgstat_prep_backend_pending(MyProcNumber);
-		entry_ref->pending_io.counts[io_object][io_context][io_op] += cnt;
-		entry_ref->pending_io.bytes[io_object][io_context][io_op] += bytes;
+		PendingBackendStats.pending_io.counts[io_object][io_context][io_op] += cnt;
+		PendingBackendStats.pending_io.bytes[io_object][io_context][io_op] += bytes;
 	}
 
 	PendingIOStats.counts[io_object][io_context][io_op] += cnt;
@@ -146,13 +143,8 @@ pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op,
 					   io_time);
 
 		if (pgstat_tracks_backend_bktype(MyBackendType))
-		{
-			PgStat_BackendPending *entry_ref;
-
-			entry_ref = pgstat_prep_backend_pending(MyProcNumber);
-			INSTR_TIME_ADD(entry_ref->pending_io.pending_times[io_object][io_context][io_op],
+			INSTR_TIME_ADD(PendingBackendStats.pending_io.pending_times[io_object][io_context][io_op],
 						   io_time);
-		}
 	}
 
 	pgstat_count_io_op(io_object, io_context, io_op, cnt, bytes);
-- 
2.47.1

v2-0002-More-work-to-allow-backend-statistics-in-critical.patchtext/x-diff; charset=us-asciiDownload
From 99561e65c3e266157ac21588097c8222a1df1d28 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Mon, 20 Jan 2025 15:31:57 +0900
Subject: [PATCH v2 2/2] More work to allow backend statistics in critical
 sections

Callbacks are refactored so as variable-numbered stats can use them if
they don't want to use pending data, like the backend stats.
---
 src/include/pgstat.h                         |  21 ++-
 src/include/utils/pgstat_internal.h          |  29 ++--
 src/backend/utils/activity/pgstat.c          |  67 +++------
 src/backend/utils/activity/pgstat_backend.c  | 142 ++++++++++++-------
 src/backend/utils/activity/pgstat_io.c       |  15 +-
 src/backend/utils/activity/pgstat_relation.c |   4 +-
 6 files changed, 140 insertions(+), 138 deletions(-)

diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 3eb7b6614f..d0d4515097 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -540,6 +540,15 @@ extern PgStat_ArchiverStats *pgstat_fetch_stat_archiver(void);
  * Functions in pgstat_backend.c
  */
 
+/* used by pgstat_io.c for I/O stats tracked in backends */
+extern void pgstat_count_backend_io_op_time(IOObject io_object,
+											IOContext io_context,
+											IOOp io_op,
+											instr_time io_time);
+extern void pgstat_count_backend_io_op(IOObject io_object,
+									   IOContext io_context,
+									   IOOp io_op, uint32 cnt,
+									   uint64 bytes);
 extern PgStat_Backend *pgstat_fetch_stat_backend(ProcNumber procNumber);
 extern bool pgstat_tracks_backend_bktype(BackendType bktype);
 extern void pgstat_create_backend(ProcNumber procnum);
@@ -826,17 +835,5 @@ extern PGDLLIMPORT SessionEndType pgStatSessionEndCause;
 /* updated directly by backends and background processes */
 extern PGDLLIMPORT PgStat_PendingWalStats PendingWalStats;
 
-/*
- * Variables in pgstat_backend.c
- */
-
-/* updated directly by backends */
-
-/*
- * Some pending statistics are not well suited to memory allocation and have
- * to be outside of critical sections. For those, let's rely on this variable
- * instead.
- */
-extern PGDLLIMPORT PgStat_BackendPending PendingBackendStats;
 
 #endif							/* PGSTAT_H */
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 4bb8e5c53a..000ed5b36f 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -259,8 +259,8 @@ typedef struct PgStat_KindInfo
 	void		(*init_backend_cb) (void);
 
 	/*
-	 * For variable-numbered stats: flush pending stats. Required if pending
-	 * data is used.  See flush_fixed_cb for fixed-numbered stats.
+	 * For variable-numbered stats: flush pending stats within the dshash.
+	 * Required if pending data interacts with the pgstats dshash.
 	 */
 	bool		(*flush_pending_cb) (PgStat_EntryRef *sr, bool nowait);
 
@@ -289,17 +289,19 @@ typedef struct PgStat_KindInfo
 	void		(*init_shmem_cb) (void *stats);
 
 	/*
-	 * For fixed-numbered statistics: Flush pending stats. Returns true if
-	 * some of the stats could not be flushed, due to lock contention for
-	 * example. Optional.
+	 * For fixed-numbered or variable-numbered statistics: Check for pending
+	 * stats in need of flush, when these do not use the pgstats dshash.
+	 * Returns true if there are any stats pending for flush, triggering
+	 * flush_cb. Optional.
 	 */
-	bool		(*flush_fixed_cb) (bool nowait);
+	bool		(*have_pending_cb) (void);
 
 	/*
-	 * For fixed-numbered statistics: Check for pending stats in need of
-	 * flush. Returns true if there are any stats pending for flush. Optional.
+	 * For fixed-numbered or variable-numbered statistics: Flush pending
+	 * stats. Returns true if some of the stats could not be flushed, due
+	 * to lock contention for example. Optional.
 	 */
-	bool		(*have_fixed_pending_cb) (void);
+	bool		(*flush_cb) (bool nowait);
 
 	/*
 	 * For fixed-numbered statistics: Reset All.
@@ -617,10 +619,11 @@ extern void pgstat_archiver_snapshot_cb(void);
 #define PGSTAT_BACKEND_FLUSH_IO		(1 << 0)	/* Flush I/O statistics */
 #define PGSTAT_BACKEND_FLUSH_ALL	(PGSTAT_BACKEND_FLUSH_IO)
 
-extern void pgstat_flush_backend(bool nowait, bits32 flags);
-extern PgStat_BackendPending *pgstat_prep_backend_pending(ProcNumber procnum);
-extern bool pgstat_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait);
-extern void pgstat_backend_reset_timestamp_cb(PgStatShared_Common *header, TimestampTz ts);
+extern bool pgstat_flush_backend(bool nowait, bits32 flags);
+extern bool pgstat_backend_flush_cb(bool nowait);
+extern bool pgstat_backend_have_pending_cb(void);
+extern void pgstat_backend_reset_timestamp_cb(PgStatShared_Common *header,
+											  TimestampTz ts);
 
 /*
  * Functions in pgstat_bgwriter.c
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index 27aa049100..bd1ffad6f4 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -372,7 +372,8 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.shared_data_len = sizeof(((PgStatShared_Backend *) 0)->stats),
 		.pending_size = sizeof(PgStat_BackendPending),
 
-		.flush_pending_cb = pgstat_backend_flush_cb,
+		.have_pending_cb = pgstat_backend_have_pending_cb,
+		.flush_cb = pgstat_backend_flush_cb,
 		.reset_timestamp_cb = pgstat_backend_reset_timestamp_cb,
 	},
 
@@ -437,8 +438,8 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.shared_data_off = offsetof(PgStatShared_IO, stats),
 		.shared_data_len = sizeof(((PgStatShared_IO *) 0)->stats),
 
-		.flush_fixed_cb = pgstat_io_flush_cb,
-		.have_fixed_pending_cb = pgstat_io_have_pending_cb,
+		.flush_cb = pgstat_io_flush_cb,
+		.have_pending_cb = pgstat_io_have_pending_cb,
 		.init_shmem_cb = pgstat_io_init_shmem_cb,
 		.reset_all_cb = pgstat_io_reset_all_cb,
 		.snapshot_cb = pgstat_io_snapshot_cb,
@@ -455,8 +456,8 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.shared_data_off = offsetof(PgStatShared_SLRU, stats),
 		.shared_data_len = sizeof(((PgStatShared_SLRU *) 0)->stats),
 
-		.flush_fixed_cb = pgstat_slru_flush_cb,
-		.have_fixed_pending_cb = pgstat_slru_have_pending_cb,
+		.flush_cb = pgstat_slru_flush_cb,
+		.have_pending_cb = pgstat_slru_have_pending_cb,
 		.init_shmem_cb = pgstat_slru_init_shmem_cb,
 		.reset_all_cb = pgstat_slru_reset_all_cb,
 		.snapshot_cb = pgstat_slru_snapshot_cb,
@@ -474,8 +475,8 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.shared_data_len = sizeof(((PgStatShared_Wal *) 0)->stats),
 
 		.init_backend_cb = pgstat_wal_init_backend_cb,
-		.flush_fixed_cb = pgstat_wal_flush_cb,
-		.have_fixed_pending_cb = pgstat_wal_have_pending_cb,
+		.flush_cb = pgstat_wal_flush_cb,
+		.have_pending_cb = pgstat_wal_have_pending_cb,
 		.init_shmem_cb = pgstat_wal_init_shmem_cb,
 		.reset_all_cb = pgstat_wal_reset_all_cb,
 		.snapshot_cb = pgstat_wal_snapshot_cb,
@@ -713,22 +714,17 @@ pgstat_report_stat(bool force)
 	{
 		bool		do_flush = false;
 
-		/* Check for pending fixed-numbered stats */
+		/* Check for pending stats */
 		for (PgStat_Kind kind = PGSTAT_KIND_MIN; kind <= PGSTAT_KIND_MAX; kind++)
 		{
 			const PgStat_KindInfo *kind_info = pgstat_get_kind_info(kind);
 
 			if (!kind_info)
 				continue;
-			if (!kind_info->fixed_amount)
-			{
-				Assert(kind_info->have_fixed_pending_cb == NULL);
-				continue;
-			}
-			if (!kind_info->have_fixed_pending_cb)
+			if (!kind_info->have_pending_cb)
 				continue;
 
-			if (kind_info->have_fixed_pending_cb())
+			if (kind_info->have_pending_cb())
 			{
 				do_flush = true;
 				break;
@@ -792,22 +788,20 @@ pgstat_report_stat(bool force)
 	/* flush of variable-numbered stats */
 	partial_flush |= pgstat_flush_pending_entries(nowait);
 
-	/* flush of fixed-numbered stats */
+	/*
+	 * Flush of other stats, which could be variable-numbered or
+	 * fixed-numbered.
+	 */
 	for (PgStat_Kind kind = PGSTAT_KIND_MIN; kind <= PGSTAT_KIND_MAX; kind++)
 	{
 		const PgStat_KindInfo *kind_info = pgstat_get_kind_info(kind);
 
 		if (!kind_info)
 			continue;
-		if (!kind_info->fixed_amount)
-		{
-			Assert(kind_info->flush_fixed_cb == NULL);
-			continue;
-		}
-		if (!kind_info->flush_fixed_cb)
+		if (!kind_info->flush_cb)
 			continue;
 
-		partial_flush |= kind_info->flush_fixed_cb(nowait);
+		partial_flush |= kind_info->flush_cb(nowait);
 	}
 
 	last_flush = now;
@@ -1369,7 +1363,6 @@ static bool
 pgstat_flush_pending_entries(bool nowait)
 {
 	bool		have_pending = false;
-	bool		backend_have_pending = false;
 	dlist_node *cur = NULL;
 
 	/*
@@ -1417,33 +1410,9 @@ pgstat_flush_pending_entries(bool nowait)
 		cur = next;
 	}
 
-	/*
-	 * There is a special case for some pending stats that are tracked in
-	 * PendingBackendStats. It's possible that those have not been flushed
-	 * above, hence the extra check here.
-	 */
-	if (!pg_memory_is_all_zeros(&PendingBackendStats,
-								sizeof(struct PgStat_BackendPending)))
-	{
-		PgStat_EntryRef *entry_ref;
-
-		entry_ref = pgstat_get_entry_ref(PGSTAT_KIND_BACKEND, InvalidOid,
-										 MyProcNumber, false, NULL);
-
-		if (entry_ref)
-		{
-			PgStat_HashKey key = entry_ref->shared_entry->key;
-			PgStat_Kind kind = key.kind;
-			const PgStat_KindInfo *kind_info = pgstat_get_kind_info(kind);
-
-			if (!kind_info->flush_pending_cb(entry_ref, nowait))
-				backend_have_pending = true;
-		}
-	}
-
 	Assert(dlist_is_empty(&pgStatPending) == !have_pending);
 
-	return (have_pending || backend_have_pending);
+	return have_pending;
 }
 
 
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
index ca5cec348b..fb7abf64a0 100644
--- a/src/backend/utils/activity/pgstat_backend.c
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -11,7 +11,9 @@
  * This statistics kind uses a proc number as object ID for the hash table
  * of pgstats.  Entries are created each time a process is spawned, and are
  * dropped when the process exits.  These are not written to the pgstats file
- * on disk.
+ * on disk.  Pending statistics are managed without direct interactions with
+ * the pgstats dshash, relying on PendingBackendStats instead so as it is
+ * possible to report data within critical sections.
  *
  * Copyright (c) 2001-2025, PostgreSQL Global Development Group
  *
@@ -22,10 +24,53 @@
 
 #include "postgres.h"
 
+#include "storage/bufmgr.h"
 #include "utils/memutils.h"
 #include "utils/pgstat_internal.h"
 
-PgStat_BackendPending PendingBackendStats = {0};
+/*
+ * Backend statistics counts waiting to be flushed out.  We assume this variable
+ * inits to zeroes.  These counters may be reported within critical sections so
+ * we use static memory in order to avoid memory allocation.
+ */
+static PgStat_BackendPending PendingBackendStats = {0};
+static bool have_backendstats = false;
+
+/*
+ * Utility routines to report I/O stats for backends, kept here to avoid
+ * exposing PendingBackendStats to the outside world.
+ */
+void
+pgstat_count_backend_io_op_time(IOObject io_object, IOContext io_context,
+								IOOp io_op, instr_time io_time)
+{
+	Assert(track_io_timing);
+
+	if (!pgstat_tracks_backend_bktype(MyBackendType))
+		return;
+
+	Assert(pgstat_tracks_io_op(MyBackendType, io_object, io_context, io_op));
+
+	INSTR_TIME_ADD(PendingBackendStats.pending_io.pending_times[io_object][io_context][io_op],
+				   io_time);
+
+	have_backendstats = true;
+}
+
+void
+pgstat_count_backend_io_op(IOObject io_object, IOContext io_context,
+						   IOOp io_op, uint32 cnt, uint64 bytes)
+{
+	if (!pgstat_tracks_backend_bktype(MyBackendType))
+		return;
+
+	Assert(pgstat_tracks_io_op(MyBackendType, io_object, io_context, io_op));
+
+	PendingBackendStats.pending_io.counts[io_object][io_context][io_op] += cnt;
+	PendingBackendStats.pending_io.bytes[io_object][io_context][io_op] += bytes;
+
+	have_backendstats = true;
+}
 
 /*
  * Returns statistics of a backend by proc number.
@@ -53,8 +98,9 @@ pgstat_flush_backend_entry_io(PgStat_EntryRef *entry_ref)
 	PgStat_PendingIO pending_io;
 
 	/*
-	 * This function can be called even if nothing at all has happened. In
-	 * this case, avoid unnecessarily modifying the stats entry.
+	 * This function can be called even if nothing at all has happened for
+	 * IO statistics.  In this case, avoid unnecessarily modifying the stats
+	 * entry.
 	 */
 	if (pg_memory_is_all_zeros(&PendingBackendStats.pending_io,
 							   sizeof(struct PgStat_PendingIO)))
@@ -83,25 +129,29 @@ pgstat_flush_backend_entry_io(PgStat_EntryRef *entry_ref)
 			}
 		}
 	}
-
-	/*
-	 * Clear out the statistics buffer, so it can be re-used.
-	 */
-	MemSet(&PendingBackendStats.pending_io, 0, sizeof(PgStat_PendingIO));
 }
 
 /*
- * Wrapper routine to flush backend statistics.
+ * Flush out locally pending backend statistics
+ *
+ * "flags" parameter controls which statistics to flush.  Returns true
+ * if some statistics could not be flushed.
  */
-static bool
-pgstat_flush_backend_entry(PgStat_EntryRef *entry_ref, bool nowait,
-						   bits32 flags)
+bool
+pgstat_flush_backend(bool nowait, bits32 flags)
 {
+	PgStat_EntryRef *entry_ref;
+
+	if (!have_backendstats)
+		return false;
+
 	if (!pgstat_tracks_backend_bktype(MyBackendType))
 		return false;
 
-	if (!pgstat_lock_entry(entry_ref, nowait))
-		return false;
+	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_BACKEND, InvalidOid,
+											MyProcNumber, nowait);
+	if (!entry_ref)
+		return true;
 
 	/* Flush requested statistics */
 	if (flags & PGSTAT_BACKEND_FLUSH_IO)
@@ -109,36 +159,33 @@ pgstat_flush_backend_entry(PgStat_EntryRef *entry_ref, bool nowait,
 
 	pgstat_unlock_entry(entry_ref);
 
-	return true;
+	/*
+	 * Clear out the statistics buffer, so it can be re-used.
+	 */
+	MemSet(&PendingBackendStats, 0, sizeof(PgStat_BackendPending));
+
+	have_backendstats = false;
+	return false;
+}
+
+/*
+ * Check if there are any backend stats waiting for flush.
+ */
+bool
+pgstat_backend_have_pending_cb(void)
+{
+	return have_backendstats;
 }
 
 /*
  * Callback to flush out locally pending backend statistics.
  *
- * If no stats have been recorded, this function returns false.
+ * If some stats could not be flushed, return true.
  */
 bool
-pgstat_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
+pgstat_backend_flush_cb(bool nowait)
 {
-	return pgstat_flush_backend_entry(entry_ref, nowait, PGSTAT_BACKEND_FLUSH_ALL);
-}
-
-/*
- * Flush out locally pending backend statistics
- *
- * "flags" parameter controls which statistics to flush.
- */
-void
-pgstat_flush_backend(bool nowait, bits32 flags)
-{
-	PgStat_EntryRef *entry_ref;
-
-	if (!pgstat_tracks_backend_bktype(MyBackendType))
-		return;
-
-	entry_ref = pgstat_get_entry_ref(PGSTAT_KIND_BACKEND, InvalidOid,
-									 MyProcNumber, false, NULL);
-	(void) pgstat_flush_backend_entry(entry_ref, nowait, flags);
+	return pgstat_flush_backend(nowait, PGSTAT_BACKEND_FLUSH_ALL);
 }
 
 /*
@@ -150,9 +197,8 @@ pgstat_create_backend(ProcNumber procnum)
 	PgStat_EntryRef *entry_ref;
 	PgStatShared_Backend *shstatent;
 
-	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_BACKEND, InvalidOid,
-										  procnum, NULL);
-
+	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_BACKEND, InvalidOid,
+											MyProcNumber, false);
 	shstatent = (PgStatShared_Backend *) entry_ref->shared_stats;
 
 	/*
@@ -160,20 +206,10 @@ pgstat_create_backend(ProcNumber procnum)
 	 * e.g. if we previously used this proc number.
 	 */
 	memset(&shstatent->stats, 0, sizeof(shstatent->stats));
-}
+	pgstat_unlock_entry(entry_ref);
 
-/*
- * Find or create a local PgStat_BackendPending entry for proc number.
- */
-PgStat_BackendPending *
-pgstat_prep_backend_pending(ProcNumber procnum)
-{
-	PgStat_EntryRef *entry_ref;
-
-	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_BACKEND, InvalidOid,
-										  procnum, NULL);
-
-	return entry_ref->pending;
+	MemSet(&PendingBackendStats, 0, sizeof(PgStat_BackendPending));
+	have_backendstats = false;
 }
 
 /*
diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index b8f19515ae..6ff5d9e96a 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -73,15 +73,12 @@ pgstat_count_io_op(IOObject io_object, IOContext io_context, IOOp io_op,
 	Assert(pgstat_is_ioop_tracked_in_bytes(io_op) || bytes == 0);
 	Assert(pgstat_tracks_io_op(MyBackendType, io_object, io_context, io_op));
 
-	if (pgstat_tracks_backend_bktype(MyBackendType))
-	{
-		PendingBackendStats.pending_io.counts[io_object][io_context][io_op] += cnt;
-		PendingBackendStats.pending_io.bytes[io_object][io_context][io_op] += bytes;
-	}
-
 	PendingIOStats.counts[io_object][io_context][io_op] += cnt;
 	PendingIOStats.bytes[io_object][io_context][io_op] += bytes;
 
+	/* Add the per-backend counts */
+	pgstat_count_backend_io_op(io_object, io_context, io_op, cnt, bytes);
+
 	have_iostats = true;
 }
 
@@ -142,9 +139,9 @@ pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op,
 		INSTR_TIME_ADD(PendingIOStats.pending_times[io_object][io_context][io_op],
 					   io_time);
 
-		if (pgstat_tracks_backend_bktype(MyBackendType))
-			INSTR_TIME_ADD(PendingBackendStats.pending_io.pending_times[io_object][io_context][io_op],
-						   io_time);
+		/* Add the per-backend count */
+		pgstat_count_backend_io_op_time(io_object, io_context, io_op,
+										io_time);
 	}
 
 	pgstat_count_io_op(io_object, io_context, io_op, cnt, bytes);
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index 09247ba097..965a7fe2c6 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -264,7 +264,7 @@ pgstat_report_vacuum(Oid tableoid, bool shared,
 	 * VACUUM command has processed all tables and committed.
 	 */
 	pgstat_flush_io(false);
-	pgstat_flush_backend(false, PGSTAT_BACKEND_FLUSH_IO);
+	(void) pgstat_flush_backend(false, PGSTAT_BACKEND_FLUSH_IO);
 }
 
 /*
@@ -351,7 +351,7 @@ pgstat_report_analyze(Relation rel,
 
 	/* see pgstat_report_vacuum() */
 	pgstat_flush_io(false);
-	pgstat_flush_backend(false, PGSTAT_BACKEND_FLUSH_IO);
+	(void) pgstat_flush_backend(false, PGSTAT_BACKEND_FLUSH_IO);
 }
 
 /*
-- 
2.47.1

#91Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#90)
3 attachment(s)
Re: per backend I/O statistics

Hi,

On Mon, Jan 20, 2025 at 03:34:41PM +0900, Michael Paquier wrote:

On Sat, Jan 18, 2025 at 05:53:31PM +0900, Michael Paquier wrote:

Hmm. Such special complexities in pgstat.c are annoying. There is
a stupid thing I am wondering here. For the WAL stats, why couldn't
we place some calls of pgstat_prep_backend_pending() in strategic
places like XLogBeginInsert() to force all the allocation steps of the
pending entry to happen before we would enter the critical sections
when doing a WAL insertion? As far as I can see, there is a special
case with 2PC where XLogBeginInsert() could be called in a critical
section, but that seems to be the only one at quick glance.

I've looked first at this idea over the week-end with a quick hack,
and it did not finish well.

Yeah, that would probably also mean moving the current WAL pending increments to
the same "new" place (to be consistent) which does not seem great.

And then it struck me that with a bit of redesign of the callbacks, so

Thanks!

that the existing flush_fixed_cb and have_fixed_pending_cb are usable
with variable-numbered stats, we should be able to avoid the exception
your first version of the patch introduced. Attached is a patch
to achieve what I have in mind, which is more generic than your
previous patch. I've built my stuff as 0002 on top of your 0001.

Indeed. Another idea that I had would have been to create another callback,
but I prefer your idea.

=== 1

I think that flush_pending_cb, flush_cb and have_pending_cb are now somewhat
confusing (one could think that have_pending_cb is only linked to
flush_pending_cb).

I think that it would be better to make the distinction based on "local/static"
vs "dynamic memory" pending stats instead: I did so in v3 attached, using:

.flush_dynamic_cb(): flushes pending entries tracked in dynamic memory
.flush_static_cb(): flushes pending stats from static/global variable

Also I reworded the comments a bit.

=== 2

- /*
- * Clear out the statistics buffer, so it can be re-used.
- */
- MemSet(&PendingBackendStats.pending_io, 0, sizeof(PgStat_PendingIO));

The initial intent was to clear *only* the pending IO stats, so that each
flush (IO, WAL once implemented, ...) could clear the pending stats it is
responsible for.

=== 3

+       /*
+        * Clear out the statistics buffer, so it can be re-used.
+        */
+       MemSet(&PendingBackendStats, 0, sizeof(PgStat_BackendPending));

Not sure about this one, see above. I mean it is currently OK, but once we
introduce the WAL part it will not be correct, depending on the flag value
being passed.

So, I did put back the previous logic in place (setting to zero only the stats
the flush callback is responsible for) in v3 attached.
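
(A minimal sketch of the intent, as in v1: each flush routine clears only the
part it owns.)

'''
/* at the end of pgstat_flush_backend_entry_io(): clear only the IO part, so
 * that a future pgstat_flush_backend_entry_wal() can manage its own part. */
MemSet(&PendingBackendStats.pending_io, 0, sizeof(PgStat_PendingIO));
'''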

=== 4

+ * Backend statistics counts waiting to be flushed out.  We assume this variable
+ * inits to zeroes.  These counters may be reported within critical sections so
+ * we use static memory in order to avoid memory allocation.
+ */
+static PgStat_BackendPending PendingBackendStats = {0};

I now think we should memset to zero to avoid any padding issues (as more structs
will be added to PgStat_BackendPending). Oh and it's already done in
pgstat_create_backend(), so removing the {0} assignment in the attached.

=== 5

+ have_backendstats = false;

I don't think setting have_backendstats to false in pgstat_flush_backend()
is correct. I mean, it is correct currently, but once we add the WAL part it
won't necessarily be correct if the flags != PGSTAT_BACKEND_FLUSH_ALL. So, using a
pg_memory_is_all_zeros check on PendingBackendStats instead in the attached.
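
(One way this can look, as a sketch with the names used in the patches here;
the exact change is in the attached:)

'''
/* decide "pending or not" from the struct contents themselves, instead of a
 * have_backendstats flag that a partial flush (flags subset) could clear */
bool
pgstat_backend_have_pending_cb(void)
{
    return !pg_memory_is_all_zeros(&PendingBackendStats,
                                   sizeof(struct PgStat_BackendPending));
}
'''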

One key choice I have made is to hide PendingBackendStats within
pgstat_backend.c so that it is possible to enforce some sanity checks
there.

Yeah, better, thanks!

=== 6

In passing, I realized that:

"
* Copyright (c) 2001-2025, PostgreSQL Global Development Group
"

in pgstat_backend.c is incorrect (fixing it in 0002).

PFA:

0001: which is a full rebase
0002: the Copyright fix
.txt: the changes I've made on top of your 0002 (to not confuse the cfbot and
to ease your review).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v3-0001-Rework-per-backend-pending-stats.patchtext/x-diff; charset=us-asciiDownload
From c945a5800554ec90f42091d8175ed50e00ead45f Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Fri, 17 Jan 2025 06:44:47 +0000
Subject: [PATCH v3 1/2] Rework per backend pending stats

9aea73fc61d added per backend pending statistics but not all of them are well
suited to memory allocation and have to be outside of critical sections.

For those (the only current one is I/O statistics but WAL statistics is under
discussion), let's rely on a new PendingBackendStats instead.
---
 src/backend/utils/activity/pgstat.c           |  53 +++----
 src/backend/utils/activity/pgstat_backend.c   | 142 ++++++++++++------
 src/backend/utils/activity/pgstat_io.c        |  23 +--
 src/backend/utils/activity/pgstat_relation.c  |   4 +-
 src/include/pgstat.h                          |   9 ++
 src/include/utils/pgstat_internal.h           |  36 +++--
 .../injection_points/injection_stats.c        |   2 +-
 7 files changed, 153 insertions(+), 116 deletions(-)
  75.8% src/backend/utils/activity/
  19.6% src/include/utils/
   3.6% src/include/

diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index 34520535d54..0a4b687f045 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -292,7 +292,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.shared_data_len = sizeof(((PgStatShared_Database *) 0)->stats),
 		.pending_size = sizeof(PgStat_StatDBEntry),
 
-		.flush_pending_cb = pgstat_database_flush_cb,
+		.flush_dynamic_cb = pgstat_database_flush_cb,
 		.reset_timestamp_cb = pgstat_database_reset_timestamp_cb,
 	},
 
@@ -307,7 +307,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.shared_data_len = sizeof(((PgStatShared_Relation *) 0)->stats),
 		.pending_size = sizeof(PgStat_TableStatus),
 
-		.flush_pending_cb = pgstat_relation_flush_cb,
+		.flush_dynamic_cb = pgstat_relation_flush_cb,
 		.delete_pending_cb = pgstat_relation_delete_pending_cb,
 	},
 
@@ -322,7 +322,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.shared_data_len = sizeof(((PgStatShared_Function *) 0)->stats),
 		.pending_size = sizeof(PgStat_FunctionCounts),
 
-		.flush_pending_cb = pgstat_function_flush_cb,
+		.flush_dynamic_cb = pgstat_function_flush_cb,
 	},
 
 	[PGSTAT_KIND_REPLSLOT] = {
@@ -355,7 +355,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.shared_data_len = sizeof(((PgStatShared_Subscription *) 0)->stats),
 		.pending_size = sizeof(PgStat_BackendSubEntry),
 
-		.flush_pending_cb = pgstat_subscription_flush_cb,
+		.flush_dynamic_cb = pgstat_subscription_flush_cb,
 		.reset_timestamp_cb = pgstat_subscription_reset_timestamp_cb,
 	},
 
@@ -372,7 +372,8 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.shared_data_len = sizeof(((PgStatShared_Backend *) 0)->stats),
 		.pending_size = sizeof(PgStat_BackendPending),
 
-		.flush_pending_cb = pgstat_backend_flush_cb,
+		.have_pending_cb = pgstat_backend_have_pending_cb,
+		.flush_static_cb = pgstat_backend_flush_cb,
 		.reset_timestamp_cb = pgstat_backend_reset_timestamp_cb,
 	},
 
@@ -437,8 +438,8 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.shared_data_off = offsetof(PgStatShared_IO, stats),
 		.shared_data_len = sizeof(((PgStatShared_IO *) 0)->stats),
 
-		.flush_fixed_cb = pgstat_io_flush_cb,
-		.have_fixed_pending_cb = pgstat_io_have_pending_cb,
+		.flush_static_cb = pgstat_io_flush_cb,
+		.have_pending_cb = pgstat_io_have_pending_cb,
 		.init_shmem_cb = pgstat_io_init_shmem_cb,
 		.reset_all_cb = pgstat_io_reset_all_cb,
 		.snapshot_cb = pgstat_io_snapshot_cb,
@@ -455,8 +456,8 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.shared_data_off = offsetof(PgStatShared_SLRU, stats),
 		.shared_data_len = sizeof(((PgStatShared_SLRU *) 0)->stats),
 
-		.flush_fixed_cb = pgstat_slru_flush_cb,
-		.have_fixed_pending_cb = pgstat_slru_have_pending_cb,
+		.flush_static_cb = pgstat_slru_flush_cb,
+		.have_pending_cb = pgstat_slru_have_pending_cb,
 		.init_shmem_cb = pgstat_slru_init_shmem_cb,
 		.reset_all_cb = pgstat_slru_reset_all_cb,
 		.snapshot_cb = pgstat_slru_snapshot_cb,
@@ -474,8 +475,8 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.shared_data_len = sizeof(((PgStatShared_Wal *) 0)->stats),
 
 		.init_backend_cb = pgstat_wal_init_backend_cb,
-		.flush_fixed_cb = pgstat_wal_flush_cb,
-		.have_fixed_pending_cb = pgstat_wal_have_pending_cb,
+		.flush_static_cb = pgstat_wal_flush_cb,
+		.have_pending_cb = pgstat_wal_have_pending_cb,
 		.init_shmem_cb = pgstat_wal_init_shmem_cb,
 		.reset_all_cb = pgstat_wal_reset_all_cb,
 		.snapshot_cb = pgstat_wal_snapshot_cb,
@@ -713,22 +714,17 @@ pgstat_report_stat(bool force)
 	{
 		bool		do_flush = false;
 
-		/* Check for pending fixed-numbered stats */
+		/* Check for pending stats */
 		for (PgStat_Kind kind = PGSTAT_KIND_MIN; kind <= PGSTAT_KIND_MAX; kind++)
 		{
 			const PgStat_KindInfo *kind_info = pgstat_get_kind_info(kind);
 
 			if (!kind_info)
 				continue;
-			if (!kind_info->fixed_amount)
-			{
-				Assert(kind_info->have_fixed_pending_cb == NULL);
-				continue;
-			}
-			if (!kind_info->have_fixed_pending_cb)
+			if (!kind_info->have_pending_cb)
 				continue;
 
-			if (kind_info->have_fixed_pending_cb())
+			if (kind_info->have_pending_cb())
 			{
 				do_flush = true;
 				break;
@@ -789,25 +785,20 @@ pgstat_report_stat(bool force)
 
 	partial_flush = false;
 
-	/* flush of variable-numbered stats */
+	/* flush of variable-numbered stats tracked in pending entries list */
 	partial_flush |= pgstat_flush_pending_entries(nowait);
 
-	/* flush of fixed-numbered stats */
+	/* flush stats for each registered kind that has a flush static callback */
 	for (PgStat_Kind kind = PGSTAT_KIND_MIN; kind <= PGSTAT_KIND_MAX; kind++)
 	{
 		const PgStat_KindInfo *kind_info = pgstat_get_kind_info(kind);
 
 		if (!kind_info)
 			continue;
-		if (!kind_info->fixed_amount)
-		{
-			Assert(kind_info->flush_fixed_cb == NULL);
-			continue;
-		}
-		if (!kind_info->flush_fixed_cb)
+		if (!kind_info->flush_static_cb)
 			continue;
 
-		partial_flush |= kind_info->flush_fixed_cb(nowait);
+		partial_flush |= kind_info->flush_static_cb(nowait);
 	}
 
 	last_flush = now;
@@ -1297,7 +1288,7 @@ pgstat_prep_pending_entry(PgStat_Kind kind, Oid dboid, uint64 objid, bool *creat
 	PgStat_EntryRef *entry_ref;
 
 	/* need to be able to flush out */
-	Assert(pgstat_get_kind_info(kind)->flush_pending_cb != NULL);
+	Assert(pgstat_get_kind_info(kind)->flush_dynamic_cb != NULL);
 
 	if (unlikely(!pgStatPendingContext))
 	{
@@ -1394,10 +1385,10 @@ pgstat_flush_pending_entries(bool nowait)
 		dlist_node *next;
 
 		Assert(!kind_info->fixed_amount);
-		Assert(kind_info->flush_pending_cb != NULL);
+		Assert(kind_info->flush_dynamic_cb != NULL);
 
 		/* flush the stats, if possible */
-		did_flush = kind_info->flush_pending_cb(entry_ref, nowait);
+		did_flush = kind_info->flush_dynamic_cb(entry_ref, nowait);
 
 		Assert(did_flush || nowait);
 
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
index 79e4d0a3053..bcae6a78169 100644
--- a/src/backend/utils/activity/pgstat_backend.c
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -11,7 +11,9 @@
  * This statistics kind uses a proc number as object ID for the hash table
  * of pgstats.  Entries are created each time a process is spawned, and are
  * dropped when the process exits.  These are not written to the pgstats file
- * on disk.
+ * on disk.  Pending statistics are managed without direct interactions with
+ * the pgstats dshash, relying on PendingBackendStats instead so as it is
+ * possible to report data within critical sections.
  *
  * Copyright (c) 2001-2025, PostgreSQL Global Development Group
  *
@@ -22,8 +24,49 @@
 
 #include "postgres.h"
 
+#include "storage/bufmgr.h"
+#include "utils/memutils.h"
 #include "utils/pgstat_internal.h"
 
+/*
+ * Backend statistics counts waiting to be flushed out. These counters may be
+ * reported within critical sections so we use static memory in order to avoid
+ * memory allocation.
+ */
+static PgStat_BackendPending PendingBackendStats;
+
+/*
+ * Utility routines to report I/O stats for backends, kept here to avoid
+ * exposing PendingBackendStats to the outside world.
+ */
+void
+pgstat_count_backend_io_op_time(IOObject io_object, IOContext io_context,
+								IOOp io_op, instr_time io_time)
+{
+	Assert(track_io_timing);
+
+	if (!pgstat_tracks_backend_bktype(MyBackendType))
+		return;
+
+	Assert(pgstat_tracks_io_op(MyBackendType, io_object, io_context, io_op));
+
+	INSTR_TIME_ADD(PendingBackendStats.pending_io.pending_times[io_object][io_context][io_op],
+				   io_time);
+}
+
+void
+pgstat_count_backend_io_op(IOObject io_object, IOContext io_context,
+						   IOOp io_op, uint32 cnt, uint64 bytes)
+{
+	if (!pgstat_tracks_backend_bktype(MyBackendType))
+		return;
+
+	Assert(pgstat_tracks_io_op(MyBackendType, io_object, io_context, io_op));
+
+	PendingBackendStats.pending_io.counts[io_object][io_context][io_op] += cnt;
+	PendingBackendStats.pending_io.bytes[io_object][io_context][io_op] += bytes;
+}
+
 /*
  * Returns statistics of a backend by proc number.
  */
@@ -46,14 +89,21 @@ static void
 pgstat_flush_backend_entry_io(PgStat_EntryRef *entry_ref)
 {
 	PgStatShared_Backend *shbackendent;
-	PgStat_BackendPending *pendingent;
 	PgStat_BktypeIO *bktype_shstats;
-	PgStat_PendingIO *pending_io;
+	PgStat_PendingIO pending_io;
+
+	/*
+	 * This function can be called even if nothing at all has happened for IO
+	 * statistics.  In this case, avoid unnecessarily modifying the stats
+	 * entry.
+	 */
+	if (pg_memory_is_all_zeros(&PendingBackendStats.pending_io,
+							   sizeof(struct PgStat_PendingIO)))
+		return;
 
 	shbackendent = (PgStatShared_Backend *) entry_ref->shared_stats;
-	pendingent = (PgStat_BackendPending *) entry_ref->pending;
 	bktype_shstats = &shbackendent->stats.io_stats;
-	pending_io = &pendingent->pending_io;
+	pending_io = PendingBackendStats.pending_io;
 
 	for (int io_object = 0; io_object < IOOBJECT_NUM_TYPES; io_object++)
 	{
@@ -64,68 +114,74 @@ pgstat_flush_backend_entry_io(PgStat_EntryRef *entry_ref)
 				instr_time	time;
 
 				bktype_shstats->counts[io_object][io_context][io_op] +=
-					pending_io->counts[io_object][io_context][io_op];
+					pending_io.counts[io_object][io_context][io_op];
 				bktype_shstats->bytes[io_object][io_context][io_op] +=
-					pending_io->bytes[io_object][io_context][io_op];
-
-				time = pending_io->pending_times[io_object][io_context][io_op];
+					pending_io.bytes[io_object][io_context][io_op];
+				time = pending_io.pending_times[io_object][io_context][io_op];
 
 				bktype_shstats->times[io_object][io_context][io_op] +=
 					INSTR_TIME_GET_MICROSEC(time);
 			}
 		}
 	}
+
+	/*
+	 * Clear out the statistics buffer, so it can be re-used.
+	 */
+	MemSet(&PendingBackendStats.pending_io, 0, sizeof(PgStat_PendingIO));
 }
 
 /*
- * Wrapper routine to flush backend statistics.
+ * Flush out locally pending backend statistics
+ *
+ * "flags" parameter controls which statistics to flush.  Returns true
+ * if some statistics could not be flushed.
  */
-static bool
-pgstat_flush_backend_entry(PgStat_EntryRef *entry_ref, bool nowait,
-						   bits32 flags)
+bool
+pgstat_flush_backend(bool nowait, bits32 flags)
 {
-	if (!pgstat_tracks_backend_bktype(MyBackendType))
+	PgStat_EntryRef *entry_ref;
+
+	if (pg_memory_is_all_zeros(&PendingBackendStats,
+							   sizeof(struct PgStat_BackendPending)))
 		return false;
 
-	if (!pgstat_lock_entry(entry_ref, nowait))
+	if (!pgstat_tracks_backend_bktype(MyBackendType))
 		return false;
 
+	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_BACKEND, InvalidOid,
+											MyProcNumber, nowait);
+	if (!entry_ref)
+		return true;
+
 	/* Flush requested statistics */
 	if (flags & PGSTAT_BACKEND_FLUSH_IO)
 		pgstat_flush_backend_entry_io(entry_ref);
 
 	pgstat_unlock_entry(entry_ref);
 
-	return true;
+	return false;
 }
 
 /*
- * Callback to flush out locally pending backend statistics.
- *
- * If no stats have been recorded, this function returns false.
+ * Check if there are any backend stats waiting for flush.
  */
 bool
-pgstat_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
+pgstat_backend_have_pending_cb(void)
 {
-	return pgstat_flush_backend_entry(entry_ref, nowait, PGSTAT_BACKEND_FLUSH_ALL);
+	return (!pg_memory_is_all_zeros(&PendingBackendStats,
+									sizeof(struct PgStat_BackendPending)));
 }
 
 /*
- * Flush out locally pending backend statistics
+ * Callback to flush out locally pending backend statistics.
  *
- * "flags" parameter controls which statistics to flush.
+ * If some stats could not be flushed, return true.
  */
-void
-pgstat_flush_backend(bool nowait, bits32 flags)
+bool
+pgstat_backend_flush_cb(bool nowait)
 {
-	PgStat_EntryRef *entry_ref;
-
-	if (!pgstat_tracks_backend_bktype(MyBackendType))
-		return;
-
-	entry_ref = pgstat_get_entry_ref(PGSTAT_KIND_BACKEND, InvalidOid,
-									 MyProcNumber, false, NULL);
-	(void) pgstat_flush_backend_entry(entry_ref, nowait, flags);
+	return pgstat_flush_backend(nowait, PGSTAT_BACKEND_FLUSH_ALL);
 }
 
 /*
@@ -137,9 +193,8 @@ pgstat_create_backend(ProcNumber procnum)
 	PgStat_EntryRef *entry_ref;
 	PgStatShared_Backend *shstatent;
 
-	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_BACKEND, InvalidOid,
-										  procnum, NULL);
-
+	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_BACKEND, InvalidOid,
+											MyProcNumber, false);
 	shstatent = (PgStatShared_Backend *) entry_ref->shared_stats;
 
 	/*
@@ -147,20 +202,9 @@ pgstat_create_backend(ProcNumber procnum)
 	 * e.g. if we previously used this proc number.
 	 */
 	memset(&shstatent->stats, 0, sizeof(shstatent->stats));
-}
-
-/*
- * Find or create a local PgStat_BackendPending entry for proc number.
- */
-PgStat_BackendPending *
-pgstat_prep_backend_pending(ProcNumber procnum)
-{
-	PgStat_EntryRef *entry_ref;
-
-	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_BACKEND, InvalidOid,
-										  procnum, NULL);
+	pgstat_unlock_entry(entry_ref);
 
-	return entry_ref->pending;
+	MemSet(&PendingBackendStats, 0, sizeof(PgStat_BackendPending));
 }
 
 /*
diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index 027aad8b24e..6ff5d9e96a1 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -73,18 +73,12 @@ pgstat_count_io_op(IOObject io_object, IOContext io_context, IOOp io_op,
 	Assert(pgstat_is_ioop_tracked_in_bytes(io_op) || bytes == 0);
 	Assert(pgstat_tracks_io_op(MyBackendType, io_object, io_context, io_op));
 
-	if (pgstat_tracks_backend_bktype(MyBackendType))
-	{
-		PgStat_BackendPending *entry_ref;
-
-		entry_ref = pgstat_prep_backend_pending(MyProcNumber);
-		entry_ref->pending_io.counts[io_object][io_context][io_op] += cnt;
-		entry_ref->pending_io.bytes[io_object][io_context][io_op] += bytes;
-	}
-
 	PendingIOStats.counts[io_object][io_context][io_op] += cnt;
 	PendingIOStats.bytes[io_object][io_context][io_op] += bytes;
 
+	/* Add the per-backend counts */
+	pgstat_count_backend_io_op(io_object, io_context, io_op, cnt, bytes);
+
 	have_iostats = true;
 }
 
@@ -145,14 +139,9 @@ pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op,
 		INSTR_TIME_ADD(PendingIOStats.pending_times[io_object][io_context][io_op],
 					   io_time);
 
-		if (pgstat_tracks_backend_bktype(MyBackendType))
-		{
-			PgStat_BackendPending *entry_ref;
-
-			entry_ref = pgstat_prep_backend_pending(MyProcNumber);
-			INSTR_TIME_ADD(entry_ref->pending_io.pending_times[io_object][io_context][io_op],
-						   io_time);
-		}
+		/* Add the per-backend count */
+		pgstat_count_backend_io_op_time(io_object, io_context, io_op,
+										io_time);
 	}
 
 	pgstat_count_io_op(io_object, io_context, io_op, cnt, bytes);
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index 09247ba0971..965a7fe2c64 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -264,7 +264,7 @@ pgstat_report_vacuum(Oid tableoid, bool shared,
 	 * VACUUM command has processed all tables and committed.
 	 */
 	pgstat_flush_io(false);
-	pgstat_flush_backend(false, PGSTAT_BACKEND_FLUSH_IO);
+	(void) pgstat_flush_backend(false, PGSTAT_BACKEND_FLUSH_IO);
 }
 
 /*
@@ -351,7 +351,7 @@ pgstat_report_analyze(Relation rel,
 
 	/* see pgstat_report_vacuum() */
 	pgstat_flush_io(false);
-	pgstat_flush_backend(false, PGSTAT_BACKEND_FLUSH_IO);
+	(void) pgstat_flush_backend(false, PGSTAT_BACKEND_FLUSH_IO);
 }
 
 /*
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 2d40fe3e70f..d0d45150977 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -540,6 +540,15 @@ extern PgStat_ArchiverStats *pgstat_fetch_stat_archiver(void);
  * Functions in pgstat_backend.c
  */
 
+/* used by pgstat_io.c for I/O stats tracked in backends */
+extern void pgstat_count_backend_io_op_time(IOObject io_object,
+											IOContext io_context,
+											IOOp io_op,
+											instr_time io_time);
+extern void pgstat_count_backend_io_op(IOObject io_object,
+									   IOContext io_context,
+									   IOOp io_op, uint32 cnt,
+									   uint64 bytes);
 extern PgStat_Backend *pgstat_fetch_stat_backend(ProcNumber procNumber);
 extern bool pgstat_tracks_backend_bktype(BackendType bktype);
 extern void pgstat_create_backend(ProcNumber procnum);
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 4bb8e5c53ab..7222b414779 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -156,8 +156,8 @@ typedef struct PgStat_EntryRef
 	 * Pending statistics data that will need to be flushed to shared memory
 	 * stats eventually. Each stats kind utilizing pending data defines what
 	 * format its pending data has and needs to provide a
-	 * PgStat_KindInfo->flush_pending_cb callback to merge pending into shared
-	 * stats.
+	 * PgStat_KindInfo->flush_dynamic_cb callback to merge pending entries
+	 * that are in dynamic memory into shared stats.
 	 */
 	void	   *pending;
 	dlist_node	pending_node;	/* membership in pgStatPending list */
@@ -259,10 +259,11 @@ typedef struct PgStat_KindInfo
 	void		(*init_backend_cb) (void);
 
 	/*
-	 * For variable-numbered stats: flush pending stats. Required if pending
-	 * data is used.  See flush_fixed_cb for fixed-numbered stats.
+	 * For variable-numbered stats: flush pending stats entries in dynamic
+	 * memory within the dshash. Required if pending data interacts with the
+	 * pgstats dshash.
 	 */
-	bool		(*flush_pending_cb) (PgStat_EntryRef *sr, bool nowait);
+	bool		(*flush_dynamic_cb) (PgStat_EntryRef *sr, bool nowait);
 
 	/*
 	 * For variable-numbered stats: delete pending stats. Optional.
@@ -289,17 +290,19 @@ typedef struct PgStat_KindInfo
 	void		(*init_shmem_cb) (void *stats);
 
 	/*
-	 * For fixed-numbered statistics: Flush pending stats. Returns true if
-	 * some of the stats could not be flushed, due to lock contention for
-	 * example. Optional.
+	 * For fixed-numbered or variable-numbered statistics: Check for pending
+	 * stats in need of flush, when these do not use the pgstats dshash.
+	 * Returns true if there are any stats pending for flush, triggering
+	 * flush_cb. Optional.
 	 */
-	bool		(*flush_fixed_cb) (bool nowait);
+	bool		(*have_pending_cb) (void);
 
 	/*
-	 * For fixed-numbered statistics: Check for pending stats in need of
-	 * flush. Returns true if there are any stats pending for flush. Optional.
+	 * For fixed-numbered or variable-numbered statistics: Flush pending
+	 * static stats. Returns true if some of the stats could not be flushed,
+	 * due to lock contention for example. Optional.
 	 */
-	bool		(*have_fixed_pending_cb) (void);
+	bool		(*flush_static_cb) (bool nowait);
 
 	/*
 	 * For fixed-numbered statistics: Reset All.
@@ -617,10 +620,11 @@ extern void pgstat_archiver_snapshot_cb(void);
 #define PGSTAT_BACKEND_FLUSH_IO		(1 << 0)	/* Flush I/O statistics */
 #define PGSTAT_BACKEND_FLUSH_ALL	(PGSTAT_BACKEND_FLUSH_IO)
 
-extern void pgstat_flush_backend(bool nowait, bits32 flags);
-extern PgStat_BackendPending *pgstat_prep_backend_pending(ProcNumber procnum);
-extern bool pgstat_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait);
-extern void pgstat_backend_reset_timestamp_cb(PgStatShared_Common *header, TimestampTz ts);
+extern bool pgstat_flush_backend(bool nowait, bits32 flags);
+extern bool pgstat_backend_flush_cb(bool nowait);
+extern bool pgstat_backend_have_pending_cb(void);
+extern void pgstat_backend_reset_timestamp_cb(PgStatShared_Common *header,
+											  TimestampTz ts);
 
 /*
  * Functions in pgstat_bgwriter.c
diff --git a/src/test/modules/injection_points/injection_stats.c b/src/test/modules/injection_points/injection_stats.c
index 5db62bca66f..4f3691c702b 100644
--- a/src/test/modules/injection_points/injection_stats.c
+++ b/src/test/modules/injection_points/injection_stats.c
@@ -48,7 +48,7 @@ static const PgStat_KindInfo injection_stats = {
 	.shared_data_off = offsetof(PgStatShared_InjectionPoint, stats),
 	.shared_data_len = sizeof(((PgStatShared_InjectionPoint *) 0)->stats),
 	.pending_size = sizeof(PgStat_StatInjEntry),
-	.flush_pending_cb = injection_stats_flush_cb,
+	.flush_dynamic_cb = injection_stats_flush_cb,
 };
 
 /*
-- 
2.34.1

v3-0002-Fixing-pgstat_backend.c-Copyright.patch (text/x-diff; charset=us-ascii)
From 33dbb87cd92e29a71ea2012476aca1d760c8c557 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 20 Jan 2025 10:39:50 +0000
Subject: [PATCH v3 2/2] Fixing pgstat_backend.c Copyright

The file was introduced in 9aea73fc61d, in 2024, so fix its Copyright
accordingly.
---
 src/backend/utils/activity/pgstat_backend.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
 100.0% src/backend/utils/activity/

diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
index bcae6a78169..b0184b3293e 100644
--- a/src/backend/utils/activity/pgstat_backend.c
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -15,7 +15,7 @@
  * the pgstats dshash, relying on PendingBackendStats instead so as it is
  * possible to report data within critical sections.
  *
- * Copyright (c) 2001-2025, PostgreSQL Global Development Group
+ * Copyright (c) 2024-2025, PostgreSQL Global Development Group
  *
  * IDENTIFICATION
  *	  src/backend/utils/activity/pgstat_backend.c
-- 
2.34.1

on_top_of_michael_0002.txt (text/plain; charset=us-ascii)
commit af786645e63b34e0d4eefbef7a8dd0794d358d0f
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date:   Mon Jan 20 09:57:26 2025 +0000

    Change flush callback names

    and the way pending backend stats are checked and set to zero.

diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index bd1ffad6f46..0a4b687f045 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -292,7 +292,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.shared_data_len = sizeof(((PgStatShared_Database *) 0)->stats),
 		.pending_size = sizeof(PgStat_StatDBEntry),
 
-		.flush_pending_cb = pgstat_database_flush_cb,
+		.flush_dynamic_cb = pgstat_database_flush_cb,
 		.reset_timestamp_cb = pgstat_database_reset_timestamp_cb,
 	},
 
@@ -307,7 +307,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.shared_data_len = sizeof(((PgStatShared_Relation *) 0)->stats),
 		.pending_size = sizeof(PgStat_TableStatus),
 
-		.flush_pending_cb = pgstat_relation_flush_cb,
+		.flush_dynamic_cb = pgstat_relation_flush_cb,
 		.delete_pending_cb = pgstat_relation_delete_pending_cb,
 	},
 
@@ -322,7 +322,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.shared_data_len = sizeof(((PgStatShared_Function *) 0)->stats),
 		.pending_size = sizeof(PgStat_FunctionCounts),
 
-		.flush_pending_cb = pgstat_function_flush_cb,
+		.flush_dynamic_cb = pgstat_function_flush_cb,
 	},
 
 	[PGSTAT_KIND_REPLSLOT] = {
@@ -355,7 +355,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.shared_data_len = sizeof(((PgStatShared_Subscription *) 0)->stats),
 		.pending_size = sizeof(PgStat_BackendSubEntry),
 
-		.flush_pending_cb = pgstat_subscription_flush_cb,
+		.flush_dynamic_cb = pgstat_subscription_flush_cb,
 		.reset_timestamp_cb = pgstat_subscription_reset_timestamp_cb,
 	},
 
@@ -373,7 +373,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.pending_size = sizeof(PgStat_BackendPending),
 
 		.have_pending_cb = pgstat_backend_have_pending_cb,
-		.flush_cb = pgstat_backend_flush_cb,
+		.flush_static_cb = pgstat_backend_flush_cb,
 		.reset_timestamp_cb = pgstat_backend_reset_timestamp_cb,
 	},
 
@@ -438,7 +438,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.shared_data_off = offsetof(PgStatShared_IO, stats),
 		.shared_data_len = sizeof(((PgStatShared_IO *) 0)->stats),
 
-		.flush_cb = pgstat_io_flush_cb,
+		.flush_static_cb = pgstat_io_flush_cb,
 		.have_pending_cb = pgstat_io_have_pending_cb,
 		.init_shmem_cb = pgstat_io_init_shmem_cb,
 		.reset_all_cb = pgstat_io_reset_all_cb,
@@ -456,7 +456,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.shared_data_off = offsetof(PgStatShared_SLRU, stats),
 		.shared_data_len = sizeof(((PgStatShared_SLRU *) 0)->stats),
 
-		.flush_cb = pgstat_slru_flush_cb,
+		.flush_static_cb = pgstat_slru_flush_cb,
 		.have_pending_cb = pgstat_slru_have_pending_cb,
 		.init_shmem_cb = pgstat_slru_init_shmem_cb,
 		.reset_all_cb = pgstat_slru_reset_all_cb,
@@ -475,7 +475,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.shared_data_len = sizeof(((PgStatShared_Wal *) 0)->stats),
 
 		.init_backend_cb = pgstat_wal_init_backend_cb,
-		.flush_cb = pgstat_wal_flush_cb,
+		.flush_static_cb = pgstat_wal_flush_cb,
 		.have_pending_cb = pgstat_wal_have_pending_cb,
 		.init_shmem_cb = pgstat_wal_init_shmem_cb,
 		.reset_all_cb = pgstat_wal_reset_all_cb,
@@ -785,23 +785,20 @@ pgstat_report_stat(bool force)
 
 	partial_flush = false;
 
-	/* flush of variable-numbered stats */
+	/* flush of variable-numbered stats tracked in pending entries list */
 	partial_flush |= pgstat_flush_pending_entries(nowait);
 
-	/*
-	 * Flush of other stats, which could be variable-numbered or
-	 * fixed-numbered.
-	 */
+	/* flush stats for each registered kind that has a flush static callback */
 	for (PgStat_Kind kind = PGSTAT_KIND_MIN; kind <= PGSTAT_KIND_MAX; kind++)
 	{
 		const PgStat_KindInfo *kind_info = pgstat_get_kind_info(kind);
 
 		if (!kind_info)
 			continue;
-		if (!kind_info->flush_cb)
+		if (!kind_info->flush_static_cb)
 			continue;
 
-		partial_flush |= kind_info->flush_cb(nowait);
+		partial_flush |= kind_info->flush_static_cb(nowait);
 	}
 
 	last_flush = now;
@@ -1291,7 +1288,7 @@ pgstat_prep_pending_entry(PgStat_Kind kind, Oid dboid, uint64 objid, bool *creat
 	PgStat_EntryRef *entry_ref;
 
 	/* need to be able to flush out */
-	Assert(pgstat_get_kind_info(kind)->flush_pending_cb != NULL);
+	Assert(pgstat_get_kind_info(kind)->flush_dynamic_cb != NULL);
 
 	if (unlikely(!pgStatPendingContext))
 	{
@@ -1388,10 +1385,10 @@ pgstat_flush_pending_entries(bool nowait)
 		dlist_node *next;
 
 		Assert(!kind_info->fixed_amount);
-		Assert(kind_info->flush_pending_cb != NULL);
+		Assert(kind_info->flush_dynamic_cb != NULL);
 
 		/* flush the stats, if possible */
-		did_flush = kind_info->flush_pending_cb(entry_ref, nowait);
+		did_flush = kind_info->flush_dynamic_cb(entry_ref, nowait);
 
 		Assert(did_flush || nowait);
 
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
index fb7abf64a0b..bcae6a78169 100644
--- a/src/backend/utils/activity/pgstat_backend.c
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -29,12 +29,11 @@
 #include "utils/pgstat_internal.h"
 
 /*
- * Backend statistics counts waiting to be flushed out.  We assume this variable
- * inits to zeroes.  These counters may be reported within critical sections so
- * we use static memory in order to avoid memory allocation.
+ * Backend statistics counts waiting to be flushed out. These counters may be
+ * reported within critical sections so we use static memory in order to avoid
+ * memory allocation.
  */
-static PgStat_BackendPending PendingBackendStats = {0};
-static bool have_backendstats = false;
+static PgStat_BackendPending PendingBackendStats;
 
 /*
  * Utility routines to report I/O stats for backends, kept here to avoid
@@ -53,8 +52,6 @@ pgstat_count_backend_io_op_time(IOObject io_object, IOContext io_context,
 
 	INSTR_TIME_ADD(PendingBackendStats.pending_io.pending_times[io_object][io_context][io_op],
 				   io_time);
-
-	have_backendstats = true;
 }
 
 void
@@ -68,8 +65,6 @@ pgstat_count_backend_io_op(IOObject io_object, IOContext io_context,
 
 	PendingBackendStats.pending_io.counts[io_object][io_context][io_op] += cnt;
 	PendingBackendStats.pending_io.bytes[io_object][io_context][io_op] += bytes;
-
-	have_backendstats = true;
 }
 
 /*
@@ -98,8 +93,8 @@ pgstat_flush_backend_entry_io(PgStat_EntryRef *entry_ref)
 	PgStat_PendingIO pending_io;
 
 	/*
-	 * This function can be called even if nothing at all has happened for
-	 * IO statistics.  In this case, avoid unnecessarily modifying the stats
+	 * This function can be called even if nothing at all has happened for IO
+	 * statistics.  In this case, avoid unnecessarily modifying the stats
 	 * entry.
 	 */
 	if (pg_memory_is_all_zeros(&PendingBackendStats.pending_io,
@@ -129,6 +124,11 @@ pgstat_flush_backend_entry_io(PgStat_EntryRef *entry_ref)
 			}
 		}
 	}
+
+	/*
+	 * Clear out the statistics buffer, so it can be re-used.
+	 */
+	MemSet(&PendingBackendStats.pending_io, 0, sizeof(PgStat_PendingIO));
 }
 
 /*
@@ -142,7 +142,8 @@ pgstat_flush_backend(bool nowait, bits32 flags)
 {
 	PgStat_EntryRef *entry_ref;
 
-	if (!have_backendstats)
+	if (pg_memory_is_all_zeros(&PendingBackendStats,
+							   sizeof(struct PgStat_BackendPending)))
 		return false;
 
 	if (!pgstat_tracks_backend_bktype(MyBackendType))
@@ -159,12 +160,6 @@ pgstat_flush_backend(bool nowait, bits32 flags)
 
 	pgstat_unlock_entry(entry_ref);
 
-	/*
-	 * Clear out the statistics buffer, so it can be re-used.
-	 */
-	MemSet(&PendingBackendStats, 0, sizeof(PgStat_BackendPending));
-
-	have_backendstats = false;
 	return false;
 }
 
@@ -174,7 +169,8 @@ pgstat_flush_backend(bool nowait, bits32 flags)
 bool
 pgstat_backend_have_pending_cb(void)
 {
-	return have_backendstats;
+	return (!pg_memory_is_all_zeros(&PendingBackendStats,
+									sizeof(struct PgStat_BackendPending)));
 }
 
 /*
@@ -209,7 +205,6 @@ pgstat_create_backend(ProcNumber procnum)
 	pgstat_unlock_entry(entry_ref);
 
 	MemSet(&PendingBackendStats, 0, sizeof(PgStat_BackendPending));
-	have_backendstats = false;
 }
 
 /*
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 000ed5b36f6..7222b414779 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -156,8 +156,8 @@ typedef struct PgStat_EntryRef
 	 * Pending statistics data that will need to be flushed to shared memory
 	 * stats eventually. Each stats kind utilizing pending data defines what
 	 * format its pending data has and needs to provide a
-	 * PgStat_KindInfo->flush_pending_cb callback to merge pending into shared
-	 * stats.
+	 * PgStat_KindInfo->flush_dynamic_cb callback to merge pending entries
+	 * that are in dynamic memory into shared stats.
 	 */
 	void	   *pending;
 	dlist_node	pending_node;	/* membership in pgStatPending list */
@@ -259,10 +259,11 @@ typedef struct PgStat_KindInfo
 	void		(*init_backend_cb) (void);
 
 	/*
-	 * For variable-numbered stats: flush pending stats within the dshash.
-	 * Required if pending data interacts with the pgstats dshash.
+	 * For variable-numbered stats: flush pending stats entries in dynamic
+	 * memory within the dshash. Required if pending data interacts with the
+	 * pgstats dshash.
 	 */
-	bool		(*flush_pending_cb) (PgStat_EntryRef *sr, bool nowait);
+	bool		(*flush_dynamic_cb) (PgStat_EntryRef *sr, bool nowait);
 
 	/*
 	 * For variable-numbered stats: delete pending stats. Optional.
@@ -298,10 +299,10 @@ typedef struct PgStat_KindInfo
 
 	/*
 	 * For fixed-numbered or variable-numbered statistics: Flush pending
-	 * stats. Returns true if some of the stats could not be flushed, due
-	 * to lock contention for example. Optional.
+	 * static stats. Returns true if some of the stats could not be flushed,
+	 * due to lock contention for example. Optional.
 	 */
-	bool		(*flush_cb) (bool nowait);
+	bool		(*flush_static_cb) (bool nowait);
 
 	/*
 	 * For fixed-numbered statistics: Reset All.
diff --git a/src/test/modules/injection_points/injection_stats.c b/src/test/modules/injection_points/injection_stats.c
index 5db62bca66f..4f3691c702b 100644
--- a/src/test/modules/injection_points/injection_stats.c
+++ b/src/test/modules/injection_points/injection_stats.c
@@ -48,7 +48,7 @@ static const PgStat_KindInfo injection_stats = {
 	.shared_data_off = offsetof(PgStatShared_InjectionPoint, stats),
 	.shared_data_len = sizeof(((PgStatShared_InjectionPoint *) 0)->stats),
 	.pending_size = sizeof(PgStat_StatInjEntry),
-	.flush_pending_cb = injection_stats_flush_cb,
+	.flush_dynamic_cb = injection_stats_flush_cb,
 };
 
 /*
#92Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#91)
Re: per backend I/O statistics

On Mon, Jan 20, 2025 at 11:10:40AM +0000, Bertrand Drouvot wrote:

> I think that it would be better to make the distinction based on "local/static"
> vs "dynamic memory" pending stats instead: I did so in v3 attached, using:
>
> .flush_dynamic_cb(): flushes pending entries tracked in dynamic memory
> .flush_static_cb(): flushes pending stats from static/global variable

"static" and "dynamic" feel a bit off when used together. For
"static", the stats are from a static state but they can be pushed to
the dshash, as well, which is in dynamic shared memory.

Hmm. Thinking more about that, I think that we should leave the
existing "flush_pending_cb" alone. For the two others, how about
"have_static_pending_cb" and "flush_static_cb" rather than "fixed"?
It seems important to me to show the relationship between these two.
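
To make that relationship concrete, here is a small standalone sketch (plain
C, not PostgreSQL code): MyPending, my_pending, all_zeros() and the my_*
callbacks are made-up stand-ins for PendingBackendStats,
pg_memory_is_all_zeros() and the real callbacks.  The "have pending" check is
a cheap, allocation-free gate; the flush callback does the actual push and
reports whether anything was left behind.

#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Pending counters kept in static memory (stand-in for PendingBackendStats). */
typedef struct MyPending
{
    long        reads;
    long        writes;
} MyPending;

static MyPending my_pending;
static MyPending shared_stats;  /* stand-in for the shared memory copy */

/* Stand-in for pg_memory_is_all_zeros(). */
static bool
all_zeros(const void *ptr, size_t len)
{
    const unsigned char *p = ptr;

    for (size_t i = 0; i < len; i++)
        if (p[i] != 0)
            return false;
    return true;
}

/* have_static_pending_cb: cheap check, no locking, no memory allocation. */
static bool
my_have_static_pending_cb(void)
{
    return !all_zeros(&my_pending, sizeof(my_pending));
}

/*
 * flush_static_cb: push the pending data and clear it.  Returns true if
 * something could not be flushed (lock contention with nowait, say); this
 * sketch never fails because locking is elided.
 */
static bool
my_flush_static_cb(bool nowait)
{
    (void) nowait;
    shared_stats.reads += my_pending.reads;
    shared_stats.writes += my_pending.writes;
    memset(&my_pending, 0, sizeof(my_pending));
    return false;
}

int
main(void)
{
    bool        partial_flush = false;

    my_pending.reads++;         /* pretend some I/O got counted */

    /* shape of the pgstat_report_stat() logic tying the two together */
    if (my_have_static_pending_cb())
        partial_flush |= my_flush_static_cb(true);

    return partial_flush ? 1 : 0;
}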

> Not sure about this one, see above. I mean it is currently OK, but once we
> introduce the WAL part it will not be correct, depending on the flag value
> being passed.
>
> So, I put the previous logic back in place (setting to zero only the stats
> the flush callback is responsible for) in v3 attached.

Hmm. I think that I'm OK with what you are doing here with the next
parts in mind, based on this argument.

> I don't think setting have_backendstats to false in pgstat_flush_backend()
> is correct. I mean, it is correct currently, but once we add the WAL part it
> won't necessarily be correct if flags != PGSTAT_BACKEND_FLUSH_ALL. So the
> attached uses a pg_memory_is_all_zeros check on PendingBackendStats instead.

Indeed. I got this part slightly wrong here. What you are doing with
an all-zero check looks simpler in the long run.
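
For the record, the hazard is easy to see in a reduced standalone form
(hypothetical names; FLUSH_WAL only stands in for the future WAL part): a
single boolean cleared by a partial flush would hide data still pending in
another area, while checking the memory itself stays correct.

#include <stdbool.h>
#include <stddef.h>

#define FLUSH_IO    (1 << 0)
#define FLUSH_WAL   (1 << 1)    /* hypothetical future area */

typedef struct Pending
{
    long        io;
    long        wal;
} Pending;

static Pending pending;

static bool
all_zeros(const void *ptr, size_t len)
{
    const unsigned char *p = ptr;

    for (size_t i = 0; i < len; i++)
        if (p[i] != 0)
            return false;
    return true;
}

/* Flush only the requested areas, clearing just what was flushed. */
static void
flush(int flags)
{
    if (flags & FLUSH_IO)
        pending.io = 0;         /* push to shared memory, then clear */
    if (flags & FLUSH_WAL)
        pending.wal = 0;
}

/*
 * "Anything pending?" looks at the memory itself.  A boolean reset inside
 * flush() would be wrong after flush(FLUSH_IO) while pending.wal is still
 * non-zero; the all-zeros check stays correct for partial flushes.
 */
static bool
have_pending(void)
{
    return !all_zeros(&pending, sizeof(pending));
}

int
main(void)
{
    pending.io = 1;
    pending.wal = 1;

    flush(FLUSH_IO);            /* partial flush, like PGSTAT_BACKEND_FLUSH_IO */

    return have_pending() ? 0 : 1;  /* pending.wal must still be reported */
}
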
--
Michael

#93Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#92)
1 attachment(s)
Re: per backend I/O statistics

Hi,

On Mon, Jan 20, 2025 at 09:11:16PM +0900, Michael Paquier wrote:

> On Mon, Jan 20, 2025 at 11:10:40AM +0000, Bertrand Drouvot wrote:
>
>> I think that it would be better to make the distinction based on "local/static"
>> vs "dynamic memory" pending stats instead: I did so in v3 attached, using:
>>
>> .flush_dynamic_cb(): flushes pending entries tracked in dynamic memory
>> .flush_static_cb(): flushes pending stats from static/global variable
>
"static" and "dynamic" feel a bit off when used together. For
"static", the stats are from a static state but they can be pushed to
the dshash, as well, which is in dynamic shared memory.

Right, that's another way to find it confusing ;-). But if we keep focus
on where the pending stats are stored, it is not.

> Hmm. Thinking more about that, I think that we should leave the
> existing "flush_pending_cb" alone. For the two others, how about
> "have_static_pending_cb" and "flush_static_cb" rather than "fixed"?
> It seems important to me to show the relationship between these two.

Yeah, that works for me: done in v4 attached.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v4-0001-Rework-per-backend-pending-stats.patch (text/x-diff; charset=us-ascii)
From d57755402e9a9f672d2057c7f7ffa3832d402a3b Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Fri, 17 Jan 2025 06:44:47 +0000
Subject: [PATCH v4] Rework per backend pending stats

9aea73fc61d added per-backend pending statistics, but not all of them are well
suited to pending entries that require memory allocation, which has to happen
outside of critical sections.

For those (currently only I/O statistics, though WAL statistics are under
discussion), let's rely on a new PendingBackendStats variable instead.
---
 src/backend/utils/activity/pgstat.c          |  39 ++---
 src/backend/utils/activity/pgstat_backend.c  | 142 ++++++++++++-------
 src/backend/utils/activity/pgstat_io.c       |  23 +--
 src/backend/utils/activity/pgstat_relation.c |   4 +-
 src/include/pgstat.h                         |   9 ++
 src/include/utils/pgstat_internal.h          |  34 +++--
 6 files changed, 144 insertions(+), 107 deletions(-)
  76.8% src/backend/utils/activity/
  19.1% src/include/utils/
   4.0% src/include/

diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index 34520535d54..d4fc03b5d39 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -372,7 +372,8 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.shared_data_len = sizeof(((PgStatShared_Backend *) 0)->stats),
 		.pending_size = sizeof(PgStat_BackendPending),
 
-		.flush_pending_cb = pgstat_backend_flush_cb,
+		.have_static_pending_cb = pgstat_backend_have_pending_cb,
+		.flush_static_cb = pgstat_backend_flush_cb,
 		.reset_timestamp_cb = pgstat_backend_reset_timestamp_cb,
 	},
 
@@ -437,8 +438,8 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.shared_data_off = offsetof(PgStatShared_IO, stats),
 		.shared_data_len = sizeof(((PgStatShared_IO *) 0)->stats),
 
-		.flush_fixed_cb = pgstat_io_flush_cb,
-		.have_fixed_pending_cb = pgstat_io_have_pending_cb,
+		.flush_static_cb = pgstat_io_flush_cb,
+		.have_static_pending_cb = pgstat_io_have_pending_cb,
 		.init_shmem_cb = pgstat_io_init_shmem_cb,
 		.reset_all_cb = pgstat_io_reset_all_cb,
 		.snapshot_cb = pgstat_io_snapshot_cb,
@@ -455,8 +456,8 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.shared_data_off = offsetof(PgStatShared_SLRU, stats),
 		.shared_data_len = sizeof(((PgStatShared_SLRU *) 0)->stats),
 
-		.flush_fixed_cb = pgstat_slru_flush_cb,
-		.have_fixed_pending_cb = pgstat_slru_have_pending_cb,
+		.flush_static_cb = pgstat_slru_flush_cb,
+		.have_static_pending_cb = pgstat_slru_have_pending_cb,
 		.init_shmem_cb = pgstat_slru_init_shmem_cb,
 		.reset_all_cb = pgstat_slru_reset_all_cb,
 		.snapshot_cb = pgstat_slru_snapshot_cb,
@@ -474,8 +475,8 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.shared_data_len = sizeof(((PgStatShared_Wal *) 0)->stats),
 
 		.init_backend_cb = pgstat_wal_init_backend_cb,
-		.flush_fixed_cb = pgstat_wal_flush_cb,
-		.have_fixed_pending_cb = pgstat_wal_have_pending_cb,
+		.flush_static_cb = pgstat_wal_flush_cb,
+		.have_static_pending_cb = pgstat_wal_have_pending_cb,
 		.init_shmem_cb = pgstat_wal_init_shmem_cb,
 		.reset_all_cb = pgstat_wal_reset_all_cb,
 		.snapshot_cb = pgstat_wal_snapshot_cb,
@@ -713,22 +714,17 @@ pgstat_report_stat(bool force)
 	{
 		bool		do_flush = false;
 
-		/* Check for pending fixed-numbered stats */
+		/* Check for pending stats */
 		for (PgStat_Kind kind = PGSTAT_KIND_MIN; kind <= PGSTAT_KIND_MAX; kind++)
 		{
 			const PgStat_KindInfo *kind_info = pgstat_get_kind_info(kind);
 
 			if (!kind_info)
 				continue;
-			if (!kind_info->fixed_amount)
-			{
-				Assert(kind_info->have_fixed_pending_cb == NULL);
-				continue;
-			}
-			if (!kind_info->have_fixed_pending_cb)
+			if (!kind_info->have_static_pending_cb)
 				continue;
 
-			if (kind_info->have_fixed_pending_cb())
+			if (kind_info->have_static_pending_cb())
 			{
 				do_flush = true;
 				break;
@@ -789,25 +785,20 @@ pgstat_report_stat(bool force)
 
 	partial_flush = false;
 
-	/* flush of variable-numbered stats */
+	/* flush of variable-numbered stats tracked in pending entries list */
 	partial_flush |= pgstat_flush_pending_entries(nowait);
 
-	/* flush of fixed-numbered stats */
+	/* flush stats for each registered kind that has a flush static callback */
 	for (PgStat_Kind kind = PGSTAT_KIND_MIN; kind <= PGSTAT_KIND_MAX; kind++)
 	{
 		const PgStat_KindInfo *kind_info = pgstat_get_kind_info(kind);
 
 		if (!kind_info)
 			continue;
-		if (!kind_info->fixed_amount)
-		{
-			Assert(kind_info->flush_fixed_cb == NULL);
-			continue;
-		}
-		if (!kind_info->flush_fixed_cb)
+		if (!kind_info->flush_static_cb)
 			continue;
 
-		partial_flush |= kind_info->flush_fixed_cb(nowait);
+		partial_flush |= kind_info->flush_static_cb(nowait);
 	}
 
 	last_flush = now;
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
index 79e4d0a3053..bcae6a78169 100644
--- a/src/backend/utils/activity/pgstat_backend.c
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -11,7 +11,9 @@
  * This statistics kind uses a proc number as object ID for the hash table
  * of pgstats.  Entries are created each time a process is spawned, and are
  * dropped when the process exits.  These are not written to the pgstats file
- * on disk.
+ * on disk.  Pending statistics are managed without direct interactions with
+ * the pgstats dshash, relying on PendingBackendStats instead so as it is
+ * possible to report data within critical sections.
  *
  * Copyright (c) 2001-2025, PostgreSQL Global Development Group
  *
@@ -22,8 +24,49 @@
 
 #include "postgres.h"
 
+#include "storage/bufmgr.h"
+#include "utils/memutils.h"
 #include "utils/pgstat_internal.h"
 
+/*
+ * Backend statistics counts waiting to be flushed out. These counters may be
+ * reported within critical sections so we use static memory in order to avoid
+ * memory allocation.
+ */
+static PgStat_BackendPending PendingBackendStats;
+
+/*
+ * Utility routines to report I/O stats for backends, kept here to avoid
+ * exposing PendingBackendStats to the outside world.
+ */
+void
+pgstat_count_backend_io_op_time(IOObject io_object, IOContext io_context,
+								IOOp io_op, instr_time io_time)
+{
+	Assert(track_io_timing);
+
+	if (!pgstat_tracks_backend_bktype(MyBackendType))
+		return;
+
+	Assert(pgstat_tracks_io_op(MyBackendType, io_object, io_context, io_op));
+
+	INSTR_TIME_ADD(PendingBackendStats.pending_io.pending_times[io_object][io_context][io_op],
+				   io_time);
+}
+
+void
+pgstat_count_backend_io_op(IOObject io_object, IOContext io_context,
+						   IOOp io_op, uint32 cnt, uint64 bytes)
+{
+	if (!pgstat_tracks_backend_bktype(MyBackendType))
+		return;
+
+	Assert(pgstat_tracks_io_op(MyBackendType, io_object, io_context, io_op));
+
+	PendingBackendStats.pending_io.counts[io_object][io_context][io_op] += cnt;
+	PendingBackendStats.pending_io.bytes[io_object][io_context][io_op] += bytes;
+}
+
 /*
  * Returns statistics of a backend by proc number.
  */
@@ -46,14 +89,21 @@ static void
 pgstat_flush_backend_entry_io(PgStat_EntryRef *entry_ref)
 {
 	PgStatShared_Backend *shbackendent;
-	PgStat_BackendPending *pendingent;
 	PgStat_BktypeIO *bktype_shstats;
-	PgStat_PendingIO *pending_io;
+	PgStat_PendingIO pending_io;
+
+	/*
+	 * This function can be called even if nothing at all has happened for IO
+	 * statistics.  In this case, avoid unnecessarily modifying the stats
+	 * entry.
+	 */
+	if (pg_memory_is_all_zeros(&PendingBackendStats.pending_io,
+							   sizeof(struct PgStat_PendingIO)))
+		return;
 
 	shbackendent = (PgStatShared_Backend *) entry_ref->shared_stats;
-	pendingent = (PgStat_BackendPending *) entry_ref->pending;
 	bktype_shstats = &shbackendent->stats.io_stats;
-	pending_io = &pendingent->pending_io;
+	pending_io = PendingBackendStats.pending_io;
 
 	for (int io_object = 0; io_object < IOOBJECT_NUM_TYPES; io_object++)
 	{
@@ -64,68 +114,74 @@ pgstat_flush_backend_entry_io(PgStat_EntryRef *entry_ref)
 				instr_time	time;
 
 				bktype_shstats->counts[io_object][io_context][io_op] +=
-					pending_io->counts[io_object][io_context][io_op];
+					pending_io.counts[io_object][io_context][io_op];
 				bktype_shstats->bytes[io_object][io_context][io_op] +=
-					pending_io->bytes[io_object][io_context][io_op];
-
-				time = pending_io->pending_times[io_object][io_context][io_op];
+					pending_io.bytes[io_object][io_context][io_op];
+				time = pending_io.pending_times[io_object][io_context][io_op];
 
 				bktype_shstats->times[io_object][io_context][io_op] +=
 					INSTR_TIME_GET_MICROSEC(time);
 			}
 		}
 	}
+
+	/*
+	 * Clear out the statistics buffer, so it can be re-used.
+	 */
+	MemSet(&PendingBackendStats.pending_io, 0, sizeof(PgStat_PendingIO));
 }
 
 /*
- * Wrapper routine to flush backend statistics.
+ * Flush out locally pending backend statistics
+ *
+ * "flags" parameter controls which statistics to flush.  Returns true
+ * if some statistics could not be flushed.
  */
-static bool
-pgstat_flush_backend_entry(PgStat_EntryRef *entry_ref, bool nowait,
-						   bits32 flags)
+bool
+pgstat_flush_backend(bool nowait, bits32 flags)
 {
-	if (!pgstat_tracks_backend_bktype(MyBackendType))
+	PgStat_EntryRef *entry_ref;
+
+	if (pg_memory_is_all_zeros(&PendingBackendStats,
+							   sizeof(struct PgStat_BackendPending)))
 		return false;
 
-	if (!pgstat_lock_entry(entry_ref, nowait))
+	if (!pgstat_tracks_backend_bktype(MyBackendType))
 		return false;
 
+	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_BACKEND, InvalidOid,
+											MyProcNumber, nowait);
+	if (!entry_ref)
+		return true;
+
 	/* Flush requested statistics */
 	if (flags & PGSTAT_BACKEND_FLUSH_IO)
 		pgstat_flush_backend_entry_io(entry_ref);
 
 	pgstat_unlock_entry(entry_ref);
 
-	return true;
+	return false;
 }
 
 /*
- * Callback to flush out locally pending backend statistics.
- *
- * If no stats have been recorded, this function returns false.
+ * Check if there are any backend stats waiting for flush.
  */
 bool
-pgstat_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
+pgstat_backend_have_pending_cb(void)
 {
-	return pgstat_flush_backend_entry(entry_ref, nowait, PGSTAT_BACKEND_FLUSH_ALL);
+	return (!pg_memory_is_all_zeros(&PendingBackendStats,
+									sizeof(struct PgStat_BackendPending)));
 }
 
 /*
- * Flush out locally pending backend statistics
+ * Callback to flush out locally pending backend statistics.
  *
- * "flags" parameter controls which statistics to flush.
+ * If some stats could not be flushed, return true.
  */
-void
-pgstat_flush_backend(bool nowait, bits32 flags)
+bool
+pgstat_backend_flush_cb(bool nowait)
 {
-	PgStat_EntryRef *entry_ref;
-
-	if (!pgstat_tracks_backend_bktype(MyBackendType))
-		return;
-
-	entry_ref = pgstat_get_entry_ref(PGSTAT_KIND_BACKEND, InvalidOid,
-									 MyProcNumber, false, NULL);
-	(void) pgstat_flush_backend_entry(entry_ref, nowait, flags);
+	return pgstat_flush_backend(nowait, PGSTAT_BACKEND_FLUSH_ALL);
 }
 
 /*
@@ -137,9 +193,8 @@ pgstat_create_backend(ProcNumber procnum)
 	PgStat_EntryRef *entry_ref;
 	PgStatShared_Backend *shstatent;
 
-	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_BACKEND, InvalidOid,
-										  procnum, NULL);
-
+	entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_BACKEND, InvalidOid,
+											MyProcNumber, false);
 	shstatent = (PgStatShared_Backend *) entry_ref->shared_stats;
 
 	/*
@@ -147,20 +202,9 @@ pgstat_create_backend(ProcNumber procnum)
 	 * e.g. if we previously used this proc number.
 	 */
 	memset(&shstatent->stats, 0, sizeof(shstatent->stats));
-}
-
-/*
- * Find or create a local PgStat_BackendPending entry for proc number.
- */
-PgStat_BackendPending *
-pgstat_prep_backend_pending(ProcNumber procnum)
-{
-	PgStat_EntryRef *entry_ref;
-
-	entry_ref = pgstat_prep_pending_entry(PGSTAT_KIND_BACKEND, InvalidOid,
-										  procnum, NULL);
+	pgstat_unlock_entry(entry_ref);
 
-	return entry_ref->pending;
+	MemSet(&PendingBackendStats, 0, sizeof(PgStat_BackendPending));
 }
 
 /*
diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index 027aad8b24e..6ff5d9e96a1 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -73,18 +73,12 @@ pgstat_count_io_op(IOObject io_object, IOContext io_context, IOOp io_op,
 	Assert(pgstat_is_ioop_tracked_in_bytes(io_op) || bytes == 0);
 	Assert(pgstat_tracks_io_op(MyBackendType, io_object, io_context, io_op));
 
-	if (pgstat_tracks_backend_bktype(MyBackendType))
-	{
-		PgStat_BackendPending *entry_ref;
-
-		entry_ref = pgstat_prep_backend_pending(MyProcNumber);
-		entry_ref->pending_io.counts[io_object][io_context][io_op] += cnt;
-		entry_ref->pending_io.bytes[io_object][io_context][io_op] += bytes;
-	}
-
 	PendingIOStats.counts[io_object][io_context][io_op] += cnt;
 	PendingIOStats.bytes[io_object][io_context][io_op] += bytes;
 
+	/* Add the per-backend counts */
+	pgstat_count_backend_io_op(io_object, io_context, io_op, cnt, bytes);
+
 	have_iostats = true;
 }
 
@@ -145,14 +139,9 @@ pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op,
 		INSTR_TIME_ADD(PendingIOStats.pending_times[io_object][io_context][io_op],
 					   io_time);
 
-		if (pgstat_tracks_backend_bktype(MyBackendType))
-		{
-			PgStat_BackendPending *entry_ref;
-
-			entry_ref = pgstat_prep_backend_pending(MyProcNumber);
-			INSTR_TIME_ADD(entry_ref->pending_io.pending_times[io_object][io_context][io_op],
-						   io_time);
-		}
+		/* Add the per-backend count */
+		pgstat_count_backend_io_op_time(io_object, io_context, io_op,
+										io_time);
 	}
 
 	pgstat_count_io_op(io_object, io_context, io_op, cnt, bytes);
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index 09247ba0971..965a7fe2c64 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -264,7 +264,7 @@ pgstat_report_vacuum(Oid tableoid, bool shared,
 	 * VACUUM command has processed all tables and committed.
 	 */
 	pgstat_flush_io(false);
-	pgstat_flush_backend(false, PGSTAT_BACKEND_FLUSH_IO);
+	(void) pgstat_flush_backend(false, PGSTAT_BACKEND_FLUSH_IO);
 }
 
 /*
@@ -351,7 +351,7 @@ pgstat_report_analyze(Relation rel,
 
 	/* see pgstat_report_vacuum() */
 	pgstat_flush_io(false);
-	pgstat_flush_backend(false, PGSTAT_BACKEND_FLUSH_IO);
+	(void) pgstat_flush_backend(false, PGSTAT_BACKEND_FLUSH_IO);
 }
 
 /*
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 2d40fe3e70f..d0d45150977 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -540,6 +540,15 @@ extern PgStat_ArchiverStats *pgstat_fetch_stat_archiver(void);
  * Functions in pgstat_backend.c
  */
 
+/* used by pgstat_io.c for I/O stats tracked in backends */
+extern void pgstat_count_backend_io_op_time(IOObject io_object,
+											IOContext io_context,
+											IOOp io_op,
+											instr_time io_time);
+extern void pgstat_count_backend_io_op(IOObject io_object,
+									   IOContext io_context,
+									   IOOp io_op, uint32 cnt,
+									   uint64 bytes);
 extern PgStat_Backend *pgstat_fetch_stat_backend(ProcNumber procNumber);
 extern bool pgstat_tracks_backend_bktype(BackendType bktype);
 extern void pgstat_create_backend(ProcNumber procnum);
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 4bb8e5c53ab..c0e77488aed 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -156,8 +156,8 @@ typedef struct PgStat_EntryRef
 	 * Pending statistics data that will need to be flushed to shared memory
 	 * stats eventually. Each stats kind utilizing pending data defines what
 	 * format its pending data has and needs to provide a
-	 * PgStat_KindInfo->flush_pending_cb callback to merge pending into shared
-	 * stats.
+	 * PgStat_KindInfo->flush_pending_cb callback to merge pending entries
+	 * that are in dynamic memory into shared stats.
 	 */
 	void	   *pending;
 	dlist_node	pending_node;	/* membership in pgStatPending list */
@@ -259,8 +259,9 @@ typedef struct PgStat_KindInfo
 	void		(*init_backend_cb) (void);
 
 	/*
-	 * For variable-numbered stats: flush pending stats. Required if pending
-	 * data is used.  See flush_fixed_cb for fixed-numbered stats.
+	 * For variable-numbered stats: flush pending stats entries in dynamic
+	 * memory within the dshash. Required if pending data interacts with the
+	 * pgstats dshash.
 	 */
 	bool		(*flush_pending_cb) (PgStat_EntryRef *sr, bool nowait);
 
@@ -289,17 +290,19 @@ typedef struct PgStat_KindInfo
 	void		(*init_shmem_cb) (void *stats);
 
 	/*
-	 * For fixed-numbered statistics: Flush pending stats. Returns true if
-	 * some of the stats could not be flushed, due to lock contention for
-	 * example. Optional.
+	 * For fixed-numbered or variable-numbered statistics: Check for pending
+	 * stats in need of flush, when these do not use the pgstats dshash.
+	 * Returns true if there are any stats pending for flush, triggering
+	 * flush_cb. Optional.
 	 */
-	bool		(*flush_fixed_cb) (bool nowait);
+	bool		(*have_static_pending_cb) (void);
 
 	/*
-	 * For fixed-numbered statistics: Check for pending stats in need of
-	 * flush. Returns true if there are any stats pending for flush. Optional.
+	 * For fixed-numbered or variable-numbered statistics: Flush pending
+	 * static stats. Returns true if some of the stats could not be flushed,
+	 * due to lock contention for example. Optional.
 	 */
-	bool		(*have_fixed_pending_cb) (void);
+	bool		(*flush_static_cb) (bool nowait);
 
 	/*
 	 * For fixed-numbered statistics: Reset All.
@@ -617,10 +620,11 @@ extern void pgstat_archiver_snapshot_cb(void);
 #define PGSTAT_BACKEND_FLUSH_IO		(1 << 0)	/* Flush I/O statistics */
 #define PGSTAT_BACKEND_FLUSH_ALL	(PGSTAT_BACKEND_FLUSH_IO)
 
-extern void pgstat_flush_backend(bool nowait, bits32 flags);
-extern PgStat_BackendPending *pgstat_prep_backend_pending(ProcNumber procnum);
-extern bool pgstat_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait);
-extern void pgstat_backend_reset_timestamp_cb(PgStatShared_Common *header, TimestampTz ts);
+extern bool pgstat_flush_backend(bool nowait, bits32 flags);
+extern bool pgstat_backend_flush_cb(bool nowait);
+extern bool pgstat_backend_have_pending_cb(void);
+extern void pgstat_backend_reset_timestamp_cb(PgStatShared_Common *header,
+											  TimestampTz ts);
 
 /*
  * Functions in pgstat_bgwriter.c
-- 
2.34.1

#94Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#93)
Re: per backend I/O statistics

On Mon, Jan 20, 2025 at 01:26:55PM +0000, Bertrand Drouvot wrote:

> Right, that's another way to find it confusing ;-). But if we keep focus
> on where the pending stats are stored, it is not.

The names of the callbacks can be tuned infinitely. I have split the
renaming into its own patch; the result looked fine after a second
look, so I have applied both pieces separately to unlock all the
WAL patches.
--
Michael