Remove wal_[sync|write][_time] from pg_stat_wal and track_wal_io_timing

Started by Bertrand Drouvot11 months ago15 messages
#1Bertrand Drouvot
bertranddrouvot.pg@gmail.com
3 attachment(s)

Hi hackers,

a051e71e28a added the WAL writes and fsyncs information (and their related timing)
into the pg_stat_io view so that one can now find the same information in 2 places:
pg_stat_wal and pg_stat_io (for the wal object). This lead to a discussion in
[1]: /messages/by-id/Z7NSc5j6M4g2r1HD@ip-10-97-1-34.eu-west-3.compute.internal

=== Let's sump up in this dedicated thread what has been discussed in [1]/messages/by-id/Z7NSc5j6M4g2r1HD@ip-10-97-1-34.eu-west-3.compute.internal

1. The pg_stat_io's writes and fsyncs (and their relative timing) are incremented
at the exact same places as their pg_stat_wal counterparts.

2. pgstat_tracks_io_bktype() returns false for backend's that do not generate WAL
(so that we don't lost WAL information in pg_stat_io).

3. pg_stat_stat.wal_write is the same value as "select sum(writes)
from pg_stat_io where object = 'wal' and context = 'normal'" as these
are incremented in XLogWrite().

4. Same argument about pg_stat_wal.wal_write_time with
pg_stat_io.write_time.

5. issue_xlog_fsync() tells that pg_stat_wal.wal_sync_time and
sum(pg_stat_io.fsync_time) under object=wal and context=normal are the
same values.

6. Same argument with the fsync counters pg_stat_wal.wal_sync and
pg_stat_io.fsyncs.

7. This will encourage monitoring pull to move to pg_stat_io, where there is much
more context and granularity of the stats data.

8. That will simplify the work done for the per backend WAL statistics ([1]/messages/by-id/Z7NSc5j6M4g2r1HD@ip-10-97-1-34.eu-west-3.compute.internal)

=== Remarks

R1. The "bytes" fields differ (as pg_stat_wal.wal_bytes somehow "focus" on the wal
records size while the pg_stat_io's unit is the wal_block_size) so we keep them
in both places.

R2. track_wal_io_timing becomes useless once those fields are gone from pg_stat_wal

R3. PendingWalStats becomes empty, so removes it.

=== Patch structure

PFA 3 sub patches:

- 0001: Add details in the pg_stat_io doc about the wal object. That's basically
more or less a copy/paste from the pg_stat_wal fields description that will be
removed in 0002. As we already track them in pg_stat_io, it's done in a dedicated
patch.

- 0002: Remove wal_[sync|write][_time] from pg_stat_wal, PendingWalStats and
track_wal_io_timing. That does not make sense to split this work in sub-patches
so that all of this in done in 0002.

- 0003: remove the pgstat_prepare_io_time() parameter. Now that track_wal_io_timing
is gone there is no need for pgstat_prepare_io_time() to get a parameter.

[1]: /messages/by-id/Z7NSc5j6M4g2r1HD@ip-10-97-1-34.eu-west-3.compute.internal

Looking forward to your feedback,

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v1-0001-Add-details-in-the-pg_stat_io-doc-about-the-wal-o.patchtext/x-diff; charset=us-asciiDownload
From bc9dec707e2b12a6c6e7d1f654fbe41c8e5bbb65 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Tue, 18 Feb 2025 07:14:05 +0000
Subject: [PATCH v1 1/3] Add details in the pg_stat_io doc about the wal object

Adding details about writes, fsyncs, write_time and sync_time when linked
to the wal object. Same level of details as for wal_write, wal_sync, wal_write_time
and wal_sync_time in pg_stat_wal.
---
 doc/src/sgml/monitoring.sgml | 42 ++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)
 100.0% doc/src/sgml/

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 928a6eb64b0..2c61c0d1e50 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2930,6 +2930,48 @@ description | Waiting for a newly initialized WAL file to reach durable storage
    writer</literal>.
   </para>
 
+  <para>
+   For the <literal>wal</literal> <structfield>object</structfield>:
+   <itemizedlist>
+    <listitem>
+     <para>
+      <structfield>writes</structfield> is the number of times WAL buffers were
+       written out to disk via <function>XLogWrite</function> request.
+       See <xref linkend="wal-configuration"/> for more information about the internal
+       WAL function <function>XLogWrite</function>.
+     </para>
+    </listitem>
+    <listitem>
+     <para>
+      <structfield>fsyncs</structfield> is the number of times WAL files were
+       synced to disk via <function>issue_xlog_fsync</function> request (if <xref linkend="guc-fsync"/>
+       is <literal>on</literal> and <xref linkend="guc-wal-sync-method"/> is either
+       <literal>fdatasync</literal>, <literal>fsync</literal> or <literal>fsync_writethrough</literal>,
+       otherwise zero). See <xref linkend="wal-configuration"/> for more information
+       about the internal WAL function <function>issue_xlog_fsync</function>.
+     </para>
+    </listitem>
+    <listitem>
+     <para>
+      <structfield>write_time</structfield> is the total amount of time spent writing
+       WAL buffers to disk via <function>XLogWrite</function> request.
+       This includes the sync time when <varname>wal_sync_method</varname>
+       is either <literal>open_datasync</literal> or <literal>open_sync</literal>.
+     </para>
+    </listitem>
+    <listitem>
+     <para>
+      <structfield>sync_time</structfield> is the total amount of time spent syncing
+       WAL files to disk via <function>issue_xlog_fsync</function> request (if
+       <varname>fsync</varname> is <literal>on</literal>,
+       and <varname>wal_sync_method</varname> is either <literal>fdatasync</literal>,
+       <literal>fsync</literal> or <literal>fsync_writethrough</literal>,
+       otherwise zero).
+     </para>
+    </listitem>
+   </itemizedlist>
+  </para>
+
   <para>
    <structname>pg_stat_io</structname> can be used to inform database tuning.
    For example:
-- 
2.34.1

v1-0002-Remove-wal_-sync-write-_time-from-pg_stat_wal-and.patchtext/x-diff; charset=us-asciiDownload
From 42b2d46d80f082829c91f9c91f844ab750f2cafd Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Tue, 18 Feb 2025 07:58:24 +0000
Subject: [PATCH v1 2/3] Remove wal_[sync|write][_time] from pg_stat_wal and
 track_wal_io_timing

a051e71e28a added this information into pg_stat_io (with more details and
granularity), so there is no need to keep it in pg_stat_wal. This also
allows to remove PendingWalStats and simplifies up coming commits related
to per backend WAL statistics. Once done, track_wal_io_timing becomes useless
so it is also removed.
---
 doc/src/sgml/config.sgml                      | 22 -------
 doc/src/sgml/monitoring.sgml                  | 62 -------------------
 doc/src/sgml/wal.sgml                         | 12 ++--
 src/backend/access/transam/xlog.c             | 38 ++----------
 src/backend/catalog/system_views.sql          |  4 --
 src/backend/utils/activity/pgstat_wal.c       | 24 +------
 src/backend/utils/adt/pgstatfuncs.c           | 20 +-----
 src/backend/utils/misc/guc_tables.c           |  9 ---
 src/backend/utils/misc/postgresql.conf.sample |  1 -
 src/include/access/xlog.h                     |  1 -
 src/include/catalog/pg_proc.dat               |  6 +-
 src/include/pgstat.h                          | 28 ---------
 src/test/regress/expected/rules.out           |  6 +-
 src/tools/pgindent/typedefs.list              |  1 -
 14 files changed, 19 insertions(+), 215 deletions(-)
  48.6% doc/src/sgml/
  12.7% src/backend/access/transam/
  11.1% src/backend/utils/activity/
  10.6% src/backend/utils/adt/
   3.4% src/backend/
   3.2% src/include/catalog/
   7.9% src/include/

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 336630ce417..05760241174 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8327,28 +8327,6 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       </listitem>
      </varlistentry>
 
-     <varlistentry id="guc-track-wal-io-timing" xreflabel="track_wal_io_timing">
-      <term><varname>track_wal_io_timing</varname> (<type>boolean</type>)
-      <indexterm>
-       <primary><varname>track_wal_io_timing</varname> configuration parameter</primary>
-      </indexterm>
-      </term>
-      <listitem>
-       <para>
-        Enables timing of WAL I/O calls. This parameter is off by default,
-        as it will repeatedly query the operating system for the current time,
-        which may cause significant overhead on some platforms.
-        You can use the <application>pg_test_timing</application> tool to
-        measure the overhead of timing on your system.
-        I/O timing information is
-        displayed in <link linkend="monitoring-pg-stat-wal-view">
-        <structname>pg_stat_wal</structname></link>.
-        Only superusers and users with the appropriate <literal>SET</literal>
-        privilege can change this setting.
-       </para>
-      </listitem>
-     </varlistentry>
-
      <varlistentry id="guc-track-functions" xreflabel="track_functions">
       <term><varname>track_functions</varname> (<type>enum</type>)
       <indexterm>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 2c61c0d1e50..8cbc89aa797 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -183,11 +183,6 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
    of block read, write, extend, and fsync times.
   </para>
 
-  <para>
-   The parameter <xref linkend="guc-track-wal-io-timing"/> enables monitoring
-   of WAL write and fsync times.
-  </para>
-
   <para>
    Normally these parameters are set in <filename>postgresql.conf</filename> so
    that they apply to all server processes, but it is possible to turn
@@ -3297,63 +3292,6 @@ description | Waiting for a newly initialized WAL file to reach durable storage
       </para></entry>
      </row>
 
-     <row>
-      <entry role="catalog_table_entry"><para role="column_definition">
-       <structfield>wal_write</structfield> <type>bigint</type>
-      </para>
-      <para>
-       Number of times WAL buffers were written out to disk via
-       <function>XLogWrite</function> request.
-       See <xref linkend="wal-configuration"/> for more information about
-       the internal WAL function <function>XLogWrite</function>.
-      </para></entry>
-     </row>
-
-     <row>
-      <entry role="catalog_table_entry"><para role="column_definition">
-       <structfield>wal_sync</structfield> <type>bigint</type>
-      </para>
-      <para>
-       Number of times WAL files were synced to disk via
-       <function>issue_xlog_fsync</function> request
-       (if <xref linkend="guc-fsync"/> is <literal>on</literal> and
-       <xref linkend="guc-wal-sync-method"/> is either
-       <literal>fdatasync</literal>, <literal>fsync</literal> or
-       <literal>fsync_writethrough</literal>, otherwise zero).
-       See <xref linkend="wal-configuration"/> for more information about
-       the internal WAL function <function>issue_xlog_fsync</function>.
-      </para></entry>
-     </row>
-
-     <row>
-      <entry role="catalog_table_entry"><para role="column_definition">
-       <structfield>wal_write_time</structfield> <type>double precision</type>
-      </para>
-      <para>
-       Total amount of time spent writing WAL buffers to disk via
-       <function>XLogWrite</function> request, in milliseconds
-       (if <xref linkend="guc-track-wal-io-timing"/> is enabled,
-       otherwise zero).  This includes the sync time when
-       <varname>wal_sync_method</varname> is either
-       <literal>open_datasync</literal> or <literal>open_sync</literal>.
-      </para></entry>
-     </row>
-
-     <row>
-      <entry role="catalog_table_entry"><para role="column_definition">
-       <structfield>wal_sync_time</structfield> <type>double precision</type>
-      </para>
-      <para>
-       Total amount of time spent syncing WAL files to disk via
-       <function>issue_xlog_fsync</function> request, in milliseconds
-       (if <varname>track_wal_io_timing</varname> is enabled,
-       <varname>fsync</varname> is <literal>on</literal>, and
-       <varname>wal_sync_method</varname> is either
-       <literal>fdatasync</literal>, <literal>fsync</literal> or
-       <literal>fsync_writethrough</literal>, otherwise zero).
-      </para></entry>
-     </row>
-
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>stats_reset</structfield> <type>timestamp with time zone</type>
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index b908720adea..8f3ac982b0e 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -810,11 +810,11 @@
   <para>
    There are two internal functions to write WAL data to disk:
    <function>XLogWrite</function> and <function>issue_xlog_fsync</function>.
-   When <xref linkend="guc-track-wal-io-timing"/> is enabled, the total
+   When <xref linkend="guc-track-io-timing"/> is enabled, the total
    amounts of time <function>XLogWrite</function> writes and
    <function>issue_xlog_fsync</function> syncs WAL data to disk are counted as
-   <literal>wal_write_time</literal> and <literal>wal_sync_time</literal> in
-   <xref linkend="pg-stat-wal-view"/>, respectively.
+   <literal>write_time</literal> and <literal>sync_time</literal> in
+   <xref linkend="pg-stat-io-view"/> for the wal <literal>object</literal>, respectively.
    <function>XLogWrite</function> is normally called by
    <function>XLogInsertRecord</function> (when there is no space for the new
    record in WAL buffers), <function>XLogFlush</function> and the WAL writer,
@@ -829,11 +829,11 @@
    <literal>fsync</literal>, or <literal>fsync_writethrough</literal>,
    the write operation moves WAL buffers to kernel cache and
    <function>issue_xlog_fsync</function> syncs them to disk. Regardless
-   of the setting of <varname>track_wal_io_timing</varname>, the number
+   of the setting of <varname>track_io_timing</varname>, the number
    of times <function>XLogWrite</function> writes and
    <function>issue_xlog_fsync</function> syncs WAL data to disk are also
-   counted as <literal>wal_write</literal> and <literal>wal_sync</literal>
-   in <structname>pg_stat_wal</structname>, respectively.
+   counted as <literal>writes</literal> and <literal>fsyncs</literal>
+   in <structname>pg_stat_io</structname> for the wal <literal>object</literal>, respectively.
   </para>
 
   <para>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 25a5c605404..5a1940a42f5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -134,7 +134,6 @@ int			CommitSiblings = 5; /* # concurrent xacts needed to sleep */
 int			wal_retrieve_retry_interval = 5000;
 int			max_slot_wal_keep_size_mb = -1;
 int			wal_decode_buffer_size = 512 * 1024;
-bool		track_wal_io_timing = false;
 
 #ifdef WAL_DEBUG
 bool		XLOG_DEBUG = false;
@@ -2436,10 +2435,9 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible)
 				errno = 0;
 
 				/*
-				 * Measure I/O timing to write WAL data, for pg_stat_io and/or
-				 * pg_stat_wal.
+				 * Measure I/O timing to write WAL data, for pg_stat_io.
 				 */
-				start = pgstat_prepare_io_time(track_io_timing || track_wal_io_timing);
+				start = pgstat_prepare_io_time(track_io_timing);
 
 				pgstat_report_wait_start(WAIT_EVENT_WAL_WRITE);
 				written = pg_pwrite(openLogFile, from, nleft, startoffset);
@@ -2448,20 +2446,6 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible)
 				pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL,
 										IOOP_WRITE, start, 1, written);
 
-				/*
-				 * Increment the I/O timing and the number of times WAL data
-				 * were written out to disk.
-				 */
-				if (track_wal_io_timing)
-				{
-					instr_time	end;
-
-					INSTR_TIME_SET_CURRENT(end);
-					INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_write_time, end, start);
-				}
-
-				PendingWalStats.wal_write++;
-
 				if (written <= 0)
 				{
 					char		xlogfname[MAXFNAMELEN];
@@ -8718,10 +8702,9 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
 		return;
 
 	/*
-	 * Measure I/O timing to sync the WAL file for pg_stat_io and/or
-	 * pg_stat_wal.
+	 * Measure I/O timing to sync the WAL file for pg_stat_io.
 	 */
-	start = pgstat_prepare_io_time(track_io_timing || track_wal_io_timing);
+	start = pgstat_prepare_io_time(track_io_timing);
 
 	pgstat_report_wait_start(WAIT_EVENT_WAL_SYNC);
 	switch (wal_sync_method)
@@ -8767,21 +8750,8 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
 
 	pgstat_report_wait_end();
 
-	/*
-	 * Increment the I/O timing and the number of times WAL files were synced.
-	 */
-	if (track_wal_io_timing)
-	{
-		instr_time	end;
-
-		INSTR_TIME_SET_CURRENT(end);
-		INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_sync_time, end, start);
-	}
-
 	pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL, IOOP_FSYNC,
 							start, 1, 0);
-
-	PendingWalStats.wal_sync++;
 }
 
 /*
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index eff0990957e..a4d2cfdcaf5 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1189,10 +1189,6 @@ CREATE VIEW pg_stat_wal AS
         w.wal_fpi,
         w.wal_bytes,
         w.wal_buffers_full,
-        w.wal_write,
-        w.wal_sync,
-        w.wal_write_time,
-        w.wal_sync_time,
         w.stats_reset
     FROM pg_stat_get_wal() w;
 
diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c
index c1ca65eb991..4dc41a4a590 100644
--- a/src/backend/utils/activity/pgstat_wal.c
+++ b/src/backend/utils/activity/pgstat_wal.c
@@ -21,8 +21,6 @@
 #include "utils/pgstat_internal.h"
 
 
-PgStat_PendingWalStats PendingWalStats = {0};
-
 /*
  * WAL usage counters saved from pgWalUsage at the previous call to
  * pgstat_report_wal(). This is used to calculate how much WAL usage
@@ -118,17 +116,10 @@ pgstat_wal_flush_cb(bool nowait)
 
 #define WALSTAT_ACC(fld, var_to_add) \
 	(stats_shmem->stats.fld += var_to_add.fld)
-#define WALSTAT_ACC_INSTR_TIME(fld) \
-	(stats_shmem->stats.fld += INSTR_TIME_GET_MICROSEC(PendingWalStats.fld))
 	WALSTAT_ACC(wal_records, wal_usage_diff);
 	WALSTAT_ACC(wal_fpi, wal_usage_diff);
 	WALSTAT_ACC(wal_bytes, wal_usage_diff);
 	WALSTAT_ACC(wal_buffers_full, wal_usage_diff);
-	WALSTAT_ACC(wal_write, PendingWalStats);
-	WALSTAT_ACC(wal_sync, PendingWalStats);
-	WALSTAT_ACC_INSTR_TIME(wal_write_time);
-	WALSTAT_ACC_INSTR_TIME(wal_sync_time);
-#undef WALSTAT_ACC_INSTR_TIME
 #undef WALSTAT_ACC
 
 	LWLockRelease(&stats_shmem->lock);
@@ -138,11 +129,6 @@ pgstat_wal_flush_cb(bool nowait)
 	 */
 	prevWalUsage = pgWalUsage;
 
-	/*
-	 * Clear out the statistics buffer, so it can be re-used.
-	 */
-	MemSet(&PendingWalStats, 0, sizeof(PendingWalStats));
-
 	return false;
 }
 
@@ -158,18 +144,12 @@ pgstat_wal_init_backend_cb(void)
 }
 
 /*
- * To determine whether any WAL activity has occurred since last time, not
- * only the number of generated WAL records but also the numbers of WAL
- * writes and syncs need to be checked. Because even transaction that
- * generates no WAL records can write or sync WAL data when flushing the
- * data pages.
+ * To determine whether WAL usage happened.
  */
 bool
 pgstat_wal_have_pending_cb(void)
 {
-	return pgWalUsage.wal_records != prevWalUsage.wal_records ||
-		PendingWalStats.wal_write != 0 ||
-		PendingWalStats.wal_sync != 0;
+	return pgWalUsage.wal_records != prevWalUsage.wal_records;
 }
 
 void
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index e9096a88492..68e16e52ab6 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1637,7 +1637,7 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_wal(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_WAL_COLS	9
+#define PG_STAT_GET_WAL_COLS	5
 	TupleDesc	tupdesc;
 	Datum		values[PG_STAT_GET_WAL_COLS] = {0};
 	bool		nulls[PG_STAT_GET_WAL_COLS] = {0};
@@ -1654,15 +1654,7 @@ pg_stat_get_wal(PG_FUNCTION_ARGS)
 					   NUMERICOID, -1, 0);
 	TupleDescInitEntry(tupdesc, (AttrNumber) 4, "wal_buffers_full",
 					   INT8OID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 5, "wal_write",
-					   INT8OID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 6, "wal_sync",
-					   INT8OID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 7, "wal_write_time",
-					   FLOAT8OID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 8, "wal_sync_time",
-					   FLOAT8OID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 9, "stats_reset",
+	TupleDescInitEntry(tupdesc, (AttrNumber) 5, "stats_reset",
 					   TIMESTAMPTZOID, -1, 0);
 
 	BlessTupleDesc(tupdesc);
@@ -1682,14 +1674,8 @@ pg_stat_get_wal(PG_FUNCTION_ARGS)
 									Int32GetDatum(-1));
 
 	values[3] = Int64GetDatum(wal_stats->wal_buffers_full);
-	values[4] = Int64GetDatum(wal_stats->wal_write);
-	values[5] = Int64GetDatum(wal_stats->wal_sync);
-
-	/* Convert counters from microsec to millisec for display */
-	values[6] = Float8GetDatum(((double) wal_stats->wal_write_time) / 1000.0);
-	values[7] = Float8GetDatum(((double) wal_stats->wal_sync_time) / 1000.0);
 
-	values[8] = TimestampTzGetDatum(wal_stats->stat_reset_timestamp);
+	values[4] = TimestampTzGetDatum(wal_stats->stat_reset_timestamp);
 
 	/* Returns the record as Datum */
 	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index cce73314609..6d4cf7609a7 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -1499,15 +1499,6 @@ struct config_bool ConfigureNamesBool[] =
 		false,
 		NULL, NULL, NULL
 	},
-	{
-		{"track_wal_io_timing", PGC_SUSET, STATS_CUMULATIVE,
-			gettext_noop("Collects timing statistics for WAL I/O activity."),
-			NULL
-		},
-		&track_wal_io_timing,
-		false,
-		NULL, NULL, NULL
-	},
 
 	{
 		{"update_process_title", PGC_SUSET, PROCESS_TITLE,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index d472987ed46..c64b3e8a0e6 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -637,7 +637,6 @@
 #track_counts = on
 #track_cost_delay_timing = off
 #track_io_timing = off
-#track_wal_io_timing = off
 #track_functions = none			# none, pl, all
 #stats_fetch_consistency = cache	# cache, none, snapshot
 
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 4411c1468ac..bd18d95a6cf 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -54,7 +54,6 @@ extern PGDLLIMPORT char *wal_consistency_checking_string;
 extern PGDLLIMPORT bool log_checkpoints;
 extern PGDLLIMPORT int CommitDelay;
 extern PGDLLIMPORT int CommitSiblings;
-extern PGDLLIMPORT bool track_wal_io_timing;
 extern PGDLLIMPORT int wal_decode_buffer_size;
 
 extern PGDLLIMPORT int CheckPointSegments;
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 9e803d610d7..1e1626964e3 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5950,9 +5950,9 @@
 { oid => '1136', descr => 'statistics: information about WAL activity',
   proname => 'pg_stat_get_wal', proisstrict => 'f', provolatile => 's',
   proparallel => 'r', prorettype => 'record', proargtypes => '',
-  proallargtypes => '{int8,int8,numeric,int8,int8,int8,float8,float8,timestamptz}',
-  proargmodes => '{o,o,o,o,o,o,o,o,o}',
-  proargnames => '{wal_records,wal_fpi,wal_bytes,wal_buffers_full,wal_write,wal_sync,wal_write_time,wal_sync_time,stats_reset}',
+  proallargtypes => '{int8,int8,numeric,int8,timestamptz}',
+  proargmodes => '{o,o,o,o,o}',
+  proargnames => '{wal_records,wal_fpi,wal_bytes,wal_buffers_full,stats_reset}',
   prosrc => 'pg_stat_get_wal' },
 { oid => '6248', descr => 'statistics: information about WAL prefetching',
   proname => 'pg_stat_get_recovery_prefetch', prorows => '1', proretset => 't',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 53f2a8458e6..a3a341cc604 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -480,28 +480,9 @@ typedef struct PgStat_WalStats
 	PgStat_Counter wal_fpi;
 	uint64		wal_bytes;
 	PgStat_Counter wal_buffers_full;
-	PgStat_Counter wal_write;
-	PgStat_Counter wal_sync;
-	PgStat_Counter wal_write_time;
-	PgStat_Counter wal_sync_time;
 	TimestampTz stat_reset_timestamp;
 } PgStat_WalStats;
 
-/*
- * This struct stores wal-related durations as instr_time, which makes it
- * cheaper and easier to accumulate them, by not requiring type
- * conversions. During stats flush instr_time will be converted into
- * microseconds.
- */
-typedef struct PgStat_PendingWalStats
-{
-	PgStat_Counter wal_write;
-	PgStat_Counter wal_sync;
-	instr_time	wal_write_time;
-	instr_time	wal_sync_time;
-} PgStat_PendingWalStats;
-
-
 /*
  * Functions in pgstat.c
  */
@@ -834,13 +815,4 @@ extern PGDLLIMPORT PgStat_Counter pgStatTransactionIdleTime;
 /* updated by the traffic cop and in errfinish() */
 extern PGDLLIMPORT SessionEndType pgStatSessionEndCause;
 
-
-/*
- * Variables in pgstat_wal.c
- */
-
-/* updated directly by backends and background processes */
-extern PGDLLIMPORT PgStat_PendingWalStats PendingWalStats;
-
-
 #endif							/* PGSTAT_H */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 5baba8d39ff..62f69ac20b2 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2259,12 +2259,8 @@ pg_stat_wal| SELECT wal_records,
     wal_fpi,
     wal_bytes,
     wal_buffers_full,
-    wal_write,
-    wal_sync,
-    wal_write_time,
-    wal_sync_time,
     stats_reset
-   FROM pg_stat_get_wal() w(wal_records, wal_fpi, wal_bytes, wal_buffers_full, wal_write, wal_sync, wal_write_time, wal_sync_time, stats_reset);
+   FROM pg_stat_get_wal() w(wal_records, wal_fpi, wal_bytes, wal_buffers_full, stats_reset);
 pg_stat_wal_receiver| SELECT pid,
     status,
     receive_start_lsn,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index bce4214503d..740777127e9 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2163,7 +2163,6 @@ PgStat_KindInfo
 PgStat_LocalState
 PgStat_PendingDroppedStatsItem
 PgStat_PendingIO
-PgStat_PendingWalStats
 PgStat_SLRUStats
 PgStat_ShmemControl
 PgStat_Snapshot
-- 
2.34.1

v1-0003-remove-the-pgstat_prepare_io_time-parameter.patchtext/x-diff; charset=us-asciiDownload
From 6f878f1bbfb119b1dbe95b4cb26b30c97b7a047e Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Tue, 18 Feb 2025 08:52:41 +0000
Subject: [PATCH v1 3/3] remove the pgstat_prepare_io_time() parameter

Now that track_wal_io_timing is gone there is no need for pgstat_prepare_io_time()
to get a parameter.
---
 src/backend/access/transam/xlog.c         |  8 ++++----
 src/backend/access/transam/xlogreader.c   |  2 +-
 src/backend/access/transam/xlogrecovery.c |  2 +-
 src/backend/storage/buffer/bufmgr.c       | 10 +++++-----
 src/backend/storage/buffer/localbuf.c     |  4 ++--
 src/backend/storage/smgr/md.c             |  4 ++--
 src/backend/utils/activity/pgstat_io.c    |  7 +++----
 src/include/pgstat.h                      |  2 +-
 8 files changed, 19 insertions(+), 20 deletions(-)
  31.1% src/backend/access/transam/
  37.2% src/backend/storage/buffer/
  10.5% src/backend/storage/smgr/
  14.7% src/backend/utils/activity/
   6.2% src/include/

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 5a1940a42f5..24624c84748 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2437,7 +2437,7 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible)
 				/*
 				 * Measure I/O timing to write WAL data, for pg_stat_io.
 				 */
-				start = pgstat_prepare_io_time(track_io_timing);
+				start = pgstat_prepare_io_time();
 
 				pgstat_report_wait_start(WAIT_EVENT_WAL_WRITE);
 				written = pg_pwrite(openLogFile, from, nleft, startoffset);
@@ -3248,7 +3248,7 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli,
 				 errmsg("could not create file \"%s\": %m", tmppath)));
 
 	/* Measure I/O timing when initializing segment */
-	io_start = pgstat_prepare_io_time(track_io_timing);
+	io_start = pgstat_prepare_io_time();
 
 	pgstat_report_wait_start(WAIT_EVENT_WAL_INIT_WRITE);
 	save_errno = 0;
@@ -3310,7 +3310,7 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli,
 	}
 
 	/* Measure I/O timing when flushing segment */
-	io_start = pgstat_prepare_io_time(track_io_timing);
+	io_start = pgstat_prepare_io_time();
 
 	pgstat_report_wait_start(WAIT_EVENT_WAL_INIT_SYNC);
 	if (pg_fsync(fd) != 0)
@@ -8704,7 +8704,7 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
 	/*
 	 * Measure I/O timing to sync the WAL file for pg_stat_io.
 	 */
-	start = pgstat_prepare_io_time(track_io_timing);
+	start = pgstat_prepare_io_time();
 
 	pgstat_report_wait_start(WAIT_EVENT_WAL_SYNC);
 	switch (wal_sync_method)
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 31bffc6f501..995e4f7c26f 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -1557,7 +1557,7 @@ WALRead(XLogReaderState *state,
 
 #ifndef FRONTEND
 		/* Measure I/O timing when reading segment */
-		io_start = pgstat_prepare_io_time(track_io_timing);
+		io_start = pgstat_prepare_io_time();
 
 		pgstat_report_wait_start(WAIT_EVENT_WAL_READ);
 #endif
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 473de6710d7..ac70b73749c 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -3401,7 +3401,7 @@ retry:
 	readOff = targetPageOff;
 
 	/* Measure I/O timing when reading segment */
-	io_start = pgstat_prepare_io_time(track_io_timing);
+	io_start = pgstat_prepare_io_time();
 
 	pgstat_report_wait_start(WAIT_EVENT_WAL_READ);
 	r = pg_pread(readFile, readBuf, XLOG_BLCKSZ, (off_t) readOff);
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 80b0d0c5ded..e880bed4627 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1509,7 +1509,7 @@ WaitReadBuffers(ReadBuffersOperation *operation)
 			io_pages[io_buffers_len++] = BufferGetBlock(buffers[i]);
 		}
 
-		io_start = pgstat_prepare_io_time(track_io_timing);
+		io_start = pgstat_prepare_io_time();
 		smgrreadv(operation->smgr, forknum, io_first_block, io_pages, io_buffers_len);
 		pgstat_count_io_op_time(io_object, io_context, IOOP_READ, io_start,
 								1, io_buffers_len * BLCKSZ);
@@ -2401,7 +2401,7 @@ ExtendBufferedRelShared(BufferManagerRelation bmr,
 		}
 	}
 
-	io_start = pgstat_prepare_io_time(track_io_timing);
+	io_start = pgstat_prepare_io_time();
 
 	/*
 	 * Note: if smgrzeroextend fails, we will end up with buffers that are
@@ -3858,7 +3858,7 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln, IOObject io_object,
 	 */
 	bufToWrite = PageSetChecksumCopy((Page) bufBlock, buf->tag.blockNum);
 
-	io_start = pgstat_prepare_io_time(track_io_timing);
+	io_start = pgstat_prepare_io_time();
 
 	/*
 	 * bufToWrite is either the shared buffer or a copy, as appropriate.
@@ -4459,7 +4459,7 @@ FlushRelationBuffers(Relation rel)
 
 				PageSetChecksumInplace(localpage, bufHdr->tag.blockNum);
 
-				io_start = pgstat_prepare_io_time(track_io_timing);
+				io_start = pgstat_prepare_io_time();
 
 				smgrwrite(srel,
 						  BufTagGetForkNum(&bufHdr->tag),
@@ -5916,7 +5916,7 @@ IssuePendingWritebacks(WritebackContext *wb_context, IOContext io_context)
 	sort_pending_writebacks(wb_context->pending_writebacks,
 							wb_context->nr_pending);
 
-	io_start = pgstat_prepare_io_time(track_io_timing);
+	io_start = pgstat_prepare_io_time();
 
 	/*
 	 * Coalesce neighbouring writes, but nothing else. For that we iterate
diff --git a/src/backend/storage/buffer/localbuf.c b/src/backend/storage/buffer/localbuf.c
index 64931efaa75..78b65fb160a 100644
--- a/src/backend/storage/buffer/localbuf.c
+++ b/src/backend/storage/buffer/localbuf.c
@@ -244,7 +244,7 @@ GetLocalVictimBuffer(void)
 
 		PageSetChecksumInplace(localpage, bufHdr->tag.blockNum);
 
-		io_start = pgstat_prepare_io_time(track_io_timing);
+		io_start = pgstat_prepare_io_time();
 
 		/* And write... */
 		smgrwrite(oreln,
@@ -414,7 +414,7 @@ ExtendBufferedRelLocal(BufferManagerRelation bmr,
 		}
 	}
 
-	io_start = pgstat_prepare_io_time(track_io_timing);
+	io_start = pgstat_prepare_io_time();
 
 	/* actually extend relation */
 	smgrzeroextend(bmr.smgr, fork, first_block, extend_by, false);
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 7bf0b45e2c3..298754d1b64 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -1381,7 +1381,7 @@ register_dirty_segment(SMgrRelation reln, ForkNumber forknum, MdfdVec *seg)
 		ereport(DEBUG1,
 				(errmsg_internal("could not forward fsync request because request queue is full")));
 
-		io_start = pgstat_prepare_io_time(track_io_timing);
+		io_start = pgstat_prepare_io_time();
 
 		if (FileSync(seg->mdfd_vfd, WAIT_EVENT_DATA_FILE_SYNC) < 0)
 			ereport(data_sync_elevel(ERROR),
@@ -1786,7 +1786,7 @@ mdsyncfiletag(const FileTag *ftag, char *path)
 		need_to_close = true;
 	}
 
-	io_start = pgstat_prepare_io_time(track_io_timing);
+	io_start = pgstat_prepare_io_time();
 
 	/* Sync the file. */
 	result = FileSync(file, WAIT_EVENT_DATA_FILE_SYNC);
diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index 28a431084b8..e7362b52a37 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -83,15 +83,14 @@ pgstat_count_io_op(IOObject io_object, IOContext io_context, IOOp io_op,
 }
 
 /*
- * Initialize the internal timing for an IO operation, depending on an
- * IO timing GUC.
+ * Initialize the internal timing for an IO operation.
  */
 instr_time
-pgstat_prepare_io_time(bool track_io_guc)
+pgstat_prepare_io_time(void)
 {
 	instr_time	io_start;
 
-	if (track_io_guc)
+	if (track_io_timing)
 		INSTR_TIME_SET_CURRENT(io_start);
 	else
 	{
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index a3a341cc604..030b101028f 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -564,7 +564,7 @@ extern bool pgstat_bktype_io_stats_valid(PgStat_BktypeIO *backend_io,
 										 BackendType bktype);
 extern void pgstat_count_io_op(IOObject io_object, IOContext io_context,
 							   IOOp io_op, uint32 cnt, uint64 bytes);
-extern instr_time pgstat_prepare_io_time(bool track_io_guc);
+extern instr_time pgstat_prepare_io_time(void);
 extern void pgstat_count_io_op_time(IOObject io_object, IOContext io_context,
 									IOOp io_op, instr_time start_time,
 									uint32 cnt, uint64 bytes);
-- 
2.34.1

#2Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#1)
Re: Remove wal_[sync|write][_time] from pg_stat_wal and track_wal_io_timing

On Tue, Feb 18, 2025 at 10:43:15AM +0000, Bertrand Drouvot wrote:

=== Remarks

As mentioned on a previous thread, I am siding with the removal of
these fields. There are more benefits on terms of granularity of this
data to encourage users to move away from pg_stat_wal and stick to
pg_stat_io. Reasons are posted here:
/messages/by-id/Z7PHiAyt6GI0_Tlt@paquier.xyz

R1. The "bytes" fields differ (as pg_stat_wal.wal_bytes somehow "focus" on the wal
records size while the pg_stat_io's unit is the wal_block_size) so we keep them
in both places.

Yes, I don't think we should drop that anyway. There is no equivalent
mapping and pg_stat_wal.bytes is not I/O. This is also useful in the
WAL stats for the backends.

R2. track_wal_io_timing becomes useless once those fields are gone
from pg_stat_wal

R3. PendingWalStats becomes empty, so removes it.

Sounds fine to me. The simplifications are in pgstat_wal.c, where we
can now depend only on the diffs in the WalUsage, which we rely on for
EXPLAIN, VACUUM/ANALYZE and pgss. This makes the implementation of
the backend WAL stats much simpler.

=== Patch structure

PFA 3 sub patches:

- 0001: Add details in the pg_stat_io doc about the wal object. That's basically
more or less a copy/paste from the pg_stat_wal fields description that will be
removed in 0002. As we already track them in pg_stat_io, it's done in a dedicated
patch.

These additions feel unbalanced with the existing contents, and
overlap with the follow-up paragraph about the tuning that can be
guessed from the contents of pg_stat_io because the new content refers
twice to the section "WAL Configuration". Perhaps this could be
simpler, with one sentence in the tuning part? I would propose
something like:
For the WAL object, "fsyncs" and "sync_time" track the sync activity
of WAL files via issue_xlog_fsync(). "writes" and "write_time" track
the write activity of WAL files via XLogWrite(). See <xref
linkend="wal-configuration"/> for more information.

My point is that the "WAL configuration" section already provides all
the details that were in pg_stat_wal removed in 0002 and added in
0001, leading to duplicated contents.

- 0002: Remove wal_[sync|write][_time] from pg_stat_wal, PendingWalStats and
track_wal_io_timing. That does not make sense to split this work in sub-patches
so that all of this in done in 0002.
- 0003: remove the pgstat_prepare_io_time() parameter. Now that track_wal_io_timing
is gone there is no need for pgstat_prepare_io_time() to get a parameter.

0002 and 0003 can be grouped in the same commit, IMO.

-   When <xref linkend="guc-track-wal-io-timing"/> is enabled, the total
+   When <xref linkend="guc-track-io-timing"/> is enabled, the total
    amounts of time <function>XLogWrite</function> writes and
    <function>issue_xlog_fsync</function> syncs WAL data to disk are counted as
-   <literal>wal_write_time</literal> and <literal>wal_sync_time</literal> in
-   <xref linkend="pg-stat-wal-view"/>, respectively.
+   <literal>write_time</literal> and <literal>sync_time</literal> in
+   <xref linkend="pg-stat-io-view"/> for the wal <literal>object</literal>, respectively.

Okay to update these are part of 0002 that removes the fields, but
there is also an argument about updating that on its own because this
is actually the case on HEAD? (No need to post an updated patch for
this remark.)

The removal with PgStat_PendingWalStats and the GUC gone looks clean
as presented.
--
Michael

#3Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#2)
2 attachment(s)
Re: Remove wal_[sync|write][_time] from pg_stat_wal and track_wal_io_timing

Hi,

On Wed, Feb 19, 2025 at 01:32:33PM +0900, Michael Paquier wrote:

These additions feel unbalanced with the existing contents, and
overlap with the follow-up paragraph about the tuning that can be
guessed from the contents of pg_stat_io because the new content refers
twice to the section "WAL Configuration". Perhaps this could be
simpler, with one sentence in the tuning part? I would propose
something like:
For the WAL object, "fsyncs" and "sync_time" track the sync activity
of WAL files via issue_xlog_fsync(). "writes" and "write_time" track
the write activity of WAL files via XLogWrite(). See <xref
linkend="wal-configuration"/> for more information.

My point is that the "WAL configuration" section already provides all
the details that were in pg_stat_wal removed in 0002 and added in
0001, leading to duplicated contents.

Makes sense, done that way in the attached.

- 0002: Remove wal_[sync|write][_time] from pg_stat_wal, PendingWalStats and
track_wal_io_timing. That does not make sense to split this work in sub-patches
so that all of this in done in 0002.
- 0003: remove the pgstat_prepare_io_time() parameter. Now that track_wal_io_timing
is gone there is no need for pgstat_prepare_io_time() to get a parameter.

0002 and 0003 can be grouped in the same commit, IMO.

Idea was to "ease" the review but I'm fine to merge them. Done that way in the
attached.

-   When <xref linkend="guc-track-wal-io-timing"/> is enabled, the total
+   When <xref linkend="guc-track-io-timing"/> is enabled, the total
amounts of time <function>XLogWrite</function> writes and
<function>issue_xlog_fsync</function> syncs WAL data to disk are counted as
-   <literal>wal_write_time</literal> and <literal>wal_sync_time</literal> in
-   <xref linkend="pg-stat-wal-view"/>, respectively.
+   <literal>write_time</literal> and <literal>sync_time</literal> in
+   <xref linkend="pg-stat-io-view"/> for the wal <literal>object</literal>, respectively.

Okay to update these are part of 0002 that removes the fields, but
there is also an argument about updating that on its own because this
is actually the case on HEAD? (No need to post an updated patch for
this remark.)

That's right but that would mean almost duplicating the pg_stat_wal related
stuff (because it can't be removed from the doc until the fields are gone). I
think it's simpler to do it as proposed initially (the end result is the same).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v2-0001-Add-details-in-the-pg_stat_io-doc-about-the-wal-o.patchtext/x-diff; charset=us-asciiDownload
From 7b2492de4fbf30ab971e0e1637accee5f7f6e67b Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Tue, 18 Feb 2025 07:14:05 +0000
Subject: [PATCH v2 1/2] Add details in the pg_stat_io doc about the wal object

Adding details about writes, fsyncs, write_time and sync_time when linked
to the wal object.
---
 doc/src/sgml/monitoring.sgml | 9 +++++++++
 1 file changed, 9 insertions(+)
 100.0% doc/src/sgml/

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 928a6eb64b0..151d69cd0b3 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2959,6 +2959,15 @@ description | Waiting for a newly initialized WAL file to reach durable storage
       linkend="wal-configuration"/>.
      </para>
     </listitem>
+    <listitem>
+     <para>
+      For the WAL <structfield>object</structfield>, <varname>fsyncs</varname>
+      and <varname>sync_time</varname> track the sync activity of WAL files via
+      issue_xlog_fsync(). <varname>writes</varname> and <varname>write_time</varname>
+      track the write activity of WAL files via XLogWrite().
+      See <xref linkend="wal-configuration"/> for more information.
+     </para>
+    </listitem>
    </itemizedlist>
   </para>
 
-- 
2.34.1

v2-0002-Remove-wal_-sync-write-_time-from-pg_stat_wal-and.patchtext/x-diff; charset=us-asciiDownload
From f36afdf6630c7eb116284a5932c9b9e0b565ecba Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Tue, 18 Feb 2025 07:58:24 +0000
Subject: [PATCH v2 2/2] Remove wal_[sync|write][_time] from pg_stat_wal and
 track_wal_io_timing

a051e71e28a added this information into pg_stat_io (with more details and
granularity), so there is no need to keep it in pg_stat_wal. This also
allows to remove PendingWalStats and simplifies up coming commits related
to per backend WAL statistics. Once done, track_wal_io_timing becomes useless
so it is also removed.

In passing remove the pgstat_prepare_io_time() parameter now that
track_wal_io_timing is gone.
---
 doc/src/sgml/config.sgml                      | 22 -------
 doc/src/sgml/monitoring.sgml                  | 62 -------------------
 doc/src/sgml/wal.sgml                         | 12 ++--
 src/backend/access/transam/xlog.c             | 42 ++-----------
 src/backend/access/transam/xlogreader.c       |  2 +-
 src/backend/access/transam/xlogrecovery.c     |  2 +-
 src/backend/catalog/system_views.sql          |  4 --
 src/backend/storage/buffer/bufmgr.c           | 10 +--
 src/backend/storage/buffer/localbuf.c         |  4 +-
 src/backend/storage/smgr/md.c                 |  4 +-
 src/backend/utils/activity/pgstat_io.c        |  7 +--
 src/backend/utils/activity/pgstat_wal.c       | 24 +------
 src/backend/utils/adt/pgstatfuncs.c           | 20 +-----
 src/backend/utils/misc/guc_tables.c           |  9 ---
 src/backend/utils/misc/postgresql.conf.sample |  1 -
 src/include/access/xlog.h                     |  1 -
 src/include/catalog/pg_proc.dat               |  6 +-
 src/include/pgstat.h                          | 30 +--------
 src/test/regress/expected/rules.out           |  6 +-
 src/tools/pgindent/typedefs.list              |  1 -
 20 files changed, 36 insertions(+), 233 deletions(-)
  41.6% doc/src/sgml/
  14.0% src/backend/access/transam/
   6.0% src/backend/storage/buffer/
  11.9% src/backend/utils/activity/
   9.0% src/backend/utils/adt/
   4.6% src/backend/
  10.6% src/include/

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 9eedcf6f0f4..6d7b15d9f1e 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8367,28 +8367,6 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       </listitem>
      </varlistentry>
 
-     <varlistentry id="guc-track-wal-io-timing" xreflabel="track_wal_io_timing">
-      <term><varname>track_wal_io_timing</varname> (<type>boolean</type>)
-      <indexterm>
-       <primary><varname>track_wal_io_timing</varname> configuration parameter</primary>
-      </indexterm>
-      </term>
-      <listitem>
-       <para>
-        Enables timing of WAL I/O calls. This parameter is off by default,
-        as it will repeatedly query the operating system for the current time,
-        which may cause significant overhead on some platforms.
-        You can use the <application>pg_test_timing</application> tool to
-        measure the overhead of timing on your system.
-        I/O timing information is
-        displayed in <link linkend="monitoring-pg-stat-wal-view">
-        <structname>pg_stat_wal</structname></link>.
-        Only superusers and users with the appropriate <literal>SET</literal>
-        privilege can change this setting.
-       </para>
-      </listitem>
-     </varlistentry>
-
      <varlistentry id="guc-track-functions" xreflabel="track_functions">
       <term><varname>track_functions</varname> (<type>enum</type>)
       <indexterm>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 151d69cd0b3..0a0f2a56ea4 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -183,11 +183,6 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
    of block read, write, extend, and fsync times.
   </para>
 
-  <para>
-   The parameter <xref linkend="guc-track-wal-io-timing"/> enables monitoring
-   of WAL write and fsync times.
-  </para>
-
   <para>
    Normally these parameters are set in <filename>postgresql.conf</filename> so
    that they apply to all server processes, but it is possible to turn
@@ -3264,63 +3259,6 @@ description | Waiting for a newly initialized WAL file to reach durable storage
       </para></entry>
      </row>
 
-     <row>
-      <entry role="catalog_table_entry"><para role="column_definition">
-       <structfield>wal_write</structfield> <type>bigint</type>
-      </para>
-      <para>
-       Number of times WAL buffers were written out to disk via
-       <function>XLogWrite</function> request.
-       See <xref linkend="wal-configuration"/> for more information about
-       the internal WAL function <function>XLogWrite</function>.
-      </para></entry>
-     </row>
-
-     <row>
-      <entry role="catalog_table_entry"><para role="column_definition">
-       <structfield>wal_sync</structfield> <type>bigint</type>
-      </para>
-      <para>
-       Number of times WAL files were synced to disk via
-       <function>issue_xlog_fsync</function> request
-       (if <xref linkend="guc-fsync"/> is <literal>on</literal> and
-       <xref linkend="guc-wal-sync-method"/> is either
-       <literal>fdatasync</literal>, <literal>fsync</literal> or
-       <literal>fsync_writethrough</literal>, otherwise zero).
-       See <xref linkend="wal-configuration"/> for more information about
-       the internal WAL function <function>issue_xlog_fsync</function>.
-      </para></entry>
-     </row>
-
-     <row>
-      <entry role="catalog_table_entry"><para role="column_definition">
-       <structfield>wal_write_time</structfield> <type>double precision</type>
-      </para>
-      <para>
-       Total amount of time spent writing WAL buffers to disk via
-       <function>XLogWrite</function> request, in milliseconds
-       (if <xref linkend="guc-track-wal-io-timing"/> is enabled,
-       otherwise zero).  This includes the sync time when
-       <varname>wal_sync_method</varname> is either
-       <literal>open_datasync</literal> or <literal>open_sync</literal>.
-      </para></entry>
-     </row>
-
-     <row>
-      <entry role="catalog_table_entry"><para role="column_definition">
-       <structfield>wal_sync_time</structfield> <type>double precision</type>
-      </para>
-      <para>
-       Total amount of time spent syncing WAL files to disk via
-       <function>issue_xlog_fsync</function> request, in milliseconds
-       (if <varname>track_wal_io_timing</varname> is enabled,
-       <varname>fsync</varname> is <literal>on</literal>, and
-       <varname>wal_sync_method</varname> is either
-       <literal>fdatasync</literal>, <literal>fsync</literal> or
-       <literal>fsync_writethrough</literal>, otherwise zero).
-      </para></entry>
-     </row>
-
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>stats_reset</structfield> <type>timestamp with time zone</type>
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index b908720adea..8a5ef49c1b5 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -810,11 +810,11 @@
   <para>
    There are two internal functions to write WAL data to disk:
    <function>XLogWrite</function> and <function>issue_xlog_fsync</function>.
-   When <xref linkend="guc-track-wal-io-timing"/> is enabled, the total
+   When <xref linkend="guc-track-io-timing"/> is enabled, the total
    amounts of time <function>XLogWrite</function> writes and
    <function>issue_xlog_fsync</function> syncs WAL data to disk are counted as
-   <literal>wal_write_time</literal> and <literal>wal_sync_time</literal> in
-   <xref linkend="pg-stat-wal-view"/>, respectively.
+   <literal>write_time</literal> and <literal>sync_time</literal> in
+   <xref linkend="pg-stat-io-view"/> for the WAL <literal>object</literal>, respectively.
    <function>XLogWrite</function> is normally called by
    <function>XLogInsertRecord</function> (when there is no space for the new
    record in WAL buffers), <function>XLogFlush</function> and the WAL writer,
@@ -829,11 +829,11 @@
    <literal>fsync</literal>, or <literal>fsync_writethrough</literal>,
    the write operation moves WAL buffers to kernel cache and
    <function>issue_xlog_fsync</function> syncs them to disk. Regardless
-   of the setting of <varname>track_wal_io_timing</varname>, the number
+   of the setting of <varname>track_io_timing</varname>, the number
    of times <function>XLogWrite</function> writes and
    <function>issue_xlog_fsync</function> syncs WAL data to disk are also
-   counted as <literal>wal_write</literal> and <literal>wal_sync</literal>
-   in <structname>pg_stat_wal</structname>, respectively.
+   counted as <literal>writes</literal> and <literal>fsyncs</literal>
+   in <structname>pg_stat_io</structname> for the WAL <literal>object</literal>, respectively.
   </para>
 
   <para>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f9bf5ba7509..f1516664f5f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -134,7 +134,6 @@ int			CommitSiblings = 5; /* # concurrent xacts needed to sleep */
 int			wal_retrieve_retry_interval = 5000;
 int			max_slot_wal_keep_size_mb = -1;
 int			wal_decode_buffer_size = 512 * 1024;
-bool		track_wal_io_timing = false;
 
 #ifdef WAL_DEBUG
 bool		XLOG_DEBUG = false;
@@ -2436,10 +2435,9 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible)
 				errno = 0;
 
 				/*
-				 * Measure I/O timing to write WAL data, for pg_stat_io and/or
-				 * pg_stat_wal.
+				 * Measure I/O timing to write WAL data, for pg_stat_io.
 				 */
-				start = pgstat_prepare_io_time(track_io_timing || track_wal_io_timing);
+				start = pgstat_prepare_io_time();
 
 				pgstat_report_wait_start(WAIT_EVENT_WAL_WRITE);
 				written = pg_pwrite(openLogFile, from, nleft, startoffset);
@@ -2448,20 +2446,6 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible)
 				pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL,
 										IOOP_WRITE, start, 1, written);
 
-				/*
-				 * Increment the I/O timing and the number of times WAL data
-				 * were written out to disk.
-				 */
-				if (track_wal_io_timing)
-				{
-					instr_time	end;
-
-					INSTR_TIME_SET_CURRENT(end);
-					INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_write_time, end, start);
-				}
-
-				PendingWalStats.wal_write++;
-
 				if (written <= 0)
 				{
 					char		xlogfname[MAXFNAMELEN];
@@ -3264,7 +3248,7 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli,
 				 errmsg("could not create file \"%s\": %m", tmppath)));
 
 	/* Measure I/O timing when initializing segment */
-	io_start = pgstat_prepare_io_time(track_io_timing);
+	io_start = pgstat_prepare_io_time();
 
 	pgstat_report_wait_start(WAIT_EVENT_WAL_INIT_WRITE);
 	save_errno = 0;
@@ -3326,7 +3310,7 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli,
 	}
 
 	/* Measure I/O timing when flushing segment */
-	io_start = pgstat_prepare_io_time(track_io_timing);
+	io_start = pgstat_prepare_io_time();
 
 	pgstat_report_wait_start(WAIT_EVENT_WAL_INIT_SYNC);
 	if (pg_fsync(fd) != 0)
@@ -8718,10 +8702,9 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
 		return;
 
 	/*
-	 * Measure I/O timing to sync the WAL file for pg_stat_io and/or
-	 * pg_stat_wal.
+	 * Measure I/O timing to sync the WAL file for pg_stat_io.
 	 */
-	start = pgstat_prepare_io_time(track_io_timing || track_wal_io_timing);
+	start = pgstat_prepare_io_time();
 
 	pgstat_report_wait_start(WAIT_EVENT_WAL_SYNC);
 	switch (wal_sync_method)
@@ -8767,21 +8750,8 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
 
 	pgstat_report_wait_end();
 
-	/*
-	 * Increment the I/O timing and the number of times WAL files were synced.
-	 */
-	if (track_wal_io_timing)
-	{
-		instr_time	end;
-
-		INSTR_TIME_SET_CURRENT(end);
-		INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_sync_time, end, start);
-	}
-
 	pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL, IOOP_FSYNC,
 							start, 1, 0);
-
-	PendingWalStats.wal_sync++;
 }
 
 /*
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 31bffc6f501..995e4f7c26f 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -1557,7 +1557,7 @@ WALRead(XLogReaderState *state,
 
 #ifndef FRONTEND
 		/* Measure I/O timing when reading segment */
-		io_start = pgstat_prepare_io_time(track_io_timing);
+		io_start = pgstat_prepare_io_time();
 
 		pgstat_report_wait_start(WAIT_EVENT_WAL_READ);
 #endif
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 473de6710d7..ac70b73749c 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -3401,7 +3401,7 @@ retry:
 	readOff = targetPageOff;
 
 	/* Measure I/O timing when reading segment */
-	io_start = pgstat_prepare_io_time(track_io_timing);
+	io_start = pgstat_prepare_io_time();
 
 	pgstat_report_wait_start(WAIT_EVENT_WAL_READ);
 	r = pg_pread(readFile, readBuf, XLOG_BLCKSZ, (off_t) readOff);
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index eff0990957e..a4d2cfdcaf5 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1189,10 +1189,6 @@ CREATE VIEW pg_stat_wal AS
         w.wal_fpi,
         w.wal_bytes,
         w.wal_buffers_full,
-        w.wal_write,
-        w.wal_sync,
-        w.wal_write_time,
-        w.wal_sync_time,
         w.stats_reset
     FROM pg_stat_get_wal() w;
 
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 75cfc2b6fe9..d3216ef6ba7 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1509,7 +1509,7 @@ WaitReadBuffers(ReadBuffersOperation *operation)
 			io_pages[io_buffers_len++] = BufferGetBlock(buffers[i]);
 		}
 
-		io_start = pgstat_prepare_io_time(track_io_timing);
+		io_start = pgstat_prepare_io_time();
 		smgrreadv(operation->smgr, forknum, io_first_block, io_pages, io_buffers_len);
 		pgstat_count_io_op_time(io_object, io_context, IOOP_READ, io_start,
 								1, io_buffers_len * BLCKSZ);
@@ -2401,7 +2401,7 @@ ExtendBufferedRelShared(BufferManagerRelation bmr,
 		}
 	}
 
-	io_start = pgstat_prepare_io_time(track_io_timing);
+	io_start = pgstat_prepare_io_time();
 
 	/*
 	 * Note: if smgrzeroextend fails, we will end up with buffers that are
@@ -3858,7 +3858,7 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln, IOObject io_object,
 	 */
 	bufToWrite = PageSetChecksumCopy((Page) bufBlock, buf->tag.blockNum);
 
-	io_start = pgstat_prepare_io_time(track_io_timing);
+	io_start = pgstat_prepare_io_time();
 
 	/*
 	 * bufToWrite is either the shared buffer or a copy, as appropriate.
@@ -4460,7 +4460,7 @@ FlushRelationBuffers(Relation rel)
 
 				PageSetChecksumInplace(localpage, bufHdr->tag.blockNum);
 
-				io_start = pgstat_prepare_io_time(track_io_timing);
+				io_start = pgstat_prepare_io_time();
 
 				smgrwrite(srel,
 						  BufTagGetForkNum(&bufHdr->tag),
@@ -5917,7 +5917,7 @@ IssuePendingWritebacks(WritebackContext *wb_context, IOContext io_context)
 	sort_pending_writebacks(wb_context->pending_writebacks,
 							wb_context->nr_pending);
 
-	io_start = pgstat_prepare_io_time(track_io_timing);
+	io_start = pgstat_prepare_io_time();
 
 	/*
 	 * Coalesce neighbouring writes, but nothing else. For that we iterate
diff --git a/src/backend/storage/buffer/localbuf.c b/src/backend/storage/buffer/localbuf.c
index 64931efaa75..78b65fb160a 100644
--- a/src/backend/storage/buffer/localbuf.c
+++ b/src/backend/storage/buffer/localbuf.c
@@ -244,7 +244,7 @@ GetLocalVictimBuffer(void)
 
 		PageSetChecksumInplace(localpage, bufHdr->tag.blockNum);
 
-		io_start = pgstat_prepare_io_time(track_io_timing);
+		io_start = pgstat_prepare_io_time();
 
 		/* And write... */
 		smgrwrite(oreln,
@@ -414,7 +414,7 @@ ExtendBufferedRelLocal(BufferManagerRelation bmr,
 		}
 	}
 
-	io_start = pgstat_prepare_io_time(track_io_timing);
+	io_start = pgstat_prepare_io_time();
 
 	/* actually extend relation */
 	smgrzeroextend(bmr.smgr, fork, first_block, extend_by, false);
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 7bf0b45e2c3..298754d1b64 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -1381,7 +1381,7 @@ register_dirty_segment(SMgrRelation reln, ForkNumber forknum, MdfdVec *seg)
 		ereport(DEBUG1,
 				(errmsg_internal("could not forward fsync request because request queue is full")));
 
-		io_start = pgstat_prepare_io_time(track_io_timing);
+		io_start = pgstat_prepare_io_time();
 
 		if (FileSync(seg->mdfd_vfd, WAIT_EVENT_DATA_FILE_SYNC) < 0)
 			ereport(data_sync_elevel(ERROR),
@@ -1786,7 +1786,7 @@ mdsyncfiletag(const FileTag *ftag, char *path)
 		need_to_close = true;
 	}
 
-	io_start = pgstat_prepare_io_time(track_io_timing);
+	io_start = pgstat_prepare_io_time();
 
 	/* Sync the file. */
 	result = FileSync(file, WAIT_EVENT_DATA_FILE_SYNC);
diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index 28a431084b8..e7362b52a37 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -83,15 +83,14 @@ pgstat_count_io_op(IOObject io_object, IOContext io_context, IOOp io_op,
 }
 
 /*
- * Initialize the internal timing for an IO operation, depending on an
- * IO timing GUC.
+ * Initialize the internal timing for an IO operation.
  */
 instr_time
-pgstat_prepare_io_time(bool track_io_guc)
+pgstat_prepare_io_time(void)
 {
 	instr_time	io_start;
 
-	if (track_io_guc)
+	if (track_io_timing)
 		INSTR_TIME_SET_CURRENT(io_start);
 	else
 	{
diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c
index c1ca65eb991..4dc41a4a590 100644
--- a/src/backend/utils/activity/pgstat_wal.c
+++ b/src/backend/utils/activity/pgstat_wal.c
@@ -21,8 +21,6 @@
 #include "utils/pgstat_internal.h"
 
 
-PgStat_PendingWalStats PendingWalStats = {0};
-
 /*
  * WAL usage counters saved from pgWalUsage at the previous call to
  * pgstat_report_wal(). This is used to calculate how much WAL usage
@@ -118,17 +116,10 @@ pgstat_wal_flush_cb(bool nowait)
 
 #define WALSTAT_ACC(fld, var_to_add) \
 	(stats_shmem->stats.fld += var_to_add.fld)
-#define WALSTAT_ACC_INSTR_TIME(fld) \
-	(stats_shmem->stats.fld += INSTR_TIME_GET_MICROSEC(PendingWalStats.fld))
 	WALSTAT_ACC(wal_records, wal_usage_diff);
 	WALSTAT_ACC(wal_fpi, wal_usage_diff);
 	WALSTAT_ACC(wal_bytes, wal_usage_diff);
 	WALSTAT_ACC(wal_buffers_full, wal_usage_diff);
-	WALSTAT_ACC(wal_write, PendingWalStats);
-	WALSTAT_ACC(wal_sync, PendingWalStats);
-	WALSTAT_ACC_INSTR_TIME(wal_write_time);
-	WALSTAT_ACC_INSTR_TIME(wal_sync_time);
-#undef WALSTAT_ACC_INSTR_TIME
 #undef WALSTAT_ACC
 
 	LWLockRelease(&stats_shmem->lock);
@@ -138,11 +129,6 @@ pgstat_wal_flush_cb(bool nowait)
 	 */
 	prevWalUsage = pgWalUsage;
 
-	/*
-	 * Clear out the statistics buffer, so it can be re-used.
-	 */
-	MemSet(&PendingWalStats, 0, sizeof(PendingWalStats));
-
 	return false;
 }
 
@@ -158,18 +144,12 @@ pgstat_wal_init_backend_cb(void)
 }
 
 /*
- * To determine whether any WAL activity has occurred since last time, not
- * only the number of generated WAL records but also the numbers of WAL
- * writes and syncs need to be checked. Because even transaction that
- * generates no WAL records can write or sync WAL data when flushing the
- * data pages.
+ * To determine whether WAL usage happened.
  */
 bool
 pgstat_wal_have_pending_cb(void)
 {
-	return pgWalUsage.wal_records != prevWalUsage.wal_records ||
-		PendingWalStats.wal_write != 0 ||
-		PendingWalStats.wal_sync != 0;
+	return pgWalUsage.wal_records != prevWalUsage.wal_records;
 }
 
 void
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index e9096a88492..68e16e52ab6 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1637,7 +1637,7 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_wal(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_WAL_COLS	9
+#define PG_STAT_GET_WAL_COLS	5
 	TupleDesc	tupdesc;
 	Datum		values[PG_STAT_GET_WAL_COLS] = {0};
 	bool		nulls[PG_STAT_GET_WAL_COLS] = {0};
@@ -1654,15 +1654,7 @@ pg_stat_get_wal(PG_FUNCTION_ARGS)
 					   NUMERICOID, -1, 0);
 	TupleDescInitEntry(tupdesc, (AttrNumber) 4, "wal_buffers_full",
 					   INT8OID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 5, "wal_write",
-					   INT8OID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 6, "wal_sync",
-					   INT8OID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 7, "wal_write_time",
-					   FLOAT8OID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 8, "wal_sync_time",
-					   FLOAT8OID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 9, "stats_reset",
+	TupleDescInitEntry(tupdesc, (AttrNumber) 5, "stats_reset",
 					   TIMESTAMPTZOID, -1, 0);
 
 	BlessTupleDesc(tupdesc);
@@ -1682,14 +1674,8 @@ pg_stat_get_wal(PG_FUNCTION_ARGS)
 									Int32GetDatum(-1));
 
 	values[3] = Int64GetDatum(wal_stats->wal_buffers_full);
-	values[4] = Int64GetDatum(wal_stats->wal_write);
-	values[5] = Int64GetDatum(wal_stats->wal_sync);
-
-	/* Convert counters from microsec to millisec for display */
-	values[6] = Float8GetDatum(((double) wal_stats->wal_write_time) / 1000.0);
-	values[7] = Float8GetDatum(((double) wal_stats->wal_sync_time) / 1000.0);
 
-	values[8] = TimestampTzGetDatum(wal_stats->stat_reset_timestamp);
+	values[4] = TimestampTzGetDatum(wal_stats->stat_reset_timestamp);
 
 	/* Returns the record as Datum */
 	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 3cde94a1759..e323b75379a 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -1499,15 +1499,6 @@ struct config_bool ConfigureNamesBool[] =
 		false,
 		NULL, NULL, NULL
 	},
-	{
-		{"track_wal_io_timing", PGC_SUSET, STATS_CUMULATIVE,
-			gettext_noop("Collects timing statistics for WAL I/O activity."),
-			NULL
-		},
-		&track_wal_io_timing,
-		false,
-		NULL, NULL, NULL
-	},
 
 	{
 		{"update_process_title", PGC_SUSET, PROCESS_TITLE,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 415f253096c..bd03540bcef 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -638,7 +638,6 @@
 #track_counts = on
 #track_cost_delay_timing = off
 #track_io_timing = off
-#track_wal_io_timing = off
 #track_functions = none			# none, pl, all
 #stats_fetch_consistency = cache	# cache, none, snapshot
 
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 4411c1468ac..bd18d95a6cf 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -54,7 +54,6 @@ extern PGDLLIMPORT char *wal_consistency_checking_string;
 extern PGDLLIMPORT bool log_checkpoints;
 extern PGDLLIMPORT int CommitDelay;
 extern PGDLLIMPORT int CommitSiblings;
-extern PGDLLIMPORT bool track_wal_io_timing;
 extern PGDLLIMPORT int wal_decode_buffer_size;
 
 extern PGDLLIMPORT int CheckPointSegments;
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 9e803d610d7..1e1626964e3 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5950,9 +5950,9 @@
 { oid => '1136', descr => 'statistics: information about WAL activity',
   proname => 'pg_stat_get_wal', proisstrict => 'f', provolatile => 's',
   proparallel => 'r', prorettype => 'record', proargtypes => '',
-  proallargtypes => '{int8,int8,numeric,int8,int8,int8,float8,float8,timestamptz}',
-  proargmodes => '{o,o,o,o,o,o,o,o,o}',
-  proargnames => '{wal_records,wal_fpi,wal_bytes,wal_buffers_full,wal_write,wal_sync,wal_write_time,wal_sync_time,stats_reset}',
+  proallargtypes => '{int8,int8,numeric,int8,timestamptz}',
+  proargmodes => '{o,o,o,o,o}',
+  proargnames => '{wal_records,wal_fpi,wal_bytes,wal_buffers_full,stats_reset}',
   prosrc => 'pg_stat_get_wal' },
 { oid => '6248', descr => 'statistics: information about WAL prefetching',
   proname => 'pg_stat_get_recovery_prefetch', prorows => '1', proretset => 't',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 53f2a8458e6..030b101028f 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -480,28 +480,9 @@ typedef struct PgStat_WalStats
 	PgStat_Counter wal_fpi;
 	uint64		wal_bytes;
 	PgStat_Counter wal_buffers_full;
-	PgStat_Counter wal_write;
-	PgStat_Counter wal_sync;
-	PgStat_Counter wal_write_time;
-	PgStat_Counter wal_sync_time;
 	TimestampTz stat_reset_timestamp;
 } PgStat_WalStats;
 
-/*
- * This struct stores wal-related durations as instr_time, which makes it
- * cheaper and easier to accumulate them, by not requiring type
- * conversions. During stats flush instr_time will be converted into
- * microseconds.
- */
-typedef struct PgStat_PendingWalStats
-{
-	PgStat_Counter wal_write;
-	PgStat_Counter wal_sync;
-	instr_time	wal_write_time;
-	instr_time	wal_sync_time;
-} PgStat_PendingWalStats;
-
-
 /*
  * Functions in pgstat.c
  */
@@ -583,7 +564,7 @@ extern bool pgstat_bktype_io_stats_valid(PgStat_BktypeIO *backend_io,
 										 BackendType bktype);
 extern void pgstat_count_io_op(IOObject io_object, IOContext io_context,
 							   IOOp io_op, uint32 cnt, uint64 bytes);
-extern instr_time pgstat_prepare_io_time(bool track_io_guc);
+extern instr_time pgstat_prepare_io_time(void);
 extern void pgstat_count_io_op_time(IOObject io_object, IOContext io_context,
 									IOOp io_op, instr_time start_time,
 									uint32 cnt, uint64 bytes);
@@ -834,13 +815,4 @@ extern PGDLLIMPORT PgStat_Counter pgStatTransactionIdleTime;
 /* updated by the traffic cop and in errfinish() */
 extern PGDLLIMPORT SessionEndType pgStatSessionEndCause;
 
-
-/*
- * Variables in pgstat_wal.c
- */
-
-/* updated directly by backends and background processes */
-extern PGDLLIMPORT PgStat_PendingWalStats PendingWalStats;
-
-
 #endif							/* PGSTAT_H */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 5baba8d39ff..62f69ac20b2 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2259,12 +2259,8 @@ pg_stat_wal| SELECT wal_records,
     wal_fpi,
     wal_bytes,
     wal_buffers_full,
-    wal_write,
-    wal_sync,
-    wal_write_time,
-    wal_sync_time,
     stats_reset
-   FROM pg_stat_get_wal() w(wal_records, wal_fpi, wal_bytes, wal_buffers_full, wal_write, wal_sync, wal_write_time, wal_sync_time, stats_reset);
+   FROM pg_stat_get_wal() w(wal_records, wal_fpi, wal_bytes, wal_buffers_full, stats_reset);
 pg_stat_wal_receiver| SELECT pid,
     status,
     receive_start_lsn,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 64c6bf7a891..c86156d4718 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2164,7 +2164,6 @@ PgStat_KindInfo
 PgStat_LocalState
 PgStat_PendingDroppedStatsItem
 PgStat_PendingIO
-PgStat_PendingWalStats
 PgStat_SLRUStats
 PgStat_ShmemControl
 PgStat_Snapshot
-- 
2.34.1

#4Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#3)
1 attachment(s)
Re: Remove wal_[sync|write][_time] from pg_stat_wal and track_wal_io_timing

On Wed, Feb 19, 2025 at 09:24:41AM +0000, Bertrand Drouvot wrote:

That's right but that would mean almost duplicating the pg_stat_wal related
stuff (because it can't be removed from the doc until the fields are gone). I
think it's simpler to do it as proposed initially (the end result is the same).

After sleeping on it, I still saw no reason to not apply the changes
from 0002 in wal.sgml to describe that the stats for the writes/fsyncs
are in pg_stat_io rather than pg_stat_wal for the "WAL configuration"
section, so done that. The tags were a bit inconsistent, and I've
managed to miss the.. cough.. s/sync_time/fsync_time/.

0001 was mixing a bit the tags <varname>, <literal> and <function>,
and I've moved that as a paragraph not under the tuning part, to make
it a more general statement with its link to "WAL configuration",
which is a very popular link for pg_stat_io.

Attached is the rest, as of v3-0002.
--
Michael

Attachments:

v3-0002-Remove-wal_-sync-write-_time-from-pg_stat_wal-and.patchtext/x-diff; charset=us-asciiDownload
From b54bfecceff1f5eda6b6cfed0fa9874c7ffa0edd Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Thu, 20 Feb 2025 14:30:40 +0900
Subject: [PATCH v3] Remove wal_[sync|write][_time] from pg_stat_wal and
 track_wal_io_timing

a051e71e28a added this information into pg_stat_io (with more details and
granularity), so there is no need to keep it in pg_stat_wal. This also
allows to remove PendingWalStats and simplifies up coming commits related
to per backend WAL statistics. Once done, track_wal_io_timing becomes useless
so it is also removed.

In passing remove the pgstat_prepare_io_time() parameter now that
track_wal_io_timing is gone.
---
 src/include/access/xlog.h                     |  1 -
 src/include/catalog/pg_proc.dat               |  6 +-
 src/include/pgstat.h                          | 30 +--------
 src/backend/access/transam/xlog.c             | 42 ++-----------
 src/backend/access/transam/xlogreader.c       |  2 +-
 src/backend/access/transam/xlogrecovery.c     |  2 +-
 src/backend/catalog/system_views.sql          |  4 --
 src/backend/storage/buffer/bufmgr.c           | 10 +--
 src/backend/storage/buffer/localbuf.c         |  4 +-
 src/backend/storage/smgr/md.c                 |  4 +-
 src/backend/utils/activity/pgstat_io.c        |  7 +--
 src/backend/utils/activity/pgstat_wal.c       | 24 +------
 src/backend/utils/adt/pgstatfuncs.c           | 20 +-----
 src/backend/utils/misc/guc_tables.c           |  9 ---
 src/backend/utils/misc/postgresql.conf.sample |  1 -
 src/test/regress/expected/rules.out           |  6 +-
 doc/src/sgml/config.sgml                      | 22 -------
 doc/src/sgml/monitoring.sgml                  | 62 -------------------
 src/tools/pgindent/typedefs.list              |  1 -
 19 files changed, 30 insertions(+), 227 deletions(-)

diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 4411c1468aca..bd18d95a6cf8 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -54,7 +54,6 @@ extern PGDLLIMPORT char *wal_consistency_checking_string;
 extern PGDLLIMPORT bool log_checkpoints;
 extern PGDLLIMPORT int CommitDelay;
 extern PGDLLIMPORT int CommitSiblings;
-extern PGDLLIMPORT bool track_wal_io_timing;
 extern PGDLLIMPORT int wal_decode_buffer_size;
 
 extern PGDLLIMPORT int CheckPointSegments;
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 9e803d610d7b..1e1626964e3b 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5950,9 +5950,9 @@
 { oid => '1136', descr => 'statistics: information about WAL activity',
   proname => 'pg_stat_get_wal', proisstrict => 'f', provolatile => 's',
   proparallel => 'r', prorettype => 'record', proargtypes => '',
-  proallargtypes => '{int8,int8,numeric,int8,int8,int8,float8,float8,timestamptz}',
-  proargmodes => '{o,o,o,o,o,o,o,o,o}',
-  proargnames => '{wal_records,wal_fpi,wal_bytes,wal_buffers_full,wal_write,wal_sync,wal_write_time,wal_sync_time,stats_reset}',
+  proallargtypes => '{int8,int8,numeric,int8,timestamptz}',
+  proargmodes => '{o,o,o,o,o}',
+  proargnames => '{wal_records,wal_fpi,wal_bytes,wal_buffers_full,stats_reset}',
   prosrc => 'pg_stat_get_wal' },
 { oid => '6248', descr => 'statistics: information about WAL prefetching',
   proname => 'pg_stat_get_recovery_prefetch', prorows => '1', proretset => 't',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 53f2a8458e6f..030b101028f7 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -480,28 +480,9 @@ typedef struct PgStat_WalStats
 	PgStat_Counter wal_fpi;
 	uint64		wal_bytes;
 	PgStat_Counter wal_buffers_full;
-	PgStat_Counter wal_write;
-	PgStat_Counter wal_sync;
-	PgStat_Counter wal_write_time;
-	PgStat_Counter wal_sync_time;
 	TimestampTz stat_reset_timestamp;
 } PgStat_WalStats;
 
-/*
- * This struct stores wal-related durations as instr_time, which makes it
- * cheaper and easier to accumulate them, by not requiring type
- * conversions. During stats flush instr_time will be converted into
- * microseconds.
- */
-typedef struct PgStat_PendingWalStats
-{
-	PgStat_Counter wal_write;
-	PgStat_Counter wal_sync;
-	instr_time	wal_write_time;
-	instr_time	wal_sync_time;
-} PgStat_PendingWalStats;
-
-
 /*
  * Functions in pgstat.c
  */
@@ -583,7 +564,7 @@ extern bool pgstat_bktype_io_stats_valid(PgStat_BktypeIO *backend_io,
 										 BackendType bktype);
 extern void pgstat_count_io_op(IOObject io_object, IOContext io_context,
 							   IOOp io_op, uint32 cnt, uint64 bytes);
-extern instr_time pgstat_prepare_io_time(bool track_io_guc);
+extern instr_time pgstat_prepare_io_time(void);
 extern void pgstat_count_io_op_time(IOObject io_object, IOContext io_context,
 									IOOp io_op, instr_time start_time,
 									uint32 cnt, uint64 bytes);
@@ -834,13 +815,4 @@ extern PGDLLIMPORT PgStat_Counter pgStatTransactionIdleTime;
 /* updated by the traffic cop and in errfinish() */
 extern PGDLLIMPORT SessionEndType pgStatSessionEndCause;
 
-
-/*
- * Variables in pgstat_wal.c
- */
-
-/* updated directly by backends and background processes */
-extern PGDLLIMPORT PgStat_PendingWalStats PendingWalStats;
-
-
 #endif							/* PGSTAT_H */
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f9bf5ba75091..f1516664f5fa 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -134,7 +134,6 @@ int			CommitSiblings = 5; /* # concurrent xacts needed to sleep */
 int			wal_retrieve_retry_interval = 5000;
 int			max_slot_wal_keep_size_mb = -1;
 int			wal_decode_buffer_size = 512 * 1024;
-bool		track_wal_io_timing = false;
 
 #ifdef WAL_DEBUG
 bool		XLOG_DEBUG = false;
@@ -2436,10 +2435,9 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible)
 				errno = 0;
 
 				/*
-				 * Measure I/O timing to write WAL data, for pg_stat_io and/or
-				 * pg_stat_wal.
+				 * Measure I/O timing to write WAL data, for pg_stat_io.
 				 */
-				start = pgstat_prepare_io_time(track_io_timing || track_wal_io_timing);
+				start = pgstat_prepare_io_time();
 
 				pgstat_report_wait_start(WAIT_EVENT_WAL_WRITE);
 				written = pg_pwrite(openLogFile, from, nleft, startoffset);
@@ -2448,20 +2446,6 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible)
 				pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL,
 										IOOP_WRITE, start, 1, written);
 
-				/*
-				 * Increment the I/O timing and the number of times WAL data
-				 * were written out to disk.
-				 */
-				if (track_wal_io_timing)
-				{
-					instr_time	end;
-
-					INSTR_TIME_SET_CURRENT(end);
-					INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_write_time, end, start);
-				}
-
-				PendingWalStats.wal_write++;
-
 				if (written <= 0)
 				{
 					char		xlogfname[MAXFNAMELEN];
@@ -3264,7 +3248,7 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli,
 				 errmsg("could not create file \"%s\": %m", tmppath)));
 
 	/* Measure I/O timing when initializing segment */
-	io_start = pgstat_prepare_io_time(track_io_timing);
+	io_start = pgstat_prepare_io_time();
 
 	pgstat_report_wait_start(WAIT_EVENT_WAL_INIT_WRITE);
 	save_errno = 0;
@@ -3326,7 +3310,7 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli,
 	}
 
 	/* Measure I/O timing when flushing segment */
-	io_start = pgstat_prepare_io_time(track_io_timing);
+	io_start = pgstat_prepare_io_time();
 
 	pgstat_report_wait_start(WAIT_EVENT_WAL_INIT_SYNC);
 	if (pg_fsync(fd) != 0)
@@ -8718,10 +8702,9 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
 		return;
 
 	/*
-	 * Measure I/O timing to sync the WAL file for pg_stat_io and/or
-	 * pg_stat_wal.
+	 * Measure I/O timing to sync the WAL file for pg_stat_io.
 	 */
-	start = pgstat_prepare_io_time(track_io_timing || track_wal_io_timing);
+	start = pgstat_prepare_io_time();
 
 	pgstat_report_wait_start(WAIT_EVENT_WAL_SYNC);
 	switch (wal_sync_method)
@@ -8767,21 +8750,8 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
 
 	pgstat_report_wait_end();
 
-	/*
-	 * Increment the I/O timing and the number of times WAL files were synced.
-	 */
-	if (track_wal_io_timing)
-	{
-		instr_time	end;
-
-		INSTR_TIME_SET_CURRENT(end);
-		INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_sync_time, end, start);
-	}
-
 	pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL, IOOP_FSYNC,
 							start, 1, 0);
-
-	PendingWalStats.wal_sync++;
 }
 
 /*
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 31bffc6f5019..995e4f7c26fd 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -1557,7 +1557,7 @@ WALRead(XLogReaderState *state,
 
 #ifndef FRONTEND
 		/* Measure I/O timing when reading segment */
-		io_start = pgstat_prepare_io_time(track_io_timing);
+		io_start = pgstat_prepare_io_time();
 
 		pgstat_report_wait_start(WAIT_EVENT_WAL_READ);
 #endif
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index f234007d3484..915cb330295c 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -3401,7 +3401,7 @@ retry:
 	readOff = targetPageOff;
 
 	/* Measure I/O timing when reading segment */
-	io_start = pgstat_prepare_io_time(track_io_timing);
+	io_start = pgstat_prepare_io_time();
 
 	pgstat_report_wait_start(WAIT_EVENT_WAL_READ);
 	r = pg_pread(readFile, readBuf, XLOG_BLCKSZ, (off_t) readOff);
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index eff0990957eb..a4d2cfdcaf5d 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1189,10 +1189,6 @@ CREATE VIEW pg_stat_wal AS
         w.wal_fpi,
         w.wal_bytes,
         w.wal_buffers_full,
-        w.wal_write,
-        w.wal_sync,
-        w.wal_write_time,
-        w.wal_sync_time,
         w.stats_reset
     FROM pg_stat_get_wal() w;
 
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 75cfc2b6fe96..d3216ef6ba75 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1509,7 +1509,7 @@ WaitReadBuffers(ReadBuffersOperation *operation)
 			io_pages[io_buffers_len++] = BufferGetBlock(buffers[i]);
 		}
 
-		io_start = pgstat_prepare_io_time(track_io_timing);
+		io_start = pgstat_prepare_io_time();
 		smgrreadv(operation->smgr, forknum, io_first_block, io_pages, io_buffers_len);
 		pgstat_count_io_op_time(io_object, io_context, IOOP_READ, io_start,
 								1, io_buffers_len * BLCKSZ);
@@ -2401,7 +2401,7 @@ ExtendBufferedRelShared(BufferManagerRelation bmr,
 		}
 	}
 
-	io_start = pgstat_prepare_io_time(track_io_timing);
+	io_start = pgstat_prepare_io_time();
 
 	/*
 	 * Note: if smgrzeroextend fails, we will end up with buffers that are
@@ -3858,7 +3858,7 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln, IOObject io_object,
 	 */
 	bufToWrite = PageSetChecksumCopy((Page) bufBlock, buf->tag.blockNum);
 
-	io_start = pgstat_prepare_io_time(track_io_timing);
+	io_start = pgstat_prepare_io_time();
 
 	/*
 	 * bufToWrite is either the shared buffer or a copy, as appropriate.
@@ -4460,7 +4460,7 @@ FlushRelationBuffers(Relation rel)
 
 				PageSetChecksumInplace(localpage, bufHdr->tag.blockNum);
 
-				io_start = pgstat_prepare_io_time(track_io_timing);
+				io_start = pgstat_prepare_io_time();
 
 				smgrwrite(srel,
 						  BufTagGetForkNum(&bufHdr->tag),
@@ -5917,7 +5917,7 @@ IssuePendingWritebacks(WritebackContext *wb_context, IOContext io_context)
 	sort_pending_writebacks(wb_context->pending_writebacks,
 							wb_context->nr_pending);
 
-	io_start = pgstat_prepare_io_time(track_io_timing);
+	io_start = pgstat_prepare_io_time();
 
 	/*
 	 * Coalesce neighbouring writes, but nothing else. For that we iterate
diff --git a/src/backend/storage/buffer/localbuf.c b/src/backend/storage/buffer/localbuf.c
index 64931efaa756..78b65fb160ad 100644
--- a/src/backend/storage/buffer/localbuf.c
+++ b/src/backend/storage/buffer/localbuf.c
@@ -244,7 +244,7 @@ GetLocalVictimBuffer(void)
 
 		PageSetChecksumInplace(localpage, bufHdr->tag.blockNum);
 
-		io_start = pgstat_prepare_io_time(track_io_timing);
+		io_start = pgstat_prepare_io_time();
 
 		/* And write... */
 		smgrwrite(oreln,
@@ -414,7 +414,7 @@ ExtendBufferedRelLocal(BufferManagerRelation bmr,
 		}
 	}
 
-	io_start = pgstat_prepare_io_time(track_io_timing);
+	io_start = pgstat_prepare_io_time();
 
 	/* actually extend relation */
 	smgrzeroextend(bmr.smgr, fork, first_block, extend_by, false);
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 7bf0b45e2c34..298754d1b64a 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -1381,7 +1381,7 @@ register_dirty_segment(SMgrRelation reln, ForkNumber forknum, MdfdVec *seg)
 		ereport(DEBUG1,
 				(errmsg_internal("could not forward fsync request because request queue is full")));
 
-		io_start = pgstat_prepare_io_time(track_io_timing);
+		io_start = pgstat_prepare_io_time();
 
 		if (FileSync(seg->mdfd_vfd, WAIT_EVENT_DATA_FILE_SYNC) < 0)
 			ereport(data_sync_elevel(ERROR),
@@ -1786,7 +1786,7 @@ mdsyncfiletag(const FileTag *ftag, char *path)
 		need_to_close = true;
 	}
 
-	io_start = pgstat_prepare_io_time(track_io_timing);
+	io_start = pgstat_prepare_io_time();
 
 	/* Sync the file. */
 	result = FileSync(file, WAIT_EVENT_DATA_FILE_SYNC);
diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index 28a431084b89..e7362b52a37b 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -83,15 +83,14 @@ pgstat_count_io_op(IOObject io_object, IOContext io_context, IOOp io_op,
 }
 
 /*
- * Initialize the internal timing for an IO operation, depending on an
- * IO timing GUC.
+ * Initialize the internal timing for an IO operation.
  */
 instr_time
-pgstat_prepare_io_time(bool track_io_guc)
+pgstat_prepare_io_time(void)
 {
 	instr_time	io_start;
 
-	if (track_io_guc)
+	if (track_io_timing)
 		INSTR_TIME_SET_CURRENT(io_start);
 	else
 	{
diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c
index c1ca65eb9918..4dc41a4a590e 100644
--- a/src/backend/utils/activity/pgstat_wal.c
+++ b/src/backend/utils/activity/pgstat_wal.c
@@ -21,8 +21,6 @@
 #include "utils/pgstat_internal.h"
 
 
-PgStat_PendingWalStats PendingWalStats = {0};
-
 /*
  * WAL usage counters saved from pgWalUsage at the previous call to
  * pgstat_report_wal(). This is used to calculate how much WAL usage
@@ -118,17 +116,10 @@ pgstat_wal_flush_cb(bool nowait)
 
 #define WALSTAT_ACC(fld, var_to_add) \
 	(stats_shmem->stats.fld += var_to_add.fld)
-#define WALSTAT_ACC_INSTR_TIME(fld) \
-	(stats_shmem->stats.fld += INSTR_TIME_GET_MICROSEC(PendingWalStats.fld))
 	WALSTAT_ACC(wal_records, wal_usage_diff);
 	WALSTAT_ACC(wal_fpi, wal_usage_diff);
 	WALSTAT_ACC(wal_bytes, wal_usage_diff);
 	WALSTAT_ACC(wal_buffers_full, wal_usage_diff);
-	WALSTAT_ACC(wal_write, PendingWalStats);
-	WALSTAT_ACC(wal_sync, PendingWalStats);
-	WALSTAT_ACC_INSTR_TIME(wal_write_time);
-	WALSTAT_ACC_INSTR_TIME(wal_sync_time);
-#undef WALSTAT_ACC_INSTR_TIME
 #undef WALSTAT_ACC
 
 	LWLockRelease(&stats_shmem->lock);
@@ -138,11 +129,6 @@ pgstat_wal_flush_cb(bool nowait)
 	 */
 	prevWalUsage = pgWalUsage;
 
-	/*
-	 * Clear out the statistics buffer, so it can be re-used.
-	 */
-	MemSet(&PendingWalStats, 0, sizeof(PendingWalStats));
-
 	return false;
 }
 
@@ -158,18 +144,12 @@ pgstat_wal_init_backend_cb(void)
 }
 
 /*
- * To determine whether any WAL activity has occurred since last time, not
- * only the number of generated WAL records but also the numbers of WAL
- * writes and syncs need to be checked. Because even transaction that
- * generates no WAL records can write or sync WAL data when flushing the
- * data pages.
+ * To determine whether WAL usage happened.
  */
 bool
 pgstat_wal_have_pending_cb(void)
 {
-	return pgWalUsage.wal_records != prevWalUsage.wal_records ||
-		PendingWalStats.wal_write != 0 ||
-		PendingWalStats.wal_sync != 0;
+	return pgWalUsage.wal_records != prevWalUsage.wal_records;
 }
 
 void
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index e9096a88492c..68e16e52ab6d 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1637,7 +1637,7 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_wal(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_WAL_COLS	9
+#define PG_STAT_GET_WAL_COLS	5
 	TupleDesc	tupdesc;
 	Datum		values[PG_STAT_GET_WAL_COLS] = {0};
 	bool		nulls[PG_STAT_GET_WAL_COLS] = {0};
@@ -1654,15 +1654,7 @@ pg_stat_get_wal(PG_FUNCTION_ARGS)
 					   NUMERICOID, -1, 0);
 	TupleDescInitEntry(tupdesc, (AttrNumber) 4, "wal_buffers_full",
 					   INT8OID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 5, "wal_write",
-					   INT8OID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 6, "wal_sync",
-					   INT8OID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 7, "wal_write_time",
-					   FLOAT8OID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 8, "wal_sync_time",
-					   FLOAT8OID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 9, "stats_reset",
+	TupleDescInitEntry(tupdesc, (AttrNumber) 5, "stats_reset",
 					   TIMESTAMPTZOID, -1, 0);
 
 	BlessTupleDesc(tupdesc);
@@ -1682,14 +1674,8 @@ pg_stat_get_wal(PG_FUNCTION_ARGS)
 									Int32GetDatum(-1));
 
 	values[3] = Int64GetDatum(wal_stats->wal_buffers_full);
-	values[4] = Int64GetDatum(wal_stats->wal_write);
-	values[5] = Int64GetDatum(wal_stats->wal_sync);
 
-	/* Convert counters from microsec to millisec for display */
-	values[6] = Float8GetDatum(((double) wal_stats->wal_write_time) / 1000.0);
-	values[7] = Float8GetDatum(((double) wal_stats->wal_sync_time) / 1000.0);
-
-	values[8] = TimestampTzGetDatum(wal_stats->stat_reset_timestamp);
+	values[4] = TimestampTzGetDatum(wal_stats->stat_reset_timestamp);
 
 	/* Returns the record as Datum */
 	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 3cde94a17595..e323b75379ac 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -1499,15 +1499,6 @@ struct config_bool ConfigureNamesBool[] =
 		false,
 		NULL, NULL, NULL
 	},
-	{
-		{"track_wal_io_timing", PGC_SUSET, STATS_CUMULATIVE,
-			gettext_noop("Collects timing statistics for WAL I/O activity."),
-			NULL
-		},
-		&track_wal_io_timing,
-		false,
-		NULL, NULL, NULL
-	},
 
 	{
 		{"update_process_title", PGC_SUSET, PROCESS_TITLE,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 415f253096c1..bd03540bceff 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -638,7 +638,6 @@
 #track_counts = on
 #track_cost_delay_timing = off
 #track_io_timing = off
-#track_wal_io_timing = off
 #track_functions = none			# none, pl, all
 #stats_fetch_consistency = cache	# cache, none, snapshot
 
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 5baba8d39ff6..62f69ac20b2e 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2259,12 +2259,8 @@ pg_stat_wal| SELECT wal_records,
     wal_fpi,
     wal_bytes,
     wal_buffers_full,
-    wal_write,
-    wal_sync,
-    wal_write_time,
-    wal_sync_time,
     stats_reset
-   FROM pg_stat_get_wal() w(wal_records, wal_fpi, wal_bytes, wal_buffers_full, wal_write, wal_sync, wal_write_time, wal_sync_time, stats_reset);
+   FROM pg_stat_get_wal() w(wal_records, wal_fpi, wal_bytes, wal_buffers_full, stats_reset);
 pg_stat_wal_receiver| SELECT pid,
     status,
     receive_start_lsn,
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 9eedcf6f0f41..6d7b15d9f1e4 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8367,28 +8367,6 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       </listitem>
      </varlistentry>
 
-     <varlistentry id="guc-track-wal-io-timing" xreflabel="track_wal_io_timing">
-      <term><varname>track_wal_io_timing</varname> (<type>boolean</type>)
-      <indexterm>
-       <primary><varname>track_wal_io_timing</varname> configuration parameter</primary>
-      </indexterm>
-      </term>
-      <listitem>
-       <para>
-        Enables timing of WAL I/O calls. This parameter is off by default,
-        as it will repeatedly query the operating system for the current time,
-        which may cause significant overhead on some platforms.
-        You can use the <application>pg_test_timing</application> tool to
-        measure the overhead of timing on your system.
-        I/O timing information is
-        displayed in <link linkend="monitoring-pg-stat-wal-view">
-        <structname>pg_stat_wal</structname></link>.
-        Only superusers and users with the appropriate <literal>SET</literal>
-        privilege can change this setting.
-       </para>
-      </listitem>
-     </varlistentry>
-
      <varlistentry id="guc-track-functions" xreflabel="track_functions">
       <term><varname>track_functions</varname> (<type>enum</type>)
       <indexterm>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 71c4f96d0544..b8b6311f1467 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -183,11 +183,6 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
    of block read, write, extend, and fsync times.
   </para>
 
-  <para>
-   The parameter <xref linkend="guc-track-wal-io-timing"/> enables monitoring
-   of WAL write and fsync times.
-  </para>
-
   <para>
    Normally these parameters are set in <filename>postgresql.conf</filename> so
    that they apply to all server processes, but it is possible to turn
@@ -3265,63 +3260,6 @@ description | Waiting for a newly initialized WAL file to reach durable storage
       </para></entry>
      </row>
 
-     <row>
-      <entry role="catalog_table_entry"><para role="column_definition">
-       <structfield>wal_write</structfield> <type>bigint</type>
-      </para>
-      <para>
-       Number of times WAL buffers were written out to disk via
-       <function>XLogWrite</function> request.
-       See <xref linkend="wal-configuration"/> for more information about
-       the internal WAL function <function>XLogWrite</function>.
-      </para></entry>
-     </row>
-
-     <row>
-      <entry role="catalog_table_entry"><para role="column_definition">
-       <structfield>wal_sync</structfield> <type>bigint</type>
-      </para>
-      <para>
-       Number of times WAL files were synced to disk via
-       <function>issue_xlog_fsync</function> request
-       (if <xref linkend="guc-fsync"/> is <literal>on</literal> and
-       <xref linkend="guc-wal-sync-method"/> is either
-       <literal>fdatasync</literal>, <literal>fsync</literal> or
-       <literal>fsync_writethrough</literal>, otherwise zero).
-       See <xref linkend="wal-configuration"/> for more information about
-       the internal WAL function <function>issue_xlog_fsync</function>.
-      </para></entry>
-     </row>
-
-     <row>
-      <entry role="catalog_table_entry"><para role="column_definition">
-       <structfield>wal_write_time</structfield> <type>double precision</type>
-      </para>
-      <para>
-       Total amount of time spent writing WAL buffers to disk via
-       <function>XLogWrite</function> request, in milliseconds
-       (if <xref linkend="guc-track-wal-io-timing"/> is enabled,
-       otherwise zero).  This includes the sync time when
-       <varname>wal_sync_method</varname> is either
-       <literal>open_datasync</literal> or <literal>open_sync</literal>.
-      </para></entry>
-     </row>
-
-     <row>
-      <entry role="catalog_table_entry"><para role="column_definition">
-       <structfield>wal_sync_time</structfield> <type>double precision</type>
-      </para>
-      <para>
-       Total amount of time spent syncing WAL files to disk via
-       <function>issue_xlog_fsync</function> request, in milliseconds
-       (if <varname>track_wal_io_timing</varname> is enabled,
-       <varname>fsync</varname> is <literal>on</literal>, and
-       <varname>wal_sync_method</varname> is either
-       <literal>fdatasync</literal>, <literal>fsync</literal> or
-       <literal>fsync_writethrough</literal>, otherwise zero).
-      </para></entry>
-     </row>
-
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>stats_reset</structfield> <type>timestamp with time zone</type>
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index fb39c915d76a..f87f304d6fc2 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2165,7 +2165,6 @@ PgStat_KindInfo
 PgStat_LocalState
 PgStat_PendingDroppedStatsItem
 PgStat_PendingIO
-PgStat_PendingWalStats
 PgStat_SLRUStats
 PgStat_ShmemControl
 PgStat_Snapshot
-- 
2.47.2

#5Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#4)
Re: Remove wal_[sync|write][_time] from pg_stat_wal and track_wal_io_timing

Hi,

On Thu, Feb 20, 2025 at 02:37:18PM +0900, Michael Paquier wrote:

On Wed, Feb 19, 2025 at 09:24:41AM +0000, Bertrand Drouvot wrote:

That's right but that would mean almost duplicating the pg_stat_wal related
stuff (because it can't be removed from the doc until the fields are gone). I
think it's simpler to do it as proposed initially (the end result is the same).

After sleeping on it, I still saw no reason to not apply the changes
from 0002 in wal.sgml to describe that the stats for the writes/fsyncs
are in pg_stat_io rather than pg_stat_wal for the "WAL configuration"
section, so done that.

I see, I did not get that you wanted to get rid of the pg_stat_wal part before
removing the fields. Makes sense that way to "just" replace the pg_stat_wal
by pg_stat_io first in the doc as you've done in 4538bd3f1dd.

and I've moved that as a paragraph not under the tuning part, to make
it a more general statement with its link to "WAL configuration",
which is a very popular link for pg_stat_io.

Makes more sense, agree.

Attached is the rest, as of v3-0002.

LGTM.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#6Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#5)
Re: Remove wal_[sync|write][_time] from pg_stat_wal and track_wal_io_timing

On Thu, Feb 20, 2025 at 08:11:12AM +0000, Bertrand Drouvot wrote:

LGTM.

Applied this one. Now, to the part about the backend stats and WAL,
which should be the last piece..
--
Michael

#7Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#6)
Re: Remove wal_[sync|write][_time] from pg_stat_wal and track_wal_io_timing

Hi,

On Mon, Feb 24, 2025 at 09:57:49AM +0900, Michael Paquier wrote:

On Thu, Feb 20, 2025 at 08:11:12AM +0000, Bertrand Drouvot wrote:

LGTM.

Applied this one.

Thanks!

Now, to the part about the backend stats and WAL,
which should be the last piece..

Yup, going back to [1]/messages/by-id/Z7UJoX8jr7aBbt2Q@paquier.xyz now.

[1]: /messages/by-id/Z7UJoX8jr7aBbt2Q@paquier.xyz

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#8Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#4)
Re: Remove wal_[sync|write][_time] from pg_stat_wal and track_wal_io_timing

Hi,

On 2025-02-20 14:37:18 +0900, Michael Paquier wrote:

a051e71e28a added this information into pg_stat_io (with more details and
granularity), so there is no need to keep it in pg_stat_wal. This also
allows to remove PendingWalStats and simplifies up coming commits related
to per backend WAL statistics. Once done, track_wal_io_timing becomes useless
so it is also removed.

In passing remove the pgstat_prepare_io_time() parameter now that
track_wal_io_timing is gone.

I don't think this is a good idea - there was a good reason for
track_wal_io_timing to exist, namely that it happens while holding one of the
two most contended locks in postgres. On many systems it'll be an ok constant
overhead to enable track_io_timing, but enabling track_wal_io_timing will
cause scalability issues. Now you made it impossible to separate those two
situations, forcing disabling of all IO timing.

Greetings,

Andres Freund

#9Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Andres Freund (#8)
Re: Remove wal_[sync|write][_time] from pg_stat_wal and track_wal_io_timing

Hi,

On Mon, Feb 24, 2025 at 04:41:36AM -0500, Andres Freund wrote:

Hi,

On 2025-02-20 14:37:18 +0900, Michael Paquier wrote:

a051e71e28a added this information into pg_stat_io (with more details and
granularity), so there is no need to keep it in pg_stat_wal. This also
allows to remove PendingWalStats and simplifies up coming commits related
to per backend WAL statistics. Once done, track_wal_io_timing becomes useless
so it is also removed.

In passing remove the pgstat_prepare_io_time() parameter now that
track_wal_io_timing is gone.

I don't think this is a good idea - there was a good reason for
track_wal_io_timing to exist, namely that it happens while holding one of the
two most contended locks in postgres. On many systems it'll be an ok constant
overhead to enable track_io_timing, but enabling track_wal_io_timing will
cause scalability issues. Now you made it impossible to separate those two
situations, forcing disabling of all IO timing.

a051e71e28a added the "timing tracking" in the WAL code path with "only"
track_io_timing enabled (and track_wal_io_timing still false). Here, in this thread,
we remove unnecessary INSTR_TIME_SET_CURRENT()/INSTR_TIME_ACCUM_DIFF() calls when
both track_io_timing and track_wal_io_timing were enabled.

So I think that your main concern comes from the fact that a051e71e28a added the
"timing tracking" in the WAL code path with "only" track_io_timing enabled.

That's a concern we also had and discussed in [1]/messages/by-id/CAN55FZ1qOt_gjhQgJdQZjO1KebBLuZcHz4DYmjfUvF3yGBSahw@mail.gmail.com. The question was: should
we track the WAL timing stats in pg_stat_io only when track_wal_io_timing is
enabled or relying only on track_io_timing would be enough?

We ran several benchmarks ([2]/messages/by-id/Z6CcglxJF8XW+R7W@ip-10-97-1-34.eu-west-3.compute.internal, [3]/messages/by-id/CAN55FZ0thZHTBbXwAsNhfrRdgmDwxWbLPiZyh+TG9CrC1V8TeA@mail.gmail.com and [4]/messages/by-id/Z6HH150-Aw6ilQYE@paquier.xyz) and based on the results we reached
the conclusion that to rely only on track_io_timing to display the WAL timing
stats in pg_stat_io was "good" enough.

If you feel strongly that we should keep track_wal_io_timing then we would need
to change a bit the logic in a051e71e28a and then keep it in the current thread (
but still removing the "now useless" pg_stat_wal fields).

[1]: /messages/by-id/CAN55FZ1qOt_gjhQgJdQZjO1KebBLuZcHz4DYmjfUvF3yGBSahw@mail.gmail.com
[2]: /messages/by-id/Z6CcglxJF8XW+R7W@ip-10-97-1-34.eu-west-3.compute.internal
[3]: /messages/by-id/CAN55FZ0thZHTBbXwAsNhfrRdgmDwxWbLPiZyh+TG9CrC1V8TeA@mail.gmail.com
[4]: /messages/by-id/Z6HH150-Aw6ilQYE@paquier.xyz

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#10Andres Freund
andres@anarazel.de
In reply to: Bertrand Drouvot (#9)
Re: Remove wal_[sync|write][_time] from pg_stat_wal and track_wal_io_timing

Hi,

On 2025-02-24 10:54:54 +0000, Bertrand Drouvot wrote:

a051e71e28a added the "timing tracking" in the WAL code path with "only"
track_io_timing enabled (and track_wal_io_timing still false). Here, in this thread,
we remove unnecessary INSTR_TIME_SET_CURRENT()/INSTR_TIME_ACCUM_DIFF() calls when
both track_io_timing and track_wal_io_timing were enabled.

So I think that your main concern comes from the fact that a051e71e28a added the
"timing tracking" in the WAL code path with "only" track_io_timing enabled.

Correct.

That's a concern we also had and discussed in [1]. The question was: should
we track the WAL timing stats in pg_stat_io only when track_wal_io_timing is
enabled or relying only on track_io_timing would be enough?

We ran several benchmarks ([2], [3] and [4]) and based on the results we reached
the conclusion that to rely only on track_io_timing to display the WAL timing
stats in pg_stat_io was "good" enough.

Sorry to be blunt, but those benchmarks miss the mark. The benchmark

- emits WAL in a transactional fashion, with fairly small transactions, where
the disk is fairly slow. It's absolutely guaranteed that the bottleneck is
just the WAL flush disk time. You're doing ~1k-2k TPS - the timing overhead
would have to be *ginormous* to show up.

- emits WAL at a low concurrency thus lock contention not really an issue.

- likely is executed so on a system with very cheap timing functions - but
often production workloads unfortunately don't. While it's not quite as
common as it used to be, virtualized systems often have considerably slower
clocks.

On this system it'll likely be fine overhead-wise:

$ echo tsc | sudo tee /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc

$ src/bin/pg_test_timing/pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 19.17 ns
Histogram of timing durations:
< us % of total count
1 98.08521 153484414
2 1.91421 2995376
4 0.00040 619
8 0.00014 225
16 0.00002 35
32 0.00001 9
64 0.00001 10
128 0.00000 3
256 0.00000 2
512 0.00000 1

On this system however, I'd not want to bet on it:

$ echo hpet | sudo tee /sys/devices/system/clocksource/clocksource0/current_clocksource
$ src/bin/pg_test_timing/pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 1228.91 ns
Histogram of timing durations:
< us % of total count
1 0.15574 3802
2 78.51065 1916598
4 21.26778 519188
8 0.02069 505
16 0.03957 966
32 0.00356 87
64 0.00193 47
128 0.00008 2

Greetings,

Andres Freund

#11Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Andres Freund (#10)
1 attachment(s)
Re: Remove wal_[sync|write][_time] from pg_stat_wal and track_wal_io_timing

Hi,

On Mon, Feb 24, 2025 at 06:15:53AM -0500, Andres Freund wrote:

Hi,

On 2025-02-24 10:54:54 +0000, Bertrand Drouvot wrote:

a051e71e28a added the "timing tracking" in the WAL code path with "only"
track_io_timing enabled (and track_wal_io_timing still false). Here, in this thread,
we remove unnecessary INSTR_TIME_SET_CURRENT()/INSTR_TIME_ACCUM_DIFF() calls when
both track_io_timing and track_wal_io_timing were enabled.

So I think that your main concern comes from the fact that a051e71e28a added the
"timing tracking" in the WAL code path with "only" track_io_timing enabled.

Correct.

That's a concern we also had and discussed in [1]. The question was: should
we track the WAL timing stats in pg_stat_io only when track_wal_io_timing is
enabled or relying only on track_io_timing would be enough?

We ran several benchmarks ([2], [3] and [4]) and based on the results we reached
the conclusion that to rely only on track_io_timing to display the WAL timing
stats in pg_stat_io was "good" enough.

Sorry to be blunt, but those benchmarks miss the mark. The benchmark

- emits WAL in a transactional fashion, with fairly small transactions, where
the disk is fairly slow. It's absolutely guaranteed that the bottleneck is
just the WAL flush disk time. You're doing ~1k-2k TPS - the timing overhead
would have to be *ginormous* to show up.

- emits WAL at a low concurrency thus lock contention not really an issue.

- likely is executed so on a system with very cheap timing functions - but
often production workloads unfortunately don't. While it's not quite as
common as it used to be, virtualized systems often have considerably slower
clocks.

On this system it'll likely be fine overhead-wise:

$ echo tsc | sudo tee /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc

$ src/bin/pg_test_timing/pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 19.17 ns
Histogram of timing durations:
< us % of total count
1 98.08521 153484414
2 1.91421 2995376
4 0.00040 619
8 0.00014 225
16 0.00002 35
32 0.00001 9
64 0.00001 10
128 0.00000 3
256 0.00000 2
512 0.00000 1

On this system however, I'd not want to bet on it:

$ echo hpet | sudo tee /sys/devices/system/clocksource/clocksource0/current_clocksource
$ src/bin/pg_test_timing/pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 1228.91 ns
Histogram of timing durations:
< us % of total count
1 0.15574 3802
2 78.51065 1916598
4 21.26778 519188
8 0.02069 505
16 0.03957 966
32 0.00356 87
64 0.00193 47
128 0.00008 2

I agree that we've to put everything in the picture (system with or without
cheap timing functions, lock contention and WAL flush disk time) and that we
can certainly find configuration/workload that would get benefits from a
dedicated track_wal_io_timing GUC.

PFA a patch to re-introduce the track_wal_io_timing GUC and to ensure that the
WAL write and fsync timing activities are tracked in pg_stat_io (and
pg_stat_get_backend_io()) only if both track_io_timing and track_wal_io_timing
are enabled.

Note that to change the a051e71e28a behavior, the attached patch adds an extra
"track_io_guc" parameter to pgstat_count_io_op_time() (like pgstat_prepare_io_time
already had in a051e71e28a).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v1-0001-re-introduce-track_wal_io_timing.patchtext/x-diff; charset=us-asciiDownload
From 4f1190bbb4e37319be37f69fe1682ec5ef7eb971 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 24 Feb 2025 11:29:35 +0000
Subject: [PATCH v1] re-introduce track_wal_io_timing

a051e71e28a added the "timing tracking" in the WAL code path with "only"
track_io_timing enabled (and track_wal_io_timing still false). Then it made
sense for 2421e9a51d2 to get rid of the track_wal_io_timing GUC.

Concerns have been raised about the fact that a051e71e28a added the "timing tracking"
in the WAL code path with "only" track_io_timing enabled. So this commit re-introduces
the track_wal_io_timing GUC and ensure that WAL timing is tracked in pg_stat_io
only if track_io_timing and track_wal_io_timing are both enabled.
---
 doc/src/sgml/config.sgml                      | 25 +++++++++++++++++++
 doc/src/sgml/monitoring.sgml                  |  5 ++++
 src/backend/access/transam/xlog.c             | 21 ++++++++++------
 src/backend/access/transam/xlogreader.c       |  4 +--
 src/backend/access/transam/xlogrecovery.c     |  6 ++---
 src/backend/storage/buffer/bufmgr.c           | 21 ++++++++--------
 src/backend/storage/buffer/localbuf.c         |  8 +++---
 src/backend/storage/smgr/md.c                 |  8 +++---
 src/backend/utils/activity/pgstat_io.c        | 12 +++++----
 src/backend/utils/misc/guc_tables.c           |  9 +++++++
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 src/include/access/xlog.h                     |  1 +
 src/include/pgstat.h                          |  4 +--
 13 files changed, 87 insertions(+), 38 deletions(-)
  25.8% doc/src/sgml/
  24.2% src/backend/access/transam/
  25.8% src/backend/storage/buffer/
   6.9% src/backend/storage/smgr/
   8.2% src/backend/utils/activity/
   4.2% src/backend/utils/misc/
   4.6% src/include/

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index a8354576108..7e7eae01d33 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8393,6 +8393,31 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-track-wal-io-timing" xreflabel="track_wal_io_timing">
+      <term><varname>track_wal_io_timing</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>track_wal_io_timing</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables timing of WAL I/O write and fsync calls. This parameter is off by
+        default, as it will repeatedly query the operating system for the current time,
+        which may cause significant overhead on some platforms.
+        You can use the <application>pg_test_timing</application> tool to
+        measure the overhead of timing on your system.
+        I/O timing information is
+        displayed in <link linkend="monitoring-pg-stat-io-view">
+        <structname>pg_stat_io</structname></link> and in the output of the
+        <link linkend="pg-stat-get-backend-io">
+        <function>pg_stat_get_backend_io()</function></link> function only if
+        <xref linkend="guc-track-io-timing"/> is also enabled.
+        Only superusers and users with the appropriate <literal>SET</literal>
+        privilege can change this setting.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-track-functions" xreflabel="track_functions">
       <term><varname>track_functions</varname> (<type>enum</type>)
       <indexterm>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3dfd059b7ee..6ff8a4ec929 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -183,6 +183,11 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
    of block read, write, extend, and fsync times.
   </para>
 
+  <para>
+   The parameter <xref linkend="guc-track-wal-io-timing"/> enables monitoring
+   of WAL write and fsync times.
+  </para>
+
   <para>
    Normally these parameters are set in <filename>postgresql.conf</filename> so
    that they apply to all server processes, but it is possible to turn
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 919314f8258..9054aed2242 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -134,6 +134,7 @@ int			CommitSiblings = 5; /* # concurrent xacts needed to sleep */
 int			wal_retrieve_retry_interval = 5000;
 int			max_slot_wal_keep_size_mb = -1;
 int			wal_decode_buffer_size = 512 * 1024;
+bool		track_wal_io_timing = false;
 
 #ifdef WAL_DEBUG
 bool		XLOG_DEBUG = false;
@@ -2432,19 +2433,22 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible)
 			nleft = nbytes;
 			do
 			{
+				bool		track_timing = track_io_timing && track_wal_io_timing;
+
 				errno = 0;
 
 				/*
 				 * Measure I/O timing to write WAL data, for pg_stat_io.
 				 */
-				start = pgstat_prepare_io_time();
+				start = pgstat_prepare_io_time(track_timing);
 
 				pgstat_report_wait_start(WAIT_EVENT_WAL_WRITE);
 				written = pg_pwrite(openLogFile, from, nleft, startoffset);
 				pgstat_report_wait_end();
 
 				pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL,
-										IOOP_WRITE, start, 1, written);
+										IOOP_WRITE, start, 1, written,
+										track_timing);
 
 				if (written <= 0)
 				{
@@ -3248,7 +3252,7 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli,
 				 errmsg("could not create file \"%s\": %m", tmppath)));
 
 	/* Measure I/O timing when initializing segment */
-	io_start = pgstat_prepare_io_time();
+	io_start = pgstat_prepare_io_time(track_io_timing);
 
 	pgstat_report_wait_start(WAIT_EVENT_WAL_INIT_WRITE);
 	save_errno = 0;
@@ -3291,7 +3295,7 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli,
 	 */
 	pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_INIT, IOOP_WRITE,
 							io_start, 1,
-							wal_init_zero ? wal_segment_size : 1);
+							wal_init_zero ? wal_segment_size : 1, track_io_timing);
 
 	if (save_errno)
 	{
@@ -3310,7 +3314,7 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli,
 	}
 
 	/* Measure I/O timing when flushing segment */
-	io_start = pgstat_prepare_io_time();
+	io_start = pgstat_prepare_io_time(track_io_timing);
 
 	pgstat_report_wait_start(WAIT_EVENT_WAL_INIT_SYNC);
 	if (pg_fsync(fd) != 0)
@@ -3325,7 +3329,7 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli,
 	pgstat_report_wait_end();
 
 	pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_INIT,
-							IOOP_FSYNC, io_start, 1, 0);
+							IOOP_FSYNC, io_start, 1, 0, track_io_timing);
 
 	if (close(fd) != 0)
 		ereport(ERROR,
@@ -8728,6 +8732,7 @@ void
 issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
 {
 	char	   *msg = NULL;
+	bool		track_timing = track_io_timing && track_wal_io_timing;
 	instr_time	start;
 
 	Assert(tli != 0);
@@ -8744,7 +8749,7 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
 	/*
 	 * Measure I/O timing to sync the WAL file for pg_stat_io.
 	 */
-	start = pgstat_prepare_io_time();
+	start = pgstat_prepare_io_time(track_timing);
 
 	pgstat_report_wait_start(WAIT_EVENT_WAL_SYNC);
 	switch (wal_sync_method)
@@ -8791,7 +8796,7 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
 	pgstat_report_wait_end();
 
 	pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL, IOOP_FSYNC,
-							start, 1, 0);
+							start, 1, 0, track_timing);
 }
 
 /*
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 995e4f7c26f..d2418243a18 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -1557,7 +1557,7 @@ WALRead(XLogReaderState *state,
 
 #ifndef FRONTEND
 		/* Measure I/O timing when reading segment */
-		io_start = pgstat_prepare_io_time();
+		io_start = pgstat_prepare_io_time(track_io_timing);
 
 		pgstat_report_wait_start(WAIT_EVENT_WAL_READ);
 #endif
@@ -1570,7 +1570,7 @@ WALRead(XLogReaderState *state,
 		pgstat_report_wait_end();
 
 		pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL, IOOP_READ,
-								io_start, 1, readbytes);
+								io_start, 1, readbytes, track_io_timing);
 #endif
 
 		if (readbytes <= 0)
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 915cb330295..9e177e9716a 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -3401,7 +3401,7 @@ retry:
 	readOff = targetPageOff;
 
 	/* Measure I/O timing when reading segment */
-	io_start = pgstat_prepare_io_time();
+	io_start = pgstat_prepare_io_time(track_io_timing);
 
 	pgstat_report_wait_start(WAIT_EVENT_WAL_READ);
 	r = pg_pread(readFile, readBuf, XLOG_BLCKSZ, (off_t) readOff);
@@ -3413,7 +3413,7 @@ retry:
 		pgstat_report_wait_end();
 
 		pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL, IOOP_READ,
-								io_start, 1, r);
+								io_start, 1, r, track_io_timing);
 
 		XLogFileName(fname, curFileTLI, readSegNo, wal_segment_size);
 		if (r < 0)
@@ -3436,7 +3436,7 @@ retry:
 	pgstat_report_wait_end();
 
 	pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL, IOOP_READ,
-							io_start, 1, r);
+							io_start, 1, r, track_io_timing);
 
 	Assert(targetSegNo == readSegNo);
 	Assert(targetPageOff == readOff);
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index d3216ef6ba7..5de9490d439 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1509,10 +1509,10 @@ WaitReadBuffers(ReadBuffersOperation *operation)
 			io_pages[io_buffers_len++] = BufferGetBlock(buffers[i]);
 		}
 
-		io_start = pgstat_prepare_io_time();
+		io_start = pgstat_prepare_io_time(track_io_timing);
 		smgrreadv(operation->smgr, forknum, io_first_block, io_pages, io_buffers_len);
 		pgstat_count_io_op_time(io_object, io_context, IOOP_READ, io_start,
-								1, io_buffers_len * BLCKSZ);
+								1, io_buffers_len * BLCKSZ, track_io_timing);
 
 		/* Verify each block we read, and terminate the I/O. */
 		for (int j = 0; j < io_buffers_len; ++j)
@@ -2401,7 +2401,7 @@ ExtendBufferedRelShared(BufferManagerRelation bmr,
 		}
 	}
 
-	io_start = pgstat_prepare_io_time();
+	io_start = pgstat_prepare_io_time(track_io_timing);
 
 	/*
 	 * Note: if smgrzeroextend fails, we will end up with buffers that are
@@ -2426,7 +2426,7 @@ ExtendBufferedRelShared(BufferManagerRelation bmr,
 		UnlockRelationForExtension(bmr.rel, ExclusiveLock);
 
 	pgstat_count_io_op_time(IOOBJECT_RELATION, io_context, IOOP_EXTEND,
-							io_start, 1, extend_by * BLCKSZ);
+							io_start, 1, extend_by * BLCKSZ, track_io_timing);
 
 	/* Set BM_VALID, terminate IO, and wake up any waiters */
 	for (uint32 i = 0; i < extend_by; i++)
@@ -3858,7 +3858,7 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln, IOObject io_object,
 	 */
 	bufToWrite = PageSetChecksumCopy((Page) bufBlock, buf->tag.blockNum);
 
-	io_start = pgstat_prepare_io_time();
+	io_start = pgstat_prepare_io_time(track_io_timing);
 
 	/*
 	 * bufToWrite is either the shared buffer or a copy, as appropriate.
@@ -3888,7 +3888,7 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln, IOObject io_object,
 	 * of a dirty shared buffer (IOCONTEXT_NORMAL IOOP_WRITE).
 	 */
 	pgstat_count_io_op_time(IOOBJECT_RELATION, io_context,
-							IOOP_WRITE, io_start, 1, BLCKSZ);
+							IOOP_WRITE, io_start, 1, BLCKSZ, track_io_timing);
 
 	pgBufferUsage.shared_blks_written++;
 
@@ -4460,7 +4460,7 @@ FlushRelationBuffers(Relation rel)
 
 				PageSetChecksumInplace(localpage, bufHdr->tag.blockNum);
 
-				io_start = pgstat_prepare_io_time();
+				io_start = pgstat_prepare_io_time(track_io_timing);
 
 				smgrwrite(srel,
 						  BufTagGetForkNum(&bufHdr->tag),
@@ -4470,7 +4470,7 @@ FlushRelationBuffers(Relation rel)
 
 				pgstat_count_io_op_time(IOOBJECT_TEMP_RELATION,
 										IOCONTEXT_NORMAL, IOOP_WRITE,
-										io_start, 1, BLCKSZ);
+										io_start, 1, BLCKSZ, track_io_timing);
 
 				buf_state &= ~(BM_DIRTY | BM_JUST_DIRTIED);
 				pg_atomic_unlocked_write_u32(&bufHdr->state, buf_state);
@@ -5917,7 +5917,7 @@ IssuePendingWritebacks(WritebackContext *wb_context, IOContext io_context)
 	sort_pending_writebacks(wb_context->pending_writebacks,
 							wb_context->nr_pending);
 
-	io_start = pgstat_prepare_io_time();
+	io_start = pgstat_prepare_io_time(track_io_timing);
 
 	/*
 	 * Coalesce neighbouring writes, but nothing else. For that we iterate
@@ -5977,7 +5977,8 @@ IssuePendingWritebacks(WritebackContext *wb_context, IOContext io_context)
 	 * blocks of permanent relations.
 	 */
 	pgstat_count_io_op_time(IOOBJECT_RELATION, io_context,
-							IOOP_WRITEBACK, io_start, wb_context->nr_pending, 0);
+							IOOP_WRITEBACK, io_start, wb_context->nr_pending, 0,
+							track_io_timing);
 
 	wb_context->nr_pending = 0;
 }
diff --git a/src/backend/storage/buffer/localbuf.c b/src/backend/storage/buffer/localbuf.c
index 78b65fb160a..4ca78c412d3 100644
--- a/src/backend/storage/buffer/localbuf.c
+++ b/src/backend/storage/buffer/localbuf.c
@@ -244,7 +244,7 @@ GetLocalVictimBuffer(void)
 
 		PageSetChecksumInplace(localpage, bufHdr->tag.blockNum);
 
-		io_start = pgstat_prepare_io_time();
+		io_start = pgstat_prepare_io_time(track_io_timing);
 
 		/* And write... */
 		smgrwrite(oreln,
@@ -255,7 +255,7 @@ GetLocalVictimBuffer(void)
 
 		/* Temporary table I/O does not use Buffer Access Strategies */
 		pgstat_count_io_op_time(IOOBJECT_TEMP_RELATION, IOCONTEXT_NORMAL,
-								IOOP_WRITE, io_start, 1, BLCKSZ);
+								IOOP_WRITE, io_start, 1, BLCKSZ, track_io_timing);
 
 		/* Mark not-dirty now in case we error out below */
 		buf_state &= ~BM_DIRTY;
@@ -414,13 +414,13 @@ ExtendBufferedRelLocal(BufferManagerRelation bmr,
 		}
 	}
 
-	io_start = pgstat_prepare_io_time();
+	io_start = pgstat_prepare_io_time(track_io_timing);
 
 	/* actually extend relation */
 	smgrzeroextend(bmr.smgr, fork, first_block, extend_by, false);
 
 	pgstat_count_io_op_time(IOOBJECT_TEMP_RELATION, IOCONTEXT_NORMAL, IOOP_EXTEND,
-							io_start, 1, extend_by * BLCKSZ);
+							io_start, 1, extend_by * BLCKSZ, track_io_timing);
 
 	for (uint32 i = 0; i < extend_by; i++)
 	{
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 298754d1b64..d6df9f5102a 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -1381,7 +1381,7 @@ register_dirty_segment(SMgrRelation reln, ForkNumber forknum, MdfdVec *seg)
 		ereport(DEBUG1,
 				(errmsg_internal("could not forward fsync request because request queue is full")));
 
-		io_start = pgstat_prepare_io_time();
+		io_start = pgstat_prepare_io_time(track_io_timing);
 
 		if (FileSync(seg->mdfd_vfd, WAIT_EVENT_DATA_FILE_SYNC) < 0)
 			ereport(data_sync_elevel(ERROR),
@@ -1401,7 +1401,7 @@ register_dirty_segment(SMgrRelation reln, ForkNumber forknum, MdfdVec *seg)
 		 * backend fsyncs.
 		 */
 		pgstat_count_io_op_time(IOOBJECT_RELATION, IOCONTEXT_NORMAL,
-								IOOP_FSYNC, io_start, 1, 0);
+								IOOP_FSYNC, io_start, 1, 0, track_io_timing);
 	}
 }
 
@@ -1786,7 +1786,7 @@ mdsyncfiletag(const FileTag *ftag, char *path)
 		need_to_close = true;
 	}
 
-	io_start = pgstat_prepare_io_time();
+	io_start = pgstat_prepare_io_time(track_io_timing);
 
 	/* Sync the file. */
 	result = FileSync(file, WAIT_EVENT_DATA_FILE_SYNC);
@@ -1796,7 +1796,7 @@ mdsyncfiletag(const FileTag *ftag, char *path)
 		FileClose(file);
 
 	pgstat_count_io_op_time(IOOBJECT_RELATION, IOCONTEXT_NORMAL,
-							IOOP_FSYNC, io_start, 1, 0);
+							IOOP_FSYNC, io_start, 1, 0, track_io_timing);
 
 	errno = save_errno;
 	return result;
diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index e7362b52a37..3746f9e4bc2 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -83,14 +83,15 @@ pgstat_count_io_op(IOObject io_object, IOContext io_context, IOOp io_op,
 }
 
 /*
- * Initialize the internal timing for an IO operation.
+ * Initialize the internal timing for an IO operation, depending on an
+ * IO timing GUC.
  */
 instr_time
-pgstat_prepare_io_time(void)
+pgstat_prepare_io_time(bool track_io_guc)
 {
 	instr_time	io_start;
 
-	if (track_io_timing)
+	if (track_io_guc)
 		INSTR_TIME_SET_CURRENT(io_start);
 	else
 	{
@@ -117,9 +118,10 @@ pgstat_prepare_io_time(void)
  */
 void
 pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op,
-						instr_time start_time, uint32 cnt, uint64 bytes)
+						instr_time start_time, uint32 cnt, uint64 bytes,
+						bool track_io_guc)
 {
-	if (track_io_timing)
+	if (track_io_guc)
 	{
 		instr_time	io_time;
 
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 3f1c7b1ebf6..03a6dd49154 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -1500,6 +1500,15 @@ struct config_bool ConfigureNamesBool[] =
 		false,
 		NULL, NULL, NULL
 	},
+	{
+		{"track_wal_io_timing", PGC_SUSET, STATS_CUMULATIVE,
+			gettext_noop("Collects timing statistics for WAL I/O activity."),
+			NULL
+		},
+		&track_wal_io_timing,
+		false,
+		NULL, NULL, NULL
+	},
 
 	{
 		{"update_process_title", PGC_SUSET, PROCESS_TITLE,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index e771d87da1f..5362ff80519 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -641,6 +641,7 @@
 #track_counts = on
 #track_cost_delay_timing = off
 #track_io_timing = off
+#track_wal_io_timing = off
 #track_functions = none			# none, pl, all
 #stats_fetch_consistency = cache	# cache, none, snapshot
 
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 0a50d160bba..d313099c027 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -54,6 +54,7 @@ extern PGDLLIMPORT char *wal_consistency_checking_string;
 extern PGDLLIMPORT bool log_checkpoints;
 extern PGDLLIMPORT int CommitDelay;
 extern PGDLLIMPORT int CommitSiblings;
+extern PGDLLIMPORT bool track_wal_io_timing;
 extern PGDLLIMPORT int wal_decode_buffer_size;
 
 extern PGDLLIMPORT int CheckPointSegments;
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index fc651d03cf9..1e60e94be90 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -564,10 +564,10 @@ extern bool pgstat_bktype_io_stats_valid(PgStat_BktypeIO *backend_io,
 										 BackendType bktype);
 extern void pgstat_count_io_op(IOObject io_object, IOContext io_context,
 							   IOOp io_op, uint32 cnt, uint64 bytes);
-extern instr_time pgstat_prepare_io_time(void);
+extern instr_time pgstat_prepare_io_time(bool track_io_guc);
 extern void pgstat_count_io_op_time(IOObject io_object, IOContext io_context,
 									IOOp io_op, instr_time start_time,
-									uint32 cnt, uint64 bytes);
+									uint32 cnt, uint64 bytes, bool track_io_guc);
 
 extern PgStat_IO *pgstat_fetch_stat_io(void);
 extern const char *pgstat_get_io_context_name(IOContext io_context);
-- 
2.34.1

#12Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#11)
1 attachment(s)
Re: Remove wal_[sync|write][_time] from pg_stat_wal and track_wal_io_timing

On Mon, Feb 24, 2025 at 01:08:03PM +0000, Bertrand Drouvot wrote:

I agree that we've to put everything in the picture (system with or without
cheap timing functions, lock contention and WAL flush disk time) and that we
can certainly find configuration/workload that would get benefits from a
dedicated track_wal_io_timing GUC.

PFA a patch to re-introduce the track_wal_io_timing GUC and to ensure that the
WAL write and fsync timing activities are tracked in pg_stat_io (and
pg_stat_get_backend_io()) only if both track_io_timing and track_wal_io_timing
are enabled.

Note that to change the a051e71e28a behavior, the attached patch adds an extra
"track_io_guc" parameter to pgstat_count_io_op_time() (like pgstat_prepare_io_time
already had in a051e71e28a).

+      bool      track_timing = track_io_timing && track_wal_io_timing;
-      start = pgstat_prepare_io_time();
+      start = pgstat_prepare_io_time(track_timing);

This does not represent exactly what I am understanding from the
comment of upthread, because WAL IO timings would require both
track_wal_io_timing and track_io_timing to be enabled with this patch.
It seems to me that what we should do is to decouple completely the
timings for WAL and non-WAL IO across the two GUCs track_wal_io_timing
and track_io_timing, without any dependency between one and the other.
This way, it is possible to only enable one of them without affecting
the paths of the other, or both if you are ready to pay the price.

If you take that into account, XLogWrite(), XLogFileInitInternal(),
issue_xlog_fsync() should trigger clock calculations only for
track_wal_io_timing. The write and fsyncs are the hot ones in
standalone servers.

Two new things tracked in pg_stat_io are WALRead() and XLogPageRead(),
which are used at recovery, and for consistency with the rest there is
a good argument for controlling these as well with
track_wal_io_timing, I guess.

 void
 pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op,
-                        instr_time start_time, uint32 cnt, uint64 bytes)
+                        instr_time start_time, uint32 cnt, uint64 bytes,
+                        bool track_io_guc)

Not much a fan of the new argument to pass the GUC value, which could
be error prone. It would be simpler to check that start_time is 0
instead. There is no need to change the other callers of
pgstat_count_io_op_time() if we do that.

pgstat_count_backend_io_op_time() would have triggered an assertion
failure, it needs to be adjusted.

With more tweaks applied to the docs, the attached is my result.
--
Michael

Attachments:

v2-0001-re-introduce-track_wal_io_timing.patchtext/x-diff; charset=us-asciiDownload
From 196f84b8bd8f817b6e21e65de90464d32a27b286 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 24 Feb 2025 11:29:35 +0000
Subject: [PATCH v2] re-introduce track_wal_io_timing

a051e71e28a added the "timing tracking" in the WAL code path with "only"
track_io_timing enabled (and track_wal_io_timing still false). Then it made
sense for 2421e9a51d2 to get rid of the track_wal_io_timing GUC.

Concerns have been raised about the fact that a051e71e28a added the "timing tracking"
in the WAL code path with "only" track_io_timing enabled. So this commit re-introduces
the track_wal_io_timing GUC and ensure that WAL timing is tracked in pg_stat_io
only if track_io_timing and track_wal_io_timing are both enabled.
---
 src/include/access/xlog.h                     |  1 +
 src/include/pgstat.h                          |  2 +-
 src/backend/access/transam/xlog.c             |  9 +++---
 src/backend/access/transam/xlogreader.c       |  2 +-
 src/backend/access/transam/xlogrecovery.c     |  2 +-
 src/backend/storage/buffer/bufmgr.c           | 10 +++---
 src/backend/storage/buffer/localbuf.c         |  4 +--
 src/backend/storage/smgr/md.c                 |  4 +--
 src/backend/utils/activity/pgstat_backend.c   |  3 +-
 src/backend/utils/activity/pgstat_io.c        |  9 +++---
 src/backend/utils/misc/guc_tables.c           |  9 ++++++
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 doc/src/sgml/config.sgml                      | 25 +++++++++++++++
 doc/src/sgml/monitoring.sgml                  | 31 ++++++++++++++++---
 doc/src/sgml/wal.sgml                         |  4 +--
 15 files changed, 88 insertions(+), 28 deletions(-)

diff --git src/include/access/xlog.h src/include/access/xlog.h
index 0a50d160bba8..d313099c027f 100644
--- src/include/access/xlog.h
+++ src/include/access/xlog.h
@@ -54,6 +54,7 @@ extern PGDLLIMPORT char *wal_consistency_checking_string;
 extern PGDLLIMPORT bool log_checkpoints;
 extern PGDLLIMPORT int CommitDelay;
 extern PGDLLIMPORT int CommitSiblings;
+extern PGDLLIMPORT bool track_wal_io_timing;
 extern PGDLLIMPORT int wal_decode_buffer_size;
 
 extern PGDLLIMPORT int CheckPointSegments;
diff --git src/include/pgstat.h src/include/pgstat.h
index fc651d03cf97..e64c527265b9 100644
--- src/include/pgstat.h
+++ src/include/pgstat.h
@@ -564,7 +564,7 @@ extern bool pgstat_bktype_io_stats_valid(PgStat_BktypeIO *backend_io,
 										 BackendType bktype);
 extern void pgstat_count_io_op(IOObject io_object, IOContext io_context,
 							   IOOp io_op, uint32 cnt, uint64 bytes);
-extern instr_time pgstat_prepare_io_time(void);
+extern instr_time pgstat_prepare_io_time(bool track_io_guc);
 extern void pgstat_count_io_op_time(IOObject io_object, IOContext io_context,
 									IOOp io_op, instr_time start_time,
 									uint32 cnt, uint64 bytes);
diff --git src/backend/access/transam/xlog.c src/backend/access/transam/xlog.c
index 919314f82587..799fc739e18c 100644
--- src/backend/access/transam/xlog.c
+++ src/backend/access/transam/xlog.c
@@ -134,6 +134,7 @@ int			CommitSiblings = 5; /* # concurrent xacts needed to sleep */
 int			wal_retrieve_retry_interval = 5000;
 int			max_slot_wal_keep_size_mb = -1;
 int			wal_decode_buffer_size = 512 * 1024;
+bool		track_wal_io_timing = false;
 
 #ifdef WAL_DEBUG
 bool		XLOG_DEBUG = false;
@@ -2437,7 +2438,7 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible)
 				/*
 				 * Measure I/O timing to write WAL data, for pg_stat_io.
 				 */
-				start = pgstat_prepare_io_time();
+				start = pgstat_prepare_io_time(track_wal_io_timing);
 
 				pgstat_report_wait_start(WAIT_EVENT_WAL_WRITE);
 				written = pg_pwrite(openLogFile, from, nleft, startoffset);
@@ -3248,7 +3249,7 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli,
 				 errmsg("could not create file \"%s\": %m", tmppath)));
 
 	/* Measure I/O timing when initializing segment */
-	io_start = pgstat_prepare_io_time();
+	io_start = pgstat_prepare_io_time(track_wal_io_timing);
 
 	pgstat_report_wait_start(WAIT_EVENT_WAL_INIT_WRITE);
 	save_errno = 0;
@@ -3310,7 +3311,7 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli,
 	}
 
 	/* Measure I/O timing when flushing segment */
-	io_start = pgstat_prepare_io_time();
+	io_start = pgstat_prepare_io_time(track_wal_io_timing);
 
 	pgstat_report_wait_start(WAIT_EVENT_WAL_INIT_SYNC);
 	if (pg_fsync(fd) != 0)
@@ -8744,7 +8745,7 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
 	/*
 	 * Measure I/O timing to sync the WAL file for pg_stat_io.
 	 */
-	start = pgstat_prepare_io_time();
+	start = pgstat_prepare_io_time(track_wal_io_timing);
 
 	pgstat_report_wait_start(WAIT_EVENT_WAL_SYNC);
 	switch (wal_sync_method)
diff --git src/backend/access/transam/xlogreader.c src/backend/access/transam/xlogreader.c
index 995e4f7c26fd..5483d4f0dd2b 100644
--- src/backend/access/transam/xlogreader.c
+++ src/backend/access/transam/xlogreader.c
@@ -1557,7 +1557,7 @@ WALRead(XLogReaderState *state,
 
 #ifndef FRONTEND
 		/* Measure I/O timing when reading segment */
-		io_start = pgstat_prepare_io_time();
+		io_start = pgstat_prepare_io_time(track_wal_io_timing);
 
 		pgstat_report_wait_start(WAIT_EVENT_WAL_READ);
 #endif
diff --git src/backend/access/transam/xlogrecovery.c src/backend/access/transam/xlogrecovery.c
index 915cb330295c..c5475aaa97a1 100644
--- src/backend/access/transam/xlogrecovery.c
+++ src/backend/access/transam/xlogrecovery.c
@@ -3401,7 +3401,7 @@ retry:
 	readOff = targetPageOff;
 
 	/* Measure I/O timing when reading segment */
-	io_start = pgstat_prepare_io_time();
+	io_start = pgstat_prepare_io_time(track_wal_io_timing);
 
 	pgstat_report_wait_start(WAIT_EVENT_WAL_READ);
 	r = pg_pread(readFile, readBuf, XLOG_BLCKSZ, (off_t) readOff);
diff --git src/backend/storage/buffer/bufmgr.c src/backend/storage/buffer/bufmgr.c
index d3216ef6ba75..75cfc2b6fe96 100644
--- src/backend/storage/buffer/bufmgr.c
+++ src/backend/storage/buffer/bufmgr.c
@@ -1509,7 +1509,7 @@ WaitReadBuffers(ReadBuffersOperation *operation)
 			io_pages[io_buffers_len++] = BufferGetBlock(buffers[i]);
 		}
 
-		io_start = pgstat_prepare_io_time();
+		io_start = pgstat_prepare_io_time(track_io_timing);
 		smgrreadv(operation->smgr, forknum, io_first_block, io_pages, io_buffers_len);
 		pgstat_count_io_op_time(io_object, io_context, IOOP_READ, io_start,
 								1, io_buffers_len * BLCKSZ);
@@ -2401,7 +2401,7 @@ ExtendBufferedRelShared(BufferManagerRelation bmr,
 		}
 	}
 
-	io_start = pgstat_prepare_io_time();
+	io_start = pgstat_prepare_io_time(track_io_timing);
 
 	/*
 	 * Note: if smgrzeroextend fails, we will end up with buffers that are
@@ -3858,7 +3858,7 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln, IOObject io_object,
 	 */
 	bufToWrite = PageSetChecksumCopy((Page) bufBlock, buf->tag.blockNum);
 
-	io_start = pgstat_prepare_io_time();
+	io_start = pgstat_prepare_io_time(track_io_timing);
 
 	/*
 	 * bufToWrite is either the shared buffer or a copy, as appropriate.
@@ -4460,7 +4460,7 @@ FlushRelationBuffers(Relation rel)
 
 				PageSetChecksumInplace(localpage, bufHdr->tag.blockNum);
 
-				io_start = pgstat_prepare_io_time();
+				io_start = pgstat_prepare_io_time(track_io_timing);
 
 				smgrwrite(srel,
 						  BufTagGetForkNum(&bufHdr->tag),
@@ -5917,7 +5917,7 @@ IssuePendingWritebacks(WritebackContext *wb_context, IOContext io_context)
 	sort_pending_writebacks(wb_context->pending_writebacks,
 							wb_context->nr_pending);
 
-	io_start = pgstat_prepare_io_time();
+	io_start = pgstat_prepare_io_time(track_io_timing);
 
 	/*
 	 * Coalesce neighbouring writes, but nothing else. For that we iterate
diff --git src/backend/storage/buffer/localbuf.c src/backend/storage/buffer/localbuf.c
index 78b65fb160ad..64931efaa756 100644
--- src/backend/storage/buffer/localbuf.c
+++ src/backend/storage/buffer/localbuf.c
@@ -244,7 +244,7 @@ GetLocalVictimBuffer(void)
 
 		PageSetChecksumInplace(localpage, bufHdr->tag.blockNum);
 
-		io_start = pgstat_prepare_io_time();
+		io_start = pgstat_prepare_io_time(track_io_timing);
 
 		/* And write... */
 		smgrwrite(oreln,
@@ -414,7 +414,7 @@ ExtendBufferedRelLocal(BufferManagerRelation bmr,
 		}
 	}
 
-	io_start = pgstat_prepare_io_time();
+	io_start = pgstat_prepare_io_time(track_io_timing);
 
 	/* actually extend relation */
 	smgrzeroextend(bmr.smgr, fork, first_block, extend_by, false);
diff --git src/backend/storage/smgr/md.c src/backend/storage/smgr/md.c
index 298754d1b64a..7bf0b45e2c34 100644
--- src/backend/storage/smgr/md.c
+++ src/backend/storage/smgr/md.c
@@ -1381,7 +1381,7 @@ register_dirty_segment(SMgrRelation reln, ForkNumber forknum, MdfdVec *seg)
 		ereport(DEBUG1,
 				(errmsg_internal("could not forward fsync request because request queue is full")));
 
-		io_start = pgstat_prepare_io_time();
+		io_start = pgstat_prepare_io_time(track_io_timing);
 
 		if (FileSync(seg->mdfd_vfd, WAIT_EVENT_DATA_FILE_SYNC) < 0)
 			ereport(data_sync_elevel(ERROR),
@@ -1786,7 +1786,7 @@ mdsyncfiletag(const FileTag *ftag, char *path)
 		need_to_close = true;
 	}
 
-	io_start = pgstat_prepare_io_time();
+	io_start = pgstat_prepare_io_time(track_io_timing);
 
 	/* Sync the file. */
 	result = FileSync(file, WAIT_EVENT_DATA_FILE_SYNC);
diff --git src/backend/utils/activity/pgstat_backend.c src/backend/utils/activity/pgstat_backend.c
index 4a667e7019cc..338da73a9a91 100644
--- src/backend/utils/activity/pgstat_backend.c
+++ src/backend/utils/activity/pgstat_backend.c
@@ -24,6 +24,7 @@
 
 #include "postgres.h"
 
+#include "access/xlog.h"
 #include "storage/bufmgr.h"
 #include "utils/memutils.h"
 #include "utils/pgstat_internal.h"
@@ -43,7 +44,7 @@ void
 pgstat_count_backend_io_op_time(IOObject io_object, IOContext io_context,
 								IOOp io_op, instr_time io_time)
 {
-	Assert(track_io_timing);
+	Assert(track_io_timing || track_wal_io_timing);
 
 	if (!pgstat_tracks_backend_bktype(MyBackendType))
 		return;
diff --git src/backend/utils/activity/pgstat_io.c src/backend/utils/activity/pgstat_io.c
index e7362b52a37b..edb489f3a864 100644
--- src/backend/utils/activity/pgstat_io.c
+++ src/backend/utils/activity/pgstat_io.c
@@ -83,14 +83,15 @@ pgstat_count_io_op(IOObject io_object, IOContext io_context, IOOp io_op,
 }
 
 /*
- * Initialize the internal timing for an IO operation.
+ * Initialize the internal timing for an IO operation, depending on an
+ * IO timing GUC.
  */
 instr_time
-pgstat_prepare_io_time(void)
+pgstat_prepare_io_time(bool track_io_guc)
 {
 	instr_time	io_start;
 
-	if (track_io_timing)
+	if (track_io_guc)
 		INSTR_TIME_SET_CURRENT(io_start);
 	else
 	{
@@ -119,7 +120,7 @@ void
 pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op,
 						instr_time start_time, uint32 cnt, uint64 bytes)
 {
-	if (track_io_timing)
+	if (!INSTR_TIME_IS_ZERO(start_time))
 	{
 		instr_time	io_time;
 
diff --git src/backend/utils/misc/guc_tables.c src/backend/utils/misc/guc_tables.c
index 690bf96ef030..ad25cbb39c55 100644
--- src/backend/utils/misc/guc_tables.c
+++ src/backend/utils/misc/guc_tables.c
@@ -1501,6 +1501,15 @@ struct config_bool ConfigureNamesBool[] =
 		false,
 		NULL, NULL, NULL
 	},
+	{
+		{"track_wal_io_timing", PGC_SUSET, STATS_CUMULATIVE,
+			gettext_noop("Collects timing statistics for WAL I/O activity."),
+			NULL
+		},
+		&track_wal_io_timing,
+		false,
+		NULL, NULL, NULL
+	},
 
 	{
 		{"update_process_title", PGC_SUSET, PROCESS_TITLE,
diff --git src/backend/utils/misc/postgresql.conf.sample src/backend/utils/misc/postgresql.conf.sample
index e771d87da1fb..5362ff805195 100644
--- src/backend/utils/misc/postgresql.conf.sample
+++ src/backend/utils/misc/postgresql.conf.sample
@@ -641,6 +641,7 @@
 #track_counts = on
 #track_cost_delay_timing = off
 #track_io_timing = off
+#track_wal_io_timing = off
 #track_functions = none			# none, pl, all
 #stats_fetch_consistency = cache	# cache, none, snapshot
 
diff --git doc/src/sgml/config.sgml doc/src/sgml/config.sgml
index a83545761085..6186648ccaf5 100644
--- doc/src/sgml/config.sgml
+++ doc/src/sgml/config.sgml
@@ -8393,6 +8393,31 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-track-wal-io-timing" xreflabel="track_wal_io_timing">
+      <term><varname>track_wal_io_timing</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>track_wal_io_timing</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables timing of WAL I/O calls. This parameter is off by default,
+        as it will repeatedly query the operating system for the current time,
+        which may cause significant overhead on some platforms.
+        You can use the <application>pg_test_timing</application> tool to
+        measure the overhead of timing on your system.
+        I/O timing information is displayed in
+        <link linkend="monitoring-pg-stat-io-view">
+        <structname>pg_stat_io</structname></link> for the
+        <varname>object</varname> <literal>wal</literal> and in the output of
+        the <link linkend="pg-stat-get-backend-io">
+        <function>pg_stat_get_backend_io()</function></link> function.
+        Only superusers and users with the appropriate <literal>SET</literal>
+        privilege can change this setting.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-track-functions" xreflabel="track_functions">
       <term><varname>track_functions</varname> (<type>enum</type>)
       <indexterm>
diff --git doc/src/sgml/monitoring.sgml doc/src/sgml/monitoring.sgml
index 3dfd059b7ee8..9178f1d34efd 100644
--- doc/src/sgml/monitoring.sgml
+++ doc/src/sgml/monitoring.sgml
@@ -183,6 +183,11 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
    of block read, write, extend, and fsync times.
   </para>
 
+  <para>
+   The parameter <xref linkend="guc-track-wal-io-timing"/> enables monitoring
+   of WAL read, write and fsync times.
+  </para>
+
   <para>
    Normally these parameters are set in <filename>postgresql.conf</filename> so
    that they apply to all server processes, but it is possible to turn
@@ -2723,7 +2728,11 @@ description | Waiting for a newly initialized WAL file to reach durable storage
        </para>
        <para>
         Time spent in read operations in milliseconds (if
-        <xref linkend="guc-track-io-timing"/> is enabled, otherwise zero)
+        <xref linkend="guc-track-io-timing"/> is enabled and
+        <varname>object</varname> is not <literal>wal</literal>,
+        or if <xref linkend="guc-track-wal-io-timing"/> is enabled
+        and <varname>object</varname> is <literal>wal</literal>,
+        otherwise zero)
        </para>
       </entry>
      </row>
@@ -2757,7 +2766,11 @@ description | Waiting for a newly initialized WAL file to reach durable storage
        </para>
        <para>
         Time spent in write operations in milliseconds (if
-        <xref linkend="guc-track-io-timing"/> is enabled, otherwise zero)
+        <xref linkend="guc-track-io-timing"/> is enabled and
+        <varname>object</varname> is not <literal>wal</literal>,
+        or if <xref linkend="guc-track-wal-io-timing"/> is enabled
+        and <varname>object</varname> is <literal>wal</literal>,
+        otherwise zero)
        </para>
       </entry>
      </row>
@@ -2816,8 +2829,12 @@ description | Waiting for a newly initialized WAL file to reach durable storage
         <structfield>extend_time</structfield> <type>double precision</type>
        </para>
        <para>
-        Time spent in extend operations in milliseconds (if
-        <xref linkend="guc-track-io-timing"/> is enabled, otherwise zero)
+        Time spent in extend operations in milliseconds. (if
+        <xref linkend="guc-track-io-timing"/> is enabled and
+        <varname>object</varname> is not <literal>wal</literal>,
+        or if <xref linkend="guc-track-wal-io-timing"/> is enabled
+        and <varname>object</varname> is <literal>wal</literal>,
+        otherwise zero)
        </para>
       </entry>
      </row>
@@ -2887,7 +2904,11 @@ description | Waiting for a newly initialized WAL file to reach durable storage
        </para>
        <para>
         Time spent in fsync operations in milliseconds (if
-        <xref linkend="guc-track-io-timing"/> is enabled, otherwise zero)
+        <xref linkend="guc-track-io-timing"/> is enabled and
+        <varname>object</varname> is not <literal>wal</literal>,
+        or if <xref linkend="guc-track-wal-io-timing"/> is enabled
+        and <varname>object</varname> is <literal>wal</literal>,
+        otherwise zero)
        </para>
       </entry>
      </row>
diff --git doc/src/sgml/wal.sgml doc/src/sgml/wal.sgml
index 4fc09761115e..f3b86b26be90 100644
--- doc/src/sgml/wal.sgml
+++ doc/src/sgml/wal.sgml
@@ -808,7 +808,7 @@
   <para>
    There are two internal functions to write WAL data to disk:
    <function>XLogWrite</function> and <function>issue_xlog_fsync</function>.
-   When <xref linkend="guc-track-io-timing"/> is enabled, the total
+   When <xref linkend="guc-track-wal-io-timing"/> is enabled, the total
    amounts of time <function>XLogWrite</function> writes and
    <function>issue_xlog_fsync</function> syncs WAL data to disk are counted as
    <varname>write_time</varname> and <varname>fsync_time</varname> in
@@ -828,7 +828,7 @@
    <literal>fsync</literal>, or <literal>fsync_writethrough</literal>,
    the write operation moves WAL buffers to kernel cache and
    <function>issue_xlog_fsync</function> syncs them to disk. Regardless
-   of the setting of <varname>track_io_timing</varname>, the number
+   of the setting of <varname>track_wal_io_timing</varname>, the number
    of times <function>XLogWrite</function> writes and
    <function>issue_xlog_fsync</function> syncs WAL data to disk are also
    counted as <varname>writes</varname> and <varname>fsyncs</varname>
-- 
2.47.2

#13Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#12)
1 attachment(s)
Re: Remove wal_[sync|write][_time] from pg_stat_wal and track_wal_io_timing

On Tue, Feb 25, 2025 at 02:20:48PM +0900, Michael Paquier wrote:

With more tweaks applied to the docs, the attached is my result.

Resending once as there was a blip with the prefixes used with
format-patch, preventing git am to work by default.
--
Michael

Attachments:

v2-0001-re-introduce-track_wal_io_timing.patchtext/x-diff; charset=us-asciiDownload
From 196f84b8bd8f817b6e21e65de90464d32a27b286 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 24 Feb 2025 11:29:35 +0000
Subject: [PATCH v2] re-introduce track_wal_io_timing

a051e71e28a added the "timing tracking" in the WAL code path with "only"
track_io_timing enabled (and track_wal_io_timing still false). Then it made
sense for 2421e9a51d2 to get rid of the track_wal_io_timing GUC.

Concerns have been raised about the fact that a051e71e28a added the "timing tracking"
in the WAL code path with "only" track_io_timing enabled. So this commit re-introduces
the track_wal_io_timing GUC and ensure that WAL timing is tracked in pg_stat_io
only if track_io_timing and track_wal_io_timing are both enabled.
---
 src/include/access/xlog.h                     |  1 +
 src/include/pgstat.h                          |  2 +-
 src/backend/access/transam/xlog.c             |  9 +++---
 src/backend/access/transam/xlogreader.c       |  2 +-
 src/backend/access/transam/xlogrecovery.c     |  2 +-
 src/backend/storage/buffer/bufmgr.c           | 10 +++---
 src/backend/storage/buffer/localbuf.c         |  4 +--
 src/backend/storage/smgr/md.c                 |  4 +--
 src/backend/utils/activity/pgstat_backend.c   |  3 +-
 src/backend/utils/activity/pgstat_io.c        |  9 +++---
 src/backend/utils/misc/guc_tables.c           |  9 ++++++
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 doc/src/sgml/config.sgml                      | 25 +++++++++++++++
 doc/src/sgml/monitoring.sgml                  | 31 ++++++++++++++++---
 doc/src/sgml/wal.sgml                         |  4 +--
 15 files changed, 88 insertions(+), 28 deletions(-)

diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 0a50d160bba8..d313099c027f 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -54,6 +54,7 @@ extern PGDLLIMPORT char *wal_consistency_checking_string;
 extern PGDLLIMPORT bool log_checkpoints;
 extern PGDLLIMPORT int CommitDelay;
 extern PGDLLIMPORT int CommitSiblings;
+extern PGDLLIMPORT bool track_wal_io_timing;
 extern PGDLLIMPORT int wal_decode_buffer_size;
 
 extern PGDLLIMPORT int CheckPointSegments;
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index fc651d03cf97..e64c527265b9 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -564,7 +564,7 @@ extern bool pgstat_bktype_io_stats_valid(PgStat_BktypeIO *backend_io,
 										 BackendType bktype);
 extern void pgstat_count_io_op(IOObject io_object, IOContext io_context,
 							   IOOp io_op, uint32 cnt, uint64 bytes);
-extern instr_time pgstat_prepare_io_time(void);
+extern instr_time pgstat_prepare_io_time(bool track_io_guc);
 extern void pgstat_count_io_op_time(IOObject io_object, IOContext io_context,
 									IOOp io_op, instr_time start_time,
 									uint32 cnt, uint64 bytes);
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 919314f82587..799fc739e18c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -134,6 +134,7 @@ int			CommitSiblings = 5; /* # concurrent xacts needed to sleep */
 int			wal_retrieve_retry_interval = 5000;
 int			max_slot_wal_keep_size_mb = -1;
 int			wal_decode_buffer_size = 512 * 1024;
+bool		track_wal_io_timing = false;
 
 #ifdef WAL_DEBUG
 bool		XLOG_DEBUG = false;
@@ -2437,7 +2438,7 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible)
 				/*
 				 * Measure I/O timing to write WAL data, for pg_stat_io.
 				 */
-				start = pgstat_prepare_io_time();
+				start = pgstat_prepare_io_time(track_wal_io_timing);
 
 				pgstat_report_wait_start(WAIT_EVENT_WAL_WRITE);
 				written = pg_pwrite(openLogFile, from, nleft, startoffset);
@@ -3248,7 +3249,7 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli,
 				 errmsg("could not create file \"%s\": %m", tmppath)));
 
 	/* Measure I/O timing when initializing segment */
-	io_start = pgstat_prepare_io_time();
+	io_start = pgstat_prepare_io_time(track_wal_io_timing);
 
 	pgstat_report_wait_start(WAIT_EVENT_WAL_INIT_WRITE);
 	save_errno = 0;
@@ -3310,7 +3311,7 @@ XLogFileInitInternal(XLogSegNo logsegno, TimeLineID logtli,
 	}
 
 	/* Measure I/O timing when flushing segment */
-	io_start = pgstat_prepare_io_time();
+	io_start = pgstat_prepare_io_time(track_wal_io_timing);
 
 	pgstat_report_wait_start(WAIT_EVENT_WAL_INIT_SYNC);
 	if (pg_fsync(fd) != 0)
@@ -8744,7 +8745,7 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
 	/*
 	 * Measure I/O timing to sync the WAL file for pg_stat_io.
 	 */
-	start = pgstat_prepare_io_time();
+	start = pgstat_prepare_io_time(track_wal_io_timing);
 
 	pgstat_report_wait_start(WAIT_EVENT_WAL_SYNC);
 	switch (wal_sync_method)
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 995e4f7c26fd..5483d4f0dd2b 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -1557,7 +1557,7 @@ WALRead(XLogReaderState *state,
 
 #ifndef FRONTEND
 		/* Measure I/O timing when reading segment */
-		io_start = pgstat_prepare_io_time();
+		io_start = pgstat_prepare_io_time(track_wal_io_timing);
 
 		pgstat_report_wait_start(WAIT_EVENT_WAL_READ);
 #endif
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 915cb330295c..c5475aaa97a1 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -3401,7 +3401,7 @@ retry:
 	readOff = targetPageOff;
 
 	/* Measure I/O timing when reading segment */
-	io_start = pgstat_prepare_io_time();
+	io_start = pgstat_prepare_io_time(track_wal_io_timing);
 
 	pgstat_report_wait_start(WAIT_EVENT_WAL_READ);
 	r = pg_pread(readFile, readBuf, XLOG_BLCKSZ, (off_t) readOff);
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index d3216ef6ba75..75cfc2b6fe96 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1509,7 +1509,7 @@ WaitReadBuffers(ReadBuffersOperation *operation)
 			io_pages[io_buffers_len++] = BufferGetBlock(buffers[i]);
 		}
 
-		io_start = pgstat_prepare_io_time();
+		io_start = pgstat_prepare_io_time(track_io_timing);
 		smgrreadv(operation->smgr, forknum, io_first_block, io_pages, io_buffers_len);
 		pgstat_count_io_op_time(io_object, io_context, IOOP_READ, io_start,
 								1, io_buffers_len * BLCKSZ);
@@ -2401,7 +2401,7 @@ ExtendBufferedRelShared(BufferManagerRelation bmr,
 		}
 	}
 
-	io_start = pgstat_prepare_io_time();
+	io_start = pgstat_prepare_io_time(track_io_timing);
 
 	/*
 	 * Note: if smgrzeroextend fails, we will end up with buffers that are
@@ -3858,7 +3858,7 @@ FlushBuffer(BufferDesc *buf, SMgrRelation reln, IOObject io_object,
 	 */
 	bufToWrite = PageSetChecksumCopy((Page) bufBlock, buf->tag.blockNum);
 
-	io_start = pgstat_prepare_io_time();
+	io_start = pgstat_prepare_io_time(track_io_timing);
 
 	/*
 	 * bufToWrite is either the shared buffer or a copy, as appropriate.
@@ -4460,7 +4460,7 @@ FlushRelationBuffers(Relation rel)
 
 				PageSetChecksumInplace(localpage, bufHdr->tag.blockNum);
 
-				io_start = pgstat_prepare_io_time();
+				io_start = pgstat_prepare_io_time(track_io_timing);
 
 				smgrwrite(srel,
 						  BufTagGetForkNum(&bufHdr->tag),
@@ -5917,7 +5917,7 @@ IssuePendingWritebacks(WritebackContext *wb_context, IOContext io_context)
 	sort_pending_writebacks(wb_context->pending_writebacks,
 							wb_context->nr_pending);
 
-	io_start = pgstat_prepare_io_time();
+	io_start = pgstat_prepare_io_time(track_io_timing);
 
 	/*
 	 * Coalesce neighbouring writes, but nothing else. For that we iterate
diff --git a/src/backend/storage/buffer/localbuf.c b/src/backend/storage/buffer/localbuf.c
index 78b65fb160ad..64931efaa756 100644
--- a/src/backend/storage/buffer/localbuf.c
+++ b/src/backend/storage/buffer/localbuf.c
@@ -244,7 +244,7 @@ GetLocalVictimBuffer(void)
 
 		PageSetChecksumInplace(localpage, bufHdr->tag.blockNum);
 
-		io_start = pgstat_prepare_io_time();
+		io_start = pgstat_prepare_io_time(track_io_timing);
 
 		/* And write... */
 		smgrwrite(oreln,
@@ -414,7 +414,7 @@ ExtendBufferedRelLocal(BufferManagerRelation bmr,
 		}
 	}
 
-	io_start = pgstat_prepare_io_time();
+	io_start = pgstat_prepare_io_time(track_io_timing);
 
 	/* actually extend relation */
 	smgrzeroextend(bmr.smgr, fork, first_block, extend_by, false);
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 298754d1b64a..7bf0b45e2c34 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -1381,7 +1381,7 @@ register_dirty_segment(SMgrRelation reln, ForkNumber forknum, MdfdVec *seg)
 		ereport(DEBUG1,
 				(errmsg_internal("could not forward fsync request because request queue is full")));
 
-		io_start = pgstat_prepare_io_time();
+		io_start = pgstat_prepare_io_time(track_io_timing);
 
 		if (FileSync(seg->mdfd_vfd, WAIT_EVENT_DATA_FILE_SYNC) < 0)
 			ereport(data_sync_elevel(ERROR),
@@ -1786,7 +1786,7 @@ mdsyncfiletag(const FileTag *ftag, char *path)
 		need_to_close = true;
 	}
 
-	io_start = pgstat_prepare_io_time();
+	io_start = pgstat_prepare_io_time(track_io_timing);
 
 	/* Sync the file. */
 	result = FileSync(file, WAIT_EVENT_DATA_FILE_SYNC);
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
index 4a667e7019cc..338da73a9a91 100644
--- a/src/backend/utils/activity/pgstat_backend.c
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -24,6 +24,7 @@
 
 #include "postgres.h"
 
+#include "access/xlog.h"
 #include "storage/bufmgr.h"
 #include "utils/memutils.h"
 #include "utils/pgstat_internal.h"
@@ -43,7 +44,7 @@ void
 pgstat_count_backend_io_op_time(IOObject io_object, IOContext io_context,
 								IOOp io_op, instr_time io_time)
 {
-	Assert(track_io_timing);
+	Assert(track_io_timing || track_wal_io_timing);
 
 	if (!pgstat_tracks_backend_bktype(MyBackendType))
 		return;
diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index e7362b52a37b..edb489f3a864 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -83,14 +83,15 @@ pgstat_count_io_op(IOObject io_object, IOContext io_context, IOOp io_op,
 }
 
 /*
- * Initialize the internal timing for an IO operation.
+ * Initialize the internal timing for an IO operation, depending on an
+ * IO timing GUC.
  */
 instr_time
-pgstat_prepare_io_time(void)
+pgstat_prepare_io_time(bool track_io_guc)
 {
 	instr_time	io_start;
 
-	if (track_io_timing)
+	if (track_io_guc)
 		INSTR_TIME_SET_CURRENT(io_start);
 	else
 	{
@@ -119,7 +120,7 @@ void
 pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op,
 						instr_time start_time, uint32 cnt, uint64 bytes)
 {
-	if (track_io_timing)
+	if (!INSTR_TIME_IS_ZERO(start_time))
 	{
 		instr_time	io_time;
 
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 690bf96ef030..ad25cbb39c55 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -1501,6 +1501,15 @@ struct config_bool ConfigureNamesBool[] =
 		false,
 		NULL, NULL, NULL
 	},
+	{
+		{"track_wal_io_timing", PGC_SUSET, STATS_CUMULATIVE,
+			gettext_noop("Collects timing statistics for WAL I/O activity."),
+			NULL
+		},
+		&track_wal_io_timing,
+		false,
+		NULL, NULL, NULL
+	},
 
 	{
 		{"update_process_title", PGC_SUSET, PROCESS_TITLE,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index e771d87da1fb..5362ff805195 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -641,6 +641,7 @@
 #track_counts = on
 #track_cost_delay_timing = off
 #track_io_timing = off
+#track_wal_io_timing = off
 #track_functions = none			# none, pl, all
 #stats_fetch_consistency = cache	# cache, none, snapshot
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index a83545761085..6186648ccaf5 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8393,6 +8393,31 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-track-wal-io-timing" xreflabel="track_wal_io_timing">
+      <term><varname>track_wal_io_timing</varname> (<type>boolean</type>)
+      <indexterm>
+       <primary><varname>track_wal_io_timing</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Enables timing of WAL I/O calls. This parameter is off by default,
+        as it will repeatedly query the operating system for the current time,
+        which may cause significant overhead on some platforms.
+        You can use the <application>pg_test_timing</application> tool to
+        measure the overhead of timing on your system.
+        I/O timing information is displayed in
+        <link linkend="monitoring-pg-stat-io-view">
+        <structname>pg_stat_io</structname></link> for the
+        <varname>object</varname> <literal>wal</literal> and in the output of
+        the <link linkend="pg-stat-get-backend-io">
+        <function>pg_stat_get_backend_io()</function></link> function.
+        Only superusers and users with the appropriate <literal>SET</literal>
+        privilege can change this setting.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-track-functions" xreflabel="track_functions">
       <term><varname>track_functions</varname> (<type>enum</type>)
       <indexterm>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3dfd059b7ee8..9178f1d34efd 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -183,6 +183,11 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
    of block read, write, extend, and fsync times.
   </para>
 
+  <para>
+   The parameter <xref linkend="guc-track-wal-io-timing"/> enables monitoring
+   of WAL read, write and fsync times.
+  </para>
+
   <para>
    Normally these parameters are set in <filename>postgresql.conf</filename> so
    that they apply to all server processes, but it is possible to turn
@@ -2723,7 +2728,11 @@ description | Waiting for a newly initialized WAL file to reach durable storage
        </para>
        <para>
         Time spent in read operations in milliseconds (if
-        <xref linkend="guc-track-io-timing"/> is enabled, otherwise zero)
+        <xref linkend="guc-track-io-timing"/> is enabled and
+        <varname>object</varname> is not <literal>wal</literal>,
+        or if <xref linkend="guc-track-wal-io-timing"/> is enabled
+        and <varname>object</varname> is <literal>wal</literal>,
+        otherwise zero)
        </para>
       </entry>
      </row>
@@ -2757,7 +2766,11 @@ description | Waiting for a newly initialized WAL file to reach durable storage
        </para>
        <para>
         Time spent in write operations in milliseconds (if
-        <xref linkend="guc-track-io-timing"/> is enabled, otherwise zero)
+        <xref linkend="guc-track-io-timing"/> is enabled and
+        <varname>object</varname> is not <literal>wal</literal>,
+        or if <xref linkend="guc-track-wal-io-timing"/> is enabled
+        and <varname>object</varname> is <literal>wal</literal>,
+        otherwise zero)
        </para>
       </entry>
      </row>
@@ -2816,8 +2829,12 @@ description | Waiting for a newly initialized WAL file to reach durable storage
         <structfield>extend_time</structfield> <type>double precision</type>
        </para>
        <para>
-        Time spent in extend operations in milliseconds (if
-        <xref linkend="guc-track-io-timing"/> is enabled, otherwise zero)
+        Time spent in extend operations in milliseconds. (if
+        <xref linkend="guc-track-io-timing"/> is enabled and
+        <varname>object</varname> is not <literal>wal</literal>,
+        or if <xref linkend="guc-track-wal-io-timing"/> is enabled
+        and <varname>object</varname> is <literal>wal</literal>,
+        otherwise zero)
        </para>
       </entry>
      </row>
@@ -2887,7 +2904,11 @@ description | Waiting for a newly initialized WAL file to reach durable storage
        </para>
        <para>
         Time spent in fsync operations in milliseconds (if
-        <xref linkend="guc-track-io-timing"/> is enabled, otherwise zero)
+        <xref linkend="guc-track-io-timing"/> is enabled and
+        <varname>object</varname> is not <literal>wal</literal>,
+        or if <xref linkend="guc-track-wal-io-timing"/> is enabled
+        and <varname>object</varname> is <literal>wal</literal>,
+        otherwise zero)
        </para>
       </entry>
      </row>
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 4fc09761115e..f3b86b26be90 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -808,7 +808,7 @@
   <para>
    There are two internal functions to write WAL data to disk:
    <function>XLogWrite</function> and <function>issue_xlog_fsync</function>.
-   When <xref linkend="guc-track-io-timing"/> is enabled, the total
+   When <xref linkend="guc-track-wal-io-timing"/> is enabled, the total
    amounts of time <function>XLogWrite</function> writes and
    <function>issue_xlog_fsync</function> syncs WAL data to disk are counted as
    <varname>write_time</varname> and <varname>fsync_time</varname> in
@@ -828,7 +828,7 @@
    <literal>fsync</literal>, or <literal>fsync_writethrough</literal>,
    the write operation moves WAL buffers to kernel cache and
    <function>issue_xlog_fsync</function> syncs them to disk. Regardless
-   of the setting of <varname>track_io_timing</varname>, the number
+   of the setting of <varname>track_wal_io_timing</varname>, the number
    of times <function>XLogWrite</function> writes and
    <function>issue_xlog_fsync</function> syncs WAL data to disk are also
    counted as <varname>writes</varname> and <varname>fsyncs</varname>
-- 
2.47.2

#14Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#12)
1 attachment(s)
Re: Remove wal_[sync|write][_time] from pg_stat_wal and track_wal_io_timing

Hi,

On Tue, Feb 25, 2025 at 02:20:48PM +0900, Michael Paquier wrote:

On Mon, Feb 24, 2025 at 01:08:03PM +0000, Bertrand Drouvot wrote:

I agree that we've to put everything in the picture (system with or without
cheap timing functions, lock contention and WAL flush disk time) and that we
can certainly find configuration/workload that would get benefits from a
dedicated track_wal_io_timing GUC.

PFA a patch to re-introduce the track_wal_io_timing GUC and to ensure that the
WAL write and fsync timing activities are tracked in pg_stat_io (and
pg_stat_get_backend_io()) only if both track_io_timing and track_wal_io_timing
are enabled.

Note that to change the a051e71e28a behavior, the attached patch adds an extra
"track_io_guc" parameter to pgstat_count_io_op_time() (like pgstat_prepare_io_time
already had in a051e71e28a).

+      bool      track_timing = track_io_timing && track_wal_io_timing;
-      start = pgstat_prepare_io_time();
+      start = pgstat_prepare_io_time(track_timing);

This does not represent exactly what I am understanding from the
comment of upthread, because WAL IO timings would require both
track_wal_io_timing and track_io_timing to be enabled with this patch.
It seems to me that what we should do is to decouple completely the
timings for WAL and non-WAL IO across the two GUCs track_wal_io_timing
and track_io_timing, without any dependency between one and the other.
This way, it is possible to only enable one of them without affecting
the paths of the other, or both if you are ready to pay the price.

The idea was to not let track_io_timing alone enable the timing in the WAL
code path. My reasoning was: if you want to see timing in pg_stat_io then you
need to enable track_io_timing. But that's not enough if you also want to see
WAL timing, then you also need to set track_wal_io_timing. Your proposal also
ensures that "track_io_timing alone can not enable the timing in the WAL code path",
with a clear separation of duties, it's probably better so I'm fine with it.

Two new things tracked in pg_stat_io are WALRead() and XLogPageRead(),
which are used at recovery, and for consistency with the rest there is
a good argument for controlling these as well with
track_wal_io_timing, I guess.

Yeah, I though about it too but decided to not change the READ part in v1 (because
I think they are less of a concern). OTOH, if you want to see the READ timing then
you need to set track_wal_io_timing but then once the recovery is over then you'll
need to disable track_wal_io_timing (if you don't want to pay the price for
write/fsync activities).

OTOH pre a051e71e28a track_wal_io_timing was impacting only the write/fsyncs,
in that regard v1 was close to that.

I think both idea (v1 / v2) have pros and cons and I don't have a strong opinion
on it (though I do prefer v1 a bit for the reasons mentioned above).

void
pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op,
-                        instr_time start_time, uint32 cnt, uint64 bytes)
+                        instr_time start_time, uint32 cnt, uint64 bytes,
+                        bool track_io_guc)

Not much a fan of the new argument to pass the GUC value, which could
be error prone. It would be simpler to check that start_time is 0
instead. There is no need to change the other callers of
pgstat_count_io_op_time() if we do that.

Yeah I thought about it too

-       if (track_io_guc)
+       if (!INSTR_TIME_IS_ZERO(start_time))

INSTR_TIME_IS_ZERO is cheap so it is fine by me.

pgstat_count_backend_io_op_time() would have triggered an assertion
failure, it needs to be adjusted.

With v2 in place, yeah.

With more tweaks applied to the docs, the attached is my result.

In the track_io_timing GUC description Shouldn't we also mention the
wal object restriction, something like?

"
in pg_stat_database, pg_stat_io (if object is not wal), in the output of the
pg_stat_get_backend_io() function (if object is not wal)
"

Also in the track_wal_io_timing GUC description add the wal object restriction
for the function:

"
and in the output of the pg_stat_get_backend_io() function for the object wal
"

The proposed doc changes are in the .txt attached (that applies on top of v2).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

doc_update.txttext/plain; charset=us-asciiDownload
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 6186648ccaf..e55700f35b8 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8378,9 +8378,11 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
         displayed in <link linkend="monitoring-pg-stat-database-view">
         <structname>pg_stat_database</structname></link>,
         <link linkend="monitoring-pg-stat-io-view">
-        <structname>pg_stat_io</structname></link>, in the output of the
+        <structname>pg_stat_io</structname></link> (if <varname>object</varname>
+        is not <literal>wal</literal>), in the output of the
         <link linkend="pg-stat-get-backend-io">
-        <function>pg_stat_get_backend_io()</function></link> function, in the
+        <function>pg_stat_get_backend_io()</function></link> function (if
+        <varname>object</varname> is not <literal>wal</literal>), in the
         output of <xref linkend="sql-explain"/> when the <literal>BUFFERS</literal>
         option is used, in the output of <xref linkend="sql-vacuum"/> when
         the <literal>VERBOSE</literal> option is used, by autovacuum
@@ -8411,7 +8413,8 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
         <structname>pg_stat_io</structname></link> for the
         <varname>object</varname> <literal>wal</literal> and in the output of
         the <link linkend="pg-stat-get-backend-io">
-        <function>pg_stat_get_backend_io()</function></link> function.
+        <function>pg_stat_get_backend_io()</function></link> function for the
+        <varname>object</varname> <literal>wal</literal>.
         Only superusers and users with the appropriate <literal>SET</literal>
         privilege can change this setting.
        </para>
#15Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#14)
Re: Remove wal_[sync|write][_time] from pg_stat_wal and track_wal_io_timing

On Tue, Feb 25, 2025 at 08:12:53AM +0000, Bertrand Drouvot wrote:

The idea was to not let track_io_timing alone enable the timing in the WAL
code path. My reasoning was: if you want to see timing in pg_stat_io then you
need to enable track_io_timing. But that's not enough if you also want to see
WAL timing, then you also need to set track_wal_io_timing. Your proposal also
ensures that "track_io_timing alone can not enable the timing in the WAL code path",
with a clear separation of duties, it's probably better so I'm fine with it.

One problem with this approach is that you would need to always pay
the price of the timings under track_io_timing if the WAL part is
enabled, so you'd lose flexibility compared to the past configuration.
Based on the complaint of upthread, a split makes things more
flexible, as it is possible to monitor one or the other or both. I'm
OK to tweak things more as required, we still have time for that.

Yeah, I though about it too but decided to not change the READ part in v1 (because
I think they are less of a concern). OTOH, if you want to see the READ timing then
you need to set track_wal_io_timing but then once the recovery is over then you'll
need to disable track_wal_io_timing (if you don't want to pay the price for
write/fsync activities).

OTOH pre a051e71e28a track_wal_io_timing was impacting only the write/fsyncs,
in that regard v1 was close to that.

Still less consistent.. We've never tracked WAL reads in the stats up
to now, so we need to put it in one of these buckets.

In the track_io_timing GUC description Shouldn't we also mention the
wal object restriction, something like?

"
in pg_stat_database, pg_stat_io (if object is not wal), in the output of the
pg_stat_get_backend_io() function (if object is not wal)
"

The proposed doc changes are in the .txt attached (that applies on top of v2).

Sounds like a good set of additions as we have the two GUCs.

Bonus idea: We could also have more GUCs to control all that, or just
put eveything in one single GUC that takes a list of values made of
pairs of (object,op), then set the timings only for the combinations
listed in the GUC. That feels a bit over-engineered, but you could
deeply control the areas where the data is aggregated. I'm not really
convinced it is strictly necessary, but, well, that would not be the
first time people tell me I'm wrong.. All things said, v2 sounds like
it has the right balance for now based on what I am reading, providing
the same control as pg_stat_wal previously, so I've used it for now.
--
Michael