New statistics for tuning WAL buffer size

Started by Masahiro Ikedaover 5 years ago57 messages

ikedamsh@oss.nttdata.com

over 5 years ago

Hi,

It's important to provide the metrics for tuning the size of WAL
buffers.
For now, it's lack of the statistics how often processes wait to write
WAL because WAL buffer is full.

If those situation are often occurred, WAL buffer is too small for the
workload.
DBAs must to tune the WAL buffer size for performance improvement.

There are related threads, but those are not merged.
/messages/by-id/4FF824F3.5090407@uptime.jp
/messages/by-id/CAJrrPGc6APFUGYNcPe4qcNxpL8gXKYv1KST+vwJcFtCSCEySnA@mail.gmail.com

What do you think?
If we can have a consensus, I will make a PoC patch.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

tsunakawa.takay@fujitsu.com

over 5 years ago

In reply to: Masahiro Ikeda (#1)

RE: New statistics for tuning WAL buffer size

From: Masahiro Ikeda <ikedamsh@oss.nttdata.com>

It's important to provide the metrics for tuning the size of WAL buffers.
For now, it's lack of the statistics how often processes wait to write WAL
because WAL buffer is full.

If those situation are often occurred, WAL buffer is too small for the workload.
DBAs must to tune the WAL buffer size for performance improvement.

Yes, it's helpful to know if we need to enlarge the WAL buffer. That's why our colleague HariBabu proposed the patch. We'd be happy if it could be committed in some form.

There are related threads, but those are not merged.
/messages/by-id/4FF824F3.5090407@uptime.jp
/messages/by-id/CAJrrPGc6APFUGYNcPe4qcNx
pL8gXKYv1KST%2BvwJcFtCSCEySnA%40mail.gmail.com

What's the difference between those patches? What blocked them from being committed?

Regards
Takayuki Tsunakawa

Masahiro Ikeda

ikedamsh@oss.nttdata.com

over 5 years ago

In reply to: tsunakawa.takay@fujitsu.com (#2)

RE: New statistics for tuning WAL buffer size

On 2020-08-18 16:35, tsunakawa.takay@fujitsu.com wrote:

From: Masahiro Ikeda <ikedamsh@oss.nttdata.com>

It's important to provide the metrics for tuning the size of WAL
buffers.
For now, it's lack of the statistics how often processes wait to write
WAL
because WAL buffer is full.

If those situation are often occurred, WAL buffer is too small for the
workload.
DBAs must to tune the WAL buffer size for performance improvement.

Yes, it's helpful to know if we need to enlarge the WAL buffer.
That's why our colleague HariBabu proposed the patch. We'd be happy
if it could be committed in some form.

There are related threads, but those are not merged.
/messages/by-id/4FF824F3.5090407@uptime.jp
/messages/by-id/CAJrrPGc6APFUGYNcPe4qcNx
pL8gXKYv1KST%2BvwJcFtCSCEySnA%40mail.gmail.com

What's the difference between those patches? What blocked them from
being committed?

Thanks for replying.

Since the above threads are not active now and those patches can't be
applied HEAD,
I made this thread. If it is better to reply the above thread, I will do
so.

If my understanding is correct, we have to measure the performance
impact first.
Do you know HariBabu is now trying to solve it? If not, I will try to
modify patches to apply HEAD.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

tsunakawa.takay@fujitsu.com

over 5 years ago

In reply to: Masahiro Ikeda (#3)

RE: New statistics for tuning WAL buffer size

From: Masahiro Ikeda <ikedamsh@oss.nttdata.com>

If my understanding is correct, we have to measure the performance
impact first.
Do you know HariBabu is now trying to solve it? If not, I will try to
modify patches to apply HEAD.

No, he's not doing it anymore. It'd be great if you could resume it. However, I recommend sharing your understanding about what were the issues with those two threads and how you're trying to solve them. Was the performance overhead the blocker in both of the threads?

Regards
Takayuki Tsunakawa

Masahiro Ikeda

ikedamsh@oss.nttdata.com

over 5 years ago

In reply to: tsunakawa.takay@fujitsu.com (#4)

RE: New statistics for tuning WAL buffer size

On 2020-08-19 13:49, tsunakawa.takay@fujitsu.com wrote:

From: Masahiro Ikeda <ikedamsh@oss.nttdata.com>

If my understanding is correct, we have to measure the performance
impact first.
Do you know HariBabu is now trying to solve it? If not, I will try to
modify patches to apply HEAD.

No, he's not doing it anymore. It'd be great if you could resume it.

OK, thanks.

However, I recommend sharing your understanding about what were the
issues with those two threads and how you're trying to solve them.
Was the performance overhead the blocker in both of the threads?

In my understanding, some comments are not solved in both of the
threads.
I think the following works are remained.

1) Modify patches to apply HEAD
2) Get consensus what metrics we collect and how to use them for tuning.
3) Measure performance impact and if it leads poor performance, we solve
it.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

Fujii Masao

masao.fujii@oss.nttdata.com

over 5 years ago

In reply to: Masahiro Ikeda (#5)

Re: New statistics for tuning WAL buffer size

On 2020/08/19 14:10, Masahiro Ikeda wrote:

On 2020-08-19 13:49, tsunakawa.takay@fujitsu.com wrote:

From: Masahiro Ikeda <ikedamsh@oss.nttdata.com>

If my understanding is correct, we have to measure the performance
impact first.
Do you know HariBabu is now trying to solve it? If not, I will try to
modify patches to apply HEAD.

No, he's not doing it anymore. It'd be great if you could resume it.

OK, thanks.

However, I recommend sharing your understanding about what were the
issues with those two threads and how you're trying to solve them.
Was the performance overhead the blocker in both of the threads?

In my understanding, some comments are not solved in both of the threads.
I think the following works are remained.

1) Modify patches to apply HEAD
2) Get consensus what metrics we collect and how to use them for tuning.

I agree to expose the number of WAL write caused by full of WAL buffers.
It's helpful when tuning wal_buffers size. Haribabu separated that number
into two fields in his patch; one is the number of WAL write by backend,
and another is by background processes and workers. But I'm not sure
how useful such separation is. I'm ok with just one field for that number.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Fujii Masao

masao.fujii@oss.nttdata.com

over 5 years ago

In reply to: Fujii Masao (#6)

Re: New statistics for tuning WAL buffer size

On 2020/08/20 20:01, Fujii Masao wrote:

On 2020/08/19 14:10, Masahiro Ikeda wrote:

On 2020-08-19 13:49, tsunakawa.takay@fujitsu.com wrote:

From: Masahiro Ikeda <ikedamsh@oss.nttdata.com>

If my understanding is correct, we have to measure the performance
impact first.
Do you know HariBabu is now trying to solve it? If not, I will try to
modify patches to apply HEAD.

No, he's not doing it anymore. It'd be great if you could resume it.

OK, thanks.

However, I recommend sharing your understanding about what were the
issues with those two threads and how you're trying to solve them.
Was the performance overhead the blocker in both of the threads?

In my understanding, some comments are not solved in both of the threads.
I think the following works are remained.

1) Modify patches to apply HEAD
2) Get consensus what metrics we collect and how to use them for tuning.

I agree to expose the number of WAL write caused by full of WAL buffers.
It's helpful when tuning wal_buffers size. Haribabu separated that number
into two fields in his patch; one is the number of WAL write by backend,
and another is by background processes and workers. But I'm not sure
how useful such separation is. I'm ok with just one field for that number.

Just idea; it may be worth exposing the number of when new WAL file is
created and zero-filled. This initialization may have impact on
the performance of write-heavy workload generating lots of WAL. If this
number is reported high, to reduce the number of this initialization,
we can tune WAL-related parameters so that more "recycled" WAL files
can be hold.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

tsunakawa.takay@fujitsu.com

over 5 years ago

In reply to: Fujii Masao (#6)

RE: New statistics for tuning WAL buffer size

From: Fujii Masao <masao.fujii@oss.nttdata.com>

I agree to expose the number of WAL write caused by full of WAL buffers.
It's helpful when tuning wal_buffers size. Haribabu separated that number
into two fields in his patch; one is the number of WAL write by backend,
and another is by background processes and workers. But I'm not sure
how useful such separation is. I'm ok with just one field for that number.

I agree with you. I don't think we need to separate the numbers for foreground processes and background ones. WAL buffer is a single resource. So "Writes due to full WAL buffer are happening. We may be able to boost performance by increasing wal_buffers" would be enough.

Regards
Takayuki Tsunakawa

tsunakawa.takay@fujitsu.com

over 5 years ago

In reply to: Fujii Masao (#7)

RE: New statistics for tuning WAL buffer size

From: Fujii Masao <masao.fujii@oss.nttdata.com>

Just idea; it may be worth exposing the number of when new WAL file is
created and zero-filled. This initialization may have impact on
the performance of write-heavy workload generating lots of WAL. If this
number is reported high, to reduce the number of this initialization,
we can tune WAL-related parameters so that more "recycled" WAL files
can be hold.

Sounds good. Actually, I want to know how much those zeroing affected the transaction response times, but it may be the target of the wait event statistics that Imai-san is addressing.

(I wonder how the fallocate() patch went that tries to minimize the zeroing time.)

Regards
Takayuki Tsunakawa

#10

Fujii Masao

masao.fujii@oss.nttdata.com

over 5 years ago

In reply to: tsunakawa.takay@fujitsu.com (#9)

Re: New statistics for tuning WAL buffer size

On 2020/08/21 12:08, tsunakawa.takay@fujitsu.com wrote:

From: Fujii Masao <masao.fujii@oss.nttdata.com>

Just idea; it may be worth exposing the number of when new WAL file is
created and zero-filled. This initialization may have impact on
the performance of write-heavy workload generating lots of WAL. If this
number is reported high, to reduce the number of this initialization,
we can tune WAL-related parameters so that more "recycled" WAL files
can be hold.

Sounds good. Actually, I want to know how much those zeroing affected the transaction response times, but it may be the target of the wait event statistics that Imai-san is addressing.

Maybe, so I'm ok if the first pg_stat_walwriter patch doesn't expose
this number. We can extend it to include that later after we confirm
that number is really useful.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#11

Masahiro Ikeda

ikedamsh@oss.nttdata.com

over 5 years ago

In reply to: Fujii Masao (#10)

1 attachment(s)

Re: New statistics for tuning WAL buffer size

Hi, thanks for useful comments.

I agree to expose the number of WAL write caused by full of WAL
buffers.
It's helpful when tuning wal_buffers size. Haribabu separated that
number
into two fields in his patch; one is the number of WAL write by
backend,
and another is by background processes and workers. But I'm not sure
how useful such separation is. I'm ok with just one field for that
number.

I agree with you. I don't think we need to separate the numbers for
foreground processes and background ones. WAL buffer is a single
resource. So "Writes due to full WAL buffer are happening. We may be
able to boost performance by increasing wal_buffers" would be enough.

I made a patch to expose the number of WAL write caused by full of WAL
buffers.
I'm going to submit this patch to commitfests.

As Fujii-san and Tsunakawa-san said, it expose the total number
since I agreed that we don't need to separate the numbers for
foreground processes and background ones.

By the way, do we need to add another metrics related to WAL?
For example, is the total number of WAL writes to the buffers useful to
calculate the dirty WAL write ratio?

Is it enough as a first step?

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

Attachments:

0001_pg_stat_walwrites_view.patchtext/x-diff; name=0001_pg_stat_walwrites_view.patchDownload

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 7dcddf478a..d49e539da3 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -424,6 +424,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
      </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_walwrites</structname><indexterm><primary>pg_stat_walwrites</primary></indexterm></entry>
+      <entry>One row only, showing statistics about the
+       WAL writing activity. See
+       <xref linkend="monitoring-pg-stat-walwrites-view"/> for details.
+      </entry>
+     </row>
+
      <row>
       <entry><structname>pg_stat_database</structname><indexterm><primary>pg_stat_database</primary></indexterm></entry>
       <entry>One row per database, showing database-wide statistics. See
@@ -3244,6 +3252,48 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
 
  </sect2>
 
+ <sect2 id="monitoring-pg-stat-walwrites-view">
+   <title><structname>pg_stat_walwrites</structname></title>
+
+  <indexterm>
+   <primary>pg_stat_walwrites</primary>
+  </indexterm>
+
+  <para>
+   The <structname>pg_stat_walwrites</structname> view will always have a
+   single row, containing data about the WAL writing activity of the cluster.
+  </para>
+
+  <table id="pg-stat-walwrites-view" xreflabel="pg_stat_walwrites">
+   <title><structname>pg_stat_walwrites</structname> View</title>
+   <tgroup cols="1">
+    <thead>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       Column Type
+      </para>
+      <para>
+       Description
+      </para></entry>
+     </row>
+    </thead>
+
+    <tbody>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>dirty_writes</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of dirty WAL writes that are carried out by background processes and workers
+       when the <xref linkend="guc-wal-buffers"/> are full.
+      </para></entry>
+     </row>
+     </tbody>
+   </tgroup>
+  </table>
+
+</sect2>
+
  <sect2 id="monitoring-pg-stat-database-view">
   <title><structname>pg_stat_database</structname></title>
 
@@ -4632,8 +4682,9 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
         argument.  The argument can be <literal>bgwriter</literal> to reset
         all the counters shown in
         the <structname>pg_stat_bgwriter</structname>
-        view,or <literal>archiver</literal> to reset all the counters shown in
-        the <structname>pg_stat_archiver</structname> view.
+        view, <literal>archiver</literal> to reset all the counters shown in
+        the <structname>pg_stat_archiver</structname> view ,or <literal>walwrites</literal>
+        to reset all the counters shown in the <structname>pg_stat_walwrites</structname> view.
        </para>
        <para>
         This function is restricted to superusers by default, but other users
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 756b838e6a..66abd200d1 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2193,6 +2193,7 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, bool opportunistic)
 					WriteRqst.Write = OldPageRqstPtr;
 					WriteRqst.Flush = 0;
 					XLogWrite(WriteRqst, false);
+					WALWriteStats->dirty_writes++;
 					LWLockRelease(WALWriteLock);
 					TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY_DONE();
 				}
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 8625cbeab6..cfc6a13b6a 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -994,6 +994,9 @@ CREATE VIEW pg_stat_progress_analyze AS
     FROM pg_stat_get_progress_info('ANALYZE') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
+CREATE VIEW pg_stat_walwrites AS
+   SELECT * FROM pg_stat_get_walwrites() AS A;
+
 CREATE VIEW pg_stat_progress_vacuum AS
     SELECT
         S.pid AS pid, S.datid AS datid, D.datname AS datname,
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 15f92b66c6..17feecd1a5 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -167,6 +167,15 @@ static const char *const slru_names[] = {
  */
 static PgStat_MsgSLRU SLRUStats[SLRU_NUM_ELEMENTS];
 
+/*
+ * WalWrites Local statistics counters.
+ * The statistics data gets populated in XLogWrite function.
+ * Stored directly in a stats message structure so it can be sent
+ * to stats collector process without needing to copy things around.
+ * We assume this inits to zeroes.
+ */
+PgStat_WalWritesStats *WALWriteStats;
+
 /* ----------
  * Local data
  * ----------
@@ -657,6 +666,29 @@ startup_failed:
 	SetConfigOption("track_counts", "off", PGC_INTERNAL, PGC_S_OVERRIDE);
 }
 
+/*
+ * Initialization of shared memory for WALWritesStats
+ */
+Size
+WALWritesShmemSize(void)
+{
+	return sizeof(PgStat_WalWritesStats);
+}
+
+void
+WALWritesShmemInit(void)
+{
+	bool		foundWALWrites;
+
+	WALWriteStats = (PgStat_WalWritesStats *)
+		ShmemInitStruct("WAL WriteStats", WALWritesShmemSize(), &foundWALWrites);
+
+	if (!foundWALWrites)
+	{
+		MemSet(WALWriteStats, 0, sizeof(PgStat_WalWritesStats));
+	}
+}
+
 /*
  * subroutine for pgstat_reset_all
  */
@@ -1370,11 +1402,25 @@ pgstat_reset_shared_counters(const char *target)
 		msg.m_resettarget = RESET_ARCHIVER;
 	else if (strcmp(target, "bgwriter") == 0)
 		msg.m_resettarget = RESET_BGWRITER;
+	else if (strcmp(target, "walwrites") == 0)
+	{
+		/*
+		 * Reset the wal writes statistics of the cluster. These statistics
+		 * are not reset by the stats collector because these are resides in a
+		 * shared memory, so it is not possible for the stats collector to
+		 * reset them. FIXME: This may need a sepearate function entirely to
+		 * reset the stats.
+		 */
+		LWLockAcquire(WALWriteLock, LW_EXCLUSIVE);
+		memset(WALWriteStats, 0, sizeof(PgStat_WalWritesStats));
+		WALWriteStats->stat_reset_timestamp = GetCurrentTimestamp();
+		LWLockRelease(WALWriteLock);
+	}
 	else
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("unrecognized reset target: \"%s\"", target),
-				 errhint("Target must be \"archiver\" or \"bgwriter\".")));
+				 errhint("Target must be \"archiver\" or \"bgwriter\" or \"walwrites\".")));
 
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_RESETSHAREDCOUNTER);
 	pgstat_send(&msg, sizeof(msg));
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 96c2aaabbd..a49760ee24 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -149,6 +149,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, WALWritesShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -216,6 +217,7 @@ CreateSharedMemoryAndSemaphores(void)
 	 * Set up xlog, clog, and buffers
 	 */
 	XLOGShmemInit();
+	WALWritesShmemInit();
 	CLOGShmemInit();
 	CommitTsShmemInit();
 	SUBTRANSShmemInit();
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 95738a4e34..12d762ef6a 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -2098,3 +2098,38 @@ pg_stat_get_archiver(PG_FUNCTION_ARGS)
 	/* Returns the record as Datum */
 	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
 }
+
+Datum
+pg_stat_get_walwrites(PG_FUNCTION_ARGS)
+{
+	TupleDesc	tupdesc;
+	TimestampTz result;
+#define NUM_PG_STAT_WALWRITE_COLS 2
+	Datum		values[NUM_PG_STAT_WALWRITE_COLS];
+	bool		nulls[NUM_PG_STAT_WALWRITE_COLS];
+
+	/* Initialize values and NULL flags arrays */
+	MemSet(values, 0, sizeof(values));
+	MemSet(nulls, 0, sizeof(nulls));
+
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	LWLockAcquire(WALWriteLock, LW_EXCLUSIVE);
+
+	/* Get statistics about the archiver process */
+	/* Fill values and NULLs */
+	values[0] = Int64GetDatum(WALWriteStats->dirty_writes);
+
+	result = TimestampTzGetDatum(WALWriteStats->stat_reset_timestamp);
+	if (result == 0)
+		nulls[1] = true;
+	else
+		values[1] = result;
+
+	LWLockRelease(WALWriteLock);
+	/* Returns the record as Datum */
+	PG_RETURN_DATUM(HeapTupleGetDatum(
+									  heap_form_tuple(tupdesc, values, nulls)));
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 082a11f270..f87bfdcfa9 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5484,6 +5484,14 @@
   proargnames => '{name,blks_zeroed,blks_hit,blks_read,blks_written,blks_exists,flushes,truncates,stats_reset}',
   prosrc => 'pg_stat_get_slru' },
 
+{ oid => '8000', descr => 'statistics: information about WAL writes activity',
+  proname => 'pg_stat_get_walwrites', provolatile => 's', proparallel => 'r',
+  prorettype => 'record', proargtypes => '',
+  proallargtypes => '{int8,timestamptz}',
+  proargmodes => '{o,o}',
+  proargnames => '{dirty_writes,stats_reset}',
+  prosrc => 'pg_stat_get_walwrites' },
+
 { oid => '2978', descr => 'statistics: number of function calls',
   proname => 'pg_stat_get_function_calls', provolatile => 's',
   proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 1387201382..94e4da6fd7 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -745,6 +745,19 @@ typedef struct PgStat_GlobalStats
 	TimestampTz stat_reset_timestamp;
 } PgStat_GlobalStats;
 
+/*
+ * Walwrites statistics kept in the stats collector
+ */
+typedef struct PgStat_WalWritesStats
+{
+	PgStat_Counter dirty_writes;	/* number of WAL write caused by full of WAL buffers */
+	TimestampTz stat_reset_timestamp;	/* last time when the stats reset */
+} PgStat_WalWritesStats;
+
+/* ----------
+ * Backend types
+ * ----------
+
 /*
  * SLRU statistics kept in the stats collector
  */
@@ -1260,6 +1273,11 @@ extern char *pgstat_stat_filename;
  */
 extern PgStat_MsgBgWriter BgWriterStats;
 
+/*
+ * WAL writes statistics updated in XLogWrite function
+ */
+extern PgStat_WalWritesStats * WALWriteStats;
+
 /*
  * Updated by pgstat_count_buffer_*_time macros
  */
@@ -1278,6 +1296,9 @@ extern int	pgstat_start(void);
 extern void pgstat_reset_all(void);
 extern void allow_immediate_pgstat_restart(void);
 
+extern Size WALWritesShmemSize(void);
+extern void WALWritesShmemInit(void);
+
 #ifdef EXEC_BACKEND
 extern void PgstatCollectorMain(int argc, char *argv[]) pg_attribute_noreturn();
 #endif
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 601734a6f1..3457cf2904 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2136,6 +2136,9 @@ pg_stat_wal_receiver| SELECT s.pid,
     s.conninfo
    FROM pg_stat_get_wal_receiver() s(pid, status, receive_start_lsn, receive_start_tli, written_lsn, flushed_lsn, received_tli, last_msg_send_time, last_msg_receipt_time, latest_end_lsn, latest_end_time, slot_name, sender_host, sender_port, conninfo)
   WHERE (s.pid IS NOT NULL);
+pg_stat_walwrites| SELECT a.dirty_writes,
+    a.stats_reset
+   FROM pg_stat_get_walwrites() a(dirty_writes, stats_reset);
 pg_stat_xact_all_tables| SELECT c.oid AS relid,
     n.nspname AS schemaname,
     c.relname,
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 06c4c3e476..3c04c57023 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -67,6 +67,13 @@ select count(*) >= 0 as ok from pg_prepared_xacts;
  t
 (1 row)
 
+-- There will surely and maximum one record
+select count(*) = 1 as ok from pg_stat_walwrites;
+ ok 
+----
+ t
+(1 row)
+
 -- This is to record the prevailing planner enable_foo settings during
 -- a regression test run.
 select name, setting from pg_settings where name like 'enable%';
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index 28e412b735..21f49c9a3b 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -32,6 +32,9 @@ select count(*) = 0 as ok from pg_prepared_statements;
 -- See also prepared_xacts.sql
 select count(*) >= 0 as ok from pg_prepared_xacts;
 
+-- There will surely and maximum one record
+select count(*) = 1 as ok from pg_stat_walwrites;
+
 -- This is to record the prevailing planner enable_foo settings during
 -- a regression test run.
 select name, setting from pg_settings where name like 'enable%';

#12

Masahiro Ikeda

ikedamsh@oss.nttdata.com

over 5 years ago

In reply to: Masahiro Ikeda (#11)

1 attachment(s)

Re: New statistics for tuning WAL buffer size

On 2020-08-24 20:45, Masahiro Ikeda wrote:

Hi, thanks for useful comments.

I agree to expose the number of WAL write caused by full of WAL
buffers.
It's helpful when tuning wal_buffers size. Haribabu separated that
number
into two fields in his patch; one is the number of WAL write by
backend,
and another is by background processes and workers. But I'm not sure
how useful such separation is. I'm ok with just one field for that
number.

I agree with you. I don't think we need to separate the numbers for
foreground processes and background ones. WAL buffer is a single
resource. So "Writes due to full WAL buffer are happening. We may be
able to boost performance by increasing wal_buffers" would be enough.

I made a patch to expose the number of WAL write caused by full of WAL
buffers.
I'm going to submit this patch to commitfests.

As Fujii-san and Tsunakawa-san said, it expose the total number
since I agreed that we don't need to separate the numbers for
foreground processes and background ones.

By the way, do we need to add another metrics related to WAL?
For example, is the total number of WAL writes to the buffers useful
to calculate the dirty WAL write ratio?

Is it enough as a first step?

I forgot to rebase the current master.
I've attached the rebased patch.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

Attachments:

0002_pg_stat_walwrites_view.patchtext/x-diff; name=0002_pg_stat_walwrites_view.patchDownload

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 0f11375c85..8507f1b7e1 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -424,6 +424,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
      </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_walwrites</structname><indexterm><primary>pg_stat_walwrites</primary></indexterm></entry>
+      <entry>One row only, showing statistics about the
+       WAL writing activity. See
+       <xref linkend="monitoring-pg-stat-walwrites-view"/> for details.
+      </entry>
+     </row>
+
      <row>
       <entry><structname>pg_stat_database</structname><indexterm><primary>pg_stat_database</primary></indexterm></entry>
       <entry>One row per database, showing database-wide statistics. See
@@ -3260,6 +3268,48 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
 
  </sect2>
 
+ <sect2 id="monitoring-pg-stat-walwrites-view">
+   <title><structname>pg_stat_walwrites</structname></title>
+
+  <indexterm>
+   <primary>pg_stat_walwrites</primary>
+  </indexterm>
+
+  <para>
+   The <structname>pg_stat_walwrites</structname> view will always have a
+   single row, containing data about the WAL writing activity of the cluster.
+  </para>
+
+  <table id="pg-stat-walwrites-view" xreflabel="pg_stat_walwrites">
+   <title><structname>pg_stat_walwrites</structname> View</title>
+   <tgroup cols="1">
+    <thead>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       Column Type
+      </para>
+      <para>
+       Description
+      </para></entry>
+     </row>
+    </thead>
+
+    <tbody>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>dirty_writes</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of dirty WAL writes that are carried out by background processes and workers
+       when the <xref linkend="guc-wal-buffers"/> are full.
+      </para></entry>
+     </row>
+     </tbody>
+   </tgroup>
+  </table>
+
+</sect2>
+
  <sect2 id="monitoring-pg-stat-database-view">
   <title><structname>pg_stat_database</structname></title>
 
@@ -4648,8 +4698,9 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
         argument.  The argument can be <literal>bgwriter</literal> to reset
         all the counters shown in
         the <structname>pg_stat_bgwriter</structname>
-        view, or <literal>archiver</literal> to reset all the counters shown in
-        the <structname>pg_stat_archiver</structname> view.
+        view, <literal>archiver</literal> to reset all the counters shown in
+        the <structname>pg_stat_archiver</structname> view ,or <literal>walwrites</literal>
+        to reset all the counters shown in the <structname>pg_stat_walwrites</structname> view.
        </para>
        <para>
         This function is restricted to superusers by default, but other users
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 09c01ed4ae..450870f89a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2193,6 +2193,7 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, bool opportunistic)
 					WriteRqst.Write = OldPageRqstPtr;
 					WriteRqst.Flush = 0;
 					XLogWrite(WriteRqst, false);
+					WALWriteStats->dirty_writes++;
 					LWLockRelease(WALWriteLock);
 					TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY_DONE();
 				}
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ba5a23ac25..ca7fa7eb66 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -997,6 +997,9 @@ CREATE VIEW pg_stat_progress_analyze AS
     FROM pg_stat_get_progress_info('ANALYZE') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
+CREATE VIEW pg_stat_walwrites AS
+   SELECT * FROM pg_stat_get_walwrites() AS A;
+
 CREATE VIEW pg_stat_progress_vacuum AS
     SELECT
         S.pid AS pid, S.datid AS datid, D.datname AS datname,
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 73ce944fb1..56c90329a8 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -167,6 +167,15 @@ static const char *const slru_names[] = {
  */
 static PgStat_MsgSLRU SLRUStats[SLRU_NUM_ELEMENTS];
 
+/*
+ * WalWrites Local statistics counters.
+ * The statistics data gets populated in XLogWrite function.
+ * Stored directly in a stats message structure so it can be sent
+ * to stats collector process without needing to copy things around.
+ * We assume this inits to zeroes.
+ */
+PgStat_WalWritesStats *WALWriteStats;
+
 /* ----------
  * Local data
  * ----------
@@ -657,6 +666,29 @@ startup_failed:
 	SetConfigOption("track_counts", "off", PGC_INTERNAL, PGC_S_OVERRIDE);
 }
 
+/*
+ * Initialization of shared memory for WALWritesStats
+ */
+Size
+WALWritesShmemSize(void)
+{
+	return sizeof(PgStat_WalWritesStats);
+}
+
+void
+WALWritesShmemInit(void)
+{
+	bool		foundWALWrites;
+
+	WALWriteStats = (PgStat_WalWritesStats *)
+		ShmemInitStruct("WAL WriteStats", WALWritesShmemSize(), &foundWALWrites);
+
+	if (!foundWALWrites)
+	{
+		MemSet(WALWriteStats, 0, sizeof(PgStat_WalWritesStats));
+	}
+}
+
 /*
  * subroutine for pgstat_reset_all
  */
@@ -1370,11 +1402,25 @@ pgstat_reset_shared_counters(const char *target)
 		msg.m_resettarget = RESET_ARCHIVER;
 	else if (strcmp(target, "bgwriter") == 0)
 		msg.m_resettarget = RESET_BGWRITER;
+	else if (strcmp(target, "walwrites") == 0)
+	{
+		/*
+		 * Reset the wal writes statistics of the cluster. These statistics
+		 * are not reset by the stats collector because these are resides in a
+		 * shared memory, so it is not possible for the stats collector to
+		 * reset them. FIXME: This may need a sepearate function entirely to
+		 * reset the stats.
+		 */
+		LWLockAcquire(WALWriteLock, LW_EXCLUSIVE);
+		memset(WALWriteStats, 0, sizeof(PgStat_WalWritesStats));
+		WALWriteStats->stat_reset_timestamp = GetCurrentTimestamp();
+		LWLockRelease(WALWriteLock);
+	}
 	else
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("unrecognized reset target: \"%s\"", target),
-				 errhint("Target must be \"archiver\" or \"bgwriter\".")));
+				 errhint("Target must be \"archiver\" or \"bgwriter\" or \"walwrites\".")));
 
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_RESETSHAREDCOUNTER);
 	pgstat_send(&msg, sizeof(msg));
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 96c2aaabbd..a49760ee24 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -149,6 +149,7 @@ CreateSharedMemoryAndSemaphores(void)
 		size = add_size(size, BTreeShmemSize());
 		size = add_size(size, SyncScanShmemSize());
 		size = add_size(size, AsyncShmemSize());
+		size = add_size(size, WALWritesShmemSize());
 #ifdef EXEC_BACKEND
 		size = add_size(size, ShmemBackendArraySize());
 #endif
@@ -216,6 +217,7 @@ CreateSharedMemoryAndSemaphores(void)
 	 * Set up xlog, clog, and buffers
 	 */
 	XLOGShmemInit();
+	WALWritesShmemInit();
 	CLOGShmemInit();
 	CommitTsShmemInit();
 	SUBTRANSShmemInit();
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 95738a4e34..12d762ef6a 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -2098,3 +2098,38 @@ pg_stat_get_archiver(PG_FUNCTION_ARGS)
 	/* Returns the record as Datum */
 	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
 }
+
+Datum
+pg_stat_get_walwrites(PG_FUNCTION_ARGS)
+{
+	TupleDesc	tupdesc;
+	TimestampTz result;
+#define NUM_PG_STAT_WALWRITE_COLS 2
+	Datum		values[NUM_PG_STAT_WALWRITE_COLS];
+	bool		nulls[NUM_PG_STAT_WALWRITE_COLS];
+
+	/* Initialize values and NULL flags arrays */
+	MemSet(values, 0, sizeof(values));
+	MemSet(nulls, 0, sizeof(nulls));
+
+	/* Build a tuple descriptor for our result type */
+	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+		elog(ERROR, "return type must be a row type");
+
+	LWLockAcquire(WALWriteLock, LW_EXCLUSIVE);
+
+	/* Get statistics about the archiver process */
+	/* Fill values and NULLs */
+	values[0] = Int64GetDatum(WALWriteStats->dirty_writes);
+
+	result = TimestampTzGetDatum(WALWriteStats->stat_reset_timestamp);
+	if (result == 0)
+		nulls[1] = true;
+	else
+		values[1] = result;
+
+	LWLockRelease(WALWriteLock);
+	/* Returns the record as Datum */
+	PG_RETURN_DATUM(HeapTupleGetDatum(
+									  heap_form_tuple(tupdesc, values, nulls)));
+}
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 27989971db..efd1d18389 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5484,6 +5484,14 @@
   proargnames => '{name,blks_zeroed,blks_hit,blks_read,blks_written,blks_exists,flushes,truncates,stats_reset}',
   prosrc => 'pg_stat_get_slru' },
 
+{ oid => '8000', descr => 'statistics: information about WAL writes activity',
+  proname => 'pg_stat_get_walwrites', provolatile => 's', proparallel => 'r',
+  prorettype => 'record', proargtypes => '',
+  proallargtypes => '{int8,timestamptz}',
+  proargmodes => '{o,o}',
+  proargnames => '{dirty_writes,stats_reset}',
+  prosrc => 'pg_stat_get_walwrites' },
+
 { oid => '2978', descr => 'statistics: number of function calls',
   proname => 'pg_stat_get_function_calls', provolatile => 's',
   proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 1387201382..94e4da6fd7 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -745,6 +745,19 @@ typedef struct PgStat_GlobalStats
 	TimestampTz stat_reset_timestamp;
 } PgStat_GlobalStats;
 
+/*
+ * Walwrites statistics kept in the stats collector
+ */
+typedef struct PgStat_WalWritesStats
+{
+	PgStat_Counter dirty_writes;	/* number of WAL write caused by full of WAL buffers */
+	TimestampTz stat_reset_timestamp;	/* last time when the stats reset */
+} PgStat_WalWritesStats;
+
+/* ----------
+ * Backend types
+ * ----------
+
 /*
  * SLRU statistics kept in the stats collector
  */
@@ -1260,6 +1273,11 @@ extern char *pgstat_stat_filename;
  */
 extern PgStat_MsgBgWriter BgWriterStats;
 
+/*
+ * WAL writes statistics updated in XLogWrite function
+ */
+extern PgStat_WalWritesStats * WALWriteStats;
+
 /*
  * Updated by pgstat_count_buffer_*_time macros
  */
@@ -1278,6 +1296,9 @@ extern int	pgstat_start(void);
 extern void pgstat_reset_all(void);
 extern void allow_immediate_pgstat_restart(void);
 
+extern Size WALWritesShmemSize(void);
+extern void WALWritesShmemInit(void);
+
 #ifdef EXEC_BACKEND
 extern void PgstatCollectorMain(int argc, char *argv[]) pg_attribute_noreturn();
 #endif
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 2a18dc423e..d8adbdbb99 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2146,6 +2146,9 @@ pg_stat_wal_receiver| SELECT s.pid,
     s.conninfo
    FROM pg_stat_get_wal_receiver() s(pid, status, receive_start_lsn, receive_start_tli, written_lsn, flushed_lsn, received_tli, last_msg_send_time, last_msg_receipt_time, latest_end_lsn, latest_end_time, slot_name, sender_host, sender_port, conninfo)
   WHERE (s.pid IS NOT NULL);
+pg_stat_walwrites| SELECT a.dirty_writes,
+    a.stats_reset
+   FROM pg_stat_get_walwrites() a(dirty_writes, stats_reset);
 pg_stat_xact_all_tables| SELECT c.oid AS relid,
     n.nspname AS schemaname,
     c.relname,
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 06c4c3e476..3c04c57023 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -67,6 +67,13 @@ select count(*) >= 0 as ok from pg_prepared_xacts;
  t
 (1 row)
 
+-- There will surely and maximum one record
+select count(*) = 1 as ok from pg_stat_walwrites;
+ ok 
+----
+ t
+(1 row)
+
 -- This is to record the prevailing planner enable_foo settings during
 -- a regression test run.
 select name, setting from pg_settings where name like 'enable%';
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index 28e412b735..21f49c9a3b 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -32,6 +32,9 @@ select count(*) = 0 as ok from pg_prepared_statements;
 -- See also prepared_xacts.sql
 select count(*) >= 0 as ok from pg_prepared_xacts;
 
+-- There will surely and maximum one record
+select count(*) = 1 as ok from pg_stat_walwrites;
+
 -- This is to record the prevailing planner enable_foo settings during
 -- a regression test run.
 select name, setting from pg_settings where name like 'enable%';

#13

Fujii Masao

masao.fujii@oss.nttdata.com

over 5 years ago

In reply to: Masahiro Ikeda (#12)

Re: New statistics for tuning WAL buffer size

On 2020/08/24 21:00, Masahiro Ikeda wrote:

On 2020-08-24 20:45, Masahiro Ikeda wrote:

Hi, thanks for useful comments.

I agree to expose the number of WAL write caused by full of WAL buffers.
It's helpful when tuning wal_buffers size. Haribabu separated that number
into two fields in his patch; one is the number of WAL write by backend,
and another is by background processes and workers. But I'm not sure
how useful such separation is. I'm ok with just one field for that number.

I agree with you.ï¿½ I don't think we need to separate the numbers for foreground processes and background ones.ï¿½ WAL buffer is a single resource.ï¿½ So "Writes due to full WAL buffer are happening.ï¿½ We may be able to boost performance by increasing wal_buffers" would be enough.

I made a patch to expose the number of WAL write caused by full of WAL buffers.
I'm going to submit this patch to commitfests.

As Fujii-san and Tsunakawa-san said, it expose the total number
since I agreed that we don't need to separate the numbers for
foreground processes and background ones.

By the way, do we need to add another metrics related to WAL?
For example, is the total number of WAL writes to the buffers useful
to calculate the dirty WAL write ratio?

Is it enough as a first step?

I forgot to rebase the current master.
I've attached the rebased patch.

Thanks for the patch!

+/* ----------
+ * Backend types
+ * ----------

You seem to forget to add "*/" into the above comment.
This issue could cause the following compiler warning.

../../src/include/pgstat.h:761:1: warning: '/*' within block comment [-Wcomment]

The contents of pg_stat_walwrites are reset when the server
is restarted. Isn't this problematic? IMO since pg_stat_walwrites
is a collected statistics view, basically its contents should be
kept even in the case of server restart.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#14

Masahiro Ikeda

ikedamsh@oss.nttdata.com

over 5 years ago

In reply to: Fujii Masao (#13)

1 attachment(s)

Re: New statistics for tuning WAL buffer size

+/* ----------
+ * Backend types
+ * ----------
You seem to forget to add "*/" into the above comment.
This issue could cause the following compiler warning.
../../src/include/pgstat.h:761:1: warning: '/*' within block comment
[-Wcomment]

Thanks for the comment. I fixed.

The contents of pg_stat_walwrites are reset when the server
is restarted. Isn't this problematic? IMO since pg_stat_walwrites
is a collected statistics view, basically its contents should be
kept even in the case of server restart.

I agree your opinion.
I modified to use the statistics collector and persist the wal
statistics.

I changed the view name from pg_stat_walwrites to pg_stat_walwriter.
I think it is better to match naming scheme with other views like
pg_stat_bgwriter,
which is for bgwriter statistics but it has the statistics related to
backend.

The pg_stat_walwriter is not security restricted now, so ordinary users
can access it.
I has the same security level as pg_stat_archiver.If you have any
comments, please let me know.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

Attachments:

0003_pg_stat_walwriter_view.patchtext/x-diff; name=0003_pg_stat_walwriter_view.patchDownload

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index d973e1149a..9ea5c9b2e2 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -424,6 +424,13 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
      </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_walwriter</structname><indexterm><primary>pg_stat_walwriter</primary></indexterm></entry>
+      <entry>One row only, showing statistics about the WAL writing activity. See
+       <xref linkend="monitoring-pg-stat-walwriter-view"/> for details.
+      </entry>
+     </row>
+
      <row>
       <entry><structname>pg_stat_database</structname><indexterm><primary>pg_stat_database</primary></indexterm></entry>
       <entry>One row per database, showing database-wide statistics. See
@@ -3264,6 +3271,56 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
 
  </sect2>
 
+ <sect2 id="monitoring-pg-stat-walwriter-view">
+   <title><structname>pg_stat_walwriter</structname></title>
+
+  <indexterm>
+   <primary>pg_stat_walwriter</primary>
+  </indexterm>
+
+  <para>
+   The <structname>pg_stat_walwriter</structname> view will always have a
+   single row, containing data about the WAL writing activity of the cluster.
+  </para>
+
+  <table id="pg-stat-walwriter-view" xreflabel="pg_stat_walwriter">
+   <title><structname>pg_stat_walwriter</structname> View</title>
+   <tgroup cols="1">
+    <thead>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       Column Type
+      </para>
+      <para>
+       Description
+      </para></entry>
+     </row>
+    </thead>
+
+    <tbody>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>dirty_writes</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of dirty WAL writes when the <xref linkend="guc-wal-buffers"/> are full
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>stats_reset</structfield> <type>timestamp with time zone</type>
+      </para>
+      <para>
+       Time at which these statistics were last reset
+      </para></entry>
+     </row>
+     </tbody>
+   </tgroup>
+  </table>
+
+</sect2>
+
  <sect2 id="monitoring-pg-stat-database-view">
   <title><structname>pg_stat_database</structname></title>
 
@@ -4652,8 +4709,9 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
         argument.  The argument can be <literal>bgwriter</literal> to reset
         all the counters shown in
         the <structname>pg_stat_bgwriter</structname>
-        view, or <literal>archiver</literal> to reset all the counters shown in
-        the <structname>pg_stat_archiver</structname> view.
+        view, <literal>archiver</literal> to reset all the counters shown in
+        the <structname>pg_stat_archiver</structname> view ,or <literal>walwriter</literal>
+        to reset all the counters shown in the <structname>pg_stat_walwriter</structname> view.
        </para>
        <para>
         This function is restricted to superusers by default, but other users
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 09c01ed4ae..47b148b3b5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2193,6 +2193,7 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, bool opportunistic)
 					WriteRqst.Write = OldPageRqstPtr;
 					WriteRqst.Flush = 0;
 					XLogWrite(WriteRqst, false);
+					WalWriterStats.m_xlog_dirty_writes++;
 					LWLockRelease(WALWriteLock);
 					TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY_DONE();
 				}
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index a2d61302f9..6b43ad61be 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1000,6 +1000,11 @@ CREATE VIEW pg_stat_progress_analyze AS
     FROM pg_stat_get_progress_info('ANALYZE') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
+CREATE VIEW pg_stat_walwriter AS
+    SELECT
+        pg_stat_get_xlog_dirty_writes() AS dirty_writes,
+        pg_stat_get_walwriter_stat_reset_time() AS stats_reset;
+
 CREATE VIEW pg_stat_progress_vacuum AS
     SELECT
         S.pid AS pid, S.datid AS datid, D.datname AS datname,
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 8116b23614..154cdc5ddb 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -141,6 +141,14 @@ char	   *pgstat_stat_tmpname = NULL;
  */
 PgStat_MsgBgWriter BgWriterStats;
 
+/*
+ * WalWriter global statistics counter.
+ * This counter is incremented by each XLogWrite call,
+ * both in the wal writer process and each backend.
+ * And then, sent to the stat collector process.
+ */
+PgStat_MsgWalWriter WalWriterStats;
+
 /*
  * List of SLRU names that we keep stats for.  There is no central registry of
  * SLRUs, so we use this fixed list instead.  The "other" entry is used for
@@ -281,6 +289,7 @@ static int	localNumBackends = 0;
  */
 static PgStat_ArchiverStats archiverStats;
 static PgStat_GlobalStats globalStats;
+static PgStat_WalwriterStats walwriterStats;
 static PgStat_SLRUStats slruStats[SLRU_NUM_ELEMENTS];
 
 /*
@@ -353,6 +362,7 @@ static void pgstat_recv_vacuum(PgStat_MsgVacuum *msg, int len);
 static void pgstat_recv_analyze(PgStat_MsgAnalyze *msg, int len);
 static void pgstat_recv_archiver(PgStat_MsgArchiver *msg, int len);
 static void pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len);
+static void pgstat_recv_walwriter(PgStat_MsgWalWriter *msg, int len);
 static void pgstat_recv_slru(PgStat_MsgSLRU *msg, int len);
 static void pgstat_recv_funcstat(PgStat_MsgFuncstat *msg, int len);
 static void pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len);
@@ -938,6 +948,9 @@ pgstat_report_stat(bool force)
 	/* Now, send function statistics */
 	pgstat_send_funcstats();
 
+	/* Now, send wal writer statistics */
+	pgstat_send_walwriter();
+
 	/* Finally send SLRU statistics */
 	pgstat_send_slru();
 }
@@ -1370,11 +1383,13 @@ pgstat_reset_shared_counters(const char *target)
 		msg.m_resettarget = RESET_ARCHIVER;
 	else if (strcmp(target, "bgwriter") == 0)
 		msg.m_resettarget = RESET_BGWRITER;
+	else if (strcmp(target, "walwriter") == 0)
+		msg.m_resettarget = RESET_WALWRITER;
 	else
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("unrecognized reset target: \"%s\"", target),
-				 errhint("Target must be \"archiver\" or \"bgwriter\".")));
+				 errhint("Target must be \"archiver\" or \"bgwriter\" or \"walwriter\".")));
 
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_RESETSHAREDCOUNTER);
 	pgstat_send(&msg, sizeof(msg));
@@ -2674,6 +2689,21 @@ pgstat_fetch_global(void)
 	return &globalStats;
 }
 
+/*
+ * ---------
+ * pgstat_fetch_stat_walwriter() -
+ *
+ *	Support function for the SQL-callable pgstat* functions. Returns
+ *	a pointer to the walwriter statistics struct.
+ * ---------
+ */
+PgStat_WalwriterStats *
+pgstat_fetch_stat_walwriter(void)
+{
+	backend_read_statsfile();
+
+	return &walwriterStats;
+}
 
 /*
  * ---------
@@ -4407,6 +4437,38 @@ pgstat_send_bgwriter(void)
 	MemSet(&BgWriterStats, 0, sizeof(BgWriterStats));
 }
 
+/* ----------
+ * pgstat_send_walwriter() -
+ *
+ *		Send walwriter statistics to the collector
+ * ----------
+ */
+void
+pgstat_send_walwriter(void)
+{
+	/* We assume this initializes to zeroes */
+	static const PgStat_MsgWalWriter all_zeroes;
+
+	/*
+	 * This function can be called even if nothing at all has happened. In
+	 * this case, avoid sending a completely empty message to the stats
+	 * collector.
+	 */
+	if (memcmp(&WalWriterStats, &all_zeroes, sizeof(PgStat_MsgWalWriter)) == 0)
+		return;
+
+	/*
+	 * Prepare and send the message
+	 */
+	pgstat_setheader(&WalWriterStats.m_hdr, PGSTAT_MTYPE_WALWRITER);
+	pgstat_send(&WalWriterStats, sizeof(WalWriterStats));
+
+	/*
+	 * Clear out the statistics buffer, so it can be re-used.
+	 */
+	MemSet(&WalWriterStats, 0, sizeof(WalWriterStats));
+}
+
 /* ----------
  * pgstat_send_slru() -
  *
@@ -4646,6 +4708,10 @@ PgstatCollectorMain(int argc, char *argv[])
 					pgstat_recv_bgwriter(&msg.msg_bgwriter, len);
 					break;
 
+				case PGSTAT_MTYPE_WALWRITER:
+					pgstat_recv_walwriter(&msg.msg_walwriter, len);
+					break;
+
 				case PGSTAT_MTYPE_SLRU:
 					pgstat_recv_slru(&msg.msg_slru, len);
 					break;
@@ -4915,6 +4981,12 @@ pgstat_write_statsfiles(bool permanent, bool allDbs)
 	rc = fwrite(&archiverStats, sizeof(archiverStats), 1, fpout);
 	(void) rc;					/* we'll check for error with ferror */
 
+	/*
+	 * Write archiver stats struct
+	 */
+	rc = fwrite(&walwriterStats, sizeof(walwriterStats), 1, fpout);
+	(void) rc;					/* we'll check for error with ferror */
+
 	/*
 	 * Write SLRU stats struct
 	 */
@@ -5179,6 +5251,7 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 	 */
 	memset(&globalStats, 0, sizeof(globalStats));
 	memset(&archiverStats, 0, sizeof(archiverStats));
+	memset(&walwriterStats, 0, sizeof(walwriterStats));
 	memset(&slruStats, 0, sizeof(slruStats));
 
 	/*
@@ -5187,6 +5260,7 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 	 */
 	globalStats.stat_reset_timestamp = GetCurrentTimestamp();
 	archiverStats.stat_reset_timestamp = globalStats.stat_reset_timestamp;
+	walwriterStats.stat_reset_timestamp = globalStats.stat_reset_timestamp;
 
 	/*
 	 * Set the same reset timestamp for all SLRU items too.
@@ -5256,6 +5330,17 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 		goto done;
 	}
 
+	/*
+	 * Read walwriter stats struct
+	 */
+	if (fread(&walwriterStats, 1, sizeof(walwriterStats), fpin) != sizeof(walwriterStats))
+	{
+		ereport(pgStatRunningInCollector ? LOG : WARNING,
+				(errmsg("corrupted statistics file \"%s\"", statfile)));
+		memset(&walwriterStats, 0, sizeof(walwriterStats));
+		goto done;
+	}
+
 	/*
 	 * Read SLRU stats struct
 	 */
@@ -5266,7 +5351,6 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 		memset(&slruStats, 0, sizeof(slruStats));
 		goto done;
 	}
-
 	/*
 	 * We found an existing collector stats file. Read it and put all the
 	 * hashtable entries into place.
@@ -5620,6 +5704,17 @@ pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent,
 		return false;
 	}
 
+	/*
+	 * Read walwriter stats struct
+	 */
+	if (fread(&walwriterStats, 1, sizeof(walwriterStats), fpin) != sizeof(walwriterStats))
+	{
+		ereport(pgStatRunningInCollector ? LOG : WARNING,
+				(errmsg("corrupted statistics file \"%s\"", statfile)));
+		FreeFile(fpin);
+		return false;
+	}
+
 	/*
 	 * Read SLRU stats struct
 	 */
@@ -6196,6 +6291,12 @@ pgstat_recv_resetsharedcounter(PgStat_MsgResetsharedcounter *msg, int len)
 		memset(&archiverStats, 0, sizeof(archiverStats));
 		archiverStats.stat_reset_timestamp = GetCurrentTimestamp();
 	}
+	else if (msg->m_resettarget == RESET_WALWRITER)
+	{
+		/* Reset the walwriter statistics for the cluster. */
+		memset(&walwriterStats, 0, sizeof(walwriterStats));
+		walwriterStats.stat_reset_timestamp = GetCurrentTimestamp();
+	}
 
 	/*
 	 * Presumably the sender of this message validated the target, don't
@@ -6410,6 +6511,18 @@ pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len)
 	globalStats.buf_alloc += msg->m_buf_alloc;
 }
 
+/* ----------
+ * pgstat_recv_walwriter() -
+ *
+ *	Process a WALWRITER message.
+ * ----------
+ */
+static void
+pgstat_recv_walwriter(PgStat_MsgWalWriter *msg, int len)
+{
+	walwriterStats.xlog_dirty_writes += msg->m_xlog_dirty_writes;
+}
+
 /* ----------
  * pgstat_recv_slru() -
  *
diff --git a/src/backend/postmaster/walwriter.c b/src/backend/postmaster/walwriter.c
index 45a2757969..309cce2c64 100644
--- a/src/backend/postmaster/walwriter.c
+++ b/src/backend/postmaster/walwriter.c
@@ -243,6 +243,8 @@ WalWriterMain(void)
 		else if (left_till_hibernate > 0)
 			left_till_hibernate--;
 
+		pgstat_send_walwriter();
+
 		/*
 		 * Sleep until we are signaled or WalWriterDelay has elapsed.  If we
 		 * haven't done anything useful for quite some time, lengthen the
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 95738a4e34..672ed9c8ca 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1697,6 +1697,18 @@ pg_stat_get_buf_alloc(PG_FUNCTION_ARGS)
 	PG_RETURN_INT64(pgstat_fetch_global()->buf_alloc);
 }
 
+Datum
+pg_stat_get_xlog_dirty_writes(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_INT64(pgstat_fetch_stat_walwriter()->xlog_dirty_writes);
+}
+
+Datum
+pg_stat_get_walwriter_stat_reset_time(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_TIMESTAMPTZ(pgstat_fetch_stat_walwriter()->stat_reset_timestamp);
+}
+
 /*
  * Returns statistics of SLRU caches.
  */
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 1dd325e0e6..131709ebfe 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5476,6 +5476,14 @@
   proname => 'pg_stat_get_buf_alloc', provolatile => 's', proparallel => 'r',
   prorettype => 'int8', proargtypes => '', prosrc => 'pg_stat_get_buf_alloc' },
 
+{ oid => '8000', descr => 'statistics: number of dirty WAL writes',
+  proname => 'pg_stat_get_xlog_dirty_writes', provolatile => 's', proparallel => 'r',
+  prorettype => 'int8', proargtypes => '', prosrc => 'pg_stat_get_xlog_dirty_writes' },
+{ oid => '8001', descr => 'statistics: last reset for the walwriter',
+  proname => 'pg_stat_get_walwriter_stat_reset_time', provolatile => 's',
+  proparallel => 'r', prorettype => 'timestamptz', proargtypes => '',
+  prosrc => 'pg_stat_get_walwriter_stat_reset_time' },
+
 { oid => '2306', descr => 'statistics: information about SLRU caches',
   proname => 'pg_stat_get_slru', prorows => '100', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 807a9c1edf..873da60e70 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -61,6 +61,7 @@ typedef enum StatMsgType
 	PGSTAT_MTYPE_ANALYZE,
 	PGSTAT_MTYPE_ARCHIVER,
 	PGSTAT_MTYPE_BGWRITER,
+	PGSTAT_MTYPE_WALWRITER,
 	PGSTAT_MTYPE_SLRU,
 	PGSTAT_MTYPE_FUNCSTAT,
 	PGSTAT_MTYPE_FUNCPURGE,
@@ -122,7 +123,8 @@ typedef struct PgStat_TableCounts
 typedef enum PgStat_Shared_Reset_Target
 {
 	RESET_ARCHIVER,
-	RESET_BGWRITER
+	RESET_BGWRITER,
+	RESET_WALWRITER
 } PgStat_Shared_Reset_Target;
 
 /* Possible object types for resetting single counters */
@@ -436,6 +438,17 @@ typedef struct PgStat_MsgBgWriter
 	PgStat_Counter m_checkpoint_sync_time;
 } PgStat_MsgBgWriter;
 
+/* ----------
+ * PgStat_MsgWalWriter			Sent by the walwriter to update statistics.
+ * ----------
+ */
+typedef struct PgStat_MsgWalWriter
+{
+	PgStat_MsgHdr m_hdr;
+
+	PgStat_Counter m_xlog_dirty_writes;	/* number of WAL write caused by full of WAL buffers */
+} PgStat_MsgWalWriter;
+
 /* ----------
  * PgStat_MsgSLRU			Sent by a backend to update SLRU statistics.
  * ----------
@@ -596,6 +609,7 @@ typedef union PgStat_Msg
 	PgStat_MsgAnalyze msg_analyze;
 	PgStat_MsgArchiver msg_archiver;
 	PgStat_MsgBgWriter msg_bgwriter;
+	PgStat_MsgWalWriter msg_walwriter;
 	PgStat_MsgSLRU msg_slru;
 	PgStat_MsgFuncstat msg_funcstat;
 	PgStat_MsgFuncpurge msg_funcpurge;
@@ -745,6 +759,20 @@ typedef struct PgStat_GlobalStats
 	TimestampTz stat_reset_timestamp;
 } PgStat_GlobalStats;
 
+/*
+ * Walwriter statistics kept in the stats collector
+ */
+typedef struct PgStat_WalwriterStats
+{
+	PgStat_Counter xlog_dirty_writes;	/* number of WAL write caused by full of WAL buffers */
+	TimestampTz stat_reset_timestamp;	/* last time when the stats reset */
+} PgStat_WalwriterStats;
+
+/* ----------
+ * Backend types
+ * ----------
+ */
+
 /*
  * SLRU statistics kept in the stats collector
  */
@@ -1261,6 +1289,11 @@ extern char *pgstat_stat_filename;
  */
 extern PgStat_MsgBgWriter BgWriterStats;
 
+/*
+ * WAL writes statistics counter is updated in XLogWrite function
+ */
+extern PgStat_MsgWalWriter WalWriterStats;
+
 /*
  * Updated by pgstat_count_buffer_*_time macros
  */
@@ -1460,6 +1493,7 @@ extern void pgstat_twophase_postabort(TransactionId xid, uint16 info,
 
 extern void pgstat_send_archiver(const char *xlog, bool failed);
 extern void pgstat_send_bgwriter(void);
+extern void pgstat_send_walwriter(void);
 
 /* ----------
  * Support functions for the SQL-callable functions to
@@ -1474,6 +1508,7 @@ extern PgStat_StatFuncEntry *pgstat_fetch_stat_funcentry(Oid funcid);
 extern int	pgstat_fetch_stat_numbackends(void);
 extern PgStat_ArchiverStats *pgstat_fetch_stat_archiver(void);
 extern PgStat_GlobalStats *pgstat_fetch_global(void);
+extern PgStat_WalwriterStats *pgstat_fetch_stat_walwriter(void);
 extern PgStat_SLRUStats *pgstat_fetch_slru(void);
 
 extern void pgstat_count_slru_page_zeroed(int slru_idx);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 2a18dc423e..4c7e6c5316 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2146,6 +2146,8 @@ pg_stat_wal_receiver| SELECT s.pid,
     s.conninfo
    FROM pg_stat_get_wal_receiver() s(pid, status, receive_start_lsn, receive_start_tli, written_lsn, flushed_lsn, received_tli, last_msg_send_time, last_msg_receipt_time, latest_end_lsn, latest_end_time, slot_name, sender_host, sender_port, conninfo)
   WHERE (s.pid IS NOT NULL);
+pg_stat_walwriter| SELECT pg_stat_get_xlog_dirty_writes() AS dirty_writes,
+    pg_stat_get_walwriter_stat_reset_time() AS stats_reset;
 pg_stat_xact_all_tables| SELECT c.oid AS relid,
     n.nspname AS schemaname,
     c.relname,
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 1cffc3349d..1e828ccfde 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -76,6 +76,13 @@ select count(*) >= 0 as ok from pg_prepared_xacts;
  t
 (1 row)
 
+-- There will surely and maximum one record
+select count(*) = 1 as ok from pg_stat_walwriter;
+ ok 
+----
+ t
+(1 row)
+
 -- This is to record the prevailing planner enable_foo settings during
 -- a regression test run.
 select name, setting from pg_settings where name like 'enable%';
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index ac4a0e1cbb..ab1a416c13 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -37,6 +37,9 @@ select count(*) = 0 as ok from pg_prepared_statements;
 -- See also prepared_xacts.sql
 select count(*) >= 0 as ok from pg_prepared_xacts;
 
+-- There will surely and maximum one record
+select count(*) = 1 as ok from pg_stat_walwriter;
+
 -- This is to record the prevailing planner enable_foo settings during
 -- a regression test run.
 select name, setting from pg_settings where name like 'enable%';

#15

Fujii Masao

masao.fujii@oss.nttdata.com

over 5 years ago

In reply to: Masahiro Ikeda (#14)

Re: New statistics for tuning WAL buffer size

On 2020/09/02 18:56, Masahiro Ikeda wrote:

+/* ----------
+ * Backend types
+ * ----------
You seem to forget to add "*/" into the above comment.
This issue could cause the following compiler warning.
../../src/include/pgstat.h:761:1: warning: '/*' within block comment [-Wcomment]
Thanks for the comment. I fixed.

Thanks for the fix! But why are those comments necessary?

The contents of pg_stat_walwrites are reset when the server
is restarted. Isn't this problematic? IMO since pg_stat_walwrites
is a collected statistics view, basically its contents should be
kept even in the case of server restart.

I agree your opinion.
I modified to use the statistics collector and persist the wal statistics.

I changed the view name from pg_stat_walwrites to pg_stat_walwriter.
I think it is better to match naming scheme with other views like pg_stat_bgwriter,
which is for bgwriter statistics but it has the statistics related to backend.

I prefer the view name pg_stat_walwriter for the consistency with
other view names. But we also have pg_stat_wal_receiver. Which
makes me think that maybe pg_stat_wal_writer is better for
the consistency. Thought? IMO either of them works for me.
I'd like to hear more opinons about this.

The pg_stat_walwriter is not security restricted now, so ordinary users can access it.
I has the same security level as pg_stat_archiver.If you have any comments, please let me know.

+ <structfield>dirty_writes</structfield> <type>bigint</type>

I guess that the column name "dirty_writes" derived from
the DTrace probe name. Isn't this name confusing? We should
rename it to "wal_buffers_full" or something?

+/* ----------
+ * PgStat_MsgWalWriter			Sent by the walwriter to update statistics.

This comment seems not accurate because backends also send it.

+/*
+ * WAL writes statistics counter is updated in XLogWrite function
+ */
+extern PgStat_MsgWalWriter WalWriterStats;

This comment seems not right because the counter is not updated in XLogWrite().

+-- There will surely and maximum one record
+select count(*) = 1 as ok from pg_stat_walwriter;

What about changing this comment to "There must be only one record"?

+ WalWriterStats.m_xlog_dirty_writes++;
LWLockRelease(WALWriteLock);

Since WalWriterStats.m_xlog_dirty_writes doesn't need to be protected
with WALWriteLock, isn't it better to increment that after releasing the lock?

+CREATE VIEW pg_stat_walwriter AS
+    SELECT
+        pg_stat_get_xlog_dirty_writes() AS dirty_writes,
+        pg_stat_get_walwriter_stat_reset_time() AS stats_reset;
+
  CREATE VIEW pg_stat_progress_vacuum AS

In system_views.sql, the definition of pg_stat_walwriter should be
placed just after that of pg_stat_bgwriter not pg_stat_progress_analyze.

}
-
/*
* We found an existing collector stats file. Read it and put all the

You seem to accidentally have removed the empty line here.

-				 errhint("Target must be \"archiver\" or \"bgwriter\".")));
+				 errhint("Target must be \"archiver\" or \"bgwriter\" or \"walwriter\".")));

There are two "or" in the message, but the former should be replaced with ","?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#16

tsunakawa.takay@fujitsu.com

over 5 years ago

In reply to: Fujii Masao (#15)

RE: New statistics for tuning WAL buffer size

From: Fujii Masao <masao.fujii@oss.nttdata.com>

I changed the view name from pg_stat_walwrites to pg_stat_walwriter.
I think it is better to match naming scheme with other views like

pg_stat_bgwriter,

which is for bgwriter statistics but it has the statistics related to backend.

I prefer the view name pg_stat_walwriter for the consistency with
other view names. But we also have pg_stat_wal_receiver. Which
makes me think that maybe pg_stat_wal_writer is better for
the consistency. Thought? IMO either of them works for me.
I'd like to hear more opinons about this.

I think pg_stat_bgwriter is now a misnomer, because it contains the backends' activity. Likewise, pg_stat_walwriter leads to misunderstanding because its information is not limited to WAL writer.

How about simply pg_stat_wal? In the future, we may want to include WAL reads in this view, e.g. reading undo logs in zheap.

Regards
Takayuki Tsunakawa

#17

Fujii Masao

masao.fujii@oss.nttdata.com

over 5 years ago

In reply to: tsunakawa.takay@fujitsu.com (#16)

Re: New statistics for tuning WAL buffer size

On 2020/09/04 11:50, tsunakawa.takay@fujitsu.com wrote:

From: Fujii Masao <masao.fujii@oss.nttdata.com>

I changed the view name from pg_stat_walwrites to pg_stat_walwriter.
I think it is better to match naming scheme with other views like

pg_stat_bgwriter,

which is for bgwriter statistics but it has the statistics related to backend.

I prefer the view name pg_stat_walwriter for the consistency with
other view names. But we also have pg_stat_wal_receiver. Which
makes me think that maybe pg_stat_wal_writer is better for
the consistency. Thought? IMO either of them works for me.
I'd like to hear more opinons about this.

I think pg_stat_bgwriter is now a misnomer, because it contains the backends' activity. Likewise, pg_stat_walwriter leads to misunderstanding because its information is not limited to WAL writer.

How about simply pg_stat_wal? In the future, we may want to include WAL reads in this view, e.g. reading undo logs in zheap.

Sounds reasonable.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#18

Magnus Hagander

magnus@hagander.net

over 5 years ago

In reply to: Fujii Masao (#17)

Re: New statistics for tuning WAL buffer size

On Fri, Sep 4, 2020 at 5:42 AM Fujii Masao <masao.fujii@oss.nttdata.com>
wrote:

On 2020/09/04 11:50, tsunakawa.takay@fujitsu.com wrote:

From: Fujii Masao <masao.fujii@oss.nttdata.com>

I changed the view name from pg_stat_walwrites to pg_stat_walwriter.
I think it is better to match naming scheme with other views like

pg_stat_bgwriter,

which is for bgwriter statistics but it has the statistics related to

backend.

I prefer the view name pg_stat_walwriter for the consistency with
other view names. But we also have pg_stat_wal_receiver. Which
makes me think that maybe pg_stat_wal_writer is better for
the consistency. Thought? IMO either of them works for me.
I'd like to hear more opinons about this.

I think pg_stat_bgwriter is now a misnomer, because it contains the

backends' activity. Likewise, pg_stat_walwriter leads to misunderstanding
because its information is not limited to WAL writer.

How about simply pg_stat_wal? In the future, we may want to include WAL

reads in this view, e.g. reading undo logs in zheap.

Sounds reasonable.

+1.

pg_stat_bgwriter has had the "wrong name" for quite some time now -- it
became even more apparent when the checkpointer was split out to it's own
process, and that's not exactly a recent change. And it had allocs in it
from day one...

I think naming it for what the data in it is ("wal") rather than which
process deals with it ("walwriter") is correct, unless the statistics can
be known to only *ever* affect one type of process. (And then different
processes can affect different columns in the view). As a general rule --
and that's from what I can tell exactly what's being proposed.

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

#19

Masahiro Ikeda

ikedamsh@oss.nttdata.com

over 5 years ago

In reply to: Fujii Masao (#17)

1 attachment(s)

Re: New statistics for tuning WAL buffer size

Thanks for the review and advice!

On 2020-09-03 16:05, Fujii Masao wrote:

On 2020/09/02 18:56, Masahiro Ikeda wrote:
+/* ----------
+ * Backend types
+ * ----------
You seem to forget to add "*/" into the above comment.
This issue could cause the following compiler warning.
../../src/include/pgstat.h:761:1: warning: '/*' within block comment
[-Wcomment]
Thanks for the comment. I fixed.
Thanks for the fix! But why are those comments necessary?

Sorry about that. This comment is not necessary.
I removed it.

The pg_stat_walwriter is not security restricted now, so ordinary
users can access it.
It has the same security level as pg_stat_archiver. If you have any
comments, please let me know.

+ <structfield>dirty_writes</structfield> <type>bigint</type>

I guess that the column name "dirty_writes" derived from
the DTrace probe name. Isn't this name confusing? We should
rename it to "wal_buffers_full" or something?

I agree and rename it to "wal_buffers_full".

+/* ----------
+ * PgStat_MsgWalWriter			Sent by the walwriter to update statistics.
This comment seems not accurate because backends also send it.
+/*
+ * WAL writes statistics counter is updated in XLogWrite function
+ */
+extern PgStat_MsgWalWriter WalWriterStats;
This comment seems not right because the counter is not updated in
XLogWrite().

Right. I fixed it to "Sent by each backend and background workers to
update WAL statistics."
In the future, other statistics will be included so I remove the
function's name.

+-- There will surely and maximum one record
+select count(*) = 1 as ok from pg_stat_walwriter;
What about changing this comment to "There must be only one record"?

Thanks, I fixed.

+ WalWriterStats.m_xlog_dirty_writes++;
LWLockRelease(WALWriteLock);

Since WalWriterStats.m_xlog_dirty_writes doesn't need to be protected
with WALWriteLock, isn't it better to increment that after releasing
the lock?

Thanks, I fixed.

+CREATE VIEW pg_stat_walwriter AS
+    SELECT
+        pg_stat_get_xlog_dirty_writes() AS dirty_writes,
+        pg_stat_get_walwriter_stat_reset_time() AS stats_reset;
+
CREATE VIEW pg_stat_progress_vacuum AS
In system_views.sql, the definition of pg_stat_walwriter should be
placed just after that of pg_stat_bgwriter not
pg_stat_progress_analyze.

OK, I fixed it.

}
-
/*
* We found an existing collector stats file. Read it and put all the

You seem to accidentally have removed the empty line here.

Sorry about that. I fixed it.

-				 errhint("Target must be \"archiver\" or \"bgwriter\".")));
+				 errhint("Target must be \"archiver\" or \"bgwriter\" or
\"walwriter\".")));
There are two "or" in the message, but the former should be replaced
with ","?

Thanks, I fixed.

On 2020-09-05 18:40, Magnus Hagander wrote:

On Fri, Sep 4, 2020 at 5:42 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:

On 2020/09/04 11:50, tsunakawa.takay@fujitsu.com wrote:

From: Fujii Masao <masao.fujii@oss.nttdata.com>

I changed the view name from pg_stat_walwrites to

pg_stat_walwriter.

I think it is better to match naming scheme with other views

like

pg_stat_bgwriter,

which is for bgwriter statistics but it has the statistics

related to backend.

I prefer the view name pg_stat_walwriter for the consistency with
other view names. But we also have pg_stat_wal_receiver. Which
makes me think that maybe pg_stat_wal_writer is better for
the consistency. Thought? IMO either of them works for me.
I'd like to hear more opinons about this.

I think pg_stat_bgwriter is now a misnomer, because it contains

the backends' activity. Likewise, pg_stat_walwriter leads to
misunderstanding because its information is not limited to WAL
writer.

How about simply pg_stat_wal? In the future, we may want to

include WAL reads in this view, e.g. reading undo logs in zheap.

Sounds reasonable.

+1.

pg_stat_bgwriter has had the "wrong name" for quite some time now --
it became even more apparent when the checkpointer was split out to
it's own process, and that's not exactly a recent change. And it had
allocs in it from day one...

I think naming it for what the data in it is ("wal") rather than which
process deals with it ("walwriter") is correct, unless the statistics
can be known to only *ever* affect one type of process. (And then
different processes can affect different columns in the view). As a
general rule -- and that's from what I can tell exactly what's being
proposed.

Thanks for your comments. I agree with your opinions.
I changed the view name to "pg_stat_wal".

I fixed the code to send the WAL statistics from not only backend and
walwriter
but also checkpointer, walsender and autovacuum worker.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

Attachments:

0004_pg_stat_walwriter_view.patchtext/x-diff; charset=us-ascii; name=0004_pg_stat_walwriter_view.patchDownload

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 673a0e73e4..6d56912221 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -424,6 +424,13 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
      </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_wal</structname><indexterm><primary>pg_stat_wal</primary></indexterm></entry>
+      <entry>One row only, showing statistics about the WAL writing activity. See
+       <xref linkend="monitoring-pg-stat-wal-view"/> for details.
+      </entry>
+     </row>
+
      <row>
       <entry><structname>pg_stat_database</structname><indexterm><primary>pg_stat_database</primary></indexterm></entry>
       <entry>One row per database, showing database-wide statistics. See
@@ -3280,6 +3287,56 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
 
  </sect2>
 
+ <sect2 id="monitoring-pg-stat-wal-view">
+   <title><structname>pg_stat_wal</structname></title>
+
+  <indexterm>
+   <primary>pg_stat_wal</primary>
+  </indexterm>
+
+  <para>
+   The <structname>pg_stat_wal</structname> view will always have a
+   single row, containing data about the WAL writing activity of the cluster.
+  </para>
+
+  <table id="pg-stat-wal-view" xreflabel="pg_stat_wal">
+   <title><structname>pg_stat_wal</structname> View</title>
+   <tgroup cols="1">
+    <thead>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       Column Type
+      </para>
+      <para>
+       Description
+      </para></entry>
+     </row>
+    </thead>
+
+    <tbody>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>wal_buffers_full</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of WAL writes when the <xref linkend="guc-wal-buffers"/> are full
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>stats_reset</structfield> <type>timestamp with time zone</type>
+      </para>
+      <para>
+       Time at which these statistics were last reset
+      </para></entry>
+     </row>
+     </tbody>
+   </tgroup>
+  </table>
+
+</sect2>
+
  <sect2 id="monitoring-pg-stat-database-view">
   <title><structname>pg_stat_database</structname></title>
 
@@ -4668,8 +4725,9 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
         argument.  The argument can be <literal>bgwriter</literal> to reset
         all the counters shown in
         the <structname>pg_stat_bgwriter</structname>
-        view, or <literal>archiver</literal> to reset all the counters shown in
-        the <structname>pg_stat_archiver</structname> view.
+        view, <literal>archiver</literal> to reset all the counters shown in
+        the <structname>pg_stat_archiver</structname> view ,or <literal>wal</literal>
+        to reset all the counters shown in the <structname>pg_stat_wal</structname> view.
        </para>
        <para>
         This function is restricted to superusers by default, but other users
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 92389e6666..5c97da49ae 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -604,6 +604,7 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
 						 onerel->rd_rel->relisshared,
 						 Max(new_live_tuples, 0),
 						 vacrelstats->new_dead_tuples);
+	pgstat_send_wal();
 	pgstat_progress_end_command();
 
 	/* and log the action if appropriate */
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 09c01ed4ae..b485ff49f9 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2194,6 +2194,7 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, bool opportunistic)
 					WriteRqst.Flush = 0;
 					XLogWrite(WriteRqst, false);
 					LWLockRelease(WALWriteLock);
+					WalStats.m_wal_buffers_full++;
 					TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY_DONE();
 				}
 				/* Re-acquire WALBufMappingLock and retry */
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ed4f3f142d..643445c189 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -979,6 +979,11 @@ CREATE VIEW pg_stat_bgwriter AS
         pg_stat_get_buf_alloc() AS buffers_alloc,
         pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
 
+CREATE VIEW pg_stat_wal AS
+    SELECT
+        pg_stat_get_wal_buffers_full() AS wal_buffers_full,
+        pg_stat_get_wal_stat_reset_time() AS stats_reset;
+
 CREATE VIEW pg_stat_progress_analyze AS
     SELECT
         S.pid AS pid, S.datid AS datid, D.datname AS datname,
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 069e27e427..450c19968b 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -238,6 +238,9 @@ BackgroundWriterMain(void)
 		 */
 		pgstat_send_bgwriter();
 
+		/* Send wal statistics */
+		pgstat_send_wal();
+
 		if (FirstCallSinceLastCheckpoint())
 		{
 			/*
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 624a3238b8..b82ba54523 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -494,6 +494,9 @@ CheckpointerMain(void)
 		 */
 		pgstat_send_bgwriter();
 
+		/* Send wal statistics to the stats collector. */
+		pgstat_send_wal();
+
 		/*
 		 * If any checkpoint flags have been set, redo the loop to handle the
 		 * checkpoint without sleeping.
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 5f4b168fd1..e23446179f 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -141,6 +141,13 @@ char	   *pgstat_stat_tmpname = NULL;
  */
 PgStat_MsgBgWriter BgWriterStats;
 
+/*
+ * WAL global statistics counter.
+ * This counter is incremented by both each backend and background.
+ * And then, sent to the stat collector process.
+ */
+PgStat_MsgWal WalStats;
+
 /*
  * List of SLRU names that we keep stats for.  There is no central registry of
  * SLRUs, so we use this fixed list instead.  The "other" entry is used for
@@ -281,6 +288,7 @@ static int	localNumBackends = 0;
  */
 static PgStat_ArchiverStats archiverStats;
 static PgStat_GlobalStats globalStats;
+static PgStat_WalStats walStats;
 static PgStat_SLRUStats slruStats[SLRU_NUM_ELEMENTS];
 
 /*
@@ -353,6 +361,7 @@ static void pgstat_recv_vacuum(PgStat_MsgVacuum *msg, int len);
 static void pgstat_recv_analyze(PgStat_MsgAnalyze *msg, int len);
 static void pgstat_recv_archiver(PgStat_MsgArchiver *msg, int len);
 static void pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len);
+static void pgstat_recv_wal(PgStat_MsgWal *msg, int len);
 static void pgstat_recv_slru(PgStat_MsgSLRU *msg, int len);
 static void pgstat_recv_funcstat(PgStat_MsgFuncstat *msg, int len);
 static void pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len);
@@ -938,6 +947,9 @@ pgstat_report_stat(bool force)
 	/* Now, send function statistics */
 	pgstat_send_funcstats();
 
+	/* Send wal statistics */
+	pgstat_send_wal();
+
 	/* Finally send SLRU statistics */
 	pgstat_send_slru();
 }
@@ -1370,11 +1382,13 @@ pgstat_reset_shared_counters(const char *target)
 		msg.m_resettarget = RESET_ARCHIVER;
 	else if (strcmp(target, "bgwriter") == 0)
 		msg.m_resettarget = RESET_BGWRITER;
+	else if (strcmp(target, "wal") == 0)
+		msg.m_resettarget = RESET_WAL;
 	else
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("unrecognized reset target: \"%s\"", target),
-				 errhint("Target must be \"archiver\" or \"bgwriter\".")));
+				 errhint("Target must be \"archiver\", \"bgwriter\" or \"wal\".")));
 
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_RESETSHAREDCOUNTER);
 	pgstat_send(&msg, sizeof(msg));
@@ -2674,6 +2688,21 @@ pgstat_fetch_global(void)
 	return &globalStats;
 }
 
+/*
+ * ---------
+ * pgstat_fetch_stat_wal() -
+ *
+ *	Support function for the SQL-callable pgstat* functions. Returns
+ *	a pointer to the wal statistics struct.
+ * ---------
+ */
+PgStat_WalStats *
+pgstat_fetch_stat_wal(void)
+{
+	backend_read_statsfile();
+
+	return &walStats;
+}
 
 /*
  * ---------
@@ -4419,6 +4448,38 @@ pgstat_send_bgwriter(void)
 	MemSet(&BgWriterStats, 0, sizeof(BgWriterStats));
 }
 
+/* ----------
+ * pgstat_send_wal() -
+ *
+ *		Send wal statistics to the collector
+ * ----------
+ */
+void
+pgstat_send_wal(void)
+{
+	/* We assume this initializes to zeroes */
+	static const PgStat_MsgWal all_zeroes;
+
+	/*
+	 * This function can be called even if nothing at all has happened. In
+	 * this case, avoid sending a completely empty message to the stats
+	 * collector.
+	 */
+	if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0)
+		return;
+
+	/*
+	 * Prepare and send the message
+	 */
+	pgstat_setheader(&WalStats.m_hdr, PGSTAT_MTYPE_WAL);
+	pgstat_send(&WalStats, sizeof(WalStats));
+
+	/*
+	 * Clear out the statistics buffer, so it can be re-used.
+	 */
+	MemSet(&WalStats, 0, sizeof(WalStats));
+}
+
 /* ----------
  * pgstat_send_slru() -
  *
@@ -4658,6 +4719,10 @@ PgstatCollectorMain(int argc, char *argv[])
 					pgstat_recv_bgwriter(&msg.msg_bgwriter, len);
 					break;
 
+				case PGSTAT_MTYPE_WAL:
+					pgstat_recv_wal(&msg.msg_wal, len);
+					break;
+
 				case PGSTAT_MTYPE_SLRU:
 					pgstat_recv_slru(&msg.msg_slru, len);
 					break;
@@ -4927,6 +4992,12 @@ pgstat_write_statsfiles(bool permanent, bool allDbs)
 	rc = fwrite(&archiverStats, sizeof(archiverStats), 1, fpout);
 	(void) rc;					/* we'll check for error with ferror */
 
+	/*
+	 * Write wal stats struct
+	 */
+	rc = fwrite(&walStats, sizeof(walStats), 1, fpout);
+	(void) rc;					/* we'll check for error with ferror */
+
 	/*
 	 * Write SLRU stats struct
 	 */
@@ -5186,11 +5257,12 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 						 HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
 
 	/*
-	 * Clear out global and archiver statistics so they start from zero in
+	 * Clear out global, archiver and wal statistics so they start from zero in
 	 * case we can't load an existing statsfile.
 	 */
 	memset(&globalStats, 0, sizeof(globalStats));
 	memset(&archiverStats, 0, sizeof(archiverStats));
+	memset(&walStats, 0, sizeof(walStats));
 	memset(&slruStats, 0, sizeof(slruStats));
 
 	/*
@@ -5199,6 +5271,7 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 	 */
 	globalStats.stat_reset_timestamp = GetCurrentTimestamp();
 	archiverStats.stat_reset_timestamp = globalStats.stat_reset_timestamp;
+	walStats.stat_reset_timestamp = globalStats.stat_reset_timestamp;
 
 	/*
 	 * Set the same reset timestamp for all SLRU items too.
@@ -5268,6 +5341,17 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 		goto done;
 	}
 
+	/*
+	 * Read wal stats struct
+	 */
+	if (fread(&walStats, 1, sizeof(walStats), fpin) != sizeof(walStats))
+	{
+		ereport(pgStatRunningInCollector ? LOG : WARNING,
+				(errmsg("corrupted statistics file \"%s\"", statfile)));
+		memset(&walStats, 0, sizeof(walStats));
+		goto done;
+	}
+
 	/*
 	 * Read SLRU stats struct
 	 */
@@ -5632,6 +5716,17 @@ pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent,
 		return false;
 	}
 
+	/*
+	 * Read wal stats struct
+	 */
+	if (fread(&walStats, 1, sizeof(walStats), fpin) != sizeof(walStats))
+	{
+		ereport(pgStatRunningInCollector ? LOG : WARNING,
+				(errmsg("corrupted statistics file \"%s\"", statfile)));
+		FreeFile(fpin);
+		return false;
+	}
+
 	/*
 	 * Read SLRU stats struct
 	 */
@@ -6208,6 +6303,12 @@ pgstat_recv_resetsharedcounter(PgStat_MsgResetsharedcounter *msg, int len)
 		memset(&archiverStats, 0, sizeof(archiverStats));
 		archiverStats.stat_reset_timestamp = GetCurrentTimestamp();
 	}
+	else if (msg->m_resettarget == RESET_WAL)
+	{
+		/* Reset the wal statistics for the cluster. */
+		memset(&walStats, 0, sizeof(walStats));
+		walStats.stat_reset_timestamp = GetCurrentTimestamp();
+	}
 
 	/*
 	 * Presumably the sender of this message validated the target, don't
@@ -6422,6 +6523,18 @@ pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len)
 	globalStats.buf_alloc += msg->m_buf_alloc;
 }
 
+/* ----------
+ * pgstat_recv_wal() -
+ *
+ *	Process a WAL message.
+ * ----------
+ */
+static void
+pgstat_recv_wal(PgStat_MsgWal *msg, int len)
+{
+	walStats.wal_buffers_full += msg->m_wal_buffers_full;
+}
+
 /* ----------
  * pgstat_recv_slru() -
  *
diff --git a/src/backend/postmaster/walwriter.c b/src/backend/postmaster/walwriter.c
index 45a2757969..8fead4ca51 100644
--- a/src/backend/postmaster/walwriter.c
+++ b/src/backend/postmaster/walwriter.c
@@ -243,6 +243,9 @@ WalWriterMain(void)
 		else if (left_till_hibernate > 0)
 			left_till_hibernate--;
 
+		/* Send wal statistics */
+		pgstat_send_wal();
+
 		/*
 		 * Sleep until we are signaled or WalWriterDelay has elapsed.  If we
 		 * haven't done anything useful for quite some time, lengthen the
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 3f756b470a..9ae7b9d6e6 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1430,6 +1430,9 @@ WalSndWaitForWal(XLogRecPtr loc)
 		else
 			RecentFlushPtr = GetXLogReplayRecPtr(NULL);
 
+		/* Send wal statistics */
+		pgstat_send_wal();
+
 		/*
 		 * If postmaster asked us to stop, don't wait anymore.
 		 *
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 95738a4e34..aa41330796 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1697,6 +1697,18 @@ pg_stat_get_buf_alloc(PG_FUNCTION_ARGS)
 	PG_RETURN_INT64(pgstat_fetch_global()->buf_alloc);
 }
 
+Datum
+pg_stat_get_wal_buffers_full(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_INT64(pgstat_fetch_stat_wal()->wal_buffers_full);
+}
+
+Datum
+pg_stat_get_wal_stat_reset_time(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_TIMESTAMPTZ(pgstat_fetch_stat_wal()->stat_reset_timestamp);
+}
+
 /*
  * Returns statistics of SLRU caches.
  */
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 687509ba92..13cc892abc 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5484,6 +5484,14 @@
   proname => 'pg_stat_get_buf_alloc', provolatile => 's', proparallel => 'r',
   prorettype => 'int8', proargtypes => '', prosrc => 'pg_stat_get_buf_alloc' },
 
+{ oid => '8000', descr => 'statistics: number of WAL writes when the wal buffers are full',
+  proname => 'pg_stat_get_wal_buffers_full', provolatile => 's', proparallel => 'r',
+  prorettype => 'int8', proargtypes => '', prosrc => 'pg_stat_get_wal_buffers_full' },
+{ oid => '8001', descr => 'statistics: last reset for the walwriter',
+  proname => 'pg_stat_get_wal_stat_reset_time', provolatile => 's',
+  proparallel => 'r', prorettype => 'timestamptz', proargtypes => '',
+  prosrc => 'pg_stat_get_wal_stat_reset_time' },
+
 { oid => '2306', descr => 'statistics: information about SLRU caches',
   proname => 'pg_stat_get_slru', prorows => '100', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 0dfbac46b4..eb706068ba 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -61,6 +61,7 @@ typedef enum StatMsgType
 	PGSTAT_MTYPE_ANALYZE,
 	PGSTAT_MTYPE_ARCHIVER,
 	PGSTAT_MTYPE_BGWRITER,
+	PGSTAT_MTYPE_WAL,
 	PGSTAT_MTYPE_SLRU,
 	PGSTAT_MTYPE_FUNCSTAT,
 	PGSTAT_MTYPE_FUNCPURGE,
@@ -122,7 +123,8 @@ typedef struct PgStat_TableCounts
 typedef enum PgStat_Shared_Reset_Target
 {
 	RESET_ARCHIVER,
-	RESET_BGWRITER
+	RESET_BGWRITER,
+	RESET_WAL
 } PgStat_Shared_Reset_Target;
 
 /* Possible object types for resetting single counters */
@@ -436,6 +438,16 @@ typedef struct PgStat_MsgBgWriter
 	PgStat_Counter m_checkpoint_sync_time;
 } PgStat_MsgBgWriter;
 
+/* ----------
+ * PgStat_MsgWal			Sent by each backend and background workers to update WAL statistics.
+ * ----------
+ */
+typedef struct PgStat_MsgWal
+{
+	PgStat_MsgHdr m_hdr;
+	PgStat_Counter m_wal_buffers_full;	/* number of WAL write caused by full of WAL buffers */
+} PgStat_MsgWal;
+
 /* ----------
  * PgStat_MsgSLRU			Sent by a backend to update SLRU statistics.
  * ----------
@@ -596,6 +608,7 @@ typedef union PgStat_Msg
 	PgStat_MsgAnalyze msg_analyze;
 	PgStat_MsgArchiver msg_archiver;
 	PgStat_MsgBgWriter msg_bgwriter;
+	PgStat_MsgWal msg_wal;
 	PgStat_MsgSLRU msg_slru;
 	PgStat_MsgFuncstat msg_funcstat;
 	PgStat_MsgFuncpurge msg_funcpurge;
@@ -745,6 +758,15 @@ typedef struct PgStat_GlobalStats
 	TimestampTz stat_reset_timestamp;
 } PgStat_GlobalStats;
 
+/*
+ * WAL statistics kept in the stats collector
+ */
+typedef struct PgStat_WalStats
+{
+	PgStat_Counter wal_buffers_full;	/* number of WAL write caused by full of WAL buffers */
+	TimestampTz stat_reset_timestamp;	/* last time when the stats reset */
+} PgStat_WalStats;
+
 /*
  * SLRU statistics kept in the stats collector
  */
@@ -1265,6 +1287,11 @@ extern char *pgstat_stat_filename;
  */
 extern PgStat_MsgBgWriter BgWriterStats;
 
+/*
+ * WAL writes statistics counter is updated by backend and background workers
+ */
+extern PgStat_MsgWal WalStats;
+
 /*
  * Updated by pgstat_count_buffer_*_time macros
  */
@@ -1464,6 +1491,7 @@ extern void pgstat_twophase_postabort(TransactionId xid, uint16 info,
 
 extern void pgstat_send_archiver(const char *xlog, bool failed);
 extern void pgstat_send_bgwriter(void);
+extern void pgstat_send_wal(void);
 
 /* ----------
  * Support functions for the SQL-callable functions to
@@ -1478,6 +1506,7 @@ extern PgStat_StatFuncEntry *pgstat_fetch_stat_funcentry(Oid funcid);
 extern int	pgstat_fetch_stat_numbackends(void);
 extern PgStat_ArchiverStats *pgstat_fetch_stat_archiver(void);
 extern PgStat_GlobalStats *pgstat_fetch_global(void);
+extern PgStat_WalStats *pgstat_fetch_stat_wal(void);
 extern PgStat_SLRUStats *pgstat_fetch_slru(void);
 
 extern void pgstat_count_slru_page_zeroed(int slru_idx);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 2a18dc423e..1e4ac4432e 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2129,6 +2129,8 @@ pg_stat_user_tables| SELECT pg_stat_all_tables.relid,
     pg_stat_all_tables.autoanalyze_count
    FROM pg_stat_all_tables
   WHERE ((pg_stat_all_tables.schemaname <> ALL (ARRAY['pg_catalog'::name, 'information_schema'::name])) AND (pg_stat_all_tables.schemaname !~ '^pg_toast'::text));
+pg_stat_wal| SELECT pg_stat_get_wal_buffers_full() AS wal_buffers_full,
+    pg_stat_get_wal_stat_reset_time() AS stats_reset;
 pg_stat_wal_receiver| SELECT s.pid,
     s.status,
     s.receive_start_lsn,
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 1cffc3349d..81bdacf59d 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -76,6 +76,13 @@ select count(*) >= 0 as ok from pg_prepared_xacts;
  t
 (1 row)
 
+-- There must be only one record
+select count(*) = 1 as ok from pg_stat_wal;
+ ok 
+----
+ t
+(1 row)
+
 -- This is to record the prevailing planner enable_foo settings during
 -- a regression test run.
 select name, setting from pg_settings where name like 'enable%';
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index ac4a0e1cbb..b9b875bc6a 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -37,6 +37,9 @@ select count(*) = 0 as ok from pg_prepared_statements;
 -- See also prepared_xacts.sql
 select count(*) >= 0 as ok from pg_prepared_xacts;
 
+-- There must be only one record
+select count(*) = 1 as ok from pg_stat_wal;
+
 -- This is to record the prevailing planner enable_foo settings during
 -- a regression test run.
 select name, setting from pg_settings where name like 'enable%';

#20

Fujii Masao

masao.fujii@oss.nttdata.com

over 5 years ago

In reply to: Masahiro Ikeda (#19)

Re: New statistics for tuning WAL buffer size

On 2020/09/07 9:58, Masahiro Ikeda wrote:

Thanks for the review and advice!

On 2020-09-03 16:05, Fujii Masao wrote:
On 2020/09/02 18:56, Masahiro Ikeda wrote:
+/* ----------
+ * Backend types
+ * ----------
You seem to forget to add "*/" into the above comment.
This issue could cause the following compiler warning.
../../src/include/pgstat.h:761:1: warning: '/*' within block comment [-Wcomment]
Thanks for the comment. I fixed.
Thanks for the fix! But why are those comments necessary?
Sorry about that. This comment is not necessary.
I removed it.

The pg_stat_walwriter is not security restricted now, so ordinary users can access it.
It has the same security level as pg_stat_archiver. If you have any comments, please let me know.

+ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ <structfield>dirty_writes</structfield> <type>bigint</type>

I guess that the column name "dirty_writes" derived from
the DTrace probe name. Isn't this name confusing? We should
rename it to "wal_buffers_full" or something?

I agree and rename it to "wal_buffers_full".
+/* ----------
+ * PgStat_MsgWalWriterï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ Sent by the walwriter to update statistics.
This comment seems not accurate because backends also send it.
+/*
+ * WAL writes statistics counter is updated in XLogWrite function
+ */
+extern PgStat_MsgWalWriter WalWriterStats;
This comment seems not right because the counter is not updated in XLogWrite().
Right. I fixed it to "Sent by each backend and background workers to update WAL statistics."
In the future, other statistics will be included so I remove the function's name.
+-- There will surely and maximum one record
+select count(*) = 1 as ok from pg_stat_walwriter;
What about changing this comment to "There must be only one record"?
Thanks, I fixed.

+ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ WalWriterStats.m_xlog_dirty_writes++;
ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ LWLockRelease(WALWriteLock);

Since WalWriterStats.m_xlog_dirty_writes doesn't need to be protected
with WALWriteLock, isn't it better to increment that after releasing the lock?

Thanks, I fixed.
+CREATE VIEW pg_stat_walwriter AS
+ï¿½ï¿½ï¿½ SELECT
+ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ pg_stat_get_xlog_dirty_writes() AS dirty_writes,
+ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ pg_stat_get_walwriter_stat_reset_time() AS stats_reset;
+
ï¿½CREATE VIEW pg_stat_progress_vacuum AS
In system_views.sql, the definition of pg_stat_walwriter should be
placed just after that of pg_stat_bgwriter not pg_stat_progress_analyze.
OK, I fixed it.

ï¿½ï¿½ï¿½ï¿½ }
-
ï¿½ï¿½ï¿½ï¿½ /*
ï¿½ï¿½ï¿½ï¿½ï¿½ * We found an existing collector stats file. Read it and put all the

You seem to accidentally have removed the empty line here.

Sorry about that. I fixed it.
-ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ errhint("Target must be \"archiver\" or \"bgwriter\".")));
+ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ errhint("Target must be \"archiver\" or \"bgwriter\" or
\"walwriter\".")));
There are two "or" in the message, but the former should be replaced with ","?
Thanks, I fixed.

On 2020-09-05 18:40, Magnus Hagander wrote:

On Fri, Sep 4, 2020 at 5:42 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:

On 2020/09/04 11:50, tsunakawa.takay@fujitsu.com wrote:

From: Fujii Masao <masao.fujii@oss.nttdata.com>

I changed the view name from pg_stat_walwrites to

pg_stat_walwriter.

I think it is better to match naming scheme with other views

like

pg_stat_bgwriter,

which is for bgwriter statistics but it has the statistics

related to backend.

I prefer the view name pg_stat_walwriter for the consistency with
other view names. But we also have pg_stat_wal_receiver. Which
makes me think that maybe pg_stat_wal_writer is better for
the consistency. Thought? IMO either of them works for me.
I'd like to hear more opinons about this.

I think pg_stat_bgwriter is now a misnomer, because it contains

the backends' activity.ï¿½ Likewise, pg_stat_walwriter leads to
misunderstanding because its information is not limited to WAL
writer.

How about simply pg_stat_wal?ï¿½ In the future, we may want to

include WAL reads in this view, e.g. reading undo logs in zheap.

Sounds reasonable.

+1.

pg_stat_bgwriter has had the "wrong name" for quite some time now --
it became even more apparent when the checkpointer was split out to
it's own process, and that's not exactly a recent change. And it had
allocs in it from day one...

I think naming it for what the data in it is ("wal") rather than which
process deals with it ("walwriter") is correct, unless the statistics
can be known to only *ever* affect one type of process. (And then
different processes can affect different columns in the view). As a
general rule -- and that's from what I can tell exactly what's being
proposed.

Thanks for your comments. I agree with your opinions.
I changed the view name to "pg_stat_wal".

I fixed the code to send the WAL statistics from not only backend and walwriter
but also checkpointer, walsender and autovacuum worker.

Good point! Thanks for updating the patch!

@@ -604,6 +604,7 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
onerel->rd_rel->relisshared,
Max(new_live_tuples, 0),
vacrelstats->new_dead_tuples);
+ pgstat_send_wal();

I guess that you changed heap_vacuum_rel() as above so that autovacuum
workers can send WAL stats. But heap_vacuum_rel() can be called by
the processes (e.g., backends) other than autovacuum workers? Also
what happens if autovacuum workers just do ANALYZE only? In that case,
heap_vacuum_rel() may not be called.

Currently autovacuum worker reports the stats at the exit via
pgstat_beshutdown_hook(). Unlike other processes, autovacuum worker
is not the process that basically keeps running during the service. It exits
after it does vacuum or analyze. So ISTM that it's not bad to report the stats
only at the exit, in autovacuum worker case. There is no need to add extra
code for WAL stats report by autovacuum worker. Thought?

@@ -1430,6 +1430,9 @@ WalSndWaitForWal(XLogRecPtr loc)
else
RecentFlushPtr = GetXLogReplayRecPtr(NULL);

+ /* Send wal statistics */
+ pgstat_send_wal();

AFAIR logical walsender uses three loops in WalSndLoop(), WalSndWriteData()
and WalSndWaitForWal(). But could you tell me why added pgstat_send_wal()
into WalSndWaitForWal()? I'd like to know why WalSndWaitForWal() is the best
for that purpose.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#21

Masahiro Ikeda

ikedamsh@oss.nttdata.com

over 5 years ago

In reply to: Fujii Masao (#20)

1 attachment(s)

Re: New statistics for tuning WAL buffer size

On 2020-09-07 16:19, Fujii Masao wrote:

On 2020/09/07 9:58, Masahiro Ikeda wrote:
Thanks for the review and advice!

On 2020-09-03 16:05, Fujii Masao wrote:
On 2020/09/02 18:56, Masahiro Ikeda wrote:
+/* ----------
+ * Backend types
+ * ----------
You seem to forget to add "*/" into the above comment.
This issue could cause the following compiler warning.
../../src/include/pgstat.h:761:1: warning: '/*' within block
comment [-Wcomment]
Thanks for the comment. I fixed.
Thanks for the fix! But why are those comments necessary?
Sorry about that. This comment is not necessary.
I removed it.

The pg_stat_walwriter is not security restricted now, so ordinary
users can access it.
It has the same security level as pg_stat_archiver. If you have any
comments, please let me know.

+ <structfield>dirty_writes</structfield> <type>bigint</type>

I guess that the column name "dirty_writes" derived from
the DTrace probe name. Isn't this name confusing? We should
rename it to "wal_buffers_full" or something?

I agree and rename it to "wal_buffers_full".
+/* ----------
+ * PgStat_MsgWalWriter            Sent by the walwriter to update 
statistics.
This comment seems not accurate because backends also send it.
+/*
+ * WAL writes statistics counter is updated in XLogWrite function
+ */
+extern PgStat_MsgWalWriter WalWriterStats;
This comment seems not right because the counter is not updated in
XLogWrite().
Right. I fixed it to "Sent by each backend and background workers to
update WAL statistics."
In the future, other statistics will be included so I remove the
function's name.
+-- There will surely and maximum one record
+select count(*) = 1 as ok from pg_stat_walwriter;
What about changing this comment to "There must be only one record"?
Thanks, I fixed.

+ WalWriterStats.m_xlog_dirty_writes++;
LWLockRelease(WALWriteLock);

Since WalWriterStats.m_xlog_dirty_writes doesn't need to be protected
with WALWriteLock, isn't it better to increment that after releasing
the lock?

Thanks, I fixed.
+CREATE VIEW pg_stat_walwriter AS
+    SELECT
+        pg_stat_get_xlog_dirty_writes() AS dirty_writes,
+        pg_stat_get_walwriter_stat_reset_time() AS stats_reset;
+
 CREATE VIEW pg_stat_progress_vacuum AS
In system_views.sql, the definition of pg_stat_walwriter should be
placed just after that of pg_stat_bgwriter not
pg_stat_progress_analyze.
OK, I fixed it.

     }
-
     /*
      * We found an existing collector stats file. Read it and put
all the

You seem to accidentally have removed the empty line here.

Sorry about that. I fixed it.
-                 errhint("Target must be \"archiver\" or 
\"bgwriter\".")));
+                 errhint("Target must be \"archiver\" or 
\"bgwriter\" or
\"walwriter\".")));
There are two "or" in the message, but the former should be replaced
with ","?
Thanks, I fixed.

On 2020-09-05 18:40, Magnus Hagander wrote:

On Fri, Sep 4, 2020 at 5:42 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:

On 2020/09/04 11:50, tsunakawa.takay@fujitsu.com wrote:

From: Fujii Masao <masao.fujii@oss.nttdata.com>

I changed the view name from pg_stat_walwrites to

pg_stat_walwriter.

I think it is better to match naming scheme with other views

like

pg_stat_bgwriter,

which is for bgwriter statistics but it has the statistics

related to backend.

I prefer the view name pg_stat_walwriter for the consistency with
other view names. But we also have pg_stat_wal_receiver. Which
makes me think that maybe pg_stat_wal_writer is better for
the consistency. Thought? IMO either of them works for me.
I'd like to hear more opinons about this.

I think pg_stat_bgwriter is now a misnomer, because it contains

the backends' activity. Likewise, pg_stat_walwriter leads to
misunderstanding because its information is not limited to WAL
writer.

How about simply pg_stat_wal? In the future, we may want to

include WAL reads in this view, e.g. reading undo logs in zheap.

Sounds reasonable.

+1.

pg_stat_bgwriter has had the "wrong name" for quite some time now --
it became even more apparent when the checkpointer was split out to
it's own process, and that's not exactly a recent change. And it had
allocs in it from day one...

I think naming it for what the data in it is ("wal") rather than
which
process deals with it ("walwriter") is correct, unless the statistics
can be known to only *ever* affect one type of process. (And then
different processes can affect different columns in the view). As a
general rule -- and that's from what I can tell exactly what's being
proposed.

Thanks for your comments. I agree with your opinions.
I changed the view name to "pg_stat_wal".

I fixed the code to send the WAL statistics from not only backend and
walwriter
but also checkpointer, walsender and autovacuum worker.
Good point! Thanks for updating the patch!

@@ -604,6 +604,7 @@ heap_vacuum_rel(Relation onerel, VacuumParams
*params,
onerel->rd_rel->relisshared,
Max(new_live_tuples, 0),
vacrelstats->new_dead_tuples);
+ pgstat_send_wal();

I guess that you changed heap_vacuum_rel() as above so that autovacuum
workers can send WAL stats. But heap_vacuum_rel() can be called by
the processes (e.g., backends) other than autovacuum workers? Also
what happens if autovacuum workers just do ANALYZE only? In that case,
heap_vacuum_rel() may not be called.

Currently autovacuum worker reports the stats at the exit via
pgstat_beshutdown_hook(). Unlike other processes, autovacuum worker
is not the process that basically keeps running during the service. It
exits
after it does vacuum or analyze. So ISTM that it's not bad to report
the stats
only at the exit, in autovacuum worker case. There is no need to add
extra
code for WAL stats report by autovacuum worker. Thought?

Thanks, I understood. I removed this code.

@@ -1430,6 +1430,9 @@ WalSndWaitForWal(XLogRecPtr loc)
else
RecentFlushPtr = GetXLogReplayRecPtr(NULL);
+		/* Send wal statistics */
+		pgstat_send_wal();
AFAIR logical walsender uses three loops in WalSndLoop(),
WalSndWriteData()
and WalSndWaitForWal(). But could you tell me why added
pgstat_send_wal()
into WalSndWaitForWal()? I'd like to know why WalSndWaitForWal() is the
best
for that purpose.

I checked what function calls XLogBackgroundFlush() which calls
AdvanceXLInsertBuffer() to increment m_wal_buffers_full.

I found that WalSndWaitForWal() calls it, so I added it.
Is it better to move it in WalSndLoop() like the attached patch?

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

Attachments:

0005_pg_stat_wal_view.patchtext/x-diff; name=0005_pg_stat_wal_view.patchDownload

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 673a0e73e4..6d56912221 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -424,6 +424,13 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
      </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_wal</structname><indexterm><primary>pg_stat_wal</primary></indexterm></entry>
+      <entry>One row only, showing statistics about the WAL writing activity. See
+       <xref linkend="monitoring-pg-stat-wal-view"/> for details.
+      </entry>
+     </row>
+
      <row>
       <entry><structname>pg_stat_database</structname><indexterm><primary>pg_stat_database</primary></indexterm></entry>
       <entry>One row per database, showing database-wide statistics. See
@@ -3280,6 +3287,56 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
 
  </sect2>
 
+ <sect2 id="monitoring-pg-stat-wal-view">
+   <title><structname>pg_stat_wal</structname></title>
+
+  <indexterm>
+   <primary>pg_stat_wal</primary>
+  </indexterm>
+
+  <para>
+   The <structname>pg_stat_wal</structname> view will always have a
+   single row, containing data about the WAL writing activity of the cluster.
+  </para>
+
+  <table id="pg-stat-wal-view" xreflabel="pg_stat_wal">
+   <title><structname>pg_stat_wal</structname> View</title>
+   <tgroup cols="1">
+    <thead>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       Column Type
+      </para>
+      <para>
+       Description
+      </para></entry>
+     </row>
+    </thead>
+
+    <tbody>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>wal_buffers_full</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of WAL writes when the <xref linkend="guc-wal-buffers"/> are full
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>stats_reset</structfield> <type>timestamp with time zone</type>
+      </para>
+      <para>
+       Time at which these statistics were last reset
+      </para></entry>
+     </row>
+     </tbody>
+   </tgroup>
+  </table>
+
+</sect2>
+
  <sect2 id="monitoring-pg-stat-database-view">
   <title><structname>pg_stat_database</structname></title>
 
@@ -4668,8 +4725,9 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
         argument.  The argument can be <literal>bgwriter</literal> to reset
         all the counters shown in
         the <structname>pg_stat_bgwriter</structname>
-        view, or <literal>archiver</literal> to reset all the counters shown in
-        the <structname>pg_stat_archiver</structname> view.
+        view, <literal>archiver</literal> to reset all the counters shown in
+        the <structname>pg_stat_archiver</structname> view ,or <literal>wal</literal>
+        to reset all the counters shown in the <structname>pg_stat_wal</structname> view.
        </para>
        <para>
         This function is restricted to superusers by default, but other users
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 09c01ed4ae..b485ff49f9 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2194,6 +2194,7 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, bool opportunistic)
 					WriteRqst.Flush = 0;
 					XLogWrite(WriteRqst, false);
 					LWLockRelease(WALWriteLock);
+					WalStats.m_wal_buffers_full++;
 					TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY_DONE();
 				}
 				/* Re-acquire WALBufMappingLock and retry */
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ed4f3f142d..643445c189 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -979,6 +979,11 @@ CREATE VIEW pg_stat_bgwriter AS
         pg_stat_get_buf_alloc() AS buffers_alloc,
         pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
 
+CREATE VIEW pg_stat_wal AS
+    SELECT
+        pg_stat_get_wal_buffers_full() AS wal_buffers_full,
+        pg_stat_get_wal_stat_reset_time() AS stats_reset;
+
 CREATE VIEW pg_stat_progress_analyze AS
     SELECT
         S.pid AS pid, S.datid AS datid, D.datname AS datname,
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 069e27e427..450c19968b 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -238,6 +238,9 @@ BackgroundWriterMain(void)
 		 */
 		pgstat_send_bgwriter();
 
+		/* Send wal statistics */
+		pgstat_send_wal();
+
 		if (FirstCallSinceLastCheckpoint())
 		{
 			/*
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 624a3238b8..b82ba54523 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -494,6 +494,9 @@ CheckpointerMain(void)
 		 */
 		pgstat_send_bgwriter();
 
+		/* Send wal statistics to the stats collector. */
+		pgstat_send_wal();
+
 		/*
 		 * If any checkpoint flags have been set, redo the loop to handle the
 		 * checkpoint without sleeping.
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 5f4b168fd1..e23446179f 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -141,6 +141,13 @@ char	   *pgstat_stat_tmpname = NULL;
  */
 PgStat_MsgBgWriter BgWriterStats;
 
+/*
+ * WAL global statistics counter.
+ * This counter is incremented by both each backend and background.
+ * And then, sent to the stat collector process.
+ */
+PgStat_MsgWal WalStats;
+
 /*
  * List of SLRU names that we keep stats for.  There is no central registry of
  * SLRUs, so we use this fixed list instead.  The "other" entry is used for
@@ -281,6 +288,7 @@ static int	localNumBackends = 0;
  */
 static PgStat_ArchiverStats archiverStats;
 static PgStat_GlobalStats globalStats;
+static PgStat_WalStats walStats;
 static PgStat_SLRUStats slruStats[SLRU_NUM_ELEMENTS];
 
 /*
@@ -353,6 +361,7 @@ static void pgstat_recv_vacuum(PgStat_MsgVacuum *msg, int len);
 static void pgstat_recv_analyze(PgStat_MsgAnalyze *msg, int len);
 static void pgstat_recv_archiver(PgStat_MsgArchiver *msg, int len);
 static void pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len);
+static void pgstat_recv_wal(PgStat_MsgWal *msg, int len);
 static void pgstat_recv_slru(PgStat_MsgSLRU *msg, int len);
 static void pgstat_recv_funcstat(PgStat_MsgFuncstat *msg, int len);
 static void pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len);
@@ -938,6 +947,9 @@ pgstat_report_stat(bool force)
 	/* Now, send function statistics */
 	pgstat_send_funcstats();
 
+	/* Send wal statistics */
+	pgstat_send_wal();
+
 	/* Finally send SLRU statistics */
 	pgstat_send_slru();
 }
@@ -1370,11 +1382,13 @@ pgstat_reset_shared_counters(const char *target)
 		msg.m_resettarget = RESET_ARCHIVER;
 	else if (strcmp(target, "bgwriter") == 0)
 		msg.m_resettarget = RESET_BGWRITER;
+	else if (strcmp(target, "wal") == 0)
+		msg.m_resettarget = RESET_WAL;
 	else
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("unrecognized reset target: \"%s\"", target),
-				 errhint("Target must be \"archiver\" or \"bgwriter\".")));
+				 errhint("Target must be \"archiver\", \"bgwriter\" or \"wal\".")));
 
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_RESETSHAREDCOUNTER);
 	pgstat_send(&msg, sizeof(msg));
@@ -2674,6 +2688,21 @@ pgstat_fetch_global(void)
 	return &globalStats;
 }
 
+/*
+ * ---------
+ * pgstat_fetch_stat_wal() -
+ *
+ *	Support function for the SQL-callable pgstat* functions. Returns
+ *	a pointer to the wal statistics struct.
+ * ---------
+ */
+PgStat_WalStats *
+pgstat_fetch_stat_wal(void)
+{
+	backend_read_statsfile();
+
+	return &walStats;
+}
 
 /*
  * ---------
@@ -4419,6 +4448,38 @@ pgstat_send_bgwriter(void)
 	MemSet(&BgWriterStats, 0, sizeof(BgWriterStats));
 }
 
+/* ----------
+ * pgstat_send_wal() -
+ *
+ *		Send wal statistics to the collector
+ * ----------
+ */
+void
+pgstat_send_wal(void)
+{
+	/* We assume this initializes to zeroes */
+	static const PgStat_MsgWal all_zeroes;
+
+	/*
+	 * This function can be called even if nothing at all has happened. In
+	 * this case, avoid sending a completely empty message to the stats
+	 * collector.
+	 */
+	if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0)
+		return;
+
+	/*
+	 * Prepare and send the message
+	 */
+	pgstat_setheader(&WalStats.m_hdr, PGSTAT_MTYPE_WAL);
+	pgstat_send(&WalStats, sizeof(WalStats));
+
+	/*
+	 * Clear out the statistics buffer, so it can be re-used.
+	 */
+	MemSet(&WalStats, 0, sizeof(WalStats));
+}
+
 /* ----------
  * pgstat_send_slru() -
  *
@@ -4658,6 +4719,10 @@ PgstatCollectorMain(int argc, char *argv[])
 					pgstat_recv_bgwriter(&msg.msg_bgwriter, len);
 					break;
 
+				case PGSTAT_MTYPE_WAL:
+					pgstat_recv_wal(&msg.msg_wal, len);
+					break;
+
 				case PGSTAT_MTYPE_SLRU:
 					pgstat_recv_slru(&msg.msg_slru, len);
 					break;
@@ -4927,6 +4992,12 @@ pgstat_write_statsfiles(bool permanent, bool allDbs)
 	rc = fwrite(&archiverStats, sizeof(archiverStats), 1, fpout);
 	(void) rc;					/* we'll check for error with ferror */
 
+	/*
+	 * Write wal stats struct
+	 */
+	rc = fwrite(&walStats, sizeof(walStats), 1, fpout);
+	(void) rc;					/* we'll check for error with ferror */
+
 	/*
 	 * Write SLRU stats struct
 	 */
@@ -5186,11 +5257,12 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 						 HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
 
 	/*
-	 * Clear out global and archiver statistics so they start from zero in
+	 * Clear out global, archiver and wal statistics so they start from zero in
 	 * case we can't load an existing statsfile.
 	 */
 	memset(&globalStats, 0, sizeof(globalStats));
 	memset(&archiverStats, 0, sizeof(archiverStats));
+	memset(&walStats, 0, sizeof(walStats));
 	memset(&slruStats, 0, sizeof(slruStats));
 
 	/*
@@ -5199,6 +5271,7 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 	 */
 	globalStats.stat_reset_timestamp = GetCurrentTimestamp();
 	archiverStats.stat_reset_timestamp = globalStats.stat_reset_timestamp;
+	walStats.stat_reset_timestamp = globalStats.stat_reset_timestamp;
 
 	/*
 	 * Set the same reset timestamp for all SLRU items too.
@@ -5268,6 +5341,17 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 		goto done;
 	}
 
+	/*
+	 * Read wal stats struct
+	 */
+	if (fread(&walStats, 1, sizeof(walStats), fpin) != sizeof(walStats))
+	{
+		ereport(pgStatRunningInCollector ? LOG : WARNING,
+				(errmsg("corrupted statistics file \"%s\"", statfile)));
+		memset(&walStats, 0, sizeof(walStats));
+		goto done;
+	}
+
 	/*
 	 * Read SLRU stats struct
 	 */
@@ -5632,6 +5716,17 @@ pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent,
 		return false;
 	}
 
+	/*
+	 * Read wal stats struct
+	 */
+	if (fread(&walStats, 1, sizeof(walStats), fpin) != sizeof(walStats))
+	{
+		ereport(pgStatRunningInCollector ? LOG : WARNING,
+				(errmsg("corrupted statistics file \"%s\"", statfile)));
+		FreeFile(fpin);
+		return false;
+	}
+
 	/*
 	 * Read SLRU stats struct
 	 */
@@ -6208,6 +6303,12 @@ pgstat_recv_resetsharedcounter(PgStat_MsgResetsharedcounter *msg, int len)
 		memset(&archiverStats, 0, sizeof(archiverStats));
 		archiverStats.stat_reset_timestamp = GetCurrentTimestamp();
 	}
+	else if (msg->m_resettarget == RESET_WAL)
+	{
+		/* Reset the wal statistics for the cluster. */
+		memset(&walStats, 0, sizeof(walStats));
+		walStats.stat_reset_timestamp = GetCurrentTimestamp();
+	}
 
 	/*
 	 * Presumably the sender of this message validated the target, don't
@@ -6422,6 +6523,18 @@ pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len)
 	globalStats.buf_alloc += msg->m_buf_alloc;
 }
 
+/* ----------
+ * pgstat_recv_wal() -
+ *
+ *	Process a WAL message.
+ * ----------
+ */
+static void
+pgstat_recv_wal(PgStat_MsgWal *msg, int len)
+{
+	walStats.wal_buffers_full += msg->m_wal_buffers_full;
+}
+
 /* ----------
  * pgstat_recv_slru() -
  *
diff --git a/src/backend/postmaster/walwriter.c b/src/backend/postmaster/walwriter.c
index 45a2757969..8fead4ca51 100644
--- a/src/backend/postmaster/walwriter.c
+++ b/src/backend/postmaster/walwriter.c
@@ -243,6 +243,9 @@ WalWriterMain(void)
 		else if (left_till_hibernate > 0)
 			left_till_hibernate--;
 
+		/* Send wal statistics */
+		pgstat_send_wal();
+
 		/*
 		 * Sleep until we are signaled or WalWriterDelay has elapsed.  If we
 		 * haven't done anything useful for quite some time, lengthen the
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 3f756b470a..548929762e 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2324,6 +2324,9 @@ WalSndLoop(WalSndSendDataCallback send_data)
 				WalSndDone(send_data);
 		}
 
+		/* Send wal statistics */
+		pgstat_send_wal();
+
 		/* Check for replication timeout. */
 		WalSndCheckTimeOut();
 
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 95738a4e34..aa41330796 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1697,6 +1697,18 @@ pg_stat_get_buf_alloc(PG_FUNCTION_ARGS)
 	PG_RETURN_INT64(pgstat_fetch_global()->buf_alloc);
 }
 
+Datum
+pg_stat_get_wal_buffers_full(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_INT64(pgstat_fetch_stat_wal()->wal_buffers_full);
+}
+
+Datum
+pg_stat_get_wal_stat_reset_time(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_TIMESTAMPTZ(pgstat_fetch_stat_wal()->stat_reset_timestamp);
+}
+
 /*
  * Returns statistics of SLRU caches.
  */
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 687509ba92..13cc892abc 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5484,6 +5484,14 @@
   proname => 'pg_stat_get_buf_alloc', provolatile => 's', proparallel => 'r',
   prorettype => 'int8', proargtypes => '', prosrc => 'pg_stat_get_buf_alloc' },
 
+{ oid => '8000', descr => 'statistics: number of WAL writes when the wal buffers are full',
+  proname => 'pg_stat_get_wal_buffers_full', provolatile => 's', proparallel => 'r',
+  prorettype => 'int8', proargtypes => '', prosrc => 'pg_stat_get_wal_buffers_full' },
+{ oid => '8001', descr => 'statistics: last reset for the walwriter',
+  proname => 'pg_stat_get_wal_stat_reset_time', provolatile => 's',
+  proparallel => 'r', prorettype => 'timestamptz', proargtypes => '',
+  prosrc => 'pg_stat_get_wal_stat_reset_time' },
+
 { oid => '2306', descr => 'statistics: information about SLRU caches',
   proname => 'pg_stat_get_slru', prorows => '100', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 0dfbac46b4..eb706068ba 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -61,6 +61,7 @@ typedef enum StatMsgType
 	PGSTAT_MTYPE_ANALYZE,
 	PGSTAT_MTYPE_ARCHIVER,
 	PGSTAT_MTYPE_BGWRITER,
+	PGSTAT_MTYPE_WAL,
 	PGSTAT_MTYPE_SLRU,
 	PGSTAT_MTYPE_FUNCSTAT,
 	PGSTAT_MTYPE_FUNCPURGE,
@@ -122,7 +123,8 @@ typedef struct PgStat_TableCounts
 typedef enum PgStat_Shared_Reset_Target
 {
 	RESET_ARCHIVER,
-	RESET_BGWRITER
+	RESET_BGWRITER,
+	RESET_WAL
 } PgStat_Shared_Reset_Target;
 
 /* Possible object types for resetting single counters */
@@ -436,6 +438,16 @@ typedef struct PgStat_MsgBgWriter
 	PgStat_Counter m_checkpoint_sync_time;
 } PgStat_MsgBgWriter;
 
+/* ----------
+ * PgStat_MsgWal			Sent by each backend and background workers to update WAL statistics.
+ * ----------
+ */
+typedef struct PgStat_MsgWal
+{
+	PgStat_MsgHdr m_hdr;
+	PgStat_Counter m_wal_buffers_full;	/* number of WAL write caused by full of WAL buffers */
+} PgStat_MsgWal;
+
 /* ----------
  * PgStat_MsgSLRU			Sent by a backend to update SLRU statistics.
  * ----------
@@ -596,6 +608,7 @@ typedef union PgStat_Msg
 	PgStat_MsgAnalyze msg_analyze;
 	PgStat_MsgArchiver msg_archiver;
 	PgStat_MsgBgWriter msg_bgwriter;
+	PgStat_MsgWal msg_wal;
 	PgStat_MsgSLRU msg_slru;
 	PgStat_MsgFuncstat msg_funcstat;
 	PgStat_MsgFuncpurge msg_funcpurge;
@@ -745,6 +758,15 @@ typedef struct PgStat_GlobalStats
 	TimestampTz stat_reset_timestamp;
 } PgStat_GlobalStats;
 
+/*
+ * WAL statistics kept in the stats collector
+ */
+typedef struct PgStat_WalStats
+{
+	PgStat_Counter wal_buffers_full;	/* number of WAL write caused by full of WAL buffers */
+	TimestampTz stat_reset_timestamp;	/* last time when the stats reset */
+} PgStat_WalStats;
+
 /*
  * SLRU statistics kept in the stats collector
  */
@@ -1265,6 +1287,11 @@ extern char *pgstat_stat_filename;
  */
 extern PgStat_MsgBgWriter BgWriterStats;
 
+/*
+ * WAL writes statistics counter is updated by backend and background workers
+ */
+extern PgStat_MsgWal WalStats;
+
 /*
  * Updated by pgstat_count_buffer_*_time macros
  */
@@ -1464,6 +1491,7 @@ extern void pgstat_twophase_postabort(TransactionId xid, uint16 info,
 
 extern void pgstat_send_archiver(const char *xlog, bool failed);
 extern void pgstat_send_bgwriter(void);
+extern void pgstat_send_wal(void);
 
 /* ----------
  * Support functions for the SQL-callable functions to
@@ -1478,6 +1506,7 @@ extern PgStat_StatFuncEntry *pgstat_fetch_stat_funcentry(Oid funcid);
 extern int	pgstat_fetch_stat_numbackends(void);
 extern PgStat_ArchiverStats *pgstat_fetch_stat_archiver(void);
 extern PgStat_GlobalStats *pgstat_fetch_global(void);
+extern PgStat_WalStats *pgstat_fetch_stat_wal(void);
 extern PgStat_SLRUStats *pgstat_fetch_slru(void);
 
 extern void pgstat_count_slru_page_zeroed(int slru_idx);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 2a18dc423e..1e4ac4432e 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2129,6 +2129,8 @@ pg_stat_user_tables| SELECT pg_stat_all_tables.relid,
     pg_stat_all_tables.autoanalyze_count
    FROM pg_stat_all_tables
   WHERE ((pg_stat_all_tables.schemaname <> ALL (ARRAY['pg_catalog'::name, 'information_schema'::name])) AND (pg_stat_all_tables.schemaname !~ '^pg_toast'::text));
+pg_stat_wal| SELECT pg_stat_get_wal_buffers_full() AS wal_buffers_full,
+    pg_stat_get_wal_stat_reset_time() AS stats_reset;
 pg_stat_wal_receiver| SELECT s.pid,
     s.status,
     s.receive_start_lsn,
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 1cffc3349d..81bdacf59d 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -76,6 +76,13 @@ select count(*) >= 0 as ok from pg_prepared_xacts;
  t
 (1 row)
 
+-- There must be only one record
+select count(*) = 1 as ok from pg_stat_wal;
+ ok 
+----
+ t
+(1 row)
+
 -- This is to record the prevailing planner enable_foo settings during
 -- a regression test run.
 select name, setting from pg_settings where name like 'enable%';
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index ac4a0e1cbb..b9b875bc6a 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -37,6 +37,9 @@ select count(*) = 0 as ok from pg_prepared_statements;
 -- See also prepared_xacts.sql
 select count(*) >= 0 as ok from pg_prepared_xacts;
 
+-- There must be only one record
+select count(*) = 1 as ok from pg_stat_wal;
+
 -- This is to record the prevailing planner enable_foo settings during
 -- a regression test run.
 select name, setting from pg_settings where name like 'enable%';

#22

Fujii Masao

masao.fujii@oss.nttdata.com

over 5 years ago

In reply to: Masahiro Ikeda (#21)

Re: New statistics for tuning WAL buffer size

On 2020/09/09 13:57, Masahiro Ikeda wrote:

On 2020-09-07 16:19, Fujii Masao wrote:
On 2020/09/07 9:58, Masahiro Ikeda wrote:
Thanks for the review and advice!

On 2020-09-03 16:05, Fujii Masao wrote:
On 2020/09/02 18:56, Masahiro Ikeda wrote:
+/* ----------
+ * Backend types
+ * ----------
You seem to forget to add "*/" into the above comment.
This issue could cause the following compiler warning.
../../src/include/pgstat.h:761:1: warning: '/*' within block comment [-Wcomment]
Thanks for the comment. I fixed.
Thanks for the fix! But why are those comments necessary?
Sorry about that. This comment is not necessary.
I removed it.

The pg_stat_walwriter is not security restricted now, so ordinary users can access it.
It has the same security level as pg_stat_archiver. If you have any comments, please let me know.

+ <structfield>dirty_writes</structfield> <type>bigint</type>

I guess that the column name "dirty_writes" derived from
the DTrace probe name. Isn't this name confusing? We should
rename it to "wal_buffers_full" or something?

I agree and rename it to "wal_buffers_full".
+/* ----------
+ * PgStat_MsgWalWriter            Sent by the walwriter to update statistics.
This comment seems not accurate because backends also send it.
+/*
+ * WAL writes statistics counter is updated in XLogWrite function
+ */
+extern PgStat_MsgWalWriter WalWriterStats;
This comment seems not right because the counter is not updated in XLogWrite().
Right. I fixed it to "Sent by each backend and background workers to update WAL statistics."
In the future, other statistics will be included so I remove the function's name.
+-- There will surely and maximum one record
+select count(*) = 1 as ok from pg_stat_walwriter;
What about changing this comment to "There must be only one record"?
Thanks, I fixed.

+ WalWriterStats.m_xlog_dirty_writes++;
LWLockRelease(WALWriteLock);

Since WalWriterStats.m_xlog_dirty_writes doesn't need to be protected
with WALWriteLock, isn't it better to increment that after releasing the lock?

Thanks, I fixed.
+CREATE VIEW pg_stat_walwriter AS
+    SELECT
+        pg_stat_get_xlog_dirty_writes() AS dirty_writes,
+        pg_stat_get_walwriter_stat_reset_time() AS stats_reset;
+
 CREATE VIEW pg_stat_progress_vacuum AS
In system_views.sql, the definition of pg_stat_walwriter should be
placed just after that of pg_stat_bgwriter not pg_stat_progress_analyze.
OK, I fixed it.

     }
-
     /*
      * We found an existing collector stats file. Read it and put all the

You seem to accidentally have removed the empty line here.

Sorry about that. I fixed it.
-                 errhint("Target must be \"archiver\" or \"bgwriter\".")));
+                 errhint("Target must be \"archiver\" or \"bgwriter\" or
\"walwriter\".")));
There are two "or" in the message, but the former should be replaced with ","?
Thanks, I fixed.

On 2020-09-05 18:40, Magnus Hagander wrote:

On Fri, Sep 4, 2020 at 5:42 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:

On 2020/09/04 11:50, tsunakawa.takay@fujitsu.com wrote:

From: Fujii Masao <masao.fujii@oss.nttdata.com>

I changed the view name from pg_stat_walwrites to

pg_stat_walwriter.

I think it is better to match naming scheme with other views

like

pg_stat_bgwriter,

which is for bgwriter statistics but it has the statistics

related to backend.

I prefer the view name pg_stat_walwriter for the consistency with
other view names. But we also have pg_stat_wal_receiver. Which
makes me think that maybe pg_stat_wal_writer is better for
the consistency. Thought? IMO either of them works for me.
I'd like to hear more opinons about this.

I think pg_stat_bgwriter is now a misnomer, because it contains

the backends' activity. Likewise, pg_stat_walwriter leads to
misunderstanding because its information is not limited to WAL
writer.

How about simply pg_stat_wal? In the future, we may want to

include WAL reads in this view, e.g. reading undo logs in zheap.

Sounds reasonable.

+1.

pg_stat_bgwriter has had the "wrong name" for quite some time now --
it became even more apparent when the checkpointer was split out to
it's own process, and that's not exactly a recent change. And it had
allocs in it from day one...

I think naming it for what the data in it is ("wal") rather than which
process deals with it ("walwriter") is correct, unless the statistics
can be known to only *ever* affect one type of process. (And then
different processes can affect different columns in the view). As a
general rule -- and that's from what I can tell exactly what's being
proposed.

Thanks for your comments. I agree with your opinions.
I changed the view name to "pg_stat_wal".

I fixed the code to send the WAL statistics from not only backend and walwriter
but also checkpointer, walsender and autovacuum worker.
Good point! Thanks for updating the patch!

@@ -604,6 +604,7 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
                          onerel->rd_rel->relisshared,
                          Max(new_live_tuples, 0),
                          vacrelstats->new_dead_tuples);
+    pgstat_send_wal();

I guess that you changed heap_vacuum_rel() as above so that autovacuum
workers can send WAL stats. But heap_vacuum_rel() can be called by
the processes (e.g., backends) other than autovacuum workers? Also
what happens if autovacuum workers just do ANALYZE only? In that case,
heap_vacuum_rel() may not be called.

Currently autovacuum worker reports the stats at the exit via
pgstat_beshutdown_hook(). Unlike other processes, autovacuum worker
is not the process that basically keeps running during the service. It exits
after it does vacuum or analyze. So ISTM that it's not bad to report the stats
only at the exit, in autovacuum worker case. There is no need to add extra
code for WAL stats report by autovacuum worker. Thought?
Thanks, I understood. I removed this code.

@@ -1430,6 +1430,9 @@ WalSndWaitForWal(XLogRecPtr loc)
         else
             RecentFlushPtr = GetXLogReplayRecPtr(NULL);
+        /* Send wal statistics */
+        pgstat_send_wal();

AFAIR logical walsender uses three loops in WalSndLoop(), WalSndWriteData()
and WalSndWaitForWal(). But could you tell me why added pgstat_send_wal()
into WalSndWaitForWal()? I'd like to know why WalSndWaitForWal() is the best
for that purpose.

I checked what function calls XLogBackgroundFlush() which calls
AdvanceXLInsertBuffer() to increment m_wal_buffers_full.

I found that WalSndWaitForWal() calls it, so I added it.

Ok. But XLogBackgroundFlush() calls AdvanceXLInsertBuffer() wit the second argument opportunistic=true, so in this case WAL write by wal_buffers full seems to never happen. Right? If this understanding is right, WalSndWaitForWal() doesn't need to call pgstat_send_wal(). Probably also walwriter doesn't need to do that.

The logical rep walsender can generate WAL and call AdvanceXLInsertBuffer() when it executes the replication commands like CREATE_REPLICATION_SLOT. But this case is already covered by pgstat_report_activity()->pgstat_send_wal() called in PostgresMain(), with your patch. So no more calls to pgstat_send_wal() seems necessary for logical rep walsender.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#23

Kyotaro Horiguchi

horikyota.ntt@gmail.com

over 5 years ago

In reply to: Masahiro Ikeda (#21)

Re: New statistics for tuning WAL buffer size

Hello.

At Wed, 09 Sep 2020 13:57:37 +0900, Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote in

I checked what function calls XLogBackgroundFlush() which calls
AdvanceXLInsertBuffer() to increment m_wal_buffers_full.

I found that WalSndWaitForWal() calls it, so I added it.
Is it better to move it in WalSndLoop() like the attached patch?

By the way, we are counting some wal-related numbers in
pgWalUsage.(bytes, records, fpi). Since now that we are going to have
a new view related to WAL statistics, wouln't it be more useful to
show them together in the view?

(Another reason to propose this is that a substantially one-column
table may look not-great..)

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#24

Fujii Masao

masao.fujii@oss.nttdata.com

over 5 years ago

In reply to: Kyotaro Horiguchi (#23)

Re: New statistics for tuning WAL buffer size

On 2020/09/11 12:17, Kyotaro Horiguchi wrote:

Hello.

At Wed, 09 Sep 2020 13:57:37 +0900, Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote in

I checked what function calls XLogBackgroundFlush() which calls
AdvanceXLInsertBuffer() to increment m_wal_buffers_full.

I found that WalSndWaitForWal() calls it, so I added it.
Is it better to move it in WalSndLoop() like the attached patch?

By the way, we are counting some wal-related numbers in
pgWalUsage.(bytes, records, fpi). Since now that we are going to have
a new view related to WAL statistics, wouln't it be more useful to
show them together in the view?

Probably yes. But IMO it's better to commit the current patch first, and then add those stats into the view after confirming exposing them is useful.

BTW, to expose the total WAL bytes, I think it's better to just save the LSN at when pg_stat_wal is reset rather than counting pgWalUsage.bytes. If we do that, we can easily total WAL bytes by subtracting that LSN from the latest LSN. Also saving the LSN at the reset timing causes obviously less overhead than counting pgWalUsage.bytes.

(Another reason to propose this is that a substantially one-column
table may look not-great..)

I'm ok with such "small" view. But if this is really problem, I'm ok to expose only functions pg_stat_get_wal_buffers_full() and pg_stat_get_wal_stat_reset_time(), without the view, at first.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#25

Kyotaro Horiguchi

horikyota.ntt@gmail.com

over 5 years ago

In reply to: Fujii Masao (#24)

Re: New statistics for tuning WAL buffer size

At Fri, 11 Sep 2020 13:48:49 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in

On 2020/09/11 12:17, Kyotaro Horiguchi wrote:

Hello.
At Wed, 09 Sep 2020 13:57:37 +0900, Masahiro Ikeda
<ikedamsh@oss.nttdata.com> wrote in

I checked what function calls XLogBackgroundFlush() which calls
AdvanceXLInsertBuffer() to increment m_wal_buffers_full.

I found that WalSndWaitForWal() calls it, so I added it.
Is it better to move it in WalSndLoop() like the attached patch?

By the way, we are counting some wal-related numbers in
pgWalUsage.(bytes, records, fpi). Since now that we are going to have
a new view related to WAL statistics, wouln't it be more useful to
show them together in the view?

Probably yes. But IMO it's better to commit the current patch first,
and then add those stats into the view after confirming exposing them
is useful.

I'm fine with that.

BTW, to expose the total WAL bytes, I think it's better to just save
the LSN at when pg_stat_wal is reset rather than counting
pgWalUsage.bytes. If we do that, we can easily total WAL bytes by
subtracting that LSN from the latest LSN. Also saving the LSN at the
reset timing causes obviously less overhead than counting
pgWalUsage.bytes.

pgWalUsage is always counting so it doesn't add any overhead. But
since it cannot be reset, the value needs to be saved at reset time
like LSN. I don't mind either way we take from performance
perspective.

(Another reason to propose this is that a substantially one-column
table may look not-great..)

I'm ok with such "small" view. But if this is really problem, I'm ok
to expose only functions pg_stat_get_wal_buffers_full() and
pg_stat_get_wal_stat_reset_time(), without the view, at first.

I don't mind that we have such small views as far as it is promised to
grow up:p

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#26

Fujii Masao

masao.fujii@oss.nttdata.com

over 5 years ago

In reply to: Kyotaro Horiguchi (#25)

Re: New statistics for tuning WAL buffer size

On 2020/09/11 16:54, Kyotaro Horiguchi wrote:

At Fri, 11 Sep 2020 13:48:49 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in

On 2020/09/11 12:17, Kyotaro Horiguchi wrote:

Hello.
At Wed, 09 Sep 2020 13:57:37 +0900, Masahiro Ikeda
<ikedamsh@oss.nttdata.com> wrote in

I checked what function calls XLogBackgroundFlush() which calls
AdvanceXLInsertBuffer() to increment m_wal_buffers_full.

I found that WalSndWaitForWal() calls it, so I added it.
Is it better to move it in WalSndLoop() like the attached patch?

By the way, we are counting some wal-related numbers in
pgWalUsage.(bytes, records, fpi). Since now that we are going to have
a new view related to WAL statistics, wouln't it be more useful to
show them together in the view?

Probably yes. But IMO it's better to commit the current patch first,
and then add those stats into the view after confirming exposing them
is useful.

I'm fine with that.

BTW, to expose the total WAL bytes, I think it's better to just save
the LSN at when pg_stat_wal is reset rather than counting
pgWalUsage.bytes. If we do that, we can easily total WAL bytes by
subtracting that LSN from the latest LSN. Also saving the LSN at the
reset timing causes obviously less overhead than counting
pgWalUsage.bytes.

pgWalUsage is always counting so it doesn't add any overhead.

Yes. And I'm a bit concerned about the overhead by frequent message sent for WAL bytes to the stats collector.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#27

Masahiro Ikeda

ikedamsh@oss.nttdata.com

over 5 years ago

In reply to: Fujii Masao (#22)

1 attachment(s)

Re: New statistics for tuning WAL buffer size

On 2020-09-11 01:40, Fujii Masao wrote:

On 2020/09/09 13:57, Masahiro Ikeda wrote:
On 2020-09-07 16:19, Fujii Masao wrote:
On 2020/09/07 9:58, Masahiro Ikeda wrote:
Thanks for the review and advice!

On 2020-09-03 16:05, Fujii Masao wrote:
On 2020/09/02 18:56, Masahiro Ikeda wrote:
+/* ----------
+ * Backend types
+ * ----------
You seem to forget to add "*/" into the above comment.
This issue could cause the following compiler warning.
../../src/include/pgstat.h:761:1: warning: '/*' within block
comment [-Wcomment]
Thanks for the comment. I fixed.
Thanks for the fix! But why are those comments necessary?
Sorry about that. This comment is not necessary.
I removed it.

The pg_stat_walwriter is not security restricted now, so ordinary
users can access it.
It has the same security level as pg_stat_archiver. If you have
any comments, please let me know.

+ <structfield>dirty_writes</structfield> <type>bigint</type>

I guess that the column name "dirty_writes" derived from
the DTrace probe name. Isn't this name confusing? We should
rename it to "wal_buffers_full" or something?

I agree and rename it to "wal_buffers_full".
+/* ----------
+ * PgStat_MsgWalWriter            Sent by the walwriter to update 
statistics.
This comment seems not accurate because backends also send it.
+/*
+ * WAL writes statistics counter is updated in XLogWrite function
+ */
+extern PgStat_MsgWalWriter WalWriterStats;
This comment seems not right because the counter is not updated in
XLogWrite().
Right. I fixed it to "Sent by each backend and background workers to
update WAL statistics."
In the future, other statistics will be included so I remove the
function's name.
+-- There will surely and maximum one record
+select count(*) = 1 as ok from pg_stat_walwriter;
What about changing this comment to "There must be only one
record"?
Thanks, I fixed.

+ WalWriterStats.m_xlog_dirty_writes++;
LWLockRelease(WALWriteLock);

Since WalWriterStats.m_xlog_dirty_writes doesn't need to be
protected
with WALWriteLock, isn't it better to increment that after
releasing the lock?

Thanks, I fixed.
+CREATE VIEW pg_stat_walwriter AS
+    SELECT
+        pg_stat_get_xlog_dirty_writes() AS dirty_writes,
+        pg_stat_get_walwriter_stat_reset_time() AS stats_reset;
+
 CREATE VIEW pg_stat_progress_vacuum AS
In system_views.sql, the definition of pg_stat_walwriter should be
placed just after that of pg_stat_bgwriter not
pg_stat_progress_analyze.
OK, I fixed it.

     }
-
     /*
      * We found an existing collector stats file. Read it and put
all the

You seem to accidentally have removed the empty line here.

Sorry about that. I fixed it.
-                 errhint("Target must be \"archiver\" or 
\"bgwriter\".")));
+                 errhint("Target must be \"archiver\" or 
\"bgwriter\" or
\"walwriter\".")));
There are two "or" in the message, but the former should be
replaced with ","?
Thanks, I fixed.

On 2020-09-05 18:40, Magnus Hagander wrote:

On Fri, Sep 4, 2020 at 5:42 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:

On 2020/09/04 11:50, tsunakawa.takay@fujitsu.com wrote:

From: Fujii Masao <masao.fujii@oss.nttdata.com>

I changed the view name from pg_stat_walwrites to

pg_stat_walwriter.

I think it is better to match naming scheme with other views

like

pg_stat_bgwriter,

which is for bgwriter statistics but it has the statistics

related to backend.

I prefer the view name pg_stat_walwriter for the consistency
with
other view names. But we also have pg_stat_wal_receiver. Which
makes me think that maybe pg_stat_wal_writer is better for
the consistency. Thought? IMO either of them works for me.
I'd like to hear more opinons about this.

I think pg_stat_bgwriter is now a misnomer, because it contains

the backends' activity. Likewise, pg_stat_walwriter leads to
misunderstanding because its information is not limited to WAL
writer.

How about simply pg_stat_wal? In the future, we may want to

include WAL reads in this view, e.g. reading undo logs in zheap.

Sounds reasonable.

+1.

pg_stat_bgwriter has had the "wrong name" for quite some time now
--
it became even more apparent when the checkpointer was split out to
it's own process, and that's not exactly a recent change. And it
had
allocs in it from day one...

I think naming it for what the data in it is ("wal") rather than
which
process deals with it ("walwriter") is correct, unless the
statistics
can be known to only *ever* affect one type of process. (And then
different processes can affect different columns in the view). As a
general rule -- and that's from what I can tell exactly what's
being
proposed.

Thanks for your comments. I agree with your opinions.
I changed the view name to "pg_stat_wal".

I fixed the code to send the WAL statistics from not only backend
and walwriter
but also checkpointer, walsender and autovacuum worker.
Good point! Thanks for updating the patch!

@@ -604,6 +604,7 @@ heap_vacuum_rel(Relation onerel, VacuumParams
*params,
                          onerel->rd_rel->relisshared,
                          Max(new_live_tuples, 0),
                          vacrelstats->new_dead_tuples);
+    pgstat_send_wal();

I guess that you changed heap_vacuum_rel() as above so that
autovacuum
workers can send WAL stats. But heap_vacuum_rel() can be called by
the processes (e.g., backends) other than autovacuum workers? Also
what happens if autovacuum workers just do ANALYZE only? In that
case,
heap_vacuum_rel() may not be called.

Currently autovacuum worker reports the stats at the exit via
pgstat_beshutdown_hook(). Unlike other processes, autovacuum worker
is not the process that basically keeps running during the service.
It exits
after it does vacuum or analyze. So ISTM that it's not bad to report
the stats
only at the exit, in autovacuum worker case. There is no need to add
extra
code for WAL stats report by autovacuum worker. Thought?
Thanks, I understood. I removed this code.

@@ -1430,6 +1430,9 @@ WalSndWaitForWal(XLogRecPtr loc)
         else
             RecentFlushPtr = GetXLogReplayRecPtr(NULL);
+        /* Send wal statistics */
+        pgstat_send_wal();

AFAIR logical walsender uses three loops in WalSndLoop(),
WalSndWriteData()
and WalSndWaitForWal(). But could you tell me why added
pgstat_send_wal()
into WalSndWaitForWal()? I'd like to know why WalSndWaitForWal() is
the best
for that purpose.

I checked what function calls XLogBackgroundFlush() which calls
AdvanceXLInsertBuffer() to increment m_wal_buffers_full.

I found that WalSndWaitForWal() calls it, so I added it.
Ok. But XLogBackgroundFlush() calls AdvanceXLInsertBuffer() wit the
second argument opportunistic=true, so in this case WAL write by
wal_buffers full seems to never happen. Right? If this understanding
is right, WalSndWaitForWal() doesn't need to call pgstat_send_wal().
Probably also walwriter doesn't need to do that.

The logical rep walsender can generate WAL and call
AdvanceXLInsertBuffer() when it executes the replication commands like
CREATE_REPLICATION_SLOT. But this case is already covered by
pgstat_report_activity()->pgstat_send_wal() called in PostgresMain(),
with your patch. So no more calls to pgstat_send_wal() seems necessary
for logical rep walsender.

Thanks for your reviews. I didn't notice that.
I updated the patches.

On 2020-09-11 17:13, Fujii Masao wrote:

On 2020/09/11 16:54, Kyotaro Horiguchi wrote:

At Fri, 11 Sep 2020 13:48:49 +0900, Fujii Masao
<masao.fujii@oss.nttdata.com> wrote in

On 2020/09/11 12:17, Kyotaro Horiguchi wrote:

Hello.
At Wed, 09 Sep 2020 13:57:37 +0900, Masahiro Ikeda
<ikedamsh@oss.nttdata.com> wrote in

I checked what function calls XLogBackgroundFlush() which calls
AdvanceXLInsertBuffer() to increment m_wal_buffers_full.

I found that WalSndWaitForWal() calls it, so I added it.
Is it better to move it in WalSndLoop() like the attached patch?

By the way, we are counting some wal-related numbers in
pgWalUsage.(bytes, records, fpi). Since now that we are going to
have
a new view related to WAL statistics, wouln't it be more useful to
show them together in the view?

Probably yes. But IMO it's better to commit the current patch first,
and then add those stats into the view after confirming exposing them
is useful.

I'm fine with that.

BTW, to expose the total WAL bytes, I think it's better to just save
the LSN at when pg_stat_wal is reset rather than counting
pgWalUsage.bytes. If we do that, we can easily total WAL bytes by
subtracting that LSN from the latest LSN. Also saving the LSN at the
reset timing causes obviously less overhead than counting
pgWalUsage.bytes.

pgWalUsage is always counting so it doesn't add any overhead.

Yes. And I'm a bit concerned about the overhead by frequent message
sent for WAL bytes to the stats collector.

Thanks for the comments.
I agree that we need to add more wal-related statistics
after this patch is committed.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

Attachments:

0006_pg_stat_wal_view.patchtext/x-diff; name=0006_pg_stat_wal_view.patchDownload

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 673a0e73e4..6d56912221 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -424,6 +424,13 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
      </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_wal</structname><indexterm><primary>pg_stat_wal</primary></indexterm></entry>
+      <entry>One row only, showing statistics about the WAL writing activity. See
+       <xref linkend="monitoring-pg-stat-wal-view"/> for details.
+      </entry>
+     </row>
+
      <row>
       <entry><structname>pg_stat_database</structname><indexterm><primary>pg_stat_database</primary></indexterm></entry>
       <entry>One row per database, showing database-wide statistics. See
@@ -3280,6 +3287,56 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
 
  </sect2>
 
+ <sect2 id="monitoring-pg-stat-wal-view">
+   <title><structname>pg_stat_wal</structname></title>
+
+  <indexterm>
+   <primary>pg_stat_wal</primary>
+  </indexterm>
+
+  <para>
+   The <structname>pg_stat_wal</structname> view will always have a
+   single row, containing data about the WAL writing activity of the cluster.
+  </para>
+
+  <table id="pg-stat-wal-view" xreflabel="pg_stat_wal">
+   <title><structname>pg_stat_wal</structname> View</title>
+   <tgroup cols="1">
+    <thead>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       Column Type
+      </para>
+      <para>
+       Description
+      </para></entry>
+     </row>
+    </thead>
+
+    <tbody>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>wal_buffers_full</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of WAL writes when the <xref linkend="guc-wal-buffers"/> are full
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>stats_reset</structfield> <type>timestamp with time zone</type>
+      </para>
+      <para>
+       Time at which these statistics were last reset
+      </para></entry>
+     </row>
+     </tbody>
+   </tgroup>
+  </table>
+
+</sect2>
+
  <sect2 id="monitoring-pg-stat-database-view">
   <title><structname>pg_stat_database</structname></title>
 
@@ -4668,8 +4725,9 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
         argument.  The argument can be <literal>bgwriter</literal> to reset
         all the counters shown in
         the <structname>pg_stat_bgwriter</structname>
-        view, or <literal>archiver</literal> to reset all the counters shown in
-        the <structname>pg_stat_archiver</structname> view.
+        view, <literal>archiver</literal> to reset all the counters shown in
+        the <structname>pg_stat_archiver</structname> view ,or <literal>wal</literal>
+        to reset all the counters shown in the <structname>pg_stat_wal</structname> view.
        </para>
        <para>
         This function is restricted to superusers by default, but other users
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index a38371a64f..2047a68a86 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2194,6 +2194,7 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, bool opportunistic)
 					WriteRqst.Flush = 0;
 					XLogWrite(WriteRqst, false);
 					LWLockRelease(WALWriteLock);
+					WalStats.m_wal_buffers_full++;
 					TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY_DONE();
 				}
 				/* Re-acquire WALBufMappingLock and retry */
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ed4f3f142d..643445c189 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -979,6 +979,11 @@ CREATE VIEW pg_stat_bgwriter AS
         pg_stat_get_buf_alloc() AS buffers_alloc,
         pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
 
+CREATE VIEW pg_stat_wal AS
+    SELECT
+        pg_stat_get_wal_buffers_full() AS wal_buffers_full,
+        pg_stat_get_wal_stat_reset_time() AS stats_reset;
+
 CREATE VIEW pg_stat_progress_analyze AS
     SELECT
         S.pid AS pid, S.datid AS datid, D.datname AS datname,
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index c96568149f..d13fe63615 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -252,6 +252,9 @@ BackgroundWriterMain(void)
 		 */
 		pgstat_send_bgwriter();
 
+		/* Send wal statistics */
+		pgstat_send_wal();
+
 		if (FirstCallSinceLastCheckpoint())
 		{
 			/*
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 45f5deca72..25817b0789 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -508,6 +508,9 @@ CheckpointerMain(void)
 		 */
 		pgstat_send_bgwriter();
 
+		/* Send wal statistics to the stats collector. */
+		pgstat_send_wal();
+
 		/*
 		 * If any checkpoint flags have been set, redo the loop to handle the
 		 * checkpoint without sleeping.
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index e6be2b7836..7127beca66 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -141,6 +141,13 @@ char	   *pgstat_stat_tmpname = NULL;
  */
 PgStat_MsgBgWriter BgWriterStats;
 
+/*
+ * WAL global statistics counter.
+ * This counter is incremented by both each backend and background.
+ * And then, sent to the stat collector process.
+ */
+PgStat_MsgWal WalStats;
+
 /*
  * List of SLRU names that we keep stats for.  There is no central registry of
  * SLRUs, so we use this fixed list instead.  The "other" entry is used for
@@ -281,6 +288,7 @@ static int	localNumBackends = 0;
  */
 static PgStat_ArchiverStats archiverStats;
 static PgStat_GlobalStats globalStats;
+static PgStat_WalStats walStats;
 static PgStat_SLRUStats slruStats[SLRU_NUM_ELEMENTS];
 
 /*
@@ -353,6 +361,7 @@ static void pgstat_recv_vacuum(PgStat_MsgVacuum *msg, int len);
 static void pgstat_recv_analyze(PgStat_MsgAnalyze *msg, int len);
 static void pgstat_recv_archiver(PgStat_MsgArchiver *msg, int len);
 static void pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len);
+static void pgstat_recv_wal(PgStat_MsgWal *msg, int len);
 static void pgstat_recv_slru(PgStat_MsgSLRU *msg, int len);
 static void pgstat_recv_funcstat(PgStat_MsgFuncstat *msg, int len);
 static void pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len);
@@ -938,6 +947,9 @@ pgstat_report_stat(bool force)
 	/* Now, send function statistics */
 	pgstat_send_funcstats();
 
+	/* Send wal statistics */
+	pgstat_send_wal();
+
 	/* Finally send SLRU statistics */
 	pgstat_send_slru();
 }
@@ -1370,11 +1382,13 @@ pgstat_reset_shared_counters(const char *target)
 		msg.m_resettarget = RESET_ARCHIVER;
 	else if (strcmp(target, "bgwriter") == 0)
 		msg.m_resettarget = RESET_BGWRITER;
+	else if (strcmp(target, "wal") == 0)
+		msg.m_resettarget = RESET_WAL;
 	else
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("unrecognized reset target: \"%s\"", target),
-				 errhint("Target must be \"archiver\" or \"bgwriter\".")));
+				 errhint("Target must be \"archiver\", \"bgwriter\" or \"wal\".")));
 
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_RESETSHAREDCOUNTER);
 	pgstat_send(&msg, sizeof(msg));
@@ -2674,6 +2688,21 @@ pgstat_fetch_global(void)
 	return &globalStats;
 }
 
+/*
+ * ---------
+ * pgstat_fetch_stat_wal() -
+ *
+ *	Support function for the SQL-callable pgstat* functions. Returns
+ *	a pointer to the wal statistics struct.
+ * ---------
+ */
+PgStat_WalStats *
+pgstat_fetch_stat_wal(void)
+{
+	backend_read_statsfile();
+
+	return &walStats;
+}
 
 /*
  * ---------
@@ -4419,6 +4448,38 @@ pgstat_send_bgwriter(void)
 	MemSet(&BgWriterStats, 0, sizeof(BgWriterStats));
 }
 
+/* ----------
+ * pgstat_send_wal() -
+ *
+ *		Send wal statistics to the collector
+ * ----------
+ */
+void
+pgstat_send_wal(void)
+{
+	/* We assume this initializes to zeroes */
+	static const PgStat_MsgWal all_zeroes;
+
+	/*
+	 * This function can be called even if nothing at all has happened. In
+	 * this case, avoid sending a completely empty message to the stats
+	 * collector.
+	 */
+	if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0)
+		return;
+
+	/*
+	 * Prepare and send the message
+	 */
+	pgstat_setheader(&WalStats.m_hdr, PGSTAT_MTYPE_WAL);
+	pgstat_send(&WalStats, sizeof(WalStats));
+
+	/*
+	 * Clear out the statistics buffer, so it can be re-used.
+	 */
+	MemSet(&WalStats, 0, sizeof(WalStats));
+}
+
 /* ----------
  * pgstat_send_slru() -
  *
@@ -4658,6 +4719,10 @@ PgstatCollectorMain(int argc, char *argv[])
 					pgstat_recv_bgwriter(&msg.msg_bgwriter, len);
 					break;
 
+				case PGSTAT_MTYPE_WAL:
+					pgstat_recv_wal(&msg.msg_wal, len);
+					break;
+
 				case PGSTAT_MTYPE_SLRU:
 					pgstat_recv_slru(&msg.msg_slru, len);
 					break;
@@ -4927,6 +4992,12 @@ pgstat_write_statsfiles(bool permanent, bool allDbs)
 	rc = fwrite(&archiverStats, sizeof(archiverStats), 1, fpout);
 	(void) rc;					/* we'll check for error with ferror */
 
+	/*
+	 * Write wal stats struct
+	 */
+	rc = fwrite(&walStats, sizeof(walStats), 1, fpout);
+	(void) rc;					/* we'll check for error with ferror */
+
 	/*
 	 * Write SLRU stats struct
 	 */
@@ -5186,11 +5257,12 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 						 HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
 
 	/*
-	 * Clear out global and archiver statistics so they start from zero in
+	 * Clear out global, archiver and wal statistics so they start from zero in
 	 * case we can't load an existing statsfile.
 	 */
 	memset(&globalStats, 0, sizeof(globalStats));
 	memset(&archiverStats, 0, sizeof(archiverStats));
+	memset(&walStats, 0, sizeof(walStats));
 	memset(&slruStats, 0, sizeof(slruStats));
 
 	/*
@@ -5199,6 +5271,7 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 	 */
 	globalStats.stat_reset_timestamp = GetCurrentTimestamp();
 	archiverStats.stat_reset_timestamp = globalStats.stat_reset_timestamp;
+	walStats.stat_reset_timestamp = globalStats.stat_reset_timestamp;
 
 	/*
 	 * Set the same reset timestamp for all SLRU items too.
@@ -5268,6 +5341,17 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 		goto done;
 	}
 
+	/*
+	 * Read wal stats struct
+	 */
+	if (fread(&walStats, 1, sizeof(walStats), fpin) != sizeof(walStats))
+	{
+		ereport(pgStatRunningInCollector ? LOG : WARNING,
+				(errmsg("corrupted statistics file \"%s\"", statfile)));
+		memset(&walStats, 0, sizeof(walStats));
+		goto done;
+	}
+
 	/*
 	 * Read SLRU stats struct
 	 */
@@ -5633,6 +5717,17 @@ pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent,
 		return false;
 	}
 
+	/*
+	 * Read wal stats struct
+	 */
+	if (fread(&walStats, 1, sizeof(walStats), fpin) != sizeof(walStats))
+	{
+		ereport(pgStatRunningInCollector ? LOG : WARNING,
+				(errmsg("corrupted statistics file \"%s\"", statfile)));
+		FreeFile(fpin);
+		return false;
+	}
+
 	/*
 	 * Read SLRU stats struct
 	 */
@@ -6213,6 +6308,12 @@ pgstat_recv_resetsharedcounter(PgStat_MsgResetsharedcounter *msg, int len)
 		memset(&archiverStats, 0, sizeof(archiverStats));
 		archiverStats.stat_reset_timestamp = GetCurrentTimestamp();
 	}
+	else if (msg->m_resettarget == RESET_WAL)
+	{
+		/* Reset the wal statistics for the cluster. */
+		memset(&walStats, 0, sizeof(walStats));
+		walStats.stat_reset_timestamp = GetCurrentTimestamp();
+	}
 
 	/*
 	 * Presumably the sender of this message validated the target, don't
@@ -6427,6 +6528,18 @@ pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len)
 	globalStats.buf_alloc += msg->m_buf_alloc;
 }
 
+/* ----------
+ * pgstat_recv_wal() -
+ *
+ *	Process a WAL message.
+ * ----------
+ */
+static void
+pgstat_recv_wal(PgStat_MsgWal *msg, int len)
+{
+	walStats.wal_buffers_full += msg->m_wal_buffers_full;
+}
+
 /* ----------
  * pgstat_recv_slru() -
  *
diff --git a/src/backend/postmaster/walwriter.c b/src/backend/postmaster/walwriter.c
index 358c0916ac..617fc7915a 100644
--- a/src/backend/postmaster/walwriter.c
+++ b/src/backend/postmaster/walwriter.c
@@ -257,6 +257,9 @@ WalWriterMain(void)
 		else if (left_till_hibernate > 0)
 			left_till_hibernate--;
 
+		/* Send wal statistics */
+		pgstat_send_wal();
+
 		/*
 		 * Sleep until we are signaled or WalWriterDelay has elapsed.  If we
 		 * haven't done anything useful for quite some time, lengthen the
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 95738a4e34..aa41330796 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1697,6 +1697,18 @@ pg_stat_get_buf_alloc(PG_FUNCTION_ARGS)
 	PG_RETURN_INT64(pgstat_fetch_global()->buf_alloc);
 }
 
+Datum
+pg_stat_get_wal_buffers_full(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_INT64(pgstat_fetch_stat_wal()->wal_buffers_full);
+}
+
+Datum
+pg_stat_get_wal_stat_reset_time(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_TIMESTAMPTZ(pgstat_fetch_stat_wal()->stat_reset_timestamp);
+}
+
 /*
  * Returns statistics of SLRU caches.
  */
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 687509ba92..22f70cc097 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5484,6 +5484,14 @@
   proname => 'pg_stat_get_buf_alloc', provolatile => 's', proparallel => 'r',
   prorettype => 'int8', proargtypes => '', prosrc => 'pg_stat_get_buf_alloc' },
 
+{ oid => '1136', descr => 'statistics: number of WAL writes when the wal buffers are full',
+  proname => 'pg_stat_get_wal_buffers_full', provolatile => 's', proparallel => 'r',
+  prorettype => 'int8', proargtypes => '', prosrc => 'pg_stat_get_wal_buffers_full' },
+{ oid => '1137', descr => 'statistics: last reset for the walwriter',
+  proname => 'pg_stat_get_wal_stat_reset_time', provolatile => 's',
+  proparallel => 'r', prorettype => 'timestamptz', proargtypes => '',
+  prosrc => 'pg_stat_get_wal_stat_reset_time' },
+
 { oid => '2306', descr => 'statistics: information about SLRU caches',
   proname => 'pg_stat_get_slru', prorows => '100', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 0dfbac46b4..eb706068ba 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -61,6 +61,7 @@ typedef enum StatMsgType
 	PGSTAT_MTYPE_ANALYZE,
 	PGSTAT_MTYPE_ARCHIVER,
 	PGSTAT_MTYPE_BGWRITER,
+	PGSTAT_MTYPE_WAL,
 	PGSTAT_MTYPE_SLRU,
 	PGSTAT_MTYPE_FUNCSTAT,
 	PGSTAT_MTYPE_FUNCPURGE,
@@ -122,7 +123,8 @@ typedef struct PgStat_TableCounts
 typedef enum PgStat_Shared_Reset_Target
 {
 	RESET_ARCHIVER,
-	RESET_BGWRITER
+	RESET_BGWRITER,
+	RESET_WAL
 } PgStat_Shared_Reset_Target;
 
 /* Possible object types for resetting single counters */
@@ -436,6 +438,16 @@ typedef struct PgStat_MsgBgWriter
 	PgStat_Counter m_checkpoint_sync_time;
 } PgStat_MsgBgWriter;
 
+/* ----------
+ * PgStat_MsgWal			Sent by each backend and background workers to update WAL statistics.
+ * ----------
+ */
+typedef struct PgStat_MsgWal
+{
+	PgStat_MsgHdr m_hdr;
+	PgStat_Counter m_wal_buffers_full;	/* number of WAL write caused by full of WAL buffers */
+} PgStat_MsgWal;
+
 /* ----------
  * PgStat_MsgSLRU			Sent by a backend to update SLRU statistics.
  * ----------
@@ -596,6 +608,7 @@ typedef union PgStat_Msg
 	PgStat_MsgAnalyze msg_analyze;
 	PgStat_MsgArchiver msg_archiver;
 	PgStat_MsgBgWriter msg_bgwriter;
+	PgStat_MsgWal msg_wal;
 	PgStat_MsgSLRU msg_slru;
 	PgStat_MsgFuncstat msg_funcstat;
 	PgStat_MsgFuncpurge msg_funcpurge;
@@ -745,6 +758,15 @@ typedef struct PgStat_GlobalStats
 	TimestampTz stat_reset_timestamp;
 } PgStat_GlobalStats;
 
+/*
+ * WAL statistics kept in the stats collector
+ */
+typedef struct PgStat_WalStats
+{
+	PgStat_Counter wal_buffers_full;	/* number of WAL write caused by full of WAL buffers */
+	TimestampTz stat_reset_timestamp;	/* last time when the stats reset */
+} PgStat_WalStats;
+
 /*
  * SLRU statistics kept in the stats collector
  */
@@ -1265,6 +1287,11 @@ extern char *pgstat_stat_filename;
  */
 extern PgStat_MsgBgWriter BgWriterStats;
 
+/*
+ * WAL writes statistics counter is updated by backend and background workers
+ */
+extern PgStat_MsgWal WalStats;
+
 /*
  * Updated by pgstat_count_buffer_*_time macros
  */
@@ -1464,6 +1491,7 @@ extern void pgstat_twophase_postabort(TransactionId xid, uint16 info,
 
 extern void pgstat_send_archiver(const char *xlog, bool failed);
 extern void pgstat_send_bgwriter(void);
+extern void pgstat_send_wal(void);
 
 /* ----------
  * Support functions for the SQL-callable functions to
@@ -1478,6 +1506,7 @@ extern PgStat_StatFuncEntry *pgstat_fetch_stat_funcentry(Oid funcid);
 extern int	pgstat_fetch_stat_numbackends(void);
 extern PgStat_ArchiverStats *pgstat_fetch_stat_archiver(void);
 extern PgStat_GlobalStats *pgstat_fetch_global(void);
+extern PgStat_WalStats *pgstat_fetch_stat_wal(void);
 extern PgStat_SLRUStats *pgstat_fetch_slru(void);
 
 extern void pgstat_count_slru_page_zeroed(int slru_idx);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 2a18dc423e..1e4ac4432e 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2129,6 +2129,8 @@ pg_stat_user_tables| SELECT pg_stat_all_tables.relid,
     pg_stat_all_tables.autoanalyze_count
    FROM pg_stat_all_tables
   WHERE ((pg_stat_all_tables.schemaname <> ALL (ARRAY['pg_catalog'::name, 'information_schema'::name])) AND (pg_stat_all_tables.schemaname !~ '^pg_toast'::text));
+pg_stat_wal| SELECT pg_stat_get_wal_buffers_full() AS wal_buffers_full,
+    pg_stat_get_wal_stat_reset_time() AS stats_reset;
 pg_stat_wal_receiver| SELECT s.pid,
     s.status,
     s.receive_start_lsn,
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 1cffc3349d..81bdacf59d 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -76,6 +76,13 @@ select count(*) >= 0 as ok from pg_prepared_xacts;
  t
 (1 row)
 
+-- There must be only one record
+select count(*) = 1 as ok from pg_stat_wal;
+ ok 
+----
+ t
+(1 row)
+
 -- This is to record the prevailing planner enable_foo settings during
 -- a regression test run.
 select name, setting from pg_settings where name like 'enable%';
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index ac4a0e1cbb..b9b875bc6a 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -37,6 +37,9 @@ select count(*) = 0 as ok from pg_prepared_statements;
 -- See also prepared_xacts.sql
 select count(*) >= 0 as ok from pg_prepared_xacts;
 
+-- There must be only one record
+select count(*) = 1 as ok from pg_stat_wal;
+
 -- This is to record the prevailing planner enable_foo settings during
 -- a regression test run.
 select name, setting from pg_settings where name like 'enable%';

#28

Fujii Masao

masao.fujii@oss.nttdata.com

over 5 years ago

In reply to: Masahiro Ikeda (#27)

Re: New statistics for tuning WAL buffer size

On 2020/09/15 15:52, Masahiro Ikeda wrote:

On 2020-09-11 01:40, Fujii Masao wrote:
On 2020/09/09 13:57, Masahiro Ikeda wrote:
On 2020-09-07 16:19, Fujii Masao wrote:
On 2020/09/07 9:58, Masahiro Ikeda wrote:
Thanks for the review and advice!

On 2020-09-03 16:05, Fujii Masao wrote:
On 2020/09/02 18:56, Masahiro Ikeda wrote:
+/* ----------
+ * Backend types
+ * ----------
You seem to forget to add "*/" into the above comment.
This issue could cause the following compiler warning.
../../src/include/pgstat.h:761:1: warning: '/*' within block comment [-Wcomment]
Thanks for the comment. I fixed.
Thanks for the fix! But why are those comments necessary?
Sorry about that. This comment is not necessary.
I removed it.

The pg_stat_walwriter is not security restricted now, so ordinary users can access it.
It has the same security level as pg_stat_archiver. If you have any comments, please let me know.

+ <structfield>dirty_writes</structfield> <type>bigint</type>

I guess that the column name "dirty_writes" derived from
the DTrace probe name. Isn't this name confusing? We should
rename it to "wal_buffers_full" or something?

I agree and rename it to "wal_buffers_full".
+/* ----------
+ * PgStat_MsgWalWriter            Sent by the walwriter to update statistics.
This comment seems not accurate because backends also send it.
+/*
+ * WAL writes statistics counter is updated in XLogWrite function
+ */
+extern PgStat_MsgWalWriter WalWriterStats;
This comment seems not right because the counter is not updated in XLogWrite().
Right. I fixed it to "Sent by each backend and background workers to update WAL statistics."
In the future, other statistics will be included so I remove the function's name.
+-- There will surely and maximum one record
+select count(*) = 1 as ok from pg_stat_walwriter;
What about changing this comment to "There must be only one record"?
Thanks, I fixed.

+ WalWriterStats.m_xlog_dirty_writes++;
LWLockRelease(WALWriteLock);

Since WalWriterStats.m_xlog_dirty_writes doesn't need to be protected
with WALWriteLock, isn't it better to increment that after releasing the lock?

Thanks, I fixed.
+CREATE VIEW pg_stat_walwriter AS
+    SELECT
+        pg_stat_get_xlog_dirty_writes() AS dirty_writes,
+        pg_stat_get_walwriter_stat_reset_time() AS stats_reset;
+
 CREATE VIEW pg_stat_progress_vacuum AS
In system_views.sql, the definition of pg_stat_walwriter should be
placed just after that of pg_stat_bgwriter not pg_stat_progress_analyze.
OK, I fixed it.

     }
-
     /*
      * We found an existing collector stats file. Read it and put all the

You seem to accidentally have removed the empty line here.

Sorry about that. I fixed it.
-                 errhint("Target must be \"archiver\" or \"bgwriter\".")));
+                 errhint("Target must be \"archiver\" or \"bgwriter\" or
\"walwriter\".")));
There are two "or" in the message, but the former should be replaced with ","?
Thanks, I fixed.

On 2020-09-05 18:40, Magnus Hagander wrote:

On Fri, Sep 4, 2020 at 5:42 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:

On 2020/09/04 11:50, tsunakawa.takay@fujitsu.com wrote:

From: Fujii Masao <masao.fujii@oss.nttdata.com>

I changed the view name from pg_stat_walwrites to

pg_stat_walwriter.

I think it is better to match naming scheme with other views

like

pg_stat_bgwriter,

which is for bgwriter statistics but it has the statistics

related to backend.

I prefer the view name pg_stat_walwriter for the consistency with
other view names. But we also have pg_stat_wal_receiver. Which
makes me think that maybe pg_stat_wal_writer is better for
the consistency. Thought? IMO either of them works for me.
I'd like to hear more opinons about this.

I think pg_stat_bgwriter is now a misnomer, because it contains

the backends' activity. Likewise, pg_stat_walwriter leads to
misunderstanding because its information is not limited to WAL
writer.

How about simply pg_stat_wal? In the future, we may want to

include WAL reads in this view, e.g. reading undo logs in zheap.

Sounds reasonable.

+1.

pg_stat_bgwriter has had the "wrong name" for quite some time now --
it became even more apparent when the checkpointer was split out to
it's own process, and that's not exactly a recent change. And it had
allocs in it from day one...

I think naming it for what the data in it is ("wal") rather than which
process deals with it ("walwriter") is correct, unless the statistics
can be known to only *ever* affect one type of process. (And then
different processes can affect different columns in the view). As a
general rule -- and that's from what I can tell exactly what's being
proposed.

Thanks for your comments. I agree with your opinions.
I changed the view name to "pg_stat_wal".

I fixed the code to send the WAL statistics from not only backend and walwriter
but also checkpointer, walsender and autovacuum worker.
Good point! Thanks for updating the patch!

@@ -604,6 +604,7 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
                          onerel->rd_rel->relisshared,
                          Max(new_live_tuples, 0),
                          vacrelstats->new_dead_tuples);
+    pgstat_send_wal();

I guess that you changed heap_vacuum_rel() as above so that autovacuum
workers can send WAL stats. But heap_vacuum_rel() can be called by
the processes (e.g., backends) other than autovacuum workers? Also
what happens if autovacuum workers just do ANALYZE only? In that case,
heap_vacuum_rel() may not be called.

Currently autovacuum worker reports the stats at the exit via
pgstat_beshutdown_hook(). Unlike other processes, autovacuum worker
is not the process that basically keeps running during the service. It exits
after it does vacuum or analyze. So ISTM that it's not bad to report the stats
only at the exit, in autovacuum worker case. There is no need to add extra
code for WAL stats report by autovacuum worker. Thought?
Thanks, I understood. I removed this code.

@@ -1430,6 +1430,9 @@ WalSndWaitForWal(XLogRecPtr loc)
         else
             RecentFlushPtr = GetXLogReplayRecPtr(NULL);
+        /* Send wal statistics */
+        pgstat_send_wal();

AFAIR logical walsender uses three loops in WalSndLoop(), WalSndWriteData()
and WalSndWaitForWal(). But could you tell me why added pgstat_send_wal()
into WalSndWaitForWal()? I'd like to know why WalSndWaitForWal() is the best
for that purpose.

I checked what function calls XLogBackgroundFlush() which calls
AdvanceXLInsertBuffer() to increment m_wal_buffers_full.

I found that WalSndWaitForWal() calls it, so I added it.
Ok. But XLogBackgroundFlush() calls AdvanceXLInsertBuffer() wit the
second argument opportunistic=true, so in this case WAL write by
wal_buffers full seems to never happen. Right? If this understanding
is right, WalSndWaitForWal() doesn't need to call pgstat_send_wal().
Probably also walwriter doesn't need to do that.

Thanks for updating the patch! This patch adds pgstat_send_wal() in
walwriter main loop. But isn't this unnecessary because of the above reason?
That is, since walwriter calls AdvanceXLInsertBuffer() with
the second argument "opportunistic" = true via XLogBackgroundFlush(),
the event of full wal_buffers will never happen. No?

The logical rep walsender can generate WAL and call
AdvanceXLInsertBuffer() when it executes the replication commands like
CREATE_REPLICATION_SLOT. But this case is already covered by
pgstat_report_activity()->pgstat_send_wal() called in PostgresMain(),
with your patch. So no more calls to pgstat_send_wal() seems necessary
for logical rep walsender.

Thanks for your reviews. I didn't notice that.
I updated the patches.

Sorry, the above my analysis might be incorrect. During logical replication,
walsender may access to the system table. Which may cause HOT pruning
or killing of dead index tuple. Also which can cause WAL and
full wal_buffers event. Thought?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#29

Masahiro Ikeda

ikedamsh@oss.nttdata.com

over 5 years ago

In reply to: Fujii Masao (#28)

1 attachment(s)

Re: New statistics for tuning WAL buffer size

On 2020-09-15 17:10, Fujii Masao wrote:

On 2020/09/15 15:52, Masahiro Ikeda wrote:
On 2020-09-11 01:40, Fujii Masao wrote:
On 2020/09/09 13:57, Masahiro Ikeda wrote:
On 2020-09-07 16:19, Fujii Masao wrote:
On 2020/09/07 9:58, Masahiro Ikeda wrote:
Thanks for the review and advice!

On 2020-09-03 16:05, Fujii Masao wrote:
On 2020/09/02 18:56, Masahiro Ikeda wrote:
+/* ----------
+ * Backend types
+ * ----------
You seem to forget to add "*/" into the above comment.
This issue could cause the following compiler warning.
../../src/include/pgstat.h:761:1: warning: '/*' within block
comment [-Wcomment]
Thanks for the comment. I fixed.
Thanks for the fix! But why are those comments necessary?
Sorry about that. This comment is not necessary.
I removed it.

The pg_stat_walwriter is not security restricted now, so
ordinary users can access it.
It has the same security level as pg_stat_archiver. If you have
any comments, please let me know.

+ <structfield>dirty_writes</structfield>
<type>bigint</type>

I guess that the column name "dirty_writes" derived from
the DTrace probe name. Isn't this name confusing? We should
rename it to "wal_buffers_full" or something?

I agree and rename it to "wal_buffers_full".
+/* ----------
+ * PgStat_MsgWalWriter            Sent by the walwriter to 
update statistics.
This comment seems not accurate because backends also send it.
+/*
+ * WAL writes statistics counter is updated in XLogWrite 
function
+ */
+extern PgStat_MsgWalWriter WalWriterStats;
This comment seems not right because the counter is not updated
in XLogWrite().
Right. I fixed it to "Sent by each backend and background workers
to update WAL statistics."
In the future, other statistics will be included so I remove the
function's name.
+-- There will surely and maximum one record
+select count(*) = 1 as ok from pg_stat_walwriter;
What about changing this comment to "There must be only one
record"?
Thanks, I fixed.

+ WalWriterStats.m_xlog_dirty_writes++;
LWLockRelease(WALWriteLock);

Since WalWriterStats.m_xlog_dirty_writes doesn't need to be
protected
with WALWriteLock, isn't it better to increment that after
releasing the lock?

Thanks, I fixed.
+CREATE VIEW pg_stat_walwriter AS
+    SELECT
+        pg_stat_get_xlog_dirty_writes() AS dirty_writes,
+        pg_stat_get_walwriter_stat_reset_time() AS stats_reset;
+
 CREATE VIEW pg_stat_progress_vacuum AS
In system_views.sql, the definition of pg_stat_walwriter should
be
placed just after that of pg_stat_bgwriter not
pg_stat_progress_analyze.
OK, I fixed it.

     }
-
     /*
      * We found an existing collector stats file. Read it and
put all the

You seem to accidentally have removed the empty line here.

Sorry about that. I fixed it.
-                 errhint("Target must be \"archiver\" or 
\"bgwriter\".")));
+                 errhint("Target must be \"archiver\" or 
\"bgwriter\" or
\"walwriter\".")));
There are two "or" in the message, but the former should be
replaced with ","?
Thanks, I fixed.

On 2020-09-05 18:40, Magnus Hagander wrote:

On Fri, Sep 4, 2020 at 5:42 AM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:

On 2020/09/04 11:50, tsunakawa.takay@fujitsu.com wrote:

From: Fujii Masao <masao.fujii@oss.nttdata.com>

I changed the view name from pg_stat_walwrites to

pg_stat_walwriter.

I think it is better to match naming scheme with other views

like

pg_stat_bgwriter,

which is for bgwriter statistics but it has the statistics

related to backend.

I prefer the view name pg_stat_walwriter for the consistency
with
other view names. But we also have pg_stat_wal_receiver. Which
makes me think that maybe pg_stat_wal_writer is better for
the consistency. Thought? IMO either of them works for me.
I'd like to hear more opinons about this.

I think pg_stat_bgwriter is now a misnomer, because it contains

the backends' activity. Likewise, pg_stat_walwriter leads to
misunderstanding because its information is not limited to WAL
writer.

How about simply pg_stat_wal? In the future, we may want to

include WAL reads in this view, e.g. reading undo logs in zheap.

Sounds reasonable.

+1.

pg_stat_bgwriter has had the "wrong name" for quite some time now
--
it became even more apparent when the checkpointer was split out
to
it's own process, and that's not exactly a recent change. And it
had
allocs in it from day one...

I think naming it for what the data in it is ("wal") rather than
which
process deals with it ("walwriter") is correct, unless the
statistics
can be known to only *ever* affect one type of process. (And then
different processes can affect different columns in the view). As
a
general rule -- and that's from what I can tell exactly what's
being
proposed.

Thanks for your comments. I agree with your opinions.
I changed the view name to "pg_stat_wal".

I fixed the code to send the WAL statistics from not only backend
and walwriter
but also checkpointer, walsender and autovacuum worker.
Good point! Thanks for updating the patch!

@@ -604,6 +604,7 @@ heap_vacuum_rel(Relation onerel, VacuumParams
*params,
                          onerel->rd_rel->relisshared,
                          Max(new_live_tuples, 0),
                          vacrelstats->new_dead_tuples);
+    pgstat_send_wal();

I guess that you changed heap_vacuum_rel() as above so that
autovacuum
workers can send WAL stats. But heap_vacuum_rel() can be called by
the processes (e.g., backends) other than autovacuum workers? Also
what happens if autovacuum workers just do ANALYZE only? In that
case,
heap_vacuum_rel() may not be called.

Currently autovacuum worker reports the stats at the exit via
pgstat_beshutdown_hook(). Unlike other processes, autovacuum worker
is not the process that basically keeps running during the service.
It exits
after it does vacuum or analyze. So ISTM that it's not bad to
report the stats
only at the exit, in autovacuum worker case. There is no need to
add extra
code for WAL stats report by autovacuum worker. Thought?
Thanks, I understood. I removed this code.

@@ -1430,6 +1430,9 @@ WalSndWaitForWal(XLogRecPtr loc)
         else
             RecentFlushPtr = GetXLogReplayRecPtr(NULL);
+        /* Send wal statistics */
+        pgstat_send_wal();

AFAIR logical walsender uses three loops in WalSndLoop(),
WalSndWriteData()
and WalSndWaitForWal(). But could you tell me why added
pgstat_send_wal()
into WalSndWaitForWal()? I'd like to know why WalSndWaitForWal() is
the best
for that purpose.

I checked what function calls XLogBackgroundFlush() which calls
AdvanceXLInsertBuffer() to increment m_wal_buffers_full.

I found that WalSndWaitForWal() calls it, so I added it.
Ok. But XLogBackgroundFlush() calls AdvanceXLInsertBuffer() wit the
second argument opportunistic=true, so in this case WAL write by
wal_buffers full seems to never happen. Right? If this understanding
is right, WalSndWaitForWal() doesn't need to call pgstat_send_wal().
Probably also walwriter doesn't need to do that.
Thanks for updating the patch! This patch adds pgstat_send_wal() in
walwriter main loop. But isn't this unnecessary because of the above
reason?
That is, since walwriter calls AdvanceXLInsertBuffer() with
the second argument "opportunistic" = true via XLogBackgroundFlush(),
the event of full wal_buffers will never happen. No?

Right, I fixed it.

The logical rep walsender can generate WAL and call
AdvanceXLInsertBuffer() when it executes the replication commands
like
CREATE_REPLICATION_SLOT. But this case is already covered by
pgstat_report_activity()->pgstat_send_wal() called in PostgresMain(),
with your patch. So no more calls to pgstat_send_wal() seems
necessary
for logical rep walsender.

Thanks for your reviews. I didn't notice that.
I updated the patches.

Sorry, the above my analysis might be incorrect. During logical
replication,
walsender may access to the system table. Which may cause HOT pruning
or killing of dead index tuple. Also which can cause WAL and
full wal_buffers event. Thought?

Thanks. I confirmed that it causes HOT pruning or killing of
dead index tuple if DecodeCommit() is called.

As you said, DecodeCommit() may access the system table.

WalSndLoop()
-> XLogSendLogical()
-> LogicalDecodingProcessRecord()
-> DecodeXactOp()
-> DecodeCommit()
-> ReorderBufferCommit()
-> ReorderBufferProcessTXN()
-> RelidByRelfilenode()
-> systable_getnext()

The wals are generated only when logical replication is performed.
So, I added pgstat_send_wal() in XLogSendLogical().

But, I concerned that it causes poor performance
since pgstat_send_wal() is called per wal record,

Is it necessary to introduce a mechanism to send in bulk?
But I worried about how to implement is best. Is it good to send wal
statistics per X recoreds?

I think there are other background processes that access the system
tables,
so I organized which process must send wal metrics and added
pgstat_send_wal() to the main loop of some background processes
for example, autovacuum launcher, logical replication launcher, and
logical replication worker's one.

(*) [x]: it needs to send it
[ ]: it don't need to send it

* [ ] postmaster
* [ ] background writer
* [x] checkpointer: it generates wal for checkpoint.
* [ ] walwriter
* [x] autovacuum launcher: it accesses to the system tables to get the
database list.
* [x] autovacuum worker: it generates wal for vacuum.
* [ ] stats collector
* [x] backend: it generate wal for query execution.
* [ ] startup
* [ ] archiver
* [x] walsender: it accesses to the system tables if logical replication
is performed.
* [ ] walreceiver
* [x] logical replication launcher: it accesses to the system tables to
get the subscription list.
* [x] logical replication worker: it accesses to the system tables to
get oid from relname.
* [x] parallel worker: it generates wal for query execution.

If my understanding is wrong, please let me know.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

Attachments:

0007_pg_stat_wal_view.patchtext/x-diff; charset=us-ascii; name=0007_pg_stat_wal_view.patchDownload

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 673a0e73e4..6d56912221 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -424,6 +424,13 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
      </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_wal</structname><indexterm><primary>pg_stat_wal</primary></indexterm></entry>
+      <entry>One row only, showing statistics about the WAL writing activity. See
+       <xref linkend="monitoring-pg-stat-wal-view"/> for details.
+      </entry>
+     </row>
+
      <row>
       <entry><structname>pg_stat_database</structname><indexterm><primary>pg_stat_database</primary></indexterm></entry>
       <entry>One row per database, showing database-wide statistics. See
@@ -3280,6 +3287,56 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
 
  </sect2>
 
+ <sect2 id="monitoring-pg-stat-wal-view">
+   <title><structname>pg_stat_wal</structname></title>
+
+  <indexterm>
+   <primary>pg_stat_wal</primary>
+  </indexterm>
+
+  <para>
+   The <structname>pg_stat_wal</structname> view will always have a
+   single row, containing data about the WAL writing activity of the cluster.
+  </para>
+
+  <table id="pg-stat-wal-view" xreflabel="pg_stat_wal">
+   <title><structname>pg_stat_wal</structname> View</title>
+   <tgroup cols="1">
+    <thead>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       Column Type
+      </para>
+      <para>
+       Description
+      </para></entry>
+     </row>
+    </thead>
+
+    <tbody>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>wal_buffers_full</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of WAL writes when the <xref linkend="guc-wal-buffers"/> are full
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>stats_reset</structfield> <type>timestamp with time zone</type>
+      </para>
+      <para>
+       Time at which these statistics were last reset
+      </para></entry>
+     </row>
+     </tbody>
+   </tgroup>
+  </table>
+
+</sect2>
+
  <sect2 id="monitoring-pg-stat-database-view">
   <title><structname>pg_stat_database</structname></title>
 
@@ -4668,8 +4725,9 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
         argument.  The argument can be <literal>bgwriter</literal> to reset
         all the counters shown in
         the <structname>pg_stat_bgwriter</structname>
-        view, or <literal>archiver</literal> to reset all the counters shown in
-        the <structname>pg_stat_archiver</structname> view.
+        view, <literal>archiver</literal> to reset all the counters shown in
+        the <structname>pg_stat_archiver</structname> view ,or <literal>wal</literal>
+        to reset all the counters shown in the <structname>pg_stat_wal</structname> view.
        </para>
        <para>
         This function is restricted to superusers by default, but other users
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 61754312e2..3a06cacefb 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2195,6 +2195,7 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, bool opportunistic)
 					WriteRqst.Flush = 0;
 					XLogWrite(WriteRqst, false);
 					LWLockRelease(WALWriteLock);
+					WalStats.m_wal_buffers_full++;
 					TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY_DONE();
 				}
 				/* Re-acquire WALBufMappingLock and retry */
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ed4f3f142d..643445c189 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -979,6 +979,11 @@ CREATE VIEW pg_stat_bgwriter AS
         pg_stat_get_buf_alloc() AS buffers_alloc,
         pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
 
+CREATE VIEW pg_stat_wal AS
+    SELECT
+        pg_stat_get_wal_buffers_full() AS wal_buffers_full,
+        pg_stat_get_wal_stat_reset_time() AS stats_reset;
+
 CREATE VIEW pg_stat_progress_analyze AS
     SELECT
         S.pid AS pid, S.datid AS datid, D.datname AS datname,
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 2cef56f115..8dca6628de 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -795,6 +795,13 @@ AutoVacLauncherMain(int argc, char *argv[])
 										   current_time, 0))
 				launch_worker(current_time);
 		}
+
+		/*
+		 * Send wal statistics because some xlog may be written.
+		 * It accesses the system table, which may cause HOT
+		 * pruning or killing of dead index tuple.
+		 */
+		pgstat_send_wal();
 	}
 
 	AutoVacLauncherShutdown();
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 3e7dcd4f76..1f4fee1f3b 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -504,6 +504,9 @@ CheckpointerMain(void)
 		 */
 		pgstat_send_bgwriter();
 
+		/* Send wal statistics to the stats collector. */
+		pgstat_send_wal();
+
 		/*
 		 * If any checkpoint flags have been set, redo the loop to handle the
 		 * checkpoint without sleeping.
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index e6be2b7836..7127beca66 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -141,6 +141,13 @@ char	   *pgstat_stat_tmpname = NULL;
  */
 PgStat_MsgBgWriter BgWriterStats;
 
+/*
+ * WAL global statistics counter.
+ * This counter is incremented by both each backend and background.
+ * And then, sent to the stat collector process.
+ */
+PgStat_MsgWal WalStats;
+
 /*
  * List of SLRU names that we keep stats for.  There is no central registry of
  * SLRUs, so we use this fixed list instead.  The "other" entry is used for
@@ -281,6 +288,7 @@ static int	localNumBackends = 0;
  */
 static PgStat_ArchiverStats archiverStats;
 static PgStat_GlobalStats globalStats;
+static PgStat_WalStats walStats;
 static PgStat_SLRUStats slruStats[SLRU_NUM_ELEMENTS];
 
 /*
@@ -353,6 +361,7 @@ static void pgstat_recv_vacuum(PgStat_MsgVacuum *msg, int len);
 static void pgstat_recv_analyze(PgStat_MsgAnalyze *msg, int len);
 static void pgstat_recv_archiver(PgStat_MsgArchiver *msg, int len);
 static void pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len);
+static void pgstat_recv_wal(PgStat_MsgWal *msg, int len);
 static void pgstat_recv_slru(PgStat_MsgSLRU *msg, int len);
 static void pgstat_recv_funcstat(PgStat_MsgFuncstat *msg, int len);
 static void pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len);
@@ -938,6 +947,9 @@ pgstat_report_stat(bool force)
 	/* Now, send function statistics */
 	pgstat_send_funcstats();
 
+	/* Send wal statistics */
+	pgstat_send_wal();
+
 	/* Finally send SLRU statistics */
 	pgstat_send_slru();
 }
@@ -1370,11 +1382,13 @@ pgstat_reset_shared_counters(const char *target)
 		msg.m_resettarget = RESET_ARCHIVER;
 	else if (strcmp(target, "bgwriter") == 0)
 		msg.m_resettarget = RESET_BGWRITER;
+	else if (strcmp(target, "wal") == 0)
+		msg.m_resettarget = RESET_WAL;
 	else
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("unrecognized reset target: \"%s\"", target),
-				 errhint("Target must be \"archiver\" or \"bgwriter\".")));
+				 errhint("Target must be \"archiver\", \"bgwriter\" or \"wal\".")));
 
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_RESETSHAREDCOUNTER);
 	pgstat_send(&msg, sizeof(msg));
@@ -2674,6 +2688,21 @@ pgstat_fetch_global(void)
 	return &globalStats;
 }
 
+/*
+ * ---------
+ * pgstat_fetch_stat_wal() -
+ *
+ *	Support function for the SQL-callable pgstat* functions. Returns
+ *	a pointer to the wal statistics struct.
+ * ---------
+ */
+PgStat_WalStats *
+pgstat_fetch_stat_wal(void)
+{
+	backend_read_statsfile();
+
+	return &walStats;
+}
 
 /*
  * ---------
@@ -4419,6 +4448,38 @@ pgstat_send_bgwriter(void)
 	MemSet(&BgWriterStats, 0, sizeof(BgWriterStats));
 }
 
+/* ----------
+ * pgstat_send_wal() -
+ *
+ *		Send wal statistics to the collector
+ * ----------
+ */
+void
+pgstat_send_wal(void)
+{
+	/* We assume this initializes to zeroes */
+	static const PgStat_MsgWal all_zeroes;
+
+	/*
+	 * This function can be called even if nothing at all has happened. In
+	 * this case, avoid sending a completely empty message to the stats
+	 * collector.
+	 */
+	if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0)
+		return;
+
+	/*
+	 * Prepare and send the message
+	 */
+	pgstat_setheader(&WalStats.m_hdr, PGSTAT_MTYPE_WAL);
+	pgstat_send(&WalStats, sizeof(WalStats));
+
+	/*
+	 * Clear out the statistics buffer, so it can be re-used.
+	 */
+	MemSet(&WalStats, 0, sizeof(WalStats));
+}
+
 /* ----------
  * pgstat_send_slru() -
  *
@@ -4658,6 +4719,10 @@ PgstatCollectorMain(int argc, char *argv[])
 					pgstat_recv_bgwriter(&msg.msg_bgwriter, len);
 					break;
 
+				case PGSTAT_MTYPE_WAL:
+					pgstat_recv_wal(&msg.msg_wal, len);
+					break;
+
 				case PGSTAT_MTYPE_SLRU:
 					pgstat_recv_slru(&msg.msg_slru, len);
 					break;
@@ -4927,6 +4992,12 @@ pgstat_write_statsfiles(bool permanent, bool allDbs)
 	rc = fwrite(&archiverStats, sizeof(archiverStats), 1, fpout);
 	(void) rc;					/* we'll check for error with ferror */
 
+	/*
+	 * Write wal stats struct
+	 */
+	rc = fwrite(&walStats, sizeof(walStats), 1, fpout);
+	(void) rc;					/* we'll check for error with ferror */
+
 	/*
 	 * Write SLRU stats struct
 	 */
@@ -5186,11 +5257,12 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 						 HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
 
 	/*
-	 * Clear out global and archiver statistics so they start from zero in
+	 * Clear out global, archiver and wal statistics so they start from zero in
 	 * case we can't load an existing statsfile.
 	 */
 	memset(&globalStats, 0, sizeof(globalStats));
 	memset(&archiverStats, 0, sizeof(archiverStats));
+	memset(&walStats, 0, sizeof(walStats));
 	memset(&slruStats, 0, sizeof(slruStats));
 
 	/*
@@ -5199,6 +5271,7 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 	 */
 	globalStats.stat_reset_timestamp = GetCurrentTimestamp();
 	archiverStats.stat_reset_timestamp = globalStats.stat_reset_timestamp;
+	walStats.stat_reset_timestamp = globalStats.stat_reset_timestamp;
 
 	/*
 	 * Set the same reset timestamp for all SLRU items too.
@@ -5268,6 +5341,17 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 		goto done;
 	}
 
+	/*
+	 * Read wal stats struct
+	 */
+	if (fread(&walStats, 1, sizeof(walStats), fpin) != sizeof(walStats))
+	{
+		ereport(pgStatRunningInCollector ? LOG : WARNING,
+				(errmsg("corrupted statistics file \"%s\"", statfile)));
+		memset(&walStats, 0, sizeof(walStats));
+		goto done;
+	}
+
 	/*
 	 * Read SLRU stats struct
 	 */
@@ -5633,6 +5717,17 @@ pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent,
 		return false;
 	}
 
+	/*
+	 * Read wal stats struct
+	 */
+	if (fread(&walStats, 1, sizeof(walStats), fpin) != sizeof(walStats))
+	{
+		ereport(pgStatRunningInCollector ? LOG : WARNING,
+				(errmsg("corrupted statistics file \"%s\"", statfile)));
+		FreeFile(fpin);
+		return false;
+	}
+
 	/*
 	 * Read SLRU stats struct
 	 */
@@ -6213,6 +6308,12 @@ pgstat_recv_resetsharedcounter(PgStat_MsgResetsharedcounter *msg, int len)
 		memset(&archiverStats, 0, sizeof(archiverStats));
 		archiverStats.stat_reset_timestamp = GetCurrentTimestamp();
 	}
+	else if (msg->m_resettarget == RESET_WAL)
+	{
+		/* Reset the wal statistics for the cluster. */
+		memset(&walStats, 0, sizeof(walStats));
+		walStats.stat_reset_timestamp = GetCurrentTimestamp();
+	}
 
 	/*
 	 * Presumably the sender of this message validated the target, don't
@@ -6427,6 +6528,18 @@ pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len)
 	globalStats.buf_alloc += msg->m_buf_alloc;
 }
 
+/* ----------
+ * pgstat_recv_wal() -
+ *
+ *	Process a WAL message.
+ * ----------
+ */
+static void
+pgstat_recv_wal(PgStat_MsgWal *msg, int len)
+{
+	walStats.wal_buffers_full += msg->m_wal_buffers_full;
+}
+
 /* ----------
  * pgstat_recv_slru() -
  *
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index bdaf0312d6..23d7e2b537 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -1037,6 +1037,13 @@ ApplyLauncherMain(Datum main_arg)
 			wait_time = wal_retrieve_retry_interval;
 		}
 
+		/*
+		 * Send wal statistics because some xlog may be written.
+		 * It accesses the system table, which may cause HOT
+		 * pruning or killing of dead index tuple.
+		 */
+		pgstat_send_wal();
+
 		/* Wait for more work. */
 		rc = WaitLatch(MyLatch,
 					   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index d239d28c09..f580d8b01b 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -2200,6 +2200,13 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
 		MemoryContextResetAndDeleteChildren(ApplyMessageContext);
 		MemoryContextSwitchTo(TopMemoryContext);
 
+		/*
+		 * Send wal statistics because some xlog may be written.
+		 * It accesses the system table, which may cause HOT
+		 * pruning or killing of dead index tuple.
+		 */
+		pgstat_send_wal();
+
 		/* Check if we need to exit the streaming loop. */
 		if (endofstream)
 		{
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 7c9d1b67df..1180383531 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2883,6 +2883,13 @@ XLogSendLogical(void)
 		LogicalDecodingProcessRecord(logical_decoding_ctx, logical_decoding_ctx->reader);
 
 		sentPtr = logical_decoding_ctx->reader->EndRecPtr;
+
+		/*
+		 * Send wal statistics because some xlog may be written.
+		 * It accesses the system table, which may cause HOT
+		 * pruning or killing of dead index tuple.
+		 */
+		pgstat_send_wal();
 	}
 
 	/*
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 95738a4e34..aa41330796 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1697,6 +1697,18 @@ pg_stat_get_buf_alloc(PG_FUNCTION_ARGS)
 	PG_RETURN_INT64(pgstat_fetch_global()->buf_alloc);
 }
 
+Datum
+pg_stat_get_wal_buffers_full(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_INT64(pgstat_fetch_stat_wal()->wal_buffers_full);
+}
+
+Datum
+pg_stat_get_wal_stat_reset_time(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_TIMESTAMPTZ(pgstat_fetch_stat_wal()->stat_reset_timestamp);
+}
+
 /*
  * Returns statistics of SLRU caches.
  */
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 96d7efd427..dde0cb55ce 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5484,6 +5484,14 @@
   proname => 'pg_stat_get_buf_alloc', provolatile => 's', proparallel => 'r',
   prorettype => 'int8', proargtypes => '', prosrc => 'pg_stat_get_buf_alloc' },
 
+{ oid => '1136', descr => 'statistics: number of WAL writes when the wal buffers are full',
+  proname => 'pg_stat_get_wal_buffers_full', provolatile => 's', proparallel => 'r',
+  prorettype => 'int8', proargtypes => '', prosrc => 'pg_stat_get_wal_buffers_full' },
+{ oid => '1137', descr => 'statistics: last reset for the walwriter',
+  proname => 'pg_stat_get_wal_stat_reset_time', provolatile => 's',
+  proparallel => 'r', prorettype => 'timestamptz', proargtypes => '',
+  prosrc => 'pg_stat_get_wal_stat_reset_time' },
+
 { oid => '2306', descr => 'statistics: information about SLRU caches',
   proname => 'pg_stat_get_slru', prorows => '100', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 0dfbac46b4..eb706068ba 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -61,6 +61,7 @@ typedef enum StatMsgType
 	PGSTAT_MTYPE_ANALYZE,
 	PGSTAT_MTYPE_ARCHIVER,
 	PGSTAT_MTYPE_BGWRITER,
+	PGSTAT_MTYPE_WAL,
 	PGSTAT_MTYPE_SLRU,
 	PGSTAT_MTYPE_FUNCSTAT,
 	PGSTAT_MTYPE_FUNCPURGE,
@@ -122,7 +123,8 @@ typedef struct PgStat_TableCounts
 typedef enum PgStat_Shared_Reset_Target
 {
 	RESET_ARCHIVER,
-	RESET_BGWRITER
+	RESET_BGWRITER,
+	RESET_WAL
 } PgStat_Shared_Reset_Target;
 
 /* Possible object types for resetting single counters */
@@ -436,6 +438,16 @@ typedef struct PgStat_MsgBgWriter
 	PgStat_Counter m_checkpoint_sync_time;
 } PgStat_MsgBgWriter;
 
+/* ----------
+ * PgStat_MsgWal			Sent by each backend and background workers to update WAL statistics.
+ * ----------
+ */
+typedef struct PgStat_MsgWal
+{
+	PgStat_MsgHdr m_hdr;
+	PgStat_Counter m_wal_buffers_full;	/* number of WAL write caused by full of WAL buffers */
+} PgStat_MsgWal;
+
 /* ----------
  * PgStat_MsgSLRU			Sent by a backend to update SLRU statistics.
  * ----------
@@ -596,6 +608,7 @@ typedef union PgStat_Msg
 	PgStat_MsgAnalyze msg_analyze;
 	PgStat_MsgArchiver msg_archiver;
 	PgStat_MsgBgWriter msg_bgwriter;
+	PgStat_MsgWal msg_wal;
 	PgStat_MsgSLRU msg_slru;
 	PgStat_MsgFuncstat msg_funcstat;
 	PgStat_MsgFuncpurge msg_funcpurge;
@@ -745,6 +758,15 @@ typedef struct PgStat_GlobalStats
 	TimestampTz stat_reset_timestamp;
 } PgStat_GlobalStats;
 
+/*
+ * WAL statistics kept in the stats collector
+ */
+typedef struct PgStat_WalStats
+{
+	PgStat_Counter wal_buffers_full;	/* number of WAL write caused by full of WAL buffers */
+	TimestampTz stat_reset_timestamp;	/* last time when the stats reset */
+} PgStat_WalStats;
+
 /*
  * SLRU statistics kept in the stats collector
  */
@@ -1265,6 +1287,11 @@ extern char *pgstat_stat_filename;
  */
 extern PgStat_MsgBgWriter BgWriterStats;
 
+/*
+ * WAL writes statistics counter is updated by backend and background workers
+ */
+extern PgStat_MsgWal WalStats;
+
 /*
  * Updated by pgstat_count_buffer_*_time macros
  */
@@ -1464,6 +1491,7 @@ extern void pgstat_twophase_postabort(TransactionId xid, uint16 info,
 
 extern void pgstat_send_archiver(const char *xlog, bool failed);
 extern void pgstat_send_bgwriter(void);
+extern void pgstat_send_wal(void);
 
 /* ----------
  * Support functions for the SQL-callable functions to
@@ -1478,6 +1506,7 @@ extern PgStat_StatFuncEntry *pgstat_fetch_stat_funcentry(Oid funcid);
 extern int	pgstat_fetch_stat_numbackends(void);
 extern PgStat_ArchiverStats *pgstat_fetch_stat_archiver(void);
 extern PgStat_GlobalStats *pgstat_fetch_global(void);
+extern PgStat_WalStats *pgstat_fetch_stat_wal(void);
 extern PgStat_SLRUStats *pgstat_fetch_slru(void);
 
 extern void pgstat_count_slru_page_zeroed(int slru_idx);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 2a18dc423e..1e4ac4432e 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2129,6 +2129,8 @@ pg_stat_user_tables| SELECT pg_stat_all_tables.relid,
     pg_stat_all_tables.autoanalyze_count
    FROM pg_stat_all_tables
   WHERE ((pg_stat_all_tables.schemaname <> ALL (ARRAY['pg_catalog'::name, 'information_schema'::name])) AND (pg_stat_all_tables.schemaname !~ '^pg_toast'::text));
+pg_stat_wal| SELECT pg_stat_get_wal_buffers_full() AS wal_buffers_full,
+    pg_stat_get_wal_stat_reset_time() AS stats_reset;
 pg_stat_wal_receiver| SELECT s.pid,
     s.status,
     s.receive_start_lsn,
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 1cffc3349d..81bdacf59d 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -76,6 +76,13 @@ select count(*) >= 0 as ok from pg_prepared_xacts;
  t
 (1 row)
 
+-- There must be only one record
+select count(*) = 1 as ok from pg_stat_wal;
+ ok 
+----
+ t
+(1 row)
+
 -- This is to record the prevailing planner enable_foo settings during
 -- a regression test run.
 select name, setting from pg_settings where name like 'enable%';
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index ac4a0e1cbb..b9b875bc6a 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -37,6 +37,9 @@ select count(*) = 0 as ok from pg_prepared_statements;
 -- See also prepared_xacts.sql
 select count(*) >= 0 as ok from pg_prepared_xacts;
 
+-- There must be only one record
+select count(*) = 1 as ok from pg_stat_wal;
+
 -- This is to record the prevailing planner enable_foo settings during
 -- a regression test run.
 select name, setting from pg_settings where name like 'enable%';

#30

Kyotaro Horiguchi

horikyota.ntt@gmail.com

over 5 years ago

In reply to: Masahiro Ikeda (#29)

Re: New statistics for tuning WAL buffer size

At Fri, 18 Sep 2020 09:40:11 +0900, Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote in

Thanks. I confirmed that it causes HOT pruning or killing of
dead index tuple if DecodeCommit() is called.

As you said, DecodeCommit() may access the system table.

...

The wals are generated only when logical replication is performed.
So, I added pgstat_send_wal() in XLogSendLogical().

But, I concerned that it causes poor performance
since pgstat_send_wal() is called per wal record,

I think that's too frequent. If we want to send any stats to the
collector, it is usually done at commit time using
pgstat_report_stat(), and the function avoids sending stats too
frequently. For logrep-worker, apply_handle_commit() is calling it. It
seems to be the place if we want to send the wal stats. Or it may be
better to call pgstat_send_wal() via pgstat_report_stat(), like
pg_stat_slru().

Currently logrep-laucher, logrep-worker and autovac-launcher (and some
other processes?) don't seem (AFAICS) sending scan stats at all but
according to the discussion here, we should let such processes send
stats.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#31

Masahiro Ikeda

ikedamsh@oss.nttdata.com

over 5 years ago

In reply to: Kyotaro Horiguchi (#30)

1 attachment(s)

Re: New statistics for tuning WAL buffer size

On 2020-09-18 11:11, Kyotaro Horiguchi wrote:

At Fri, 18 Sep 2020 09:40:11 +0900, Masahiro Ikeda
<ikedamsh@oss.nttdata.com> wrote in

Thanks. I confirmed that it causes HOT pruning or killing of
dead index tuple if DecodeCommit() is called.

As you said, DecodeCommit() may access the system table.

...

The wals are generated only when logical replication is performed.
So, I added pgstat_send_wal() in XLogSendLogical().

But, I concerned that it causes poor performance
since pgstat_send_wal() is called per wal record,

I think that's too frequent. If we want to send any stats to the
collector, it is usually done at commit time using
pgstat_report_stat(), and the function avoids sending stats too
frequently. For logrep-worker, apply_handle_commit() is calling it. It
seems to be the place if we want to send the wal stats. Or it may be
better to call pgstat_send_wal() via pgstat_report_stat(), like
pg_stat_slru().

Thanks for your comments.
Since I changed to use pgstat_report_stat() and DecodeCommit() is
calling it,
the frequency to send statistics is not so high.

Currently logrep-laucher, logrep-worker and autovac-launcher (and some
other processes?) don't seem (AFAICS) sending scan stats at all but
according to the discussion here, we should let such processes send
stats.

I added pgstat_report_stat() to logrep-laucher and autovac-launcher.
As you said, logrep-worker already calls apply_handle_commit() and
pgstat_report_stat().

The checkpointer doesn't seem to call pgstat_report_stat() currently,
but since there is a possibility to send wal statistics, I added
pgstat_report_stat().

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

Attachments:

0008_pg_stat_wal_view.patchtext/x-diff; name=0008_pg_stat_wal_view.patchDownload

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 4e0193a967..dd292fe27c 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -424,6 +424,13 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
      </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_wal</structname><indexterm><primary>pg_stat_wal</primary></indexterm></entry>
+      <entry>One row only, showing statistics about the WAL writing activity. See
+       <xref linkend="monitoring-pg-stat-wal-view"/> for details.
+      </entry>
+     </row>
+
      <row>
       <entry><structname>pg_stat_database</structname><indexterm><primary>pg_stat_database</primary></indexterm></entry>
       <entry>One row per database, showing database-wide statistics. See
@@ -3280,6 +3287,56 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
 
  </sect2>
 
+ <sect2 id="monitoring-pg-stat-wal-view">
+   <title><structname>pg_stat_wal</structname></title>
+
+  <indexterm>
+   <primary>pg_stat_wal</primary>
+  </indexterm>
+
+  <para>
+   The <structname>pg_stat_wal</structname> view will always have a
+   single row, containing data about the WAL writing activity of the cluster.
+  </para>
+
+  <table id="pg-stat-wal-view" xreflabel="pg_stat_wal">
+   <title><structname>pg_stat_wal</structname> View</title>
+   <tgroup cols="1">
+    <thead>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       Column Type
+      </para>
+      <para>
+       Description
+      </para></entry>
+     </row>
+    </thead>
+
+    <tbody>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>wal_buffers_full</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of WAL writes when the <xref linkend="guc-wal-buffers"/> are full
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>stats_reset</structfield> <type>timestamp with time zone</type>
+      </para>
+      <para>
+       Time at which these statistics were last reset
+      </para></entry>
+     </row>
+     </tbody>
+   </tgroup>
+  </table>
+
+</sect2>
+
  <sect2 id="monitoring-pg-stat-database-view">
   <title><structname>pg_stat_database</structname></title>
 
@@ -4668,8 +4725,9 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
         argument.  The argument can be <literal>bgwriter</literal> to reset
         all the counters shown in
         the <structname>pg_stat_bgwriter</structname>
-        view, or <literal>archiver</literal> to reset all the counters shown in
-        the <structname>pg_stat_archiver</structname> view.
+        view, <literal>archiver</literal> to reset all the counters shown in
+        the <structname>pg_stat_archiver</structname> view ,or <literal>wal</literal>
+        to reset all the counters shown in the <structname>pg_stat_wal</structname> view.
        </para>
        <para>
         This function is restricted to superusers by default, but other users
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 61754312e2..3a06cacefb 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2195,6 +2195,7 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, bool opportunistic)
 					WriteRqst.Flush = 0;
 					XLogWrite(WriteRqst, false);
 					LWLockRelease(WALWriteLock);
+					WalStats.m_wal_buffers_full++;
 					TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY_DONE();
 				}
 				/* Re-acquire WALBufMappingLock and retry */
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ed4f3f142d..643445c189 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -979,6 +979,11 @@ CREATE VIEW pg_stat_bgwriter AS
         pg_stat_get_buf_alloc() AS buffers_alloc,
         pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
 
+CREATE VIEW pg_stat_wal AS
+    SELECT
+        pg_stat_get_wal_buffers_full() AS wal_buffers_full,
+        pg_stat_get_wal_stat_reset_time() AS stats_reset;
+
 CREATE VIEW pg_stat_progress_analyze AS
     SELECT
         S.pid AS pid, S.datid AS datid, D.datname AS datname,
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 2cef56f115..1b224d479d 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -795,6 +795,8 @@ AutoVacLauncherMain(int argc, char *argv[])
 										   current_time, 0))
 				launch_worker(current_time);
 		}
+
+		pgstat_report_stat(false);
 	}
 
 	AutoVacLauncherShutdown();
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 3e7dcd4f76..a4dce85955 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -503,6 +503,7 @@ CheckpointerMain(void)
 		 * stats message types.)
 		 */
 		pgstat_send_bgwriter();
+		pgstat_report_stat(false);
 
 		/*
 		 * If any checkpoint flags have been set, redo the loop to handle the
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index e6be2b7836..7127beca66 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -141,6 +141,13 @@ char	   *pgstat_stat_tmpname = NULL;
  */
 PgStat_MsgBgWriter BgWriterStats;
 
+/*
+ * WAL global statistics counter.
+ * This counter is incremented by both each backend and background.
+ * And then, sent to the stat collector process.
+ */
+PgStat_MsgWal WalStats;
+
 /*
  * List of SLRU names that we keep stats for.  There is no central registry of
  * SLRUs, so we use this fixed list instead.  The "other" entry is used for
@@ -281,6 +288,7 @@ static int	localNumBackends = 0;
  */
 static PgStat_ArchiverStats archiverStats;
 static PgStat_GlobalStats globalStats;
+static PgStat_WalStats walStats;
 static PgStat_SLRUStats slruStats[SLRU_NUM_ELEMENTS];
 
 /*
@@ -353,6 +361,7 @@ static void pgstat_recv_vacuum(PgStat_MsgVacuum *msg, int len);
 static void pgstat_recv_analyze(PgStat_MsgAnalyze *msg, int len);
 static void pgstat_recv_archiver(PgStat_MsgArchiver *msg, int len);
 static void pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len);
+static void pgstat_recv_wal(PgStat_MsgWal *msg, int len);
 static void pgstat_recv_slru(PgStat_MsgSLRU *msg, int len);
 static void pgstat_recv_funcstat(PgStat_MsgFuncstat *msg, int len);
 static void pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len);
@@ -938,6 +947,9 @@ pgstat_report_stat(bool force)
 	/* Now, send function statistics */
 	pgstat_send_funcstats();
 
+	/* Send wal statistics */
+	pgstat_send_wal();
+
 	/* Finally send SLRU statistics */
 	pgstat_send_slru();
 }
@@ -1370,11 +1382,13 @@ pgstat_reset_shared_counters(const char *target)
 		msg.m_resettarget = RESET_ARCHIVER;
 	else if (strcmp(target, "bgwriter") == 0)
 		msg.m_resettarget = RESET_BGWRITER;
+	else if (strcmp(target, "wal") == 0)
+		msg.m_resettarget = RESET_WAL;
 	else
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("unrecognized reset target: \"%s\"", target),
-				 errhint("Target must be \"archiver\" or \"bgwriter\".")));
+				 errhint("Target must be \"archiver\", \"bgwriter\" or \"wal\".")));
 
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_RESETSHAREDCOUNTER);
 	pgstat_send(&msg, sizeof(msg));
@@ -2674,6 +2688,21 @@ pgstat_fetch_global(void)
 	return &globalStats;
 }
 
+/*
+ * ---------
+ * pgstat_fetch_stat_wal() -
+ *
+ *	Support function for the SQL-callable pgstat* functions. Returns
+ *	a pointer to the wal statistics struct.
+ * ---------
+ */
+PgStat_WalStats *
+pgstat_fetch_stat_wal(void)
+{
+	backend_read_statsfile();
+
+	return &walStats;
+}
 
 /*
  * ---------
@@ -4419,6 +4448,38 @@ pgstat_send_bgwriter(void)
 	MemSet(&BgWriterStats, 0, sizeof(BgWriterStats));
 }
 
+/* ----------
+ * pgstat_send_wal() -
+ *
+ *		Send wal statistics to the collector
+ * ----------
+ */
+void
+pgstat_send_wal(void)
+{
+	/* We assume this initializes to zeroes */
+	static const PgStat_MsgWal all_zeroes;
+
+	/*
+	 * This function can be called even if nothing at all has happened. In
+	 * this case, avoid sending a completely empty message to the stats
+	 * collector.
+	 */
+	if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0)
+		return;
+
+	/*
+	 * Prepare and send the message
+	 */
+	pgstat_setheader(&WalStats.m_hdr, PGSTAT_MTYPE_WAL);
+	pgstat_send(&WalStats, sizeof(WalStats));
+
+	/*
+	 * Clear out the statistics buffer, so it can be re-used.
+	 */
+	MemSet(&WalStats, 0, sizeof(WalStats));
+}
+
 /* ----------
  * pgstat_send_slru() -
  *
@@ -4658,6 +4719,10 @@ PgstatCollectorMain(int argc, char *argv[])
 					pgstat_recv_bgwriter(&msg.msg_bgwriter, len);
 					break;
 
+				case PGSTAT_MTYPE_WAL:
+					pgstat_recv_wal(&msg.msg_wal, len);
+					break;
+
 				case PGSTAT_MTYPE_SLRU:
 					pgstat_recv_slru(&msg.msg_slru, len);
 					break;
@@ -4927,6 +4992,12 @@ pgstat_write_statsfiles(bool permanent, bool allDbs)
 	rc = fwrite(&archiverStats, sizeof(archiverStats), 1, fpout);
 	(void) rc;					/* we'll check for error with ferror */
 
+	/*
+	 * Write wal stats struct
+	 */
+	rc = fwrite(&walStats, sizeof(walStats), 1, fpout);
+	(void) rc;					/* we'll check for error with ferror */
+
 	/*
 	 * Write SLRU stats struct
 	 */
@@ -5186,11 +5257,12 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 						 HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
 
 	/*
-	 * Clear out global and archiver statistics so they start from zero in
+	 * Clear out global, archiver and wal statistics so they start from zero in
 	 * case we can't load an existing statsfile.
 	 */
 	memset(&globalStats, 0, sizeof(globalStats));
 	memset(&archiverStats, 0, sizeof(archiverStats));
+	memset(&walStats, 0, sizeof(walStats));
 	memset(&slruStats, 0, sizeof(slruStats));
 
 	/*
@@ -5199,6 +5271,7 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 	 */
 	globalStats.stat_reset_timestamp = GetCurrentTimestamp();
 	archiverStats.stat_reset_timestamp = globalStats.stat_reset_timestamp;
+	walStats.stat_reset_timestamp = globalStats.stat_reset_timestamp;
 
 	/*
 	 * Set the same reset timestamp for all SLRU items too.
@@ -5268,6 +5341,17 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 		goto done;
 	}
 
+	/*
+	 * Read wal stats struct
+	 */
+	if (fread(&walStats, 1, sizeof(walStats), fpin) != sizeof(walStats))
+	{
+		ereport(pgStatRunningInCollector ? LOG : WARNING,
+				(errmsg("corrupted statistics file \"%s\"", statfile)));
+		memset(&walStats, 0, sizeof(walStats));
+		goto done;
+	}
+
 	/*
 	 * Read SLRU stats struct
 	 */
@@ -5633,6 +5717,17 @@ pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent,
 		return false;
 	}
 
+	/*
+	 * Read wal stats struct
+	 */
+	if (fread(&walStats, 1, sizeof(walStats), fpin) != sizeof(walStats))
+	{
+		ereport(pgStatRunningInCollector ? LOG : WARNING,
+				(errmsg("corrupted statistics file \"%s\"", statfile)));
+		FreeFile(fpin);
+		return false;
+	}
+
 	/*
 	 * Read SLRU stats struct
 	 */
@@ -6213,6 +6308,12 @@ pgstat_recv_resetsharedcounter(PgStat_MsgResetsharedcounter *msg, int len)
 		memset(&archiverStats, 0, sizeof(archiverStats));
 		archiverStats.stat_reset_timestamp = GetCurrentTimestamp();
 	}
+	else if (msg->m_resettarget == RESET_WAL)
+	{
+		/* Reset the wal statistics for the cluster. */
+		memset(&walStats, 0, sizeof(walStats));
+		walStats.stat_reset_timestamp = GetCurrentTimestamp();
+	}
 
 	/*
 	 * Presumably the sender of this message validated the target, don't
@@ -6427,6 +6528,18 @@ pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len)
 	globalStats.buf_alloc += msg->m_buf_alloc;
 }
 
+/* ----------
+ * pgstat_recv_wal() -
+ *
+ *	Process a WAL message.
+ * ----------
+ */
+static void
+pgstat_recv_wal(PgStat_MsgWal *msg, int len)
+{
+	walStats.wal_buffers_full += msg->m_wal_buffers_full;
+}
+
 /* ----------
  * pgstat_recv_slru() -
  *
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index f21f61d5e1..80ee18ce19 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -650,6 +650,7 @@ DecodeCommit(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 	/* replay actions of all transaction + subtransactions in order */
 	ReorderBufferCommit(ctx->reorder, xid, buf->origptr, buf->endptr,
 						commit_time, origin_id, origin_lsn);
+	pgstat_report_stat(false);
 }
 
 /*
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index bdaf0312d6..dba86e8471 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -1025,6 +1025,8 @@ ApplyLauncherMain(Datum main_arg)
 			MemoryContextSwitchTo(oldctx);
 			/* Clean the temporary memory. */
 			MemoryContextDelete(subctx);
+
+			pgstat_report_stat(false);
 		}
 		else
 		{
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 95738a4e34..aa41330796 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1697,6 +1697,18 @@ pg_stat_get_buf_alloc(PG_FUNCTION_ARGS)
 	PG_RETURN_INT64(pgstat_fetch_global()->buf_alloc);
 }
 
+Datum
+pg_stat_get_wal_buffers_full(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_INT64(pgstat_fetch_stat_wal()->wal_buffers_full);
+}
+
+Datum
+pg_stat_get_wal_stat_reset_time(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_TIMESTAMPTZ(pgstat_fetch_stat_wal()->stat_reset_timestamp);
+}
+
 /*
  * Returns statistics of SLRU caches.
  */
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index f48f5fb4d9..5b1830274a 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5481,6 +5481,14 @@
   proname => 'pg_stat_get_buf_alloc', provolatile => 's', proparallel => 'r',
   prorettype => 'int8', proargtypes => '', prosrc => 'pg_stat_get_buf_alloc' },
 
+{ oid => '1136', descr => 'statistics: number of WAL writes when the wal buffers are full',
+  proname => 'pg_stat_get_wal_buffers_full', provolatile => 's', proparallel => 'r',
+  prorettype => 'int8', proargtypes => '', prosrc => 'pg_stat_get_wal_buffers_full' },
+{ oid => '1137', descr => 'statistics: last reset for the walwriter',
+  proname => 'pg_stat_get_wal_stat_reset_time', provolatile => 's',
+  proparallel => 'r', prorettype => 'timestamptz', proargtypes => '',
+  prosrc => 'pg_stat_get_wal_stat_reset_time' },
+
 { oid => '2306', descr => 'statistics: information about SLRU caches',
   proname => 'pg_stat_get_slru', prorows => '100', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 0dfbac46b4..eb706068ba 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -61,6 +61,7 @@ typedef enum StatMsgType
 	PGSTAT_MTYPE_ANALYZE,
 	PGSTAT_MTYPE_ARCHIVER,
 	PGSTAT_MTYPE_BGWRITER,
+	PGSTAT_MTYPE_WAL,
 	PGSTAT_MTYPE_SLRU,
 	PGSTAT_MTYPE_FUNCSTAT,
 	PGSTAT_MTYPE_FUNCPURGE,
@@ -122,7 +123,8 @@ typedef struct PgStat_TableCounts
 typedef enum PgStat_Shared_Reset_Target
 {
 	RESET_ARCHIVER,
-	RESET_BGWRITER
+	RESET_BGWRITER,
+	RESET_WAL
 } PgStat_Shared_Reset_Target;
 
 /* Possible object types for resetting single counters */
@@ -436,6 +438,16 @@ typedef struct PgStat_MsgBgWriter
 	PgStat_Counter m_checkpoint_sync_time;
 } PgStat_MsgBgWriter;
 
+/* ----------
+ * PgStat_MsgWal			Sent by each backend and background workers to update WAL statistics.
+ * ----------
+ */
+typedef struct PgStat_MsgWal
+{
+	PgStat_MsgHdr m_hdr;
+	PgStat_Counter m_wal_buffers_full;	/* number of WAL write caused by full of WAL buffers */
+} PgStat_MsgWal;
+
 /* ----------
  * PgStat_MsgSLRU			Sent by a backend to update SLRU statistics.
  * ----------
@@ -596,6 +608,7 @@ typedef union PgStat_Msg
 	PgStat_MsgAnalyze msg_analyze;
 	PgStat_MsgArchiver msg_archiver;
 	PgStat_MsgBgWriter msg_bgwriter;
+	PgStat_MsgWal msg_wal;
 	PgStat_MsgSLRU msg_slru;
 	PgStat_MsgFuncstat msg_funcstat;
 	PgStat_MsgFuncpurge msg_funcpurge;
@@ -745,6 +758,15 @@ typedef struct PgStat_GlobalStats
 	TimestampTz stat_reset_timestamp;
 } PgStat_GlobalStats;
 
+/*
+ * WAL statistics kept in the stats collector
+ */
+typedef struct PgStat_WalStats
+{
+	PgStat_Counter wal_buffers_full;	/* number of WAL write caused by full of WAL buffers */
+	TimestampTz stat_reset_timestamp;	/* last time when the stats reset */
+} PgStat_WalStats;
+
 /*
  * SLRU statistics kept in the stats collector
  */
@@ -1265,6 +1287,11 @@ extern char *pgstat_stat_filename;
  */
 extern PgStat_MsgBgWriter BgWriterStats;
 
+/*
+ * WAL writes statistics counter is updated by backend and background workers
+ */
+extern PgStat_MsgWal WalStats;
+
 /*
  * Updated by pgstat_count_buffer_*_time macros
  */
@@ -1464,6 +1491,7 @@ extern void pgstat_twophase_postabort(TransactionId xid, uint16 info,
 
 extern void pgstat_send_archiver(const char *xlog, bool failed);
 extern void pgstat_send_bgwriter(void);
+extern void pgstat_send_wal(void);
 
 /* ----------
  * Support functions for the SQL-callable functions to
@@ -1478,6 +1506,7 @@ extern PgStat_StatFuncEntry *pgstat_fetch_stat_funcentry(Oid funcid);
 extern int	pgstat_fetch_stat_numbackends(void);
 extern PgStat_ArchiverStats *pgstat_fetch_stat_archiver(void);
 extern PgStat_GlobalStats *pgstat_fetch_global(void);
+extern PgStat_WalStats *pgstat_fetch_stat_wal(void);
 extern PgStat_SLRUStats *pgstat_fetch_slru(void);
 
 extern void pgstat_count_slru_page_zeroed(int slru_idx);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 2a18dc423e..1e4ac4432e 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2129,6 +2129,8 @@ pg_stat_user_tables| SELECT pg_stat_all_tables.relid,
     pg_stat_all_tables.autoanalyze_count
    FROM pg_stat_all_tables
   WHERE ((pg_stat_all_tables.schemaname <> ALL (ARRAY['pg_catalog'::name, 'information_schema'::name])) AND (pg_stat_all_tables.schemaname !~ '^pg_toast'::text));
+pg_stat_wal| SELECT pg_stat_get_wal_buffers_full() AS wal_buffers_full,
+    pg_stat_get_wal_stat_reset_time() AS stats_reset;
 pg_stat_wal_receiver| SELECT s.pid,
     s.status,
     s.receive_start_lsn,
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 1cffc3349d..81bdacf59d 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -76,6 +76,13 @@ select count(*) >= 0 as ok from pg_prepared_xacts;
  t
 (1 row)
 
+-- There must be only one record
+select count(*) = 1 as ok from pg_stat_wal;
+ ok 
+----
+ t
+(1 row)
+
 -- This is to record the prevailing planner enable_foo settings during
 -- a regression test run.
 select name, setting from pg_settings where name like 'enable%';
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index ac4a0e1cbb..b9b875bc6a 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -37,6 +37,9 @@ select count(*) = 0 as ok from pg_prepared_statements;
 -- See also prepared_xacts.sql
 select count(*) >= 0 as ok from pg_prepared_xacts;
 
+-- There must be only one record
+select count(*) = 1 as ok from pg_stat_wal;
+
 -- This is to record the prevailing planner enable_foo settings during
 -- a regression test run.
 select name, setting from pg_settings where name like 'enable%';

#32

Fujii Masao

masao.fujii@oss.nttdata.com

over 5 years ago

In reply to: Masahiro Ikeda (#31)

Re: New statistics for tuning WAL buffer size

On 2020/09/25 12:06, Masahiro Ikeda wrote:

On 2020-09-18 11:11, Kyotaro Horiguchi wrote:

At Fri, 18 Sep 2020 09:40:11 +0900, Masahiro Ikeda
<ikedamsh@oss.nttdata.com> wrote in

Thanks. I confirmed that it causes HOT pruning or killing of
dead index tuple if DecodeCommit() is called.

As you said, DecodeCommit() may access the system table.

...

The wals are generated only when logical replication is performed.
So, I added pgstat_send_wal() in XLogSendLogical().

But, I concerned that it causes poor performance
since pgstat_send_wal() is called per wal record,

I think that's too frequent.ï¿½ If we want to send any stats to the
collector, it is usually done at commit time using
pgstat_report_stat(), and the function avoids sending stats too
frequently. For logrep-worker, apply_handle_commit() is calling it. It
seems to be the place if we want to send the wal stats.ï¿½ Or it may be
better to call pgstat_send_wal() via pgstat_report_stat(), like
pg_stat_slru().

Thanks for your comments.
Since I changed to use pgstat_report_stat() and DecodeCommit() is calling it,
the frequency to send statistics is not so high.

On second thought, it's strange to include this change in pg_stat_wal patch.
Because pgstat_report_stat() sends various stats and that change would
affect not only pg_stat_wal but also other stats views. That is, if we really
want to make some processes call pgstat_report_stat() newly, which
should be implemented as a separate patch. But I'm not sure how useful
this change is because probably the stats are almost negligibly small
in those processes.

This thought seems valid for pgstat_send_wal(). I changed the thought
and am inclined to be ok not to call pgstat_send_wal() in some background
processes that are very unlikely to generate WAL. For example, logical-rep
launcher, logical-rep walsender, and autovacuum launcher. Thought?

Currently logrep-laucher, logrep-worker and autovac-launcher (and some
other processes?) don't seem (AFAICS) sending scan stats at all but
according to the discussion here, we should let such processes send
stats.

I added pgstat_report_stat() to logrep-laucher and autovac-launcher.
As you said, logrep-worker already calls apply_handle_commit() and pgstat_report_stat().

Right.

The checkpointer doesn't seem to call pgstat_report_stat() currently,
but since there is a possibility to send wal statistics, I added pgstat_report_stat().

IMO it's better to call pgstat_send_wal() in the checkpointer, instead,
because of the above reason.

Thanks for updating the patch! I'd like to share my review comments.

+ <xref linkend="monitoring-pg-stat-wal-view"/> for details.

Like the description for pg_stat_bgwriter, <link> tag should be used
instead of <xref>.

+      <para>
+       Number of WAL writes when the <xref linkend="guc-wal-buffers"/> are full
+      </para></entry>

I prefer the following description. Thought?

"Number of times WAL data was written to the disk because wal_buffers got full"

+ the <structname>pg_stat_archiver</structname> view ,or <literal>wal</literal>

A comma should be just after "view" (not just before "or").

+/*
+ * WAL global statistics counter.
+ * This counter is incremented by both each backend and background.
+ * And then, sent to the stat collector process.
+ */
+PgStat_MsgWal WalStats;

What about merging the comments for BgWriterStats and WalStats into one because they are almost the same? For example,

-------------------------------
/*
* BgWriter and WAL global statistics counters.
* Stored directly in a stats message structure so they can be sent
* without needing to copy things around. We assume these init to zeroes.
*/
PgStat_MsgBgWriter BgWriterStats;
PgStat_MsgWal WalStats;
-------------------------------

BTW, originally there was the comment "(unused in other processes)"
for BgWriterStats. But it seems not true, so I removed it from
the above example.

+	rc = fwrite(&walStats, sizeof(walStats), 1, fpout);
+	(void) rc;					/* we'll check for error with ferror */

Since the patch changes the pgstat file format,
PGSTAT_FILE_FORMAT_ID should also be changed?

-	 * Clear out global and archiver statistics so they start from zero in
+	 * Clear out global, archiver and wal statistics so they start from zero in

This is not the issue of this patch, but isn't it better to mention
also SLRU stats here? That is, what about "Clear out global, archiver,
WAL and SLRU statistics so they start from zero in"?

I found "wal statistics" and "wal stats" in some comments in the patch,
but isn't it better to use "WAL statistics" and "WAL stats", instead,
if there is no special reason to use lowercase?

+	/*
+	 * Read wal stats struct
+	 */
+	if (fread(&walStats, 1, sizeof(walStats), fpin) != sizeof(walStats))

In pgstat_read_db_statsfile_timestamp(), the local variable myWalStats
should be declared and be used to store the WAL stats read via fread(),
instead.

+{ oid => '1136', descr => 'statistics: number of WAL writes when the wal buffers are full',

If we change the description of wal_buffers_full column in the document
as I proposed, we should also use the proposed description here.

+{ oid => '1137', descr => 'statistics: last reset for the walwriter',

"the walwriter" should be "WAL" or "WAL activity", etc?

+ * PgStat_MsgWal Sent by each backend and background workers to update WAL statistics.

If your intention here is to mention background processes like checkpointer,
"each backend and background workers" should be "backends and background
processes"?

+ PgStat_Counter m_wal_buffers_full; /* number of WAL write caused by full of WAL buffers */

I don't think this comment is necessary.

+	PgStat_Counter wal_buffers_full;	/* number of WAL write caused by full of WAL buffers */
+	TimestampTz stat_reset_timestamp;	/* last time when the stats reset */

I don't think these comments are necessary.

+/*
+ * WAL writes statistics counter is updated by backend and background workers

Same as above.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#33

Amit Kapila

amit.kapila16@gmail.com

over 5 years ago

In reply to: Fujii Masao (#32)

Re: New statistics for tuning WAL buffer size

On Fri, Sep 25, 2020 at 11:06 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:

On 2020/09/25 12:06, Masahiro Ikeda wrote:

On 2020-09-18 11:11, Kyotaro Horiguchi wrote:

At Fri, 18 Sep 2020 09:40:11 +0900, Masahiro Ikeda
<ikedamsh@oss.nttdata.com> wrote in

Thanks. I confirmed that it causes HOT pruning or killing of
dead index tuple if DecodeCommit() is called.

As you said, DecodeCommit() may access the system table.

...

The wals are generated only when logical replication is performed.
So, I added pgstat_send_wal() in XLogSendLogical().

But, I concerned that it causes poor performance
since pgstat_send_wal() is called per wal record,

I think that's too frequent. If we want to send any stats to the
collector, it is usually done at commit time using
pgstat_report_stat(), and the function avoids sending stats too
frequently. For logrep-worker, apply_handle_commit() is calling it. It
seems to be the place if we want to send the wal stats. Or it may be
better to call pgstat_send_wal() via pgstat_report_stat(), like
pg_stat_slru().

Thanks for your comments.
Since I changed to use pgstat_report_stat() and DecodeCommit() is calling it,
the frequency to send statistics is not so high.

On second thought, it's strange to include this change in pg_stat_wal patch.
Because pgstat_report_stat() sends various stats and that change would
affect not only pg_stat_wal but also other stats views. That is, if we really
want to make some processes call pgstat_report_stat() newly, which
should be implemented as a separate patch. But I'm not sure how useful
this change is because probably the stats are almost negligibly small
in those processes.

This thought seems valid for pgstat_send_wal(). I changed the thought
and am inclined to be ok not to call pgstat_send_wal() in some background
processes that are very unlikely to generate WAL.

This makes sense to me. I think even if such background processes have
to write WAL due to wal_buffers, it will be accounted next time the
backend sends the stats.

One minor point, don't we need to reset the counter
WalStats.m_wal_buffers_full once we sent the stats, otherwise the same
stats will be accounted multiple times.

--
With Regards,
Amit Kapila.

#34

Kyotaro Horiguchi

horikyota.ntt@gmail.com

over 5 years ago

In reply to: Amit Kapila (#33)

Re: New statistics for tuning WAL buffer size

At Sat, 26 Sep 2020 15:48:49 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in

On Fri, Sep 25, 2020 at 11:06 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:

On 2020/09/25 12:06, Masahiro Ikeda wrote:

On 2020-09-18 11:11, Kyotaro Horiguchi wrote:

At Fri, 18 Sep 2020 09:40:11 +0900, Masahiro Ikeda
<ikedamsh@oss.nttdata.com> wrote in

Thanks. I confirmed that it causes HOT pruning or killing of
dead index tuple if DecodeCommit() is called.

As you said, DecodeCommit() may access the system table.

...

The wals are generated only when logical replication is performed.
So, I added pgstat_send_wal() in XLogSendLogical().

But, I concerned that it causes poor performance
since pgstat_send_wal() is called per wal record,

I think that's too frequent. If we want to send any stats to the
collector, it is usually done at commit time using
pgstat_report_stat(), and the function avoids sending stats too
frequently. For logrep-worker, apply_handle_commit() is calling it. It
seems to be the place if we want to send the wal stats. Or it may be
better to call pgstat_send_wal() via pgstat_report_stat(), like
pg_stat_slru().

Thanks for your comments.
Since I changed to use pgstat_report_stat() and DecodeCommit() is calling it,
the frequency to send statistics is not so high.

On second thought, it's strange to include this change in pg_stat_wal patch.
Because pgstat_report_stat() sends various stats and that change would
affect not only pg_stat_wal but also other stats views. That is, if we really
want to make some processes call pgstat_report_stat() newly, which
should be implemented as a separate patch. But I'm not sure how useful
this change is because probably the stats are almost negligibly small
in those processes.

This thought seems valid for pgstat_send_wal(). I changed the thought
and am inclined to be ok not to call pgstat_send_wal() in some background
processes that are very unlikely to generate WAL.

This makes sense to me. I think even if such background processes have

This makes sense to me. I think even if such background processes have
to write WAL due to wal_buffers, it will be accounted next time the
backend sends the stats.

Where do they send the stats? (I think it's ok to omit seding stats at
all for such low-wal/heap activity processes.)

One minor point, don't we need to reset the counter
WalStats.m_wal_buffers_full once we sent the stats, otherwise the same
stats will be accounted multiple times.

Isn't this doing that?

+	/*
+	 * Clear out the statistics buffer, so it can be re-used.
+	 */
+	MemSet(&WalStats, 0, sizeof(WalStats));

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#35

Masahiro Ikeda

ikedamsh@oss.nttdata.com

over 5 years ago

In reply to: Amit Kapila (#33)

1 attachment(s)

Re: New statistics for tuning WAL buffer size

On 2020-09-26 19:18, Amit Kapila wrote:

On Fri, Sep 25, 2020 at 11:06 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:

On 2020/09/25 12:06, Masahiro Ikeda wrote:

On 2020-09-18 11:11, Kyotaro Horiguchi wrote:

At Fri, 18 Sep 2020 09:40:11 +0900, Masahiro Ikeda
<ikedamsh@oss.nttdata.com> wrote in

Thanks. I confirmed that it causes HOT pruning or killing of
dead index tuple if DecodeCommit() is called.

As you said, DecodeCommit() may access the system table.

...

The wals are generated only when logical replication is performed.
So, I added pgstat_send_wal() in XLogSendLogical().

But, I concerned that it causes poor performance
since pgstat_send_wal() is called per wal record,

I think that's too frequent. If we want to send any stats to the
collector, it is usually done at commit time using
pgstat_report_stat(), and the function avoids sending stats too
frequently. For logrep-worker, apply_handle_commit() is calling it. It
seems to be the place if we want to send the wal stats. Or it may be
better to call pgstat_send_wal() via pgstat_report_stat(), like
pg_stat_slru().

Thanks for your comments.
Since I changed to use pgstat_report_stat() and DecodeCommit() is calling it,
the frequency to send statistics is not so high.

On second thought, it's strange to include this change in pg_stat_wal
patch.
Because pgstat_report_stat() sends various stats and that change would
affect not only pg_stat_wal but also other stats views. That is, if we
really
want to make some processes call pgstat_report_stat() newly, which
should be implemented as a separate patch. But I'm not sure how useful
this change is because probably the stats are almost negligibly small
in those processes.

This thought seems valid for pgstat_send_wal(). I changed the thought
and am inclined to be ok not to call pgstat_send_wal() in some
background
processes that are very unlikely to generate WAL.

OK, I removed to pgstat_report_stat() for autovaccum launcher,
logrep-worker and logrep-launcher.

This makes sense to me. I think even if such background processes have
to write WAL due to wal_buffers, it will be accounted next time the
backend sends the stats.

Thanks for your comments.

IIUC, since each process counts WalStats.m_wal_buffers_full,
backend can't send the counter which other background processes have to
write WAL due to wal_buffers.
Although we can't track all WAL activity, the impact on the statistics
is minimal so we can ignore it.

One minor point, don't we need to reset the counter
WalStats.m_wal_buffers_full once we sent the stats, otherwise the same
stats will be accounted multiple times.

Now, the counter is reset in pgstat_send_wal.
Isn't it enough?

The checkpointer doesn't seem to call pgstat_report_stat() currently,
but since there is a possibility to send wal statistics, I added
pgstat_report_stat().

IMO it's better to call pgstat_send_wal() in the checkpointer, instead,
because of the above reason.

Ok, I changed.

Thanks for updating the patch! I'd like to share my review comments.

+ <xref linkend="monitoring-pg-stat-wal-view"/> for details.

Like the description for pg_stat_bgwriter, <link> tag should be used
instead of <xref>.

Thanks, fixed.

+      <para>
+       Number of WAL writes when the <xref linkend="guc-wal-buffers"/> 
are full
+      </para></entry>
I prefer the following description. Thought?

"Number of times WAL data was written to the disk because wal_buffers
got full"

Ok, I changed.

+ the <structname>pg_stat_archiver</structname> view ,or
<literal>wal</literal>

A comma should be just after "view" (not just before "or").

Sorry, anyway I think a comma is not necessary.
I removed it.

+/*
+ * WAL global statistics counter.
+ * This counter is incremented by both each backend and background.
+ * And then, sent to the stat collector process.
+ */
+PgStat_MsgWal WalStats;
What about merging the comments for BgWriterStats and WalStats into
one because they are almost the same? For example,

-------------------------------
/*
* BgWriter and WAL global statistics counters.
* Stored directly in a stats message structure so they can be sent
* without needing to copy things around. We assume these init to
zeroes.
*/
PgStat_MsgBgWriter BgWriterStats;
PgStat_MsgWal WalStats;
-------------------------------

BTW, originally there was the comment "(unused in other processes)"
for BgWriterStats. But it seems not true, so I removed it from
the above example.

Thanks, I changed.

+	rc = fwrite(&walStats, sizeof(walStats), 1, fpout);
+	(void) rc;					/* we'll check for error with ferror */
Since the patch changes the pgstat file format,
PGSTAT_FILE_FORMAT_ID should also be changed?

Sorry about that.
I incremented PGSTAT_FILE_FORMAT_ID by +1.

-	 * Clear out global and archiver statistics so they start from zero 
in
+	 * Clear out global, archiver and wal statistics so they start from 
zero in
This is not the issue of this patch, but isn't it better to mention
also SLRU stats here? That is, what about "Clear out global, archiver,
WAL and SLRU statistics so they start from zero in"?

Thanks, I changed.

I found "wal statistics" and "wal stats" in some comments in the patch,
but isn't it better to use "WAL statistics" and "WAL stats", instead,
if there is no special reason to use lowercase?

OK. I fixed it.

+	/*
+	 * Read wal stats struct
+	 */
+	if (fread(&walStats, 1, sizeof(walStats), fpin) != sizeof(walStats))
In pgstat_read_db_statsfile_timestamp(), the local variable myWalStats
should be declared and be used to store the WAL stats read via fread(),
instead.

Thanks, I changed it to declare myWalStats.

+{ oid => '1136', descr => 'statistics: number of WAL writes when the
wal buffers are full',

If we change the description of wal_buffers_full column in the document
as I proposed, we should also use the proposed description here.

OK, I fixed it.

+{ oid => '1137', descr => 'statistics: last reset for the walwriter',

"the walwriter" should be "WAL" or "WAL activity", etc?

Thanks, I fixed it.

+ * PgStat_MsgWal Sent by each backend and background workers to
update WAL statistics.

If your intention here is to mention background processes like
checkpointer,
"each backend and background workers" should be "backends and
background
processes"?

Thanks, I fixed it.

+ PgStat_Counter m_wal_buffers_full; /* number of WAL write caused by
full of WAL buffers */

I don't think this comment is necessary.

OK, I removed.

+	PgStat_Counter wal_buffers_full;	/* number of WAL write caused by
full of WAL buffers */
+	TimestampTz stat_reset_timestamp;	/* last time when the stats reset 
*/

I don't think these comments are necessary.

OK, I removed

+/*
+ * WAL writes statistics counter is updated by backend and background 
workers

Same as above.

I fixed it.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

Attachments:

0009_pg_stat_wal_view.patchtext/x-diff; name=0009_pg_stat_wal_view.patchDownload

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 4e0193a967..e50710bdbd 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -424,6 +424,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
      </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_wal</structname><indexterm><primary>pg_stat_wal</primary></indexterm></entry>
+      <entry>One row only, showing statistics about WAL activity. See
+       <link linkend="monitoring-pg-stat-wal-view">
+       <structname>pg_stat_wal</structname></link> for details.
+      </entry>
+     </row>
+
      <row>
       <entry><structname>pg_stat_database</structname><indexterm><primary>pg_stat_database</primary></indexterm></entry>
       <entry>One row per database, showing database-wide statistics. See
@@ -3280,6 +3288,56 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
 
  </sect2>
 
+ <sect2 id="monitoring-pg-stat-wal-view">
+   <title><structname>pg_stat_wal</structname></title>
+
+  <indexterm>
+   <primary>pg_stat_wal</primary>
+  </indexterm>
+
+  <para>
+   The <structname>pg_stat_wal</structname> view will always have a
+   single row, containing data about WAL activity of the cluster.
+  </para>
+
+  <table id="pg-stat-wal-view" xreflabel="pg_stat_wal">
+   <title><structname>pg_stat_wal</structname> View</title>
+   <tgroup cols="1">
+    <thead>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       Column Type
+      </para>
+      <para>
+       Description
+      </para></entry>
+     </row>
+    </thead>
+
+    <tbody>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>wal_buffers_full</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of times WAL data was written to the disk because wal_buffers got full
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>stats_reset</structfield> <type>timestamp with time zone</type>
+      </para>
+      <para>
+       Time at which these statistics were last reset
+      </para></entry>
+     </row>
+     </tbody>
+   </tgroup>
+  </table>
+
+</sect2>
+
  <sect2 id="monitoring-pg-stat-database-view">
   <title><structname>pg_stat_database</structname></title>
 
@@ -4668,8 +4726,9 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
         argument.  The argument can be <literal>bgwriter</literal> to reset
         all the counters shown in
         the <structname>pg_stat_bgwriter</structname>
-        view, or <literal>archiver</literal> to reset all the counters shown in
-        the <structname>pg_stat_archiver</structname> view.
+        view, <literal>archiver</literal> to reset all the counters shown in
+        the <structname>pg_stat_archiver</structname> view or <literal>wal</literal>
+        to reset all the counters shown in the <structname>pg_stat_wal</structname> view.
        </para>
        <para>
         This function is restricted to superusers by default, but other users
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 79a77ebbfe..64403690da 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2195,6 +2195,7 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, bool opportunistic)
 					WriteRqst.Flush = 0;
 					XLogWrite(WriteRqst, false);
 					LWLockRelease(WALWriteLock);
+					WalStats.m_wal_buffers_full++;
 					TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY_DONE();
 				}
 				/* Re-acquire WALBufMappingLock and retry */
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ed4f3f142d..643445c189 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -979,6 +979,11 @@ CREATE VIEW pg_stat_bgwriter AS
         pg_stat_get_buf_alloc() AS buffers_alloc,
         pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
 
+CREATE VIEW pg_stat_wal AS
+    SELECT
+        pg_stat_get_wal_buffers_full() AS wal_buffers_full,
+        pg_stat_get_wal_stat_reset_time() AS stats_reset;
+
 CREATE VIEW pg_stat_progress_analyze AS
     SELECT
         S.pid AS pid, S.datid AS datid, D.datname AS datname,
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 3e7dcd4f76..429c8010ef 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -504,6 +504,9 @@ CheckpointerMain(void)
 		 */
 		pgstat_send_bgwriter();
 
+		/* Send WAL statistics to the stats collector. */
+		pgstat_send_wal();
+
 		/*
 		 * If any checkpoint flags have been set, redo the loop to handle the
 		 * checkpoint without sleeping.
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index e6be2b7836..9d8a435304 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -135,11 +135,12 @@ char	   *pgstat_stat_filename = NULL;
 char	   *pgstat_stat_tmpname = NULL;
 
 /*
- * BgWriter global statistics counters (unused in other processes).
- * Stored directly in a stats message structure so it can be sent
- * without needing to copy things around.  We assume this inits to zeroes.
+ * BgWriter and WAL global statistics counters.
+ * Stored directly in a stats message structure so they can be sent
+ * without needing to copy things around.  We assume these init to zeroes.
  */
 PgStat_MsgBgWriter BgWriterStats;
+PgStat_MsgWal WalStats;
 
 /*
  * List of SLRU names that we keep stats for.  There is no central registry of
@@ -281,6 +282,7 @@ static int	localNumBackends = 0;
  */
 static PgStat_ArchiverStats archiverStats;
 static PgStat_GlobalStats globalStats;
+static PgStat_WalStats walStats;
 static PgStat_SLRUStats slruStats[SLRU_NUM_ELEMENTS];
 
 /*
@@ -353,6 +355,7 @@ static void pgstat_recv_vacuum(PgStat_MsgVacuum *msg, int len);
 static void pgstat_recv_analyze(PgStat_MsgAnalyze *msg, int len);
 static void pgstat_recv_archiver(PgStat_MsgArchiver *msg, int len);
 static void pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len);
+static void pgstat_recv_wal(PgStat_MsgWal *msg, int len);
 static void pgstat_recv_slru(PgStat_MsgSLRU *msg, int len);
 static void pgstat_recv_funcstat(PgStat_MsgFuncstat *msg, int len);
 static void pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len);
@@ -938,6 +941,9 @@ pgstat_report_stat(bool force)
 	/* Now, send function statistics */
 	pgstat_send_funcstats();
 
+	/* Send WAL statistics */
+	pgstat_send_wal();
+
 	/* Finally send SLRU statistics */
 	pgstat_send_slru();
 }
@@ -1370,11 +1376,13 @@ pgstat_reset_shared_counters(const char *target)
 		msg.m_resettarget = RESET_ARCHIVER;
 	else if (strcmp(target, "bgwriter") == 0)
 		msg.m_resettarget = RESET_BGWRITER;
+	else if (strcmp(target, "wal") == 0)
+		msg.m_resettarget = RESET_WAL;
 	else
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("unrecognized reset target: \"%s\"", target),
-				 errhint("Target must be \"archiver\" or \"bgwriter\".")));
+				 errhint("Target must be \"archiver\", \"bgwriter\" or \"wal\".")));
 
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_RESETSHAREDCOUNTER);
 	pgstat_send(&msg, sizeof(msg));
@@ -2674,6 +2682,21 @@ pgstat_fetch_global(void)
 	return &globalStats;
 }
 
+/*
+ * ---------
+ * pgstat_fetch_stat_wal() -
+ *
+ *	Support function for the SQL-callable pgstat* functions. Returns
+ *	a pointer to the WAL statistics struct.
+ * ---------
+ */
+PgStat_WalStats *
+pgstat_fetch_stat_wal(void)
+{
+	backend_read_statsfile();
+
+	return &walStats;
+}
 
 /*
  * ---------
@@ -4419,6 +4442,38 @@ pgstat_send_bgwriter(void)
 	MemSet(&BgWriterStats, 0, sizeof(BgWriterStats));
 }
 
+/* ----------
+ * pgstat_send_wal() -
+ *
+ *		Send WAL statistics to the collector
+ * ----------
+ */
+void
+pgstat_send_wal(void)
+{
+	/* We assume this initializes to zeroes */
+	static const PgStat_MsgWal all_zeroes;
+
+	/*
+	 * This function can be called even if nothing at all has happened. In
+	 * this case, avoid sending a completely empty message to the stats
+	 * collector.
+	 */
+	if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0)
+		return;
+
+	/*
+	 * Prepare and send the message
+	 */
+	pgstat_setheader(&WalStats.m_hdr, PGSTAT_MTYPE_WAL);
+	pgstat_send(&WalStats, sizeof(WalStats));
+
+	/*
+	 * Clear out the statistics buffer, so it can be re-used.
+	 */
+	MemSet(&WalStats, 0, sizeof(WalStats));
+}
+
 /* ----------
  * pgstat_send_slru() -
  *
@@ -4658,6 +4713,10 @@ PgstatCollectorMain(int argc, char *argv[])
 					pgstat_recv_bgwriter(&msg.msg_bgwriter, len);
 					break;
 
+				case PGSTAT_MTYPE_WAL:
+					pgstat_recv_wal(&msg.msg_wal, len);
+					break;
+
 				case PGSTAT_MTYPE_SLRU:
 					pgstat_recv_slru(&msg.msg_slru, len);
 					break;
@@ -4927,6 +4986,12 @@ pgstat_write_statsfiles(bool permanent, bool allDbs)
 	rc = fwrite(&archiverStats, sizeof(archiverStats), 1, fpout);
 	(void) rc;					/* we'll check for error with ferror */
 
+	/*
+	 * Write WAL stats struct
+	 */
+	rc = fwrite(&walStats, sizeof(walStats), 1, fpout);
+	(void) rc;					/* we'll check for error with ferror */
+
 	/*
 	 * Write SLRU stats struct
 	 */
@@ -5186,11 +5251,12 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 						 HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
 
 	/*
-	 * Clear out global and archiver statistics so they start from zero in
+	 * Clear out global, archiver, WAL and SLRU statistics so they start from zero in
 	 * case we can't load an existing statsfile.
 	 */
 	memset(&globalStats, 0, sizeof(globalStats));
 	memset(&archiverStats, 0, sizeof(archiverStats));
+	memset(&walStats, 0, sizeof(walStats));
 	memset(&slruStats, 0, sizeof(slruStats));
 
 	/*
@@ -5199,6 +5265,7 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 	 */
 	globalStats.stat_reset_timestamp = GetCurrentTimestamp();
 	archiverStats.stat_reset_timestamp = globalStats.stat_reset_timestamp;
+	walStats.stat_reset_timestamp = globalStats.stat_reset_timestamp;
 
 	/*
 	 * Set the same reset timestamp for all SLRU items too.
@@ -5268,6 +5335,17 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 		goto done;
 	}
 
+	/*
+	 * Read WAL stats struct
+	 */
+	if (fread(&walStats, 1, sizeof(walStats), fpin) != sizeof(walStats))
+	{
+		ereport(pgStatRunningInCollector ? LOG : WARNING,
+				(errmsg("corrupted statistics file \"%s\"", statfile)));
+		memset(&walStats, 0, sizeof(walStats));
+		goto done;
+	}
+
 	/*
 	 * Read SLRU stats struct
 	 */
@@ -5578,6 +5656,7 @@ pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent,
 	PgStat_StatDBEntry dbentry;
 	PgStat_GlobalStats myGlobalStats;
 	PgStat_ArchiverStats myArchiverStats;
+	PgStat_WalStats myWalStats;
 	PgStat_SLRUStats mySLRUStats[SLRU_NUM_ELEMENTS];
 	FILE	   *fpin;
 	int32		format_id;
@@ -5633,6 +5712,17 @@ pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent,
 		return false;
 	}
 
+	/*
+	 * Read WAL stats struct
+	 */
+	if (fread(&myWalStats, 1, sizeof(myWalStats), fpin) != sizeof(myWalStats))
+	{
+		ereport(pgStatRunningInCollector ? LOG : WARNING,
+				(errmsg("corrupted statistics file \"%s\"", statfile)));
+		FreeFile(fpin);
+		return false;
+	}
+
 	/*
 	 * Read SLRU stats struct
 	 */
@@ -6213,6 +6303,12 @@ pgstat_recv_resetsharedcounter(PgStat_MsgResetsharedcounter *msg, int len)
 		memset(&archiverStats, 0, sizeof(archiverStats));
 		archiverStats.stat_reset_timestamp = GetCurrentTimestamp();
 	}
+	else if (msg->m_resettarget == RESET_WAL)
+	{
+		/* Reset the WAL statistics for the cluster. */
+		memset(&walStats, 0, sizeof(walStats));
+		walStats.stat_reset_timestamp = GetCurrentTimestamp();
+	}
 
 	/*
 	 * Presumably the sender of this message validated the target, don't
@@ -6427,6 +6523,18 @@ pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len)
 	globalStats.buf_alloc += msg->m_buf_alloc;
 }
 
+/* ----------
+ * pgstat_recv_wal() -
+ *
+ *	Process a WAL message.
+ * ----------
+ */
+static void
+pgstat_recv_wal(PgStat_MsgWal *msg, int len)
+{
+	walStats.wal_buffers_full += msg->m_wal_buffers_full;
+}
+
 /* ----------
  * pgstat_recv_slru() -
  *
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 95738a4e34..aa41330796 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1697,6 +1697,18 @@ pg_stat_get_buf_alloc(PG_FUNCTION_ARGS)
 	PG_RETURN_INT64(pgstat_fetch_global()->buf_alloc);
 }
 
+Datum
+pg_stat_get_wal_buffers_full(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_INT64(pgstat_fetch_stat_wal()->wal_buffers_full);
+}
+
+Datum
+pg_stat_get_wal_stat_reset_time(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_TIMESTAMPTZ(pgstat_fetch_stat_wal()->stat_reset_timestamp);
+}
+
 /*
  * Returns statistics of SLRU caches.
  */
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index f48f5fb4d9..e65e0fc64f 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5481,6 +5481,14 @@
   proname => 'pg_stat_get_buf_alloc', provolatile => 's', proparallel => 'r',
   prorettype => 'int8', proargtypes => '', prosrc => 'pg_stat_get_buf_alloc' },
 
+{ oid => '1136', descr => 'statistics: Number of times WAL data was written to the disk because wal_buffers got full',
+  proname => 'pg_stat_get_wal_buffers_full', provolatile => 's', proparallel => 'r',
+  prorettype => 'int8', proargtypes => '', prosrc => 'pg_stat_get_wal_buffers_full' },
+{ oid => '1137', descr => 'statistics: last reset for WAL activity',
+  proname => 'pg_stat_get_wal_stat_reset_time', provolatile => 's',
+  proparallel => 'r', prorettype => 'timestamptz', proargtypes => '',
+  prosrc => 'pg_stat_get_wal_stat_reset_time' },
+
 { oid => '2306', descr => 'statistics: information about SLRU caches',
   proname => 'pg_stat_get_slru', prorows => '100', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 0dfbac46b4..f7ddf17564 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -61,6 +61,7 @@ typedef enum StatMsgType
 	PGSTAT_MTYPE_ANALYZE,
 	PGSTAT_MTYPE_ARCHIVER,
 	PGSTAT_MTYPE_BGWRITER,
+	PGSTAT_MTYPE_WAL,
 	PGSTAT_MTYPE_SLRU,
 	PGSTAT_MTYPE_FUNCSTAT,
 	PGSTAT_MTYPE_FUNCPURGE,
@@ -122,7 +123,8 @@ typedef struct PgStat_TableCounts
 typedef enum PgStat_Shared_Reset_Target
 {
 	RESET_ARCHIVER,
-	RESET_BGWRITER
+	RESET_BGWRITER,
+	RESET_WAL
 } PgStat_Shared_Reset_Target;
 
 /* Possible object types for resetting single counters */
@@ -436,6 +438,16 @@ typedef struct PgStat_MsgBgWriter
 	PgStat_Counter m_checkpoint_sync_time;
 } PgStat_MsgBgWriter;
 
+/* ----------
+ * PgStat_MsgWal			Sent by backends and background processes to update WAL statistics.
+ * ----------
+ */
+typedef struct PgStat_MsgWal
+{
+	PgStat_MsgHdr m_hdr;
+	PgStat_Counter m_wal_buffers_full;
+} PgStat_MsgWal;
+
 /* ----------
  * PgStat_MsgSLRU			Sent by a backend to update SLRU statistics.
  * ----------
@@ -596,6 +608,7 @@ typedef union PgStat_Msg
 	PgStat_MsgAnalyze msg_analyze;
 	PgStat_MsgArchiver msg_archiver;
 	PgStat_MsgBgWriter msg_bgwriter;
+	PgStat_MsgWal msg_wal;
 	PgStat_MsgSLRU msg_slru;
 	PgStat_MsgFuncstat msg_funcstat;
 	PgStat_MsgFuncpurge msg_funcpurge;
@@ -614,7 +627,7 @@ typedef union PgStat_Msg
  * ------------------------------------------------------------
  */
 
-#define PGSTAT_FILE_FORMAT_ID	0x01A5BC9D
+#define PGSTAT_FILE_FORMAT_ID	0x01A5BC9E
 
 /* ----------
  * PgStat_StatDBEntry			The collector's data per database
@@ -745,6 +758,15 @@ typedef struct PgStat_GlobalStats
 	TimestampTz stat_reset_timestamp;
 } PgStat_GlobalStats;
 
+/*
+ * WAL statistics kept in the stats collector
+ */
+typedef struct PgStat_WalStats
+{
+	PgStat_Counter wal_buffers_full;
+	TimestampTz stat_reset_timestamp;
+} PgStat_WalStats;
+
 /*
  * SLRU statistics kept in the stats collector
  */
@@ -1265,6 +1287,11 @@ extern char *pgstat_stat_filename;
  */
 extern PgStat_MsgBgWriter BgWriterStats;
 
+/*
+ * WAL writes statistics counter is updated by backends and background processes
+ */
+extern PgStat_MsgWal WalStats;
+
 /*
  * Updated by pgstat_count_buffer_*_time macros
  */
@@ -1464,6 +1491,7 @@ extern void pgstat_twophase_postabort(TransactionId xid, uint16 info,
 
 extern void pgstat_send_archiver(const char *xlog, bool failed);
 extern void pgstat_send_bgwriter(void);
+extern void pgstat_send_wal(void);
 
 /* ----------
  * Support functions for the SQL-callable functions to
@@ -1478,6 +1506,7 @@ extern PgStat_StatFuncEntry *pgstat_fetch_stat_funcentry(Oid funcid);
 extern int	pgstat_fetch_stat_numbackends(void);
 extern PgStat_ArchiverStats *pgstat_fetch_stat_archiver(void);
 extern PgStat_GlobalStats *pgstat_fetch_global(void);
+extern PgStat_WalStats *pgstat_fetch_stat_wal(void);
 extern PgStat_SLRUStats *pgstat_fetch_slru(void);
 
 extern void pgstat_count_slru_page_zeroed(int slru_idx);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 2a18dc423e..1e4ac4432e 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2129,6 +2129,8 @@ pg_stat_user_tables| SELECT pg_stat_all_tables.relid,
     pg_stat_all_tables.autoanalyze_count
    FROM pg_stat_all_tables
   WHERE ((pg_stat_all_tables.schemaname <> ALL (ARRAY['pg_catalog'::name, 'information_schema'::name])) AND (pg_stat_all_tables.schemaname !~ '^pg_toast'::text));
+pg_stat_wal| SELECT pg_stat_get_wal_buffers_full() AS wal_buffers_full,
+    pg_stat_get_wal_stat_reset_time() AS stats_reset;
 pg_stat_wal_receiver| SELECT s.pid,
     s.status,
     s.receive_start_lsn,
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 1cffc3349d..81bdacf59d 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -76,6 +76,13 @@ select count(*) >= 0 as ok from pg_prepared_xacts;
  t
 (1 row)
 
+-- There must be only one record
+select count(*) = 1 as ok from pg_stat_wal;
+ ok 
+----
+ t
+(1 row)
+
 -- This is to record the prevailing planner enable_foo settings during
 -- a regression test run.
 select name, setting from pg_settings where name like 'enable%';
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index ac4a0e1cbb..b9b875bc6a 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -37,6 +37,9 @@ select count(*) = 0 as ok from pg_prepared_statements;
 -- See also prepared_xacts.sql
 select count(*) >= 0 as ok from pg_prepared_xacts;
 
+-- There must be only one record
+select count(*) = 1 as ok from pg_stat_wal;
+
 -- This is to record the prevailing planner enable_foo settings during
 -- a regression test run.
 select name, setting from pg_settings where name like 'enable%';

#36

Amit Kapila

amit.kapila16@gmail.com

over 5 years ago

In reply to: Masahiro Ikeda (#35)

Re: New statistics for tuning WAL buffer size

On Mon, Sep 28, 2020 at 7:00 AM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:

On 2020-09-26 19:18, Amit Kapila wrote

This makes sense to me. I think even if such background processes have
to write WAL due to wal_buffers, it will be accounted next time the
backend sends the stats.

Thanks for your comments.

IIUC, since each process counts WalStats.m_wal_buffers_full,
backend can't send the counter which other background processes have to
write WAL due to wal_buffers.

Right, I misunderstood it.

Although we can't track all WAL activity, the impact on the statistics
is minimal so we can ignore it.

Yeah, that is probably true.

One minor point, don't we need to reset the counter
WalStats.m_wal_buffers_full once we sent the stats, otherwise the same
stats will be accounted multiple times.

Now, the counter is reset in pgstat_send_wal.
Isn't it enough?

That should be enough.

One other thing that occurred to me today is can't we keep this as
part of PgStat_GlobalStats? We can use pg_stat_reset_shared('wal'); to
reset it. It seems to me this is a cluster-wide stats and somewhat
similar to some of the other stats we maintain there.

--
With Regards,
Amit Kapila.

#37

Kyotaro Horiguchi

horikyota.ntt@gmail.com

over 5 years ago

In reply to: Amit Kapila (#36)

Re: New statistics for tuning WAL buffer size

At Mon, 28 Sep 2020 08:11:23 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in

One other thing that occurred to me today is can't we keep this as
part of PgStat_GlobalStats? We can use pg_stat_reset_shared('wal'); to
reset it. It seems to me this is a cluster-wide stats and somewhat
similar to some of the other stats we maintain there.

I like that direction, but PgStat_GlobalStats is actually
PgStat_BgWriterStats and cleard by a RESET_BGWRITER message.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#38

Amit Kapila

amit.kapila16@gmail.com

over 5 years ago

In reply to: Kyotaro Horiguchi (#37)

Re: New statistics for tuning WAL buffer size

On Mon, Sep 28, 2020 at 8:24 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Mon, 28 Sep 2020 08:11:23 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in

One other thing that occurred to me today is can't we keep this as
part of PgStat_GlobalStats? We can use pg_stat_reset_shared('wal'); to
reset it. It seems to me this is a cluster-wide stats and somewhat
similar to some of the other stats we maintain there.

I like that direction, but PgStat_GlobalStats is actually
PgStat_BgWriterStats and cleard by a RESET_BGWRITER message.

Yeah, I think if we want to pursue this direction then we probably
need to have a separate message to set/reset WAL-related stuff. I
guess we probably need to have a separate reset timestamp for WAL. I
think the difference would be that we can have one structure to refer
to global_stats instead of referring to multiple structures and we
don't need to issue separate read/write calls but OTOH I don't see
many disadvantages of the current approach as well.

--
With Regards,
Amit Kapila.

#39

Masahiro Ikeda

ikedamsh@oss.nttdata.com

over 5 years ago

In reply to: Amit Kapila (#38)

Re: New statistics for tuning WAL buffer size

On 2020-09-28 12:43, Amit Kapila wrote:

On Mon, Sep 28, 2020 at 8:24 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Mon, 28 Sep 2020 08:11:23 +0530, Amit Kapila
<amit.kapila16@gmail.com> wrote in

One other thing that occurred to me today is can't we keep this as
part of PgStat_GlobalStats? We can use pg_stat_reset_shared('wal'); to
reset it. It seems to me this is a cluster-wide stats and somewhat
similar to some of the other stats we maintain there.

I like that direction, but PgStat_GlobalStats is actually
PgStat_BgWriterStats and cleard by a RESET_BGWRITER message.

Yeah, I think if we want to pursue this direction then we probably
need to have a separate message to set/reset WAL-related stuff. I
guess we probably need to have a separate reset timestamp for WAL. I
think the difference would be that we can have one structure to refer
to global_stats instead of referring to multiple structures and we
don't need to issue separate read/write calls but OTOH I don't see
many disadvantages of the current approach as well.

IIUC, if we keep wal stats as part of PgStat_GlobalStats,
don't we need to add PgStat_ArchiverStats and PgStat_SLRUStats
to PgStat_GlobalStats too?

Since this is refactoring, I think it's better to make another patch
after the current patch is merged.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

#40

Amit Kapila

amit.kapila16@gmail.com

over 5 years ago

In reply to: Masahiro Ikeda (#39)

Re: New statistics for tuning WAL buffer size

On Tue, Sep 29, 2020 at 7:39 AM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:

On 2020-09-28 12:43, Amit Kapila wrote:

On Mon, Sep 28, 2020 at 8:24 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Mon, 28 Sep 2020 08:11:23 +0530, Amit Kapila
<amit.kapila16@gmail.com> wrote in

One other thing that occurred to me today is can't we keep this as
part of PgStat_GlobalStats? We can use pg_stat_reset_shared('wal'); to
reset it. It seems to me this is a cluster-wide stats and somewhat
similar to some of the other stats we maintain there.

I like that direction, but PgStat_GlobalStats is actually
PgStat_BgWriterStats and cleard by a RESET_BGWRITER message.

Yeah, I think if we want to pursue this direction then we probably
need to have a separate message to set/reset WAL-related stuff. I
guess we probably need to have a separate reset timestamp for WAL. I
think the difference would be that we can have one structure to refer
to global_stats instead of referring to multiple structures and we
don't need to issue separate read/write calls but OTOH I don't see
many disadvantages of the current approach as well.

IIUC, if we keep wal stats as part of PgStat_GlobalStats,
don't we need to add PgStat_ArchiverStats and PgStat_SLRUStats
to PgStat_GlobalStats too?

I have given the idea for wal_stats because there is just one counter
in that. I think you can just try to evaluate the merits of each
approach and choose whichever you feel is good. This is just a
suggestion, if you don't like it feel free to proceed with the current
approach.

--
With Regards,
Amit Kapila.

#41

Masahiro Ikeda

ikedamsh@oss.nttdata.com

over 5 years ago

In reply to: Amit Kapila (#40)

Re: New statistics for tuning WAL buffer size

On 2020-09-29 11:43, Amit Kapila wrote:

On Tue, Sep 29, 2020 at 7:39 AM Masahiro Ikeda
<ikedamsh@oss.nttdata.com> wrote:

On 2020-09-28 12:43, Amit Kapila wrote:

On Mon, Sep 28, 2020 at 8:24 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Mon, 28 Sep 2020 08:11:23 +0530, Amit Kapila
<amit.kapila16@gmail.com> wrote in

One other thing that occurred to me today is can't we keep this as
part of PgStat_GlobalStats? We can use pg_stat_reset_shared('wal'); to
reset it. It seems to me this is a cluster-wide stats and somewhat
similar to some of the other stats we maintain there.

I like that direction, but PgStat_GlobalStats is actually
PgStat_BgWriterStats and cleard by a RESET_BGWRITER message.

Yeah, I think if we want to pursue this direction then we probably
need to have a separate message to set/reset WAL-related stuff. I
guess we probably need to have a separate reset timestamp for WAL. I
think the difference would be that we can have one structure to refer
to global_stats instead of referring to multiple structures and we
don't need to issue separate read/write calls but OTOH I don't see
many disadvantages of the current approach as well.

IIUC, if we keep wal stats as part of PgStat_GlobalStats,
don't we need to add PgStat_ArchiverStats and PgStat_SLRUStats
to PgStat_GlobalStats too?

I have given the idea for wal_stats because there is just one counter
in that. I think you can just try to evaluate the merits of each
approach and choose whichever you feel is good. This is just a
suggestion, if you don't like it feel free to proceed with the current
approach.

Thanks for your suggestion.
I understood that the point is that WAL-related stats have just one
counter now.

Since we may add some WAL-related stats like pgWalUsage.(bytes, records,
fpi),
I think that the current approach is good.

--
Masahiro Ikeda
NTT DATA CORPORATION

#42

Fujii Masao

masao.fujii@oss.nttdata.com

over 5 years ago

In reply to: Masahiro Ikeda (#41)

Re: New statistics for tuning WAL buffer size

On 2020/09/29 11:51, Masahiro Ikeda wrote:

On 2020-09-29 11:43, Amit Kapila wrote:

On Tue, Sep 29, 2020 at 7:39 AM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:

On 2020-09-28 12:43, Amit Kapila wrote:

On Mon, Sep 28, 2020 at 8:24 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Mon, 28 Sep 2020 08:11:23 +0530, Amit Kapila
<amit.kapila16@gmail.com> wrote in

One other thing that occurred to me today is can't we keep this as
part of PgStat_GlobalStats? We can use pg_stat_reset_shared('wal'); to
reset it. It seems to me this is a cluster-wide stats and somewhat
similar to some of the other stats we maintain there.

I like that direction, but PgStat_GlobalStats is actually
PgStat_BgWriterStats and cleard by a RESET_BGWRITER message.

Yeah, I think if we want to pursue this direction then we probably
need to have a separate message to set/reset WAL-related stuff. I
guess we probably need to have a separate reset timestamp for WAL. I
think the difference would be that we can have one structure to refer
to global_stats instead of referring to multiple structures and we
don't need to issue separate read/write calls but OTOH I don't see
many disadvantages of the current approach as well.

IIUC, if we keep wal stats as part of PgStat_GlobalStats,
don't we need to add PgStat_ArchiverStats and PgStat_SLRUStats
to PgStat_GlobalStats too?

I have given the idea for wal_stats because there is just one counter
in that. I think you can just try to evaluate the merits of each
approach and choose whichever you feel is good. This is just a
suggestion, if you don't like it feel free to proceed with the current
approach.

Thanks for your suggestion.
I understood that the point is that WAL-related stats have just one counter now.

Since we may add some WAL-related stats like pgWalUsage.(bytes, records, fpi),
I think that the current approach is good.

I marked this patch as ready for committer.
Barring any objection, I will commit the patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#43

Amit Kapila

amit.kapila16@gmail.com

over 5 years ago

In reply to: Fujii Masao (#42)

Re: New statistics for tuning WAL buffer size

On Tue, Sep 29, 2020 at 9:23 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/09/29 11:51, Masahiro Ikeda wrote:

On 2020-09-29 11:43, Amit Kapila wrote:

On Tue, Sep 29, 2020 at 7:39 AM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:

On 2020-09-28 12:43, Amit Kapila wrote:

On Mon, Sep 28, 2020 at 8:24 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Mon, 28 Sep 2020 08:11:23 +0530, Amit Kapila
<amit.kapila16@gmail.com> wrote in

One other thing that occurred to me today is can't we keep this as
part of PgStat_GlobalStats? We can use pg_stat_reset_shared('wal'); to
reset it. It seems to me this is a cluster-wide stats and somewhat
similar to some of the other stats we maintain there.

I like that direction, but PgStat_GlobalStats is actually
PgStat_BgWriterStats and cleard by a RESET_BGWRITER message.

Yeah, I think if we want to pursue this direction then we probably
need to have a separate message to set/reset WAL-related stuff. I
guess we probably need to have a separate reset timestamp for WAL. I
think the difference would be that we can have one structure to refer
to global_stats instead of referring to multiple structures and we
don't need to issue separate read/write calls but OTOH I don't see
many disadvantages of the current approach as well.

IIUC, if we keep wal stats as part of PgStat_GlobalStats,
don't we need to add PgStat_ArchiverStats and PgStat_SLRUStats
to PgStat_GlobalStats too?

I have given the idea for wal_stats because there is just one counter
in that. I think you can just try to evaluate the merits of each
approach and choose whichever you feel is good. This is just a
suggestion, if you don't like it feel free to proceed with the current
approach.

Thanks for your suggestion.
I understood that the point is that WAL-related stats have just one counter now.

Since we may add some WAL-related stats like pgWalUsage.(bytes, records, fpi),
I think that the current approach is good.

+1

Okay, it makes sense to keep it in the current form if we have a plan
to extend this view with additional stats. However, why don't we
expose it with a function similar to pg_stat_get_archiver() instead of
providing individual functions like pg_stat_get_wal_buffers_full() and
pg_stat_get_wal_stat_reset_time?

--
With Regards,
Amit Kapila.

#44

Fujii Masao

masao.fujii@oss.nttdata.com

over 5 years ago

In reply to: Amit Kapila (#43)

Re: New statistics for tuning WAL buffer size

On 2020/09/30 20:21, Amit Kapila wrote:

On Tue, Sep 29, 2020 at 9:23 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

On 2020/09/29 11:51, Masahiro Ikeda wrote:

On 2020-09-29 11:43, Amit Kapila wrote:

On Tue, Sep 29, 2020 at 7:39 AM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:

On 2020-09-28 12:43, Amit Kapila wrote:

On Mon, Sep 28, 2020 at 8:24 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Mon, 28 Sep 2020 08:11:23 +0530, Amit Kapila
<amit.kapila16@gmail.com> wrote in

One other thing that occurred to me today is can't we keep this as
part of PgStat_GlobalStats? We can use pg_stat_reset_shared('wal'); to
reset it. It seems to me this is a cluster-wide stats and somewhat
similar to some of the other stats we maintain there.

I like that direction, but PgStat_GlobalStats is actually
PgStat_BgWriterStats and cleard by a RESET_BGWRITER message.

Yeah, I think if we want to pursue this direction then we probably
need to have a separate message to set/reset WAL-related stuff. I
guess we probably need to have a separate reset timestamp for WAL. I
think the difference would be that we can have one structure to refer
to global_stats instead of referring to multiple structures and we
don't need to issue separate read/write calls but OTOH I don't see
many disadvantages of the current approach as well.

IIUC, if we keep wal stats as part of PgStat_GlobalStats,
don't we need to add PgStat_ArchiverStats and PgStat_SLRUStats
to PgStat_GlobalStats too?

I have given the idea for wal_stats because there is just one counter
in that. I think you can just try to evaluate the merits of each
approach and choose whichever you feel is good. This is just a
suggestion, if you don't like it feel free to proceed with the current
approach.

Thanks for your suggestion.
I understood that the point is that WAL-related stats have just one counter now.

Since we may add some WAL-related stats like pgWalUsage.(bytes, records, fpi),
I think that the current approach is good.

+1

Okay, it makes sense to keep it in the current form if we have a plan
to extend this view with additional stats. However, why don't we
expose it with a function similar to pg_stat_get_archiver() instead of
providing individual functions like pg_stat_get_wal_buffers_full() and
pg_stat_get_wal_stat_reset_time?

We can adopt either of those approaches for pg_stat_wal. I think that
the former is a bit more flexible because we can collect only one of
WAL information even when pg_stat_wal will contain many information
in the future, by using the function. But you thought there are some
reasons that the latter is better for pg_stat_wal?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#45

Kyotaro Horiguchi

horikyota.ntt@gmail.com

over 5 years ago

In reply to: Fujii Masao (#44)

Re: New statistics for tuning WAL buffer size

At Thu, 1 Oct 2020 09:05:19 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in

On 2020/09/30 20:21, Amit Kapila wrote:

On Tue, Sep 29, 2020 at 9:23 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:

On 2020/09/29 11:51, Masahiro Ikeda wrote:

On 2020-09-29 11:43, Amit Kapila wrote:

On Tue, Sep 29, 2020 at 7:39 AM Masahiro Ikeda
<ikedamsh@oss.nttdata.com> wrote:

Thanks for your suggestion.
I understood that the point is that WAL-related stats have just one
counter now.

Since we may add some WAL-related stats like pgWalUsage.(bytes,
records, fpi),
I think that the current approach is good.

+1

Okay, it makes sense to keep it in the current form if we have a plan
to extend this view with additional stats. However, why don't we
expose it with a function similar to pg_stat_get_archiver() instead of
providing individual functions like pg_stat_get_wal_buffers_full() and
pg_stat_get_wal_stat_reset_time?

We can adopt either of those approaches for pg_stat_wal. I think that
the former is a bit more flexible because we can collect only one of
WAL information even when pg_stat_wal will contain many information
in the future, by using the function. But you thought there are some
reasons that the latter is better for pg_stat_wal?

FWIW I prefer to expose it by one SRF function rather than by
subdivided functions. One of the reasons is the less oid consumption
and/or reduction of definitions for intrinsic functions.

Another reason is at least for me subdivided functions are not useful
so much for on-the-fly examination on psql console. I'm often annoyed
by realizing I can't recall the exact name of a function, say,
pg_last_wal_receive_lsn or such but function names cannot be
auto-completed on psql console. "select proname from pg_proc where
proname like.. " is one of my friends:p On the other hand "select *
from pg_stat_wal" requires no detailed memory.

However subdivided functions might be useful if I wanted use just one
number of wal-stats in a function, I think it is not a major usage and
we can use a SQL query on the view instead.

Another reason that I mildly want to object to subdivided functions is
I was annoyed that a stats view makes many individual calls to
functions that internally share the same statistics entry. That
behavior required me to provide an entry-caching feature to my
shared-memory statistics patch.

regrds.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#46

tsunakawa.takay@fujitsu.com

over 5 years ago

In reply to: Kyotaro Horiguchi (#45)

RE: New statistics for tuning WAL buffer size

From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>

Another reason that I mildly want to object to subdivided functions is
I was annoyed that a stats view makes many individual calls to
functions that internally share the same statistics entry. That
behavior required me to provide an entry-caching feature to my
shared-memory statistics patch.

+1
The views for troubleshooting performance problems should be as light as possible. IIRC, we saw frequently searching pg_stat_replication consume unexpectedly high CPU power, because it calls pg_stat_get_activity(null) to get all sessions and join them with the walsenders. At that time, we had hundreds of client sessions. We expected pg_stat_replication to be very lightweight because it provides information about a few walsenders.

Regards
Takayuki Tsunakawa

#47

Amit Kapila

amit.kapila16@gmail.com

over 5 years ago

In reply to: Kyotaro Horiguchi (#45)

Re: New statistics for tuning WAL buffer size

On Thu, Oct 1, 2020 at 6:53 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Thu, 1 Oct 2020 09:05:19 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in

On 2020/09/30 20:21, Amit Kapila wrote:

On Tue, Sep 29, 2020 at 9:23 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:

On 2020/09/29 11:51, Masahiro Ikeda wrote:

On 2020-09-29 11:43, Amit Kapila wrote:

On Tue, Sep 29, 2020 at 7:39 AM Masahiro Ikeda
<ikedamsh@oss.nttdata.com> wrote:

Thanks for your suggestion.
I understood that the point is that WAL-related stats have just one
counter now.

Since we may add some WAL-related stats like pgWalUsage.(bytes,
records, fpi),
I think that the current approach is good.

+1

Okay, it makes sense to keep it in the current form if we have a plan
to extend this view with additional stats. However, why don't we
expose it with a function similar to pg_stat_get_archiver() instead of
providing individual functions like pg_stat_get_wal_buffers_full() and
pg_stat_get_wal_stat_reset_time?

We can adopt either of those approaches for pg_stat_wal. I think that
the former is a bit more flexible because we can collect only one of
WAL information even when pg_stat_wal will contain many information
in the future, by using the function. But you thought there are some
reasons that the latter is better for pg_stat_wal?

FWIW I prefer to expose it by one SRF function rather than by
subdivided functions. One of the reasons is the less oid consumption
and/or reduction of definitions for intrinsic functions.

Another reason is at least for me subdivided functions are not useful
so much for on-the-fly examination on psql console. I'm often annoyed
by realizing I can't recall the exact name of a function, say,
pg_last_wal_receive_lsn or such but function names cannot be
auto-completed on psql console. "select proname from pg_proc where
proname like.. " is one of my friends:p On the other hand "select *
from pg_stat_wal" requires no detailed memory.

However subdivided functions might be useful if I wanted use just one
number of wal-stats in a function, I think it is not a major usage and
we can use a SQL query on the view instead.

Another reason that I mildly want to object to subdivided functions is
I was annoyed that a stats view makes many individual calls to
functions that internally share the same statistics entry. That
behavior required me to provide an entry-caching feature to my
shared-memory statistics patch.

All these are good reasons to expose it via one function and I think
that is why most of our existing views also use one function approach.

--
With Regards,
Amit Kapila.

#48

Masahiro Ikeda

ikedamsh@oss.nttdata.com

over 5 years ago

In reply to: Amit Kapila (#47)

1 attachment(s)

Re: New statistics for tuning WAL buffer size

On 2020-10-01 11:33, Amit Kapila wrote:

On Thu, Oct 1, 2020 at 6:53 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Thu, 1 Oct 2020 09:05:19 +0900, Fujii Masao
<masao.fujii@oss.nttdata.com> wrote in

On 2020/09/30 20:21, Amit Kapila wrote:

On Tue, Sep 29, 2020 at 9:23 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:

On 2020/09/29 11:51, Masahiro Ikeda wrote:

On 2020-09-29 11:43, Amit Kapila wrote:

On Tue, Sep 29, 2020 at 7:39 AM Masahiro Ikeda
<ikedamsh@oss.nttdata.com> wrote:

Thanks for your suggestion.
I understood that the point is that WAL-related stats have just one
counter now.

Since we may add some WAL-related stats like pgWalUsage.(bytes,
records, fpi),
I think that the current approach is good.

+1

Okay, it makes sense to keep it in the current form if we have a plan
to extend this view with additional stats. However, why don't we
expose it with a function similar to pg_stat_get_archiver() instead of
providing individual functions like pg_stat_get_wal_buffers_full() and
pg_stat_get_wal_stat_reset_time?

We can adopt either of those approaches for pg_stat_wal. I think that
the former is a bit more flexible because we can collect only one of
WAL information even when pg_stat_wal will contain many information
in the future, by using the function. But you thought there are some
reasons that the latter is better for pg_stat_wal?

FWIW I prefer to expose it by one SRF function rather than by
subdivided functions. One of the reasons is the less oid consumption
and/or reduction of definitions for intrinsic functions.

Another reason is at least for me subdivided functions are not useful
so much for on-the-fly examination on psql console. I'm often annoyed
by realizing I can't recall the exact name of a function, say,
pg_last_wal_receive_lsn or such but function names cannot be
auto-completed on psql console. "select proname from pg_proc where
proname like.. " is one of my friends:p On the other hand "select *
from pg_stat_wal" requires no detailed memory.

However subdivided functions might be useful if I wanted use just one
number of wal-stats in a function, I think it is not a major usage and
we can use a SQL query on the view instead.

Another reason that I mildly want to object to subdivided functions is
I was annoyed that a stats view makes many individual calls to
functions that internally share the same statistics entry. That
behavior required me to provide an entry-caching feature to my
shared-memory statistics patch.

All these are good reasons to expose it via one function and I think
that is why most of our existing views also use one function approach.

Thanks for your comments.
I didn't notice there are the above disadvantages to provide individual
functions.

I changed the latest patch to expose it via one function.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

Attachments:

0010_pg_stat_wal_view.patchtext/x-diff; name=0010_pg_stat_wal_view.patchDownload

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 4e0193a967..495018009a 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -424,6 +424,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
      </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_wal</structname><indexterm><primary>pg_stat_wal</primary></indexterm></entry>
+      <entry>One row only, showing statistics about WAL activity. See
+       <link linkend="monitoring-pg-stat-wal-view">
+       <structname>pg_stat_wal</structname></link> for details.
+      </entry>
+     </row>
+
      <row>
       <entry><structname>pg_stat_database</structname><indexterm><primary>pg_stat_database</primary></indexterm></entry>
       <entry>One row per database, showing database-wide statistics. See
@@ -3280,6 +3288,56 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
 
  </sect2>
 
+ <sect2 id="monitoring-pg-stat-wal-view">
+   <title><structname>pg_stat_wal</structname></title>
+
+  <indexterm>
+   <primary>pg_stat_wal</primary>
+  </indexterm>
+
+  <para>
+   The <structname>pg_stat_wal</structname> view will always have a
+   single row, containing data about WAL activity of the cluster.
+  </para>
+
+  <table id="pg-stat-wal-view" xreflabel="pg_stat_wal">
+   <title><structname>pg_stat_wal</structname> View</title>
+   <tgroup cols="1">
+    <thead>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       Column Type
+      </para>
+      <para>
+       Description
+      </para></entry>
+     </row>
+    </thead>
+
+    <tbody>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>wal_buffers_full</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Number of times WAL data was written to the disk because WAL buffers got full
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>stats_reset</structfield> <type>timestamp with time zone</type>
+      </para>
+      <para>
+       Time at which these statistics were last reset
+      </para></entry>
+     </row>
+     </tbody>
+   </tgroup>
+  </table>
+
+</sect2>
+
  <sect2 id="monitoring-pg-stat-database-view">
   <title><structname>pg_stat_database</structname></title>
 
@@ -4668,8 +4726,9 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
         argument.  The argument can be <literal>bgwriter</literal> to reset
         all the counters shown in
         the <structname>pg_stat_bgwriter</structname>
-        view, or <literal>archiver</literal> to reset all the counters shown in
-        the <structname>pg_stat_archiver</structname> view.
+        view, <literal>archiver</literal> to reset all the counters shown in
+        the <structname>pg_stat_archiver</structname> view or <literal>wal</literal>
+        to reset all the counters shown in the <structname>pg_stat_wal</structname> view.
        </para>
        <para>
         This function is restricted to superusers by default, but other users
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 79a77ebbfe..64403690da 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2195,6 +2195,7 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, bool opportunistic)
 					WriteRqst.Flush = 0;
 					XLogWrite(WriteRqst, false);
 					LWLockRelease(WALWriteLock);
+					WalStats.m_wal_buffers_full++;
 					TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY_DONE();
 				}
 				/* Re-acquire WALBufMappingLock and retry */
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ed4f3f142d..923c2e2be1 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -979,6 +979,12 @@ CREATE VIEW pg_stat_bgwriter AS
         pg_stat_get_buf_alloc() AS buffers_alloc,
         pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
 
+CREATE VIEW pg_stat_wal AS
+    SELECT
+        w.wal_buffers_full,
+        w.stats_reset
+    FROM pg_stat_get_wal() w;
+
 CREATE VIEW pg_stat_progress_analyze AS
     SELECT
         S.pid AS pid, S.datid AS datid, D.datname AS datname,
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 3e7dcd4f76..429c8010ef 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -504,6 +504,9 @@ CheckpointerMain(void)
 		 */
 		pgstat_send_bgwriter();
 
+		/* Send WAL statistics to the stats collector. */
+		pgstat_send_wal();
+
 		/*
 		 * If any checkpoint flags have been set, redo the loop to handle the
 		 * checkpoint without sleeping.
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index e6be2b7836..9d8a435304 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -135,11 +135,12 @@ char	   *pgstat_stat_filename = NULL;
 char	   *pgstat_stat_tmpname = NULL;
 
 /*
- * BgWriter global statistics counters (unused in other processes).
- * Stored directly in a stats message structure so it can be sent
- * without needing to copy things around.  We assume this inits to zeroes.
+ * BgWriter and WAL global statistics counters.
+ * Stored directly in a stats message structure so they can be sent
+ * without needing to copy things around.  We assume these init to zeroes.
  */
 PgStat_MsgBgWriter BgWriterStats;
+PgStat_MsgWal WalStats;
 
 /*
  * List of SLRU names that we keep stats for.  There is no central registry of
@@ -281,6 +282,7 @@ static int	localNumBackends = 0;
  */
 static PgStat_ArchiverStats archiverStats;
 static PgStat_GlobalStats globalStats;
+static PgStat_WalStats walStats;
 static PgStat_SLRUStats slruStats[SLRU_NUM_ELEMENTS];
 
 /*
@@ -353,6 +355,7 @@ static void pgstat_recv_vacuum(PgStat_MsgVacuum *msg, int len);
 static void pgstat_recv_analyze(PgStat_MsgAnalyze *msg, int len);
 static void pgstat_recv_archiver(PgStat_MsgArchiver *msg, int len);
 static void pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len);
+static void pgstat_recv_wal(PgStat_MsgWal *msg, int len);
 static void pgstat_recv_slru(PgStat_MsgSLRU *msg, int len);
 static void pgstat_recv_funcstat(PgStat_MsgFuncstat *msg, int len);
 static void pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len);
@@ -938,6 +941,9 @@ pgstat_report_stat(bool force)
 	/* Now, send function statistics */
 	pgstat_send_funcstats();
 
+	/* Send WAL statistics */
+	pgstat_send_wal();
+
 	/* Finally send SLRU statistics */
 	pgstat_send_slru();
 }
@@ -1370,11 +1376,13 @@ pgstat_reset_shared_counters(const char *target)
 		msg.m_resettarget = RESET_ARCHIVER;
 	else if (strcmp(target, "bgwriter") == 0)
 		msg.m_resettarget = RESET_BGWRITER;
+	else if (strcmp(target, "wal") == 0)
+		msg.m_resettarget = RESET_WAL;
 	else
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("unrecognized reset target: \"%s\"", target),
-				 errhint("Target must be \"archiver\" or \"bgwriter\".")));
+				 errhint("Target must be \"archiver\", \"bgwriter\" or \"wal\".")));
 
 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_RESETSHAREDCOUNTER);
 	pgstat_send(&msg, sizeof(msg));
@@ -2674,6 +2682,21 @@ pgstat_fetch_global(void)
 	return &globalStats;
 }
 
+/*
+ * ---------
+ * pgstat_fetch_stat_wal() -
+ *
+ *	Support function for the SQL-callable pgstat* functions. Returns
+ *	a pointer to the WAL statistics struct.
+ * ---------
+ */
+PgStat_WalStats *
+pgstat_fetch_stat_wal(void)
+{
+	backend_read_statsfile();
+
+	return &walStats;
+}
 
 /*
  * ---------
@@ -4419,6 +4442,38 @@ pgstat_send_bgwriter(void)
 	MemSet(&BgWriterStats, 0, sizeof(BgWriterStats));
 }
 
+/* ----------
+ * pgstat_send_wal() -
+ *
+ *		Send WAL statistics to the collector
+ * ----------
+ */
+void
+pgstat_send_wal(void)
+{
+	/* We assume this initializes to zeroes */
+	static const PgStat_MsgWal all_zeroes;
+
+	/*
+	 * This function can be called even if nothing at all has happened. In
+	 * this case, avoid sending a completely empty message to the stats
+	 * collector.
+	 */
+	if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0)
+		return;
+
+	/*
+	 * Prepare and send the message
+	 */
+	pgstat_setheader(&WalStats.m_hdr, PGSTAT_MTYPE_WAL);
+	pgstat_send(&WalStats, sizeof(WalStats));
+
+	/*
+	 * Clear out the statistics buffer, so it can be re-used.
+	 */
+	MemSet(&WalStats, 0, sizeof(WalStats));
+}
+
 /* ----------
  * pgstat_send_slru() -
  *
@@ -4658,6 +4713,10 @@ PgstatCollectorMain(int argc, char *argv[])
 					pgstat_recv_bgwriter(&msg.msg_bgwriter, len);
 					break;
 
+				case PGSTAT_MTYPE_WAL:
+					pgstat_recv_wal(&msg.msg_wal, len);
+					break;
+
 				case PGSTAT_MTYPE_SLRU:
 					pgstat_recv_slru(&msg.msg_slru, len);
 					break;
@@ -4927,6 +4986,12 @@ pgstat_write_statsfiles(bool permanent, bool allDbs)
 	rc = fwrite(&archiverStats, sizeof(archiverStats), 1, fpout);
 	(void) rc;					/* we'll check for error with ferror */
 
+	/*
+	 * Write WAL stats struct
+	 */
+	rc = fwrite(&walStats, sizeof(walStats), 1, fpout);
+	(void) rc;					/* we'll check for error with ferror */
+
 	/*
 	 * Write SLRU stats struct
 	 */
@@ -5186,11 +5251,12 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 						 HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
 
 	/*
-	 * Clear out global and archiver statistics so they start from zero in
+	 * Clear out global, archiver, WAL and SLRU statistics so they start from zero in
 	 * case we can't load an existing statsfile.
 	 */
 	memset(&globalStats, 0, sizeof(globalStats));
 	memset(&archiverStats, 0, sizeof(archiverStats));
+	memset(&walStats, 0, sizeof(walStats));
 	memset(&slruStats, 0, sizeof(slruStats));
 
 	/*
@@ -5199,6 +5265,7 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 	 */
 	globalStats.stat_reset_timestamp = GetCurrentTimestamp();
 	archiverStats.stat_reset_timestamp = globalStats.stat_reset_timestamp;
+	walStats.stat_reset_timestamp = globalStats.stat_reset_timestamp;
 
 	/*
 	 * Set the same reset timestamp for all SLRU items too.
@@ -5268,6 +5335,17 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
 		goto done;
 	}
 
+	/*
+	 * Read WAL stats struct
+	 */
+	if (fread(&walStats, 1, sizeof(walStats), fpin) != sizeof(walStats))
+	{
+		ereport(pgStatRunningInCollector ? LOG : WARNING,
+				(errmsg("corrupted statistics file \"%s\"", statfile)));
+		memset(&walStats, 0, sizeof(walStats));
+		goto done;
+	}
+
 	/*
 	 * Read SLRU stats struct
 	 */
@@ -5578,6 +5656,7 @@ pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent,
 	PgStat_StatDBEntry dbentry;
 	PgStat_GlobalStats myGlobalStats;
 	PgStat_ArchiverStats myArchiverStats;
+	PgStat_WalStats myWalStats;
 	PgStat_SLRUStats mySLRUStats[SLRU_NUM_ELEMENTS];
 	FILE	   *fpin;
 	int32		format_id;
@@ -5633,6 +5712,17 @@ pgstat_read_db_statsfile_timestamp(Oid databaseid, bool permanent,
 		return false;
 	}
 
+	/*
+	 * Read WAL stats struct
+	 */
+	if (fread(&myWalStats, 1, sizeof(myWalStats), fpin) != sizeof(myWalStats))
+	{
+		ereport(pgStatRunningInCollector ? LOG : WARNING,
+				(errmsg("corrupted statistics file \"%s\"", statfile)));
+		FreeFile(fpin);
+		return false;
+	}
+
 	/*
 	 * Read SLRU stats struct
 	 */
@@ -6213,6 +6303,12 @@ pgstat_recv_resetsharedcounter(PgStat_MsgResetsharedcounter *msg, int len)
 		memset(&archiverStats, 0, sizeof(archiverStats));
 		archiverStats.stat_reset_timestamp = GetCurrentTimestamp();
 	}
+	else if (msg->m_resettarget == RESET_WAL)
+	{
+		/* Reset the WAL statistics for the cluster. */
+		memset(&walStats, 0, sizeof(walStats));
+		walStats.stat_reset_timestamp = GetCurrentTimestamp();
+	}
 
 	/*
 	 * Presumably the sender of this message validated the target, don't
@@ -6427,6 +6523,18 @@ pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len)
 	globalStats.buf_alloc += msg->m_buf_alloc;
 }
 
+/* ----------
+ * pgstat_recv_wal() -
+ *
+ *	Process a WAL message.
+ * ----------
+ */
+static void
+pgstat_recv_wal(PgStat_MsgWal *msg, int len)
+{
+	walStats.wal_buffers_full += msg->m_wal_buffers_full;
+}
+
 /* ----------
  * pgstat_recv_slru() -
  *
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 95738a4e34..24e191ea30 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1697,6 +1697,42 @@ pg_stat_get_buf_alloc(PG_FUNCTION_ARGS)
 	PG_RETURN_INT64(pgstat_fetch_global()->buf_alloc);
 }
 
+/*
+ * Returns statistics of WAL activity
+ */
+Datum
+pg_stat_get_wal(PG_FUNCTION_ARGS)
+{
+#define PG_STAT_GET_WAL_COLS	2
+	TupleDesc	tupdesc;
+	Datum		values[PG_STAT_GET_WAL_COLS];
+	bool		nulls[PG_STAT_GET_WAL_COLS];
+	PgStat_WalStats *wal_stats;
+
+	/* Initialise values and NULL flags arrays */
+	MemSet(values, 0, sizeof(values));
+	MemSet(nulls, 0, sizeof(nulls));
+
+	/* Initialise attributes information in the tuple descriptor */
+	tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_WAL_COLS);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "wal_buffers_full",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "stats_reset",
+					   TIMESTAMPTZOID, -1, 0);
+
+	BlessTupleDesc(tupdesc);
+
+	/* Get statistics about WAL activity */
+	wal_stats = pgstat_fetch_stat_wal();
+
+	/* Fill values and NULLs */
+	values[0] = Int64GetDatum(wal_stats->wal_buffers_full);
+	values[1] = TimestampTzGetDatum(wal_stats->stat_reset_timestamp);
+
+	/* Returns the record as Datum */
+	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
+}
+
 /*
  * Returns statistics of SLRU caches.
  */
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index f48f5fb4d9..d6f3e2d286 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5481,6 +5481,14 @@
   proname => 'pg_stat_get_buf_alloc', provolatile => 's', proparallel => 'r',
   prorettype => 'int8', proargtypes => '', prosrc => 'pg_stat_get_buf_alloc' },
 
+{ oid => '1136', descr => 'statistics: information about WAL activity',
+  proname => 'pg_stat_get_wal', proisstrict => 'f', provolatile => 's',
+  proparallel => 'r', prorettype => 'record', proargtypes => '',
+  proallargtypes => '{int8,timestamptz}',
+  proargmodes => '{o,o}',
+  proargnames => '{wal_buffers_full,stats_reset}',
+  prosrc => 'pg_stat_get_wal' },
+
 { oid => '2306', descr => 'statistics: information about SLRU caches',
   proname => 'pg_stat_get_slru', prorows => '100', proisstrict => 'f',
   proretset => 't', provolatile => 's', proparallel => 'r',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 0dfbac46b4..343eef507e 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -61,6 +61,7 @@ typedef enum StatMsgType
 	PGSTAT_MTYPE_ANALYZE,
 	PGSTAT_MTYPE_ARCHIVER,
 	PGSTAT_MTYPE_BGWRITER,
+	PGSTAT_MTYPE_WAL,
 	PGSTAT_MTYPE_SLRU,
 	PGSTAT_MTYPE_FUNCSTAT,
 	PGSTAT_MTYPE_FUNCPURGE,
@@ -122,7 +123,8 @@ typedef struct PgStat_TableCounts
 typedef enum PgStat_Shared_Reset_Target
 {
 	RESET_ARCHIVER,
-	RESET_BGWRITER
+	RESET_BGWRITER,
+	RESET_WAL
 } PgStat_Shared_Reset_Target;
 
 /* Possible object types for resetting single counters */
@@ -436,6 +438,16 @@ typedef struct PgStat_MsgBgWriter
 	PgStat_Counter m_checkpoint_sync_time;
 } PgStat_MsgBgWriter;
 
+/* ----------
+ * PgStat_MsgWal			Sent by backends and background processes to update WAL statistics.
+ * ----------
+ */
+typedef struct PgStat_MsgWal
+{
+	PgStat_MsgHdr m_hdr;
+	PgStat_Counter m_wal_buffers_full;
+} PgStat_MsgWal;
+
 /* ----------
  * PgStat_MsgSLRU			Sent by a backend to update SLRU statistics.
  * ----------
@@ -596,6 +608,7 @@ typedef union PgStat_Msg
 	PgStat_MsgAnalyze msg_analyze;
 	PgStat_MsgArchiver msg_archiver;
 	PgStat_MsgBgWriter msg_bgwriter;
+	PgStat_MsgWal msg_wal;
 	PgStat_MsgSLRU msg_slru;
 	PgStat_MsgFuncstat msg_funcstat;
 	PgStat_MsgFuncpurge msg_funcpurge;
@@ -614,7 +627,7 @@ typedef union PgStat_Msg
  * ------------------------------------------------------------
  */
 
-#define PGSTAT_FILE_FORMAT_ID	0x01A5BC9D
+#define PGSTAT_FILE_FORMAT_ID	0x01A5BC9E
 
 /* ----------
  * PgStat_StatDBEntry			The collector's data per database
@@ -745,6 +758,15 @@ typedef struct PgStat_GlobalStats
 	TimestampTz stat_reset_timestamp;
 } PgStat_GlobalStats;
 
+/*
+ * WAL statistics kept in the stats collector
+ */
+typedef struct PgStat_WalStats
+{
+	PgStat_Counter wal_buffers_full;
+	TimestampTz stat_reset_timestamp;
+} PgStat_WalStats;
+
 /*
  * SLRU statistics kept in the stats collector
  */
@@ -1265,6 +1287,11 @@ extern char *pgstat_stat_filename;
  */
 extern PgStat_MsgBgWriter BgWriterStats;
 
+/*
+ * WAL statistics counter is updated by backends and background processes
+ */
+extern PgStat_MsgWal WalStats;
+
 /*
  * Updated by pgstat_count_buffer_*_time macros
  */
@@ -1464,6 +1491,7 @@ extern void pgstat_twophase_postabort(TransactionId xid, uint16 info,
 
 extern void pgstat_send_archiver(const char *xlog, bool failed);
 extern void pgstat_send_bgwriter(void);
+extern void pgstat_send_wal(void);
 
 /* ----------
  * Support functions for the SQL-callable functions to
@@ -1478,6 +1506,7 @@ extern PgStat_StatFuncEntry *pgstat_fetch_stat_funcentry(Oid funcid);
 extern int	pgstat_fetch_stat_numbackends(void);
 extern PgStat_ArchiverStats *pgstat_fetch_stat_archiver(void);
 extern PgStat_GlobalStats *pgstat_fetch_global(void);
+extern PgStat_WalStats *pgstat_fetch_stat_wal(void);
 extern PgStat_SLRUStats *pgstat_fetch_slru(void);
 
 extern void pgstat_count_slru_page_zeroed(int slru_idx);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 2a18dc423e..af4192f9a8 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2129,6 +2129,9 @@ pg_stat_user_tables| SELECT pg_stat_all_tables.relid,
     pg_stat_all_tables.autoanalyze_count
    FROM pg_stat_all_tables
   WHERE ((pg_stat_all_tables.schemaname <> ALL (ARRAY['pg_catalog'::name, 'information_schema'::name])) AND (pg_stat_all_tables.schemaname !~ '^pg_toast'::text));
+pg_stat_wal| SELECT w.wal_buffers_full,
+    w.stats_reset
+   FROM pg_stat_get_wal() w(wal_buffers_full, stats_reset);
 pg_stat_wal_receiver| SELECT s.pid,
     s.status,
     s.receive_start_lsn,
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 1cffc3349d..81bdacf59d 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -76,6 +76,13 @@ select count(*) >= 0 as ok from pg_prepared_xacts;
  t
 (1 row)
 
+-- There must be only one record
+select count(*) = 1 as ok from pg_stat_wal;
+ ok 
+----
+ t
+(1 row)
+
 -- This is to record the prevailing planner enable_foo settings during
 -- a regression test run.
 select name, setting from pg_settings where name like 'enable%';
diff --git a/src/test/regress/sql/sysviews.sql b/src/test/regress/sql/sysviews.sql
index ac4a0e1cbb..b9b875bc6a 100644
--- a/src/test/regress/sql/sysviews.sql
+++ b/src/test/regress/sql/sysviews.sql
@@ -37,6 +37,9 @@ select count(*) = 0 as ok from pg_prepared_statements;
 -- See also prepared_xacts.sql
 select count(*) >= 0 as ok from pg_prepared_xacts;
 
+-- There must be only one record
+select count(*) = 1 as ok from pg_stat_wal;
+
 -- This is to record the prevailing planner enable_foo settings during
 -- a regression test run.
 select name, setting from pg_settings where name like 'enable%';

#49

Fujii Masao

masao.fujii@oss.nttdata.com

over 5 years ago

In reply to: Masahiro Ikeda (#48)

Re: New statistics for tuning WAL buffer size

On 2020/10/01 12:56, Masahiro Ikeda wrote:

On 2020-10-01 11:33, Amit Kapila wrote:

On Thu, Oct 1, 2020 at 6:53 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Thu, 1 Oct 2020 09:05:19 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in

On 2020/09/30 20:21, Amit Kapila wrote:

On Tue, Sep 29, 2020 at 9:23 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:

On 2020/09/29 11:51, Masahiro Ikeda wrote:

On 2020-09-29 11:43, Amit Kapila wrote:

On Tue, Sep 29, 2020 at 7:39 AM Masahiro Ikeda
<ikedamsh@oss.nttdata.com> wrote:

Thanks for your suggestion.
I understood that the point is that WAL-related stats have just one
counter now.

Since we may add some WAL-related stats like pgWalUsage.(bytes,
records, fpi),
I think that the current approach is good.

+1

Okay, it makes sense to keep it in the current form if we have a plan
to extend this view with additional stats. However, why don't we
expose it with a function similar to pg_stat_get_archiver() instead of
providing individual functions like pg_stat_get_wal_buffers_full() and
pg_stat_get_wal_stat_reset_time?

We can adopt either of those approaches for pg_stat_wal. I think that
the former is a bit more flexible because we can collect only one of
WAL information even when pg_stat_wal will contain many information
in the future, by using the function. But you thought there are some
reasons that the latter is better for pg_stat_wal?

FWIW I prefer to expose it by one SRF function rather than by
subdivided functions.ï¿½ One of the reasons is the less oid consumption
and/or reduction of definitions for intrinsic functions.

Another reason is at least for me subdivided functions are not useful
so much for on-the-fly examination on psql console.ï¿½ I'm often annoyed
by realizing I can't recall the exact name of a function, say,
pg_last_wal_receive_lsn or such but function names cannot be
auto-completed on psql console. "select proname from pg_proc where
proname like.. " is one of my friends:p On the other hand "select *
from pg_stat_wal" requires no detailed memory.

However subdivided functions might be useful if I wanted use just one
number of wal-stats in a function, I think it is not a major usage and
we can use a SQL query on the view instead.

Another reason that I mildly want to object to subdivided functions is
I was annoyed that a stats view makes many individual calls to
functions that internally share the same statistics entry.ï¿½ That
behavior required me to provide an entry-caching feature to my
shared-memory statistics patch.

All these are good reasons to expose it via one function and I think

Understood. +1 to expose it as one function.

that is why most of our existing views also use one function approach.

Thanks for your comments.
I didn't notice there are the above disadvantages to provide individual functions.

I changed the latest patch to expose it via one function.

Thanks for updating the patch! LGTM.
Barring any other objection, I will commit it.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#50

Fujii Masao

masao.fujii@oss.nttdata.com

over 5 years ago

In reply to: tsunakawa.takay@fujitsu.com (#46)

Re: New statistics for tuning WAL buffer size

On 2020/10/01 10:50, tsunakawa.takay@fujitsu.com wrote:

From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>

Another reason that I mildly want to object to subdivided functions is
I was annoyed that a stats view makes many individual calls to
functions that internally share the same statistics entry. That
behavior required me to provide an entry-caching feature to my
shared-memory statistics patch.

+1
The views for troubleshooting performance problems should be as light as possible. IIRC, we saw frequently searching pg_stat_replication consume unexpectedly high CPU power, because it calls pg_stat_get_activity(null) to get all sessions and join them with the walsenders. At that time, we had hundreds of client sessions. We expected pg_stat_replication to be very lightweight because it provides information about a few walsenders.

I think that we can improve that, for example, by storing backend id
into WalSndCtl and making pg_stat_get_wal_senders() directly
get the walsender's LocalPgBackendStatus with the backend id,
rather than joining pg_stat_get_activity() and pg_stat_get_wal_senders().

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#51

tsunakawa.takay@fujitsu.com

over 5 years ago

In reply to: Fujii Masao (#50)

RE: New statistics for tuning WAL buffer size

From: Fujii Masao <masao.fujii@oss.nttdata.com>

I think that we can improve that, for example, by storing backend id
into WalSndCtl and making pg_stat_get_wal_senders() directly
get the walsender's LocalPgBackendStatus with the backend id,
rather than joining pg_stat_get_activity() and pg_stat_get_wal_senders().

Yeah, I had something like that in mind. I think I'll take note of this as my private homework. (Of course, anyone can do it.)

Regards
Takayuki Tsunakawa

#52

Fujii Masao

masao.fujii@oss.nttdata.com

over 5 years ago

In reply to: Fujii Masao (#49)

Re: New statistics for tuning WAL buffer size

On 2020/10/01 13:35, Fujii Masao wrote:

On 2020/10/01 12:56, Masahiro Ikeda wrote:

On 2020-10-01 11:33, Amit Kapila wrote:

On Thu, Oct 1, 2020 at 6:53 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Thu, 1 Oct 2020 09:05:19 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in

On 2020/09/30 20:21, Amit Kapila wrote:

On Tue, Sep 29, 2020 at 9:23 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:

On 2020/09/29 11:51, Masahiro Ikeda wrote:

On 2020-09-29 11:43, Amit Kapila wrote:

On Tue, Sep 29, 2020 at 7:39 AM Masahiro Ikeda
<ikedamsh@oss.nttdata.com> wrote:

Thanks for your suggestion.
I understood that the point is that WAL-related stats have just one
counter now.

Since we may add some WAL-related stats like pgWalUsage.(bytes,
records, fpi),
I think that the current approach is good.

+1

Okay, it makes sense to keep it in the current form if we have a plan
to extend this view with additional stats. However, why don't we
expose it with a function similar to pg_stat_get_archiver() instead of
providing individual functions like pg_stat_get_wal_buffers_full() and
pg_stat_get_wal_stat_reset_time?

We can adopt either of those approaches for pg_stat_wal. I think that
the former is a bit more flexible because we can collect only one of
WAL information even when pg_stat_wal will contain many information
in the future, by using the function. But you thought there are some
reasons that the latter is better for pg_stat_wal?

FWIW I prefer to expose it by one SRF function rather than by
subdivided functions.ï¿½ One of the reasons is the less oid consumption
and/or reduction of definitions for intrinsic functions.

Another reason is at least for me subdivided functions are not useful
so much for on-the-fly examination on psql console.ï¿½ I'm often annoyed
by realizing I can't recall the exact name of a function, say,
pg_last_wal_receive_lsn or such but function names cannot be
auto-completed on psql console. "select proname from pg_proc where
proname like.. " is one of my friends:p On the other hand "select *
from pg_stat_wal" requires no detailed memory.

However subdivided functions might be useful if I wanted use just one
number of wal-stats in a function, I think it is not a major usage and
we can use a SQL query on the view instead.

Another reason that I mildly want to object to subdivided functions is
I was annoyed that a stats view makes many individual calls to
functions that internally share the same statistics entry.ï¿½ That
behavior required me to provide an entry-caching feature to my
shared-memory statistics patch.

All these are good reasons to expose it via one function and I think

Understood. +1 to expose it as one function.

that is why most of our existing views also use one function approach.

Thanks for your comments.
I didn't notice there are the above disadvantages to provide individual functions.

I changed the latest patch to expose it via one function.

Thanks for updating the patch! LGTM.
Barring any other objection, I will commit it.

I updated typedefs.list and pushed the patch. Thanks!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#53

Masahiro Ikeda

ikedamsh@oss.nttdata.com

over 5 years ago

In reply to: Fujii Masao (#52)

Re: New statistics for tuning WAL buffer size

On 2020-10-02 10:21, Fujii Masao wrote:

On 2020/10/01 13:35, Fujii Masao wrote:

On 2020/10/01 12:56, Masahiro Ikeda wrote:

On 2020-10-01 11:33, Amit Kapila wrote:

On Thu, Oct 1, 2020 at 6:53 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Thu, 1 Oct 2020 09:05:19 +0900, Fujii Masao
<masao.fujii@oss.nttdata.com> wrote in

On 2020/09/30 20:21, Amit Kapila wrote:

On Tue, Sep 29, 2020 at 9:23 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:

On 2020/09/29 11:51, Masahiro Ikeda wrote:

On 2020-09-29 11:43, Amit Kapila wrote:

On Tue, Sep 29, 2020 at 7:39 AM Masahiro Ikeda
<ikedamsh@oss.nttdata.com> wrote:

Thanks for your suggestion.
I understood that the point is that WAL-related stats have just one
counter now.

Since we may add some WAL-related stats like pgWalUsage.(bytes,
records, fpi),
I think that the current approach is good.

+1

Okay, it makes sense to keep it in the current form if we have a plan
to extend this view with additional stats. However, why don't we
expose it with a function similar to pg_stat_get_archiver() instead of
providing individual functions like pg_stat_get_wal_buffers_full() and
pg_stat_get_wal_stat_reset_time?

We can adopt either of those approaches for pg_stat_wal. I think that
the former is a bit more flexible because we can collect only one of
WAL information even when pg_stat_wal will contain many information
in the future, by using the function. But you thought there are some
reasons that the latter is better for pg_stat_wal?

FWIW I prefer to expose it by one SRF function rather than by
subdivided functions. One of the reasons is the less oid
consumption
and/or reduction of definitions for intrinsic functions.

Another reason is at least for me subdivided functions are not
useful
so much for on-the-fly examination on psql console. I'm often
annoyed
by realizing I can't recall the exact name of a function, say,
pg_last_wal_receive_lsn or such but function names cannot be
auto-completed on psql console. "select proname from pg_proc where
proname like.. " is one of my friends:p On the other hand "select *
from pg_stat_wal" requires no detailed memory.

However subdivided functions might be useful if I wanted use just
one
number of wal-stats in a function, I think it is not a major usage
and
we can use a SQL query on the view instead.

Another reason that I mildly want to object to subdivided functions
is
I was annoyed that a stats view makes many individual calls to
functions that internally share the same statistics entry. That
behavior required me to provide an entry-caching feature to my
shared-memory statistics patch.

All these are good reasons to expose it via one function and I think

Understood. +1 to expose it as one function.

that is why most of our existing views also use one function
approach.

Thanks for your comments.
I didn't notice there are the above disadvantages to provide
individual functions.

I changed the latest patch to expose it via one function.

Thanks for updating the patch! LGTM.
Barring any other objection, I will commit it.

I updated typedefs.list and pushed the patch. Thanks!

Thanks to all reviewers!

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

#54

Masahiro Ikeda

ikedamsh@oss.nttdata.com

over 5 years ago

In reply to: Masahiro Ikeda (#53)

Re: New statistics for tuning WAL buffer size

Hi,

I think it's better to add other WAL statistics to the pg_stat_wal view.
I'm thinking to add the following statistics. Please let me know your
thoughts.

1. Basic wal statistics

* wal_records: Total number of WAL records generated
* wal_fpi: Total number of WAL full page images generated
* wal_bytes: Total amount of WAL bytes generated

To understand DB's performance, first, we will check the performance
trends for the entire database instance.
For example, if the number of wal_fpi becomes higher, users may tune
"wal_compression", "checkpoint_timeout" and so on.

Although users can check the above statistics via EXPLAIN, auto_explain,
autovacuum
and pg_stat_statements now, if users want to see the performance trends
for the entire database,
they must preprocess the statistics.

Is it useful to add the sum of the above statistics to the pg_stat_wal
view?

2. Number of when new WAL file is created and zero-filled.

As Fujii-san already commented, I think it's good for tuning.

Just idea; it may be worth exposing the number of when new WAL file is
created and zero-filled. This initialization may have impact on the
performance of write-heavy workload generating lots of WAL. If this
number is reported high, to reduce the number of this initialization,
we can tune WAL-related parameters so that more "recycled" WAL files
can be hold.

3. Number of when to switch the WAL logfile segment.

This is similar to 2, but this counts the number of when WAL file is
recylcled too.
I think it's useful for tuning "wal_segment_size"
if the number is high　relative to the startup time, "wal_segment_size"
must be bigger.

4. Number of when WAL is flushed

I think it's useful for tuning "synchronous_commit" and "commit_delay"
for query executions.
If the number of WAL is flushed is high, users can know
"synchronous_commit" is useful for the workload.

Also, it's useful for tuning "wal_writer_delay" and
"wal_writer_flush_after" for wal writer.
If the number is high, users can change the parameter for performance.

I think it's better to separate this for backends and wal writer.

5. Wait time when WAL is flushed.

This is the accumulated time when wal is flushed.
If the time becomes much higher,　users can detect the possibility of
disk failure.

Since users can see how much flash time occupies of the query execution
time,
it may lead to query tuning and so on.

Since there is the above reason, I think it's better to separate this
for backends and wal writer.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

#55

Masahiro Ikeda

ikedamsh@oss.nttdata.com

about 5 years ago

In reply to: Masahiro Ikeda (#54)

1 attachment(s)

Re: New statistics for tuning WAL buffer size

On 2020-10-06 15:57, Masahiro Ikeda wrote:

Hi,

I think it's better to add other WAL statistics to the pg_stat_wal
view.
I'm thinking to add the following statistics. Please let me know your
thoughts.

1. Basic wal statistics

* wal_records: Total number of WAL records generated
* wal_fpi: Total number of WAL full page images generated
* wal_bytes: Total amount of WAL bytes generated

To understand DB's performance, first, we will check the performance
trends for the entire database instance.
For example, if the number of wal_fpi becomes higher, users may tune
"wal_compression", "checkpoint_timeout" and so on.

Although users can check the above statistics via EXPLAIN,
auto_explain, autovacuum
and pg_stat_statements now, if users want to see the performance
trends for the entire database,
they must preprocess the statistics.

Is it useful to add the sum of the above statistics to the pg_stat_wal
view?

2. Number of when new WAL file is created and zero-filled.

As Fujii-san already commented, I think it's good for tuning.

Just idea; it may be worth exposing the number of when new WAL file is
created and zero-filled. This initialization may have impact on the
performance of write-heavy workload generating lots of WAL. If this
number is reported high, to reduce the number of this initialization,
we can tune WAL-related parameters so that more "recycled" WAL files
can be hold.

3. Number of when to switch the WAL logfile segment.

This is similar to 2, but this counts the number of when WAL file is
recylcled too.
I think it's useful for tuning "wal_segment_size"
if the number is high　relative to the startup time, "wal_segment_size"
must be bigger.

4. Number of when WAL is flushed

I think it's useful for tuning "synchronous_commit" and "commit_delay"
for query executions.
If the number of WAL is flushed is high, users can know
"synchronous_commit" is useful for the workload.

Also, it's useful for tuning "wal_writer_delay" and
"wal_writer_flush_after" for wal writer.
If the number is high, users can change the parameter for performance.

I think it's better to separate this for backends and wal writer.

5. Wait time when WAL is flushed.

This is the accumulated time when wal is flushed.
If the time becomes much higher,　users can detect the possibility of
disk failure.

Since users can see how much flash time occupies of the query execution
time,
it may lead to query tuning and so on.

Since there is the above reason, I think it's better to separate this
for backends and wal writer.

I made a patch for collecting the above statistics.
If you have any comments, please let me know.

I think it's better to separate some statistics for backend and
backgrounds because
tuning target parameters like "synchronous_commit", "wal_writer_delay"
and so on are different.
But first, I want to get a consensus to collect them.

Best regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

Attachments:

0001_add_statistics_to_pg_stat_wal_view.patchtext/x-diff; name=0001_add_statistics_to_pg_stat_wal_view.patchDownload

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 6656676..31a37de 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3406,12 +3406,95 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
     </thead>
 
     <tbody>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>wal_records</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Total number of WAL records generated
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>wal_fpi</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Total number of WAL full page images generated
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>wal_bytes</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Total amount of WAL bytes generated
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>wal_buffers_full</structfield> <type>bigint</type>
       </para>
       <para>
-       Number of times WAL data was written to the disk because WAL buffers got full
+       Total number of WAL data written to the disk because WAL buffers got full
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>wal_init</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Total number of WAL file segment created or opened a pre-existing one
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>wal_init_zero</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Total number of WAL file segment created to be filled with zeroes
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>wal_write</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Total number of WAL data written to the disk
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>wal_write_time</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Total amount of time that has been spent in the portion of
+       WAL data was written to disk, in milliseconds
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>wal_sync</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Total number of WAL data synced to the disk
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>wal_sync_time</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Total amount of time that has been spent in the portion of
+       WAL data was synced to disk, in milliseconds
       </para></entry>
      </row>
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 52a67b1..840e684 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -1262,6 +1262,10 @@ XLogInsertRecord(XLogRecData *rdata,
 		pgWalUsage.wal_bytes += rechdr->xl_tot_len;
 		pgWalUsage.wal_records++;
 		pgWalUsage.wal_fpi += num_fpi;
+
+		WalStats.m_wal_bytes += rechdr->xl_tot_len;
+		WalStats.m_wal_records++;
+		WalStats.m_wal_fpi += num_fpi;
 	}
 
 	return EndPos;
@@ -2422,6 +2426,10 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
 	int			npages;
 	int			startidx;
 	uint32		startoffset;
+	TimestampTz start_time;
+	TimestampTz end_time;
+	long		secs;
+	int			usecs;
 
 	/* We should always be inside a critical section here */
 	Assert(CritSectionCount > 0);
@@ -2535,9 +2543,11 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
 			do
 			{
 				errno = 0;
+				start_time = GetCurrentTimestamp();
 				pgstat_report_wait_start(WAIT_EVENT_WAL_WRITE);
 				written = pg_pwrite(openLogFile, from, nleft, startoffset);
 				pgstat_report_wait_end();
+				end_time = GetCurrentTimestamp();
 				if (written <= 0)
 				{
 					char		xlogfname[MAXFNAMELEN];
@@ -2559,6 +2569,13 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
 				nleft -= written;
 				from += written;
 				startoffset += written;
+
+				/* Accumulate write time in milliseconds. */
+				TimestampDifference(start_time,
+									end_time,
+									&secs, &usecs);
+				WalStats.m_wal_write_time += ((int) secs * 1000) + (usecs / 1000);
+				WalStats.m_wal_write++;
 			} while (nleft > 0);
 
 			npages = 0;
@@ -2578,7 +2595,16 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
 			 */
 			if (finishing_seg)
 			{
+				start_time = GetCurrentTimestamp();
 				issue_xlog_fsync(openLogFile, openLogSegNo);
+				end_time = GetCurrentTimestamp();
+
+				/* Accumulate write time in milliseconds. */
+				TimestampDifference(start_time,
+									end_time,
+									&secs, &usecs);
+				WalStats.m_wal_sync_time += ((int) secs * 1000) + (usecs / 1000);
+				WalStats.m_wal_sync++;
 
 				/* signal that we need to wakeup walsenders later */
 				WalSndWakeupRequest();
@@ -2649,7 +2675,16 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
 				ReserveExternalFD();
 			}
 
+			start_time = GetCurrentTimestamp();
 			issue_xlog_fsync(openLogFile, openLogSegNo);
+			end_time = GetCurrentTimestamp();
+
+			/* Accumulate write time in milliseconds. */
+			TimestampDifference(start_time,
+								end_time,
+								&secs, &usecs);
+			WalStats.m_wal_sync_time += ((int) secs * 1000) + (usecs / 1000);
+			WalStats.m_wal_sync++;
 		}
 
 		/* signal that we need to wakeup walsenders later */
@@ -3278,7 +3313,10 @@ XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock)
 						 errmsg("could not open file \"%s\": %m", path)));
 		}
 		else
+		{
+			WalStats.m_wal_init++;
 			return fd;
+		}
 	}
 
 	/*
@@ -3325,6 +3363,7 @@ XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock)
 				break;
 			}
 		}
+		WalStats.m_wal_init_zero++;
 	}
 	else
 	{
@@ -3418,6 +3457,7 @@ XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock)
 				 errmsg("could not open file \"%s\": %m", path)));
 
 	elog(DEBUG2, "done creating and filling new WAL file");
+	WalStats.m_wal_init++;
 
 	return fd;
 }
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index c293907..b880533 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -990,7 +990,16 @@ CREATE VIEW pg_stat_bgwriter AS
 
 CREATE VIEW pg_stat_wal AS
     SELECT
+        w.wal_records,
+        w.wal_fpi,
+        w.wal_bytes,
         w.wal_buffers_full,
+        w.wal_init,
+        w.wal_init_zero,
+        w.wal_write,
+        w.wal_write_time,
+        w.wal_sync,
+        w.wal_sync_time,
         w.stats_reset
     FROM pg_stat_get_wal() w;
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 822f0eb..2cf85b6 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -6756,7 +6756,16 @@ pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len)
 static void
 pgstat_recv_wal(PgStat_MsgWal *msg, int len)
 {
+	walStats.wal_records += msg->m_wal_records;
+	walStats.wal_fpi += msg->m_wal_fpi;
+	walStats.wal_bytes += msg->m_wal_bytes;
 	walStats.wal_buffers_full += msg->m_wal_buffers_full;
+	walStats.wal_init += msg->m_wal_init;
+	walStats.wal_init_zero += msg->m_wal_init_zero;
+	walStats.wal_write += msg->m_wal_write;
+	walStats.wal_write_time += msg->m_wal_write_time;
+	walStats.wal_sync += msg->m_wal_sync;
+	walStats.wal_sync_time += msg->m_wal_sync_time;
 }
 
 /* ----------
diff --git a/src/backend/postmaster/walwriter.c b/src/backend/postmaster/walwriter.c
index a52832f..ce9f4b7 100644
--- a/src/backend/postmaster/walwriter.c
+++ b/src/backend/postmaster/walwriter.c
@@ -253,6 +253,9 @@ WalWriterMain(void)
 		else if (left_till_hibernate > 0)
 			left_till_hibernate--;
 
+		/* Send WAL statistics */
+		pgstat_send_wal();
+
 		/*
 		 * Sleep until we are signaled or WalWriterDelay has elapsed.  If we
 		 * haven't done anything useful for quite some time, lengthen the
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 0d0d2e6..8ed25d7 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1703,7 +1703,7 @@ pg_stat_get_buf_alloc(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_wal(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_WAL_COLS	2
+#define PG_STAT_GET_WAL_COLS	11
 	TupleDesc	tupdesc;
 	Datum		values[PG_STAT_GET_WAL_COLS];
 	bool		nulls[PG_STAT_GET_WAL_COLS];
@@ -1715,9 +1715,27 @@ pg_stat_get_wal(PG_FUNCTION_ARGS)
 
 	/* Initialise attributes information in the tuple descriptor */
 	tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_WAL_COLS);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "wal_buffers_full",
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "wal_records",
 					   INT8OID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "stats_reset",
+	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "wal_fpi",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 3, "wal_bytes",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 4, "wal_buffers_full",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 5, "wal_init",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 6, "wal_init_zero",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 7, "wal_write",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 8, "wal_write_time",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 9, "wal_sync",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 10, "wal_sync_time",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 11, "stats_reset",
 					   TIMESTAMPTZOID, -1, 0);
 
 	BlessTupleDesc(tupdesc);
@@ -1726,8 +1744,17 @@ pg_stat_get_wal(PG_FUNCTION_ARGS)
 	wal_stats = pgstat_fetch_stat_wal();
 
 	/* Fill values and NULLs */
-	values[0] = Int64GetDatum(wal_stats->wal_buffers_full);
-	values[1] = TimestampTzGetDatum(wal_stats->stat_reset_timestamp);
+	values[0] = Int64GetDatum(wal_stats->wal_records);
+	values[1] = Int64GetDatum(wal_stats->wal_fpi);
+	values[2] = Int64GetDatum(wal_stats->wal_bytes);
+	values[3] = Int64GetDatum(wal_stats->wal_buffers_full);
+	values[4] = Int64GetDatum(wal_stats->wal_init);
+	values[5] = Int64GetDatum(wal_stats->wal_init_zero);
+	values[6] = Int64GetDatum(wal_stats->wal_write);
+	values[7] = Int64GetDatum(wal_stats->wal_write_time);
+	values[8] = Int64GetDatum(wal_stats->wal_sync);
+	values[9] = Int64GetDatum(wal_stats->wal_sync_time);
+	values[10] = TimestampTzGetDatum(wal_stats->stat_reset_timestamp);
 
 	/* Returns the record as Datum */
 	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 22340ba..ee55eb1 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5492,9 +5492,9 @@
 { oid => '1136', descr => 'statistics: information about WAL activity',
   proname => 'pg_stat_get_wal', proisstrict => 'f', provolatile => 's',
   proparallel => 'r', prorettype => 'record', proargtypes => '',
-  proallargtypes => '{int8,timestamptz}',
-  proargmodes => '{o,o}',
-  proargnames => '{wal_buffers_full,stats_reset}',
+  proallargtypes => '{int8,int8,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
+  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{wal_records,wal_fpi,wal_bytes,wal_buffers_full,wal_init,wal_init_zero,wal_write,wal_write_time,wal_sync,wal_sync_time,stats_reset}',
   prosrc => 'pg_stat_get_wal' },
 
 { oid => '2306', descr => 'statistics: information about SLRU caches',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index a821ff4..b35ae99 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -459,7 +459,16 @@ typedef struct PgStat_MsgBgWriter
 typedef struct PgStat_MsgWal
 {
 	PgStat_MsgHdr m_hdr;
+	PgStat_Counter m_wal_records;
+	PgStat_Counter m_wal_fpi;
+	PgStat_Counter m_wal_bytes;
 	PgStat_Counter m_wal_buffers_full;
+	PgStat_Counter m_wal_init;
+	PgStat_Counter m_wal_init_zero;
+	PgStat_Counter m_wal_write;
+	PgStat_Counter m_wal_write_time;	/* accumulate times in milliseconds */
+	PgStat_Counter m_wal_sync;
+	PgStat_Counter m_wal_sync_time; /* accumulate times in milliseconds */
 } PgStat_MsgWal;
 
 /* ----------
@@ -795,7 +804,22 @@ typedef struct PgStat_GlobalStats
  */
 typedef struct PgStat_WalStats
 {
+	PgStat_Counter wal_records;
+	PgStat_Counter wal_fpi;
+	PgStat_Counter wal_bytes;
 	PgStat_Counter wal_buffers_full;
+	PgStat_Counter wal_init;
+	PgStat_Counter wal_init_zero;
+
+	/*
+	 * TODO: Is it better to separate following metrics
+	 * for backends and background processes?
+	 */
+	PgStat_Counter wal_write;
+	PgStat_Counter wal_write_time;
+	PgStat_Counter wal_sync;
+	PgStat_Counter wal_sync_time;
+
 	TimestampTz stat_reset_timestamp;
 } PgStat_WalStats;
 
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index cf2a9b4..b15f04c 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2135,9 +2135,18 @@ pg_stat_user_tables| SELECT pg_stat_all_tables.relid,
     pg_stat_all_tables.autoanalyze_count
    FROM pg_stat_all_tables
   WHERE ((pg_stat_all_tables.schemaname <> ALL (ARRAY['pg_catalog'::name, 'information_schema'::name])) AND (pg_stat_all_tables.schemaname !~ '^pg_toast'::text));
-pg_stat_wal| SELECT w.wal_buffers_full,
+pg_stat_wal| SELECT w.wal_records,
+    w.wal_fpi,
+    w.wal_bytes,
+    w.wal_buffers_full,
+    w.wal_init,
+    w.wal_init_zero,
+    w.wal_write,
+    w.wal_write_time,
+    w.wal_sync,
+    w.wal_sync_time,
     w.stats_reset
-   FROM pg_stat_get_wal() w(wal_buffers_full, stats_reset);
+   FROM pg_stat_get_wal() w(wal_records, wal_fpi, wal_bytes, wal_buffers_full, wal_init, wal_init_zero, wal_write, wal_write_time, wal_sync, wal_sync_time, stats_reset);
 pg_stat_wal_receiver| SELECT s.pid,
     s.status,
     s.receive_start_lsn,

#56

Fujii Masao

masao.fujii@oss.nttdata.com

about 5 years ago

In reply to: Masahiro Ikeda (#55)

Re: New statistics for tuning WAL buffer size

On 2020/10/13 11:57, Masahiro Ikeda wrote:

On 2020-10-06 15:57, Masahiro Ikeda wrote:

Hi,

I think it's better to add other WAL statistics to the pg_stat_wal view.
I'm thinking to add the following statistics. Please let me know your thoughts.

1. Basic wal statistics

* wal_records: Total number of WAL records generated
* wal_fpi: Total number of WAL full page images generated
* wal_bytes: Total amount of WAL bytes generated

To understand DB's performance, first, we will check the performance
trends for the entire database instance.
For example, if the number of wal_fpi becomes higher, users may tune
"wal_compression", "checkpoint_timeout" and so on.

Although users can check the above statistics via EXPLAIN,
auto_explain, autovacuum
and pg_stat_statements now, if users want to see the performance
trends for the entire database,
they must preprocess the statistics.

Is it useful to add the sum of the above statistics to the pg_stat_wal view?

2. Number of when new WAL file is created and zero-filled.

As Fujii-san already commented, I think it's good for tuning.

Just idea; it may be worth exposing the number of when new WAL file is created and zero-filled. This initialization may have impact on the performance of write-heavy workload generating lots of WAL. If this number is reported high, to reduce the number of this initialization, we can tune WAL-related parameters so that more "recycled" WAL files can be hold.

But it might be better to track the number of when new WAL file is
created whether it's zero-filled or not, if file creation and sync itself
takes time.

3. Number of when to switch the WAL logfile segment.

This is similar to 2, but this counts the number of when WAL file is
recylcled too.
I think it's useful for tuning "wal_segment_size"
if the number is high　relative to the startup time, "wal_segment_size"
must be bigger.

You're thinking to count all the WAL file switch? That number is equal
to the number of WAL files generated since the last reset of pg_stat_wal?

4. Number of when WAL is flushed

I think it's useful for tuning "synchronous_commit" and "commit_delay"
for query executions.
If the number of WAL is flushed is high, users can know
"synchronous_commit" is useful for the workload.

Also, it's useful for tuning "wal_writer_delay" and
"wal_writer_flush_after" for wal writer.
If the number is high, users can change the parameter for performance.

I think it's better to separate this for backends and wal writer.

5. Wait time when WAL is flushed.

This is the accumulated time when wal is flushed.
If the time becomes much higher,　users can detect the possibility of
disk failure.

This should be tracked, e.g., only when track_io_timing is enabled?
Otherwise, tracking that may cause performance overhead.

Since users can see how much flash time occupies of the query execution time,
it may lead to query tuning and so on.

Since there is the above reason, I think it's better to separate this
for backends and wal writer.

I'm afraid that this counter for a backend may be a bit confusing. Because
when the counter indicates small time, we may think that walwriter almost
write WAL data and a backend doesn't take time to write WAL. But a backend
may be just waiting for walwriter to write WAL.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#57

Masahiro Ikeda

ikedamsh@oss.nttdata.com

about 5 years ago

In reply to: Fujii Masao (#56)

1 attachment(s)

Re: New statistics for tuning WAL buffer size

On 2020-10-15 19:49, Fujii Masao wrote:

On 2020/10/13 11:57, Masahiro Ikeda wrote:

On 2020-10-06 15:57, Masahiro Ikeda wrote:

2. Number of when new WAL file is created and zero-filled.

As Fujii-san already commented, I think it's good for tuning.

Just idea; it may be worth exposing the number of when new WAL file
is created and zero-filled. This initialization may have impact on
the performance of write-heavy workload generating lots of WAL. If
this number is reported high, to reduce the number of this
initialization, we can tune WAL-related parameters so that more
"recycled" WAL files can be hold.

+1

But it might be better to track the number of when new WAL file is
created whether it's zero-filled or not, if file creation and sync
itself
takes time.

OK, I changed to track the number of when new WAL file is created.

3. Number of when to switch the WAL logfile segment.

This is similar to 2, but this counts the number of when WAL file is
recylcled too.
I think it's useful for tuning "wal_segment_size"
if the number is high　relative to the startup time,
"wal_segment_size"
must be bigger.

You're thinking to count all the WAL file switch? That number is equal
to the number of WAL files generated since the last reset of
pg_stat_wal?

Yes. I think it might be better to count it because I think the ratio in
which a new WAL file is created is important.
To calculate it, we need the count all the WAL file switch.

4. Number of when WAL is flushed

I think it's useful for tuning "synchronous_commit" and
"commit_delay"
for query executions.
If the number of WAL is flushed is high, users can know
"synchronous_commit" is useful for the workload.

Also, it's useful for tuning "wal_writer_delay" and
"wal_writer_flush_after" for wal writer.
If the number is high, users can change the parameter for
performance.

I think it's better to separate this for backends and wal writer.

+1

Thanks, I separated the statistics for backends and wal writer.
When checkpointer process flushes the WAL, the statistics for backends
are counted now.
Although I think its impact is not big, is it better to make statistics
for checkpointer?

5. Wait time when WAL is flushed.

This is the accumulated time when wal is flushed.
If the time becomes much higher,　users can detect the possibility of
disk failure.

This should be tracked, e.g., only when track_io_timing is enabled?
Otherwise, tracking that may cause performance overhead.

OK, I changed the implementation.

Since users can see how much flash time occupies of the query
execution time,
it may lead to query tuning and so on.

Since there is the above reason, I think it's better to separate this
for backends and wal writer.

I'm afraid that this counter for a backend may be a bit confusing.
Because
when the counter indicates small time, we may think that walwriter
almost
write WAL data and a backend doesn't take time to write WAL. But a
backend
may be just waiting for walwriter to write WAL.

Thanks for your comments. I agreed.

Now, the following is the view implemented in the attached patch.
If you have any other comments, please let me know.

```
postgres=# SELECT * FROM pg_stat_wal;
-[ RECORD 1 ]-------+------------------------------
wal_records | 1000128 # Total number of WAL records
generated
wal_fpi | 1 # Total number of WAL full page
images generated
wal_bytes | 124013682 #Total amount of WAL bytes generated
wal_buffers_full | 7952 #Total number of WAL data written to
the disk because WAL buffers got full
wal_file | 14 #Total number of WAL file segment created or
opened a pre-existing one
wal_init_file | 7 #Total number of WAL file segment created
wal_write_backend | 7956 #Total number of WAL data written to the
disk by backends
wal_write_walwriter | 27　　　　　#Total number of WAL data written to the
disk by walwriter
wal_write_time | 40 # Total amount of time that has been spent
in the portion of WAL data was written to disk by backend and walwriter,
in milliseconds
wal_sync_backend | 1 # Total number of WAL data synced to the disk
by backends
wal_sync_walwriter | 6 #Total number of WAL data synced to the disk
by walwriter
wal_sync_time | 0 # Total amount of time that has been spent in
the portion of WAL data was synced to disk by backend and walwriter, in
milliseconds
stats_reset | 2020-10-16 19:41:01.892272+09
```

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

Attachments:

0002_add_statistics_to_pg_stat_wal_view.patchtext/x-diff; name=0002_add_statistics_to_pg_stat_wal_view.patchDownload

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 6656676..ee70b7b 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3406,12 +3406,115 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
     </thead>
 
     <tbody>
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>wal_records</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Total number of WAL records generated
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>wal_fpi</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Total number of WAL full page images generated
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>wal_bytes</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Total amount of WAL bytes generated
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>wal_buffers_full</structfield> <type>bigint</type>
       </para>
       <para>
-       Number of times WAL data was written to the disk because WAL buffers got full
+       Total number of WAL data written to the disk because WAL buffers got full
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>wal_file</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Total number of WAL file segment created or opened a pre-existing one
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>wal_init_file</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Total number of WAL file segment created
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>wal_write_backend</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Total number of WAL data written to the disk by backends
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>wal_write_walwriter</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Total number of WAL data written to the disk by walwriter
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>wal_write_time</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Total amount of time that has been spent in the portion of
+       WAL data was written to disk by backend and walwriter, in milliseconds
+       (if <xref linkend="guc-track-io-timing"/> is enabled, otherwise zero)
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>wal_sync_backend</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Total number of WAL data synced to the disk by backends
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>wal_sync_walwriter</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Total number of WAL data synced to the disk by walwriter
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>wal_sync_time</structfield> <type>bigint</type>
+      </para>
+      <para>
+       Total amount of time that has been spent in the portion of
+       WAL data was synced to disk by backend and walwriter, in milliseconds
+       (if <xref linkend="guc-track-io-timing"/> is enabled, otherwise zero)
       </para></entry>
      </row>
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 52a67b1..f7316d2 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -1262,6 +1262,10 @@ XLogInsertRecord(XLogRecData *rdata,
 		pgWalUsage.wal_bytes += rechdr->xl_tot_len;
 		pgWalUsage.wal_records++;
 		pgWalUsage.wal_fpi += num_fpi;
+
+		WalStats.m_wal_bytes += rechdr->xl_tot_len;
+		WalStats.m_wal_records++;
+		WalStats.m_wal_fpi += num_fpi;
 	}
 
 	return EndPos;
@@ -2527,6 +2531,8 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
 			Size		nbytes;
 			Size		nleft;
 			int			written;
+			instr_time	start;
+			instr_time	duration;
 
 			/* OK to write the page(s) */
 			from = XLogCtl->pages + startidx * (Size) XLOG_BLCKSZ;
@@ -2535,9 +2541,28 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
 			do
 			{
 				errno = 0;
+				if (track_io_timing)
+					INSTR_TIME_SET_CURRENT(start);
+
 				pgstat_report_wait_start(WAIT_EVENT_WAL_WRITE);
 				written = pg_pwrite(openLogFile, from, nleft, startoffset);
 				pgstat_report_wait_end();
+
+				if (track_io_timing)
+				{
+					INSTR_TIME_SET_CURRENT(duration);
+					INSTR_TIME_SUBTRACT(duration, start);
+					WalStats.m_wal_write_time += INSTR_TIME_GET_MILLISEC(duration);
+				}
+
+				if (AmWalWriterProcess()){
+					WalStats.m_wal_write_walwriter++;
+				}
+				else
+				{
+					WalStats.m_wal_write_backend++;
+				}
+
 				if (written <= 0)
 				{
 					char		xlogfname[MAXFNAMELEN];
@@ -3278,7 +3303,10 @@ XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock)
 						 errmsg("could not open file \"%s\": %m", path)));
 		}
 		else
+		{
+			WalStats.m_wal_file++;
 			return fd;
+		}
 	}
 
 	/*
@@ -3418,6 +3446,8 @@ XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock)
 				 errmsg("could not open file \"%s\": %m", path)));
 
 	elog(DEBUG2, "done creating and filling new WAL file");
+	WalStats.m_wal_init_file++;
+	WalStats.m_wal_file++;
 
 	return fd;
 }
@@ -10427,8 +10457,13 @@ assign_xlog_sync_method(int new_sync_method, void *extra)
 void
 issue_xlog_fsync(int fd, XLogSegNo segno)
 {
+	instr_time	start;
+	instr_time	duration;
 	char	   *msg = NULL;
 
+	if (track_io_timing)
+		INSTR_TIME_SET_CURRENT(start);
+
 	pgstat_report_wait_start(WAIT_EVENT_WAL_SYNC);
 	switch (sync_method)
 	{
@@ -10472,6 +10507,21 @@ issue_xlog_fsync(int fd, XLogSegNo segno)
 	}
 
 	pgstat_report_wait_end();
+
+	if (track_io_timing)
+	{
+		INSTR_TIME_SET_CURRENT(duration);
+		INSTR_TIME_SUBTRACT(duration, start);
+		WalStats.m_wal_sync_time += INSTR_TIME_GET_MICROSEC(duration);
+	}
+
+	if (AmWalWriterProcess()){
+		WalStats.m_wal_sync_walwriter++;
+	}
+	else
+	{
+		WalStats.m_wal_sync_backend++;
+	}
 }
 
 /*
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index c293907..f0acfa8 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -990,7 +990,18 @@ CREATE VIEW pg_stat_bgwriter AS
 
 CREATE VIEW pg_stat_wal AS
     SELECT
+        w.wal_records,
+        w.wal_fpi,
+        w.wal_bytes,
         w.wal_buffers_full,
+        w.wal_file,
+        w.wal_init_file,
+        w.wal_write_backend,
+        w.wal_write_walwriter,
+        w.wal_write_time,
+        w.wal_sync_backend,
+        w.wal_sync_walwriter,
+        w.wal_sync_time,
         w.stats_reset
     FROM pg_stat_get_wal() w;
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 822f0eb..f74b431 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -6756,7 +6756,18 @@ pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len)
 static void
 pgstat_recv_wal(PgStat_MsgWal *msg, int len)
 {
+	walStats.wal_records += msg->m_wal_records;
+	walStats.wal_fpi += msg->m_wal_fpi;
+	walStats.wal_bytes += msg->m_wal_bytes;
 	walStats.wal_buffers_full += msg->m_wal_buffers_full;
+	walStats.wal_file += msg->m_wal_file;
+	walStats.wal_init_file += msg->m_wal_init_file;
+	walStats.wal_write_backend += msg->m_wal_write_backend;
+	walStats.wal_write_walwriter += msg->m_wal_write_walwriter;
+	walStats.wal_write_time += msg->m_wal_write_time;
+	walStats.wal_sync_backend += msg->m_wal_sync_backend;
+	walStats.wal_sync_walwriter += msg->m_wal_sync_walwriter;
+	walStats.wal_sync_time += msg->m_wal_sync_time;
 }
 
 /* ----------
diff --git a/src/backend/postmaster/walwriter.c b/src/backend/postmaster/walwriter.c
index a52832f..ce9f4b7 100644
--- a/src/backend/postmaster/walwriter.c
+++ b/src/backend/postmaster/walwriter.c
@@ -253,6 +253,9 @@ WalWriterMain(void)
 		else if (left_till_hibernate > 0)
 			left_till_hibernate--;
 
+		/* Send WAL statistics */
+		pgstat_send_wal();
+
 		/*
 		 * Sleep until we are signaled or WalWriterDelay has elapsed.  If we
 		 * haven't done anything useful for quite some time, lengthen the
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 0d0d2e6..e21f698 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1703,7 +1703,7 @@ pg_stat_get_buf_alloc(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_wal(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_WAL_COLS	2
+#define PG_STAT_GET_WAL_COLS	13
 	TupleDesc	tupdesc;
 	Datum		values[PG_STAT_GET_WAL_COLS];
 	bool		nulls[PG_STAT_GET_WAL_COLS];
@@ -1715,9 +1715,31 @@ pg_stat_get_wal(PG_FUNCTION_ARGS)
 
 	/* Initialise attributes information in the tuple descriptor */
 	tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_WAL_COLS);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "wal_buffers_full",
+	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "wal_records",
 					   INT8OID, -1, 0);
-	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "stats_reset",
+	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "wal_fpi",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 3, "wal_bytes",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 4, "wal_buffers_full",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 5, "wal_file",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 6, "wal_init_file",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 7, "wal_write_backend",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 8, "wal_write_walwriter",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 9, "wal_write_time",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 10, "wal_sync_backend",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 11, "wal_sync_walwriter",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 12, "wal_sync_time",
+					   INT8OID, -1, 0);
+	TupleDescInitEntry(tupdesc, (AttrNumber) 13, "stats_reset",
 					   TIMESTAMPTZOID, -1, 0);
 
 	BlessTupleDesc(tupdesc);
@@ -1726,8 +1748,19 @@ pg_stat_get_wal(PG_FUNCTION_ARGS)
 	wal_stats = pgstat_fetch_stat_wal();
 
 	/* Fill values and NULLs */
-	values[0] = Int64GetDatum(wal_stats->wal_buffers_full);
-	values[1] = TimestampTzGetDatum(wal_stats->stat_reset_timestamp);
+	values[0] = Int64GetDatum(wal_stats->wal_records);
+	values[1] = Int64GetDatum(wal_stats->wal_fpi);
+	values[2] = Int64GetDatum(wal_stats->wal_bytes);
+	values[3] = Int64GetDatum(wal_stats->wal_buffers_full);
+	values[4] = Int64GetDatum(wal_stats->wal_file);
+	values[5] = Int64GetDatum(wal_stats->wal_init_file);
+	values[6] = Int64GetDatum(wal_stats->wal_write_backend);
+	values[7] = Int64GetDatum(wal_stats->wal_write_walwriter);
+	values[8] = Int64GetDatum(wal_stats->wal_write_time);
+	values[9] = Int64GetDatum(wal_stats->wal_sync_backend);
+	values[10] = Int64GetDatum(wal_stats->wal_sync_walwriter);
+	values[11] = Int64GetDatum(wal_stats->wal_sync_time);
+	values[12] = TimestampTzGetDatum(wal_stats->stat_reset_timestamp);
 
 	/* Returns the record as Datum */
 	PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 22340ba..f8c3ccb 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5492,9 +5492,9 @@
 { oid => '1136', descr => 'statistics: information about WAL activity',
   proname => 'pg_stat_get_wal', proisstrict => 'f', provolatile => 's',
   proparallel => 'r', prorettype => 'record', proargtypes => '',
-  proallargtypes => '{int8,timestamptz}',
-  proargmodes => '{o,o}',
-  proargnames => '{wal_buffers_full,stats_reset}',
+  proallargtypes => '{int8,int8,int8,int8,int8,int8,int8,int8,int8,int8,int8,int8,timestamptz}',
+  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{wal_records,wal_fpi,wal_bytes,wal_buffers_full,wal_file,wal_init_file,wal_write_backend,wal_write_walwriter,wal_write_time,wal_sync_backend,wal_sync_walwriter,wal_sync_time,stats_reset}',
   prosrc => 'pg_stat_get_wal' },
 
 { oid => '2306', descr => 'statistics: information about SLRU caches',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index a821ff4..25490ed 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -459,7 +459,18 @@ typedef struct PgStat_MsgBgWriter
 typedef struct PgStat_MsgWal
 {
 	PgStat_MsgHdr m_hdr;
+	PgStat_Counter m_wal_records;
+	PgStat_Counter m_wal_fpi;
+	PgStat_Counter m_wal_bytes;
 	PgStat_Counter m_wal_buffers_full;
+	PgStat_Counter m_wal_file;
+	PgStat_Counter m_wal_init_file;
+	PgStat_Counter m_wal_write_backend;
+	PgStat_Counter m_wal_write_walwriter;
+	PgStat_Counter m_wal_write_time;	/* accumulate times in milliseconds */
+	PgStat_Counter m_wal_sync_backend;
+	PgStat_Counter m_wal_sync_walwriter;
+	PgStat_Counter m_wal_sync_time; /* accumulate times in milliseconds */
 } PgStat_MsgWal;
 
 /* ----------
@@ -795,7 +806,19 @@ typedef struct PgStat_GlobalStats
  */
 typedef struct PgStat_WalStats
 {
+	PgStat_Counter wal_records;
+	PgStat_Counter wal_fpi;
+	PgStat_Counter wal_bytes;
 	PgStat_Counter wal_buffers_full;
+	PgStat_Counter wal_file;
+	PgStat_Counter wal_init_file;
+	PgStat_Counter wal_write_backend;
+	PgStat_Counter wal_write_walwriter;
+	PgStat_Counter wal_write_time;
+	PgStat_Counter wal_sync_backend;
+	PgStat_Counter wal_sync_walwriter;
+	PgStat_Counter wal_sync_time;
+
 	TimestampTz stat_reset_timestamp;
 } PgStat_WalStats;
 
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index cf2a9b4..019fd50 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2135,9 +2135,20 @@ pg_stat_user_tables| SELECT pg_stat_all_tables.relid,
     pg_stat_all_tables.autoanalyze_count
    FROM pg_stat_all_tables
   WHERE ((pg_stat_all_tables.schemaname <> ALL (ARRAY['pg_catalog'::name, 'information_schema'::name])) AND (pg_stat_all_tables.schemaname !~ '^pg_toast'::text));
-pg_stat_wal| SELECT w.wal_buffers_full,
+pg_stat_wal| SELECT w.wal_records,
+    w.wal_fpi,
+    w.wal_bytes,
+    w.wal_buffers_full,
+    w.wal_file,
+    w.wal_init_file,
+    w.wal_write_backend,
+    w.wal_write_walwriter,
+    w.wal_write_time,
+    w.wal_sync_backend,
+    w.wal_sync_walwriter,
+    w.wal_sync_time,
     w.stats_reset
-   FROM pg_stat_get_wal() w(wal_buffers_full, stats_reset);
+   FROM pg_stat_get_wal() w(wal_records, wal_fpi, wal_bytes, wal_buffers_full, wal_file, wal_init_file, wal_write_backend, wal_write_walwriter, wal_write_time, wal_sync_backend, wal_sync_walwriter, wal_sync_time, stats_reset);
 pg_stat_wal_receiver| SELECT s.pid,
     s.status,
     s.receive_start_lsn,