per backend WAL statistics
Hi hackers,
Now that commit 9aea73fc61 added backend-level statistics to pgstats (and
per backend IO statistics), we can more easily add per backend statistics.
Please find attached a patch to implement $SUBJECT.
It uses the same layer as pg_stat_wal, except that it is now possible to know
how much WAL activity is happening in each backend rather than only an overall
aggregate of all the activity.
A new function, pg_stat_get_backend_wal(), gives access to this data for the
backend with a given PID.
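In case it helps review: with this layer, the record, FPI and byte counts are not incremented per backend at all. At flush time, the patch subtracts a saved snapshot (prevBackendWalUsage) from the backend's cumulative pgWalUsage, adds the difference to the backend's shared entry, and then moves the snapshot forward. Below is only a minimal sketch of that delta pattern. The types and variables are hypothetical stand-ins (no shared memory, no locking), not the PostgreSQL definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for WalUsage: cumulative, never-reset counters. */
typedef struct WalUsageSketch
{
    int64_t  wal_records;
    int64_t  wal_fpi;
    uint64_t wal_bytes;
} WalUsageSketch;

/* Backend-local cumulative counters (plays the role of pgWalUsage). */
static WalUsageSketch cur_usage;
/* Snapshot from the previous flush (plays the role of prevBackendWalUsage). */
static WalUsageSketch prev_usage;
/* What readers see (plays the role of the shared per backend entry). */
static WalUsageSketch backend_totals;

/* Add (cur - prev) to the backend totals, then move the snapshot forward. */
static void
flush_backend_wal_sketch(void)
{
    /* Cumulative counters only ever grow. */
    assert(cur_usage.wal_records >= prev_usage.wal_records);

    /* Nothing new since the last flush: skip the work entirely. */
    if (cur_usage.wal_records == prev_usage.wal_records)
        return;

    backend_totals.wal_records += cur_usage.wal_records - prev_usage.wal_records;
    backend_totals.wal_fpi += cur_usage.wal_fpi - prev_usage.wal_fpi;
    backend_totals.wal_bytes += cur_usage.wal_bytes - prev_usage.wal_bytes;

    /* Next flush only accounts for activity after this point. */
    prev_usage = cur_usage;
}
```

A second flush without new activity adds nothing, so activity is never double counted. Consistent with that, the patch also sets the snapshot in pgstat_create_backend(), so counting for a new entry starts from the entry's creation.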
=== Outcome ===
With this in place, one could for example:
1. Get the WAL statistics for a backend pid:
postgres=# select * from pg_stat_get_backend_wal(473278);
-[ RECORD 1 ]----+---------
wal_records | 300008
wal_fpi | 7
wal_bytes | 17753097
wal_buffers_full | 933
wal_write | 937
wal_sync | 2
wal_write_time | 0
wal_sync_time | 0
stats_reset |
2. Get the wal_bytes generated by application:
postgres=# SELECT application_name, wal_bytes, round(100 * wal_bytes/sum(wal_bytes) over(),2) AS "%"
FROM (SELECT application_name, sum(wal_bytes) AS wal_bytes
FROM pg_stat_activity, pg_stat_get_backend_wal(pid)
WHERE wal_bytes != 0 GROUP BY application_name);
application_name | wal_bytes | %
------------------+-----------+-------
app1 | 17708761 | 39.95
app2 | 26614797 | 60.05
(2 rows)
3. Get the wal_bytes generated by database:
postgres=# SELECT datname, wal_bytes, round(100 * wal_bytes/sum(wal_bytes) over(),2) AS "%"
FROM (SELECT datname, sum(wal_bytes) AS wal_bytes
FROM pg_stat_activity, pg_stat_get_backend_wal(pid)
WHERE wal_bytes != 0 GROUP BY datname);
datname | wal_bytes | %
---------+-----------+-------
db1 | 35461858 | 80.01
db2 | 8861700 | 19.99
(2 rows)
and much more...
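The "%" column in examples 2 and 3 is each group's summed wal_bytes divided by the grand total, rounded to two decimals. If someone wants to post-process the raw function output outside of SQL, a hedged C sketch of the same GROUP BY plus round(..., 2) arithmetic could look like this. BackendWalRow, GroupShare and wal_share_by_group() are hypothetical names, and integer "round half up" is used instead of numeric rounding.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One row of pg_stat_activity joined with pg_stat_get_backend_wal(pid). */
typedef struct BackendWalRow
{
    const char *group;          /* application_name or datname */
    int64_t     wal_bytes;
} BackendWalRow;

typedef struct GroupShare
{
    const char *group;
    int64_t     wal_bytes;      /* sum over the group's backends */
    int64_t     pct_hundredths; /* percentage * 100, rounded half up */
} GroupShare;

/*
 * Sum wal_bytes per group, skipping zero rows (like WHERE wal_bytes != 0),
 * then compute each group's share of the grand total.  Returns the number
 * of groups written to out, or -1 if out has fewer than needed slots.
 */
static int
wal_share_by_group(const BackendWalRow *rows, int nrows,
                   GroupShare *out, int max_out)
{
    int     ngroups = 0;
    int64_t total = 0;

    for (int i = 0; i < nrows; i++)
    {
        int g;

        if (rows[i].wal_bytes == 0)
            continue;

        /* Linear search is fine for a handful of groups. */
        for (g = 0; g < ngroups; g++)
            if (strcmp(out[g].group, rows[i].group) == 0)
                break;

        if (g == ngroups)
        {
            if (ngroups == max_out)
                return -1;
            out[g].group = rows[i].group;
            out[g].wal_bytes = 0;
            ngroups++;
        }
        out[g].wal_bytes += rows[i].wal_bytes;
        total += rows[i].wal_bytes;
    }

    /* round(100 * part / total, 2), kept as an integer in hundredths. */
    for (int g = 0; g < ngroups; g++)
    {
        assert(total > 0);
        out[g].pct_hundredths = (10000 * out[g].wal_bytes + total / 2) / total;
    }

    return ngroups;
}
```

Fed the two application totals from example 2, it yields 3995 and 6005 hundredths, i.e. 39.95% and 60.05%. The 10000 * wal_bytes product stays within int64_t for any realistic per-group WAL volume.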
=== Implementation ===
The same limitation as in 9aea73fc61 persists, meaning that auxiliary processes
are not included in this set of statistics.
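Concretely, every per backend increment in 0003 is guarded by pgstat_tracks_backend_bktype(MyBackendType), while the global pg_stat_wal counter is always bumped. Here is a small sketch of that pattern. The enum is a hypothetical subset that only mirrors the exclusions listed in the new monitoring.sgml entry (checkpointer, background writer, startup process, autovacuum launcher). It is not PostgreSQL's BackendType, and the real function's full list is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of backend types, for illustration only. */
typedef enum SketchBackendType
{
    SKETCH_CLIENT_BACKEND,
    SKETCH_AUTOVAC_LAUNCHER,
    SKETCH_BG_WRITER,
    SKETCH_CHECKPOINTER,
    SKETCH_STARTUP
} SketchBackendType;

/* Plays the role of pgstat_tracks_backend_bktype(). */
static bool
sketch_tracks_backend_bktype(SketchBackendType bktype)
{
    switch (bktype)
    {
        case SKETCH_AUTOVAC_LAUNCHER:
        case SKETCH_BG_WRITER:
        case SKETCH_CHECKPOINTER:
        case SKETCH_STARTUP:
            return false;
        case SKETCH_CLIENT_BACKEND:
            return true;
    }
    assert(false);              /* not reached for valid enum values */
    return false;
}

static int64_t global_wal_write;    /* always counted (pg_stat_wal) */
static int64_t backend_wal_write;   /* counted only for tracked types */

/* Same shape as the XLogWrite() change: global first, then guarded. */
static void
count_wal_write(SketchBackendType my_type)
{
    global_wal_write++;
    if (sketch_tracks_backend_bktype(my_type))
        backend_wal_write++;
}
```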
The patch is made of 3 sub-patches:
0001: extracts the logic filling pg_stat_get_wal()'s tuple into its own routine.
It adds pg_stat_wal_build_tuple(), a helper routine for pg_stat_get_wal() that
fills its tuple based on the contents of PgStat_WalStats. Same idea as ff7c40d7fd.
0002: PGSTAT_KIND_BACKEND code refactoring. It refactors some code related to per
backend statistics, making it more generic (or more IO statistics focused) so that
it can be reused by 0003, which introduces per backend WAL statistics. It does
not add any new feature: it is 100% code refactoring to ease the review of 0003.
0003: adds the per backend WAL statistics and the new pg_stat_get_backend_wal()
function, along with documentation and a related test.
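The point of 0001 (and of splitting PgStat_WalCounters out of PgStat_WalStats in 0003) is that the global view and the per backend function share one tuple builder. The builder takes the counters plus a reset timestamp and returns NULL for stats_reset when that timestamp is 0. This is why stats_reset is empty in example 1 above. The sketch below shows the shape of that split with hypothetical types. It uses a plain struct instead of a Datum/heap tuple and only two of the nine columns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t TimestampSketch;    /* 0 means "never reset" */

/* Counters shared by both callers (cf. PgStat_WalCounters). */
typedef struct WalCountersSketch
{
    int64_t wal_records;
    int64_t wal_write_time_us;      /* stored in microseconds */
} WalCountersSketch;

/* Global stats: counters plus a reset timestamp (cf. PgStat_WalStats). */
typedef struct WalStatsSketch
{
    WalCountersSketch counters;
    TimestampSketch   stat_reset_timestamp;
} WalStatsSketch;

/* Per backend stats keep their own reset timestamp (cf. PgStat_Backend). */
typedef struct BackendStatsSketch
{
    TimestampSketch   stat_reset_timestamp;
    WalCountersSketch wal_stats;
} BackendStatsSketch;

/* Output row: values plus a null flag for stats_reset. */
typedef struct WalRowSketch
{
    int64_t         wal_records;
    double          wal_write_time_ms;
    TimestampSketch stats_reset;
    bool            stats_reset_isnull;
} WalRowSketch;

/* Single helper used by both callers (cf. pg_stat_wal_build_tuple()). */
static WalRowSketch
build_wal_row(WalCountersSketch c, TimestampSketch reset_ts)
{
    WalRowSketch row = {0};

    assert(c.wal_records >= 0);
    row.wal_records = c.wal_records;
    /* Convert microseconds to milliseconds for display. */
    row.wal_write_time_ms = (double) c.wal_write_time_us / 1000.0;
    if (reset_ts != 0)
        row.stats_reset = reset_ts;
    else
        row.stats_reset_isnull = true;
    return row;
}

static WalRowSketch
get_global_wal_row(const WalStatsSketch *s)
{
    return build_wal_row(s->counters, s->stat_reset_timestamp);
}

static WalRowSketch
get_backend_wal_row(const BackendStatsSketch *b)
{
    return build_wal_row(b->wal_stats, b->stat_reset_timestamp);
}
```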
=== Remarks ===
R1:
0003 does not rely on pgstat_prep_backend_pending() for its pending statistics
but on a new PendingBackendWalStats variable. The reason is that the pending WAL
statistics are incremented inside a critical section (see XLogWrite()), so
a call to pgstat_prep_pending_entry() could trigger a failed assertion:
MemoryContextAllocZero()->"CritSectionCount == 0 || (context)->allowInCritSection"
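To illustrate R1: anything that may allocate (as pgstat_prep_pending_entry() can when the pending entry does not exist yet) is off limits while CritSectionCount > 0, whereas bumping a statically allocated struct is always safe. A minimal sketch under that assumption follows. alloc_zero_checked() is a hypothetical stand-in for MemoryContextAllocZero() in a context without allowInCritSection, and the counter manipulation stands in for START_CRIT_SECTION()/END_CRIT_SECTION().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for PostgreSQL's CritSectionCount. */
static int crit_section_count = 0;

/* Hypothetical allocator that refuses to run inside a critical section. */
static void *
alloc_zero_checked(size_t size)
{
    assert(crit_section_count == 0);
    return calloc(1, size);
}

typedef struct PendingWalSketch
{
    int64_t wal_write;
    int64_t wal_sync;
} PendingWalSketch;

/* Static storage: no allocation needed (cf. PendingBackendWalStats). */
static PendingWalSketch pending_backend_wal;

/* Same shape as XLogWrite(): counters bumped with the section open. */
static void
write_wal_in_critical_section(void)
{
    crit_section_count++;               /* START_CRIT_SECTION() */
    /* ... the actual write would happen here ... */
    pending_backend_wal.wal_write++;    /* safe: no allocation */
    crit_section_count--;               /* END_CRIT_SECTION() */
}
```

Calling alloc_zero_checked() between the increment and decrement above would fire the assertion. That mirrors the failure quoted in R1.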
R2:
Instead of relying on a new PendingBackendWalStats, we could rely on the
existing PendingWalStats variable. But that would complicate the flush of
per backend and existing WAL stats, as the two flushes would need some
coordination. I think it's better that each kind has its own pending variable.
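The sketch below shows why separate pending variables avoid that coordination. Both buffers are bumped at the same call sites, but each flush consumes and clears only its own buffer, so the two flushes can run in any order or at different times. Names are hypothetical, and plain structs stand in for the shared memory entries.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct PendingWalSketch
{
    int64_t wal_write;
    int64_t wal_sync;
} PendingWalSketch;

/* One pending buffer per kind (cf. PendingWalStats, PendingBackendWalStats). */
static PendingWalSketch pending_global;
static PendingWalSketch pending_backend;

/* Flushed totals for each kind. */
static PendingWalSketch shared_global;
static PendingWalSketch shared_backend;

/* Both buffers are incremented at the same call site... */
static void
count_wal_sync(void)
{
    pending_global.wal_sync++;
    pending_backend.wal_sync++;
}

/* ...but each flush only drains and clears the buffer it is given. */
static void
flush_into(PendingWalSketch *shared, PendingWalSketch *pending)
{
    assert(shared != pending);
    shared->wal_write += pending->wal_write;
    shared->wal_sync += pending->wal_sync;
    memset(pending, 0, sizeof(*pending));
}
```

With a single shared pending variable, whichever flush ran first would have to leave the counters in place for the other one. That is the coordination R2 avoids.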
R3:
Instead of incrementing the PendingBackendWalStats members individually, we could
also "just" assign them from the PendingWalStats ones once those are incremented.
I thought it was better to keep the two "fully independent", though.
R4:
0002 introduces a new PgStat_BackendPending struct. Due to R1, it's not needed
per se, but it would have been if pgstat_prep_backend_pending() had been
used. I kept this change as we may want to add more per backend stats in the future.
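For R4, the idea of the wrapper is that the registration (.pending_size = sizeof(PgStat_BackendPending)) and the pointer type returned by pgstat_prep_backend_pending() stay unchanged when another member is added. Only the accesses go through a named member, e.g. ->pending_io.counts[...]. Below is a sketch with hypothetical, shrunken dimensions. The real arrays are indexed by IOObject/IOContext/IOOp.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, shrunken dimensions for illustration. */
#define N_OBJECTS  2
#define N_CONTEXTS 2
#define N_OPS      3

/* Stand-in for PgStat_BackendPendingIO. */
typedef struct PendingIOSketch
{
    int64_t counts[N_OBJECTS][N_CONTEXTS][N_OPS];
} PendingIOSketch;

/* Wrapper in the spirit of PgStat_BackendPending. */
typedef struct BackendPendingSketch
{
    PendingIOSketch pending_io;
    /* further per backend pending members could be added here */
} BackendPendingSketch;

static BackendPendingSketch backend_pending;

/* Callers go through the named member (cf. pgstat_count_io_op_n()). */
static void
count_io_op(int obj, int ctx, int op, int64_t cnt)
{
    assert(obj >= 0 && obj < N_OBJECTS);
    assert(ctx >= 0 && ctx < N_CONTEXTS);
    assert(op >= 0 && op < N_OPS);
    backend_pending.pending_io.counts[obj][ctx][op] += cnt;
}
```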
Looking forward to your feedback,
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v1-0001-Extract-logic-filling-pg_stat_get_wal-s-tuple-int.patch
From 5c2241fcee52f7fb3cfea9890d495f51b981a9c0 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 6 Jan 2025 07:51:27 +0000
Subject: [PATCH v1 1/3] Extract logic filling pg_stat_get_wal()'s tuple into
its own routine
This commit adds pg_stat_wal_build_tuple(), a helper routine for
pg_stat_get_wal(), that fills its tuple based on the contents
of PgStat_WalStats. This will be used in a follow-up commit that uses
the same structures as pg_stat_wal for reporting, but for the PGSTAT_KIND_BACKEND
statistics kind.
---
src/backend/utils/adt/pgstatfuncs.c | 48 +++++++++++++++++++----------
1 file changed, 32 insertions(+), 16 deletions(-)
100.0% src/backend/utils/adt/
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 3245f3a8d8..cccfbf0706 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1560,17 +1560,19 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
}
/*
- * Returns statistics of WAL activity
+ * pg_stat_wal_build_tuple
+ *
+ * Helper routine for pg_stat_get_wal() returning one tuple based on the contents
+ * of wal_stats.
*/
-Datum
-pg_stat_get_wal(PG_FUNCTION_ARGS)
+static Datum
+pg_stat_wal_build_tuple(PgStat_WalStats wal_stats)
{
#define PG_STAT_GET_WAL_COLS 9
TupleDesc tupdesc;
Datum values[PG_STAT_GET_WAL_COLS] = {0};
bool nulls[PG_STAT_GET_WAL_COLS] = {0};
char buf[256];
- PgStat_WalStats *wal_stats;
/* Initialise attributes information in the tuple descriptor */
tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_WAL_COLS);
@@ -1595,34 +1597,48 @@ pg_stat_get_wal(PG_FUNCTION_ARGS)
BlessTupleDesc(tupdesc);
- /* Get statistics about WAL activity */
- wal_stats = pgstat_fetch_stat_wal();
-
/* Fill values and NULLs */
- values[0] = Int64GetDatum(wal_stats->wal_records);
- values[1] = Int64GetDatum(wal_stats->wal_fpi);
+ values[0] = Int64GetDatum(wal_stats.wal_records);
+ values[1] = Int64GetDatum(wal_stats.wal_fpi);
/* Convert to numeric. */
- snprintf(buf, sizeof buf, UINT64_FORMAT, wal_stats->wal_bytes);
+ snprintf(buf, sizeof buf, UINT64_FORMAT, wal_stats.wal_bytes);
values[2] = DirectFunctionCall3(numeric_in,
CStringGetDatum(buf),
ObjectIdGetDatum(0),
Int32GetDatum(-1));
- values[3] = Int64GetDatum(wal_stats->wal_buffers_full);
- values[4] = Int64GetDatum(wal_stats->wal_write);
- values[5] = Int64GetDatum(wal_stats->wal_sync);
+ values[3] = Int64GetDatum(wal_stats.wal_buffers_full);
+ values[4] = Int64GetDatum(wal_stats.wal_write);
+ values[5] = Int64GetDatum(wal_stats.wal_sync);
/* Convert counters from microsec to millisec for display */
- values[6] = Float8GetDatum(((double) wal_stats->wal_write_time) / 1000.0);
- values[7] = Float8GetDatum(((double) wal_stats->wal_sync_time) / 1000.0);
+ values[6] = Float8GetDatum(((double) wal_stats.wal_write_time) / 1000.0);
+ values[7] = Float8GetDatum(((double) wal_stats.wal_sync_time) / 1000.0);
- values[8] = TimestampTzGetDatum(wal_stats->stat_reset_timestamp);
+ if (wal_stats.stat_reset_timestamp != 0)
+ values[8] = TimestampTzGetDatum(wal_stats.stat_reset_timestamp);
+ else
+ nulls[8] = true;
/* Returns the record as Datum */
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
}
+/*
+ * Returns statistics of WAL activity
+ */
+Datum
+pg_stat_get_wal(PG_FUNCTION_ARGS)
+{
+ PgStat_WalStats *wal_stats;
+
+ /* Get statistics about WAL activity */
+ wal_stats = pgstat_fetch_stat_wal();
+
+ return (pg_stat_wal_build_tuple(*wal_stats));
+}
+
/*
* Returns statistics of SLRU caches.
*/
--
2.34.1
v1-0002-PGSTAT_KIND_BACKEND-code-refactoring.patch
From 29d2d565ba106028907524134b267998d36e9762 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 6 Jan 2025 08:44:29 +0000
Subject: [PATCH v1 2/3] PGSTAT_KIND_BACKEND code refactoring
This commit refactors some code related to per backend statistics. It makes
the code more generic or more IO statistics focused, as it will be used in a
follow-up commit that will introduce per backend WAL statistics.
---
src/backend/utils/activity/pgstat.c | 2 +-
src/backend/utils/activity/pgstat_backend.c | 60 +++++++++++++-------
src/backend/utils/activity/pgstat_io.c | 8 +--
src/backend/utils/activity/pgstat_relation.c | 4 +-
src/backend/utils/adt/pgstatfuncs.c | 2 +-
src/include/pgstat.h | 6 +-
src/include/utils/pgstat_internal.h | 5 +-
src/tools/pgindent/typedefs.list | 1 +
8 files changed, 57 insertions(+), 31 deletions(-)
81.9% src/backend/utils/activity/
9.0% src/include/utils/
5.3% src/include/
3.6% src/
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index 16a03b8ce1..34520535d5 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -370,7 +370,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.shared_size = sizeof(PgStatShared_Backend),
.shared_data_off = offsetof(PgStatShared_Backend, stats),
.shared_data_len = sizeof(((PgStatShared_Backend *) 0)->stats),
- .pending_size = sizeof(PgStat_BackendPendingIO),
+ .pending_size = sizeof(PgStat_BackendPending),
.flush_pending_cb = pgstat_backend_flush_cb,
.reset_timestamp_cb = pgstat_backend_reset_timestamp_cb,
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
index 1f91bfef0a..2bdaa07828 100644
--- a/src/backend/utils/activity/pgstat_backend.c
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -39,23 +39,23 @@ pgstat_fetch_stat_backend(ProcNumber procNumber)
}
/*
- * Flush out locally pending backend statistics
- *
- * If no stats have been recorded, this function returns false.
+ * Flush out locally pending backend IO statistics.
*/
-bool
-pgstat_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
+static void
+pgstat_flush_io_entry(PgStat_EntryRef *entry_ref, bool nowait, bool need_lock)
{
- PgStatShared_Backend *shbackendioent;
- PgStat_BackendPendingIO *pendingent;
+ PgStatShared_Backend *shbackendent;
+ PgStat_BackendPending *pendingent;
PgStat_BktypeIO *bktype_shstats;
+ PgStat_BackendPendingIO *pending_io;
- if (!pgstat_lock_entry(entry_ref, nowait))
- return false;
+ if (need_lock && !pgstat_lock_entry(entry_ref, nowait))
+ return;
- shbackendioent = (PgStatShared_Backend *) entry_ref->shared_stats;
- bktype_shstats = &shbackendioent->stats.stats;
- pendingent = (PgStat_BackendPendingIO *) entry_ref->pending;
+ shbackendent = (PgStatShared_Backend *) entry_ref->shared_stats;
+ pendingent = (PgStat_BackendPending *) entry_ref->pending;
+ bktype_shstats = &shbackendent->stats.io_stats;
+ pending_io = &pendingent->pending_io;
for (int io_object = 0; io_object < IOOBJECT_NUM_TYPES; io_object++)
{
@@ -66,9 +66,9 @@ pgstat_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
instr_time time;
bktype_shstats->counts[io_object][io_context][io_op] +=
- pendingent->counts[io_object][io_context][io_op];
+ pending_io->counts[io_object][io_context][io_op];
- time = pendingent->pending_times[io_object][io_context][io_op];
+ time = pending_io->pending_times[io_object][io_context][io_op];
bktype_shstats->times[io_object][io_context][io_op] +=
INSTR_TIME_GET_MICROSEC(time);
@@ -76,16 +76,37 @@ pgstat_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
}
}
+ if (need_lock)
+ pgstat_unlock_entry(entry_ref);
+}
+
+/*
+ * Flush out locally pending backend statistics
+ *
+ * If no stats have been recorded, this function returns false.
+ */
+bool
+pgstat_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
+{
+ if (!pgstat_tracks_backend_bktype(MyBackendType))
+ return false;
+
+ if (!pgstat_lock_entry(entry_ref, nowait))
+ return false;
+
+ /* IO stats */
+ pgstat_flush_io_entry(entry_ref, nowait, false);
+
pgstat_unlock_entry(entry_ref);
return true;
}
/*
- * Simpler wrapper of pgstat_backend_flush_cb()
+ * Simpler wrapper of pgstat_flush_io_entry()
*/
void
-pgstat_flush_backend(bool nowait)
+pgstat_backend_flush_io(bool nowait)
{
PgStat_EntryRef *entry_ref;
@@ -94,7 +115,8 @@ pgstat_flush_backend(bool nowait)
entry_ref = pgstat_get_entry_ref(PGSTAT_KIND_BACKEND, InvalidOid,
MyProcNumber, false, NULL);
- (void) pgstat_backend_flush_cb(entry_ref, nowait);
+
+ pgstat_flush_io_entry(entry_ref, nowait, true);
}
/*
@@ -119,9 +141,9 @@ pgstat_create_backend(ProcNumber procnum)
}
/*
- * Find or create a local PgStat_BackendPendingIO entry for proc number.
+ * Find or create a local PgStat_BackendPending entry for proc number.
*/
-PgStat_BackendPendingIO *
+PgStat_BackendPending *
pgstat_prep_backend_pending(ProcNumber procnum)
{
PgStat_EntryRef *entry_ref;
diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index f9a1f91dba..a7445995d3 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -81,10 +81,10 @@ pgstat_count_io_op_n(IOObject io_object, IOContext io_context, IOOp io_op, uint3
if (pgstat_tracks_backend_bktype(MyBackendType))
{
- PgStat_PendingIO *entry_ref;
+ PgStat_BackendPending *entry_ref;
entry_ref = pgstat_prep_backend_pending(MyProcNumber);
- entry_ref->counts[io_object][io_context][io_op] += cnt;
+ entry_ref->pending_io.counts[io_object][io_context][io_op] += cnt;
}
PendingIOStats.counts[io_object][io_context][io_op] += cnt;
@@ -151,10 +151,10 @@ pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op,
if (pgstat_tracks_backend_bktype(MyBackendType))
{
- PgStat_PendingIO *entry_ref;
+ PgStat_BackendPending *entry_ref;
entry_ref = pgstat_prep_backend_pending(MyProcNumber);
- INSTR_TIME_ADD(entry_ref->pending_times[io_object][io_context][io_op],
+ INSTR_TIME_ADD(entry_ref->pending_io.pending_times[io_object][io_context][io_op],
io_time);
}
}
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index 2cc304f881..6092826479 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -264,7 +264,7 @@ pgstat_report_vacuum(Oid tableoid, bool shared,
* VACUUM command has processed all tables and committed.
*/
pgstat_flush_io(false);
- pgstat_flush_backend(false);
+ pgstat_backend_flush_io(false);
}
/*
@@ -351,7 +351,7 @@ pgstat_report_analyze(Relation rel,
/* see pgstat_report_vacuum() */
pgstat_flush_io(false);
- pgstat_flush_backend(false);
+ pgstat_backend_flush_io(false);
}
/*
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index cccfbf0706..cd3baed472 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1544,7 +1544,7 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
if (bktype == B_INVALID)
return (Datum) 0;
- bktype_stats = &backend_stats->stats;
+ bktype_stats = &backend_stats->io_stats;
/*
* In Assert builds, we can afford an extra loop through all of the
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 0d8427f27d..6631bd2d73 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -381,7 +381,7 @@ typedef PgStat_PendingIO PgStat_BackendPendingIO;
typedef struct PgStat_Backend
{
TimestampTz stat_reset_timestamp;
- PgStat_BktypeIO stats;
+ PgStat_BktypeIO io_stats;
} PgStat_Backend;
typedef struct PgStat_StatDBEntry
@@ -523,6 +523,10 @@ typedef struct PgStat_PendingWalStats
instr_time wal_sync_time;
} PgStat_PendingWalStats;
+typedef struct PgStat_BackendPending
+{
+ PgStat_BackendPendingIO pending_io;
+} PgStat_BackendPending;
/*
* Functions in pgstat.c
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 52eb008710..320cd5a842 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -613,9 +613,8 @@ extern void pgstat_archiver_snapshot_cb(void);
* Functions in pgstat_backend.c
*/
-extern void pgstat_flush_backend(bool nowait);
-
-extern PgStat_BackendPendingIO *pgstat_prep_backend_pending(ProcNumber procnum);
+extern void pgstat_backend_flush_io(bool nowait);
+extern PgStat_BackendPending *pgstat_prep_backend_pending(ProcNumber procnum);
extern bool pgstat_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait);
extern void pgstat_backend_reset_timestamp_cb(PgStatShared_Common *header, TimestampTz ts);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e1c4f913f8..1c8516fd63 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2139,6 +2139,7 @@ PgStatShared_Subscription
PgStatShared_Wal
PgStat_ArchiverStats
PgStat_Backend
+PgStat_BackendPending
PgStat_BackendPendingIO
PgStat_BackendSubEntry
PgStat_BgWriterStats
--
2.34.1
v1-0003-per-backend-WAL-statistics.patch
From f42d48b841507d5510e0964fb12690d638fa62f4 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 6 Jan 2025 10:00:00 +0000
Subject: [PATCH v1 3/3] per backend WAL statistics
Now that commit 9aea73fc61 added backend-level statistics to pgstats (and
per backend IO statistics), we can more easily add per backend statistics.
This commit adds per backend WAL statistics using the same layer as pg_stat_wal,
except that it is now possible to know how much WAL activity is happening in each
backend rather than an overall aggregate of all the activity. A function called
pg_stat_get_backend_wal() is added to access this data depending on the
PID of a backend.
The same limitation as in 9aea73fc61 persists, meaning that auxiliary processes
are not included in this set of statistics.
XXX: bump catalog version
---
doc/src/sgml/config.sgml | 4 +-
doc/src/sgml/monitoring.sgml | 19 ++++
src/backend/access/transam/xlog.c | 36 ++++++-
src/backend/utils/activity/pgstat_backend.c | 108 +++++++++++++++++++-
src/backend/utils/activity/pgstat_wal.c | 6 +-
src/backend/utils/adt/pgstatfuncs.c | 78 +++++++++++---
src/include/catalog/pg_proc.dat | 7 ++
src/include/pgstat.h | 37 +++++--
src/include/utils/pgstat_internal.h | 1 +
src/test/regress/expected/stats.out | 14 +++
src/test/regress/sql/stats.sql | 6 ++
src/tools/pgindent/typedefs.list | 2 +
12 files changed, 289 insertions(+), 29 deletions(-)
9.9% doc/src/sgml/
8.6% src/backend/access/transam/
34.7% src/backend/utils/activity/
27.6% src/backend/utils/adt/
4.9% src/include/catalog/
6.6% src/include/
3.9% src/test/regress/expected/
3.0% src/test/regress/sql/
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index fbdd6ce574..ec253414a7 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8433,7 +8433,9 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
measure the overhead of timing on your system.
I/O timing information is
displayed in <link linkend="monitoring-pg-stat-wal-view">
- <structname>pg_stat_wal</structname></link>.
+ <structname>pg_stat_wal</structname></link> and in the output of the
+ <link linkend="pg-stat-get-backend-wal">
+ <function>pg_stat_get_backend_wal()</function></link> function.
Only superusers and users with the appropriate <literal>SET</literal>
privilege can change this setting.
</para>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index d0d176cc54..84a2d09b76 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4811,6 +4811,25 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para></entry>
</row>
+ <row>
+ <entry id="pg-stat-get-backend-wal" role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_stat_get_backend_wal</primary>
+ </indexterm>
+ <function>pg_stat_get_backend_wal</function> ( <type>integer</type> )
+ <returnvalue>record</returnvalue>
+ </para>
+ <para>
+ Returns WAL statistics about the backend with the specified
+ process ID. The output fields are exactly the same as the ones in the
+ <structname>pg_stat_wal</structname> view.
+ </para>
+ <para>
+ The function does not return WAL statistics for the checkpointer,
+ the background writer, the startup process and the autovacuum launcher.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b9ea92a542..513e26aa4b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2058,6 +2058,10 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli, bool opportunistic)
XLogWrite(WriteRqst, tli, false);
LWLockRelease(WALWriteLock);
PendingWalStats.wal_buffers_full++;
+
+ if (pgstat_tracks_backend_bktype(MyBackendType))
+ PendingBackendWalStats.wal_buffers_full++;
+
TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY_DONE();
}
/* Re-acquire WALBufMappingLock and retry */
@@ -2426,11 +2430,14 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible)
Size nleft;
ssize_t written;
instr_time start;
+ instr_time end;
/* OK to write the page(s) */
from = XLogCtl->pages + startidx * (Size) XLOG_BLCKSZ;
nbytes = npages * (Size) XLOG_BLCKSZ;
nleft = nbytes;
+ /* keep compiler quiet */
+ INSTR_TIME_SET_ZERO(end);
do
{
errno = 0;
@@ -2451,14 +2458,26 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible)
*/
if (track_wal_io_timing)
{
- instr_time end;
-
INSTR_TIME_SET_CURRENT(end);
INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_write_time, end, start);
}
PendingWalStats.wal_write++;
+ if (pgstat_tracks_backend_bktype(MyBackendType))
+ {
+ /*
+ * We are inside a critical section, so we can't use
+ * pgstat_prep_pending_entry() and we rely on
+ * PendingBackendWalStats instead.
+ */
+ PendingBackendWalStats.wal_write++;
+
+ if (track_wal_io_timing)
+ INSTR_TIME_ACCUM_DIFF(PendingBackendWalStats.wal_write_time,
+ end, start);
+ }
+
if (written <= 0)
{
char xlogfname[MAXFNAMELEN];
@@ -8684,8 +8703,11 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
{
char *msg = NULL;
instr_time start;
+ instr_time end;
Assert(tli != 0);
+ /* keep compiler quiet */
+ INSTR_TIME_SET_ZERO(end);
/*
* Quick exit if fsync is disabled or write() has already synced the WAL
@@ -8751,13 +8773,19 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
*/
if (track_wal_io_timing)
{
- instr_time end;
-
INSTR_TIME_SET_CURRENT(end);
INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_sync_time, end, start);
}
PendingWalStats.wal_sync++;
+
+ if (pgstat_tracks_backend_bktype(MyBackendType))
+ {
+ PendingBackendWalStats.wal_sync++;
+
+ if (track_wal_io_timing)
+ INSTR_TIME_ACCUM_DIFF(PendingBackendWalStats.wal_sync_time, end, start);
+ }
}
/*
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
index 2bdaa07828..28fb7a38d7 100644
--- a/src/backend/utils/activity/pgstat_backend.c
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -24,6 +24,16 @@
#include "utils/pgstat_internal.h"
+PgStat_PendingWalStats PendingBackendWalStats = {0};
+
+/*
+ * WAL usage counters saved from pgWalUsage at the previous call to
+ * pgstat_report_wal(). This is used to calculate how much WAL usage
+ * happens between pgstat_report_wal() calls, by subtracting
+ * the previous counters from the current ones.
+ */
+static WalUsage prevBackendWalUsage;
+
/*
* Returns statistics of a backend by proc number.
*/
@@ -80,6 +90,80 @@ pgstat_flush_io_entry(PgStat_EntryRef *entry_ref, bool nowait, bool need_lock)
pgstat_unlock_entry(entry_ref);
}
+/*
+ * To determine whether any WAL activity has occurred since last time, not
+ * only the number of generated WAL records but also the numbers of WAL
+ * writes and syncs need to be checked, because even a transaction that
+ * generates no WAL records can write or sync WAL data when flushing the
+ * data pages.
+ */
+static bool
+pgstat_backend_wal_have_pending(void)
+{
+ return pgWalUsage.wal_records != prevBackendWalUsage.wal_records ||
+ PendingBackendWalStats.wal_write != 0 ||
+ PendingBackendWalStats.wal_sync != 0;
+}
+
+/*
+ * Flush out locally pending backend WAL statistics.
+ */
+static void
+pgstat_flush_wal_entry(PgStat_EntryRef *entry_ref, bool nowait, bool need_lock)
+{
+ PgStatShared_Backend *shbackendent;
+ PgStat_WalCounters *bktype_shstats;
+ WalUsage wal_usage_diff = {0};
+
+ /*
+ * This function can be called even if nothing at all has happened. Avoid
+ * taking lock for nothing in that case.
+ */
+ if (!pgstat_backend_wal_have_pending())
+ return;
+
+ if (need_lock && !pgstat_lock_entry(entry_ref, nowait))
+ return;
+
+ shbackendent = (PgStatShared_Backend *) entry_ref->shared_stats;
+ bktype_shstats = &shbackendent->stats.wal_stats;
+
+ /*
+ * We don't update the WAL usage portion of the local WalStats elsewhere.
+ * Calculate how much WAL usage counters were increased by subtracting the
+ * previous counters from the current ones.
+ */
+ WalUsageAccumDiff(&wal_usage_diff, &pgWalUsage, &prevBackendWalUsage);
+
+#define WALSTAT_ACC(fld, var_to_add) \
+ (bktype_shstats->fld += var_to_add.fld)
+#define WALSTAT_ACC_INSTR_TIME(fld) \
+ (bktype_shstats->fld += INSTR_TIME_GET_MICROSEC(PendingBackendWalStats.fld))
+ WALSTAT_ACC(wal_buffers_full, PendingBackendWalStats);
+ WALSTAT_ACC(wal_write, PendingBackendWalStats);
+ WALSTAT_ACC(wal_sync, PendingBackendWalStats);
+ WALSTAT_ACC(wal_records, wal_usage_diff);
+ WALSTAT_ACC(wal_fpi, wal_usage_diff);
+ WALSTAT_ACC(wal_bytes, wal_usage_diff);
+ WALSTAT_ACC_INSTR_TIME(wal_write_time);
+ WALSTAT_ACC_INSTR_TIME(wal_sync_time);
+#undef WALSTAT_ACC_INSTR_TIME
+#undef WALSTAT_ACC
+
+ /*
+ * Save the current counters for the subsequent calculation of WAL usage.
+ */
+ prevBackendWalUsage = pgWalUsage;
+
+ /*
+ * Clear out the statistics buffer, so it can be re-used.
+ */
+ MemSet(&PendingBackendWalStats, 0, sizeof(PendingBackendWalStats));
+
+ if (need_lock)
+ pgstat_unlock_entry(entry_ref);
+}
+
/*
* Flush out locally pending backend statistics
*
@@ -97,6 +181,9 @@ pgstat_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
/* IO stats */
pgstat_flush_io_entry(entry_ref, nowait, false);
+ /* WAL stats */
+ pgstat_flush_wal_entry(entry_ref, nowait, false);
+
pgstat_unlock_entry(entry_ref);
return true;
@@ -119,6 +206,23 @@ pgstat_backend_flush_io(bool nowait)
pgstat_flush_io_entry(entry_ref, nowait, true);
}
+/*
+ * Simpler wrapper of pgstat_flush_wal_entry()
+ */
+void
+pgstat_backend_flush_wal(bool nowait)
+{
+ PgStat_EntryRef *entry_ref;
+
+ if (!pgstat_tracks_backend_bktype(MyBackendType))
+ return;
+
+ entry_ref = pgstat_get_entry_ref(PGSTAT_KIND_BACKEND, InvalidOid,
+ MyProcNumber, false, NULL);
+
+ pgstat_flush_wal_entry(entry_ref, nowait, true);
+}
+
/*
* Create backend statistics entry for proc number.
*/
@@ -138,10 +242,12 @@ pgstat_create_backend(ProcNumber procnum)
* e.g. if we previously used this proc number.
*/
memset(&shstatent->stats, 0, sizeof(shstatent->stats));
+
+ prevBackendWalUsage = pgWalUsage;
}
/*
- * Find or create a local PgStat_BackendPending entry for proc number.
+ * Find or create a local PgStat_BackendPending entry for proc number.
*/
PgStat_BackendPending *
pgstat_prep_backend_pending(ProcNumber procnum)
diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c
index 18fa6b2936..160ce99a4c 100644
--- a/src/backend/utils/activity/pgstat_wal.c
+++ b/src/backend/utils/activity/pgstat_wal.c
@@ -55,6 +55,8 @@ pgstat_report_wal(bool force)
/* flush wal stats */
pgstat_flush_wal(nowait);
+ pgstat_backend_flush_wal(nowait);
+
/* flush IO stats */
pgstat_flush_io(nowait);
}
@@ -117,9 +119,9 @@ pgstat_wal_flush_cb(bool nowait)
return true;
#define WALSTAT_ACC(fld, var_to_add) \
- (stats_shmem->stats.fld += var_to_add.fld)
+ (stats_shmem->stats.wal_counters.fld += var_to_add.fld)
#define WALSTAT_ACC_INSTR_TIME(fld) \
- (stats_shmem->stats.fld += INSTR_TIME_GET_MICROSEC(PendingWalStats.fld))
+ (stats_shmem->stats.wal_counters.fld += INSTR_TIME_GET_MICROSEC(PendingWalStats.fld))
WALSTAT_ACC(wal_records, wal_usage_diff);
WALSTAT_ACC(wal_fpi, wal_usage_diff);
WALSTAT_ACC(wal_bytes, wal_usage_diff);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index cd3baed472..593f04dc2a 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1562,11 +1562,12 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
/*
* pg_stat_wal_build_tuple
*
- * Helper routine for pg_stat_get_wal() returning one tuple based on the contents
- * of wal_stats.
+ * Helper routine for pg_stat_get_wal() and pg_stat_get_backend_wal() returning
+ * one tuple based on the contents of wal_counters.
*/
static Datum
-pg_stat_wal_build_tuple(PgStat_WalStats wal_stats)
+pg_stat_wal_build_tuple(PgStat_WalCounters wal_counters,
+ TimestampTz stat_reset_timestamp)
{
#define PG_STAT_GET_WAL_COLS 9
TupleDesc tupdesc;
@@ -1598,26 +1599,26 @@ pg_stat_wal_build_tuple(PgStat_WalStats wal_stats)
BlessTupleDesc(tupdesc);
/* Fill values and NULLs */
- values[0] = Int64GetDatum(wal_stats.wal_records);
- values[1] = Int64GetDatum(wal_stats.wal_fpi);
+ values[0] = Int64GetDatum(wal_counters.wal_records);
+ values[1] = Int64GetDatum(wal_counters.wal_fpi);
/* Convert to numeric. */
- snprintf(buf, sizeof buf, UINT64_FORMAT, wal_stats.wal_bytes);
+ snprintf(buf, sizeof buf, UINT64_FORMAT, wal_counters.wal_bytes);
values[2] = DirectFunctionCall3(numeric_in,
CStringGetDatum(buf),
ObjectIdGetDatum(0),
Int32GetDatum(-1));
- values[3] = Int64GetDatum(wal_stats.wal_buffers_full);
- values[4] = Int64GetDatum(wal_stats.wal_write);
- values[5] = Int64GetDatum(wal_stats.wal_sync);
+ values[3] = Int64GetDatum(wal_counters.wal_buffers_full);
+ values[4] = Int64GetDatum(wal_counters.wal_write);
+ values[5] = Int64GetDatum(wal_counters.wal_sync);
/* Convert counters from microsec to millisec for display */
- values[6] = Float8GetDatum(((double) wal_stats.wal_write_time) / 1000.0);
- values[7] = Float8GetDatum(((double) wal_stats.wal_sync_time) / 1000.0);
+ values[6] = Float8GetDatum(((double) wal_counters.wal_write_time) / 1000.0);
+ values[7] = Float8GetDatum(((double) wal_counters.wal_sync_time) / 1000.0);
- if (wal_stats.stat_reset_timestamp != 0)
- values[8] = TimestampTzGetDatum(wal_stats.stat_reset_timestamp);
+ if (stat_reset_timestamp != 0)
+ values[8] = TimestampTzGetDatum(stat_reset_timestamp);
else
nulls[8] = true;
@@ -1625,6 +1626,55 @@ pg_stat_wal_build_tuple(PgStat_WalStats wal_stats)
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
}
+/*
+ * Returns WAL statistics for a backend with given PID.
+ */
+Datum
+pg_stat_get_backend_wal(PG_FUNCTION_ARGS)
+{
+ int pid;
+ PGPROC *proc;
+ ProcNumber procNumber;
+ PgStat_Backend *backend_stats;
+ PgStat_WalCounters bktype_stats;
+ PgBackendStatus *beentry;
+
+ pid = PG_GETARG_INT32(0);
+ proc = BackendPidGetProc(pid);
+
+ /*
+ * This could be an auxiliary process but these do not report backend
+ * statistics due to pgstat_tracks_backend_bktype(), so there is no need
+ * for an extra call to AuxiliaryPidGetProc().
+ */
+ if (!proc)
+ PG_RETURN_NULL();
+
+ procNumber = GetNumberFromPGProc(proc);
+
+ beentry = pgstat_get_beentry_by_proc_number(procNumber);
+ if (!beentry)
+ PG_RETURN_NULL();
+
+ backend_stats = pgstat_fetch_stat_backend(procNumber);
+ if (!backend_stats)
+ PG_RETURN_NULL();
+
+ /* if PID does not match, leave */
+ if (beentry->st_procpid != pid)
+ PG_RETURN_NULL();
+
+ /* backend may be gone, so recheck in case */
+ if (beentry->st_backendType == B_INVALID)
+ PG_RETURN_NULL();
+
+ bktype_stats = backend_stats->wal_stats;
+
+ /* save tuples with data from this PgStat_WalCounters */
+ return (pg_stat_wal_build_tuple(bktype_stats, backend_stats->stat_reset_timestamp));
+}
+
+
/*
* Returns statistics of WAL activity
*/
@@ -1636,7 +1686,7 @@ pg_stat_get_wal(PG_FUNCTION_ARGS)
/* Get statistics about WAL activity */
wal_stats = pgstat_fetch_stat_wal();
- return (pg_stat_wal_build_tuple(*wal_stats));
+ return (pg_stat_wal_build_tuple(wal_stats->wal_counters, wal_stats->stat_reset_timestamp));
}
/*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index b37e8a6f88..72a5dae4b1 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5929,6 +5929,13 @@
proargmodes => '{o,o,o,o,o,o,o,o,o}',
proargnames => '{wal_records,wal_fpi,wal_bytes,wal_buffers_full,wal_write,wal_sync,wal_write_time,wal_sync_time,stats_reset}',
prosrc => 'pg_stat_get_wal' },
+{ oid => '8037', descr => 'statistics: backend WAL activity',
+ proname => 'pg_stat_get_backend_wal', provolatile => 'v',
+ proparallel => 'r', prorettype => 'record', proargtypes => 'int4',
+ proallargtypes => '{int4,int8,int8,numeric,int8,int8,int8,float8,float8,timestamptz}',
+ proargmodes => '{i,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{backend_pid,wal_records,wal_fpi,wal_bytes,wal_buffers_full,wal_write,wal_sync,wal_write_time,wal_sync_time,stats_reset}',
+ prosrc => 'pg_stat_get_backend_wal' },
{ oid => '6248', descr => 'statistics: information about WAL prefetching',
proname => 'pg_stat_get_recovery_prefetch', prorows => '1', proretset => 't',
provolatile => 'v', prorettype => 'record', proargtypes => '',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 6631bd2d73..045877c5a8 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -378,12 +378,6 @@ typedef struct PgStat_IO
/* Backend statistics store the same amount of IO data as PGSTAT_KIND_IO */
typedef PgStat_PendingIO PgStat_BackendPendingIO;
-typedef struct PgStat_Backend
-{
- TimestampTz stat_reset_timestamp;
- PgStat_BktypeIO io_stats;
-} PgStat_Backend;
-
typedef struct PgStat_StatDBEntry
{
PgStat_Counter xact_commit;
@@ -495,7 +489,7 @@ typedef struct PgStat_StatTabEntry
PgStat_Counter autoanalyze_count;
} PgStat_StatTabEntry;
-typedef struct PgStat_WalStats
+typedef struct PgStat_WalCounters
{
PgStat_Counter wal_records;
PgStat_Counter wal_fpi;
@@ -505,6 +499,11 @@ typedef struct PgStat_WalStats
PgStat_Counter wal_sync;
PgStat_Counter wal_write_time;
PgStat_Counter wal_sync_time;
+} PgStat_WalCounters;
+
+typedef struct PgStat_WalStats
+{
+ PgStat_WalCounters wal_counters;
TimestampTz stat_reset_timestamp;
} PgStat_WalStats;
@@ -523,9 +522,21 @@ typedef struct PgStat_PendingWalStats
instr_time wal_sync_time;
} PgStat_PendingWalStats;
+typedef struct PgStat_Backend
+{
+ TimestampTz stat_reset_timestamp;
+ PgStat_BktypeIO io_stats;
+ PgStat_WalCounters wal_stats;
+} PgStat_Backend;
+
typedef struct PgStat_BackendPending
{
PgStat_BackendPendingIO pending_io;
+
+ /*
+ * We are not creating one member for PgStat_PendingWalStats. See the
+ * comment above the PendingBackendWalStats definition as to why.
+ */
} PgStat_BackendPending;
/*
@@ -857,5 +868,17 @@ extern PGDLLIMPORT SessionEndType pgStatSessionEndCause;
/* updated directly by backends and background processes */
extern PGDLLIMPORT PgStat_PendingWalStats PendingWalStats;
+/*
+ * Variables in pgstat_backend.c
+ */
+
+/* updated directly by backends and background processes */
+
+/*
+ * WAL pending statistics are incremented inside a critical section
+ * (see XLogWrite()), so we can't use pgstat_prep_pending_entry() and we rely on
+ * PendingBackendWalStats instead.
+ */
+extern PGDLLIMPORT PgStat_PendingWalStats PendingBackendWalStats;
#endif /* PGSTAT_H */
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 320cd5a842..06681b9102 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -614,6 +614,7 @@ extern void pgstat_archiver_snapshot_cb(void);
*/
extern void pgstat_backend_flush_io(bool nowait);
+extern void pgstat_backend_flush_wal(bool nowait);
extern PgStat_BackendPending *pgstat_prep_backend_pending(ProcNumber procnum);
extern bool pgstat_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait);
extern void pgstat_backend_reset_timestamp_cb(PgStatShared_Common *header, TimestampTz ts);
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index a0317b7208..cc01fdf274 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -832,6 +832,8 @@ SELECT sessions > :db_stat_sessions FROM pg_stat_database WHERE datname = (SELEC
SELECT num_requested AS rqst_ckpts_before FROM pg_stat_checkpointer \gset
-- Test pg_stat_wal (and make a temp table so our temp schema exists)
SELECT wal_bytes AS wal_bytes_before FROM pg_stat_wal \gset
+-- Test pg_stat_get_backend_wal (and make a temp table so our temp schema exists)
+SELECT wal_bytes AS backend_wal_bytes_before from pg_stat_get_backend_wal(pg_backend_pid()) \gset
CREATE TEMP TABLE test_stats_temp AS SELECT 17;
DROP TABLE test_stats_temp;
-- Checkpoint twice: The checkpointer reports stats after reporting completion
@@ -851,6 +853,18 @@ SELECT wal_bytes > :wal_bytes_before FROM pg_stat_wal;
t
(1 row)
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush
+--------------------------
+
+(1 row)
+
+SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid());
+ ?column?
+----------
+ t
+(1 row)
+
-- Test pg_stat_get_backend_idset() and some allied functions.
-- In particular, verify that their notion of backend ID matches
-- our temp schema index.
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 399c72bbcf..28fe0a1a7d 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -423,6 +423,9 @@ SELECT num_requested AS rqst_ckpts_before FROM pg_stat_checkpointer \gset
-- Test pg_stat_wal (and make a temp table so our temp schema exists)
SELECT wal_bytes AS wal_bytes_before FROM pg_stat_wal \gset
+-- Test pg_stat_get_backend_wal (and make a temp table so our temp schema exists)
+SELECT wal_bytes AS backend_wal_bytes_before from pg_stat_get_backend_wal(pg_backend_pid()) \gset
+
CREATE TEMP TABLE test_stats_temp AS SELECT 17;
DROP TABLE test_stats_temp;
@@ -435,6 +438,9 @@ CHECKPOINT;
SELECT num_requested > :rqst_ckpts_before FROM pg_stat_checkpointer;
SELECT wal_bytes > :wal_bytes_before FROM pg_stat_wal;
+SELECT pg_stat_force_next_flush();
+SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid());
+
-- Test pg_stat_get_backend_idset() and some allied functions.
-- In particular, verify that their notion of backend ID matches
-- our temp schema index.
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 1c8516fd63..fe616a5770 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2158,6 +2158,7 @@ PgStat_KindInfo
PgStat_LocalState
PgStat_PendingDroppedStatsItem
PgStat_PendingIO
+PgStat_PendingBackendWalStats
PgStat_PendingWalStats
PgStat_SLRUStats
PgStat_ShmemControl
@@ -2174,6 +2175,7 @@ PgStat_SubXactStatus
PgStat_TableCounts
PgStat_TableStatus
PgStat_TableXactStatus
+PgStat_WalCounters
PgStat_WalStats
PgXmlErrorContext
PgXmlStrictness
--
2.34.1
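The comment on `PendingBackendWalStats` in the pgstat.h hunk above explains why the backend WAL counters do not live in the pgstats pending entry. They are incremented inside a critical section (see XLogWrite()), where pgstat_prep_pending_entry() cannot be used, presumably because it may allocate memory. The following is a minimal, self-contained sketch of that pattern, not the patch's code. `PendingWal`, `SharedWal`, `tracks_backend` and the `count_*`/`flush_*` functions are hypothetical stand-ins: a statically allocated pending area is bumped with plain increments and later folded into shared counters and zeroed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for PgStat_PendingWalStats: plain counters. */
typedef struct PendingWal
{
    int64_t wal_write;
    int64_t wal_sync;
    int64_t wal_buffers_full;
} PendingWal;

/* Hypothetical stand-in for the shared-memory per-backend entry. */
typedef struct SharedWal
{
    int64_t wal_write;
    int64_t wal_sync;
    int64_t wal_buffers_full;
} SharedWal;

/* Static storage: never allocated, so it is safe to touch anywhere. */
static PendingWal pending_backend_wal;

/* Models pgstat_tracks_backend_bktype(): untracked processes are skipped. */
static bool tracks_backend = true;

/* Called from the "critical section": plain increments, no allocation. */
static void
count_wal_write(void)
{
    if (tracks_backend)
        pending_backend_wal.wal_write++;
}

static void
count_wal_sync(void)
{
    if (tracks_backend)
        pending_backend_wal.wal_sync++;
}

/* Called later, outside the critical section, with the entry lock held. */
static void
flush_pending_wal(SharedWal *shared)
{
    assert(shared != NULL);
    shared->wal_write += pending_backend_wal.wal_write;
    shared->wal_sync += pending_backend_wal.wal_sync;
    shared->wal_buffers_full += pending_backend_wal.wal_buffers_full;

    /* Reset using the size of the variable being cleared. */
    memset(&pending_backend_wal, 0, sizeof(pending_backend_wal));
}
```

The design point is that the hot path never depends on anything that can fail or allocate; all locking and shared-memory work is deferred to the flush.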
On Tue, Jan 07, 2025 at 08:48:51AM +0000, Bertrand Drouvot wrote:
Now that commit 9aea73fc61 added backend-level statistics to pgstats (and
per backend IO statistics), we can more easily add per backend statistics.
Please find attached a patch to implement $SUBJECT.
I've looked at v1-0002 and v1-0001.
+static void
+pgstat_flush_io_entry(PgStat_EntryRef *entry_ref, bool nowait, bool need_lock)
{
- PgStatShared_Backend *shbackendioent;
- PgStat_BackendPendingIO *pendingent;
+ PgStatShared_Backend *shbackendent;
+ PgStat_BackendPending *pendingent;
PgStat_BktypeIO *bktype_shstats;
+ PgStat_BackendPendingIO *pending_io;
- if (!pgstat_lock_entry(entry_ref, nowait))
- return false;
+ if (need_lock && !pgstat_lock_entry(entry_ref, nowait))
+ return;
The addition of need_lock at this level leads to a result that seems a
bit confusing, where pgstat_backend_flush_cb() passes "false" because
it locks the entry by itself as an effect of v1-0003 with the new area
for WAL. Wouldn't it be cleaner to do an extra pgstat_[un]lock_entry
dance in pgstat_backend_flush_io() instead? Another approach I can
think of that would be slightly cleaner to me is to pass a bits32 to a
single routine that would control if WAL stats, I/O stats or both
should be flushed, keeping pgstat_flush_backend() as name with an
extra argument to decide which parts of the stats should be flushed.
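The following is a rough sketch of the single-routine shape suggested above. It assumes hypothetical names (`FLUSH_IO`, `FLUSH_WAL`, `Entry`, `flush_backend()`) and a fake lock flag in place of pgstat_lock_entry(). The lock is taken once, and each flag bit selects which per-kind flush runs under it, so the per-kind helpers never need a `need_lock` argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t bits32;        /* mirrors PostgreSQL's bits32 typedef */

/* Hypothetical flag bits, one per kind of backend statistics. */
#define FLUSH_IO   (1u << 0)
#define FLUSH_WAL  (1u << 1)
#define FLUSH_ALL  (FLUSH_IO | FLUSH_WAL)

/* Hypothetical entry: a lock flag plus pending and shared counters. */
typedef struct Entry
{
    bool locked;
    int  pending_io, shared_io;
    int  pending_wal, shared_wal;
} Entry;

/* Stand-in for a nowait lock attempt: fails if already held. */
static bool
lock_entry(Entry *e)
{
    if (e->locked)
        return false;
    e->locked = true;
    return true;
}

static void
unlock_entry(Entry *e)
{
    e->locked = false;
}

/* Per-kind helpers assume the caller already holds the lock. */
static void
flush_io(Entry *e)
{
    assert(e->locked);
    e->shared_io += e->pending_io;
    e->pending_io = 0;
}

static void
flush_wal(Entry *e)
{
    assert(e->locked);
    e->shared_wal += e->pending_wal;
    e->pending_wal = 0;
}

/* Single entry point: lock once, then flush only the requested kinds. */
static bool
flush_backend(Entry *e, bits32 flags)
{
    if (!lock_entry(e))
        return false;
    if (flags & FLUSH_IO)
        flush_io(e);
    if (flags & FLUSH_WAL)
        flush_wal(e);
    unlock_entry(e);
    return true;
}
```

With this shape, adding another statistics kind later means one new flag bit and one helper, while the generic flush callback keeps passing an all-kinds mask.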
-PgStat_BackendPendingIO *
+PgStat_BackendPending *
This rename makes sense.
#define PG_STAT_GET_WAL_COLS 9
TupleDesc tupdesc;
Datum values[PG_STAT_GET_WAL_COLS] = {0};
bool nulls[PG_STAT_GET_WAL_COLS] = {0};
It feels unnatural to have a PG_STAT_GET_WAL_COLS while it would not
only relate to this function anymore.
--
Michael
Hi,
On Wed, Jan 08, 2025 at 03:21:26PM +0900, Michael Paquier wrote:
On Tue, Jan 07, 2025 at 08:48:51AM +0000, Bertrand Drouvot wrote:
Now that commit 9aea73fc61 added backend-level statistics to pgstats (and
per backend IO statistics), we can more easily add per backend statistics.
Please find attached a patch to implement $SUBJECT.
I've looked at v1-0002 and v1-0001.
Thanks for looking at it!
+static void
+pgstat_flush_io_entry(PgStat_EntryRef *entry_ref, bool nowait, bool need_lock)
{
- PgStatShared_Backend *shbackendioent;
- PgStat_BackendPendingIO *pendingent;
+ PgStatShared_Backend *shbackendent;
+ PgStat_BackendPending *pendingent;
PgStat_BktypeIO *bktype_shstats;
+ PgStat_BackendPendingIO *pending_io;
- if (!pgstat_lock_entry(entry_ref, nowait))
- return false;
+ if (need_lock && !pgstat_lock_entry(entry_ref, nowait))
+ return;
The addition of need_lock at this level leads to a result that seems a
bit confusing, where pgstat_backend_flush_cb() passes "false" because
it locks the entry by itself as an effect of v1-0003 with the new area
for WAL. Wouldn't it be cleaner to do an extra pgstat_[un]lock_entry
dance in pgstat_backend_flush_io() instead? Another approach I can
think of that would be slightly cleaner to me is to pass a bits32 to a
single routine that would control if WAL stats, I/O stats or both
should be flushed, keeping pgstat_flush_backend() as name with an
extra argument to decide which parts of the stats should be flushed.
Yeah, that's more elegant as it also means that the main callback will not change
(should we add even more stats in the future). Done that way in v2 attached.
-PgStat_BackendPendingIO *
+PgStat_BackendPending *
This rename makes sense.
#define PG_STAT_GET_WAL_COLS 9
TupleDesc tupdesc;
Datum values[PG_STAT_GET_WAL_COLS] = {0};
bool nulls[PG_STAT_GET_WAL_COLS] = {0};
It feels unnatural to have a PG_STAT_GET_WAL_COLS while it would not
only relate to this function anymore.
Has been renamed to PG_STAT_WAL_COLS in the attached.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v2-0001-Extract-logic-filling-pg_stat_get_wal-s-tuple-int.patch (text/x-diff; charset=us-ascii)
From 190422763353e87f5208e8f6d1aee823eb7860e5 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 6 Jan 2025 07:51:27 +0000
Subject: [PATCH v2 1/3] Extract logic filling pg_stat_get_wal()'s tuple into
its own routine
This commit adds pg_stat_wal_build_tuple(), a helper routine for
pg_stat_get_wal(), that fills its tuple based on the contents
of PgStat_WalStats. This will be used in a follow-up commit that uses
the same structures as pg_stat_wal for reporting, but for the PGSTAT_KIND_BACKEND
statistics kind.
---
src/backend/utils/adt/pgstatfuncs.c | 56 ++++++++++++++++++-----------
1 file changed, 36 insertions(+), 20 deletions(-)
100.0% src/backend/utils/adt/
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 3245f3a8d8..7309f06993 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1560,20 +1560,22 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
}
/*
- * Returns statistics of WAL activity
+ * pg_stat_wal_build_tuple
+ *
+ * Helper routine for pg_stat_get_wal() returning one tuple based on the contents
+ * of wal_stats.
*/
-Datum
-pg_stat_get_wal(PG_FUNCTION_ARGS)
+static Datum
+pg_stat_wal_build_tuple(PgStat_WalStats wal_stats)
{
-#define PG_STAT_GET_WAL_COLS 9
+#define PG_STAT_WAL_COLS 9
TupleDesc tupdesc;
- Datum values[PG_STAT_GET_WAL_COLS] = {0};
- bool nulls[PG_STAT_GET_WAL_COLS] = {0};
+ Datum values[PG_STAT_WAL_COLS] = {0};
+ bool nulls[PG_STAT_WAL_COLS] = {0};
char buf[256];
- PgStat_WalStats *wal_stats;
/* Initialise attributes information in the tuple descriptor */
- tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_WAL_COLS);
+ tupdesc = CreateTemplateTupleDesc(PG_STAT_WAL_COLS);
TupleDescInitEntry(tupdesc, (AttrNumber) 1, "wal_records",
INT8OID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 2, "wal_fpi",
@@ -1595,34 +1597,48 @@ pg_stat_get_wal(PG_FUNCTION_ARGS)
BlessTupleDesc(tupdesc);
- /* Get statistics about WAL activity */
- wal_stats = pgstat_fetch_stat_wal();
-
/* Fill values and NULLs */
- values[0] = Int64GetDatum(wal_stats->wal_records);
- values[1] = Int64GetDatum(wal_stats->wal_fpi);
+ values[0] = Int64GetDatum(wal_stats.wal_records);
+ values[1] = Int64GetDatum(wal_stats.wal_fpi);
/* Convert to numeric. */
- snprintf(buf, sizeof buf, UINT64_FORMAT, wal_stats->wal_bytes);
+ snprintf(buf, sizeof buf, UINT64_FORMAT, wal_stats.wal_bytes);
values[2] = DirectFunctionCall3(numeric_in,
CStringGetDatum(buf),
ObjectIdGetDatum(0),
Int32GetDatum(-1));
- values[3] = Int64GetDatum(wal_stats->wal_buffers_full);
- values[4] = Int64GetDatum(wal_stats->wal_write);
- values[5] = Int64GetDatum(wal_stats->wal_sync);
+ values[3] = Int64GetDatum(wal_stats.wal_buffers_full);
+ values[4] = Int64GetDatum(wal_stats.wal_write);
+ values[5] = Int64GetDatum(wal_stats.wal_sync);
/* Convert counters from microsec to millisec for display */
- values[6] = Float8GetDatum(((double) wal_stats->wal_write_time) / 1000.0);
- values[7] = Float8GetDatum(((double) wal_stats->wal_sync_time) / 1000.0);
+ values[6] = Float8GetDatum(((double) wal_stats.wal_write_time) / 1000.0);
+ values[7] = Float8GetDatum(((double) wal_stats.wal_sync_time) / 1000.0);
- values[8] = TimestampTzGetDatum(wal_stats->stat_reset_timestamp);
+ if (wal_stats.stat_reset_timestamp != 0)
+ values[8] = TimestampTzGetDatum(wal_stats.stat_reset_timestamp);
+ else
+ nulls[8] = true;
/* Returns the record as Datum */
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
}
+/*
+ * Returns statistics of WAL activity
+ */
+Datum
+pg_stat_get_wal(PG_FUNCTION_ARGS)
+{
+ PgStat_WalStats *wal_stats;
+
+ /* Get statistics about WAL activity */
+ wal_stats = pgstat_fetch_stat_wal();
+
+ return (pg_stat_wal_build_tuple(*wal_stats));
+}
+
/*
* Returns statistics of SLRU caches.
*/
--
2.34.1
v2-0002-PGSTAT_KIND_BACKEND-code-refactoring.patch (text/x-diff; charset=us-ascii)
From 8477a4931e7dc61a42d8c08ccc13d6f12d7e5707 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 6 Jan 2025 08:44:29 +0000
Subject: [PATCH v2 2/3] PGSTAT_KIND_BACKEND code refactoring
This commit refactors some code related to per backend statistics. It makes
the code more generic or more IO statistics focused as it will be used in a
follow-up commit that will introduce per backend WAL statistics.
---
src/backend/utils/activity/pgstat.c | 2 +-
src/backend/utils/activity/pgstat_backend.c | 88 +++++++++++++-------
src/backend/utils/activity/pgstat_io.c | 8 +-
src/backend/utils/activity/pgstat_relation.c | 4 +-
src/backend/utils/adt/pgstatfuncs.c | 2 +-
src/include/pgstat.h | 6 +-
src/include/utils/pgstat_internal.h | 5 +-
src/tools/pgindent/typedefs.list | 1 +
8 files changed, 76 insertions(+), 40 deletions(-)
85.6% src/backend/utils/activity/
7.2% src/include/utils/
4.2% src/include/
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index 16a03b8ce1..34520535d5 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -370,7 +370,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.shared_size = sizeof(PgStatShared_Backend),
.shared_data_off = offsetof(PgStatShared_Backend, stats),
.shared_data_len = sizeof(((PgStatShared_Backend *) 0)->stats),
- .pending_size = sizeof(PgStat_BackendPendingIO),
+ .pending_size = sizeof(PgStat_BackendPending),
.flush_pending_cb = pgstat_backend_flush_cb,
.reset_timestamp_cb = pgstat_backend_reset_timestamp_cb,
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
index 1f91bfef0a..0ae507160d 100644
--- a/src/backend/utils/activity/pgstat_backend.c
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -24,6 +24,12 @@
#include "utils/pgstat_internal.h"
+/* flag bits for different types of statistics to flush */
+#define PGSTAT_FLUSH_IO (1 << 0) /* Flush I/O statistics */
+#define PGSTAT_FLUSH_ALL (PGSTAT_FLUSH_IO)
+
+static void pgstat_flush_io_entry(PgStat_EntryRef *entry_ref);
+
/*
* Returns statistics of a backend by proc number.
*/
@@ -39,23 +45,49 @@ pgstat_fetch_stat_backend(ProcNumber procNumber)
}
/*
- * Flush out locally pending backend statistics
- *
- * If no stats have been recorded, this function returns false.
+ * Main function to flush backend statistics.
+ * The "stats_to_flush" parameter controls which statistics to flush.
*/
-bool
-pgstat_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
+static bool
+pgstat_flush_backend(PgStat_EntryRef *entry_ref, bool nowait,
+ bits32 stats_to_flush)
{
- PgStatShared_Backend *shbackendioent;
- PgStat_BackendPendingIO *pendingent;
- PgStat_BktypeIO *bktype_shstats;
+ if (!pgstat_tracks_backend_bktype(MyBackendType))
+ return false;
+
+ /* Get our own entry_ref if not provided */
+ if (!entry_ref)
+ entry_ref = pgstat_get_entry_ref(PGSTAT_KIND_BACKEND, InvalidOid,
+ MyProcNumber, false, NULL);
if (!pgstat_lock_entry(entry_ref, nowait))
return false;
- shbackendioent = (PgStatShared_Backend *) entry_ref->shared_stats;
- bktype_shstats = &shbackendioent->stats.stats;
- pendingent = (PgStat_BackendPendingIO *) entry_ref->pending;
+ /* Flush requested statistics */
+ if (stats_to_flush & PGSTAT_FLUSH_IO)
+ pgstat_flush_io_entry(entry_ref);
+
+ pgstat_unlock_entry(entry_ref);
+
+ return true;
+}
+
+/*
+ * Flush out locally pending backend IO statistics.
+ * Locking is managed by the caller.
+ */
+static void
+pgstat_flush_io_entry(PgStat_EntryRef *entry_ref)
+{
+ PgStatShared_Backend *shbackendent;
+ PgStat_BackendPending *pendingent;
+ PgStat_BktypeIO *bktype_shstats;
+ PgStat_BackendPendingIO *pending_io;
+
+ shbackendent = (PgStatShared_Backend *) entry_ref->shared_stats;
+ pendingent = (PgStat_BackendPending *) entry_ref->pending;
+ bktype_shstats = &shbackendent->stats.io_stats;
+ pending_io = &pendingent->pending_io;
for (int io_object = 0; io_object < IOOBJECT_NUM_TYPES; io_object++)
{
@@ -66,35 +98,35 @@ pgstat_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
instr_time time;
bktype_shstats->counts[io_object][io_context][io_op] +=
- pendingent->counts[io_object][io_context][io_op];
+ pending_io->counts[io_object][io_context][io_op];
- time = pendingent->pending_times[io_object][io_context][io_op];
+ time = pending_io->pending_times[io_object][io_context][io_op];
bktype_shstats->times[io_object][io_context][io_op] +=
INSTR_TIME_GET_MICROSEC(time);
}
}
}
+}
- pgstat_unlock_entry(entry_ref);
-
- return true;
+/*
+ * Flush out locally pending backend statistics
+ *
+ * If no stats have been recorded, this function returns false.
+ */
+bool
+pgstat_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
+{
+ return pgstat_flush_backend(entry_ref, nowait, PGSTAT_FLUSH_ALL);
}
/*
- * Simpler wrapper of pgstat_backend_flush_cb()
+ * Convenience wrapper to flush I/O statistics.
*/
void
-pgstat_flush_backend(bool nowait)
+pgstat_backend_flush_io(bool nowait)
{
- PgStat_EntryRef *entry_ref;
-
- if (!pgstat_tracks_backend_bktype(MyBackendType))
- return;
-
- entry_ref = pgstat_get_entry_ref(PGSTAT_KIND_BACKEND, InvalidOid,
- MyProcNumber, false, NULL);
- (void) pgstat_backend_flush_cb(entry_ref, nowait);
+ pgstat_flush_backend(NULL, nowait, PGSTAT_FLUSH_IO);
}
/*
@@ -119,9 +151,9 @@ pgstat_create_backend(ProcNumber procnum)
}
/*
- * Find or create a local PgStat_BackendPendingIO entry for proc number.
+ * Find or create a local PgStat_BackendPending entry for proc number.
*/
-PgStat_BackendPendingIO *
+PgStat_BackendPending *
pgstat_prep_backend_pending(ProcNumber procnum)
{
PgStat_EntryRef *entry_ref;
diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index f9a1f91dba..a7445995d3 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -81,10 +81,10 @@ pgstat_count_io_op_n(IOObject io_object, IOContext io_context, IOOp io_op, uint3
if (pgstat_tracks_backend_bktype(MyBackendType))
{
- PgStat_PendingIO *entry_ref;
+ PgStat_BackendPending *entry_ref;
entry_ref = pgstat_prep_backend_pending(MyProcNumber);
- entry_ref->counts[io_object][io_context][io_op] += cnt;
+ entry_ref->pending_io.counts[io_object][io_context][io_op] += cnt;
}
PendingIOStats.counts[io_object][io_context][io_op] += cnt;
@@ -151,10 +151,10 @@ pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op,
if (pgstat_tracks_backend_bktype(MyBackendType))
{
- PgStat_PendingIO *entry_ref;
+ PgStat_BackendPending *entry_ref;
entry_ref = pgstat_prep_backend_pending(MyProcNumber);
- INSTR_TIME_ADD(entry_ref->pending_times[io_object][io_context][io_op],
+ INSTR_TIME_ADD(entry_ref->pending_io.pending_times[io_object][io_context][io_op],
io_time);
}
}
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index 2cc304f881..6092826479 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -264,7 +264,7 @@ pgstat_report_vacuum(Oid tableoid, bool shared,
* VACUUM command has processed all tables and committed.
*/
pgstat_flush_io(false);
- pgstat_flush_backend(false);
+ pgstat_backend_flush_io(false);
}
/*
@@ -351,7 +351,7 @@ pgstat_report_analyze(Relation rel,
/* see pgstat_report_vacuum() */
pgstat_flush_io(false);
- pgstat_flush_backend(false);
+ pgstat_backend_flush_io(false);
}
/*
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 7309f06993..8a4340e977 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1544,7 +1544,7 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
if (bktype == B_INVALID)
return (Datum) 0;
- bktype_stats = &backend_stats->stats;
+ bktype_stats = &backend_stats->io_stats;
/*
* In Assert builds, we can afford an extra loop through all of the
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 0d8427f27d..6631bd2d73 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -381,7 +381,7 @@ typedef PgStat_PendingIO PgStat_BackendPendingIO;
typedef struct PgStat_Backend
{
TimestampTz stat_reset_timestamp;
- PgStat_BktypeIO stats;
+ PgStat_BktypeIO io_stats;
} PgStat_Backend;
typedef struct PgStat_StatDBEntry
@@ -523,6 +523,10 @@ typedef struct PgStat_PendingWalStats
instr_time wal_sync_time;
} PgStat_PendingWalStats;
+typedef struct PgStat_BackendPending
+{
+ PgStat_BackendPendingIO pending_io;
+} PgStat_BackendPending;
/*
* Functions in pgstat.c
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 52eb008710..320cd5a842 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -613,9 +613,8 @@ extern void pgstat_archiver_snapshot_cb(void);
* Functions in pgstat_backend.c
*/
-extern void pgstat_flush_backend(bool nowait);
-
-extern PgStat_BackendPendingIO *pgstat_prep_backend_pending(ProcNumber procnum);
+extern void pgstat_backend_flush_io(bool nowait);
+extern PgStat_BackendPending *pgstat_prep_backend_pending(ProcNumber procnum);
extern bool pgstat_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait);
extern void pgstat_backend_reset_timestamp_cb(PgStatShared_Common *header, TimestampTz ts);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 9f83ecf181..f15526236a 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2140,6 +2140,7 @@ PgStatShared_Subscription
PgStatShared_Wal
PgStat_ArchiverStats
PgStat_Backend
+PgStat_BackendPending
PgStat_BackendPendingIO
PgStat_BackendSubEntry
PgStat_BgWriterStats
--
2.34.1
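The v2-0003 patch that follows does not count WAL records, FPIs and bytes directly for each backend. Instead, it snapshots the process-wide cumulative pgWalUsage counters at each flush (prevBackendWalUsage) and adds only the difference since the previous flush to the backend entry. The following is a minimal sketch of that snapshot-and-diff technique. The types `Usage` and `Shared` are hypothetical stand-ins for WalUsage and PgStat_WalCounters, and plain subtraction replaces WalUsageAccumDiff().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for WalUsage: only ever increases. */
typedef struct Usage
{
    int64_t  records;
    int64_t  fpi;
    uint64_t bytes;
} Usage;

/* Hypothetical stand-in for the shared per-backend WAL counters. */
typedef struct Shared
{
    int64_t  records;
    int64_t  fpi;
    uint64_t bytes;
} Shared;

static Usage current_usage;     /* bumped by every WAL insertion */
static Usage prev_usage;        /* snapshot taken at the previous flush */

/* Anything new since the last snapshot? */
static bool
have_pending(void)
{
    return current_usage.records != prev_usage.records;
}

/* Add only what happened since the last flush, then advance the snapshot. */
static void
flush_usage(Shared *shared)
{
    if (!have_pending())
        return;
    shared->records += current_usage.records - prev_usage.records;
    shared->fpi += current_usage.fpi - prev_usage.fpi;
    shared->bytes += current_usage.bytes - prev_usage.bytes;
    prev_usage = current_usage;
}
```

The patch's pgstat_backend_wal_have_pending() also checks the pending write and sync counts, because a backend can write or sync WAL without generating any records; the sketch keeps only the record check for brevity.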
v2-0003-per-backend-WAL-statistics.patch (text/x-diff; charset=us-ascii)
From 158dd7d6da4ee39a1743c44ca6385aba33a826e2 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 6 Jan 2025 10:00:00 +0000
Subject: [PATCH v2 3/3] per backend WAL statistics
Now that commit 9aea73fc61 added backend-level statistics to pgstats (and
per backend IO statistics) we can more easily add per backend statistics.
This commit adds per backend WAL statistics using the same layer as pg_stat_wal,
except that it is now possible to know how much WAL activity is happening in each
backend rather than an overall aggregate of all the activity. A function called
pg_stat_get_backend_wal() is added to access this data depending on the
PID of a backend.
The same limitation as in 9aea73fc61 persists, meaning that Auxiliary processes
are not included in this set of statistics.
XXX: bump catalog version
---
doc/src/sgml/config.sgml | 4 +-
doc/src/sgml/monitoring.sgml | 19 ++++
src/backend/access/transam/xlog.c | 36 +++++++-
src/backend/utils/activity/pgstat_backend.c | 99 ++++++++++++++++++++-
src/backend/utils/activity/pgstat_wal.c | 6 +-
src/backend/utils/adt/pgstatfuncs.c | 78 +++++++++++++---
src/include/catalog/pg_proc.dat | 7 ++
src/include/pgstat.h | 37 ++++++--
src/include/utils/pgstat_internal.h | 1 +
src/test/regress/expected/stats.out | 14 +++
src/test/regress/sql/stats.sql | 6 ++
src/tools/pgindent/typedefs.list | 2 +
12 files changed, 279 insertions(+), 30 deletions(-)
9.9% doc/src/sgml/
8.6% src/backend/access/transam/
34.4% src/backend/utils/activity/
27.8% src/backend/utils/adt/
4.9% src/include/catalog/
6.6% src/include/
3.9% src/test/regress/expected/
3.0% src/test/regress/sql/
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 8683f0bdf5..8e8478dcb1 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8433,7 +8433,9 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
measure the overhead of timing on your system.
I/O timing information is
displayed in <link linkend="monitoring-pg-stat-wal-view">
- <structname>pg_stat_wal</structname></link>.
+ <structname>pg_stat_wal</structname></link> and in the output of the
+ <link linkend="pg-stat-get-backend-wal">
+ <function>pg_stat_get_backend_wal()</function></link> function.
Only superusers and users with the appropriate <literal>SET</literal>
privilege can change this setting.
</para>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index d0d176cc54..84a2d09b76 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4811,6 +4811,25 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para></entry>
</row>
+ <row>
+ <entry id="pg-stat-get-backend-wal" role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_stat_get_backend_wal</primary>
+ </indexterm>
+ <function>pg_stat_get_backend_wal</function> ( <type>integer</type> )
+ <returnvalue>record</returnvalue>
+ </para>
+ <para>
+ Returns WAL statistics about the backend with the specified
+ process ID. The output fields are exactly the same as the ones in the
+ <structname>pg_stat_wal</structname> view.
+ </para>
+ <para>
+ The function does not return WAL statistics for the checkpointer,
+ the background writer, the startup process and the autovacuum launcher.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index bf3dbda901..0ba9fcb277 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2058,6 +2058,10 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli, bool opportunistic)
XLogWrite(WriteRqst, tli, false);
LWLockRelease(WALWriteLock);
PendingWalStats.wal_buffers_full++;
+
+ if (pgstat_tracks_backend_bktype(MyBackendType))
+ PendingBackendWalStats.wal_buffers_full++;
+
TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY_DONE();
}
/* Re-acquire WALBufMappingLock and retry */
@@ -2426,11 +2430,14 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible)
Size nleft;
ssize_t written;
instr_time start;
+ instr_time end;
/* OK to write the page(s) */
from = XLogCtl->pages + startidx * (Size) XLOG_BLCKSZ;
nbytes = npages * (Size) XLOG_BLCKSZ;
nleft = nbytes;
+ /* keep compiler quiet */
+ INSTR_TIME_SET_ZERO(end);
do
{
errno = 0;
@@ -2451,14 +2458,26 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible)
*/
if (track_wal_io_timing)
{
- instr_time end;
-
INSTR_TIME_SET_CURRENT(end);
INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_write_time, end, start);
}
PendingWalStats.wal_write++;
+ if (pgstat_tracks_backend_bktype(MyBackendType))
+ {
+ /*
+ * We are inside a critical section, so we can't use
+ * pgstat_prep_pending_entry() and we rely on
+ * PendingBackendWalStats instead.
+ */
+ PendingBackendWalStats.wal_write++;
+
+ if (track_wal_io_timing)
+ INSTR_TIME_ACCUM_DIFF(PendingBackendWalStats.wal_write_time,
+ end, start);
+ }
+
if (written <= 0)
{
char xlogfname[MAXFNAMELEN];
@@ -8684,8 +8703,11 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
{
char *msg = NULL;
instr_time start;
+ instr_time end;
Assert(tli != 0);
+ /* keep compiler quiet */
+ INSTR_TIME_SET_ZERO(end);
/*
* Quick exit if fsync is disabled or write() has already synced the WAL
@@ -8751,13 +8773,19 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
*/
if (track_wal_io_timing)
{
- instr_time end;
-
INSTR_TIME_SET_CURRENT(end);
INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_sync_time, end, start);
}
PendingWalStats.wal_sync++;
+
+ if (pgstat_tracks_backend_bktype(MyBackendType))
+ {
+ PendingBackendWalStats.wal_sync++;
+
+ if (track_wal_io_timing)
+ INSTR_TIME_ACCUM_DIFF(PendingBackendWalStats.wal_sync_time, end, start);
+ }
}
/*
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
index 0ae507160d..58c41edea4 100644
--- a/src/backend/utils/activity/pgstat_backend.c
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -26,9 +26,21 @@
/* flag bits for different types of statistics to flush */
#define PGSTAT_FLUSH_IO (1 << 0) /* Flush I/O statistics */
-#define PGSTAT_FLUSH_ALL (PGSTAT_FLUSH_IO)
+#define PGSTAT_FLUSH_WAL (1 << 1) /* Flush WAL statistics */
+#define PGSTAT_FLUSH_ALL (PGSTAT_FLUSH_IO | PGSTAT_FLUSH_WAL)
+
+PgStat_PendingWalStats PendingBackendWalStats = {0};
+
+/*
+ * WAL usage counters saved from pgWalUsage at the previous call to
+ * pgstat_flush_wal_entry(). This is used to calculate how much WAL usage
+ * happens between two flushes of the backend WAL statistics, by
+ * subtracting the previous counters from the current ones.
+ */
+static WalUsage prevBackendWalUsage;
static void pgstat_flush_io_entry(PgStat_EntryRef *entry_ref);
+static void pgstat_flush_wal_entry(PgStat_EntryRef *entry_ref);
/*
* Returns statistics of a backend by proc number.
@@ -67,6 +79,9 @@ pgstat_flush_backend(PgStat_EntryRef *entry_ref, bool nowait,
if (stats_to_flush & PGSTAT_FLUSH_IO)
pgstat_flush_io_entry(entry_ref);
+ if (stats_to_flush & PGSTAT_FLUSH_WAL)
+ pgstat_flush_wal_entry(entry_ref);
+
pgstat_unlock_entry(entry_ref);
return true;
@@ -109,6 +124,75 @@ pgstat_flush_io_entry(PgStat_EntryRef *entry_ref)
}
}
+/*
+ * To determine whether any WAL activity has occurred since last time, not
+ * only the number of generated WAL records but also the numbers of WAL
+ * writes and syncs need to be checked, because even a transaction that
+ * generates no WAL records can write or sync WAL data when flushing the
+ * data pages.
+ */
+static bool
+pgstat_backend_wal_have_pending(void)
+{
+ return pgWalUsage.wal_records != prevBackendWalUsage.wal_records ||
+ PendingBackendWalStats.wal_write != 0 ||
+ PendingBackendWalStats.wal_sync != 0;
+}
+
+/*
+ * Flush out locally pending backend WAL statistics.
+ * Locking is managed by the caller.
+ */
+static void
+pgstat_flush_wal_entry(PgStat_EntryRef *entry_ref)
+{
+ PgStatShared_Backend *shbackendent;
+ PgStat_WalCounters *bktype_shstats;
+ WalUsage wal_usage_diff = {0};
+
+ /*
+ * This function can be called even if nothing at all has happened. Avoid
+ * taking lock for nothing in that case.
+ */
+ if (!pgstat_backend_wal_have_pending())
+ return;
+
+ shbackendent = (PgStatShared_Backend *) entry_ref->shared_stats;
+ bktype_shstats = &shbackendent->stats.wal_stats;
+
+ /*
+ * We don't update the WAL usage portion of the local WalStats elsewhere.
+ * Calculate how much WAL usage counters were increased by subtracting the
+ * previous counters from the current ones.
+ */
+ WalUsageAccumDiff(&wal_usage_diff, &pgWalUsage, &prevBackendWalUsage);
+
+#define WALSTAT_ACC(fld, var_to_add) \
+ (bktype_shstats->fld += var_to_add.fld)
+#define WALSTAT_ACC_INSTR_TIME(fld) \
+ (bktype_shstats->fld += INSTR_TIME_GET_MICROSEC(PendingBackendWalStats.fld))
+ WALSTAT_ACC(wal_buffers_full, PendingBackendWalStats);
+ WALSTAT_ACC(wal_write, PendingBackendWalStats);
+ WALSTAT_ACC(wal_sync, PendingBackendWalStats);
+ WALSTAT_ACC(wal_records, wal_usage_diff);
+ WALSTAT_ACC(wal_fpi, wal_usage_diff);
+ WALSTAT_ACC(wal_bytes, wal_usage_diff);
+ WALSTAT_ACC_INSTR_TIME(wal_write_time);
+ WALSTAT_ACC_INSTR_TIME(wal_sync_time);
+#undef WALSTAT_ACC_INSTR_TIME
+#undef WALSTAT_ACC
+
+ /*
+ * Save the current counters for the subsequent calculation of WAL usage.
+ */
+ prevBackendWalUsage = pgWalUsage;
+
+ /*
+ * Clear out the statistics buffer, so it can be re-used.
+ */
+ MemSet(&PendingBackendWalStats, 0, sizeof(PendingBackendWalStats));
+}
+
/*
* Flush out locally pending backend statistics
*
@@ -129,6 +213,15 @@ pgstat_backend_flush_io(bool nowait)
pgstat_flush_backend(NULL, nowait, PGSTAT_FLUSH_IO);
}
+/*
+ * Convenience wrapper to flush WAL statistics.
+ */
+void
+pgstat_backend_flush_wal(bool nowait)
+{
+ pgstat_flush_backend(NULL, nowait, PGSTAT_FLUSH_WAL);
+}
+
/*
* Create backend statistics entry for proc number.
*/
@@ -148,10 +241,12 @@ pgstat_create_backend(ProcNumber procnum)
* e.g. if we previously used this proc number.
*/
memset(&shstatent->stats, 0, sizeof(shstatent->stats));
+
+ prevBackendWalUsage = pgWalUsage;
}
/*
- * Find or create a local PgStat_BackendPending entry for proc number.
+ * Find or create a local PgStat_BackendPendingIO entry for proc number.
*/
PgStat_BackendPending *
pgstat_prep_backend_pending(ProcNumber procnum)
diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c
index 18fa6b2936..160ce99a4c 100644
--- a/src/backend/utils/activity/pgstat_wal.c
+++ b/src/backend/utils/activity/pgstat_wal.c
@@ -55,6 +55,8 @@ pgstat_report_wal(bool force)
/* flush wal stats */
pgstat_flush_wal(nowait);
+ pgstat_backend_flush_wal(nowait);
+
/* flush IO stats */
pgstat_flush_io(nowait);
}
@@ -117,9 +119,9 @@ pgstat_wal_flush_cb(bool nowait)
return true;
#define WALSTAT_ACC(fld, var_to_add) \
- (stats_shmem->stats.fld += var_to_add.fld)
+ (stats_shmem->stats.wal_counters.fld += var_to_add.fld)
#define WALSTAT_ACC_INSTR_TIME(fld) \
- (stats_shmem->stats.fld += INSTR_TIME_GET_MICROSEC(PendingWalStats.fld))
+ (stats_shmem->stats.wal_counters.fld += INSTR_TIME_GET_MICROSEC(PendingWalStats.fld))
WALSTAT_ACC(wal_records, wal_usage_diff);
WALSTAT_ACC(wal_fpi, wal_usage_diff);
WALSTAT_ACC(wal_bytes, wal_usage_diff);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 8a4340e977..8d251dfcca 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1562,11 +1562,12 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
/*
* pg_stat_wal_build_tuple
*
- * Helper routine for pg_stat_get_wal() returning one tuple based on the contents
- * of wal_stats.
+ * Helper routine for pg_stat_get_wal() and pg_stat_get_backend_wal() returning
+ * one tuple based on the contents of wal_counters.
*/
static Datum
-pg_stat_wal_build_tuple(PgStat_WalStats wal_stats)
+pg_stat_wal_build_tuple(PgStat_WalCounters wal_counters,
+ TimestampTz stat_reset_timestamp)
{
#define PG_STAT_WAL_COLS 9
TupleDesc tupdesc;
@@ -1598,26 +1599,26 @@ pg_stat_wal_build_tuple(PgStat_WalStats wal_stats)
BlessTupleDesc(tupdesc);
/* Fill values and NULLs */
- values[0] = Int64GetDatum(wal_stats.wal_records);
- values[1] = Int64GetDatum(wal_stats.wal_fpi);
+ values[0] = Int64GetDatum(wal_counters.wal_records);
+ values[1] = Int64GetDatum(wal_counters.wal_fpi);
/* Convert to numeric. */
- snprintf(buf, sizeof buf, UINT64_FORMAT, wal_stats.wal_bytes);
+ snprintf(buf, sizeof buf, UINT64_FORMAT, wal_counters.wal_bytes);
values[2] = DirectFunctionCall3(numeric_in,
CStringGetDatum(buf),
ObjectIdGetDatum(0),
Int32GetDatum(-1));
- values[3] = Int64GetDatum(wal_stats.wal_buffers_full);
- values[4] = Int64GetDatum(wal_stats.wal_write);
- values[5] = Int64GetDatum(wal_stats.wal_sync);
+ values[3] = Int64GetDatum(wal_counters.wal_buffers_full);
+ values[4] = Int64GetDatum(wal_counters.wal_write);
+ values[5] = Int64GetDatum(wal_counters.wal_sync);
/* Convert counters from microsec to millisec for display */
- values[6] = Float8GetDatum(((double) wal_stats.wal_write_time) / 1000.0);
- values[7] = Float8GetDatum(((double) wal_stats.wal_sync_time) / 1000.0);
+ values[6] = Float8GetDatum(((double) wal_counters.wal_write_time) / 1000.0);
+ values[7] = Float8GetDatum(((double) wal_counters.wal_sync_time) / 1000.0);
- if (wal_stats.stat_reset_timestamp != 0)
- values[8] = TimestampTzGetDatum(wal_stats.stat_reset_timestamp);
+ if (stat_reset_timestamp != 0)
+ values[8] = TimestampTzGetDatum(stat_reset_timestamp);
else
nulls[8] = true;
@@ -1625,6 +1626,55 @@ pg_stat_wal_build_tuple(PgStat_WalStats wal_stats)
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
}
+/*
+ * Returns WAL statistics for a backend with given PID.
+ */
+Datum
+pg_stat_get_backend_wal(PG_FUNCTION_ARGS)
+{
+ int pid;
+ PGPROC *proc;
+ ProcNumber procNumber;
+ PgStat_Backend *backend_stats;
+ PgStat_WalCounters bktype_stats;
+ PgBackendStatus *beentry;
+
+ pid = PG_GETARG_INT32(0);
+ proc = BackendPidGetProc(pid);
+
+ /*
+ * This could be an auxiliary process but these do not report backend
+ * statistics due to pgstat_tracks_backend_bktype(), so there is no need
+ * for an extra call to AuxiliaryPidGetProc().
+ */
+ if (!proc)
+ PG_RETURN_NULL();
+
+ procNumber = GetNumberFromPGProc(proc);
+
+ beentry = pgstat_get_beentry_by_proc_number(procNumber);
+ if (!beentry)
+ PG_RETURN_NULL();
+
+ backend_stats = pgstat_fetch_stat_backend(procNumber);
+ if (!backend_stats)
+ PG_RETURN_NULL();
+
+ /* if PID does not match, leave */
+ if (beentry->st_procpid != pid)
+ PG_RETURN_NULL();
+
+ /* backend may be gone, so recheck in case */
+ if (beentry->st_backendType == B_INVALID)
+ PG_RETURN_NULL();
+
+ bktype_stats = backend_stats->wal_stats;
+
+ /* build the tuple from this backend's PgStat_WalCounters */
+ return (pg_stat_wal_build_tuple(bktype_stats, backend_stats->stat_reset_timestamp));
+}
+
+
/*
* Returns statistics of WAL activity
*/
@@ -1636,7 +1686,7 @@ pg_stat_get_wal(PG_FUNCTION_ARGS)
/* Get statistics about WAL activity */
wal_stats = pgstat_fetch_stat_wal();
- return (pg_stat_wal_build_tuple(*wal_stats));
+ return (pg_stat_wal_build_tuple(wal_stats->wal_counters, wal_stats->stat_reset_timestamp));
}
/*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index b37e8a6f88..72a5dae4b1 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5929,6 +5929,13 @@
proargmodes => '{o,o,o,o,o,o,o,o,o}',
proargnames => '{wal_records,wal_fpi,wal_bytes,wal_buffers_full,wal_write,wal_sync,wal_write_time,wal_sync_time,stats_reset}',
prosrc => 'pg_stat_get_wal' },
+{ oid => '8037', descr => 'statistics: backend WAL activity',
+ proname => 'pg_stat_get_backend_wal', provolatile => 'v',
+ proparallel => 'r', prorettype => 'record', proargtypes => 'int4',
+ proallargtypes => '{int4,int8,int8,numeric,int8,int8,int8,float8,float8,timestamptz}',
+ proargmodes => '{i,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{backend_pid,wal_records,wal_fpi,wal_bytes,wal_buffers_full,wal_write,wal_sync,wal_write_time,wal_sync_time,stats_reset}',
+ prosrc => 'pg_stat_get_backend_wal' },
{ oid => '6248', descr => 'statistics: information about WAL prefetching',
proname => 'pg_stat_get_recovery_prefetch', prorows => '1', proretset => 't',
provolatile => 'v', prorettype => 'record', proargtypes => '',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 6631bd2d73..045877c5a8 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -378,12 +378,6 @@ typedef struct PgStat_IO
/* Backend statistics store the same amount of IO data as PGSTAT_KIND_IO */
typedef PgStat_PendingIO PgStat_BackendPendingIO;
-typedef struct PgStat_Backend
-{
- TimestampTz stat_reset_timestamp;
- PgStat_BktypeIO io_stats;
-} PgStat_Backend;
-
typedef struct PgStat_StatDBEntry
{
PgStat_Counter xact_commit;
@@ -495,7 +489,7 @@ typedef struct PgStat_StatTabEntry
PgStat_Counter autoanalyze_count;
} PgStat_StatTabEntry;
-typedef struct PgStat_WalStats
+typedef struct PgStat_WalCounters
{
PgStat_Counter wal_records;
PgStat_Counter wal_fpi;
@@ -505,6 +499,11 @@ typedef struct PgStat_WalStats
PgStat_Counter wal_sync;
PgStat_Counter wal_write_time;
PgStat_Counter wal_sync_time;
+} PgStat_WalCounters;
+
+typedef struct PgStat_WalStats
+{
+ PgStat_WalCounters wal_counters;
TimestampTz stat_reset_timestamp;
} PgStat_WalStats;
@@ -523,9 +522,21 @@ typedef struct PgStat_PendingWalStats
instr_time wal_sync_time;
} PgStat_PendingWalStats;
+typedef struct PgStat_Backend
+{
+ TimestampTz stat_reset_timestamp;
+ PgStat_BktypeIO io_stats;
+ PgStat_WalCounters wal_stats;
+} PgStat_Backend;
+
typedef struct PgStat_BackendPending
{
PgStat_BackendPendingIO pending_io;
+
+ /*
+ * We are not creating one member for PgStat_PendingWalStats. See the
+ * comment above the PendingBackendWalStats definition as to why.
+ */
} PgStat_BackendPending;
/*
@@ -857,5 +868,17 @@ extern PGDLLIMPORT SessionEndType pgStatSessionEndCause;
/* updated directly by backends and background processes */
extern PGDLLIMPORT PgStat_PendingWalStats PendingWalStats;
+/*
+ * Variables in pgstat_backend.c
+ */
+
+/* updated directly by backends and background processes */
+
+/*
+ * WAL pending statistics are incremented inside a critical section
+ * (see XLogWrite()), so we can't use pgstat_prep_pending_entry() and we rely on
+ * PendingBackendWalStats instead.
+ */
+extern PGDLLIMPORT PgStat_PendingWalStats PendingBackendWalStats;
#endif /* PGSTAT_H */
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 320cd5a842..06681b9102 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -614,6 +614,7 @@ extern void pgstat_archiver_snapshot_cb(void);
*/
extern void pgstat_backend_flush_io(bool nowait);
+extern void pgstat_backend_flush_wal(bool nowait);
extern PgStat_BackendPending *pgstat_prep_backend_pending(ProcNumber procnum);
extern bool pgstat_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait);
extern void pgstat_backend_reset_timestamp_cb(PgStatShared_Common *header, TimestampTz ts);
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index a0317b7208..cc01fdf274 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -832,6 +832,8 @@ SELECT sessions > :db_stat_sessions FROM pg_stat_database WHERE datname = (SELEC
SELECT num_requested AS rqst_ckpts_before FROM pg_stat_checkpointer \gset
-- Test pg_stat_wal (and make a temp table so our temp schema exists)
SELECT wal_bytes AS wal_bytes_before FROM pg_stat_wal \gset
+-- Test pg_stat_get_backend_wal (and make a temp table so our temp schema exists)
+SELECT wal_bytes AS backend_wal_bytes_before from pg_stat_get_backend_wal(pg_backend_pid()) \gset
CREATE TEMP TABLE test_stats_temp AS SELECT 17;
DROP TABLE test_stats_temp;
-- Checkpoint twice: The checkpointer reports stats after reporting completion
@@ -851,6 +853,18 @@ SELECT wal_bytes > :wal_bytes_before FROM pg_stat_wal;
t
(1 row)
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush
+--------------------------
+
+(1 row)
+
+SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid());
+ ?column?
+----------
+ t
+(1 row)
+
-- Test pg_stat_get_backend_idset() and some allied functions.
-- In particular, verify that their notion of backend ID matches
-- our temp schema index.
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 399c72bbcf..28fe0a1a7d 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -423,6 +423,9 @@ SELECT num_requested AS rqst_ckpts_before FROM pg_stat_checkpointer \gset
-- Test pg_stat_wal (and make a temp table so our temp schema exists)
SELECT wal_bytes AS wal_bytes_before FROM pg_stat_wal \gset
+-- Test pg_stat_get_backend_wal (and make a temp table so our temp schema exists)
+SELECT wal_bytes AS backend_wal_bytes_before from pg_stat_get_backend_wal(pg_backend_pid()) \gset
+
CREATE TEMP TABLE test_stats_temp AS SELECT 17;
DROP TABLE test_stats_temp;
@@ -435,6 +438,9 @@ CHECKPOINT;
SELECT num_requested > :rqst_ckpts_before FROM pg_stat_checkpointer;
SELECT wal_bytes > :wal_bytes_before FROM pg_stat_wal;
+SELECT pg_stat_force_next_flush();
+SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid());
+
-- Test pg_stat_get_backend_idset() and some allied functions.
-- In particular, verify that their notion of backend ID matches
-- our temp schema index.
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index f15526236a..b593d8601e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2159,6 +2159,7 @@ PgStat_KindInfo
PgStat_LocalState
PgStat_PendingDroppedStatsItem
PgStat_PendingIO
+PgStat_PendingBackendWalStats
PgStat_PendingWalStats
PgStat_SLRUStats
PgStat_ShmemControl
@@ -2175,6 +2176,7 @@ PgStat_SubXactStatus
PgStat_TableCounts
PgStat_TableStatus
PgStat_TableXactStatus
+PgStat_WalCounters
PgStat_WalStats
PgXmlErrorContext
PgXmlStrictness
--
2.34.1
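For readers following the flush logic in this patch: the per-backend WAL usage counters are never maintained directly. pgstat_flush_wal_entry() keeps a snapshot of the process-wide pgWalUsage in prevBackendWalUsage and, at each flush, adds only the difference to the shared entry. The write/sync counts come from PendingBackendWalStats, a plain static struct, because those counters are bumped inside a critical section (XLogWrite()), where pgstat_prep_pending_entry() cannot be used. Below is a minimal standalone sketch of that delta-and-reset pattern. The *Sketch types and the three file-level variables are hypothetical stand-ins, and locking, the timing fields and wal_buffers_full are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for WalUsage: process-wide, only ever grows. */
typedef struct WalUsageSketch
{
    int64_t  wal_records;
    int64_t  wal_fpi;
    uint64_t wal_bytes;
} WalUsageSketch;

/* Hypothetical stand-in for PgStat_PendingWalStats: reset after each flush. */
typedef struct PendingWalSketch
{
    int64_t wal_write;
    int64_t wal_sync;
} PendingWalSketch;

/* Hypothetical stand-in for one backend's shared PgStat_WalCounters. */
typedef struct SharedWalCountersSketch
{
    int64_t  wal_records;
    int64_t  wal_fpi;
    uint64_t wal_bytes;
    int64_t  wal_write;
    int64_t  wal_sync;
} SharedWalCountersSketch;

static WalUsageSketch   cur_usage;   /* plays the role of pgWalUsage */
static WalUsageSketch   prev_usage;  /* plays the role of prevBackendWalUsage */
static PendingWalSketch pending;     /* plays the role of PendingBackendWalStats */

/* Mirrors pgstat_backend_wal_have_pending(): new records, writes or syncs? */
static bool
wal_have_pending(void)
{
    return cur_usage.wal_records != prev_usage.wal_records ||
        pending.wal_write != 0 ||
        pending.wal_sync != 0;
}

/* Add the activity since the previous flush, then re-snapshot and reset. */
static void
flush_wal(SharedWalCountersSketch *shared)
{
    if (!wal_have_pending())
        return;                 /* nothing happened: skip the (omitted) lock */

    /* process-wide counters only ever grow */
    assert(cur_usage.wal_records >= prev_usage.wal_records);

    shared->wal_records += cur_usage.wal_records - prev_usage.wal_records;
    shared->wal_fpi += cur_usage.wal_fpi - prev_usage.wal_fpi;
    shared->wal_bytes += cur_usage.wal_bytes - prev_usage.wal_bytes;
    shared->wal_write += pending.wal_write;
    shared->wal_sync += pending.wal_sync;

    prev_usage = cur_usage;                 /* snapshot for the next delta */
    memset(&pending, 0, sizeof(pending));   /* size taken from the variable itself */
}
```

The snapshot approach means the hot WAL-insertion path does not need to know about per-backend statistics at all; only the flush pays for the subtraction.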
On Wed, Jan 08, 2025 at 11:11:59AM +0000, Bertrand Drouvot wrote:
Yeah, that's more elegant as it also means that the main callback will not change
(should we add even more stats in the future). Done that way in v2 attached.
I've put my hands on v2-0002 to begin with something.
+/* flag bits for different types of statistics to flush */
+#define PGSTAT_FLUSH_IO (1 << 0) /* Flush I/O statistics */
+#define PGSTAT_FLUSH_ALL (PGSTAT_FLUSH_IO)
These are located and used only in pgstat_backend.c. It seems to me
that we'd better declare them in pgstat_internal.h and extend the
existing pgstat_flush_backend() with an argument so that callers can do
what they want.
+ /* Get our own entry_ref if not provided */
+ if (!entry_ref)
+ entry_ref = pgstat_get_entry_ref(PGSTAT_KIND_BACKEND, InvalidOid,
+ MyProcNumber, false, NULL);
This relates to the previous remark, actually, where I think that it
is cleaner to have pgstat_flush_backend() do pgstat_get_entry_ref(),
same way as HEAD, and just pass down the flags. pgstat_flush_backend()
cannot call pgstat_backend_flush_cb() directly, of course, so I've
settled on a new pgstat_flush_backend_entry() that handles the
entry locking. This comes at the cost of pgstat_flush_backend_entry()
requiring an extra pgstat_tracks_backend_bktype(), which is not a big
issue, and the patch gets a bit shorter.
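The shape suggested above, where pgstat_flush_backend() keeps fetching its own entry and a flags argument selects which statistics to flush, boils down to a bitmask dispatch under a single lock. A minimal single-threaded sketch follows. FLUSH_IO, FLUSH_WAL, BackendEntrySketch, lock_entry() and tracks_backend_type are hypothetical stand-ins for PGSTAT_BACKEND_FLUSH_IO, the WAL flag the follow-up patch adds, the shared backend entry, pgstat_lock_entry() and pgstat_tracks_backend_bktype().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t bits32;    /* same idea as PostgreSQL's bits32 */

/* Hypothetical flag bits, modeled on PGSTAT_BACKEND_FLUSH_IO and friends. */
#define FLUSH_IO   (1 << 0)
#define FLUSH_WAL  (1 << 1)
#define FLUSH_ALL  (FLUSH_IO | FLUSH_WAL)

typedef struct BackendEntrySketch
{
    bool locked;        /* stand-in for the shared entry's lock */
    int  io_flushes;    /* times IO stats were flushed */
    int  wal_flushes;   /* times WAL stats were flushed */
} BackendEntrySketch;

/* stand-in for pgstat_tracks_backend_bktype(MyBackendType) */
static bool tracks_backend_type = true;

/*
 * Stand-in for pgstat_lock_entry(). This single-threaded sketch cannot
 * block, so a contended entry simply reports failure whatever nowait says.
 */
static bool
lock_entry(BackendEntrySketch *entry, bool nowait)
{
    (void) nowait;
    if (entry->locked)
        return false;
    entry->locked = true;
    return true;
}

/* Modeled on pgstat_flush_backend_entry(): one lock, per-kind dispatch. */
static bool
flush_backend_entry(BackendEntrySketch *entry, bool nowait, bits32 flags)
{
    assert((flags & ~FLUSH_ALL) == 0);  /* only known flag bits */

    if (!tracks_backend_type)
        return false;
    if (!lock_entry(entry, nowait))
        return false;

    if (flags & FLUSH_IO)
        entry->io_flushes++;
    if (flags & FLUSH_WAL)
        entry->wal_flushes++;

    entry->locked = false;  /* stand-in for pgstat_unlock_entry() */
    return true;
}
```

With this shape, a caller such as pgstat_report_vacuum() passes only the IO flag while the generic flush callback passes the ALL mask, so adding WAL later means one new bit and one new branch rather than a new callback.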
--
Michael
Attachments:
v3-0002-PGSTAT_KIND_BACKEND-code-refactoring.patch (text/x-diff; charset=us-ascii)
From eba357e4e2712cd73fdad585f6d8088de0dbaccc Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 6 Jan 2025 08:44:29 +0000
Subject: [PATCH v3] PGSTAT_KIND_BACKEND code refactoring
This commit refactors some code related to per-backend statistics. It makes
the code more generic, or more focused on IO statistics, as it will be used in a
follow-up commit that will introduce per backend WAL statistics.
---
src/include/pgstat.h | 6 +-
src/include/utils/pgstat_internal.h | 7 +-
src/backend/utils/activity/pgstat.c | 2 +-
src/backend/utils/activity/pgstat_backend.c | 69 ++++++++++++++------
src/backend/utils/activity/pgstat_io.c | 8 +--
src/backend/utils/activity/pgstat_relation.c | 4 +-
src/backend/utils/adt/pgstatfuncs.c | 2 +-
src/tools/pgindent/typedefs.list | 1 +
8 files changed, 68 insertions(+), 31 deletions(-)
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 0d8427f27d1..6631bd2d730 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -381,7 +381,7 @@ typedef PgStat_PendingIO PgStat_BackendPendingIO;
typedef struct PgStat_Backend
{
TimestampTz stat_reset_timestamp;
- PgStat_BktypeIO stats;
+ PgStat_BktypeIO io_stats;
} PgStat_Backend;
typedef struct PgStat_StatDBEntry
@@ -523,6 +523,10 @@ typedef struct PgStat_PendingWalStats
instr_time wal_sync_time;
} PgStat_PendingWalStats;
+typedef struct PgStat_BackendPending
+{
+ PgStat_BackendPendingIO pending_io;
+} PgStat_BackendPending;
/*
* Functions in pgstat.c
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 52eb008710f..4bb8e5c53ab 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -613,9 +613,12 @@ extern void pgstat_archiver_snapshot_cb(void);
* Functions in pgstat_backend.c
*/
-extern void pgstat_flush_backend(bool nowait);
+/* flags for pgstat_flush_backend() */
+#define PGSTAT_BACKEND_FLUSH_IO (1 << 0) /* Flush I/O statistics */
+#define PGSTAT_BACKEND_FLUSH_ALL (PGSTAT_BACKEND_FLUSH_IO)
-extern PgStat_BackendPendingIO *pgstat_prep_backend_pending(ProcNumber procnum);
+extern void pgstat_flush_backend(bool nowait, bits32 flags);
+extern PgStat_BackendPending *pgstat_prep_backend_pending(ProcNumber procnum);
extern bool pgstat_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait);
extern void pgstat_backend_reset_timestamp_cb(PgStatShared_Common *header, TimestampTz ts);
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index 16a03b8ce15..34520535d54 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -370,7 +370,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.shared_size = sizeof(PgStatShared_Backend),
.shared_data_off = offsetof(PgStatShared_Backend, stats),
.shared_data_len = sizeof(((PgStatShared_Backend *) 0)->stats),
- .pending_size = sizeof(PgStat_BackendPendingIO),
+ .pending_size = sizeof(PgStat_BackendPending),
.flush_pending_cb = pgstat_backend_flush_cb,
.reset_timestamp_cb = pgstat_backend_reset_timestamp_cb,
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
index 1f91bfef0a3..3972bcf9456 100644
--- a/src/backend/utils/activity/pgstat_backend.c
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -39,23 +39,21 @@ pgstat_fetch_stat_backend(ProcNumber procNumber)
}
/*
- * Flush out locally pending backend statistics
- *
- * If no stats have been recorded, this function returns false.
+ * Flush out locally pending backend IO statistics. Locking is managed
+ * by the caller.
*/
-bool
-pgstat_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
+static void
+pgstat_flush_backend_entry_io(PgStat_EntryRef *entry_ref)
{
- PgStatShared_Backend *shbackendioent;
- PgStat_BackendPendingIO *pendingent;
+ PgStatShared_Backend *shbackendent;
+ PgStat_BackendPending *pendingent;
PgStat_BktypeIO *bktype_shstats;
+ PgStat_BackendPendingIO *pending_io;
- if (!pgstat_lock_entry(entry_ref, nowait))
- return false;
-
- shbackendioent = (PgStatShared_Backend *) entry_ref->shared_stats;
- bktype_shstats = &shbackendioent->stats.stats;
- pendingent = (PgStat_BackendPendingIO *) entry_ref->pending;
+ shbackendent = (PgStatShared_Backend *) entry_ref->shared_stats;
+ pendingent = (PgStat_BackendPending *) entry_ref->pending;
+ bktype_shstats = &shbackendent->stats.io_stats;
+ pending_io = &pendingent->pending_io;
for (int io_object = 0; io_object < IOOBJECT_NUM_TYPES; io_object++)
{
@@ -66,15 +64,33 @@ pgstat_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
instr_time time;
bktype_shstats->counts[io_object][io_context][io_op] +=
- pendingent->counts[io_object][io_context][io_op];
+ pending_io->counts[io_object][io_context][io_op];
- time = pendingent->pending_times[io_object][io_context][io_op];
+ time = pending_io->pending_times[io_object][io_context][io_op];
bktype_shstats->times[io_object][io_context][io_op] +=
INSTR_TIME_GET_MICROSEC(time);
}
}
}
+}
+
+/*
+ * Wrapper routine to flush backend statistics.
+ */
+static bool
+pgstat_flush_backend_entry(PgStat_EntryRef *entry_ref, bool nowait,
+ bits32 flags)
+{
+ if (!pgstat_tracks_backend_bktype(MyBackendType))
+ return false;
+
+ if (!pgstat_lock_entry(entry_ref, nowait))
+ return false;
+
+ /* Flush requested statistics */
+ if (flags & PGSTAT_BACKEND_FLUSH_IO)
+ pgstat_flush_backend_entry_io(entry_ref);
pgstat_unlock_entry(entry_ref);
@@ -82,10 +98,23 @@ pgstat_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
}
/*
- * Simpler wrapper of pgstat_backend_flush_cb()
+ * Callback to flush out locally pending backend statistics.
+ *
+ * If no stats have been recorded, this function returns false.
+ */
+bool
+pgstat_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
+{
+ return pgstat_flush_backend_entry(entry_ref, nowait, PGSTAT_BACKEND_FLUSH_ALL);
+}
+
+/*
+ * Flush out locally pending backend statistics
+ *
+ * "flags" parameter controls which statistics to flush.
*/
void
-pgstat_flush_backend(bool nowait)
+pgstat_flush_backend(bool nowait, bits32 flags)
{
PgStat_EntryRef *entry_ref;
@@ -94,7 +123,7 @@ pgstat_flush_backend(bool nowait)
entry_ref = pgstat_get_entry_ref(PGSTAT_KIND_BACKEND, InvalidOid,
MyProcNumber, false, NULL);
- (void) pgstat_backend_flush_cb(entry_ref, nowait);
+ (void) pgstat_flush_backend_entry(entry_ref, nowait, flags);
}
/*
@@ -119,9 +148,9 @@ pgstat_create_backend(ProcNumber procnum)
}
/*
- * Find or create a local PgStat_BackendPendingIO entry for proc number.
+ * Find or create a local PgStat_BackendPending entry for proc number.
*/
-PgStat_BackendPendingIO *
+PgStat_BackendPending *
pgstat_prep_backend_pending(ProcNumber procnum)
{
PgStat_EntryRef *entry_ref;
diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index f9a1f91dba8..a7445995d32 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -81,10 +81,10 @@ pgstat_count_io_op_n(IOObject io_object, IOContext io_context, IOOp io_op, uint3
if (pgstat_tracks_backend_bktype(MyBackendType))
{
- PgStat_PendingIO *entry_ref;
+ PgStat_BackendPending *entry_ref;
entry_ref = pgstat_prep_backend_pending(MyProcNumber);
- entry_ref->counts[io_object][io_context][io_op] += cnt;
+ entry_ref->pending_io.counts[io_object][io_context][io_op] += cnt;
}
PendingIOStats.counts[io_object][io_context][io_op] += cnt;
@@ -151,10 +151,10 @@ pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op,
if (pgstat_tracks_backend_bktype(MyBackendType))
{
- PgStat_PendingIO *entry_ref;
+ PgStat_BackendPending *entry_ref;
entry_ref = pgstat_prep_backend_pending(MyProcNumber);
- INSTR_TIME_ADD(entry_ref->pending_times[io_object][io_context][io_op],
+ INSTR_TIME_ADD(entry_ref->pending_io.pending_times[io_object][io_context][io_op],
io_time);
}
}
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index 2cc304f8812..09247ba0971 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -264,7 +264,7 @@ pgstat_report_vacuum(Oid tableoid, bool shared,
* VACUUM command has processed all tables and committed.
*/
pgstat_flush_io(false);
- pgstat_flush_backend(false);
+ pgstat_flush_backend(false, PGSTAT_BACKEND_FLUSH_IO);
}
/*
@@ -351,7 +351,7 @@ pgstat_report_analyze(Relation rel,
/* see pgstat_report_vacuum() */
pgstat_flush_io(false);
- pgstat_flush_backend(false);
+ pgstat_flush_backend(false, PGSTAT_BACKEND_FLUSH_IO);
}
/*
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 3245f3a8d8a..5f8d20a406d 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1544,7 +1544,7 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
if (bktype == B_INVALID)
return (Datum) 0;
- bktype_stats = &backend_stats->stats;
+ bktype_stats = &backend_stats->io_stats;
/*
* In Assert builds, we can afford an extra loop through all of the
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 9f83ecf181f..f15526236a3 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2140,6 +2140,7 @@ PgStatShared_Subscription
PgStatShared_Wal
PgStat_ArchiverStats
PgStat_Backend
+PgStat_BackendPending
PgStat_BackendPendingIO
PgStat_BackendSubEntry
PgStat_BgWriterStats
--
2.47.1
Hi,
On Thu, Jan 09, 2025 at 01:03:15PM +0900, Michael Paquier wrote:
On Wed, Jan 08, 2025 at 11:11:59AM +0000, Bertrand Drouvot wrote:
Yeah, that's more elegant as it also means that the main callback will not change
(should we add even more stats in the future). Done that way in v2 attached.
I've put my hands on v2-0002 to begin with something.
+/* flag bits for different types of statistics to flush */
+#define PGSTAT_FLUSH_IO (1 << 0) /* Flush I/O statistics */
+#define PGSTAT_FLUSH_ALL (PGSTAT_FLUSH_IO)
These are located and used only in pgstat_backend.c. It seems to me
that we'd better declare them in pgstat_internal.h and extend the
existing pgstat_flush_backend() with an argument so that callers can do
what they want.
+ /* Get our own entry_ref if not provided */
+ if (!entry_ref)
+ entry_ref = pgstat_get_entry_ref(PGSTAT_KIND_BACKEND, InvalidOid,
+ MyProcNumber, false, NULL);
This relates to the previous remark, actually, where I think that it
is cleaner to have pgstat_flush_backend() do pgstat_get_entry_ref(),
same way as HEAD, and just pass down the flags.
I see, so you keep pgstat_flush_backend() calls (with an extra arg) and remove
the new "pgstat_backend_flush_io()" function.
This comes at the cost of pgstat_flush_backend_entry()
requiring an extra pgstat_tracks_backend_bktype(), which is not a big
issue, and the patch gets a bit shorter.
Yeah, all of the above is fine by me.
PFA v3, which is the v2 refactoring with the changes you proposed above.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v3-0001-Extract-logic-filling-pg_stat_get_wal-s-tuple-int.patch (text/x-diff; charset=us-ascii)
From bcdf4eb91f7ff4beeba50433a173ac2650afe432 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 6 Jan 2025 07:51:27 +0000
Subject: [PATCH v3 1/3] Extract logic filling pg_stat_get_wal()'s tuple into
its own routine
This commit adds pg_stat_wal_build_tuple(), a helper routine for
pg_stat_get_wal(), that fills its tuple based on the contents
of PgStat_WalStats. This will be used in a follow-up commit that uses
the same structures as pg_stat_wal for reporting, but for the PGSTAT_KIND_BACKEND
statistics kind.
---
src/backend/utils/adt/pgstatfuncs.c | 56 ++++++++++++++++++-----------
1 file changed, 36 insertions(+), 20 deletions(-)
100.0% src/backend/utils/adt/
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 3245f3a8d8..7309f06993 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1560,20 +1560,22 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
}
/*
- * Returns statistics of WAL activity
+ * pg_stat_wal_build_tuple
+ *
+ * Helper routine for pg_stat_get_wal() returning one tuple based on the contents
+ * of wal_stats.
*/
-Datum
-pg_stat_get_wal(PG_FUNCTION_ARGS)
+static Datum
+pg_stat_wal_build_tuple(PgStat_WalStats wal_stats)
{
-#define PG_STAT_GET_WAL_COLS 9
+#define PG_STAT_WAL_COLS 9
TupleDesc tupdesc;
- Datum values[PG_STAT_GET_WAL_COLS] = {0};
- bool nulls[PG_STAT_GET_WAL_COLS] = {0};
+ Datum values[PG_STAT_WAL_COLS] = {0};
+ bool nulls[PG_STAT_WAL_COLS] = {0};
char buf[256];
- PgStat_WalStats *wal_stats;
/* Initialise attributes information in the tuple descriptor */
- tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_WAL_COLS);
+ tupdesc = CreateTemplateTupleDesc(PG_STAT_WAL_COLS);
TupleDescInitEntry(tupdesc, (AttrNumber) 1, "wal_records",
INT8OID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 2, "wal_fpi",
@@ -1595,34 +1597,48 @@ pg_stat_get_wal(PG_FUNCTION_ARGS)
BlessTupleDesc(tupdesc);
- /* Get statistics about WAL activity */
- wal_stats = pgstat_fetch_stat_wal();
-
/* Fill values and NULLs */
- values[0] = Int64GetDatum(wal_stats->wal_records);
- values[1] = Int64GetDatum(wal_stats->wal_fpi);
+ values[0] = Int64GetDatum(wal_stats.wal_records);
+ values[1] = Int64GetDatum(wal_stats.wal_fpi);
/* Convert to numeric. */
- snprintf(buf, sizeof buf, UINT64_FORMAT, wal_stats->wal_bytes);
+ snprintf(buf, sizeof buf, UINT64_FORMAT, wal_stats.wal_bytes);
values[2] = DirectFunctionCall3(numeric_in,
CStringGetDatum(buf),
ObjectIdGetDatum(0),
Int32GetDatum(-1));
- values[3] = Int64GetDatum(wal_stats->wal_buffers_full);
- values[4] = Int64GetDatum(wal_stats->wal_write);
- values[5] = Int64GetDatum(wal_stats->wal_sync);
+ values[3] = Int64GetDatum(wal_stats.wal_buffers_full);
+ values[4] = Int64GetDatum(wal_stats.wal_write);
+ values[5] = Int64GetDatum(wal_stats.wal_sync);
/* Convert counters from microsec to millisec for display */
- values[6] = Float8GetDatum(((double) wal_stats->wal_write_time) / 1000.0);
- values[7] = Float8GetDatum(((double) wal_stats->wal_sync_time) / 1000.0);
+ values[6] = Float8GetDatum(((double) wal_stats.wal_write_time) / 1000.0);
+ values[7] = Float8GetDatum(((double) wal_stats.wal_sync_time) / 1000.0);
- values[8] = TimestampTzGetDatum(wal_stats->stat_reset_timestamp);
+ if (wal_stats.stat_reset_timestamp != 0)
+ values[8] = TimestampTzGetDatum(wal_stats.stat_reset_timestamp);
+ else
+ nulls[8] = true;
/* Returns the record as Datum */
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
}
+/*
+ * Returns statistics of WAL activity
+ */
+Datum
+pg_stat_get_wal(PG_FUNCTION_ARGS)
+{
+ PgStat_WalStats *wal_stats;
+
+ /* Get statistics about WAL activity */
+ wal_stats = pgstat_fetch_stat_wal();
+
+ return (pg_stat_wal_build_tuple(*wal_stats));
+}
+
/*
* Returns statistics of SLRU caches.
*/
--
2.34.1
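v3-0001 moves the tuple filling into pg_stat_wal_build_tuple() so that a per-backend function can reuse it. A later revision of the series passes the counters and the reset timestamp separately, since the per-backend reset timestamp lives in PgStat_Backend rather than next to the WAL counters. A minimal plain-C sketch of such a shared builder follows, with hypothetical WalCountersSketch and WalRowSketch types standing in for PgStat_WalCounters and the returned record; no fmgr or tuple APIs are involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for PgStat_WalCounters (times in microseconds). */
typedef struct WalCountersSketch
{
    int64_t  wal_records;
    int64_t  wal_fpi;
    uint64_t wal_bytes;
    int64_t  wal_buffers_full;
    int64_t  wal_write;
    int64_t  wal_sync;
    int64_t  wal_write_time;    /* microseconds */
    int64_t  wal_sync_time;     /* microseconds */
} WalCountersSketch;

/* Hypothetical output row; reset_is_null mimics nulls[8] in the real helper. */
typedef struct WalRowSketch
{
    WalCountersSketch counts;
    double  write_time_ms;
    double  sync_time_ms;
    int64_t stats_reset;
    bool    reset_is_null;
} WalRowSketch;

/* One builder shared by a global caller and a per-backend caller. */
static WalRowSketch
build_wal_row(WalCountersSketch counters, int64_t stat_reset_timestamp)
{
    WalRowSketch row = {0};

    /* cumulative times should never be negative */
    assert(counters.wal_write_time >= 0 && counters.wal_sync_time >= 0);

    row.counts = counters;
    /* Convert counters from microsec to millisec for display. */
    row.write_time_ms = (double) counters.wal_write_time / 1000.0;
    row.sync_time_ms = (double) counters.wal_sync_time / 1000.0;

    /* A zero timestamp means "never reset": report NULL, not the epoch. */
    if (stat_reset_timestamp != 0)
        row.stats_reset = stat_reset_timestamp;
    else
        row.reset_is_null = true;

    return row;
}
```

Both callers then reduce to one line: the global one passes its counters plus the global reset time, the per-backend one passes the backend's counters plus the backend entry's reset time, and a zero timestamp yields NULL in both cases.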
v3-0002-PGSTAT_KIND_BACKEND-code-refactoring.patch (text/x-diff; charset=us-ascii)
From 9173b59075759fdd88f29bcf2b555e86c823704d Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 6 Jan 2025 08:44:29 +0000
Subject: [PATCH v3 2/3] PGSTAT_KIND_BACKEND code refactoring
This commit refactors some code related to per-backend statistics. It makes
the code more generic, or more focused on IO statistics, as it will be used in a
follow-up commit that will introduce per backend WAL statistics.
---
src/backend/utils/activity/pgstat.c | 2 +-
src/backend/utils/activity/pgstat_backend.c | 70 ++++++++++++++------
src/backend/utils/activity/pgstat_io.c | 8 +--
src/backend/utils/activity/pgstat_relation.c | 4 +-
src/backend/utils/adt/pgstatfuncs.c | 2 +-
src/include/pgstat.h | 6 +-
src/include/utils/pgstat_internal.h | 7 +-
src/tools/pgindent/typedefs.list | 1 +
8 files changed, 69 insertions(+), 31 deletions(-)
79.4% src/backend/utils/activity/
12.9% src/include/utils/
4.5% src/include/
3.0% src/
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index 16a03b8ce1..34520535d5 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -370,7 +370,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.shared_size = sizeof(PgStatShared_Backend),
.shared_data_off = offsetof(PgStatShared_Backend, stats),
.shared_data_len = sizeof(((PgStatShared_Backend *) 0)->stats),
- .pending_size = sizeof(PgStat_BackendPendingIO),
+ .pending_size = sizeof(PgStat_BackendPending),
.flush_pending_cb = pgstat_backend_flush_cb,
.reset_timestamp_cb = pgstat_backend_reset_timestamp_cb,
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
index 1f91bfef0a..ea49208b80 100644
--- a/src/backend/utils/activity/pgstat_backend.c
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -39,23 +39,21 @@ pgstat_fetch_stat_backend(ProcNumber procNumber)
}
/*
- * Flush out locally pending backend statistics
- *
- * If no stats have been recorded, this function returns false.
+ * Flush out locally pending backend IO statistics. Locking is managed
+ * by the caller.
*/
-bool
-pgstat_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
+static void
+pgstat_flush_backend_entry_io(PgStat_EntryRef *entry_ref)
{
- PgStatShared_Backend *shbackendioent;
- PgStat_BackendPendingIO *pendingent;
+ PgStatShared_Backend *shbackendent;
+ PgStat_BackendPending *pendingent;
PgStat_BktypeIO *bktype_shstats;
+ PgStat_BackendPendingIO *pending_io;
- if (!pgstat_lock_entry(entry_ref, nowait))
- return false;
-
- shbackendioent = (PgStatShared_Backend *) entry_ref->shared_stats;
- bktype_shstats = &shbackendioent->stats.stats;
- pendingent = (PgStat_BackendPendingIO *) entry_ref->pending;
+ shbackendent = (PgStatShared_Backend *) entry_ref->shared_stats;
+ pendingent = (PgStat_BackendPending *) entry_ref->pending;
+ bktype_shstats = &shbackendent->stats.io_stats;
+ pending_io = &pendingent->pending_io;
for (int io_object = 0; io_object < IOOBJECT_NUM_TYPES; io_object++)
{
@@ -66,15 +64,33 @@ pgstat_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
instr_time time;
bktype_shstats->counts[io_object][io_context][io_op] +=
- pendingent->counts[io_object][io_context][io_op];
+ pending_io->counts[io_object][io_context][io_op];
- time = pendingent->pending_times[io_object][io_context][io_op];
+ time = pending_io->pending_times[io_object][io_context][io_op];
bktype_shstats->times[io_object][io_context][io_op] +=
INSTR_TIME_GET_MICROSEC(time);
}
}
}
+}
+
+/*
+ * Wrapper routine to flush backend statistics.
+ */
+static bool
+pgstat_flush_backend_entry(PgStat_EntryRef *entry_ref, bool nowait,
+ bits32 flags)
+{
+ if (!pgstat_tracks_backend_bktype(MyBackendType))
+ return false;
+
+ if (!pgstat_lock_entry(entry_ref, nowait))
+ return false;
+
+ /* Flush requested statistics */
+ if (flags & PGSTAT_BACKEND_FLUSH_IO)
+ pgstat_flush_backend_entry_io(entry_ref);
pgstat_unlock_entry(entry_ref);
@@ -82,10 +98,23 @@ pgstat_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
}
/*
- * Simpler wrapper of pgstat_backend_flush_cb()
+ * Callback to flush out locally pending backend statistics.
+ *
+ * If no stats have been recorded, this function returns false.
+ */
+bool
+pgstat_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
+{
+ return pgstat_flush_backend_entry(entry_ref, nowait, PGSTAT_BACKEND_FLUSH_ALL);
+}
+
+/*
+ * Flush out locally pending backend statistics
+ *
+ * "flags" parameter controls which statistics to flush.
*/
void
-pgstat_flush_backend(bool nowait)
+pgstat_flush_backend(bool nowait, bits32 flags)
{
PgStat_EntryRef *entry_ref;
@@ -94,7 +123,8 @@ pgstat_flush_backend(bool nowait)
entry_ref = pgstat_get_entry_ref(PGSTAT_KIND_BACKEND, InvalidOid,
MyProcNumber, false, NULL);
- (void) pgstat_backend_flush_cb(entry_ref, nowait);
+
+ (void) pgstat_flush_backend_entry(entry_ref, nowait, flags);
}
/*
@@ -119,9 +149,9 @@ pgstat_create_backend(ProcNumber procnum)
}
/*
- * Find or create a local PgStat_BackendPendingIO entry for proc number.
+ * Find or create a local PgStat_BackendPending entry for proc number.
*/
-PgStat_BackendPendingIO *
+PgStat_BackendPending *
pgstat_prep_backend_pending(ProcNumber procnum)
{
PgStat_EntryRef *entry_ref;
diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index f9a1f91dba..a7445995d3 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -81,10 +81,10 @@ pgstat_count_io_op_n(IOObject io_object, IOContext io_context, IOOp io_op, uint3
if (pgstat_tracks_backend_bktype(MyBackendType))
{
- PgStat_PendingIO *entry_ref;
+ PgStat_BackendPending *entry_ref;
entry_ref = pgstat_prep_backend_pending(MyProcNumber);
- entry_ref->counts[io_object][io_context][io_op] += cnt;
+ entry_ref->pending_io.counts[io_object][io_context][io_op] += cnt;
}
PendingIOStats.counts[io_object][io_context][io_op] += cnt;
@@ -151,10 +151,10 @@ pgstat_count_io_op_time(IOObject io_object, IOContext io_context, IOOp io_op,
if (pgstat_tracks_backend_bktype(MyBackendType))
{
- PgStat_PendingIO *entry_ref;
+ PgStat_BackendPending *entry_ref;
entry_ref = pgstat_prep_backend_pending(MyProcNumber);
- INSTR_TIME_ADD(entry_ref->pending_times[io_object][io_context][io_op],
+ INSTR_TIME_ADD(entry_ref->pending_io.pending_times[io_object][io_context][io_op],
io_time);
}
}
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index 2cc304f881..09247ba097 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -264,7 +264,7 @@ pgstat_report_vacuum(Oid tableoid, bool shared,
* VACUUM command has processed all tables and committed.
*/
pgstat_flush_io(false);
- pgstat_flush_backend(false);
+ pgstat_flush_backend(false, PGSTAT_BACKEND_FLUSH_IO);
}
/*
@@ -351,7 +351,7 @@ pgstat_report_analyze(Relation rel,
/* see pgstat_report_vacuum() */
pgstat_flush_io(false);
- pgstat_flush_backend(false);
+ pgstat_flush_backend(false, PGSTAT_BACKEND_FLUSH_IO);
}
/*
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 7309f06993..8a4340e977 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1544,7 +1544,7 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
if (bktype == B_INVALID)
return (Datum) 0;
- bktype_stats = &backend_stats->stats;
+ bktype_stats = &backend_stats->io_stats;
/*
* In Assert builds, we can afford an extra loop through all of the
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 0d8427f27d..6631bd2d73 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -381,7 +381,7 @@ typedef PgStat_PendingIO PgStat_BackendPendingIO;
typedef struct PgStat_Backend
{
TimestampTz stat_reset_timestamp;
- PgStat_BktypeIO stats;
+ PgStat_BktypeIO io_stats;
} PgStat_Backend;
typedef struct PgStat_StatDBEntry
@@ -523,6 +523,10 @@ typedef struct PgStat_PendingWalStats
instr_time wal_sync_time;
} PgStat_PendingWalStats;
+typedef struct PgStat_BackendPending
+{
+ PgStat_BackendPendingIO pending_io;
+} PgStat_BackendPending;
/*
* Functions in pgstat.c
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 52eb008710..4bb8e5c53a 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -613,9 +613,12 @@ extern void pgstat_archiver_snapshot_cb(void);
* Functions in pgstat_backend.c
*/
-extern void pgstat_flush_backend(bool nowait);
+/* flags for pgstat_flush_backend() */
+#define PGSTAT_BACKEND_FLUSH_IO (1 << 0) /* Flush I/O statistics */
+#define PGSTAT_BACKEND_FLUSH_ALL (PGSTAT_BACKEND_FLUSH_IO)
-extern PgStat_BackendPendingIO *pgstat_prep_backend_pending(ProcNumber procnum);
+extern void pgstat_flush_backend(bool nowait, bits32 flags);
+extern PgStat_BackendPending *pgstat_prep_backend_pending(ProcNumber procnum);
extern bool pgstat_backend_flush_cb(PgStat_EntryRef *entry_ref, bool nowait);
extern void pgstat_backend_reset_timestamp_cb(PgStatShared_Common *header, TimestampTz ts);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 9f83ecf181..f15526236a 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2140,6 +2140,7 @@ PgStatShared_Subscription
PgStatShared_Wal
PgStat_ArchiverStats
PgStat_Backend
+PgStat_BackendPending
PgStat_BackendPendingIO
PgStat_BackendSubEntry
PgStat_BgWriterStats
--
2.34.1
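At its core, the refactoring takes the entry lock once and then dispatches on a bitmask of requested statistics kinds. The minimal standalone sketch below models only that control flow. The `SKETCH_FLUSH_*` flags, `SketchEntry`, `sketch_lock()` and `sketch_flush_entry()` are hypothetical, the lock is simulated with a boolean, and the WAL flag is assumed from the follow-up patch that introduces per backend WAL statistics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flags modeled on PGSTAT_BACKEND_FLUSH_IO / _ALL. */
#define SKETCH_FLUSH_IO   (1u << 0)
#define SKETCH_FLUSH_WAL  (1u << 1)
#define SKETCH_FLUSH_ALL  (SKETCH_FLUSH_IO | SKETCH_FLUSH_WAL)

typedef struct SketchEntry
{
    bool        contended;      /* simulates another process holding the lock */
    int         io_flushes;     /* times the IO part was flushed */
    int         wal_flushes;    /* times the WAL part was flushed */
} SketchEntry;

/* Stand-in for pgstat_lock_entry(): with nowait, give up on contention. */
static bool
sketch_lock(SketchEntry *e, bool nowait)
{
    /* Without nowait we pretend to have waited and acquired the lock. */
    return !(e->contended && nowait);
}

/* Lock once, flush each requested kind, then unlock (unlock is a no-op here). */
static bool
sketch_flush_entry(SketchEntry *e, bool nowait, uint32_t flags)
{
    if (!sketch_lock(e, nowait))
        return false;

    if (flags & SKETCH_FLUSH_IO)
        e->io_flushes++;
    if (flags & SKETCH_FLUSH_WAL)
        e->wal_flushes++;

    return true;
}
```

With this shape, a caller such as `pgstat_report_vacuum()` can request only the IO part, while the generic flush callback passes the "all" mask.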
v3-0003-per-backend-WAL-statistics.patch
From b024c89167be988f31dee8330546ed96f43384d5 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 6 Jan 2025 10:00:00 +0000
Subject: [PATCH v3 3/3] per backend WAL statistics
Now that commit 9aea73fc61 added backend-level statistics to pgstats (and
per backend IO statistics), we can more easily add per backend statistics.
This commit adds per backend WAL statistics using the same layer as pg_stat_wal,
except that it is now possible to know how much WAL activity is happening in each
backend rather than an overall aggregate of all the activity. A function called
pg_stat_get_backend_wal() is added to access this data depending on the
PID of a backend.
The same limitation as in 9aea73fc61 persists, meaning that auxiliary processes
are not included in this set of statistics.
XXX: bump catalog version
---
doc/src/sgml/config.sgml | 4 +-
doc/src/sgml/monitoring.sgml | 19 +++++
src/backend/access/transam/xlog.c | 36 ++++++++-
src/backend/utils/activity/pgstat_backend.c | 86 ++++++++++++++++++++-
src/backend/utils/activity/pgstat_wal.c | 6 +-
src/backend/utils/adt/pgstatfuncs.c | 78 +++++++++++++++----
src/include/catalog/pg_proc.dat | 7 ++
src/include/pgstat.h | 37 +++++++--
src/include/utils/pgstat_internal.h | 3 +-
src/test/regress/expected/stats.out | 14 ++++
src/test/regress/sql/stats.sql | 6 ++
src/tools/pgindent/typedefs.list | 2 +
12 files changed, 268 insertions(+), 30 deletions(-)
10.1% doc/src/sgml/
8.8% src/backend/access/transam/
31.5% src/backend/utils/activity/
28.3% src/backend/utils/adt/
5.0% src/include/catalog/
8.4% src/include/
4.0% src/test/regress/expected/
3.0% src/test/regress/sql/
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 8683f0bdf5..8e8478dcb1 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8433,7 +8433,9 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
measure the overhead of timing on your system.
I/O timing information is
displayed in <link linkend="monitoring-pg-stat-wal-view">
- <structname>pg_stat_wal</structname></link>.
+ <structname>pg_stat_wal</structname></link> and in the output of the
+ <link linkend="pg-stat-get-backend-wal">
+ <function>pg_stat_get_backend_wal()</function></link> function.
Only superusers and users with the appropriate <literal>SET</literal>
privilege can change this setting.
</para>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index d0d176cc54..84a2d09b76 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4811,6 +4811,25 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para></entry>
</row>
+ <row>
+ <entry id="pg-stat-get-backend-wal" role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_stat_get_backend_wal</primary>
+ </indexterm>
+ <function>pg_stat_get_backend_wal</function> ( <type>integer</type> )
+ <returnvalue>record</returnvalue>
+ </para>
+ <para>
+ Returns WAL statistics about the backend with the specified
+ process ID. The output fields are exactly the same as the ones in the
+ <structname>pg_stat_wal</structname> view.
+ </para>
+ <para>
+ The function does not return WAL statistics for the checkpointer,
+ the background writer, the startup process and the autovacuum launcher.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index bf3dbda901..0ba9fcb277 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2058,6 +2058,10 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli, bool opportunistic)
XLogWrite(WriteRqst, tli, false);
LWLockRelease(WALWriteLock);
PendingWalStats.wal_buffers_full++;
+
+ if (pgstat_tracks_backend_bktype(MyBackendType))
+ PendingBackendWalStats.wal_buffers_full++;
+
TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY_DONE();
}
/* Re-acquire WALBufMappingLock and retry */
@@ -2426,11 +2430,14 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible)
Size nleft;
ssize_t written;
instr_time start;
+ instr_time end;
/* OK to write the page(s) */
from = XLogCtl->pages + startidx * (Size) XLOG_BLCKSZ;
nbytes = npages * (Size) XLOG_BLCKSZ;
nleft = nbytes;
+ /* keep compiler quiet */
+ INSTR_TIME_SET_ZERO(end);
do
{
errno = 0;
@@ -2451,14 +2458,26 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible)
*/
if (track_wal_io_timing)
{
- instr_time end;
-
INSTR_TIME_SET_CURRENT(end);
INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_write_time, end, start);
}
PendingWalStats.wal_write++;
+ if (pgstat_tracks_backend_bktype(MyBackendType))
+ {
+ /*
+ * We are inside a critical section, so we can't use
+ * pgstat_prep_pending_entry() and we rely on
+ * PendingBackendWalStats instead.
+ */
+ PendingBackendWalStats.wal_write++;
+
+ if (track_wal_io_timing)
+ INSTR_TIME_ACCUM_DIFF(PendingBackendWalStats.wal_write_time,
+ end, start);
+ }
+
if (written <= 0)
{
char xlogfname[MAXFNAMELEN];
@@ -8684,8 +8703,11 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
{
char *msg = NULL;
instr_time start;
+ instr_time end;
Assert(tli != 0);
+ /* keep compiler quiet */
+ INSTR_TIME_SET_ZERO(end);
/*
* Quick exit if fsync is disabled or write() has already synced the WAL
@@ -8751,13 +8773,19 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
*/
if (track_wal_io_timing)
{
- instr_time end;
-
INSTR_TIME_SET_CURRENT(end);
INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_sync_time, end, start);
}
PendingWalStats.wal_sync++;
+
+ if (pgstat_tracks_backend_bktype(MyBackendType))
+ {
+ PendingBackendWalStats.wal_sync++;
+
+ if (track_wal_io_timing)
+ INSTR_TIME_ACCUM_DIFF(PendingBackendWalStats.wal_sync_time, end, start);
+ }
}
/*
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
index ea49208b80..c66cbf70b6 100644
--- a/src/backend/utils/activity/pgstat_backend.c
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -24,6 +24,16 @@
#include "utils/pgstat_internal.h"
+PgStat_PendingWalStats PendingBackendWalStats = {0};
+
+/*
+ * WAL usage counters saved from pgWalUsage at the previous call to
+ * pgstat_report_wal(). This is used to calculate how much WAL usage
+ * happens between pgstat_report_wal() calls, by subtracting
+ * the previous counters from the current ones.
+ */
+static WalUsage prevBackendWalUsage;
+
/*
* Returns statistics of a backend by proc number.
*/
@@ -75,6 +85,75 @@ pgstat_flush_backend_entry_io(PgStat_EntryRef *entry_ref)
}
}
+/*
+ * To determine whether any WAL activity has occurred since last time, not
+ * only the number of generated WAL records but also the numbers of WAL
+ * writes and syncs need to be checked. Because even transaction that
+ * generates no WAL records can write or sync WAL data when flushing the
+ * data pages.
+ */
+static bool
+pgstat_backend_wal_have_pending(void)
+{
+ return pgWalUsage.wal_records != prevBackendWalUsage.wal_records ||
+ PendingBackendWalStats.wal_write != 0 ||
+ PendingBackendWalStats.wal_sync != 0;
+}
+
+/*
+ * Flush out locally pending backend WAL statistics. Locking is managed
+ * by the caller.
+ */
+static void
+pgstat_flush_backend_entry_wal(PgStat_EntryRef *entry_ref)
+{
+ PgStatShared_Backend *shbackendent;
+ PgStat_WalCounters *bktype_shstats;
+ WalUsage wal_usage_diff = {0};
+
+ /*
+ * This function can be called even if nothing at all has happened. Avoid
+ * taking lock for nothing in that case.
+ */
+ if (!pgstat_backend_wal_have_pending())
+ return;
+
+ shbackendent = (PgStatShared_Backend *) entry_ref->shared_stats;
+ bktype_shstats = &shbackendent->stats.wal_stats;
+
+ /*
+ * We don't update the WAL usage portion of the local WalStats elsewhere.
+ * Calculate how much WAL usage counters were increased by subtracting the
+ * previous counters from the current ones.
+ */
+ WalUsageAccumDiff(&wal_usage_diff, &pgWalUsage, &prevBackendWalUsage);
+
+#define WALSTAT_ACC(fld, var_to_add) \
+ (bktype_shstats->fld += var_to_add.fld)
+#define WALSTAT_ACC_INSTR_TIME(fld) \
+ (bktype_shstats->fld += INSTR_TIME_GET_MICROSEC(PendingBackendWalStats.fld))
+ WALSTAT_ACC(wal_buffers_full, PendingBackendWalStats);
+ WALSTAT_ACC(wal_write, PendingBackendWalStats);
+ WALSTAT_ACC(wal_sync, PendingBackendWalStats);
+ WALSTAT_ACC(wal_records, wal_usage_diff);
+ WALSTAT_ACC(wal_fpi, wal_usage_diff);
+ WALSTAT_ACC(wal_bytes, wal_usage_diff);
+ WALSTAT_ACC_INSTR_TIME(wal_write_time);
+ WALSTAT_ACC_INSTR_TIME(wal_sync_time);
+#undef WALSTAT_ACC_INSTR_TIME
+#undef WALSTAT_ACC
+
+ /*
+ * Save the current counters for the subsequent calculation of WAL usage.
+ */
+ prevBackendWalUsage = pgWalUsage;
+
+ /*
+ * Clear out the statistics buffer, so it can be re-used.
+ */
+ MemSet(&PendingBackendWalStats, 0, sizeof(PendingWalStats));
+}
+
/*
* Wrapper routine to flush backend statistics.
*/
@@ -92,6 +171,9 @@ pgstat_flush_backend_entry(PgStat_EntryRef *entry_ref, bool nowait,
if (flags & PGSTAT_BACKEND_FLUSH_IO)
pgstat_flush_backend_entry_io(entry_ref);
+ if (flags & PGSTAT_BACKEND_FLUSH_WAL)
+ pgstat_flush_backend_entry_wal(entry_ref);
+
pgstat_unlock_entry(entry_ref);
return true;
@@ -146,10 +228,12 @@ pgstat_create_backend(ProcNumber procnum)
* e.g. if we previously used this proc number.
*/
memset(&shstatent->stats, 0, sizeof(shstatent->stats));
+
+ prevBackendWalUsage = pgWalUsage;
}
/*
- * Find or create a local PgStat_BackendPending entry for proc number.
+ * Find or create a local PgStat_BackendPendingIO entry for proc number.
*/
PgStat_BackendPending *
pgstat_prep_backend_pending(ProcNumber procnum)
diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c
index 18fa6b2936..3362764226 100644
--- a/src/backend/utils/activity/pgstat_wal.c
+++ b/src/backend/utils/activity/pgstat_wal.c
@@ -55,6 +55,8 @@ pgstat_report_wal(bool force)
/* flush wal stats */
pgstat_flush_wal(nowait);
+ pgstat_flush_backend(nowait, PGSTAT_BACKEND_FLUSH_WAL);
+
/* flush IO stats */
pgstat_flush_io(nowait);
}
@@ -117,9 +119,9 @@ pgstat_wal_flush_cb(bool nowait)
return true;
#define WALSTAT_ACC(fld, var_to_add) \
- (stats_shmem->stats.fld += var_to_add.fld)
+ (stats_shmem->stats.wal_counters.fld += var_to_add.fld)
#define WALSTAT_ACC_INSTR_TIME(fld) \
- (stats_shmem->stats.fld += INSTR_TIME_GET_MICROSEC(PendingWalStats.fld))
+ (stats_shmem->stats.wal_counters.fld += INSTR_TIME_GET_MICROSEC(PendingWalStats.fld))
WALSTAT_ACC(wal_records, wal_usage_diff);
WALSTAT_ACC(wal_fpi, wal_usage_diff);
WALSTAT_ACC(wal_bytes, wal_usage_diff);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 8a4340e977..8d251dfcca 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1562,11 +1562,12 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
/*
* pg_stat_wal_build_tuple
*
- * Helper routine for pg_stat_get_wal() returning one tuple based on the contents
- * of wal_stats.
+ * Helper routine for pg_stat_get_wal() and pg_stat_get_backend_wal() returning
+ * one tuple based on the contents of wal_counters.
*/
static Datum
-pg_stat_wal_build_tuple(PgStat_WalStats wal_stats)
+pg_stat_wal_build_tuple(PgStat_WalCounters wal_counters,
+ TimestampTz stat_reset_timestamp)
{
#define PG_STAT_WAL_COLS 9
TupleDesc tupdesc;
@@ -1598,26 +1599,26 @@ pg_stat_wal_build_tuple(PgStat_WalStats wal_stats)
BlessTupleDesc(tupdesc);
/* Fill values and NULLs */
- values[0] = Int64GetDatum(wal_stats.wal_records);
- values[1] = Int64GetDatum(wal_stats.wal_fpi);
+ values[0] = Int64GetDatum(wal_counters.wal_records);
+ values[1] = Int64GetDatum(wal_counters.wal_fpi);
/* Convert to numeric. */
- snprintf(buf, sizeof buf, UINT64_FORMAT, wal_stats.wal_bytes);
+ snprintf(buf, sizeof buf, UINT64_FORMAT, wal_counters.wal_bytes);
values[2] = DirectFunctionCall3(numeric_in,
CStringGetDatum(buf),
ObjectIdGetDatum(0),
Int32GetDatum(-1));
- values[3] = Int64GetDatum(wal_stats.wal_buffers_full);
- values[4] = Int64GetDatum(wal_stats.wal_write);
- values[5] = Int64GetDatum(wal_stats.wal_sync);
+ values[3] = Int64GetDatum(wal_counters.wal_buffers_full);
+ values[4] = Int64GetDatum(wal_counters.wal_write);
+ values[5] = Int64GetDatum(wal_counters.wal_sync);
/* Convert counters from microsec to millisec for display */
- values[6] = Float8GetDatum(((double) wal_stats.wal_write_time) / 1000.0);
- values[7] = Float8GetDatum(((double) wal_stats.wal_sync_time) / 1000.0);
+ values[6] = Float8GetDatum(((double) wal_counters.wal_write_time) / 1000.0);
+ values[7] = Float8GetDatum(((double) wal_counters.wal_sync_time) / 1000.0);
- if (wal_stats.stat_reset_timestamp != 0)
- values[8] = TimestampTzGetDatum(wal_stats.stat_reset_timestamp);
+ if (stat_reset_timestamp != 0)
+ values[8] = TimestampTzGetDatum(stat_reset_timestamp);
else
nulls[8] = true;
@@ -1625,6 +1626,55 @@ pg_stat_wal_build_tuple(PgStat_WalStats wal_stats)
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
}
+/*
+ * Returns WAL statistics for a backend with given PID.
+ */
+Datum
+pg_stat_get_backend_wal(PG_FUNCTION_ARGS)
+{
+ int pid;
+ PGPROC *proc;
+ ProcNumber procNumber;
+ PgStat_Backend *backend_stats;
+ PgStat_WalCounters bktype_stats;
+ PgBackendStatus *beentry;
+
+ pid = PG_GETARG_INT32(0);
+ proc = BackendPidGetProc(pid);
+
+ /*
+ * This could be an auxiliary process but these do not report backend
+ * statistics due to pgstat_tracks_backend_bktype(), so there is no need
+ * for an extra call to AuxiliaryPidGetProc().
+ */
+ if (!proc)
+ PG_RETURN_NULL();
+
+ procNumber = GetNumberFromPGProc(proc);
+
+ beentry = pgstat_get_beentry_by_proc_number(procNumber);
+ if (!beentry)
+ PG_RETURN_NULL();
+
+ backend_stats = pgstat_fetch_stat_backend(procNumber);
+ if (!backend_stats)
+ PG_RETURN_NULL();
+
+ /* if PID does not match, leave */
+ if (beentry->st_procpid != pid)
+ PG_RETURN_NULL();
+
+ /* backend may be gone, so recheck in case */
+ if (beentry->st_backendType == B_INVALID)
+ PG_RETURN_NULL();
+
+ bktype_stats = backend_stats->wal_stats;
+
+ /* save tuples with data from this PgStat_WalCounters */
+ return (pg_stat_wal_build_tuple(bktype_stats, backend_stats->stat_reset_timestamp));
+}
+
+
/*
* Returns statistics of WAL activity
*/
@@ -1636,7 +1686,7 @@ pg_stat_get_wal(PG_FUNCTION_ARGS)
/* Get statistics about WAL activity */
wal_stats = pgstat_fetch_stat_wal();
- return (pg_stat_wal_build_tuple(*wal_stats));
+ return (pg_stat_wal_build_tuple(wal_stats->wal_counters, wal_stats->stat_reset_timestamp));
}
/*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index b37e8a6f88..72a5dae4b1 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5929,6 +5929,13 @@
proargmodes => '{o,o,o,o,o,o,o,o,o}',
proargnames => '{wal_records,wal_fpi,wal_bytes,wal_buffers_full,wal_write,wal_sync,wal_write_time,wal_sync_time,stats_reset}',
prosrc => 'pg_stat_get_wal' },
+{ oid => '8037', descr => 'statistics: backend WAL activity',
+ proname => 'pg_stat_get_backend_wal', provolatile => 'v',
+ proparallel => 'r', prorettype => 'record', proargtypes => 'int4',
+ proallargtypes => '{int4,int8,int8,numeric,int8,int8,int8,float8,float8,timestamptz}',
+ proargmodes => '{i,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{backend_pid,wal_records,wal_fpi,wal_bytes,wal_buffers_full,wal_write,wal_sync,wal_write_time,wal_sync_time,stats_reset}',
+ prosrc => 'pg_stat_get_backend_wal' },
{ oid => '6248', descr => 'statistics: information about WAL prefetching',
proname => 'pg_stat_get_recovery_prefetch', prorows => '1', proretset => 't',
provolatile => 'v', prorettype => 'record', proargtypes => '',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 6631bd2d73..045877c5a8 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -378,12 +378,6 @@ typedef struct PgStat_IO
/* Backend statistics store the same amount of IO data as PGSTAT_KIND_IO */
typedef PgStat_PendingIO PgStat_BackendPendingIO;
-typedef struct PgStat_Backend
-{
- TimestampTz stat_reset_timestamp;
- PgStat_BktypeIO io_stats;
-} PgStat_Backend;
-
typedef struct PgStat_StatDBEntry
{
PgStat_Counter xact_commit;
@@ -495,7 +489,7 @@ typedef struct PgStat_StatTabEntry
PgStat_Counter autoanalyze_count;
} PgStat_StatTabEntry;
-typedef struct PgStat_WalStats
+typedef struct PgStat_WalCounters
{
PgStat_Counter wal_records;
PgStat_Counter wal_fpi;
@@ -505,6 +499,11 @@ typedef struct PgStat_WalStats
PgStat_Counter wal_sync;
PgStat_Counter wal_write_time;
PgStat_Counter wal_sync_time;
+} PgStat_WalCounters;
+
+typedef struct PgStat_WalStats
+{
+ PgStat_WalCounters wal_counters;
TimestampTz stat_reset_timestamp;
} PgStat_WalStats;
@@ -523,9 +522,21 @@ typedef struct PgStat_PendingWalStats
instr_time wal_sync_time;
} PgStat_PendingWalStats;
+typedef struct PgStat_Backend
+{
+ TimestampTz stat_reset_timestamp;
+ PgStat_BktypeIO io_stats;
+ PgStat_WalCounters wal_stats;
+} PgStat_Backend;
+
typedef struct PgStat_BackendPending
{
PgStat_BackendPendingIO pending_io;
+
+ /*
+ * We are not creating one member for PgStat_PendingWalStats. See the
+ * comment above the PendingBackendWalStats definition as to why.
+ */
} PgStat_BackendPending;
/*
@@ -857,5 +868,17 @@ extern PGDLLIMPORT SessionEndType pgStatSessionEndCause;
/* updated directly by backends and background processes */
extern PGDLLIMPORT PgStat_PendingWalStats PendingWalStats;
+/*
+ * Variables in pgstat_backend.c
+ */
+
+/* updated directly by backends and background processes */
+
+/*
+ * WAL pending statistics are incremented inside a critical section
+ * (see XLogWrite()), so we can't use pgstat_prep_pending_entry() and we rely on
+ * PendingBackendWalStats instead.
+ */
+extern PGDLLIMPORT PgStat_PendingWalStats PendingBackendWalStats;
#endif /* PGSTAT_H */
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 4bb8e5c53a..05c3e21500 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -615,7 +615,8 @@ extern void pgstat_archiver_snapshot_cb(void);
/* flags for pgstat_flush_backend() */
#define PGSTAT_BACKEND_FLUSH_IO (1 << 0) /* Flush I/O statistics */
-#define PGSTAT_BACKEND_FLUSH_ALL (PGSTAT_BACKEND_FLUSH_IO)
+#define PGSTAT_BACKEND_FLUSH_WAL (1 << 1) /* Flush WAL statistics */
+#define PGSTAT_BACKEND_FLUSH_ALL (PGSTAT_BACKEND_FLUSH_IO | PGSTAT_BACKEND_FLUSH_WAL)
extern void pgstat_flush_backend(bool nowait, bits32 flags);
extern PgStat_BackendPending *pgstat_prep_backend_pending(ProcNumber procnum);
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index a0317b7208..cc01fdf274 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -832,6 +832,8 @@ SELECT sessions > :db_stat_sessions FROM pg_stat_database WHERE datname = (SELEC
SELECT num_requested AS rqst_ckpts_before FROM pg_stat_checkpointer \gset
-- Test pg_stat_wal (and make a temp table so our temp schema exists)
SELECT wal_bytes AS wal_bytes_before FROM pg_stat_wal \gset
+-- Test pg_stat_get_backend_wal (and make a temp table so our temp schema exists)
+SELECT wal_bytes AS backend_wal_bytes_before from pg_stat_get_backend_wal(pg_backend_pid()) \gset
CREATE TEMP TABLE test_stats_temp AS SELECT 17;
DROP TABLE test_stats_temp;
-- Checkpoint twice: The checkpointer reports stats after reporting completion
@@ -851,6 +853,18 @@ SELECT wal_bytes > :wal_bytes_before FROM pg_stat_wal;
t
(1 row)
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush
+--------------------------
+
+(1 row)
+
+SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid());
+ ?column?
+----------
+ t
+(1 row)
+
-- Test pg_stat_get_backend_idset() and some allied functions.
-- In particular, verify that their notion of backend ID matches
-- our temp schema index.
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 399c72bbcf..28fe0a1a7d 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -423,6 +423,9 @@ SELECT num_requested AS rqst_ckpts_before FROM pg_stat_checkpointer \gset
-- Test pg_stat_wal (and make a temp table so our temp schema exists)
SELECT wal_bytes AS wal_bytes_before FROM pg_stat_wal \gset
+-- Test pg_stat_get_backend_wal (and make a temp table so our temp schema exists)
+SELECT wal_bytes AS backend_wal_bytes_before from pg_stat_get_backend_wal(pg_backend_pid()) \gset
+
CREATE TEMP TABLE test_stats_temp AS SELECT 17;
DROP TABLE test_stats_temp;
@@ -435,6 +438,9 @@ CHECKPOINT;
SELECT num_requested > :rqst_ckpts_before FROM pg_stat_checkpointer;
SELECT wal_bytes > :wal_bytes_before FROM pg_stat_wal;
+SELECT pg_stat_force_next_flush();
+SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid());
+
-- Test pg_stat_get_backend_idset() and some allied functions.
-- In particular, verify that their notion of backend ID matches
-- our temp schema index.
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index f15526236a..b593d8601e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2159,6 +2159,7 @@ PgStat_KindInfo
PgStat_LocalState
PgStat_PendingDroppedStatsItem
PgStat_PendingIO
+PgStat_PendingBackendWalStats
PgStat_PendingWalStats
PgStat_SLRUStats
PgStat_ShmemControl
@@ -2175,6 +2176,7 @@ PgStat_SubXactStatus
PgStat_TableCounts
PgStat_TableStatus
PgStat_TableXactStatus
+PgStat_WalCounters
PgStat_WalStats
PgXmlErrorContext
PgXmlStrictness
--
2.34.1
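The WAL part of the flush combines two sources of data:

- cumulative usage counters, which are diffed against a saved snapshot;
- a statically allocated pending buffer, which can be incremented without any allocation. This is why it is usable inside `XLogWrite()`'s critical section.

The standalone sketch below models that bookkeeping under simplified assumptions. `UsageSketch`, `PendingSketch`, `SharedSketch`, `have_pending()`, `flush_wal()` and the static variables are hypothetical. There is no locking, and only a subset of the counters is shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for WalUsage: process-wide counters that only grow. */
typedef struct UsageSketch
{
    int64_t     wal_records;
    int64_t     wal_fpi;
    uint64_t    wal_bytes;
} UsageSketch;

/* Hypothetical stand-in for PgStat_PendingWalStats (static, no allocation). */
typedef struct PendingSketch
{
    int64_t     wal_write;
    int64_t     wal_sync;
} PendingSketch;

/* Hypothetical stand-in for the shared per-backend entry. */
typedef struct SharedSketch
{
    int64_t     wal_records;
    int64_t     wal_fpi;
    uint64_t    wal_bytes;
    int64_t     wal_write;
    int64_t     wal_sync;
} SharedSketch;

static UsageSketch cur_usage;   /* plays the role of pgWalUsage */
static UsageSketch prev_usage;  /* plays the role of prevBackendWalUsage */
static PendingSketch pending;   /* plays the role of PendingBackendWalStats */

/* Anything to flush? New records, or writes/syncs without new records. */
static bool
have_pending(void)
{
    return cur_usage.wal_records != prev_usage.wal_records ||
        pending.wal_write != 0 || pending.wal_sync != 0;
}

static void
flush_wal(SharedSketch *shared)
{
    if (!have_pending())
        return;

    /* Add only what happened since the previous flush. */
    shared->wal_records += cur_usage.wal_records - prev_usage.wal_records;
    shared->wal_fpi += cur_usage.wal_fpi - prev_usage.wal_fpi;
    shared->wal_bytes += cur_usage.wal_bytes - prev_usage.wal_bytes;
    shared->wal_write += pending.wal_write;
    shared->wal_sync += pending.wal_sync;

    /* Remember the snapshot and clear the pending buffer for reuse. */
    prev_usage = cur_usage;
    memset(&pending, 0, sizeof(pending));
}
```

The patch also resets the snapshot when the entry is created in `pgstat_create_backend()`. Presumably this keeps WAL generated before the entry existed from being attributed to it.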
On Thu, Jan 09, 2025 at 07:05:54AM +0000, Bertrand Drouvot wrote:
> PFA v3 which is v2 refactoring with your proposed above changes.
An extra thing I have finished by doing is removing
PgStat_BackendPendingIO, then applied the change. It was useful when
returned as a result of pgstat_prep_backend_pending(), but not so much
with the new PgStat_BackendPending that includes all the pending stats
data.
--
Michael
Hi,
Michael Paquier wrote:
> An extra thing I have finished by doing is removing
> PgStat_BackendPendingIO, then applied the change. It was useful when
> returned as a result of pgstat_prep_backend_pending(), but not so much
> with the new PgStat_BackendPending that includes all the pending stats
> data.
Yeah, makes sense, thanks!
Please find attached v4 taking into account 2c14037bb5.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v4-0001-Extract-logic-filling-pg_stat_get_wal-s-tuple-int.patch
From c424cd02baba3f441768b1f049a5ec6ad11bc3ce Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 6 Jan 2025 07:51:27 +0000
Subject: [PATCH v4 1/2] Extract logic filling pg_stat_get_wal()'s tuple into
its own routine
This commit adds pg_stat_wal_build_tuple(), a helper routine for
pg_stat_get_wal(), that fills its tuple based on the contents
of PgStat_WalStats. This will be used in a follow-up commit that uses
the same structures as pg_stat_wal for reporting, but for the PGSTAT_KIND_BACKEND
statistics kind.
---
src/backend/utils/adt/pgstatfuncs.c | 56 ++++++++++++++++++-----------
1 file changed, 36 insertions(+), 20 deletions(-)
100.0% src/backend/utils/adt/
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 5f8d20a406..8a4340e977 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1560,20 +1560,22 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
}
/*
- * Returns statistics of WAL activity
+ * pg_stat_wal_build_tuple
+ *
+ * Helper routine for pg_stat_get_wal() returning one tuple based on the contents
+ * of wal_stats.
*/
-Datum
-pg_stat_get_wal(PG_FUNCTION_ARGS)
+static Datum
+pg_stat_wal_build_tuple(PgStat_WalStats wal_stats)
{
-#define PG_STAT_GET_WAL_COLS 9
+#define PG_STAT_WAL_COLS 9
TupleDesc tupdesc;
- Datum values[PG_STAT_GET_WAL_COLS] = {0};
- bool nulls[PG_STAT_GET_WAL_COLS] = {0};
+ Datum values[PG_STAT_WAL_COLS] = {0};
+ bool nulls[PG_STAT_WAL_COLS] = {0};
char buf[256];
- PgStat_WalStats *wal_stats;
/* Initialise attributes information in the tuple descriptor */
- tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_WAL_COLS);
+ tupdesc = CreateTemplateTupleDesc(PG_STAT_WAL_COLS);
TupleDescInitEntry(tupdesc, (AttrNumber) 1, "wal_records",
INT8OID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 2, "wal_fpi",
@@ -1595,34 +1597,48 @@ pg_stat_get_wal(PG_FUNCTION_ARGS)
BlessTupleDesc(tupdesc);
- /* Get statistics about WAL activity */
- wal_stats = pgstat_fetch_stat_wal();
-
/* Fill values and NULLs */
- values[0] = Int64GetDatum(wal_stats->wal_records);
- values[1] = Int64GetDatum(wal_stats->wal_fpi);
+ values[0] = Int64GetDatum(wal_stats.wal_records);
+ values[1] = Int64GetDatum(wal_stats.wal_fpi);
/* Convert to numeric. */
- snprintf(buf, sizeof buf, UINT64_FORMAT, wal_stats->wal_bytes);
+ snprintf(buf, sizeof buf, UINT64_FORMAT, wal_stats.wal_bytes);
values[2] = DirectFunctionCall3(numeric_in,
CStringGetDatum(buf),
ObjectIdGetDatum(0),
Int32GetDatum(-1));
- values[3] = Int64GetDatum(wal_stats->wal_buffers_full);
- values[4] = Int64GetDatum(wal_stats->wal_write);
- values[5] = Int64GetDatum(wal_stats->wal_sync);
+ values[3] = Int64GetDatum(wal_stats.wal_buffers_full);
+ values[4] = Int64GetDatum(wal_stats.wal_write);
+ values[5] = Int64GetDatum(wal_stats.wal_sync);
/* Convert counters from microsec to millisec for display */
- values[6] = Float8GetDatum(((double) wal_stats->wal_write_time) / 1000.0);
- values[7] = Float8GetDatum(((double) wal_stats->wal_sync_time) / 1000.0);
+ values[6] = Float8GetDatum(((double) wal_stats.wal_write_time) / 1000.0);
+ values[7] = Float8GetDatum(((double) wal_stats.wal_sync_time) / 1000.0);
- values[8] = TimestampTzGetDatum(wal_stats->stat_reset_timestamp);
+ if (wal_stats.stat_reset_timestamp != 0)
+ values[8] = TimestampTzGetDatum(wal_stats.stat_reset_timestamp);
+ else
+ nulls[8] = true;
/* Returns the record as Datum */
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
}
+/*
+ * Returns statistics of WAL activity
+ */
+Datum
+pg_stat_get_wal(PG_FUNCTION_ARGS)
+{
+ PgStat_WalStats *wal_stats;
+
+ /* Get statistics about WAL activity */
+ wal_stats = pgstat_fetch_stat_wal();
+
+ return (pg_stat_wal_build_tuple(*wal_stats));
+}
+
/*
* Returns statistics of SLRU caches.
*/
--
2.34.1
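The refactoring in 0001 boils down to "fetch the stats, then hand a plain struct to a shared row builder", so that a per-backend caller can reuse the same builder. The sketch below shows that shape outside of PostgreSQL, assuming hypothetical Demo* types, doubles in place of Datums, and a zero timestamp meaning "never reset" (reported as NULL, as the patch does). Unlike the real code, it does not go through numeric for wal_bytes, so values above 2^53 would lose precision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_WAL_COLS 9

/* Hypothetical stand-in for PgStat_WalCounters; times are in microseconds. */
typedef struct DemoWalCounters
{
	int64_t		wal_records;
	int64_t		wal_fpi;
	uint64_t	wal_bytes;
	int64_t		wal_buffers_full;
	int64_t		wal_write;
	int64_t		wal_sync;
	int64_t		wal_write_time;
	int64_t		wal_sync_time;
} DemoWalCounters;

/* Hypothetical output row: doubles instead of Datums, plus a NULL map. */
typedef struct DemoWalRow
{
	double		values[DEMO_WAL_COLS];
	bool		nulls[DEMO_WAL_COLS];
} DemoWalRow;

/*
 * Shared row builder: a "global" and a "per-backend" caller both pass a
 * counters struct by value plus the reset timestamp to report.
 */
static DemoWalRow
demo_wal_build_row(DemoWalCounters c, int64_t stat_reset_timestamp)
{
	DemoWalRow	row = {{0}, {false}};

	row.values[0] = (double) c.wal_records;
	row.values[1] = (double) c.wal_fpi;
	row.values[2] = (double) c.wal_bytes;	/* the real code uses numeric */
	row.values[3] = (double) c.wal_buffers_full;
	row.values[4] = (double) c.wal_write;
	row.values[5] = (double) c.wal_sync;
	/* Convert microseconds to milliseconds for display. */
	row.values[6] = (double) c.wal_write_time / 1000.0;
	row.values[7] = (double) c.wal_sync_time / 1000.0;

	/* A zero timestamp means "never reset" and is reported as NULL. */
	if (stat_reset_timestamp != 0)
		row.values[8] = (double) stat_reset_timestamp;
	else
		row.nulls[8] = true;

	return row;
}
```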
v4-0002-per-backend-WAL-statistics.patch
From bb1043a1a4240287d8ba94615a6217d1a12f5722 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 6 Jan 2025 10:00:00 +0000
Subject: [PATCH v4 2/2] per backend WAL statistics
Now that commit 9aea73fc61 added backend-level statistics to pgstats (and
per backend IO statistics) we can more easily add per backend statistics.
This commit adds per backend WAL statistics using the same layer as pg_stat_wal,
except that it is now possible to know how much WAL activity is happening in each
backend rather than an overall aggregate of all the activity. A function called
pg_stat_get_backend_wal() is added to access this data depending on the
PID of a backend.
The same limitation as in 9aea73fc61 persists, meaning that Auxiliary processes
are not included in this set of statistics.
XXX: bump catalog version
---
doc/src/sgml/config.sgml | 4 +-
doc/src/sgml/monitoring.sgml | 19 +++++
src/backend/access/transam/xlog.c | 36 ++++++++-
src/backend/utils/activity/pgstat_backend.c | 86 ++++++++++++++++++++-
src/backend/utils/activity/pgstat_wal.c | 6 +-
src/backend/utils/adt/pgstatfuncs.c | 78 +++++++++++++++----
src/include/catalog/pg_proc.dat | 7 ++
src/include/pgstat.h | 59 +++++++++-----
src/include/utils/pgstat_internal.h | 3 +-
src/test/regress/expected/stats.out | 14 ++++
src/test/regress/sql/stats.sql | 6 ++
src/tools/pgindent/typedefs.list | 1 +
12 files changed, 277 insertions(+), 42 deletions(-)
10.1% doc/src/sgml/
8.8% src/backend/access/transam/
31.6% src/backend/utils/activity/
28.4% src/backend/utils/adt/
5.0% src/include/catalog/
8.4% src/include/
4.0% src/test/regress/expected/
3.1% src/test/regress/sql/
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index f1ab614575..9193e70a01 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8435,7 +8435,9 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
measure the overhead of timing on your system.
I/O timing information is
displayed in <link linkend="monitoring-pg-stat-wal-view">
- <structname>pg_stat_wal</structname></link>.
+ <structname>pg_stat_wal</structname></link> and in the output of the
+ <link linkend="pg-stat-get-backend-wal">
+ <function>pg_stat_get_backend_wal()</function></link> function.
Only superusers and users with the appropriate <literal>SET</literal>
privilege can change this setting.
</para>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index d0d176cc54..84a2d09b76 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4811,6 +4811,25 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para></entry>
</row>
+ <row>
+ <entry id="pg-stat-get-backend-wal" role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_stat_get_backend_wal</primary>
+ </indexterm>
+ <function>pg_stat_get_backend_wal</function> ( <type>integer</type> )
+ <returnvalue>record</returnvalue>
+ </para>
+ <para>
+ Returns WAL statistics about the backend with the specified
+ process ID. The output fields are exactly the same as the ones in the
+ <structname>pg_stat_wal</structname> view.
+ </para>
+ <para>
+ The function does not return WAL statistics for the checkpointer,
+ the background writer, the startup process and the autovacuum launcher.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index bf3dbda901..0ba9fcb277 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2058,6 +2058,10 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli, bool opportunistic)
XLogWrite(WriteRqst, tli, false);
LWLockRelease(WALWriteLock);
PendingWalStats.wal_buffers_full++;
+
+ if (pgstat_tracks_backend_bktype(MyBackendType))
+ PendingBackendWalStats.wal_buffers_full++;
+
TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY_DONE();
}
/* Re-acquire WALBufMappingLock and retry */
@@ -2426,11 +2430,14 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible)
Size nleft;
ssize_t written;
instr_time start;
+ instr_time end;
/* OK to write the page(s) */
from = XLogCtl->pages + startidx * (Size) XLOG_BLCKSZ;
nbytes = npages * (Size) XLOG_BLCKSZ;
nleft = nbytes;
+ /* keep compiler quiet */
+ INSTR_TIME_SET_ZERO(end);
do
{
errno = 0;
@@ -2451,14 +2458,26 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible)
*/
if (track_wal_io_timing)
{
- instr_time end;
-
INSTR_TIME_SET_CURRENT(end);
INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_write_time, end, start);
}
PendingWalStats.wal_write++;
+ if (pgstat_tracks_backend_bktype(MyBackendType))
+ {
+ /*
+ * We are inside a critical section, so we can't use
+ * pgstat_prep_pending_entry() and we rely on
+ * PendingBackendWalStats instead.
+ */
+ PendingBackendWalStats.wal_write++;
+
+ if (track_wal_io_timing)
+ INSTR_TIME_ACCUM_DIFF(PendingBackendWalStats.wal_write_time,
+ end, start);
+ }
+
if (written <= 0)
{
char xlogfname[MAXFNAMELEN];
@@ -8684,8 +8703,11 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
{
char *msg = NULL;
instr_time start;
+ instr_time end;
Assert(tli != 0);
+ /* keep compiler quiet */
+ INSTR_TIME_SET_ZERO(end);
/*
* Quick exit if fsync is disabled or write() has already synced the WAL
@@ -8751,13 +8773,19 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
*/
if (track_wal_io_timing)
{
- instr_time end;
-
INSTR_TIME_SET_CURRENT(end);
INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_sync_time, end, start);
}
PendingWalStats.wal_sync++;
+
+ if (pgstat_tracks_backend_bktype(MyBackendType))
+ {
+ PendingBackendWalStats.wal_sync++;
+
+ if (track_wal_io_timing)
+ INSTR_TIME_ACCUM_DIFF(PendingBackendWalStats.wal_sync_time, end, start);
+ }
}
/*
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
index 207bfa3c27..86c3b106dd 100644
--- a/src/backend/utils/activity/pgstat_backend.c
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -24,6 +24,16 @@
#include "utils/pgstat_internal.h"
+PgStat_PendingWalStats PendingBackendWalStats = {0};
+
+/*
+ * WAL usage counters saved from pgWalUsage at the previous call to
+ * pgstat_report_wal(). This is used to calculate how much WAL usage
+ * happens between pgstat_report_wal() calls, by subtracting
+ * the previous counters from the current ones.
+ */
+static WalUsage prevBackendWalUsage;
+
/*
* Returns statistics of a backend by proc number.
*/
@@ -75,6 +85,75 @@ pgstat_flush_backend_entry_io(PgStat_EntryRef *entry_ref)
}
}
+/*
+ * To determine whether any WAL activity has occurred since last time, not
+ * only the number of generated WAL records but also the numbers of WAL
+ * writes and syncs need to be checked, because even a transaction that
+ * generates no WAL records can write or sync WAL data when flushing the
+ * data pages.
+ */
+static bool
+pgstat_backend_wal_have_pending(void)
+{
+ return pgWalUsage.wal_records != prevBackendWalUsage.wal_records ||
+ PendingBackendWalStats.wal_write != 0 ||
+ PendingBackendWalStats.wal_sync != 0;
+}
+
+/*
+ * Flush out locally pending backend WAL statistics. Locking is managed
+ * by the caller.
+ */
+static void
+pgstat_flush_backend_entry_wal(PgStat_EntryRef *entry_ref)
+{
+ PgStatShared_Backend *shbackendent;
+ PgStat_WalCounters *bktype_shstats;
+ WalUsage wal_usage_diff = {0};
+
+ /*
+ * This function can be called even if nothing at all has happened. Avoid
+ * taking lock for nothing in that case.
+ */
+ if (!pgstat_backend_wal_have_pending())
+ return;
+
+ shbackendent = (PgStatShared_Backend *) entry_ref->shared_stats;
+ bktype_shstats = &shbackendent->stats.wal_stats;
+
+ /*
+ * We don't update the WAL usage portion of the local WalStats elsewhere.
+ * Calculate how much WAL usage counters were increased by subtracting the
+ * previous counters from the current ones.
+ */
+ WalUsageAccumDiff(&wal_usage_diff, &pgWalUsage, &prevBackendWalUsage);
+
+#define WALSTAT_ACC(fld, var_to_add) \
+ (bktype_shstats->fld += var_to_add.fld)
+#define WALSTAT_ACC_INSTR_TIME(fld) \
+ (bktype_shstats->fld += INSTR_TIME_GET_MICROSEC(PendingBackendWalStats.fld))
+ WALSTAT_ACC(wal_buffers_full, PendingBackendWalStats);
+ WALSTAT_ACC(wal_write, PendingBackendWalStats);
+ WALSTAT_ACC(wal_sync, PendingBackendWalStats);
+ WALSTAT_ACC(wal_records, wal_usage_diff);
+ WALSTAT_ACC(wal_fpi, wal_usage_diff);
+ WALSTAT_ACC(wal_bytes, wal_usage_diff);
+ WALSTAT_ACC_INSTR_TIME(wal_write_time);
+ WALSTAT_ACC_INSTR_TIME(wal_sync_time);
+#undef WALSTAT_ACC_INSTR_TIME
+#undef WALSTAT_ACC
+
+ /*
+ * Save the current counters for the subsequent calculation of WAL usage.
+ */
+ prevBackendWalUsage = pgWalUsage;
+
+ /*
+ * Clear out the statistics buffer, so it can be re-used.
+ */
+ MemSet(&PendingBackendWalStats, 0, sizeof(PendingBackendWalStats));
+}
+
/*
* Wrapper routine to flush backend statistics.
*/
@@ -92,6 +171,9 @@ pgstat_flush_backend_entry(PgStat_EntryRef *entry_ref, bool nowait,
if (flags & PGSTAT_BACKEND_FLUSH_IO)
pgstat_flush_backend_entry_io(entry_ref);
+ if (flags & PGSTAT_BACKEND_FLUSH_WAL)
+ pgstat_flush_backend_entry_wal(entry_ref);
+
pgstat_unlock_entry(entry_ref);
return true;
@@ -145,10 +227,12 @@ pgstat_create_backend(ProcNumber procnum)
* e.g. if we previously used this proc number.
*/
memset(&shstatent->stats, 0, sizeof(shstatent->stats));
+
+ prevBackendWalUsage = pgWalUsage;
}
/*
- * Find or create a local PgStat_BackendPending entry for proc number.
+ * Find or create a local PgStat_BackendPendingIO entry for proc number.
*/
PgStat_BackendPending *
pgstat_prep_backend_pending(ProcNumber procnum)
diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c
index 18fa6b2936..3362764226 100644
--- a/src/backend/utils/activity/pgstat_wal.c
+++ b/src/backend/utils/activity/pgstat_wal.c
@@ -55,6 +55,8 @@ pgstat_report_wal(bool force)
/* flush wal stats */
pgstat_flush_wal(nowait);
+ pgstat_flush_backend(nowait, PGSTAT_BACKEND_FLUSH_WAL);
+
/* flush IO stats */
pgstat_flush_io(nowait);
}
@@ -117,9 +119,9 @@ pgstat_wal_flush_cb(bool nowait)
return true;
#define WALSTAT_ACC(fld, var_to_add) \
- (stats_shmem->stats.fld += var_to_add.fld)
+ (stats_shmem->stats.wal_counters.fld += var_to_add.fld)
#define WALSTAT_ACC_INSTR_TIME(fld) \
- (stats_shmem->stats.fld += INSTR_TIME_GET_MICROSEC(PendingWalStats.fld))
+ (stats_shmem->stats.wal_counters.fld += INSTR_TIME_GET_MICROSEC(PendingWalStats.fld))
WALSTAT_ACC(wal_records, wal_usage_diff);
WALSTAT_ACC(wal_fpi, wal_usage_diff);
WALSTAT_ACC(wal_bytes, wal_usage_diff);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 8a4340e977..8d251dfcca 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1562,11 +1562,12 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
/*
* pg_stat_wal_build_tuple
*
- * Helper routine for pg_stat_get_wal() returning one tuple based on the contents
- * of wal_stats.
+ * Helper routine for pg_stat_get_wal() and pg_stat_get_backend_wal() returning
+ * one tuple based on the contents of wal_counters.
*/
static Datum
-pg_stat_wal_build_tuple(PgStat_WalStats wal_stats)
+pg_stat_wal_build_tuple(PgStat_WalCounters wal_counters,
+ TimestampTz stat_reset_timestamp)
{
#define PG_STAT_WAL_COLS 9
TupleDesc tupdesc;
@@ -1598,26 +1599,26 @@ pg_stat_wal_build_tuple(PgStat_WalStats wal_stats)
BlessTupleDesc(tupdesc);
/* Fill values and NULLs */
- values[0] = Int64GetDatum(wal_stats.wal_records);
- values[1] = Int64GetDatum(wal_stats.wal_fpi);
+ values[0] = Int64GetDatum(wal_counters.wal_records);
+ values[1] = Int64GetDatum(wal_counters.wal_fpi);
/* Convert to numeric. */
- snprintf(buf, sizeof buf, UINT64_FORMAT, wal_stats.wal_bytes);
+ snprintf(buf, sizeof buf, UINT64_FORMAT, wal_counters.wal_bytes);
values[2] = DirectFunctionCall3(numeric_in,
CStringGetDatum(buf),
ObjectIdGetDatum(0),
Int32GetDatum(-1));
- values[3] = Int64GetDatum(wal_stats.wal_buffers_full);
- values[4] = Int64GetDatum(wal_stats.wal_write);
- values[5] = Int64GetDatum(wal_stats.wal_sync);
+ values[3] = Int64GetDatum(wal_counters.wal_buffers_full);
+ values[4] = Int64GetDatum(wal_counters.wal_write);
+ values[5] = Int64GetDatum(wal_counters.wal_sync);
/* Convert counters from microsec to millisec for display */
- values[6] = Float8GetDatum(((double) wal_stats.wal_write_time) / 1000.0);
- values[7] = Float8GetDatum(((double) wal_stats.wal_sync_time) / 1000.0);
+ values[6] = Float8GetDatum(((double) wal_counters.wal_write_time) / 1000.0);
+ values[7] = Float8GetDatum(((double) wal_counters.wal_sync_time) / 1000.0);
- if (wal_stats.stat_reset_timestamp != 0)
- values[8] = TimestampTzGetDatum(wal_stats.stat_reset_timestamp);
+ if (stat_reset_timestamp != 0)
+ values[8] = TimestampTzGetDatum(stat_reset_timestamp);
else
nulls[8] = true;
@@ -1625,6 +1626,55 @@ pg_stat_wal_build_tuple(PgStat_WalStats wal_stats)
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
}
+/*
+ * Returns WAL statistics for a backend with given PID.
+ */
+Datum
+pg_stat_get_backend_wal(PG_FUNCTION_ARGS)
+{
+ int pid;
+ PGPROC *proc;
+ ProcNumber procNumber;
+ PgStat_Backend *backend_stats;
+ PgStat_WalCounters bktype_stats;
+ PgBackendStatus *beentry;
+
+ pid = PG_GETARG_INT32(0);
+ proc = BackendPidGetProc(pid);
+
+ /*
+ * This could be an auxiliary process but these do not report backend
+ * statistics due to pgstat_tracks_backend_bktype(), so there is no need
+ * for an extra call to AuxiliaryPidGetProc().
+ */
+ if (!proc)
+ PG_RETURN_NULL();
+
+ procNumber = GetNumberFromPGProc(proc);
+
+ beentry = pgstat_get_beentry_by_proc_number(procNumber);
+ if (!beentry)
+ PG_RETURN_NULL();
+
+ backend_stats = pgstat_fetch_stat_backend(procNumber);
+ if (!backend_stats)
+ PG_RETURN_NULL();
+
+ /* if PID does not match, leave */
+ if (beentry->st_procpid != pid)
+ PG_RETURN_NULL();
+
+ /* backend may be gone, so recheck in case */
+ if (beentry->st_backendType == B_INVALID)
+ PG_RETURN_NULL();
+
+ bktype_stats = backend_stats->wal_stats;
+
+ /* build the tuple from this backend's PgStat_WalCounters */
+ return (pg_stat_wal_build_tuple(bktype_stats, backend_stats->stat_reset_timestamp));
+}
+
+
/*
* Returns statistics of WAL activity
*/
@@ -1636,7 +1686,7 @@ pg_stat_get_wal(PG_FUNCTION_ARGS)
/* Get statistics about WAL activity */
wal_stats = pgstat_fetch_stat_wal();
- return (pg_stat_wal_build_tuple(*wal_stats));
+ return (pg_stat_wal_build_tuple(wal_stats->wal_counters, wal_stats->stat_reset_timestamp));
}
/*
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index b37e8a6f88..72a5dae4b1 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5929,6 +5929,13 @@
proargmodes => '{o,o,o,o,o,o,o,o,o}',
proargnames => '{wal_records,wal_fpi,wal_bytes,wal_buffers_full,wal_write,wal_sync,wal_write_time,wal_sync_time,stats_reset}',
prosrc => 'pg_stat_get_wal' },
+{ oid => '8037', descr => 'statistics: backend WAL activity',
+ proname => 'pg_stat_get_backend_wal', provolatile => 'v',
+ proparallel => 'r', prorettype => 'record', proargtypes => 'int4',
+ proallargtypes => '{int4,int8,int8,numeric,int8,int8,int8,float8,float8,timestamptz}',
+ proargmodes => '{i,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{backend_pid,wal_records,wal_fpi,wal_bytes,wal_buffers_full,wal_write,wal_sync,wal_write_time,wal_sync_time,stats_reset}',
+ prosrc => 'pg_stat_get_backend_wal' },
{ oid => '6248', descr => 'statistics: information about WAL prefetching',
proname => 'pg_stat_get_recovery_prefetch', prorows => '1', proretset => 't',
provolatile => 'v', prorettype => 'record', proargtypes => '',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 6475889c58..dd9cbbe103 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -375,24 +375,6 @@ typedef struct PgStat_IO
PgStat_BktypeIO stats[BACKEND_NUM_TYPES];
} PgStat_IO;
-typedef struct PgStat_Backend
-{
- TimestampTz stat_reset_timestamp;
- PgStat_BktypeIO io_stats;
-} PgStat_Backend;
-
-/* ---------
- * PgStat_BackendPending Non-flushed backend stats.
- * ---------
- */
-typedef struct PgStat_BackendPending
-{
- /*
- * Backend statistics store the same amount of IO data as PGSTAT_KIND_IO.
- */
- PgStat_PendingIO pending_io;
-} PgStat_BackendPending;
-
typedef struct PgStat_StatDBEntry
{
PgStat_Counter xact_commit;
@@ -504,7 +486,7 @@ typedef struct PgStat_StatTabEntry
PgStat_Counter autoanalyze_count;
} PgStat_StatTabEntry;
-typedef struct PgStat_WalStats
+typedef struct PgStat_WalCounters
{
PgStat_Counter wal_records;
PgStat_Counter wal_fpi;
@@ -514,6 +496,11 @@ typedef struct PgStat_WalStats
PgStat_Counter wal_sync;
PgStat_Counter wal_write_time;
PgStat_Counter wal_sync_time;
+} PgStat_WalCounters;
+
+typedef struct PgStat_WalStats
+{
+ PgStat_WalCounters wal_counters;
TimestampTz stat_reset_timestamp;
} PgStat_WalStats;
@@ -532,6 +519,28 @@ typedef struct PgStat_PendingWalStats
instr_time wal_sync_time;
} PgStat_PendingWalStats;
+/* ---------
+ * PgStat_BackendPending Non-flushed backend stats.
+ * ---------
+ */
+typedef struct PgStat_BackendPending
+{
+ /*
+ * Backend statistics store the same amount of IO data as PGSTAT_KIND_IO.
+ */
+ PgStat_PendingIO pending_io;
+ /*
+ * We are not creating one member for PgStat_PendingWalStats. See the
+ * comment above the PendingBackendWalStats definition as to why.
+ */
+} PgStat_BackendPending;
+
+typedef struct PgStat_Backend
+{
+ TimestampTz stat_reset_timestamp;
+ PgStat_BktypeIO io_stats;
+ PgStat_WalCounters wal_stats;
+} PgStat_Backend;
/*
* Functions in pgstat.c
@@ -862,5 +871,17 @@ extern PGDLLIMPORT SessionEndType pgStatSessionEndCause;
/* updated directly by backends and background processes */
extern PGDLLIMPORT PgStat_PendingWalStats PendingWalStats;
+/*
+ * Variables in pgstat_backend.c
+ */
+
+/* updated directly by backends and background processes */
+
+/*
+ * WAL pending statistics are incremented inside a critical section
+ * (see XLogWrite()), so we can't use pgstat_prep_pending_entry() and we rely on
+ * PendingBackendWalStats instead.
+ */
+extern PGDLLIMPORT PgStat_PendingWalStats PendingBackendWalStats;
#endif /* PGSTAT_H */
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 4bb8e5c53a..05c3e21500 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -615,7 +615,8 @@ extern void pgstat_archiver_snapshot_cb(void);
/* flags for pgstat_flush_backend() */
#define PGSTAT_BACKEND_FLUSH_IO (1 << 0) /* Flush I/O statistics */
-#define PGSTAT_BACKEND_FLUSH_ALL (PGSTAT_BACKEND_FLUSH_IO)
+#define PGSTAT_BACKEND_FLUSH_WAL (1 << 1) /* Flush WAL statistics */
+#define PGSTAT_BACKEND_FLUSH_ALL (PGSTAT_BACKEND_FLUSH_IO | PGSTAT_BACKEND_FLUSH_WAL)
extern void pgstat_flush_backend(bool nowait, bits32 flags);
extern PgStat_BackendPending *pgstat_prep_backend_pending(ProcNumber procnum);
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index a0317b7208..cc01fdf274 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -832,6 +832,8 @@ SELECT sessions > :db_stat_sessions FROM pg_stat_database WHERE datname = (SELEC
SELECT num_requested AS rqst_ckpts_before FROM pg_stat_checkpointer \gset
-- Test pg_stat_wal (and make a temp table so our temp schema exists)
SELECT wal_bytes AS wal_bytes_before FROM pg_stat_wal \gset
+-- Test pg_stat_get_backend_wal (and make a temp table so our temp schema exists)
+SELECT wal_bytes AS backend_wal_bytes_before from pg_stat_get_backend_wal(pg_backend_pid()) \gset
CREATE TEMP TABLE test_stats_temp AS SELECT 17;
DROP TABLE test_stats_temp;
-- Checkpoint twice: The checkpointer reports stats after reporting completion
@@ -851,6 +853,18 @@ SELECT wal_bytes > :wal_bytes_before FROM pg_stat_wal;
t
(1 row)
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush
+--------------------------
+
+(1 row)
+
+SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid());
+ ?column?
+----------
+ t
+(1 row)
+
-- Test pg_stat_get_backend_idset() and some allied functions.
-- In particular, verify that their notion of backend ID matches
-- our temp schema index.
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 399c72bbcf..28fe0a1a7d 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -423,6 +423,9 @@ SELECT num_requested AS rqst_ckpts_before FROM pg_stat_checkpointer \gset
-- Test pg_stat_wal (and make a temp table so our temp schema exists)
SELECT wal_bytes AS wal_bytes_before FROM pg_stat_wal \gset
+-- Test pg_stat_get_backend_wal (and make a temp table so our temp schema exists)
+SELECT wal_bytes AS backend_wal_bytes_before from pg_stat_get_backend_wal(pg_backend_pid()) \gset
+
CREATE TEMP TABLE test_stats_temp AS SELECT 17;
DROP TABLE test_stats_temp;
@@ -435,6 +438,9 @@ CHECKPOINT;
SELECT num_requested > :rqst_ckpts_before FROM pg_stat_checkpointer;
SELECT wal_bytes > :wal_bytes_before FROM pg_stat_wal;
+SELECT pg_stat_force_next_flush();
+SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid());
+
-- Test pg_stat_get_backend_idset() and some allied functions.
-- In particular, verify that their notion of backend ID matches
-- our temp schema index.
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index eb93debe10..faca109b80 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2174,6 +2174,7 @@ PgStat_SubXactStatus
PgStat_TableCounts
PgStat_TableStatus
PgStat_TableXactStatus
+PgStat_WalCounters
PgStat_WalStats
PgXmlErrorContext
PgXmlStrictness
--
2.34.1
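The flush logic in pgstat_backend.c relies on the fact that pgWalUsage only ever grows. Each flush adds the difference from the previous snapshot and then saves a new snapshot, while the write/sync counts that are tracked directly are reset after use. Here is a minimal standalone sketch of that pattern, with hypothetical Demo* names standing in for pgWalUsage, prevBackendWalUsage and PendingBackendWalStats, and without the locking that the caller handles in the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for WalUsage: cumulative, never reset. */
typedef struct DemoWalUsage
{
	int64_t		wal_records;
	int64_t		wal_fpi;
	uint64_t	wal_bytes;
} DemoWalUsage;

/* Hypothetical stand-in for the directly tracked pending counters. */
typedef struct DemoPendingWal
{
	int64_t		wal_write;
	int64_t		wal_sync;
} DemoPendingWal;

/* Hypothetical per-backend shared totals. */
typedef struct DemoSharedWal
{
	int64_t		wal_records;
	int64_t		wal_fpi;
	uint64_t	wal_bytes;
	int64_t		wal_write;
	int64_t		wal_sync;
} DemoSharedWal;

static DemoWalUsage demo_wal_usage;			/* like pgWalUsage */
static DemoWalUsage demo_prev_wal_usage;	/* like prevBackendWalUsage */
static DemoPendingWal demo_pending;			/* like PendingBackendWalStats */

/*
 * Records alone are not enough: a transaction that generates no WAL
 * records can still write or sync WAL.
 */
static bool
demo_have_pending(void)
{
	return demo_wal_usage.wal_records != demo_prev_wal_usage.wal_records ||
		demo_pending.wal_write != 0 ||
		demo_pending.wal_sync != 0;
}

/* Returns false when there was nothing to flush. */
static bool
demo_flush_wal(DemoSharedWal *shared)
{
	if (!demo_have_pending())
		return false;

	/* Add only what happened since the previous flush. */
	shared->wal_records += demo_wal_usage.wal_records - demo_prev_wal_usage.wal_records;
	shared->wal_fpi += demo_wal_usage.wal_fpi - demo_prev_wal_usage.wal_fpi;
	shared->wal_bytes += demo_wal_usage.wal_bytes - demo_prev_wal_usage.wal_bytes;
	shared->wal_write += demo_pending.wal_write;
	shared->wal_sync += demo_pending.wal_sync;

	/* Snapshot the cumulative counters and clear the direct ones. */
	demo_prev_wal_usage = demo_wal_usage;
	memset(&demo_pending, 0, sizeof(demo_pending));
	return true;
}
```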
On Fri, Jan 10, 2025 at 09:40:38AM +0000, Bertrand Drouvot wrote:
Please find attached v4 taking into account 2c14037bb5.
+} PgStat_WalCounters;
+
+typedef struct PgStat_WalStats
+{
+ PgStat_WalCounters wal_counters;
I know that's a nit, but perhaps that could be a patch of its own,
pushing that to pg_stat_wal_build_tuple() to reduce the diffs in the
main patch.
- * Find or create a local PgStat_BackendPending entry for proc number.
+ * Find or create a local PgStat_BackendPendingIO entry for proc number.
Seems like you are undoing a change here.
+ * WAL pending statistics are incremented inside a critical section
+ * (see XLogWrite()), so we can't use pgstat_prep_pending_entry() and we rely on
+ * PendingBackendWalStats instead.
+ */
+extern PGDLLIMPORT PgStat_PendingWalStats PendingBackendWalStats;
Hmm. This makes me wonder if we should rethink a bit the way pending
entries are retrieved and if we should do it beforehand for the WAL
paths to avoid allocations in some critical sections. Isn't that also
because we avoid calling pgstat_prep_backend_pending() for the I/O
case as only backends are supported now, discarding cases like the
checkpointer where I/O could happen in a critical path? As a whole,
the approach taken by the patch is not really consistent with the
rest.
--
Michael
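Michael's suggestion can be read as "do the allocation before entering the critical section, then only touch pre-allocated memory inside it". The following sketch is one hypothetical way to express that outside of PostgreSQL. demo_crit_section_count, DEMO_START_CRIT_SECTION() and demo_prep_pending() are illustrative stand-ins, not real APIs:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for CritSectionCount and the section macros. */
static int	demo_crit_section_count = 0;

#define DEMO_START_CRIT_SECTION()	(demo_crit_section_count++)
#define DEMO_END_CRIT_SECTION() \
	do { assert(demo_crit_section_count > 0); demo_crit_section_count--; } while (0)

typedef struct DemoPendingWal
{
	int64_t		wal_write;
} DemoPendingWal;

static DemoPendingWal *demo_pending = NULL;

/* Allocation is not allowed while inside a critical section. */
static void *
demo_alloc_zero(size_t size)
{
	void	   *ptr;

	assert(demo_crit_section_count == 0);
	ptr = calloc(1, size);
	if (ptr == NULL)
		abort();				/* out of memory */
	return ptr;
}

/* Lazily create the pending entry; may allocate on the first call. */
static DemoPendingWal *
demo_prep_pending(void)
{
	if (demo_pending == NULL)
		demo_pending = demo_alloc_zero(sizeof(DemoPendingWal));
	return demo_pending;
}

/* Prepare before the critical section, then only increment inside it. */
static void
demo_wal_write(void)
{
	DemoPendingWal *pending = demo_prep_pending();

	DEMO_START_CRIT_SECTION();
	/* ... the actual WAL write would happen here ... */
	pending->wal_write++;
	DEMO_END_CRIT_SECTION();
}
```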
Hi,
On Wed, Jan 15, 2025 at 03:11:32PM +0900, Michael Paquier wrote:
On Fri, Jan 10, 2025 at 09:40:38AM +0000, Bertrand Drouvot wrote:
Please find attached v4 taking into account 2c14037bb5.
+} PgStat_WalCounters;
+
+typedef struct PgStat_WalStats
+{
+ PgStat_WalCounters wal_counters;
I know that's a nit, but perhaps that could be a patch of its own,
pushing that to pg_stat_wal_build_tuple() to reduce the diffs in the
main patch.
Done in 0003 attached.
- * Find or create a local PgStat_BackendPending entry for proc number.
+ * Find or create a local PgStat_BackendPendingIO entry for proc number.
Seems like you are undoing a change here.
Arf, nice catch; hopefully that's only the comment. Fixed in the attached.
+ * WAL pending statistics are incremented inside a critical section
+ * (see XLogWrite()), so we can't use pgstat_prep_pending_entry() and we rely on
+ * PendingBackendWalStats instead.
+ */
+extern PGDLLIMPORT PgStat_PendingWalStats PendingBackendWalStats;
Hmm. This makes me wonder if we should rethink a bit the way pending
entries are retrieved and if we should do it beforehand for the WAL
paths to avoid allocations in some critical sections. Isn't that also
because we avoid calling pgstat_prep_backend_pending() for the I/O
case as only backends are supported now, discarding cases like the
checkpointer where I/O could happen in a critical path? As a whole,
the approach taken by the patch is not really consistent with the
rest.
I agree that it's better to have a generic solution and to be consistent with
the other variable-numbered stats.
The attached implements in 0001 the proposal made in [1], i.e.:
1. It adds a new allow_critical_section field to PgStat_KindInfo for pgstats kinds.
2. It temporarily sets allowInCritSection to true when needed.
Note that, for safety reasons, 0001 sets allowInCritSection back to false
unconditionally (i.e., without checking allow_critical_section again).
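A minimal model of what 0001 does, assuming a simplified context that only carries an allow_in_crit_section flag, and an allocator check shaped like the one mcxt.c applies (allocation is allowed outside a critical section, or when the context explicitly permits it). All Demo* names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

static int	demo_crit_section_count = 0;	/* like CritSectionCount */

/* Hypothetical context: only the flag matters for this sketch. */
typedef struct DemoMemoryContext
{
	bool		allow_in_crit_section;
} DemoMemoryContext;

/* Hypothetical per-kind metadata, like the new allow_critical_section field. */
typedef struct DemoKindInfo
{
	const char *name;
	bool		allow_critical_section;
} DemoKindInfo;

/* Allocation check: outside a critical section, or explicitly allowed. */
static void *
demo_context_alloc_zero(DemoMemoryContext *cxt, size_t size)
{
	void	   *ptr;

	assert(demo_crit_section_count == 0 || cxt->allow_in_crit_section);
	ptr = calloc(1, size);
	if (ptr == NULL)
		abort();				/* out of memory; a PANIC in a critical section */
	return ptr;
}

static void *
demo_prep_pending_entry(DemoMemoryContext *cxt, const DemoKindInfo *kind,
						size_t size)
{
	void	   *entry;

	/* Temporarily open the exception only for kinds that opt in. */
	if (kind->allow_critical_section)
		cxt->allow_in_crit_section = true;

	entry = demo_context_alloc_zero(cxt, size);

	/* Reset unconditionally so the exception never outlives this call. */
	cxt->allow_in_crit_section = false;
	return entry;
}
```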
While it avoids the failed assertion mentioned above and in [1] (on
the MemoryContextAllocZero() call), the TAP tests still fail, now on a new
assertion.
If you apply the whole patch series attached, you'll see that:
make -C src/bin/pg_rewind check PROVE_TESTS=t/001_basic.pl
is failing with something like:
TRAP: failed Assert("CritSectionCount == 0"), File: "mcxt.c", Line: 1107, PID: 3295726
pg18/bin/postgres(ExceptionalCondition+0xbb)[0x59668bee1f6d]
pg18/bin/postgres(MemoryContextCreate+0x46)[0x59668bf2a8fe]
pg18/bin/postgres(AllocSetContextCreateInternal+0x1df)[0x59668bf1bb11]
pg18/bin/postgres(pgstat_prep_pending_entry+0x86)[0x59668bcff8cc]
pg18/bin/postgres(pgstat_prep_backend_pending+0x2b)[0x59668bd024a9]
This one is more problematic because we are in MemoryContextCreate(), so the
"workaround" above does not help. I need to give it more thought, but I'm
sharing the issue here already (as also discussed in [1]).
[1]: /messages/by-id/Z4d_eggsxtBEdJAG@paquier.xyz
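The per-context flag cannot help here because the failing check runs while the context is being created, so there is no context yet to carry the flag. The sketch below models that situation with hypothetical Demo* names. The eager creation in demo_backend_init() is only one possible direction, assumed for illustration, and not something this thread has settled on:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

static int	demo_crit_section_count = 0;	/* like CritSectionCount */

typedef struct DemoMemoryContext
{
	bool		allow_in_crit_section;
} DemoMemoryContext;

/*
 * Creation has no per-context escape hatch: the context that would carry
 * allow_in_crit_section does not exist yet.
 */
static DemoMemoryContext *
demo_context_create(void)
{
	DemoMemoryContext *cxt;

	assert(demo_crit_section_count == 0);
	cxt = calloc(1, sizeof(DemoMemoryContext));
	if (cxt == NULL)
		abort();
	return cxt;
}

static DemoMemoryContext *demo_pending_context = NULL;

/* Lazy creation: trips the assertion if first reached in a critical section. */
static DemoMemoryContext *
demo_get_pending_context(void)
{
	if (demo_pending_context == NULL)
		demo_pending_context = demo_context_create();
	return demo_pending_context;
}

/* Hypothetical mitigation: create the context up front, outside any critical section. */
static void
demo_backend_init(void)
{
	(void) demo_get_pending_context();
}
```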
Regards,
--
Bertrand Drouvot
Attachments:
v5-0001-Add-allow_critical_section-to-PgStat_KindInfo-for.patch
From 327160f7f96bcdc1c3779d25bb0228f25a4c2504 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Thu, 16 Jan 2025 08:51:09 +0000
Subject: [PATCH v5 1/4] Add allow_critical_section to PgStat_KindInfo for
pgstats kinds
This new field controls whether a variable-numbered stats kind is allowed
to allocate memory in contexts while in a critical section.
Allocating memory in contexts while in a critical section isn't usually allowed,
but we make an exception.
This means there is a theoretical possibility of running out of memory during
the allocation, which leads to a PANIC.
The memory allocations here are small (pending entries or entry references), so
that's unlikely to happen in practice.
This is useful for a following commit that will track WAL statistics while in
a critical section (while in XLogWrite() for example).
---
src/backend/utils/activity/pgstat.c | 19 +++++++++++++++++++
src/backend/utils/activity/pgstat_shmem.c | 13 +++++++++++++
src/include/utils/pgstat_internal.h | 3 +++
.../injection_points/injection_stats.c | 1 +
4 files changed, 36 insertions(+)
89.9% src/backend/utils/activity/
7.6% src/include/utils/
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index 34520535d54..a942e04bb97 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -284,6 +284,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = false,
.write_to_file = true,
+ .allow_critical_section = false,
/* so pg_stat_database entries can be seen in all databases */
.accessed_across_databases = true,
@@ -301,6 +302,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = false,
.write_to_file = true,
+ .allow_critical_section = false,
.shared_size = sizeof(PgStatShared_Relation),
.shared_data_off = offsetof(PgStatShared_Relation, stats),
@@ -316,6 +318,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = false,
.write_to_file = true,
+ .allow_critical_section = false,
.shared_size = sizeof(PgStatShared_Function),
.shared_data_off = offsetof(PgStatShared_Function, stats),
@@ -330,6 +333,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = false,
.write_to_file = true,
+ .allow_critical_section = false,
.accessed_across_databases = true,
@@ -347,6 +351,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = false,
.write_to_file = true,
+ .allow_critical_section = false,
/* so pg_stat_subscription_stats entries can be seen in all databases */
.accessed_across_databases = true,
@@ -364,6 +369,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = false,
.write_to_file = false,
+ .allow_critical_section = false,
.accessed_across_databases = true,
@@ -1316,7 +1322,20 @@ pgstat_prep_pending_entry(PgStat_Kind kind, Oid dboid, uint64 objid, bool *creat
Assert(entrysize != (size_t) -1);
+ /*
+ * That could be done within a critical section, which isn't usually
+ * allowed, but we make an exception. It means that there's a
+ * theoretical possibility that you run out of memory while preparing
+ * the entry, which leads to a PANIC. Fortunately the pending entry is
+ * small so that's unlikely to happen in practice.
+ */
+ if (pgstat_get_kind_info(kind)->allow_critical_section)
+ MemoryContextAllowInCriticalSection(pgStatPendingContext, true);
+
entry_ref->pending = MemoryContextAllocZero(pgStatPendingContext, entrysize);
+
+ MemoryContextAllowInCriticalSection(pgStatPendingContext, false);
+
dlist_push_tail(&pgStatPending, &entry_ref->pending_node);
}
diff --git a/src/backend/utils/activity/pgstat_shmem.c b/src/backend/utils/activity/pgstat_shmem.c
index 342586397d6..cb1ad330bd4 100644
--- a/src/backend/utils/activity/pgstat_shmem.c
+++ b/src/backend/utils/activity/pgstat_shmem.c
@@ -402,9 +402,22 @@ pgstat_get_entry_ref_cached(PgStat_HashKey key, PgStat_EntryRef **entry_ref_p)
{
PgStat_EntryRef *entry_ref;
+ /*
+ * That could be done within a critical section, which isn't usually
+ * allowed, but we make an exception. It means that there's a
+ * theoretical possibility that you run out of memory while creating
+ * the entry ref, which leads to a PANIC. Fortunately the pending
+ * entry is small so that's unlikely to happen in practice.
+ */
+ if (pgstat_get_kind_info(key.kind)->allow_critical_section)
+ MemoryContextAllowInCriticalSection(pgStatSharedRefContext, true);
+
cache_entry->entry_ref = entry_ref =
MemoryContextAlloc(pgStatSharedRefContext,
sizeof(PgStat_EntryRef));
+
+ MemoryContextAllowInCriticalSection(pgStatSharedRefContext, false);
+
entry_ref->shared_stats = NULL;
entry_ref->shared_entry = NULL;
entry_ref->pending = NULL;
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 4bb8e5c53ab..0b21603c863 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -216,6 +216,9 @@ typedef struct PgStat_KindInfo
/* Should stats be written to the on-disk stats file? */
bool write_to_file:1;
+ /* For variable-numbered stats: allow memory context in critical section */
+ bool allow_critical_section:1;
+
/*
* The size of an entry in the shared stats hash table (pointed to by
* PgStatShared_HashEntry->body). For fixed-numbered statistics, this is
diff --git a/src/test/modules/injection_points/injection_stats.c b/src/test/modules/injection_points/injection_stats.c
index 5db62bca66f..873dc88aace 100644
--- a/src/test/modules/injection_points/injection_stats.c
+++ b/src/test/modules/injection_points/injection_stats.c
@@ -40,6 +40,7 @@ static const PgStat_KindInfo injection_stats = {
.name = "injection_points",
.fixed_amount = false, /* Bounded by the number of points */
.write_to_file = true,
+ .allow_critical_section = false,
/* Injection points are system-wide */
.accessed_across_databases = true,
--
2.34.1
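A minimal standalone sketch of the opt-in pattern that patch 0001 adds, under simplified assumptions: `ToyContext`, `ToyKindInfo`, `crit_section_count`, `toy_alloc_zero()` and `toy_prep_pending_entry()` are hypothetical names, not PostgreSQL APIs. In an assert-enabled PostgreSQL build, allocating inside a critical section from a context that does not allow it fails an assertion. The toy allocator returns NULL in that case instead, so the behaviour can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for PostgreSQL's CritSectionCount. */
static int crit_section_count = 0;

/* Hypothetical stand-in for a memory context's allowInCritSection flag. */
typedef struct ToyContext
{
    bool allow_in_crit_section;
} ToyContext;

/* Hypothetical stand-in for PgStat_KindInfo with the new flag. */
typedef struct ToyKindInfo
{
    const char *name;
    bool allow_critical_section;
} ToyKindInfo;

/* Returns NULL where an assert-enabled PostgreSQL build would fail an assertion. */
static void *
toy_alloc_zero(ToyContext *cxt, size_t size)
{
    if (crit_section_count > 0 && !cxt->allow_in_crit_section)
        return NULL;
    return calloc(1, size);
}

/* Mirrors the shape of the change in pgstat_prep_pending_entry(). */
static void *
toy_prep_pending_entry(const ToyKindInfo *kind, ToyContext *cxt, size_t size)
{
    void *entry;

    /* Only kinds that opted in may allocate inside a critical section. */
    if (kind->allow_critical_section)
        cxt->allow_in_crit_section = true;

    entry = toy_alloc_zero(cxt, size);

    /* Reset unconditionally, without re-checking the kind's flag. */
    cxt->allow_in_crit_section = false;
    return entry;
}
```

As in the patch, the flag is cleared unconditionally after the allocation, so a kind that does not opt in always sees the default restriction.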
v5-0002-Extract-logic-filling-pg_stat_get_wal-s-tuple-int.patch
From 0ff81e01d4292e575270a4bd380ebb0505c57e11 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 6 Jan 2025 07:51:27 +0000
Subject: [PATCH v5 2/4] Extract logic filling pg_stat_get_wal()'s tuple into
its own routine
This commit adds pg_stat_wal_build_tuple(), a helper routine for
pg_stat_get_wal(), that fills its tuple based on the contents
of PgStat_WalStats. This will be used in a follow-up commit that uses
the same structures as pg_stat_wal for reporting, but for the PGSTAT_KIND_BACKEND
statistics kind.
---
src/backend/utils/adt/pgstatfuncs.c | 56 ++++++++++++++++++-----------
1 file changed, 36 insertions(+), 20 deletions(-)
100.0% src/backend/utils/adt/
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 0f5e0a9778d..0442be03304 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1604,20 +1604,22 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
}
/*
- * Returns statistics of WAL activity
+ * pg_stat_wal_build_tuple
+ *
+ * Helper routine for pg_stat_get_wal() returning one tuple based on the contents
+ * of wal_stats.
*/
-Datum
-pg_stat_get_wal(PG_FUNCTION_ARGS)
+static Datum
+pg_stat_wal_build_tuple(PgStat_WalStats wal_stats)
{
-#define PG_STAT_GET_WAL_COLS 9
+#define PG_STAT_WAL_COLS 9
TupleDesc tupdesc;
- Datum values[PG_STAT_GET_WAL_COLS] = {0};
- bool nulls[PG_STAT_GET_WAL_COLS] = {0};
+ Datum values[PG_STAT_WAL_COLS] = {0};
+ bool nulls[PG_STAT_WAL_COLS] = {0};
char buf[256];
- PgStat_WalStats *wal_stats;
/* Initialise attributes information in the tuple descriptor */
- tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_WAL_COLS);
+ tupdesc = CreateTemplateTupleDesc(PG_STAT_WAL_COLS);
TupleDescInitEntry(tupdesc, (AttrNumber) 1, "wal_records",
INT8OID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 2, "wal_fpi",
@@ -1639,34 +1641,48 @@ pg_stat_get_wal(PG_FUNCTION_ARGS)
BlessTupleDesc(tupdesc);
- /* Get statistics about WAL activity */
- wal_stats = pgstat_fetch_stat_wal();
-
/* Fill values and NULLs */
- values[0] = Int64GetDatum(wal_stats->wal_records);
- values[1] = Int64GetDatum(wal_stats->wal_fpi);
+ values[0] = Int64GetDatum(wal_stats.wal_records);
+ values[1] = Int64GetDatum(wal_stats.wal_fpi);
/* Convert to numeric. */
- snprintf(buf, sizeof buf, UINT64_FORMAT, wal_stats->wal_bytes);
+ snprintf(buf, sizeof buf, UINT64_FORMAT, wal_stats.wal_bytes);
values[2] = DirectFunctionCall3(numeric_in,
CStringGetDatum(buf),
ObjectIdGetDatum(0),
Int32GetDatum(-1));
- values[3] = Int64GetDatum(wal_stats->wal_buffers_full);
- values[4] = Int64GetDatum(wal_stats->wal_write);
- values[5] = Int64GetDatum(wal_stats->wal_sync);
+ values[3] = Int64GetDatum(wal_stats.wal_buffers_full);
+ values[4] = Int64GetDatum(wal_stats.wal_write);
+ values[5] = Int64GetDatum(wal_stats.wal_sync);
/* Convert counters from microsec to millisec for display */
- values[6] = Float8GetDatum(((double) wal_stats->wal_write_time) / 1000.0);
- values[7] = Float8GetDatum(((double) wal_stats->wal_sync_time) / 1000.0);
+ values[6] = Float8GetDatum(((double) wal_stats.wal_write_time) / 1000.0);
+ values[7] = Float8GetDatum(((double) wal_stats.wal_sync_time) / 1000.0);
- values[8] = TimestampTzGetDatum(wal_stats->stat_reset_timestamp);
+ if (wal_stats.stat_reset_timestamp != 0)
+ values[8] = TimestampTzGetDatum(wal_stats.stat_reset_timestamp);
+ else
+ nulls[8] = true;
/* Returns the record as Datum */
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
}
+/*
+ * Returns statistics of WAL activity
+ */
+Datum
+pg_stat_get_wal(PG_FUNCTION_ARGS)
+{
+ PgStat_WalStats *wal_stats;
+
+ /* Get statistics about WAL activity */
+ wal_stats = pgstat_fetch_stat_wal();
+
+ return (pg_stat_wal_build_tuple(*wal_stats));
+}
+
/*
* Returns statistics of SLRU caches.
*/
--
2.34.1
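The refactoring in 0002 follows a common pattern: move the row-building logic into a helper that takes the counters by value, so that a second caller can reuse it. A standalone sketch with hypothetical types and functions (`ToyWalStats`, `ToyWalRow`, `toy_wal_build_row()`, `toy_get_wal()`), which mirrors two details of the patch: times are converted from microseconds to milliseconds, and a zero reset timestamp is reported as NULL rather than as a value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of PgStat_WalStats; times are in microseconds. */
typedef struct ToyWalStats
{
    int64_t  wal_records;
    uint64_t wal_bytes;
    int64_t  wal_write_time;
    int64_t  wal_sync_time;
    int64_t  stat_reset_timestamp;  /* 0 means "never reset" */
} ToyWalStats;

/* Hypothetical output row, standing in for the values[]/nulls[] arrays. */
typedef struct ToyWalRow
{
    int64_t  wal_records;
    uint64_t wal_bytes;
    double   wal_write_time_ms;
    double   wal_sync_time_ms;
    int64_t  stats_reset;
    bool     stats_reset_isnull;
} ToyWalRow;

/* Shared helper: takes the stats by value, like pg_stat_wal_build_tuple(). */
static ToyWalRow
toy_wal_build_row(ToyWalStats stats)
{
    ToyWalRow row = {0};

    row.wal_records = stats.wal_records;
    row.wal_bytes = stats.wal_bytes;
    /* Convert counters from microseconds to milliseconds for display. */
    row.wal_write_time_ms = (double) stats.wal_write_time / 1000.0;
    row.wal_sync_time_ms = (double) stats.wal_sync_time / 1000.0;

    /* A zero timestamp means "never reset": report it as NULL. */
    if (stats.stat_reset_timestamp != 0)
        row.stats_reset = stats.stat_reset_timestamp;
    else
        row.stats_reset_isnull = true;

    return row;
}

/* Hypothetical global counters, standing in for pgstat_fetch_stat_wal(). */
static ToyWalStats toy_global_wal;

/* Thin caller, analogous to pg_stat_get_wal() after the refactoring. */
static ToyWalRow
toy_get_wal(void)
{
    return toy_wal_build_row(toy_global_wal);
}
```

Passing the struct by value keeps the helper independent of where the counters come from, whether a global snapshot or a per-backend entry.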
v5-0003-Adding-a-new-PgStat_WalCounters-struct.patch
From 383d186b1699ac74a53f2fea85ced5abc10cd558 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Thu, 16 Jan 2025 15:06:01 +0000
Subject: [PATCH v5 3/4] Adding a new PgStat_WalCounters struct
This new struct contains only the counters related to the WAL statistics.
This will be used in a follow-up commit that uses the same structures but
for the PGSTAT_KIND_BACKEND statistics kind.
---
src/backend/utils/activity/pgstat_wal.c | 4 ++--
src/backend/utils/adt/pgstatfuncs.c | 28 +++++++++++++------------
src/include/pgstat.h | 7 ++++++-
src/tools/pgindent/typedefs.list | 1 +
4 files changed, 24 insertions(+), 16 deletions(-)
14.1% src/backend/utils/activity/
79.8% src/backend/utils/adt/
5.0% src/include/
diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c
index 18fa6b2936a..bfc06178a68 100644
--- a/src/backend/utils/activity/pgstat_wal.c
+++ b/src/backend/utils/activity/pgstat_wal.c
@@ -117,9 +117,9 @@ pgstat_wal_flush_cb(bool nowait)
return true;
#define WALSTAT_ACC(fld, var_to_add) \
- (stats_shmem->stats.fld += var_to_add.fld)
+ (stats_shmem->stats.wal_counters.fld += var_to_add.fld)
#define WALSTAT_ACC_INSTR_TIME(fld) \
- (stats_shmem->stats.fld += INSTR_TIME_GET_MICROSEC(PendingWalStats.fld))
+ (stats_shmem->stats.wal_counters.fld += INSTR_TIME_GET_MICROSEC(PendingWalStats.fld))
WALSTAT_ACC(wal_records, wal_usage_diff);
WALSTAT_ACC(wal_fpi, wal_usage_diff);
WALSTAT_ACC(wal_bytes, wal_usage_diff);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 0442be03304..97510d48eef 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1607,10 +1607,11 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
* pg_stat_wal_build_tuple
*
* Helper routine for pg_stat_get_wal() returning one tuple based on the contents
- * of wal_stats.
+ * of wal_counters.
*/
static Datum
-pg_stat_wal_build_tuple(PgStat_WalStats wal_stats)
+pg_stat_wal_build_tuple(PgStat_WalCounters wal_counters,
+ TimestampTz stat_reset_timestamp)
{
#define PG_STAT_WAL_COLS 9
TupleDesc tupdesc;
@@ -1642,26 +1643,26 @@ pg_stat_wal_build_tuple(PgStat_WalStats wal_stats)
BlessTupleDesc(tupdesc);
/* Fill values and NULLs */
- values[0] = Int64GetDatum(wal_stats.wal_records);
- values[1] = Int64GetDatum(wal_stats.wal_fpi);
+ values[0] = Int64GetDatum(wal_counters.wal_records);
+ values[1] = Int64GetDatum(wal_counters.wal_fpi);
/* Convert to numeric. */
- snprintf(buf, sizeof buf, UINT64_FORMAT, wal_stats.wal_bytes);
+ snprintf(buf, sizeof buf, UINT64_FORMAT, wal_counters.wal_bytes);
values[2] = DirectFunctionCall3(numeric_in,
CStringGetDatum(buf),
ObjectIdGetDatum(0),
Int32GetDatum(-1));
- values[3] = Int64GetDatum(wal_stats.wal_buffers_full);
- values[4] = Int64GetDatum(wal_stats.wal_write);
- values[5] = Int64GetDatum(wal_stats.wal_sync);
+ values[3] = Int64GetDatum(wal_counters.wal_buffers_full);
+ values[4] = Int64GetDatum(wal_counters.wal_write);
+ values[5] = Int64GetDatum(wal_counters.wal_sync);
/* Convert counters from microsec to millisec for display */
- values[6] = Float8GetDatum(((double) wal_stats.wal_write_time) / 1000.0);
- values[7] = Float8GetDatum(((double) wal_stats.wal_sync_time) / 1000.0);
+ values[6] = Float8GetDatum(((double) wal_counters.wal_write_time) / 1000.0);
+ values[7] = Float8GetDatum(((double) wal_counters.wal_sync_time) / 1000.0);
- if (wal_stats.stat_reset_timestamp != 0)
- values[8] = TimestampTzGetDatum(wal_stats.stat_reset_timestamp);
+ if (stat_reset_timestamp != 0)
+ values[8] = TimestampTzGetDatum(stat_reset_timestamp);
else
nulls[8] = true;
@@ -1680,7 +1681,8 @@ pg_stat_get_wal(PG_FUNCTION_ARGS)
/* Get statistics about WAL activity */
wal_stats = pgstat_fetch_stat_wal();
- return (pg_stat_wal_build_tuple(*wal_stats));
+ return (pg_stat_wal_build_tuple(wal_stats->wal_counters,
+ wal_stats->stat_reset_timestamp));
}
/*
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index a878402f502..bb8e0044a47 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -463,7 +463,7 @@ typedef struct PgStat_StatTabEntry
PgStat_Counter autoanalyze_count;
} PgStat_StatTabEntry;
-typedef struct PgStat_WalStats
+typedef struct PgStat_WalCounters
{
PgStat_Counter wal_records;
PgStat_Counter wal_fpi;
@@ -473,6 +473,11 @@ typedef struct PgStat_WalStats
PgStat_Counter wal_sync;
PgStat_Counter wal_write_time;
PgStat_Counter wal_sync_time;
+} PgStat_WalCounters;
+
+typedef struct PgStat_WalStats
+{
+ PgStat_WalCounters wal_counters;
TimestampTz stat_reset_timestamp;
} PgStat_WalStats;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 94dc956ae8c..d8b6623ba9c 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2176,6 +2176,7 @@ PgStat_SubXactStatus
PgStat_TableCounts
PgStat_TableStatus
PgStat_TableXactStatus
+PgStat_WalCounters
PgStat_WalStats
PgXmlErrorContext
PgXmlStrictness
--
2.34.1
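Patch 0003 splits the pure counters out of `PgStat_WalStats`, so the same counter struct can be embedded both in the global WAL stats and in the per-backend entry, and one accumulation routine works on either. A standalone sketch with hypothetical toy types (`ToyWalCounters`, `ToyWalStats`, `ToyBackendStats`, `toy_accumulate()`); the `WALSTAT_ACC` macro here is a simplified variant that takes the destination as an argument, whereas the patch's macros hard-code it.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t ToyCounter;

/* Only the counters, like PgStat_WalCounters. */
typedef struct ToyWalCounters
{
    ToyCounter wal_records;
    ToyCounter wal_fpi;
    ToyCounter wal_bytes;
} ToyWalCounters;

/* Global stats: counters plus reset time, like PgStat_WalStats after 0003. */
typedef struct ToyWalStats
{
    ToyWalCounters wal_counters;
    int64_t        stat_reset_timestamp;
} ToyWalStats;

/* Per-backend stats embed the same counter struct, like PgStat_Backend. */
typedef struct ToyBackendStats
{
    int64_t        stat_reset_timestamp;
    ToyWalCounters wal_stats;
} ToyBackendStats;

/* Add one field of src to the same field of *dst. */
#define WALSTAT_ACC(dst, fld, src) ((dst)->fld += (src).fld)

/* One accumulation routine serves both destinations. */
static void
toy_accumulate(ToyWalCounters *dst, ToyWalCounters diff)
{
    WALSTAT_ACC(dst, wal_records, diff);
    WALSTAT_ACC(dst, wal_fpi, diff);
    WALSTAT_ACC(dst, wal_bytes, diff);
}
```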
v5-0004-per-backend-WAL-statistics.patch
From 094bc124baea085a721f0013da290f3b92f96607 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 6 Jan 2025 10:00:00 +0000
Subject: [PATCH v5 4/4] per backend WAL statistics
Now that commit 9aea73fc61 added backend-level statistics to pgstats (and
per backend IO statistics), we can more easily add per backend statistics.
This commit adds per backend WAL statistics using the same layer as pg_stat_wal,
except that it is now possible to know how much WAL activity is happening in each
backend rather than an overall aggregate of all the activity. A function called
pg_stat_get_backend_wal() is added to access this data depending on the
PID of a backend.
Note that allow_critical_section has been enabled for the PGSTAT_KIND_BACKEND
stats kind as we are now tracking variable-numbered stats in critical section(s)
(XLogWrite() for example).
The same limitation as in 9aea73fc61 persists, meaning that auxiliary processes
are not included in this set of statistics.
XXX: bump catalog version
---
doc/src/sgml/config.sgml | 4 +-
doc/src/sgml/monitoring.sgml | 19 +++++++
src/backend/access/transam/xlog.c | 44 +++++++++++++--
src/backend/utils/activity/pgstat.c | 2 +-
src/backend/utils/activity/pgstat_backend.c | 59 +++++++++++++++++++++
src/backend/utils/activity/pgstat_wal.c | 2 +
src/backend/utils/adt/pgstatfuncs.c | 53 +++++++++++++++++-
src/include/catalog/pg_proc.dat | 7 +++
src/include/pgstat.h | 43 ++++++++-------
src/include/utils/pgstat_internal.h | 3 +-
src/test/regress/expected/stats.out | 14 +++++
src/test/regress/sql/stats.sql | 6 +++
12 files changed, 227 insertions(+), 29 deletions(-)
14.2% doc/src/sgml/
14.9% src/backend/access/transam/
28.8% src/backend/utils/activity/
19.7% src/backend/utils/adt/
7.0% src/include/catalog/
5.1% src/include/
5.6% src/test/regress/expected/
4.3% src/test/regress/sql/
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 3f41a17b1fe..075cb4fe1a8 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8292,7 +8292,9 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
measure the overhead of timing on your system.
I/O timing information is
displayed in <link linkend="monitoring-pg-stat-wal-view">
- <structname>pg_stat_wal</structname></link>.
+ <structname>pg_stat_wal</structname></link> and in the output of the
+ <link linkend="pg-stat-get-backend-wal">
+ <function>pg_stat_get_backend_wal()</function></link> function.
Only superusers and users with the appropriate <literal>SET</literal>
privilege can change this setting.
</para>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index e5888fae2b5..8fdf7d13f21 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4824,6 +4824,25 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para></entry>
</row>
+ <row>
+ <entry id="pg-stat-get-backend-wal" role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_stat_get_backend_wal</primary>
+ </indexterm>
+ <function>pg_stat_get_backend_wal</function> ( <type>integer</type> )
+ <returnvalue>record</returnvalue>
+ </para>
+ <para>
+ Returns WAL statistics about the backend with the specified
+ process ID. The output fields are exactly the same as the ones in the
+ <structname>pg_stat_wal</structname> view.
+ </para>
+ <para>
+ The function does not return WAL statistics for the checkpointer,
+ the background writer, the startup process and the autovacuum launcher.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index bf3dbda901d..d09a079bcbb 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -96,6 +96,7 @@
#include "utils/guc_hooks.h"
#include "utils/guc_tables.h"
#include "utils/injection_point.h"
+#include "utils/pgstat_internal.h"
#include "utils/ps_status.h"
#include "utils/relmapper.h"
#include "utils/snapmgr.h"
@@ -2058,6 +2059,15 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli, bool opportunistic)
XLogWrite(WriteRqst, tli, false);
LWLockRelease(WALWriteLock);
PendingWalStats.wal_buffers_full++;
+
+ if (pgstat_tracks_backend_bktype(MyBackendType))
+ {
+ PgStat_BackendPending *entry_ref;
+
+ entry_ref = pgstat_prep_backend_pending(MyProcNumber);
+ entry_ref->pending_wal.wal_buffers_full++;
+ }
+
TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY_DONE();
}
/* Re-acquire WALBufMappingLock and retry */
@@ -2426,11 +2436,14 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible)
Size nleft;
ssize_t written;
instr_time start;
+ instr_time end;
/* OK to write the page(s) */
from = XLogCtl->pages + startidx * (Size) XLOG_BLCKSZ;
nbytes = npages * (Size) XLOG_BLCKSZ;
nleft = nbytes;
+ /* keep compiler quiet */
+ INSTR_TIME_SET_ZERO(end);
do
{
errno = 0;
@@ -2451,14 +2464,24 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible)
*/
if (track_wal_io_timing)
{
- instr_time end;
-
INSTR_TIME_SET_CURRENT(end);
INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_write_time, end, start);
}
PendingWalStats.wal_write++;
+ if (pgstat_tracks_backend_bktype(MyBackendType))
+ {
+ PgStat_BackendPending *entry_ref;
+
+ entry_ref = pgstat_prep_backend_pending(MyProcNumber);
+ entry_ref->pending_wal.wal_write++;
+
+ if (track_wal_io_timing)
+ INSTR_TIME_ACCUM_DIFF(entry_ref->pending_wal.wal_write_time,
+ end, start);
+ }
+
if (written <= 0)
{
char xlogfname[MAXFNAMELEN];
@@ -8684,8 +8707,11 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
{
char *msg = NULL;
instr_time start;
+ instr_time end;
Assert(tli != 0);
+ /* keep compiler quiet */
+ INSTR_TIME_SET_ZERO(end);
/*
* Quick exit if fsync is disabled or write() has already synced the WAL
@@ -8751,13 +8777,23 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
*/
if (track_wal_io_timing)
{
- instr_time end;
-
INSTR_TIME_SET_CURRENT(end);
INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_sync_time, end, start);
}
PendingWalStats.wal_sync++;
+
+ if (pgstat_tracks_backend_bktype(MyBackendType))
+ {
+ PgStat_BackendPending *entry_ref;
+
+ entry_ref = pgstat_prep_backend_pending(MyProcNumber);
+ entry_ref->pending_wal.wal_sync++;
+
+ if (track_wal_io_timing)
+ INSTR_TIME_ACCUM_DIFF(entry_ref->pending_wal.wal_sync_time,
+ end, start);
+ }
}
/*
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index a942e04bb97..65b2f687816 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -369,7 +369,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = false,
.write_to_file = false,
- .allow_critical_section = false,
+ .allow_critical_section = true,
.accessed_across_databases = true,
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
index 79e4d0a3053..94762b12f8d 100644
--- a/src/backend/utils/activity/pgstat_backend.c
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -24,6 +24,14 @@
#include "utils/pgstat_internal.h"
+/*
+ * WAL usage counters saved from pgWalUsage at the previous call to
+ * pgstat_report_wal(). This is used to calculate how much WAL usage
+ * happens between pgstat_report_wal() calls, by subtracting
+ * the previous counters from the current ones.
+ */
+static WalUsage prevBackendWalUsage;
+
/*
* Returns statistics of a backend by proc number.
*/
@@ -77,6 +85,52 @@ pgstat_flush_backend_entry_io(PgStat_EntryRef *entry_ref)
}
}
+/*
+ * Flush out locally pending backend WAL statistics. Locking is managed
+ * by the caller.
+ */
+static void
+pgstat_flush_backend_entry_wal(PgStat_EntryRef *entry_ref)
+{
+ PgStatShared_Backend *shbackendent;
+ PgStat_BackendPending *pendingent;
+ PgStat_WalCounters *bktype_shstats;
+ PgStat_PendingWalStats pending_wal;
+ WalUsage wal_usage_diff = {0};
+
+ shbackendent = (PgStatShared_Backend *) entry_ref->shared_stats;
+ pendingent = (PgStat_BackendPending *) entry_ref->pending;
+ bktype_shstats = &shbackendent->stats.wal_stats;
+ pending_wal = pendingent->pending_wal;
+
+ /*
+ * We don't update the WAL usage portion of the local WalStats elsewhere.
+ * Calculate how much WAL usage counters were increased by subtracting the
+ * previous counters from the current ones.
+ */
+ WalUsageAccumDiff(&wal_usage_diff, &pgWalUsage, &prevBackendWalUsage);
+
+#define WALSTAT_ACC(fld, var_to_add) \
+ (bktype_shstats->fld += var_to_add.fld)
+#define WALSTAT_ACC_INSTR_TIME(fld) \
+ (bktype_shstats->fld += INSTR_TIME_GET_MICROSEC(pending_wal.fld))
+ WALSTAT_ACC(wal_buffers_full, pending_wal);
+ WALSTAT_ACC(wal_write, pending_wal);
+ WALSTAT_ACC(wal_sync, pending_wal);
+ WALSTAT_ACC(wal_records, wal_usage_diff);
+ WALSTAT_ACC(wal_fpi, wal_usage_diff);
+ WALSTAT_ACC(wal_bytes, wal_usage_diff);
+ WALSTAT_ACC_INSTR_TIME(wal_write_time);
+ WALSTAT_ACC_INSTR_TIME(wal_sync_time);
+#undef WALSTAT_ACC_INSTR_TIME
+#undef WALSTAT_ACC
+
+ /*
+ * Save the current counters for the subsequent calculation of WAL usage.
+ */
+ prevBackendWalUsage = pgWalUsage;
+}
+
/*
* Wrapper routine to flush backend statistics.
*/
@@ -94,6 +148,9 @@ pgstat_flush_backend_entry(PgStat_EntryRef *entry_ref, bool nowait,
if (flags & PGSTAT_BACKEND_FLUSH_IO)
pgstat_flush_backend_entry_io(entry_ref);
+ if (flags & PGSTAT_BACKEND_FLUSH_WAL)
+ pgstat_flush_backend_entry_wal(entry_ref);
+
pgstat_unlock_entry(entry_ref);
return true;
@@ -147,6 +204,8 @@ pgstat_create_backend(ProcNumber procnum)
* e.g. if we previously used this proc number.
*/
memset(&shstatent->stats, 0, sizeof(shstatent->stats));
+
+ prevBackendWalUsage = pgWalUsage;
}
/*
diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c
index bfc06178a68..33627642261 100644
--- a/src/backend/utils/activity/pgstat_wal.c
+++ b/src/backend/utils/activity/pgstat_wal.c
@@ -55,6 +55,8 @@ pgstat_report_wal(bool force)
/* flush wal stats */
pgstat_flush_wal(nowait);
+ pgstat_flush_backend(nowait, PGSTAT_BACKEND_FLUSH_WAL);
+
/* flush IO stats */
pgstat_flush_io(nowait);
}
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 97510d48eef..52bfda4f923 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1606,8 +1606,8 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
/*
* pg_stat_wal_build_tuple
*
- * Helper routine for pg_stat_get_wal() returning one tuple based on the contents
- * of wal_counters.
+ * Helper routine for pg_stat_get_wal() and pg_stat_get_backend_wal() returning
+ * one tuple based on the contents of wal_counters.
*/
static Datum
pg_stat_wal_build_tuple(PgStat_WalCounters wal_counters,
@@ -1670,6 +1670,55 @@ pg_stat_wal_build_tuple(PgStat_WalCounters wal_counters,
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
}
+/*
+ * Returns WAL statistics for a backend with given PID.
+ */
+Datum
+pg_stat_get_backend_wal(PG_FUNCTION_ARGS)
+{
+ int pid;
+ PGPROC *proc;
+ ProcNumber procNumber;
+ PgStat_Backend *backend_stats;
+ PgStat_WalCounters bktype_stats;
+ PgBackendStatus *beentry;
+
+ pid = PG_GETARG_INT32(0);
+ proc = BackendPidGetProc(pid);
+
+ /*
+ * This could be an auxiliary process but these do not report backend
+ * statistics due to pgstat_tracks_backend_bktype(), so there is no need
+ * for an extra call to AuxiliaryPidGetProc().
+ */
+ if (!proc)
+ PG_RETURN_NULL();
+
+ procNumber = GetNumberFromPGProc(proc);
+
+ beentry = pgstat_get_beentry_by_proc_number(procNumber);
+ if (!beentry)
+ PG_RETURN_NULL();
+
+ backend_stats = pgstat_fetch_stat_backend(procNumber);
+ if (!backend_stats)
+ PG_RETURN_NULL();
+
+ /* if PID does not match, leave */
+ if (beentry->st_procpid != pid)
+ PG_RETURN_NULL();
+
+ /* backend may be gone, so recheck in case */
+ if (beentry->st_backendType == B_INVALID)
+ PG_RETURN_NULL();
+
+ bktype_stats = backend_stats->wal_stats;
+
+ /* build the tuple from this backend's PgStat_WalCounters */
+ return (pg_stat_wal_build_tuple(bktype_stats, backend_stats->stat_reset_timestamp));
+}
+
+
/*
* Returns statistics of WAL activity
*/
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index ba02ba53b29..9d57c69c152 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5929,6 +5929,13 @@
proargmodes => '{o,o,o,o,o,o,o,o,o}',
proargnames => '{wal_records,wal_fpi,wal_bytes,wal_buffers_full,wal_write,wal_sync,wal_write_time,wal_sync_time,stats_reset}',
prosrc => 'pg_stat_get_wal' },
+{ oid => '8037', descr => 'statistics: backend WAL activity',
+ proname => 'pg_stat_get_backend_wal', provolatile => 'v',
+ proparallel => 'r', prorettype => 'record', proargtypes => 'int4',
+ proallargtypes => '{int4,int8,int8,numeric,int8,int8,int8,float8,float8,timestamptz}',
+ proargmodes => '{i,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{backend_pid,wal_records,wal_fpi,wal_bytes,wal_buffers_full,wal_write,wal_sync,wal_write_time,wal_sync_time,stats_reset}',
+ prosrc => 'pg_stat_get_backend_wal' },
{ oid => '6248', descr => 'statistics: information about WAL prefetching',
proname => 'pg_stat_get_recovery_prefetch', prorows => '1', proretset => 't',
provolatile => 'v', prorettype => 'record', proargtypes => '',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index bb8e0044a47..821104047a0 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -334,24 +334,6 @@ typedef struct PgStat_IO
PgStat_BktypeIO stats[BACKEND_NUM_TYPES];
} PgStat_IO;
-typedef struct PgStat_Backend
-{
- TimestampTz stat_reset_timestamp;
- PgStat_BktypeIO io_stats;
-} PgStat_Backend;
-
-/* ---------
- * PgStat_BackendPending Non-flushed backend stats.
- * ---------
- */
-typedef struct PgStat_BackendPending
-{
- /*
- * Backend statistics store the same amount of IO data as PGSTAT_KIND_IO.
- */
- PgStat_PendingIO pending_io;
-} PgStat_BackendPending;
-
typedef struct PgStat_StatDBEntry
{
PgStat_Counter xact_commit;
@@ -496,6 +478,29 @@ typedef struct PgStat_PendingWalStats
instr_time wal_sync_time;
} PgStat_PendingWalStats;
+/* ---------
+ * PgStat_BackendPending Non-flushed backend stats.
+ * ---------
+ */
+typedef struct PgStat_BackendPending
+{
+ /*
+ * Backend statistics store the same amount of IO data as PGSTAT_KIND_IO.
+ */
+ PgStat_PendingIO pending_io;
+
+ /*
+ * Backend statistics store the same amount of WAL data as PGSTAT_KIND_WAL.
+ */
+ PgStat_PendingWalStats pending_wal;
+} PgStat_BackendPending;
+
+typedef struct PgStat_Backend
+{
+ TimestampTz stat_reset_timestamp;
+ PgStat_BktypeIO io_stats;
+ PgStat_WalCounters wal_stats;
+} PgStat_Backend;
/*
* Functions in pgstat.c
@@ -826,6 +831,4 @@ extern PGDLLIMPORT SessionEndType pgStatSessionEndCause;
/* updated directly by backends and background processes */
extern PGDLLIMPORT PgStat_PendingWalStats PendingWalStats;
-
-
#endif /* PGSTAT_H */
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 0b21603c863..5a284d8c94a 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -618,7 +618,8 @@ extern void pgstat_archiver_snapshot_cb(void);
/* flags for pgstat_flush_backend() */
#define PGSTAT_BACKEND_FLUSH_IO (1 << 0) /* Flush I/O statistics */
-#define PGSTAT_BACKEND_FLUSH_ALL (PGSTAT_BACKEND_FLUSH_IO)
+#define PGSTAT_BACKEND_FLUSH_WAL (1 << 1) /* Flush WAL statistics */
+#define PGSTAT_BACKEND_FLUSH_ALL (PGSTAT_BACKEND_FLUSH_IO | PGSTAT_BACKEND_FLUSH_WAL)
extern void pgstat_flush_backend(bool nowait, bits32 flags);
extern PgStat_BackendPending *pgstat_prep_backend_pending(ProcNumber procnum);
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index a0317b7208e..cc01fdf2741 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -832,6 +832,8 @@ SELECT sessions > :db_stat_sessions FROM pg_stat_database WHERE datname = (SELEC
SELECT num_requested AS rqst_ckpts_before FROM pg_stat_checkpointer \gset
-- Test pg_stat_wal (and make a temp table so our temp schema exists)
SELECT wal_bytes AS wal_bytes_before FROM pg_stat_wal \gset
+-- Test pg_stat_get_backend_wal (and make a temp table so our temp schema exists)
+SELECT wal_bytes AS backend_wal_bytes_before from pg_stat_get_backend_wal(pg_backend_pid()) \gset
CREATE TEMP TABLE test_stats_temp AS SELECT 17;
DROP TABLE test_stats_temp;
-- Checkpoint twice: The checkpointer reports stats after reporting completion
@@ -851,6 +853,18 @@ SELECT wal_bytes > :wal_bytes_before FROM pg_stat_wal;
t
(1 row)
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush
+--------------------------
+
+(1 row)
+
+SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid());
+ ?column?
+----------
+ t
+(1 row)
+
-- Test pg_stat_get_backend_idset() and some allied functions.
-- In particular, verify that their notion of backend ID matches
-- our temp schema index.
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 399c72bbcf7..28fe0a1a7d0 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -423,6 +423,9 @@ SELECT num_requested AS rqst_ckpts_before FROM pg_stat_checkpointer \gset
-- Test pg_stat_wal (and make a temp table so our temp schema exists)
SELECT wal_bytes AS wal_bytes_before FROM pg_stat_wal \gset
+-- Test pg_stat_get_backend_wal (and make a temp table so our temp schema exists)
+SELECT wal_bytes AS backend_wal_bytes_before from pg_stat_get_backend_wal(pg_backend_pid()) \gset
+
CREATE TEMP TABLE test_stats_temp AS SELECT 17;
DROP TABLE test_stats_temp;
@@ -435,6 +438,9 @@ CHECKPOINT;
SELECT num_requested > :rqst_ckpts_before FROM pg_stat_checkpointer;
SELECT wal_bytes > :wal_bytes_before FROM pg_stat_wal;
+SELECT pg_stat_force_next_flush();
+SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid());
+
-- Test pg_stat_get_backend_idset() and some allied functions.
-- In particular, verify that their notion of backend ID matches
-- our temp schema index.
--
2.34.1
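Patch 0004 derives the per-backend WAL record, FPI and byte counts from the process-local `pgWalUsage` totals by remembering a snapshot (`prevBackendWalUsage`) at each flush and adding only the difference to the shared entry. A standalone sketch of that delta technique, using hypothetical names (`ToyWalUsage`, `cur_usage`, `prev_usage`, `shared_backend`, `toy_create_backend()`, `toy_flush_backend_wal()`) in place of the PostgreSQL structures; locking, pending entries and the timing counters are left out.

```c
#include <assert.h>
#include <stdint.h>

typedef struct ToyWalUsage
{
    int64_t  wal_records;
    int64_t  wal_fpi;
    uint64_t wal_bytes;
} ToyWalUsage;

/* Hypothetical stand-in for pgWalUsage: running totals for this process. */
static ToyWalUsage cur_usage;
/* Snapshot taken at the previous flush (cf. prevBackendWalUsage). */
static ToyWalUsage prev_usage;
/* Hypothetical stand-in for the shared per-backend stats entry. */
static ToyWalUsage shared_backend;

/* Like pgstat_create_backend(): zero the entry and take the first snapshot. */
static void
toy_create_backend(void)
{
    shared_backend = (ToyWalUsage) {0};
    prev_usage = cur_usage;
}

/* Add only what changed since the last flush, then move the snapshot forward. */
static void
toy_flush_backend_wal(void)
{
    shared_backend.wal_records += cur_usage.wal_records - prev_usage.wal_records;
    shared_backend.wal_fpi += cur_usage.wal_fpi - prev_usage.wal_fpi;
    shared_backend.wal_bytes += cur_usage.wal_bytes - prev_usage.wal_bytes;
    prev_usage = cur_usage;
}
```

Taking the snapshot at backend creation, as `pgstat_create_backend()` does in the patch, means WAL generated before the entry existed is not attributed to it, and a flush with no new activity adds nothing.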
Hi,
On 2025-01-16 15:59:31 +0000, Bertrand Drouvot wrote:
On Wed, Jan 15, 2025 at 03:11:32PM +0900, Michael Paquier wrote:
On Fri, Jan 10, 2025 at 09:40:38AM +0000, Bertrand Drouvot wrote:
+ * WAL pending statistics are incremented inside a critical section
+ * (see XLogWrite()), so we can't use pgstat_prep_pending_entry() and we rely on
+ * PendingBackendWalStats instead.
+ */
+extern PGDLLIMPORT PgStat_PendingWalStats PendingBackendWalStats;
Hmm. This makes me wonder if we should rethink a bit the way pending
entries are retrieved and if we should do it beforehand for the WAL
paths to avoid allocations in some critical sections. Isn't that also
because we avoid calling pgstat_prep_backend_pending() for the I/O
case as only backends are supported now, discarding cases like the
checkpointer where I/O could happen in a critical path? As a whole,
the approach taken by the patch is not really consistent with the
rest.
I agree that it's better to have a generic solution and to be consistent with
the other variable-numbered stats.
The attached implements in 0001 the proposition done in [1], i.e.:
1. It adds a new allow_critical_section to PgStat_KindInfo for pgstats kinds.
2. It temporarily sets allowInCritSection to true when needed.
Note that for safety reasons 0001 sets allowInCritSection back to false
unconditionally (i.e. without checking allow_critical_section again).
This is a preposterously bad idea. The restriction to not allocate memory in
critical sections exists for a reason, why on earth should this code be
allowed to just opt out of the restriction of not allowing memory allocations
in critical sections?
The only cases where we can somewhat safely allocate memory in critical
sections are when using memory contexts with pre-reserved memory, where there's
a pretty low bound on how much memory is going to be needed. E.g. logging a
message inside a critical section, where elog.c can reset ErrorContext
afterwards.
Greetings,
Andres Freund
Hi,
On Thu, Jan 16, 2025 at 11:38:47AM -0500, Andres Freund wrote:
This is a preposterously bad idea. The restriction to not allocate memory in
critical sections exists for a reason,
Thanks for sharing your thoughts on it. In [1], you said:
"
My view is that for IO stats no memory allocation should be required - that
used to be the case and should be the case again
"
So, do you think that the initial proposal that has been made here (See R1. in
[2]) i.e. make use of a new PendingBackendWalStats variable:
"
0003 does not rely on pgstat_prep_backend_pending() for its pending statistics
but on a new PendingBackendWalStats variable. The reason is that the pending wal
statistics are incremented in a critical section (see XLogWrite(), and so
a call to pgstat_prep_pending_entry() could trigger a failed assertion:
MemoryContextAllocZero()->"CritSectionCount == 0 || (context)->allowInCritSection"
"
and implemented up to v4 is a viable approach?
[1]: /messages/by-id/66efowskppsns35v5u2m7k4sdnl7yoz5bo64tdjwq7r5lhplrz@y7dme5xwh2r5
[2]: /messages/by-id/Z3zqc4o09dM/Ezyz@ip-10-97-1-34.eu-west-3.compute.internal
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Hi,
On 2025-01-16 17:11:09 +0000, Bertrand Drouvot wrote:
Thanks for sharing your thoughts on it. In [1], you said:
"
My view is that for IO stats no memory allocation should be required - that
used to be the case and should be the case again
"
So, do you think that the initial proposal that has been made here (See R1. in
[2]) i.e make use of a new PendingBackendWalStats variable:
Well, I think this first needs to be fixed for the IO stats change made in
commit 9aea73fc61d
Author: Michael Paquier <michael@paquier.xyz>
Date: 2024-12-19 13:19:22 +0900
Add backend-level statistics to pgstats
Once we have a pattern to model after, we can apply the same scheme here.
"
0003 does not rely on pgstat_prep_backend_pending() for its pending statistics
but on a new PendingBackendWalStats variable. The reason is that the pending wal
statistics are incremented in a critical section (see XLogWrite()), and so
a call to pgstat_prep_pending_entry() could trigger a failed assertion:
MemoryContextAllocZero()->"CritSectionCount == 0 || (context)->allowInCritSection"
"
and implemented up to v4 is a viable approach?
Yes-ish. I think it would be better to make it slightly more general than
that, handling this for all types of backend stats, not just for WAL.
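One way to read "slightly more general" is a single statically allocated pending struct with one member per backend stats family, plus a flush routine that takes flags saying which families to fold into shared memory. v6, later in this thread, goes in that direction with `PgStat_BackendPending` and `PGSTAT_BACKEND_FLUSH_WAL`. The sketch below is a hypothetical, self-contained model (locking and the real shared-memory entry are omitted), not the patch itself.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical flags, modelled on PGSTAT_BACKEND_FLUSH_IO / _WAL in the patch. */
#define BACKEND_FLUSH_IO  (1 << 0)
#define BACKEND_FLUSH_WAL (1 << 1)

typedef struct PendingIO  { long reads; long writes; } PendingIO;
typedef struct PendingWal { long wal_write; long wal_sync; } PendingWal;

/*
 * One statically allocated struct for every backend stats family:
 * incrementing a member never allocates, so it is usable in critical sections.
 */
typedef struct BackendPending
{
    PendingIO  pending_io;
    PendingWal pending_wal;
} BackendPending;

static BackendPending pending;      /* static storage, zero-initialized */

/* Stand-in for the shared-memory entry a real flush would update under a lock. */
typedef struct SharedBackendStats
{
    PendingIO  io;
    PendingWal wal;
} SharedBackendStats;

static SharedBackendStats shared;

/* Called outside critical sections: fold the selected families into shared stats. */
static void flush_backend(unsigned flags)
{
    if (flags & BACKEND_FLUSH_IO)
    {
        shared.io.reads  += pending.pending_io.reads;
        shared.io.writes += pending.pending_io.writes;
        memset(&pending.pending_io, 0, sizeof(PendingIO));
    }
    if (flags & BACKEND_FLUSH_WAL)
    {
        shared.wal.wal_write += pending.pending_wal.wal_write;
        shared.wal.wal_sync  += pending.pending_wal.wal_sync;
        memset(&pending.pending_wal, 0, sizeof(PendingWal));
    }
}
```

The flags let each reporting path (for example a WAL report versus an I/O report) flush only its own family while sharing one pending struct.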
Greetings,
Andres Freund
On Thu, Jan 16, 2025 at 12:44:20PM -0500, Andres Freund wrote:
On 2025-01-16 17:11:09 +0000, Bertrand Drouvot wrote:
So, do you think that the initial proposal that has been made here (See R1. in
[2]) i.e make use of a new PendingBackendWalStats variable:
Well, I think this first needs to be fixed for the IO stats change made in
Once we have a pattern to model after, we can apply the same scheme here.
Okay, thanks for the input. I was not sure what you intended
originally with all this part of the backend code, and how much would
be acceptable. The line is clear now.
0003 does not rely on pgstat_prep_backend_pending() for its pending statistics
but on a new PendingBackendWalStats variable. The reason is that the pending wal
statistics are incremented in a critical section (see XLogWrite()), and so
a call to pgstat_prep_pending_entry() could trigger a failed assertion:
MemoryContextAllocZero()->"CritSectionCount == 0 || (context)->allowInCritSection"
"
and implemented up to v4 is a viable approach?
Yes-ish. I think it would be better to make it slightly more general than
that, handling this for all types of backend stats, not just for WAL.
Agreed to use the same concept for all these parts of the backend
stats kind rather than two of them. Will send a reply on the original
backend I/O thread as well.
--
Michael
Hi,
On Fri, Jan 17, 2025 at 08:43:57AM +0900, Michael Paquier wrote:
Okay, thanks for the input. I was not sure what you intended
originally with all this part of the backend code, and how much would
be acceptable. The line is clear now.
Agreed to use the same concept for all these parts of the backend
stats kind rather than two of them. Will send a reply on the original
backend I/O thread as well.
PFA v6 that now relies on the new PendingBackendStats variable introduced in
4feba03d8b9.
Remark: I moved PendingBackendStats back to pgstat.h because I think that the
"simple" pending stats increments that we are adding in xlog.c are not worth
an extra function call overhead (while it made more sense for the more complex IO
stats handling). So PendingBackendStats is now visible to the outside world like
PendingWalStats and friends.
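Not every per-backend WAL counter is incremented directly in xlog.c: in the attached 0003, wal_records, wal_fpi and wal_bytes are derived at flush time by diffing the cumulative pgWalUsage against a snapshot saved at the previous flush (prevBackendWalUsage). A minimal standalone sketch of that snapshot/delta pattern follows, with hypothetical names (`WalUsageModel`, `flush_wal_delta()`, ...). It is simplified: it checks only wal_records for pending activity, whereas pgstat_backend_wal_have_pending() also checks wal_write and wal_sync.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the cumulative WalUsage counters. */
typedef struct WalUsageModel
{
    int64_t  wal_records;
    int64_t  wal_fpi;
    uint64_t wal_bytes;
} WalUsageModel;

static WalUsageModel cumulative;     /* only ever grows, like pgWalUsage */
static WalUsageModel prev_snapshot;  /* like prevBackendWalUsage */
static WalUsageModel shared_totals;  /* stand-in for the shared backend entry */

/* Nothing to flush if no WAL record was generated since the last flush. */
static int have_pending(void)
{
    return cumulative.wal_records != prev_snapshot.wal_records;
}

static void flush_wal_delta(void)
{
    if (!have_pending())
        return;

    /* Add only what happened since the previous flush. */
    shared_totals.wal_records += cumulative.wal_records - prev_snapshot.wal_records;
    shared_totals.wal_fpi     += cumulative.wal_fpi - prev_snapshot.wal_fpi;
    shared_totals.wal_bytes   += cumulative.wal_bytes - prev_snapshot.wal_bytes;

    /* The next flush starts from here. */
    prev_snapshot = cumulative;
}
```

The snapshot also has to be taken when the backend entry is created (0003 sets prevBackendWalUsage = pgWalUsage in pgstat_create_backend()). Otherwise, WAL generated before the entry existed would be attributed to the first flush.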
Regards,
--
Bertrand Drouvot
Attachments:
v6-0001-Extract-logic-filling-pg_stat_get_wal-s-tuple-int.patch (text/x-diff; charset=us-ascii)
From 4f2a197b086dd64a2723a8491cc66453012743e9 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 6 Jan 2025 07:51:27 +0000
Subject: [PATCH v6 1/3] Extract logic filling pg_stat_get_wal()'s tuple into
its own routine
This commit adds pg_stat_wal_build_tuple(), a helper routine for
pg_stat_get_wal(), that fills its tuple based on the contents
of PgStat_WalStats. This will be used in a follow-up commit that uses
the same structures as pg_stat_wal for reporting, but for the PGSTAT_KIND_BACKEND
statistics kind.
---
src/backend/utils/adt/pgstatfuncs.c | 56 ++++++++++++++++++-----------
1 file changed, 36 insertions(+), 20 deletions(-)
100.0% src/backend/utils/adt/
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 0f5e0a9778d..0442be03304 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1604,20 +1604,22 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
}
/*
- * Returns statistics of WAL activity
+ * pg_stat_wal_build_tuple
+ *
+ * Helper routine for pg_stat_get_wal() returning one tuple based on the contents
+ * of wal_stats.
*/
-Datum
-pg_stat_get_wal(PG_FUNCTION_ARGS)
+static Datum
+pg_stat_wal_build_tuple(PgStat_WalStats wal_stats)
{
-#define PG_STAT_GET_WAL_COLS 9
+#define PG_STAT_WAL_COLS 9
TupleDesc tupdesc;
- Datum values[PG_STAT_GET_WAL_COLS] = {0};
- bool nulls[PG_STAT_GET_WAL_COLS] = {0};
+ Datum values[PG_STAT_WAL_COLS] = {0};
+ bool nulls[PG_STAT_WAL_COLS] = {0};
char buf[256];
- PgStat_WalStats *wal_stats;
/* Initialise attributes information in the tuple descriptor */
- tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_WAL_COLS);
+ tupdesc = CreateTemplateTupleDesc(PG_STAT_WAL_COLS);
TupleDescInitEntry(tupdesc, (AttrNumber) 1, "wal_records",
INT8OID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 2, "wal_fpi",
@@ -1639,34 +1641,48 @@ pg_stat_get_wal(PG_FUNCTION_ARGS)
BlessTupleDesc(tupdesc);
- /* Get statistics about WAL activity */
- wal_stats = pgstat_fetch_stat_wal();
-
/* Fill values and NULLs */
- values[0] = Int64GetDatum(wal_stats->wal_records);
- values[1] = Int64GetDatum(wal_stats->wal_fpi);
+ values[0] = Int64GetDatum(wal_stats.wal_records);
+ values[1] = Int64GetDatum(wal_stats.wal_fpi);
/* Convert to numeric. */
- snprintf(buf, sizeof buf, UINT64_FORMAT, wal_stats->wal_bytes);
+ snprintf(buf, sizeof buf, UINT64_FORMAT, wal_stats.wal_bytes);
values[2] = DirectFunctionCall3(numeric_in,
CStringGetDatum(buf),
ObjectIdGetDatum(0),
Int32GetDatum(-1));
- values[3] = Int64GetDatum(wal_stats->wal_buffers_full);
- values[4] = Int64GetDatum(wal_stats->wal_write);
- values[5] = Int64GetDatum(wal_stats->wal_sync);
+ values[3] = Int64GetDatum(wal_stats.wal_buffers_full);
+ values[4] = Int64GetDatum(wal_stats.wal_write);
+ values[5] = Int64GetDatum(wal_stats.wal_sync);
/* Convert counters from microsec to millisec for display */
- values[6] = Float8GetDatum(((double) wal_stats->wal_write_time) / 1000.0);
- values[7] = Float8GetDatum(((double) wal_stats->wal_sync_time) / 1000.0);
+ values[6] = Float8GetDatum(((double) wal_stats.wal_write_time) / 1000.0);
+ values[7] = Float8GetDatum(((double) wal_stats.wal_sync_time) / 1000.0);
- values[8] = TimestampTzGetDatum(wal_stats->stat_reset_timestamp);
+ if (wal_stats.stat_reset_timestamp != 0)
+ values[8] = TimestampTzGetDatum(wal_stats.stat_reset_timestamp);
+ else
+ nulls[8] = true;
/* Returns the record as Datum */
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
}
+/*
+ * Returns statistics of WAL activity
+ */
+Datum
+pg_stat_get_wal(PG_FUNCTION_ARGS)
+{
+ PgStat_WalStats *wal_stats;
+
+ /* Get statistics about WAL activity */
+ wal_stats = pgstat_fetch_stat_wal();
+
+ return (pg_stat_wal_build_tuple(*wal_stats));
+}
+
/*
* Returns statistics of SLRU caches.
*/
--
2.34.1
v6-0002-Adding-a-new-PgStat_WalCounters-struct.patch (text/x-diff; charset=us-ascii)
From 49b4d7f60c30dfa53d7994b05ce012782f449190 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Thu, 16 Jan 2025 15:06:01 +0000
Subject: [PATCH v6 2/3] Adding a new PgStat_WalCounters struct
This new struct contains only the counters related to the WAL statistics.
This will be used in a follow-up commit that uses the same structures but
for the PGSTAT_KIND_BACKEND statistics kind.
---
src/backend/utils/activity/pgstat_wal.c | 4 ++--
src/backend/utils/adt/pgstatfuncs.c | 28 +++++++++++++------------
src/include/pgstat.h | 7 ++++++-
src/tools/pgindent/typedefs.list | 1 +
4 files changed, 24 insertions(+), 16 deletions(-)
14.1% src/backend/utils/activity/
79.8% src/backend/utils/adt/
5.0% src/include/
diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c
index 18fa6b2936a..bfc06178a68 100644
--- a/src/backend/utils/activity/pgstat_wal.c
+++ b/src/backend/utils/activity/pgstat_wal.c
@@ -117,9 +117,9 @@ pgstat_wal_flush_cb(bool nowait)
return true;
#define WALSTAT_ACC(fld, var_to_add) \
- (stats_shmem->stats.fld += var_to_add.fld)
+ (stats_shmem->stats.wal_counters.fld += var_to_add.fld)
#define WALSTAT_ACC_INSTR_TIME(fld) \
- (stats_shmem->stats.fld += INSTR_TIME_GET_MICROSEC(PendingWalStats.fld))
+ (stats_shmem->stats.wal_counters.fld += INSTR_TIME_GET_MICROSEC(PendingWalStats.fld))
WALSTAT_ACC(wal_records, wal_usage_diff);
WALSTAT_ACC(wal_fpi, wal_usage_diff);
WALSTAT_ACC(wal_bytes, wal_usage_diff);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 0442be03304..97510d48eef 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1607,10 +1607,11 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
* pg_stat_wal_build_tuple
*
* Helper routine for pg_stat_get_wal() returning one tuple based on the contents
- * of wal_stats.
+ * of wal_counters.
*/
static Datum
-pg_stat_wal_build_tuple(PgStat_WalStats wal_stats)
+pg_stat_wal_build_tuple(PgStat_WalCounters wal_counters,
+ TimestampTz stat_reset_timestamp)
{
#define PG_STAT_WAL_COLS 9
TupleDesc tupdesc;
@@ -1642,26 +1643,26 @@ pg_stat_wal_build_tuple(PgStat_WalStats wal_stats)
BlessTupleDesc(tupdesc);
/* Fill values and NULLs */
- values[0] = Int64GetDatum(wal_stats.wal_records);
- values[1] = Int64GetDatum(wal_stats.wal_fpi);
+ values[0] = Int64GetDatum(wal_counters.wal_records);
+ values[1] = Int64GetDatum(wal_counters.wal_fpi);
/* Convert to numeric. */
- snprintf(buf, sizeof buf, UINT64_FORMAT, wal_stats.wal_bytes);
+ snprintf(buf, sizeof buf, UINT64_FORMAT, wal_counters.wal_bytes);
values[2] = DirectFunctionCall3(numeric_in,
CStringGetDatum(buf),
ObjectIdGetDatum(0),
Int32GetDatum(-1));
- values[3] = Int64GetDatum(wal_stats.wal_buffers_full);
- values[4] = Int64GetDatum(wal_stats.wal_write);
- values[5] = Int64GetDatum(wal_stats.wal_sync);
+ values[3] = Int64GetDatum(wal_counters.wal_buffers_full);
+ values[4] = Int64GetDatum(wal_counters.wal_write);
+ values[5] = Int64GetDatum(wal_counters.wal_sync);
/* Convert counters from microsec to millisec for display */
- values[6] = Float8GetDatum(((double) wal_stats.wal_write_time) / 1000.0);
- values[7] = Float8GetDatum(((double) wal_stats.wal_sync_time) / 1000.0);
+ values[6] = Float8GetDatum(((double) wal_counters.wal_write_time) / 1000.0);
+ values[7] = Float8GetDatum(((double) wal_counters.wal_sync_time) / 1000.0);
- if (wal_stats.stat_reset_timestamp != 0)
- values[8] = TimestampTzGetDatum(wal_stats.stat_reset_timestamp);
+ if (stat_reset_timestamp != 0)
+ values[8] = TimestampTzGetDatum(stat_reset_timestamp);
else
nulls[8] = true;
@@ -1680,7 +1681,8 @@ pg_stat_get_wal(PG_FUNCTION_ARGS)
/* Get statistics about WAL activity */
wal_stats = pgstat_fetch_stat_wal();
- return (pg_stat_wal_build_tuple(*wal_stats));
+ return (pg_stat_wal_build_tuple(wal_stats->wal_counters,
+ wal_stats->stat_reset_timestamp));
}
/*
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index d0d45150977..9bbba883685 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -467,7 +467,7 @@ typedef struct PgStat_StatTabEntry
PgStat_Counter autoanalyze_count;
} PgStat_StatTabEntry;
-typedef struct PgStat_WalStats
+typedef struct PgStat_WalCounters
{
PgStat_Counter wal_records;
PgStat_Counter wal_fpi;
@@ -477,6 +477,11 @@ typedef struct PgStat_WalStats
PgStat_Counter wal_sync;
PgStat_Counter wal_write_time;
PgStat_Counter wal_sync_time;
+} PgStat_WalCounters;
+
+typedef struct PgStat_WalStats
+{
+ PgStat_WalCounters wal_counters;
TimestampTz stat_reset_timestamp;
} PgStat_WalStats;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index d5aa5c295ae..d752e626cb0 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2178,6 +2178,7 @@ PgStat_SubXactStatus
PgStat_TableCounts
PgStat_TableStatus
PgStat_TableXactStatus
+PgStat_WalCounters
PgStat_WalStats
PgXmlErrorContext
PgXmlStrictness
--
2.34.1
v6-0003-per-backend-WAL-statistics.patch (text/x-diff; charset=us-ascii)
From c00c545ed8da5c37d5590018df30b633b32b2ed7 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 6 Jan 2025 10:00:00 +0000
Subject: [PATCH v6 3/3] per backend WAL statistics
Now that commit 9aea73fc61 added backend-level statistics to pgstats (and
per backend IO statistics) we can more easily add per backend statistics.
This commit adds per backend WAL statistics using the same layer as pg_stat_wal,
except that it is now possible to know how much WAL activity is happening in each
backend rather than an overall aggregate of all the activity. A function called
pg_stat_get_backend_wal() is added to access this data depending on the
PID of a backend.
The same limitation as in 9aea73fc61 persists, meaning that auxiliary processes
are not included in this set of statistics.
XXX: bump catalog version
---
doc/src/sgml/config.sgml | 4 +-
doc/src/sgml/monitoring.sgml | 19 +++++
src/backend/access/transam/xlog.c | 35 +++++++-
src/backend/utils/activity/pgstat_backend.c | 91 ++++++++++++++++++++-
src/backend/utils/activity/pgstat_wal.c | 2 +
src/backend/utils/adt/pgstatfuncs.c | 53 +++++++++++-
src/include/catalog/pg_proc.dat | 7 ++
src/include/pgstat.h | 48 +++++++----
src/include/utils/pgstat_internal.h | 3 +-
src/test/regress/expected/stats.out | 14 ++++
src/test/regress/sql/stats.sql | 6 ++
11 files changed, 253 insertions(+), 29 deletions(-)
12.5% doc/src/sgml/
11.2% src/backend/access/transam/
37.5% src/backend/utils/activity/
17.3% src/backend/utils/adt/
6.1% src/include/catalog/
6.3% src/include/
5.0% src/test/regress/expected/
3.8% src/test/regress/sql/
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index a8866292d46..5a48b427609 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8292,7 +8292,9 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
measure the overhead of timing on your system.
I/O timing information is
displayed in <link linkend="monitoring-pg-stat-wal-view">
- <structname>pg_stat_wal</structname></link>.
+ <structname>pg_stat_wal</structname></link> and in the output of the
+ <link linkend="pg-stat-get-backend-wal">
+ <function>pg_stat_get_backend_wal()</function></link> function.
Only superusers and users with the appropriate <literal>SET</literal>
privilege can change this setting.
</para>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index e5888fae2b5..8fdf7d13f21 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4824,6 +4824,25 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para></entry>
</row>
+ <row>
+ <entry id="pg-stat-get-backend-wal" role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_stat_get_backend_wal</primary>
+ </indexterm>
+ <function>pg_stat_get_backend_wal</function> ( <type>integer</type> )
+ <returnvalue>record</returnvalue>
+ </para>
+ <para>
+ Returns WAL statistics about the backend with the specified
+ process ID. The output fields are exactly the same as the ones in the
+ <structname>pg_stat_wal</structname> view.
+ </para>
+ <para>
+ The function does not return WAL statistics for the checkpointer,
+ the background writer, the startup process and the autovacuum launcher.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index bf3dbda901d..f7c5b1e5e3b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2058,6 +2058,11 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli, bool opportunistic)
XLogWrite(WriteRqst, tli, false);
LWLockRelease(WALWriteLock);
PendingWalStats.wal_buffers_full++;
+
+ /* Add the per-backend related statistic */
+ if (pgstat_tracks_backend_bktype(MyBackendType))
+ PendingBackendStats.pending_wal.wal_buffers_full++;
+
TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY_DONE();
}
/* Re-acquire WALBufMappingLock and retry */
@@ -2426,11 +2431,14 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible)
Size nleft;
ssize_t written;
instr_time start;
+ instr_time end;
/* OK to write the page(s) */
from = XLogCtl->pages + startidx * (Size) XLOG_BLCKSZ;
nbytes = npages * (Size) XLOG_BLCKSZ;
nleft = nbytes;
+ /* keep compiler quiet */
+ INSTR_TIME_SET_ZERO(end);
do
{
errno = 0;
@@ -2451,14 +2459,22 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible)
*/
if (track_wal_io_timing)
{
- instr_time end;
-
INSTR_TIME_SET_CURRENT(end);
INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_write_time, end, start);
}
PendingWalStats.wal_write++;
+ /* Add the per-backend related statistic */
+ if (pgstat_tracks_backend_bktype(MyBackendType))
+ {
+ PendingBackendStats.pending_wal.wal_write++;
+
+ if (track_wal_io_timing)
+ INSTR_TIME_ACCUM_DIFF(PendingBackendStats.pending_wal.wal_write_time,
+ end, start);
+ }
+
if (written <= 0)
{
char xlogfname[MAXFNAMELEN];
@@ -8684,8 +8700,11 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
{
char *msg = NULL;
instr_time start;
+ instr_time end;
Assert(tli != 0);
+ /* keep compiler quiet */
+ INSTR_TIME_SET_ZERO(end);
/*
* Quick exit if fsync is disabled or write() has already synced the WAL
@@ -8751,13 +8770,21 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
*/
if (track_wal_io_timing)
{
- instr_time end;
-
INSTR_TIME_SET_CURRENT(end);
INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_sync_time, end, start);
}
PendingWalStats.wal_sync++;
+
+ /* Add the per-backend related statistic */
+ if (pgstat_tracks_backend_bktype(MyBackendType))
+ {
+ PendingBackendStats.pending_wal.wal_sync++;
+
+ if (track_wal_io_timing)
+ INSTR_TIME_ACCUM_DIFF(PendingBackendStats.pending_wal.wal_sync_time,
+ end, start);
+ }
}
/*
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
index bcf9e4b1487..90fb5f72bc1 100644
--- a/src/backend/utils/activity/pgstat_backend.c
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -33,11 +33,18 @@
* reported within critical sections so we use static memory in order to avoid
* memory allocation.
*/
-static PgStat_BackendPending PendingBackendStats;
+PgStat_BackendPending PendingBackendStats;
/*
- * Utility routines to report I/O stats for backends, kept here to avoid
- * exposing PendingBackendStats to the outside world.
+ * WAL usage counters saved from pgWalUsage at the previous call to
+ * pgstat_report_wal(). This is used to calculate how much WAL usage
+ * happens between pgstat_report_wal() calls, by subtracting
+ * the previous counters from the current ones.
+ */
+static WalUsage prevBackendWalUsage;
+
+/*
+ * Utility routines to report I/O stats for backends.
*/
void
pgstat_count_backend_io_op_time(IOObject io_object, IOContext io_context,
@@ -131,6 +138,79 @@ pgstat_flush_backend_entry_io(PgStat_EntryRef *entry_ref)
MemSet(&PendingBackendStats.pending_io, 0, sizeof(PgStat_PendingIO));
}
+/*
+ * To determine whether any WAL activity has occurred since last time, not
+ * only the number of generated WAL records but also the numbers of WAL
+ * writes and syncs need to be checked. Because even transaction that
+ * generates no WAL records can write or sync WAL data when flushing the
+ * data pages.
+ */
+static bool
+pgstat_backend_wal_have_pending(void)
+{
+ PgStat_PendingWalStats pending_wal;
+
+ pending_wal = PendingBackendStats.pending_wal;
+
+ return pgWalUsage.wal_records != prevBackendWalUsage.wal_records ||
+ pending_wal.wal_write != 0 || pending_wal.wal_sync != 0;
+}
+
+/*
+ * Flush out locally pending backend WAL statistics. Locking is managed
+ * by the caller.
+ */
+static void
+pgstat_flush_backend_entry_wal(PgStat_EntryRef *entry_ref)
+{
+ PgStatShared_Backend *shbackendent;
+ PgStat_WalCounters *bktype_shstats;
+ PgStat_PendingWalStats pending_wal = PendingBackendStats.pending_wal;
+ WalUsage wal_usage_diff = {0};
+
+ /*
+ * This function can be called even if nothing at all has happened. Avoid
+ * taking lock for nothing in that case.
+ */
+ if (!pgstat_backend_wal_have_pending())
+ return;
+
+ shbackendent = (PgStatShared_Backend *) entry_ref->shared_stats;
+ bktype_shstats = &shbackendent->stats.wal_stats;
+
+ /*
+ * We don't update the WAL usage portion of the local WalStats elsewhere.
+ * Calculate how much WAL usage counters were increased by subtracting the
+ * previous counters from the current ones.
+ */
+ WalUsageAccumDiff(&wal_usage_diff, &pgWalUsage, &prevBackendWalUsage);
+
+#define WALSTAT_ACC(fld, var_to_add) \
+ (bktype_shstats->fld += var_to_add.fld)
+#define WALSTAT_ACC_INSTR_TIME(fld) \
+ (bktype_shstats->fld += INSTR_TIME_GET_MICROSEC(pending_wal.fld))
+ WALSTAT_ACC(wal_buffers_full, pending_wal);
+ WALSTAT_ACC(wal_write, pending_wal);
+ WALSTAT_ACC(wal_sync, pending_wal);
+ WALSTAT_ACC(wal_records, wal_usage_diff);
+ WALSTAT_ACC(wal_fpi, wal_usage_diff);
+ WALSTAT_ACC(wal_bytes, wal_usage_diff);
+ WALSTAT_ACC_INSTR_TIME(wal_write_time);
+ WALSTAT_ACC_INSTR_TIME(wal_sync_time);
+#undef WALSTAT_ACC_INSTR_TIME
+#undef WALSTAT_ACC
+
+ /*
+ * Save the current counters for the subsequent calculation of WAL usage.
+ */
+ prevBackendWalUsage = pgWalUsage;
+
+ /*
+ * Clear out the statistics buffer, so it can be re-used.
+ */
+ MemSet(&PendingBackendStats.pending_wal, 0, sizeof(PgStat_PendingWalStats));
+}
+
/*
* Flush out locally pending backend statistics
*
@@ -158,6 +238,9 @@ pgstat_flush_backend(bool nowait, bits32 flags)
if (flags & PGSTAT_BACKEND_FLUSH_IO)
pgstat_flush_backend_entry_io(entry_ref);
+ if (flags & PGSTAT_BACKEND_FLUSH_WAL)
+ pgstat_flush_backend_entry_wal(entry_ref);
+
pgstat_unlock_entry(entry_ref);
return false;
@@ -205,6 +288,8 @@ pgstat_create_backend(ProcNumber procnum)
pgstat_unlock_entry(entry_ref);
MemSet(&PendingBackendStats, 0, sizeof(PgStat_BackendPending));
+
+ prevBackendWalUsage = pgWalUsage;
}
/*
diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c
index bfc06178a68..33627642261 100644
--- a/src/backend/utils/activity/pgstat_wal.c
+++ b/src/backend/utils/activity/pgstat_wal.c
@@ -55,6 +55,8 @@ pgstat_report_wal(bool force)
/* flush wal stats */
pgstat_flush_wal(nowait);
+ pgstat_flush_backend(nowait, PGSTAT_BACKEND_FLUSH_WAL);
+
/* flush IO stats */
pgstat_flush_io(nowait);
}
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 97510d48eef..6da98bafbf2 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1606,8 +1606,8 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
/*
* pg_stat_wal_build_tuple
*
- * Helper routine for pg_stat_get_wal() returning one tuple based on the contents
- * of wal_counters.
+ * Helper routine for pg_stat_get_wal() and pg_stat_get_backend_wal() returning
+ * one tuple based on the contents of wal_counters.
*/
static Datum
pg_stat_wal_build_tuple(PgStat_WalCounters wal_counters,
@@ -1670,6 +1670,55 @@ pg_stat_wal_build_tuple(PgStat_WalCounters wal_counters,
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
}
+/*
+ * Returns WAL statistics for a backend with given PID.
+ */
+Datum
+pg_stat_get_backend_wal(PG_FUNCTION_ARGS)
+{
+ int pid;
+ PGPROC *proc;
+ ProcNumber procNumber;
+ PgStat_Backend *backend_stats;
+ PgStat_WalCounters bktype_stats;
+ PgBackendStatus *beentry;
+
+ pid = PG_GETARG_INT32(0);
+ proc = BackendPidGetProc(pid);
+
+ /*
+ * This could be an auxiliary process but these do not report backend
+ * statistics due to pgstat_tracks_backend_bktype(), so there is no need
+ * for an extra call to AuxiliaryPidGetProc().
+ */
+ if (!proc)
+ PG_RETURN_NULL();
+
+ procNumber = GetNumberFromPGProc(proc);
+
+ beentry = pgstat_get_beentry_by_proc_number(procNumber);
+ if (!beentry)
+ PG_RETURN_NULL();
+
+ backend_stats = pgstat_fetch_stat_backend(procNumber);
+ if (!backend_stats)
+ PG_RETURN_NULL();
+
+ /* if PID does not match, leave */
+ if (beentry->st_procpid != pid)
+ PG_RETURN_NULL();
+
+ /* backend may be gone, so recheck in case */
+ if (beentry->st_backendType == B_INVALID)
+ PG_RETURN_NULL();
+
+ bktype_stats = backend_stats->wal_stats;
+
+ /* build the result tuple from this backend's PgStat_WalCounters */
+ return (pg_stat_wal_build_tuple(bktype_stats, backend_stats->stat_reset_timestamp));
+}
+
+
/*
* Returns statistics of WAL activity
*/
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 18560755d26..ca60af1e434 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5929,6 +5929,13 @@
proargmodes => '{o,o,o,o,o,o,o,o,o}',
proargnames => '{wal_records,wal_fpi,wal_bytes,wal_buffers_full,wal_write,wal_sync,wal_write_time,wal_sync_time,stats_reset}',
prosrc => 'pg_stat_get_wal' },
+{ oid => '8037', descr => 'statistics: backend WAL activity',
+ proname => 'pg_stat_get_backend_wal', provolatile => 'v',
+ proparallel => 'r', prorettype => 'record', proargtypes => 'int4',
+ proallargtypes => '{int4,int8,int8,numeric,int8,int8,int8,float8,float8,timestamptz}',
+ proargmodes => '{i,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{backend_pid,wal_records,wal_fpi,wal_bytes,wal_buffers_full,wal_write,wal_sync,wal_write_time,wal_sync_time,stats_reset}',
+ prosrc => 'pg_stat_get_backend_wal' },
{ oid => '6248', descr => 'statistics: information about WAL prefetching',
proname => 'pg_stat_get_recovery_prefetch', prorows => '1', proretset => 't',
provolatile => 'v', prorettype => 'record', proargtypes => '',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 9bbba883685..27a0053ea21 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -338,24 +338,6 @@ typedef struct PgStat_IO
PgStat_BktypeIO stats[BACKEND_NUM_TYPES];
} PgStat_IO;
-typedef struct PgStat_Backend
-{
- TimestampTz stat_reset_timestamp;
- PgStat_BktypeIO io_stats;
-} PgStat_Backend;
-
-/* ---------
- * PgStat_BackendPending Non-flushed backend stats.
- * ---------
- */
-typedef struct PgStat_BackendPending
-{
- /*
- * Backend statistics store the same amount of IO data as PGSTAT_KIND_IO.
- */
- PgStat_PendingIO pending_io;
-} PgStat_BackendPending;
-
typedef struct PgStat_StatDBEntry
{
PgStat_Counter xact_commit;
@@ -500,6 +482,30 @@ typedef struct PgStat_PendingWalStats
instr_time wal_sync_time;
} PgStat_PendingWalStats;
+/* ---------
+ * PgStat_BackendPending Non-flushed backend stats.
+ * ---------
+ */
+typedef struct PgStat_BackendPending
+{
+ /*
+ * Backend statistics store the same amount of IO data as PGSTAT_KIND_IO.
+ */
+ PgStat_PendingIO pending_io;
+
+ /*
+ * Backend statistics store the same amount of WAL data as
+ * PGSTAT_KIND_WAL.
+ */
+ PgStat_PendingWalStats pending_wal;
+} PgStat_BackendPending;
+
+typedef struct PgStat_Backend
+{
+ TimestampTz stat_reset_timestamp;
+ PgStat_BktypeIO io_stats;
+ PgStat_WalCounters wal_stats;
+} PgStat_Backend;
/*
* Functions in pgstat.c
@@ -840,5 +846,11 @@ extern PGDLLIMPORT SessionEndType pgStatSessionEndCause;
/* updated directly by backends and background processes */
extern PGDLLIMPORT PgStat_PendingWalStats PendingWalStats;
+/*
+ * Variables in pgstat_backend.c
+ */
+
+/* updated directly by backends */
+extern PGDLLIMPORT PgStat_BackendPending PendingBackendStats;
#endif /* PGSTAT_H */
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index a3d39d2b725..6155a3fc983 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -622,7 +622,8 @@ extern void pgstat_archiver_snapshot_cb(void);
/* flags for pgstat_flush_backend() */
#define PGSTAT_BACKEND_FLUSH_IO (1 << 0) /* Flush I/O statistics */
-#define PGSTAT_BACKEND_FLUSH_ALL (PGSTAT_BACKEND_FLUSH_IO)
+#define PGSTAT_BACKEND_FLUSH_WAL (1 << 1) /* Flush WAL statistics */
+#define PGSTAT_BACKEND_FLUSH_ALL (PGSTAT_BACKEND_FLUSH_IO | PGSTAT_BACKEND_FLUSH_WAL)
extern bool pgstat_flush_backend(bool nowait, bits32 flags);
extern bool pgstat_backend_flush_cb(bool nowait);
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index a0317b7208e..cc01fdf2741 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -832,6 +832,8 @@ SELECT sessions > :db_stat_sessions FROM pg_stat_database WHERE datname = (SELEC
SELECT num_requested AS rqst_ckpts_before FROM pg_stat_checkpointer \gset
-- Test pg_stat_wal (and make a temp table so our temp schema exists)
SELECT wal_bytes AS wal_bytes_before FROM pg_stat_wal \gset
+-- Test pg_stat_get_backend_wal (and make a temp table so our temp schema exists)
+SELECT wal_bytes AS backend_wal_bytes_before from pg_stat_get_backend_wal(pg_backend_pid()) \gset
CREATE TEMP TABLE test_stats_temp AS SELECT 17;
DROP TABLE test_stats_temp;
-- Checkpoint twice: The checkpointer reports stats after reporting completion
@@ -851,6 +853,18 @@ SELECT wal_bytes > :wal_bytes_before FROM pg_stat_wal;
t
(1 row)
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush
+--------------------------
+
+(1 row)
+
+SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid());
+ ?column?
+----------
+ t
+(1 row)
+
-- Test pg_stat_get_backend_idset() and some allied functions.
-- In particular, verify that their notion of backend ID matches
-- our temp schema index.
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 399c72bbcf7..28fe0a1a7d0 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -423,6 +423,9 @@ SELECT num_requested AS rqst_ckpts_before FROM pg_stat_checkpointer \gset
-- Test pg_stat_wal (and make a temp table so our temp schema exists)
SELECT wal_bytes AS wal_bytes_before FROM pg_stat_wal \gset
+-- Test pg_stat_get_backend_wal (and make a temp table so our temp schema exists)
+SELECT wal_bytes AS backend_wal_bytes_before from pg_stat_get_backend_wal(pg_backend_pid()) \gset
+
CREATE TEMP TABLE test_stats_temp AS SELECT 17;
DROP TABLE test_stats_temp;
@@ -435,6 +438,9 @@ CHECKPOINT;
SELECT num_requested > :rqst_ckpts_before FROM pg_stat_checkpointer;
SELECT wal_bytes > :wal_bytes_before FROM pg_stat_wal;
+SELECT pg_stat_force_next_flush();
+SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid());
+
-- Test pg_stat_get_backend_idset() and some allied functions.
-- In particular, verify that their notion of backend ID matches
-- our temp schema index.
--
2.34.1
On Tue, Jan 21, 2025 at 07:19:55AM +0000, Bertrand Drouvot wrote:
PFA v6 that now relies on the new PendingBackendStats variable introduced in
4feba03d8b9.
Remark: I moved PendingBackendStats back to pgstat.h because I think that the
"simple" pending stats increments that we are adding in xlog.c are not worth
an extra function call overhead (while it made more sense for the more complex IO
stats handling). So PendingBackendStats is now visible to the outside world like
PendingWalStats and friends.
You are re-doing here a pattern I was trying to avoid, so that we don't
copy-paste checks based on pgstat_tracks_backend_bktype() more than
necessary. I am wondering if we should think harder about the
interface used to register WAL stats, and make it more consistent with
the way pg_stat_io is handled, avoiding the hardcoded attribute
numbers if we have an enum to control which field to update in some
input routine.
As we have only five counters in PgStat_PendingWalStats, the result
you have is not that invasive, true.
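One way to read the enum-based suggestion above is a single counting routine keyed by an enum, so that the backend-type check lives in one place and a tuple builder can fill columns by enum value instead of hardcoded attribute numbers. This is a sketch under that assumption only; WalStatField, wal_stat_count() and the other names are hypothetical and are not PostgreSQL APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical enum naming each pending WAL counter (illustration only). */
typedef enum WalStatField
{
    WAL_STAT_BUFFERS_FULL,
    WAL_STAT_WRITE,
    WAL_STAT_SYNC,
    WAL_STAT_NUM_FIELDS         /* must stay last */
} WalStatField;

typedef struct WalCountersSketch
{
    uint64_t counts[WAL_STAT_NUM_FIELDS];
} WalCountersSketch;

static WalCountersSketch pending_cluster; /* plays the role of PendingWalStats */
static WalCountersSketch pending_backend; /* plays the role of PendingBackendStats */

/* Hypothetical stand-in for pgstat_tracks_backend_bktype(MyBackendType). */
static bool current_backend_tracked = true;

/*
 * Single entry point: callers only name the field; the decision about
 * whether this backend type is tracked is made here, not at each call site.
 */
static void
wal_stat_count(WalStatField field)
{
    assert(field < WAL_STAT_NUM_FIELDS);
    pending_cluster.counts[field]++;
    if (current_backend_tracked)
        pending_backend.counts[field]++;
}

/* A tuple builder can then fill its output by enum value, not by position. */
static void
wal_stat_fill_values(const WalCountersSketch *c,
                     int64_t values[WAL_STAT_NUM_FIELDS])
{
    for (int i = 0; i < WAL_STAT_NUM_FIELDS; i++)
        values[i] = (int64_t) c->counts[i];
}
```

With only a handful of single-dimension counters, the gain over direct field increments is mostly about keeping the tracking check in one routine.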
Are you sure that the interactions between pgWalUsage, prevWalUsage
and prevBackendWalUsage are correct? As far as I can tell from reading the
code, prevWalUsage, prevBackendWalUsage and their local tracking in
pgstat_backend.c and pgstat_wal.c rely on instrument.c as the primary
source, as pgWalUsage can never be reset. Is that right?
--
Michael
Hi,
On Thu, Jan 23, 2025 at 05:05:30PM +0900, Michael Paquier wrote:
On Tue, Jan 21, 2025 at 07:19:55AM +0000, Bertrand Drouvot wrote:
PFA v6 that now relies on the new PendingBackendStats variable introduced in
4feba03d8b9.
Remark: I moved PendingBackendStats back to pgstat.h because I think that the
"simple" pending stats increments that we are adding in xlog.c are not worth
an extra function call overhead (while it made more sense for the more complex IO
stats handling). So PendingBackendStats is now visible to the outside world like
PendingWalStats and friends.
You are re-doing here a pattern I was trying to avoid, so that we don't
copy-paste checks based on pgstat_tracks_backend_bktype() more than
necessary.
I'm not sure I get it. pgstat_tracks_backend_bktype() is also called in
pgstat_count_backend_io_op() and pgstat_count_backend_io_op_time(). What issue
do you see with the extra calls part of this patch?
I am wondering if we should think harder about the
interface used to register WAL stats, and make it more consistent with
the way pg_stat_io is handled, avoiding the hardcoded attribute
numbers if we have an enum to control which field to update in some
input routine.
Not sure, as WAL stats track a single dimension, unlike IO stats, which track
both IOObject and IOContext. What would be the benefit(s)?
As we have only five counters in PgStat_PendingWalStats, the result
you have is not that invasive, true.
And only one dimension.
Are you sure that the interactions between pgWalUsage, prevWalUsage
and prevBackendWalUsage are correct?
I think so, and according to my testing I can see WalUsage values
that correlate nicely between pg_stat_wal and pg_stat_get_backend_wal().
As far as I can tell from reading the
code, prevWalUsage, prevBackendWalUsage and their local tracking in
pgstat_backend.c and pgstat_wal.c rely on instrument.c as the primary
source, as pgWalUsage can never be reset. Is that right?
yeah, IIUC pgWalUsage acts as the primary source that both prevWalUsage and
prevBackendWalUsage diff against to calculate incremental stats.
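The relationship described above can be modelled with one cumulative counter that is never reset (the role of pgWalUsage) and two consumers that each keep their own previous snapshot (the roles of prevWalUsage and prevBackendWalUsage). This is a simplified sketch with hypothetical type and function names, not the instrument.c or pgstat code itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical simplified stand-in for WalUsage. */
typedef struct WalUsageSketch
{
    int64_t  wal_records;
    int64_t  wal_fpi;
    uint64_t wal_bytes;
} WalUsageSketch;

/* Cumulative and never reset: plays the role of pgWalUsage. */
static WalUsageSketch wal_usage_total;

/* Each consumer keeps its own "previous" snapshot of the source. */
static WalUsageSketch prev_cluster;  /* role of prevWalUsage */
static WalUsageSketch prev_backend;  /* role of prevBackendWalUsage */

static void
record_wal(int64_t nrecords, int64_t nfpi, uint64_t nbytes)
{
    wal_usage_total.wal_records += nrecords;
    wal_usage_total.wal_fpi += nfpi;
    wal_usage_total.wal_bytes += nbytes;
}

/* Return the increment since this consumer's last call, then advance it. */
static WalUsageSketch
take_delta(WalUsageSketch *prev)
{
    WalUsageSketch d;

    assert(wal_usage_total.wal_bytes >= prev->wal_bytes); /* source only grows */
    d.wal_records = wal_usage_total.wal_records - prev->wal_records;
    d.wal_fpi = wal_usage_total.wal_fpi - prev->wal_fpi;
    d.wal_bytes = wal_usage_total.wal_bytes - prev->wal_bytes;
    *prev = wal_usage_total;
    return d;
}
```

Because neither consumer modifies the shared source, one flushing its delta does not disturb the other's increment.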
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Hi,
Thank you for the patchset. Having per-backend WAL statistics,
in addition to cluster-wide ones, is useful.
I had a few comments while looking at v6-0003-* patch.
+ /*
+ * This could be an auxiliary process but these do not report backend
+ * statistics due to pgstat_tracks_backend_bktype(), so there is no need
+ * for an extra call to AuxiliaryPidGetProc().
+ */
+ if (!proc)
+ PG_RETURN_NULL();
Maybe an explicit call to AuxiliaryPidGetProc() followed by a check
for pgstat_tracks_backend_bktype() would be more maintainable,
since the processes covered by AuxiliaryPidGetProc() and
pgstat_tracks_backend_bktype() might diverge in the future.
On that note, it is not clear to me why the WAL writer statistics are not
included in the per-backend WAL statistics. I understand the same limitation
currently exists in pgstat_tracks_io_bktype(), but why does that need to be
extended to WAL statistics?
+ <primary>pg_stat_get_backend_wal</primary>
+ </indexterm>
+ <function>pg_stat_get_backend_wal</function> ( <type>integer</type> )
+ <returnvalue>record</returnvalue>
+ </para>
Should the naming describe what is being returned more clearly?
Something like pg_stat_get_backend_wal_activity()? Currently it
suggests that it returns a backend's WAL, which is not the case.
+ if (pgstat_tracks_backend_bktype(MyBackendType))
+ {
+ PendingBackendStats.pending_wal.wal_write++;
+
+ if (track_wal_io_timing)
+ INSTR_TIME_ACCUM_DIFF(PendingBackendStats.pending_wal.wal_write_time,
+ end, start);
+ }
At the risk of nitpicking, may I suggest moving the above code, which is
under the track_wal_io_timing check, to the existing check before this
added chunk?
This way, all code related to track_wal_io_timing will be grouped together,
closer to where the "end" variable is computed.
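The regrouping suggested above might look roughly like this, as a simplified sketch: time values are plain nanosecond counts, the two structs stand in for PendingWalStats and PendingBackendStats.pending_wal, and start and end are passed in rather than read from a clock. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simplified stand-in for instr_time: nanoseconds. */
typedef uint64_t instr_ns;

typedef struct PendingWalWrites
{
    uint64_t wal_write;
    instr_ns wal_write_time;
} PendingWalWrites;

static PendingWalWrites pending_wal;     /* role of PendingWalStats */
static PendingWalWrites pending_backend; /* role of PendingBackendStats.pending_wal */

/*
 * start/end are parameters so the sketch is deterministic; in the code
 * under discussion, "end" is computed inside the timing check.
 */
static void
count_wal_write(bool track_wal_io_timing, bool backend_tracked,
                instr_ns start, instr_ns end)
{
    /* All timing work grouped in one block. */
    if (track_wal_io_timing)
    {
        assert(end >= start);
        pending_wal.wal_write_time += end - start;
        if (backend_tracked)
            pending_backend.wal_write_time += end - start;
    }

    /* Plain counters, independent of timing. */
    pending_wal.wal_write++;
    if (backend_tracked)
        pending_backend.wal_write++;
}
```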
Thank you,
Rahila Syed
On Tue, Jan 21, 2025 at 12:50 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote:
Hi,
On Fri, Jan 17, 2025 at 08:43:57AM +0900, Michael Paquier wrote:
On Thu, Jan 16, 2025 at 12:44:20PM -0500, Andres Freund wrote:
On 2025-01-16 17:11:09 +0000, Bertrand Drouvot wrote:
So, do you think that the initial proposal that has been made here (see R1. in
[2]), i.e. making use of a new PendingBackendWalStats variable:
Well, I think this first needs to be fixed for the IO stats change
made in
Once we have a pattern to model after, we can apply the same scheme
here.
Okay, thanks for the input. I was not sure what you intended
originally with all this part of the backend code, and how much would
be acceptable. The line is clear now.
"
0003 does not rely on pgstat_prep_backend_pending() for its pending
statistics but on a new PendingBackendWalStats variable. The reason is that
the pending WAL statistics are incremented in a critical section (see
XLogWrite()), and so a call to pgstat_prep_pending_entry() could trigger a
failed assertion:
MemoryContextAllocZero()->"CritSectionCount == 0 ||
(context)->allowInCritSection"
"
and implemented up to v4 is a viable approach?
Yes-ish. I think it would be better to make it slightly more general than
that, handling this for all types of backend stats, not just for WAL.
Agreed to use the same concept for all these parts of the backend
stats kind rather than two of them. Will send a reply on the original
backend I/O thread as well.
PFA v6 that now relies on the new PendingBackendStats variable introduced in
4feba03d8b9.
Remark: I moved PendingBackendStats back to pgstat.h because I think that the
"simple" pending stats increments that we are adding in xlog.c are not worth
an extra function call overhead (while it made more sense for the more complex IO
stats handling). So PendingBackendStats is now visible to the outside world like
PendingWalStats and friends.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
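The constraint behind using a statically allocated pending variable can be sketched as follows. It assumes a simplified model in which an integer stands in for CritSectionCount and plain structs stand in for the pending and shared backend statistics; none of these names are PostgreSQL's. The point is that incrementing static storage inside a critical section needs no allocation, while the flush, which may allocate or lock, runs later outside it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for CritSectionCount. */
static int crit_section_count = 0;

typedef struct PendingWalSketch
{
    uint64_t wal_write;
    uint64_t wal_buffers_full;
} PendingWalSketch;

/*
 * Statically allocated pending counters: bumping them inside a critical
 * section never needs memory, unlike lazily creating a pending entry.
 */
static PendingWalSketch pending_backend_stats;

/* Stands in for this backend's entry in shared statistics. */
static PendingWalSketch shared_backend_entry;

static void
xlog_write_sketch(void)
{
    crit_section_count++;              /* like START_CRIT_SECTION() */
    pending_backend_stats.wal_write++; /* plain increment, no allocation */
    crit_section_count--;              /* like END_CRIT_SECTION() */
}

/* Flushing may allocate or lock, so it must run outside critical sections. */
static void
flush_backend_stats(void)
{
    assert(crit_section_count == 0);
    shared_backend_entry.wal_write += pending_backend_stats.wal_write;
    shared_backend_entry.wal_buffers_full += pending_backend_stats.wal_buffers_full;
    memset(&pending_backend_stats, 0, sizeof(pending_backend_stats));
}
```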
Hi,
On Wed, Jan 29, 2025 at 01:14:09PM +0530, Rahila Syed wrote:
Hi,
Thank you for the patchset. Having per-backend WAL statistics,
in addition to cluster-wide ones, is useful.
Thanks for looking at it!
I had a few comments while looking at v6-0003-* patch.
+ /*
+ * This could be an auxiliary process but these do not report backend
+ * statistics due to pgstat_tracks_backend_bktype(), so there is no need
+ * for an extra call to AuxiliaryPidGetProc().
+ */
+ if (!proc)
+ PG_RETURN_NULL();
Maybe an explicit call to AuxiliaryPidGetProc() followed by a check
for pgstat_tracks_backend_bktype() would be more maintainable,
since the processes covered by AuxiliaryPidGetProc() and
pgstat_tracks_backend_bktype() might diverge in the future.
I think that could make sense but that might need a separate thread as this
is not only related to this patch (already done that way in pg_stat_reset_backend_stats()
and pg_stat_get_backend_io()).
On that note, it is not clear to me why the WAL writer statistics are not
included in the per-backend WAL statistics. I understand the same limitation
currently exists in pgstat_tracks_io_bktype(), but why does that need to be
extended to WAL statistics?
WAL writer might be fine, but that would not add much value here because
it is going to appear in pg_stat_io anyway once Nazir's patch [1] gets in.
+ <primary>pg_stat_get_backend_wal</primary>
+ </indexterm>
+ <function>pg_stat_get_backend_wal</function> ( <type>integer</type> )
+ <returnvalue>record</returnvalue>
+ </para>
Should the naming describe what is being returned more clearly?
Something like pg_stat_get_backend_wal_activity()? Currently it
suggests that it returns a backend's WAL, which is not the case.
Not sure. It aligns with pg_stat_get_backend_io() and the "stat" in its name
suggests this is related to stats.
+ if (pgstat_tracks_backend_bktype(MyBackendType))
+ {
+ PendingBackendStats.pending_wal.wal_write++;
+
+ if (track_wal_io_timing)
+ INSTR_TIME_ACCUM_DIFF(PendingBackendStats.pending_wal.wal_write_time,
+ end, start);
+ }
At the risk of nitpicking, may I suggest moving the above code, which is
under the track_wal_io_timing check, to the existing check before this
added chunk? This way, all code related to track_wal_io_timing will be
grouped together, closer to where the "end" variable is computed.
I think we're waiting for [1] to be in before moving forward with this patch. I think
that [1] also touches this part of the code. I'll keep your remark in mind and
see if it still makes sense once [1] gets in.
[1]: /messages/by-id/CAN55FZ3AiQ+ZMxUuXnBpd0Rrh1YhwJ5FudkHg=JU0P+-W8T4Vg@mail.gmail.com
Regards,
--
Bertrand Drouvot
Hi,
On Thu, Jan 23, 2025 at 09:57:50AM +0000, Bertrand Drouvot wrote:
On Thu, Jan 23, 2025 at 05:05:30PM +0900, Michael Paquier wrote:
As far as I can tell from reading the
code, prevWalUsage, prevBackendWalUsage and their local tracking in
pgstat_backend.c and pgstat_wal.c rely on instrument.c as the primary
source, as pgWalUsage can never be reset. Is that right?
Yeah, IIUC pgWalUsage acts as the primary source that both prevWalUsage and
prevBackendWalUsage diff against to calculate incremental stats.
Now that a051e71e28a is in, I think that we can reduce the scope of this patch
(i.e. reduce the number of stats provided by pg_stat_get_backend_wal()).
I think we can keep:
wal_records
wal_fpi
wal_bytes (because it differs from write_bytes in pg_stat_get_backend_io())
wal_buffers_full
The first 3 are in the WalUsage struct.
I think that:
wal_write (and wal_write_time)
wal_sync (and wal_sync_time)
can be extracted from pg_stat_get_backend_io(), so there is no need to duplicate
this information. The same comment could be made for pg_stat_wal and pg_stat_io,
though pg_stat_wal already exists, so removing fields from it does not have the same
effect.
What are your thoughts about keeping in pg_stat_get_backend_wal() only the
4 stats mentioned above?
Regards,
--
Bertrand Drouvot
On Tue, Feb 04, 2025 at 08:49:41AM +0000, Bertrand Drouvot wrote:
I think that:
wal_write (and wal_write_time)
wal_sync (and wal_sync_time)
Right. We are now able to get this data from XLogWrite() and
issue_xlog_fsync(), so there is no need to duplicate that anymore in
your patch.
can be extracted from pg_stat_get_backend_io(), so there is no need to duplicate
this information. The same comment could be made for pg_stat_wal and pg_stat_io,
though pg_stat_wal already exists, so removing fields from it does not have the same
effect.
What are your thoughts about keeping in pg_stat_get_backend_wal() only the
4 stats mentioned above?
wal_buffers_full is incremented in AdvanceXLInsertBuffer(), part of
PendingWalStats. wal_records, wal_fpi and wal_bytes are part of the
instrumentation field. It looks to me that if you discard the
wal_buffers_full part, the implementation of the data in the backend
could just be tied to the fields coming from WalUsage.
Actually, could it be useful to have wal_buffers_full
available in WalUsage, so that it would show up in EXPLAIN on a
per-query basis with show_wal_usage()? Consolidating that would make
what you are trying to do a bit easier, because we would have the
WalUsage and the pg_stat_io parts without any of the PendingWalStats
part. And it is just a counter, no more expensive to handle than the
data for records, FPIs and bytes. This extra information could be
useful to have in the context of an EXPLAIN.
--
Michael
Hi,
On Wed, Feb 05, 2025 at 11:16:15AM +0900, Michael Paquier wrote:
On Tue, Feb 04, 2025 at 08:49:41AM +0000, Bertrand Drouvot wrote:
can be extracted from pg_stat_get_backend_io(), so there is no need to duplicate
this information. The same comment could be made for pg_stat_wal and pg_stat_io,
though pg_stat_wal already exists, so removing fields from it does not have the same
effect.
What are your thoughts about keeping in pg_stat_get_backend_wal() only the
4 stats mentioned above?
wal_buffers_full is incremented in AdvanceXLInsertBuffer(), part of
PendingWalStats. wal_records, wal_fpi and wal_bytes are part of the
instrumentation field. It looks to me that if you discard the
wal_buffers_full part, the implementation of the data in the backend
could just be tied to the fields coming from WalUsage.
Yup.
Actually, could it be useful to have wal_buffers_full
available in WalUsage, so that it would show up in EXPLAIN on a
per-query basis with show_wal_usage()?
Yeah, that might help. One could not be 100% sure that the statement being
explained is fully responsible for the WAL buffers being full (it could just be
a "victim" of already almost-full WAL buffers). But OTOH that might help to
understand why one EXPLAIN ANALYZE is slower than another (i.e. one filling
the WAL buffers and the other not). Also I think it could be added to
pg_stat_statements, where it could also provide valuable information.
Consolidating that would make
what you are trying to do a bit easier, because we would have the
WalUsage and the pg_stat_io parts without any of the PendingWalStats
part. And it is just a counter, no more expensive to handle than the
data for records, FPIs and bytes. This extra information could be
useful to have in the context of an EXPLAIN.
Yeah, I did a bit of archeology to try to understand why it's not already the
case. From what I can see, in commit time order:
1. df3b181499 introduced the WalUsage structure
2. 6b466bf5f2 added the wal usage in pg_stat_statements
3. 33e05f89c5 added the wal usage in EXPLAIN
4. 8d9a935965f added pg_stat_wal (and wal_buffers_full)
5. 01469241b2f added the wal usage in pg_stat_wal
So, wal_buffers_full has been introduced after the WalUsage structure was
there but I don't see any reason in the emails as to why it's not in the WalUsage
structure (I might have missed it though).
I think that this proposal makes sense but would need a dedicated thread,
thoughts?
Regards,
--
Bertrand Drouvot
On Wed, Feb 05, 2025 at 10:22:55AM +0000, Bertrand Drouvot wrote:
So, wal_buffers_full has been introduced after the WalUsage structure was
there but I don't see any reason in the emails as to why it's not in the WalUsage
structure (I might have missed it though).
I think that this proposal makes sense but would need a dedicated thread,
thoughts?
Using a separate thread for a change like that makes sense to me. I
have to admit that the design simplifications for what
we're discussing here make such a change more valuable. Adding this
information to WalUsage is one thing. Showing it in EXPLAIN is a
second thing. Doing the former simplifies the patch you are proposing
here. We don't necessarily have to do the latter, but I don't see a
reason to not do it, either.
--
Michael
Hi,
On Wed, Feb 05, 2025 at 07:31:13PM +0900, Michael Paquier wrote:
On Wed, Feb 05, 2025 at 10:22:55AM +0000, Bertrand Drouvot wrote:
So, wal_buffers_full has been introduced after the WalUsage structure was
there but I don't see any reason in the emails as to why it's not in the WalUsage
structure (I might have missed it though).
I think that this proposal makes sense but would need a dedicated thread,
thoughts?
Using a separate thread for a change like that makes sense to me. I
have to admit that the design simplifications for what
we're discussing here make such a change more valuable. Adding this
information to WalUsage is one thing. Showing it in EXPLAIN is a
second thing. Doing the former simplifies the patch you are proposing
here. We don't necessarily have to do the latter, but I don't see a
reason to not do it, either.
Agree, I'll start a dedicated thread for that.
Regards,
--
Bertrand Drouvot
On Wed, Feb 05, 2025 at 02:28:08PM +0000, Bertrand Drouvot wrote:
Agree, I'll start a dedicated thread for that.
Thanks.
--
Michael
Hi,
On Thu, Feb 06, 2025 at 10:38:55AM +0900, Michael Paquier wrote:
On Wed, Feb 05, 2025 at 02:28:08PM +0000, Bertrand Drouvot wrote:
Agree, I'll start a dedicated thread for that.
Thanks.
Done in [1].
[1]: /messages/by-id/Z6SOha5YFFgvpwQY@ip-10-97-1-34.eu-west-3.compute.internal
Regards,
--
Bertrand Drouvot
Hi,
On Thu, Feb 06, 2025 at 10:28:48AM +0000, Bertrand Drouvot wrote:
Hi,
On Thu, Feb 06, 2025 at 10:38:55AM +0900, Michael Paquier wrote:
On Wed, Feb 05, 2025 at 02:28:08PM +0000, Bertrand Drouvot wrote:
Agree, I'll start a dedicated thread for that.
Thanks.
Done in [1].
[1]: /messages/by-id/Z6SOha5YFFgvpwQY@ip-10-97-1-34.eu-west-3.compute.internal
Thanks for having committed the work done in [1] above.
There is still something that would simplify what is done here: "the
elimination of the write & sync columns for pg_stat_wal" mentioned in [2].
I'll add it in the new patch series for this thread (that simplifies the new
pg_stat_wal_build_tuple() among other things) unless Nazir beats me to it.
[2]: /messages/by-id/Z6L3ZNGCljZZouvN@paquier.xyz
Regards,
--
Bertrand Drouvot
Hi,
Thank you for working on this!
On Mon, 17 Feb 2025 at 09:59, Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
There is still something that would simplify what is done here: "the
elimination of the write & sync columns for pg_stat_wal" mentioned in [2].
I'll add it in the new patch series for this thread (that simplifies the new
pg_stat_wal_build_tuple() among other things) unless Nazir beats me to it.
I am working on some other stuff right now. Please feel free to work on it.
--
Regards,
Nazir Bilal Yavuz
Microsoft
On Mon, Feb 17, 2025 at 06:59:40AM +0000, Bertrand Drouvot wrote:
There is still something that would simplify what is done here: "the
elimination of the write & sync columns for pg_stat_wal" mentioned in [2].
Yeah, still you cannot just remove them because the data tracked in
pg_stat_io is not entirely the same, right?
--
Michael
Hi,
On Mon, Feb 17, 2025 at 04:25:46PM +0900, Michael Paquier wrote:
On Mon, Feb 17, 2025 at 06:59:40AM +0000, Bertrand Drouvot wrote:
There is still something that would simplify what is done here: "the
elimination of the write & sync columns for pg_stat_wal" mentioned in [2].
Yeah, still you cannot just remove them because the data tracked in
pg_stat_io is not entirely the same, right?
I think that we can just remove them. They are tracked and incremented at the
exact same places in issue_xlog_fsync() and XLogWrite(). What differs is the
"bytes" (pg_stat_wal.wal_bytes focuses on the WAL record size, while
pg_stat_io's unit is wal_block_size), and we keep those in both places.
Also it looks like we can get rid of PendingWalStats...
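The difference between the two byte counters can be illustrated with a toy model. It assumes, as a simplification, that each flush writes whole WAL blocks from the block containing the previous flush point up to the current insert position, and it ignores page headers; 8192 bytes is the default wal_block_size, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed block size; the real value comes from wal_block_size. */
#define WAL_BLOCK_SIZE_SKETCH 8192

typedef struct WalByteModel
{
    uint64_t insert_pos;     /* end of the last inserted record */
    uint64_t flushed_pos;    /* end of the last flushed data */
    uint64_t record_bytes;   /* what a wal_bytes-style counter accumulates */
    uint64_t io_write_bytes; /* what a block-based I/O counter accumulates */
} WalByteModel;

static void
wal_insert_record(WalByteModel *m, uint64_t len)
{
    m->insert_pos += len;
    m->record_bytes += len;
}

/*
 * Write every block between the previous flush point and the insert
 * position; a partially filled block is written again on the next flush.
 */
static void
wal_flush(WalByteModel *m)
{
    uint64_t first_block;
    uint64_t last_block;

    if (m->insert_pos == m->flushed_pos)
        return;                 /* nothing new to write */
    assert(m->insert_pos > m->flushed_pos);
    first_block = m->flushed_pos / WAL_BLOCK_SIZE_SKETCH;
    last_block = (m->insert_pos - 1) / WAL_BLOCK_SIZE_SKETCH;
    m->io_write_bytes += (last_block - first_block + 1) * WAL_BLOCK_SIZE_SKETCH;
    m->flushed_pos = m->insert_pos;
}
```

Three 100-byte records flushed one at a time add 300 to the record-based counter but 24576 (three 8192-byte block writes) to the block-based one, since the same partially filled block is rewritten on each flush.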
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Hi,
On Mon, Feb 17, 2025 at 07:59:26AM +0000, Bertrand Drouvot wrote:
Hi,
On Mon, Feb 17, 2025 at 04:25:46PM +0900, Michael Paquier wrote:
On Mon, Feb 17, 2025 at 06:59:40AM +0000, Bertrand Drouvot wrote:
There is still something that would simplify what is done here: "the
elimination of the write & sync columns for pg_stat_wal" mentioned in [2].
Yeah, still you cannot just remove them because the data tracked in
pg_stat_io is not entirely the same, right?
I think that we can just remove them. They are tracked and incremented at the
exact same places in issue_xlog_fsync() and XLogWrite(). What differs is the
"bytes" (pg_stat_wal.wal_bytes focuses on the WAL record size, while
pg_stat_io's unit is wal_block_size), and we keep those in both places.
Also it looks like we can get rid of PendingWalStats...
PFA the whole picture. 0001 implements the removal of these fields from pg_stat_wal
(and also of PendingWalStats). I think that's OK given the backend types for which
pgstat_tracks_io_bktype() returns false, but now you make me doubt 0001.
Anyway, it's probably better to move the 0001 discussion to a dedicated thread,
thoughts?
Regards,
--
Bertrand Drouvot
Attachments:
v7-0001-Remove-wal_-sync-write-_time-from-pg_stat_wal.patch (text/x-diff; charset=us-ascii)
From 3e54a2f32b79238cefdd946e2fc017de0883c182 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 17 Feb 2025 07:17:49 +0000
Subject: [PATCH v7 1/4] Remove wal_[sync|write][_time] from pg_stat_wal
a051e71e28a added this information into pg_stat_io (with more details and
granularity), so there is no need to keep it in pg_stat_wal. This also
allows removing PendingWalStats and simplifies upcoming commits related
to per-backend WAL statistics.
---
doc/src/sgml/config.sgml | 4 +-
doc/src/sgml/monitoring.sgml | 98 +++++++++++--------------
doc/src/sgml/wal.sgml | 9 ++-
src/backend/access/transam/xlog.c | 27 -------
src/backend/catalog/system_views.sql | 4 -
src/backend/utils/activity/pgstat_wal.c | 24 +-----
src/backend/utils/adt/pgstatfuncs.c | 20 +----
src/include/catalog/pg_proc.dat | 6 +-
src/include/pgstat.h | 28 -------
src/test/regress/expected/rules.out | 6 +-
src/tools/pgindent/typedefs.list | 1 -
11 files changed, 57 insertions(+), 170 deletions(-)
56.1% doc/src/sgml/
6.6% src/backend/access/transam/
11.6% src/backend/utils/activity/
11.0% src/backend/utils/adt/
3.4% src/include/catalog/
7.7% src/include/
3.3% src/
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 336630ce417..9fff9048551 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8341,8 +8341,8 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
You can use the <application>pg_test_timing</application> tool to
measure the overhead of timing on your system.
I/O timing information is
- displayed in <link linkend="monitoring-pg-stat-wal-view">
- <structname>pg_stat_wal</structname></link>.
+ displayed in <link linkend="monitoring-pg-stat-io-view">
+ <structname>pg_stat_io</structname></link> for the wal <literal>object</literal>.
Only superusers and users with the appropriate <literal>SET</literal>
privilege can change this setting.
</para>
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 928a6eb64b0..e645591ce53 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2930,6 +2930,47 @@ description | Waiting for a newly initialized WAL file to reach durable storage
writer</literal>.
</para>
+ <para>
+ For the <literal>wal</literal> <structfield>object</structfield>:
+ <itemizedlist>
+ <listitem>
+ <para>
+ <structfield>writes</structfield> is the number of times WAL buffers were
+ written out to disk via <function>XLogWrite</function> request (see <xref linkend="wal-configuration"/>
+ for more information about the internal WAL function <function>XLogWrite</function>).
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <structfield>fsyncs</structfield> is the number of times WAL files were
+ synced to disk via <function>issue_xlog_fsync</function> request (if <xref linkend="guc-fsync"/>
+ is <literal>on</literal> and <xref linkend="guc-wal-sync-method"/> is either
+ <literal>fdatasync</literal>, <literal>fsync</literal> or <literal>fsync_writethrough</literal>,
+ otherwise zero) (see <xref linkend="wal-configuration"/> for more information about
+ the internal WAL function <function>issue_xlog_fsync</function>).
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <structfield>write_time</structfield> is the total amount of time spent writing
+ WAL buffers to disk via <function>XLogWrite</function> request (if <xref linkend="guc-track-wal-io-timing"/>
+ is enabled, otherwise zero). This includes the sync time when <varname>wal_sync_method</varname>
+ is either <literal>open_datasync</literal> or <literal>open_sync</literal>.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <structfield>sync_time</structfield> is the total amount of time spent syncing
+ WAL files to disk via <function>issue_xlog_fsync</function> request (if
+ <varname>track_wal_io_timing</varname> is enabled, <varname>fsync</varname> is
+ <literal>on</literal>, and <varname>wal_sync_method</varname> is either
+ <literal>fdatasync</literal>, <literal>fsync</literal> or <literal>fsync_writethrough</literal>,
+ otherwise zero).
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
<para>
<structname>pg_stat_io</structname> can be used to inform database tuning.
For example:
@@ -3255,63 +3296,6 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para></entry>
</row>
- <row>
- <entry role="catalog_table_entry"><para role="column_definition">
- <structfield>wal_write</structfield> <type>bigint</type>
- </para>
- <para>
- Number of times WAL buffers were written out to disk via
- <function>XLogWrite</function> request.
- See <xref linkend="wal-configuration"/> for more information about
- the internal WAL function <function>XLogWrite</function>.
- </para></entry>
- </row>
-
- <row>
- <entry role="catalog_table_entry"><para role="column_definition">
- <structfield>wal_sync</structfield> <type>bigint</type>
- </para>
- <para>
- Number of times WAL files were synced to disk via
- <function>issue_xlog_fsync</function> request
- (if <xref linkend="guc-fsync"/> is <literal>on</literal> and
- <xref linkend="guc-wal-sync-method"/> is either
- <literal>fdatasync</literal>, <literal>fsync</literal> or
- <literal>fsync_writethrough</literal>, otherwise zero).
- See <xref linkend="wal-configuration"/> for more information about
- the internal WAL function <function>issue_xlog_fsync</function>.
- </para></entry>
- </row>
-
- <row>
- <entry role="catalog_table_entry"><para role="column_definition">
- <structfield>wal_write_time</structfield> <type>double precision</type>
- </para>
- <para>
- Total amount of time spent writing WAL buffers to disk via
- <function>XLogWrite</function> request, in milliseconds
- (if <xref linkend="guc-track-wal-io-timing"/> is enabled,
- otherwise zero). This includes the sync time when
- <varname>wal_sync_method</varname> is either
- <literal>open_datasync</literal> or <literal>open_sync</literal>.
- </para></entry>
- </row>
-
- <row>
- <entry role="catalog_table_entry"><para role="column_definition">
- <structfield>wal_sync_time</structfield> <type>double precision</type>
- </para>
- <para>
- Total amount of time spent syncing WAL files to disk via
- <function>issue_xlog_fsync</function> request, in milliseconds
- (if <varname>track_wal_io_timing</varname> is enabled,
- <varname>fsync</varname> is <literal>on</literal>, and
- <varname>wal_sync_method</varname> is either
- <literal>fdatasync</literal>, <literal>fsync</literal> or
- <literal>fsync_writethrough</literal>, otherwise zero).
- </para></entry>
- </row>
-
<row>
<entry role="catalog_table_entry"><para role="column_definition">
<structfield>stats_reset</structfield> <type>timestamp with time zone</type>
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index b908720adea..bb4892413fc 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -813,8 +813,8 @@
When <xref linkend="guc-track-wal-io-timing"/> is enabled, the total
amounts of time <function>XLogWrite</function> writes and
<function>issue_xlog_fsync</function> syncs WAL data to disk are counted as
- <literal>wal_write_time</literal> and <literal>wal_sync_time</literal> in
- <xref linkend="pg-stat-wal-view"/>, respectively.
+ <literal>write_time</literal> and <literal>fsync_time</literal> in
+ <xref linkend="pg-stat-io-view"/> for the <literal>wal</literal> object, respectively.
<function>XLogWrite</function> is normally called by
<function>XLogInsertRecord</function> (when there is no space for the new
record in WAL buffers), <function>XLogFlush</function> and the WAL writer,
@@ -832,8 +832,9 @@
of the setting of <varname>track_wal_io_timing</varname>, the number
of times <function>XLogWrite</function> writes and
<function>issue_xlog_fsync</function> syncs WAL data to disk are also
- counted as <literal>wal_write</literal> and <literal>wal_sync</literal>
- in <structname>pg_stat_wal</structname>, respectively.
+ counted as <literal>writes</literal> and <literal>fsyncs</literal>
+ in <structname>pg_stat_io</structname> for the <literal>wal</literal> object,
+ respectively.
</para>
<para>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 25a5c605404..020b99d402c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -2448,20 +2448,6 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible)
pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL,
IOOP_WRITE, start, 1, written);
- /*
- * Increment the I/O timing and the number of times WAL data
- * were written out to disk.
- */
- if (track_wal_io_timing)
- {
- instr_time end;
-
- INSTR_TIME_SET_CURRENT(end);
- INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_write_time, end, start);
- }
-
- PendingWalStats.wal_write++;
-
if (written <= 0)
{
char xlogfname[MAXFNAMELEN];
@@ -8767,21 +8753,8 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
pgstat_report_wait_end();
- /*
- * Increment the I/O timing and the number of times WAL files were synced.
- */
- if (track_wal_io_timing)
- {
- instr_time end;
-
- INSTR_TIME_SET_CURRENT(end);
- INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_sync_time, end, start);
- }
-
pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL, IOOP_FSYNC,
start, 1, 0);
-
- PendingWalStats.wal_sync++;
}
/*
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index eff0990957e..a4d2cfdcaf5 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1189,10 +1189,6 @@ CREATE VIEW pg_stat_wal AS
w.wal_fpi,
w.wal_bytes,
w.wal_buffers_full,
- w.wal_write,
- w.wal_sync,
- w.wal_write_time,
- w.wal_sync_time,
w.stats_reset
FROM pg_stat_get_wal() w;
diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c
index c1ca65eb991..4dc41a4a590 100644
--- a/src/backend/utils/activity/pgstat_wal.c
+++ b/src/backend/utils/activity/pgstat_wal.c
@@ -21,8 +21,6 @@
#include "utils/pgstat_internal.h"
-PgStat_PendingWalStats PendingWalStats = {0};
-
/*
* WAL usage counters saved from pgWalUsage at the previous call to
* pgstat_report_wal(). This is used to calculate how much WAL usage
@@ -118,17 +116,10 @@ pgstat_wal_flush_cb(bool nowait)
#define WALSTAT_ACC(fld, var_to_add) \
(stats_shmem->stats.fld += var_to_add.fld)
-#define WALSTAT_ACC_INSTR_TIME(fld) \
- (stats_shmem->stats.fld += INSTR_TIME_GET_MICROSEC(PendingWalStats.fld))
WALSTAT_ACC(wal_records, wal_usage_diff);
WALSTAT_ACC(wal_fpi, wal_usage_diff);
WALSTAT_ACC(wal_bytes, wal_usage_diff);
WALSTAT_ACC(wal_buffers_full, wal_usage_diff);
- WALSTAT_ACC(wal_write, PendingWalStats);
- WALSTAT_ACC(wal_sync, PendingWalStats);
- WALSTAT_ACC_INSTR_TIME(wal_write_time);
- WALSTAT_ACC_INSTR_TIME(wal_sync_time);
-#undef WALSTAT_ACC_INSTR_TIME
#undef WALSTAT_ACC
LWLockRelease(&stats_shmem->lock);
@@ -138,11 +129,6 @@ pgstat_wal_flush_cb(bool nowait)
*/
prevWalUsage = pgWalUsage;
- /*
- * Clear out the statistics buffer, so it can be re-used.
- */
- MemSet(&PendingWalStats, 0, sizeof(PendingWalStats));
-
return false;
}
@@ -158,18 +144,12 @@ pgstat_wal_init_backend_cb(void)
}
/*
- * To determine whether any WAL activity has occurred since last time, not
- * only the number of generated WAL records but also the numbers of WAL
- * writes and syncs need to be checked. Because even transaction that
- * generates no WAL records can write or sync WAL data when flushing the
- * data pages.
+ * Check whether any WAL usage happened since the last flush.
*/
bool
pgstat_wal_have_pending_cb(void)
{
- return pgWalUsage.wal_records != prevWalUsage.wal_records ||
- PendingWalStats.wal_write != 0 ||
- PendingWalStats.wal_sync != 0;
+ return pgWalUsage.wal_records != prevWalUsage.wal_records;
}
void
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index e9096a88492..68e16e52ab6 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1637,7 +1637,7 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
Datum
pg_stat_get_wal(PG_FUNCTION_ARGS)
{
-#define PG_STAT_GET_WAL_COLS 9
+#define PG_STAT_GET_WAL_COLS 5
TupleDesc tupdesc;
Datum values[PG_STAT_GET_WAL_COLS] = {0};
bool nulls[PG_STAT_GET_WAL_COLS] = {0};
@@ -1654,15 +1654,7 @@ pg_stat_get_wal(PG_FUNCTION_ARGS)
NUMERICOID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 4, "wal_buffers_full",
INT8OID, -1, 0);
- TupleDescInitEntry(tupdesc, (AttrNumber) 5, "wal_write",
- INT8OID, -1, 0);
- TupleDescInitEntry(tupdesc, (AttrNumber) 6, "wal_sync",
- INT8OID, -1, 0);
- TupleDescInitEntry(tupdesc, (AttrNumber) 7, "wal_write_time",
- FLOAT8OID, -1, 0);
- TupleDescInitEntry(tupdesc, (AttrNumber) 8, "wal_sync_time",
- FLOAT8OID, -1, 0);
- TupleDescInitEntry(tupdesc, (AttrNumber) 9, "stats_reset",
+ TupleDescInitEntry(tupdesc, (AttrNumber) 5, "stats_reset",
TIMESTAMPTZOID, -1, 0);
BlessTupleDesc(tupdesc);
@@ -1682,14 +1674,8 @@ pg_stat_get_wal(PG_FUNCTION_ARGS)
Int32GetDatum(-1));
values[3] = Int64GetDatum(wal_stats->wal_buffers_full);
- values[4] = Int64GetDatum(wal_stats->wal_write);
- values[5] = Int64GetDatum(wal_stats->wal_sync);
-
- /* Convert counters from microsec to millisec for display */
- values[6] = Float8GetDatum(((double) wal_stats->wal_write_time) / 1000.0);
- values[7] = Float8GetDatum(((double) wal_stats->wal_sync_time) / 1000.0);
- values[8] = TimestampTzGetDatum(wal_stats->stat_reset_timestamp);
+ values[4] = TimestampTzGetDatum(wal_stats->stat_reset_timestamp);
/* Returns the record as Datum */
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 9e803d610d7..1e1626964e3 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5950,9 +5950,9 @@
{ oid => '1136', descr => 'statistics: information about WAL activity',
proname => 'pg_stat_get_wal', proisstrict => 'f', provolatile => 's',
proparallel => 'r', prorettype => 'record', proargtypes => '',
- proallargtypes => '{int8,int8,numeric,int8,int8,int8,float8,float8,timestamptz}',
- proargmodes => '{o,o,o,o,o,o,o,o,o}',
- proargnames => '{wal_records,wal_fpi,wal_bytes,wal_buffers_full,wal_write,wal_sync,wal_write_time,wal_sync_time,stats_reset}',
+ proallargtypes => '{int8,int8,numeric,int8,timestamptz}',
+ proargmodes => '{o,o,o,o,o}',
+ proargnames => '{wal_records,wal_fpi,wal_bytes,wal_buffers_full,stats_reset}',
prosrc => 'pg_stat_get_wal' },
{ oid => '6248', descr => 'statistics: information about WAL prefetching',
proname => 'pg_stat_get_recovery_prefetch', prorows => '1', proretset => 't',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 53f2a8458e6..a3a341cc604 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -480,28 +480,9 @@ typedef struct PgStat_WalStats
PgStat_Counter wal_fpi;
uint64 wal_bytes;
PgStat_Counter wal_buffers_full;
- PgStat_Counter wal_write;
- PgStat_Counter wal_sync;
- PgStat_Counter wal_write_time;
- PgStat_Counter wal_sync_time;
TimestampTz stat_reset_timestamp;
} PgStat_WalStats;
-/*
- * This struct stores wal-related durations as instr_time, which makes it
- * cheaper and easier to accumulate them, by not requiring type
- * conversions. During stats flush instr_time will be converted into
- * microseconds.
- */
-typedef struct PgStat_PendingWalStats
-{
- PgStat_Counter wal_write;
- PgStat_Counter wal_sync;
- instr_time wal_write_time;
- instr_time wal_sync_time;
-} PgStat_PendingWalStats;
-
-
/*
* Functions in pgstat.c
*/
@@ -834,13 +815,4 @@ extern PGDLLIMPORT PgStat_Counter pgStatTransactionIdleTime;
/* updated by the traffic cop and in errfinish() */
extern PGDLLIMPORT SessionEndType pgStatSessionEndCause;
-
-/*
- * Variables in pgstat_wal.c
- */
-
-/* updated directly by backends and background processes */
-extern PGDLLIMPORT PgStat_PendingWalStats PendingWalStats;
-
-
#endif /* PGSTAT_H */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 5baba8d39ff..62f69ac20b2 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2259,12 +2259,8 @@ pg_stat_wal| SELECT wal_records,
wal_fpi,
wal_bytes,
wal_buffers_full,
- wal_write,
- wal_sync,
- wal_write_time,
- wal_sync_time,
stats_reset
- FROM pg_stat_get_wal() w(wal_records, wal_fpi, wal_bytes, wal_buffers_full, wal_write, wal_sync, wal_write_time, wal_sync_time, stats_reset);
+ FROM pg_stat_get_wal() w(wal_records, wal_fpi, wal_bytes, wal_buffers_full, stats_reset);
pg_stat_wal_receiver| SELECT pid,
status,
receive_start_lsn,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index bce4214503d..740777127e9 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2163,7 +2163,6 @@ PgStat_KindInfo
PgStat_LocalState
PgStat_PendingDroppedStatsItem
PgStat_PendingIO
-PgStat_PendingWalStats
PgStat_SLRUStats
PgStat_ShmemControl
PgStat_Snapshot
--
2.34.1
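Patch 0001 can drop the write/sync counters from pg_stat_wal because, as the xlog.c hunks show, the same events already reach pg_stat_io through pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL, ...). Below is a minimal, self-contained sketch of that single-call-site pattern, not PostgreSQL code: the enums, the IoStats table, count_io_op_time() and the track_timing flag are hypothetical stand-ins (track_timing plays the role of track_wal_io_timing), and times are passed as plain microsecond values instead of instr_time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for IOOBJECT_*, IOCONTEXT_* and IOOP_*. */
typedef enum { OBJ_RELATION, OBJ_WAL, OBJ_COUNT } IoObject;
typedef enum { CTX_NORMAL, CTX_INIT, CTX_COUNT } IoContext;
typedef enum { OP_WRITE, OP_FSYNC, OP_COUNT } IoOp;

/* One cell per (object, context, operation), like a pg_stat_io row/column. */
typedef struct
{
    int64_t counts[OBJ_COUNT][CTX_COUNT][OP_COUNT];
    int64_t times_us[OBJ_COUNT][CTX_COUNT][OP_COUNT];
} IoStats;

/* Hypothetical flag playing the role of track_wal_io_timing. */
static bool track_timing = false;

/*
 * A single call records the operation count and, only when timing is
 * enabled, its duration.  No second set of counters is kept elsewhere.
 */
static void
count_io_op_time(IoStats *stats, IoObject obj, IoContext ctx, IoOp op,
                 int64_t start_us, int64_t end_us, int64_t cnt)
{
    assert(end_us >= start_us);
    stats->counts[obj][ctx][op] += cnt;
    if (track_timing)
        stats->times_us[obj][ctx][op] += end_us - start_us;
}
```

With timing disabled only the count moves; enabling the flag adds the elapsed time to the same cell, which is the sense in which a parallel set of counters in pg_stat_wal becomes redundant.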
v7-0002-Extract-logic-filling-pg_stat_get_wal-s-tuple-int.patch
From c516da1bd0730c274d892999f8c390138dd3c8aa Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 6 Jan 2025 07:51:27 +0000
Subject: [PATCH v7 2/4] Extract logic filling pg_stat_get_wal()'s tuple into
its own routine
This commit adds pg_stat_wal_build_tuple(), a helper routine for
pg_stat_get_wal(), that fills its tuple based on the contents
of PgStat_WalStats. This will be used in a follow-up commit that uses
the same structures as pg_stat_wal for reporting, but for the PGSTAT_KIND_BACKEND
statistics kind.
---
src/backend/utils/adt/pgstatfuncs.c | 48 +++++++++++++++++++----------
1 file changed, 32 insertions(+), 16 deletions(-)
100.0% src/backend/utils/adt/
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 68e16e52ab6..620d60a0938 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1632,20 +1632,22 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
}
/*
- * Returns statistics of WAL activity
+ * pg_stat_wal_build_tuple
+ *
+ * Helper routine for pg_stat_get_wal() returning one tuple based on the contents
+ * of wal_stats.
*/
-Datum
-pg_stat_get_wal(PG_FUNCTION_ARGS)
+static Datum
+pg_stat_wal_build_tuple(PgStat_WalStats wal_stats)
{
-#define PG_STAT_GET_WAL_COLS 5
+#define PG_STAT_WAL_COLS 5
TupleDesc tupdesc;
- Datum values[PG_STAT_GET_WAL_COLS] = {0};
- bool nulls[PG_STAT_GET_WAL_COLS] = {0};
+ Datum values[PG_STAT_WAL_COLS] = {0};
+ bool nulls[PG_STAT_WAL_COLS] = {0};
char buf[256];
- PgStat_WalStats *wal_stats;
/* Initialise attributes information in the tuple descriptor */
- tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_WAL_COLS);
+ tupdesc = CreateTemplateTupleDesc(PG_STAT_WAL_COLS);
TupleDescInitEntry(tupdesc, (AttrNumber) 1, "wal_records",
INT8OID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 2, "wal_fpi",
@@ -1659,28 +1661,42 @@ pg_stat_get_wal(PG_FUNCTION_ARGS)
BlessTupleDesc(tupdesc);
- /* Get statistics about WAL activity */
- wal_stats = pgstat_fetch_stat_wal();
-
/* Fill values and NULLs */
- values[0] = Int64GetDatum(wal_stats->wal_records);
- values[1] = Int64GetDatum(wal_stats->wal_fpi);
+ values[0] = Int64GetDatum(wal_stats.wal_records);
+ values[1] = Int64GetDatum(wal_stats.wal_fpi);
/* Convert to numeric. */
- snprintf(buf, sizeof buf, UINT64_FORMAT, wal_stats->wal_bytes);
+ snprintf(buf, sizeof buf, UINT64_FORMAT, wal_stats.wal_bytes);
values[2] = DirectFunctionCall3(numeric_in,
CStringGetDatum(buf),
ObjectIdGetDatum(0),
Int32GetDatum(-1));
- values[3] = Int64GetDatum(wal_stats->wal_buffers_full);
+ values[3] = Int64GetDatum(wal_stats.wal_buffers_full);
- values[4] = TimestampTzGetDatum(wal_stats->stat_reset_timestamp);
+ if (wal_stats.stat_reset_timestamp != 0)
+ values[4] = TimestampTzGetDatum(wal_stats.stat_reset_timestamp);
+ else
+ nulls[4] = true;
/* Returns the record as Datum */
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
}
+/*
+ * Returns statistics of WAL activity
+ */
+Datum
+pg_stat_get_wal(PG_FUNCTION_ARGS)
+{
+ PgStat_WalStats *wal_stats;
+
+ /* Get statistics about WAL activity */
+ wal_stats = pgstat_fetch_stat_wal();
+
+ return (pg_stat_wal_build_tuple(*wal_stats));
+}
+
/*
* Returns statistics of SLRU caches.
*/
--
2.34.1
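pg_stat_wal_build_tuple() performs two small conversions worth spelling out: wal_bytes is a uint64, so it is rendered as a decimal string and passed to numeric_in (presumably because int8 is signed and cannot represent values above INT64_MAX), and a zero stat_reset_timestamp is now returned as NULL. The sketch below reproduces just those two steps in plain C; format_wal_bytes(), reset_timestamp_column() and the NullableTimestamp type are hypothetical names, and PRIu64 stands in for PostgreSQL's UINT64_FORMAT.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* As in PostgreSQL: microseconds since 2000-01-01 00:00:00 UTC. */
typedef int64_t TimestampTz;

/* Hypothetical nullable column, standing in for a values[]/nulls[] pair. */
typedef struct
{
    TimestampTz value;
    bool        is_null;
} NullableTimestamp;

/*
 * Render an unsigned 64-bit counter as a decimal string, which is what the
 * patch hands to numeric_in to build the wal_bytes column.
 */
static void
format_wal_bytes(uint64_t wal_bytes, char *buf, size_t buflen)
{
    assert(buflen > 0);
    snprintf(buf, buflen, "%" PRIu64, wal_bytes);
}

/* A zero timestamp means "never reset" and is reported as NULL. */
static NullableTimestamp
reset_timestamp_column(TimestampTz ts)
{
    NullableTimestamp col = {0, true};

    if (ts != 0)
    {
        col.value = ts;
        col.is_null = false;
    }
    return col;
}
```

Note that the NULL mapping differs from the pre-0002 pg_stat_get_wal(), which always emitted the stored timestamp.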
v7-0003-Adding-a-new-PgStat_WalCounters-struct.patch
From a33621c2a2468c9e908622ed9940d15b19c2d843 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Thu, 16 Jan 2025 15:06:01 +0000
Subject: [PATCH v7 3/4] Adding a new PgStat_WalCounters struct
This new struct contains only the counters related to the WAL statistics.
This will be used in a follow-up commit that uses the same structures but
for the PGSTAT_KIND_BACKEND statistics kind.
---
src/backend/utils/activity/pgstat_wal.c | 2 +-
src/backend/utils/adt/pgstatfuncs.c | 20 +++++++++++---------
src/include/pgstat.h | 7 ++++++-
src/tools/pgindent/typedefs.list | 1 +
4 files changed, 19 insertions(+), 11 deletions(-)
8.5% src/backend/utils/activity/
82.0% src/backend/utils/adt/
7.8% src/include/
diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c
index 4dc41a4a590..6d024872701 100644
--- a/src/backend/utils/activity/pgstat_wal.c
+++ b/src/backend/utils/activity/pgstat_wal.c
@@ -115,7 +115,7 @@ pgstat_wal_flush_cb(bool nowait)
return true;
#define WALSTAT_ACC(fld, var_to_add) \
- (stats_shmem->stats.fld += var_to_add.fld)
+ (stats_shmem->stats.wal_counters.fld += var_to_add.fld)
WALSTAT_ACC(wal_records, wal_usage_diff);
WALSTAT_ACC(wal_fpi, wal_usage_diff);
WALSTAT_ACC(wal_bytes, wal_usage_diff);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 620d60a0938..9de14ffd449 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1635,10 +1635,11 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
* pg_stat_wal_build_tuple
*
* Helper routine for pg_stat_get_wal() returning one tuple based on the contents
- * of wal_stats.
+ * of wal_counters.
*/
static Datum
-pg_stat_wal_build_tuple(PgStat_WalStats wal_stats)
+pg_stat_wal_build_tuple(PgStat_WalCounters wal_counters,
+ TimestampTz stat_reset_timestamp)
{
#define PG_STAT_WAL_COLS 5
TupleDesc tupdesc;
@@ -1662,20 +1663,20 @@ pg_stat_wal_build_tuple(PgStat_WalStats wal_stats)
BlessTupleDesc(tupdesc);
/* Fill values and NULLs */
- values[0] = Int64GetDatum(wal_stats.wal_records);
- values[1] = Int64GetDatum(wal_stats.wal_fpi);
+ values[0] = Int64GetDatum(wal_counters.wal_records);
+ values[1] = Int64GetDatum(wal_counters.wal_fpi);
/* Convert to numeric. */
- snprintf(buf, sizeof buf, UINT64_FORMAT, wal_stats.wal_bytes);
+ snprintf(buf, sizeof buf, UINT64_FORMAT, wal_counters.wal_bytes);
values[2] = DirectFunctionCall3(numeric_in,
CStringGetDatum(buf),
ObjectIdGetDatum(0),
Int32GetDatum(-1));
- values[3] = Int64GetDatum(wal_stats.wal_buffers_full);
+ values[3] = Int64GetDatum(wal_counters.wal_buffers_full);
- if (wal_stats.stat_reset_timestamp != 0)
- values[4] = TimestampTzGetDatum(wal_stats.stat_reset_timestamp);
+ if (stat_reset_timestamp != 0)
+ values[4] = TimestampTzGetDatum(stat_reset_timestamp);
else
nulls[4] = true;
@@ -1694,7 +1695,8 @@ pg_stat_get_wal(PG_FUNCTION_ARGS)
/* Get statistics about WAL activity */
wal_stats = pgstat_fetch_stat_wal();
- return (pg_stat_wal_build_tuple(*wal_stats));
+ return (pg_stat_wal_build_tuple(wal_stats->wal_counters,
+ wal_stats->stat_reset_timestamp));
}
/*
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index a3a341cc604..459a7cb328e 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -474,12 +474,17 @@ typedef struct PgStat_StatTabEntry
PgStat_Counter total_autoanalyze_time;
} PgStat_StatTabEntry;
-typedef struct PgStat_WalStats
+typedef struct PgStat_WalCounters
{
PgStat_Counter wal_records;
PgStat_Counter wal_fpi;
uint64 wal_bytes;
PgStat_Counter wal_buffers_full;
+} PgStat_WalCounters;
+
+typedef struct PgStat_WalStats
+{
+ PgStat_WalCounters wal_counters;
TimestampTz stat_reset_timestamp;
} PgStat_WalStats;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 740777127e9..77b47a6df7e 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2178,6 +2178,7 @@ PgStat_SubXactStatus
PgStat_TableCounts
PgStat_TableStatus
PgStat_TableXactStatus
+PgStat_WalCounters
PgStat_WalStats
PgXmlErrorContext
PgXmlStrictness
--
2.34.1
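Patch 0003 moves the four counters into PgStat_WalCounters so the tuple builder no longer depends on which container holds them. A reduced model of that layout follows; the types and the function are simplified, hypothetical mirrors of PgStat_WalCounters, PgStat_WalStats, PgStat_Backend (as extended in 0004, with its I/O part omitted) and pg_stat_wal_build_tuple(), and a plain struct replaces the heap tuple.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t PgStat_Counter;
typedef int64_t TimestampTz;

/* Counters shared by the global view and the per-backend entries. */
typedef struct
{
    PgStat_Counter wal_records;
    PgStat_Counter wal_fpi;
    uint64_t       wal_bytes;
    PgStat_Counter wal_buffers_full;
} WalCounters;

/* Global container, mirroring PgStat_WalStats after 0003. */
typedef struct
{
    WalCounters wal_counters;
    TimestampTz stat_reset_timestamp;
} WalStats;

/* Per-backend container, mirroring PgStat_Backend after 0004 (I/O omitted). */
typedef struct
{
    TimestampTz stat_reset_timestamp;
    WalCounters wal_counters;
} BackendStats;

/* Hypothetical flat row standing in for the tuple the real helper builds. */
typedef struct
{
    PgStat_Counter wal_records;
    PgStat_Counter wal_fpi;
    uint64_t       wal_bytes;
    PgStat_Counter wal_buffers_full;
    TimestampTz    stats_reset;
} WalRow;

/* One builder serves both callers because it depends only on WalCounters. */
static WalRow
build_wal_row(WalCounters c, TimestampTz stat_reset_timestamp)
{
    WalRow row = {c.wal_records, c.wal_fpi, c.wal_bytes,
                  c.wal_buffers_full, stat_reset_timestamp};

    return row;
}
```

Passing the reset timestamp separately is what lets the global and per-backend structs keep their own field order while sharing the counter block.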
v7-0004-per-backend-WAL-statistics.patch
From 7665e01b6cf0b6bae3d277ce150387aff15e8f0f Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 6 Jan 2025 10:00:00 +0000
Subject: [PATCH v7 4/4] per backend WAL statistics
Now that commit 9aea73fc61 added backend-level statistics to pgstats (and
per backend IO statistics) we can more easily add per backend statistics.
This commit adds per backend WAL statistics using the same layer as pg_stat_wal,
except that it is now possible to know how much WAL activity is happening in each
backend rather than an overall aggregate of all the activity. A function called
pg_stat_get_backend_wal() is added to access this data depending on the
PID of a backend.
The same limitation as in 9aea73fc61 persists, meaning that auxiliary processes
are not included in this set of statistics.
XXX: bump catalog version
---
doc/src/sgml/monitoring.sgml | 19 ++++++
src/backend/utils/activity/pgstat_backend.c | 64 +++++++++++++++++++++
src/backend/utils/activity/pgstat_wal.c | 2 +
src/backend/utils/adt/pgstatfuncs.c | 52 ++++++++++++++++-
src/include/catalog/pg_proc.dat | 7 +++
src/include/pgstat.h | 13 +++--
src/include/utils/pgstat_internal.h | 3 +-
src/test/regress/expected/stats.out | 14 +++++
src/test/regress/sql/stats.sql | 6 ++
9 files changed, 171 insertions(+), 9 deletions(-)
14.0% doc/src/sgml/
34.5% src/backend/utils/activity/
25.9% src/backend/utils/adt/
7.7% src/include/catalog/
3.9% src/include/utils/
7.4% src/test/regress/expected/
5.6% src/test/regress/sql/
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index e645591ce53..bd3fc5bd4c6 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4859,6 +4859,25 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para></entry>
</row>
+ <row>
+ <entry id="pg-stat-get-backend-wal" role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_stat_get_backend_wal</primary>
+ </indexterm>
+ <function>pg_stat_get_backend_wal</function> ( <type>integer</type> )
+ <returnvalue>record</returnvalue>
+ </para>
+ <para>
+ Returns WAL statistics about the backend with the specified
+ process ID. The output fields are exactly the same as the ones in the
+ <structname>pg_stat_wal</structname> view.
+ </para>
+ <para>
+ The function does not return WAL statistics for the checkpointer,
+ the background writer, the startup process and the autovacuum launcher.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
index 4a667e7019c..cf3fac678c2 100644
--- a/src/backend/utils/activity/pgstat_backend.c
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -35,6 +35,14 @@
*/
static PgStat_BackendPending PendingBackendStats;
+/*
+ * WAL usage counters saved from pgWalUsage at the previous call to
+ * pgstat_report_wal(). This is used to calculate how much WAL usage
+ * happens between pgstat_report_wal() calls, by subtracting
+ * the previous counters from the current ones.
+ */
+static WalUsage prevBackendWalUsage;
+
/*
* Utility routines to report I/O stats for backends, kept here to avoid
* exposing PendingBackendStats to the outside world.
@@ -131,6 +139,57 @@ pgstat_flush_backend_entry_io(PgStat_EntryRef *entry_ref)
MemSet(&PendingBackendStats.pending_io, 0, sizeof(PgStat_PendingIO));
}
+/*
+ * Check whether any WAL usage happened since the last flush.
+ */
+static bool
+pgstat_backend_wal_have_pending(void)
+{
+ return pgWalUsage.wal_records != prevBackendWalUsage.wal_records;
+}
+
+/*
+ * Flush out locally pending backend WAL statistics. Locking is managed
+ * by the caller.
+ */
+static void
+pgstat_flush_backend_entry_wal(PgStat_EntryRef *entry_ref)
+{
+ PgStatShared_Backend *shbackendent;
+ PgStat_WalCounters *bktype_shstats;
+ WalUsage wal_usage_diff = {0};
+
+ /*
+ * This function can be called even if nothing at all has happened. Avoid
+ * taking lock for nothing in that case.
+ */
+ if (!pgstat_backend_wal_have_pending())
+ return;
+
+ shbackendent = (PgStatShared_Backend *) entry_ref->shared_stats;
+ bktype_shstats = &shbackendent->stats.wal_counters;
+
+ /*
+ * We don't update the WAL usage portion of the local WalStats elsewhere.
+ * Calculate how much WAL usage counters were increased by subtracting the
+ * previous counters from the current ones.
+ */
+ WalUsageAccumDiff(&wal_usage_diff, &pgWalUsage, &prevBackendWalUsage);
+
+#define WALSTAT_ACC(fld, var_to_add) \
+ (bktype_shstats->fld += var_to_add.fld)
+ WALSTAT_ACC(wal_buffers_full, wal_usage_diff);
+ WALSTAT_ACC(wal_records, wal_usage_diff);
+ WALSTAT_ACC(wal_fpi, wal_usage_diff);
+ WALSTAT_ACC(wal_bytes, wal_usage_diff);
+#undef WALSTAT_ACC
+
+ /*
+ * Save the current counters for the subsequent calculation of WAL usage.
+ */
+ prevBackendWalUsage = pgWalUsage;
+}
+
/*
* Flush out locally pending backend statistics
*
@@ -158,6 +217,9 @@ pgstat_flush_backend(bool nowait, bits32 flags)
if (flags & PGSTAT_BACKEND_FLUSH_IO)
pgstat_flush_backend_entry_io(entry_ref);
+ if (flags & PGSTAT_BACKEND_FLUSH_WAL)
+ pgstat_flush_backend_entry_wal(entry_ref);
+
pgstat_unlock_entry(entry_ref);
return false;
@@ -205,6 +267,8 @@ pgstat_create_backend(ProcNumber procnum)
pgstat_unlock_entry(entry_ref);
MemSet(&PendingBackendStats, 0, sizeof(PgStat_BackendPending));
+
+ prevBackendWalUsage = pgWalUsage;
}
/*
diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c
index 6d024872701..f268a610fb8 100644
--- a/src/backend/utils/activity/pgstat_wal.c
+++ b/src/backend/utils/activity/pgstat_wal.c
@@ -53,6 +53,8 @@ pgstat_report_wal(bool force)
/* flush wal stats */
pgstat_flush_wal(nowait);
+ pgstat_flush_backend(nowait, PGSTAT_BACKEND_FLUSH_WAL);
+
/* flush IO stats */
pgstat_flush_io(nowait);
}
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 9de14ffd449..6676245651c 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1634,8 +1634,8 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
/*
* pg_stat_wal_build_tuple
*
- * Helper routine for pg_stat_get_wal() returning one tuple based on the contents
- * of wal_counters.
+ * Helper routine for pg_stat_get_wal() and pg_stat_get_backend_wal() returning
+ * one tuple based on the contents of wal_counters.
*/
static Datum
pg_stat_wal_build_tuple(PgStat_WalCounters wal_counters,
@@ -1684,6 +1684,54 @@ pg_stat_wal_build_tuple(PgStat_WalCounters wal_counters,
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
}
+/*
+ * Returns WAL statistics for a backend with given PID.
+ */
+Datum
+pg_stat_get_backend_wal(PG_FUNCTION_ARGS)
+{
+ int pid;
+ PGPROC *proc;
+ ProcNumber procNumber;
+ PgStat_Backend *backend_stats;
+ PgStat_WalCounters bktype_stats;
+ PgBackendStatus *beentry;
+
+ pid = PG_GETARG_INT32(0);
+ proc = BackendPidGetProc(pid);
+
+ /*
+ * This could be an auxiliary process but these do not report backend
+ * statistics due to pgstat_tracks_backend_bktype(), so there is no need
+ * for an extra call to AuxiliaryPidGetProc().
+ */
+ if (!proc)
+ PG_RETURN_NULL();
+
+ procNumber = GetNumberFromPGProc(proc);
+
+ beentry = pgstat_get_beentry_by_proc_number(procNumber);
+ if (!beentry)
+ PG_RETURN_NULL();
+
+ backend_stats = pgstat_fetch_stat_backend(procNumber);
+ if (!backend_stats)
+ PG_RETURN_NULL();
+
+ /* if PID does not match, leave */
+ if (beentry->st_procpid != pid)
+ PG_RETURN_NULL();
+
+ /* backend may be gone, so recheck in case */
+ if (beentry->st_backendType == B_INVALID)
+ PG_RETURN_NULL();
+
+ bktype_stats = backend_stats->wal_counters;
+
+ /* save tuples with data from this PgStat_WalCounters */
+ return (pg_stat_wal_build_tuple(bktype_stats, backend_stats->stat_reset_timestamp));
+}
+
/*
* Returns statistics of WAL activity
*/
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 1e1626964e3..97108a1f4c4 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5954,6 +5954,13 @@
proargmodes => '{o,o,o,o,o}',
proargnames => '{wal_records,wal_fpi,wal_bytes,wal_buffers_full,stats_reset}',
prosrc => 'pg_stat_get_wal' },
+{ oid => '8037', descr => 'statistics: backend WAL activity',
+ proname => 'pg_stat_get_backend_wal', provolatile => 'v',
+ proparallel => 'r', prorettype => 'record', proargtypes => 'int4',
+ proallargtypes => '{int4,int8,int8,numeric,int8,timestamptz}',
+ proargmodes => '{i,o,o,o,o,o}',
+ proargnames => '{backend_pid,wal_records,wal_fpi,wal_bytes,wal_buffers_full,stats_reset}',
+ prosrc => 'pg_stat_get_backend_wal' },
{ oid => '6248', descr => 'statistics: information about WAL prefetching',
proname => 'pg_stat_get_recovery_prefetch', prorows => '1', proretset => 't',
provolatile => 'v', prorettype => 'record', proargtypes => '',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 459a7cb328e..6b282be4d5b 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -340,12 +340,6 @@ typedef struct PgStat_IO
PgStat_BktypeIO stats[BACKEND_NUM_TYPES];
} PgStat_IO;
-typedef struct PgStat_Backend
-{
- TimestampTz stat_reset_timestamp;
- PgStat_BktypeIO io_stats;
-} PgStat_Backend;
-
/* ---------
* PgStat_BackendPending Non-flushed backend stats.
* ---------
@@ -488,6 +482,13 @@ typedef struct PgStat_WalStats
TimestampTz stat_reset_timestamp;
} PgStat_WalStats;
+typedef struct PgStat_Backend
+{
+ TimestampTz stat_reset_timestamp;
+ PgStat_BktypeIO io_stats;
+ PgStat_WalCounters wal_counters;
+} PgStat_Backend;
+
/*
* Functions in pgstat.c
*/
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 06dcea3f0dc..2385839d83f 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -622,7 +622,8 @@ extern void pgstat_archiver_snapshot_cb(void);
/* flags for pgstat_flush_backend() */
#define PGSTAT_BACKEND_FLUSH_IO (1 << 0) /* Flush I/O statistics */
-#define PGSTAT_BACKEND_FLUSH_ALL (PGSTAT_BACKEND_FLUSH_IO)
+#define PGSTAT_BACKEND_FLUSH_WAL (1 << 1) /* Flush WAL statistics */
+#define PGSTAT_BACKEND_FLUSH_ALL (PGSTAT_BACKEND_FLUSH_IO | PGSTAT_BACKEND_FLUSH_WAL)
extern bool pgstat_flush_backend(bool nowait, bits32 flags);
extern bool pgstat_backend_flush_cb(bool nowait);
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 7d91f047bb3..ec056cd00cf 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -832,6 +832,8 @@ SELECT sessions > :db_stat_sessions FROM pg_stat_database WHERE datname = (SELEC
SELECT num_requested AS rqst_ckpts_before FROM pg_stat_checkpointer \gset
-- Test pg_stat_wal (and make a temp table so our temp schema exists)
SELECT wal_bytes AS wal_bytes_before FROM pg_stat_wal \gset
+-- Test pg_stat_get_backend_wal (and make a temp table so our temp schema exists)
+SELECT wal_bytes AS backend_wal_bytes_before from pg_stat_get_backend_wal(pg_backend_pid()) \gset
CREATE TEMP TABLE test_stats_temp AS SELECT 17;
DROP TABLE test_stats_temp;
-- Checkpoint twice: The checkpointer reports stats after reporting completion
@@ -851,6 +853,18 @@ SELECT wal_bytes > :wal_bytes_before FROM pg_stat_wal;
t
(1 row)
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush
+--------------------------
+
+(1 row)
+
+SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid());
+ ?column?
+----------
+ t
+(1 row)
+
-- Test pg_stat_get_backend_idset() and some allied functions.
-- In particular, verify that their notion of backend ID matches
-- our temp schema index.
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 11628ebc8a1..92e2ec46a6f 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -423,6 +423,9 @@ SELECT num_requested AS rqst_ckpts_before FROM pg_stat_checkpointer \gset
-- Test pg_stat_wal (and make a temp table so our temp schema exists)
SELECT wal_bytes AS wal_bytes_before FROM pg_stat_wal \gset
+-- Test pg_stat_get_backend_wal (and make a temp table so our temp schema exists)
+SELECT wal_bytes AS backend_wal_bytes_before from pg_stat_get_backend_wal(pg_backend_pid()) \gset
+
CREATE TEMP TABLE test_stats_temp AS SELECT 17;
DROP TABLE test_stats_temp;
@@ -435,6 +438,9 @@ CHECKPOINT;
SELECT num_requested > :rqst_ckpts_before FROM pg_stat_checkpointer;
SELECT wal_bytes > :wal_bytes_before FROM pg_stat_wal;
+SELECT pg_stat_force_next_flush();
+SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid());
+
-- Test pg_stat_get_backend_idset() and some allied functions.
-- In particular, verify that their notion of backend ID matches
-- our temp schema index.
--
2.34.1
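Why the test calls pg_stat_force_next_flush() before the comparison: as I understand it, a backend accumulates its statistics locally and only publishes them to shared memory when pending stats are flushed, and that flush is normally rate-limited. The toy model below (every name is hypothetical; this is not PostgreSQL code) shows how a reader can miss activity that has not been flushed yet, and why forcing the flush makes the check deterministic.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model (hypothetical, not PostgreSQL code): WAL activity first lands
 * in a backend-local "pending" counter and only becomes visible to readers
 * once it is flushed into the "shared" counter.
 */
typedef struct ToyBackendStats
{
	uint64_t	pending_wal_bytes;	/* local, invisible to readers */
	uint64_t	shared_wal_bytes;	/* what a stats function would report */
} ToyBackendStats;

/* Generating WAL only bumps the local, pending counter. */
static void
toy_generate_wal(ToyBackendStats *st, uint64_t nbytes)
{
	st->pending_wal_bytes += nbytes;
}

/* A flush publishes everything pending and clears the local counter. */
static void
toy_flush(ToyBackendStats *st)
{
	st->shared_wal_bytes += st->pending_wal_bytes;
	st->pending_wal_bytes = 0;
}

/* Readers only ever look at the published value. */
static uint64_t
toy_read_wal_bytes(const ToyBackendStats *st)
{
	return st->shared_wal_bytes;
}
```

Reading before the flush still returns the old value, which is the kind of flakiness the forced flush is meant to rule out.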
On Mon, Feb 17, 2025 at 03:14:59PM +0000, Bertrand Drouvot wrote:
PFA the whole picture. 0001 is implementing the fields removal in pg_stat_wal
(and also PendingWalStats). I think that's ok given the backend's type for which
pgstat_tracks_io_bktype() returns false. But now you make me doubt about 0001.
Double-checking the code now and my doubts are wrong.
I think that I would vote for a removal of the fields from pg_stat_wal
rather than a replacement in pg_stat_wal, for the following reasons:
- pg_stat_wal.wal_write is the same value as "select sum(writes)
from pg_stat_io where object = 'wal' and context = 'normal'" as these
are incremented in XLogWrite().
- Same argument about pg_stat_wal.wal_write_time with
pg_stat_io.write_time.
- issue_xlog_fsync() tells that pg_stat_wal.wal_sync_time and
sum(pg_stat_io.fsync_time) under object=wal and context=normal are the
same values.
- Same argument with the fsync counters pg_stat_wal.wal_sync and
pg_stat_io.fsyncs.
- Encourage monitoring tools to pull from pg_stat_io instead, where there is much
more context and granularity of the stats data.
Regarding the GUC track_wal_io_timing, my take is that we'll live
better if we just let it go. It loses its meaning once pg_stat_wal
does not track the write and sync timings.
Anyway, it's probably better to move the 0001 discussion to a dedicated thread,
thoughts?
Yes. And we cannot really move forward with what we have here without
deciding about this part. The simplifications I can read from
v7-0002~v7-0004 are really nice. These make the implementation of WAL
stats at backend-level really simpler to think about.
The doc additions of v7-0001 about the description of what the 'wal'
object does in pg_stat_io are actually worth a change of their own?
We already track them in pg_stat_io.
--
Michael
Hi,
On Tue, Feb 18, 2025 at 08:34:32AM +0900, Michael Paquier wrote:
On Mon, Feb 17, 2025 at 03:14:59PM +0000, Bertrand Drouvot wrote:
PFA the whole picture. 0001 is implementing the fields removal in pg_stat_wal
(and also PendingWalStats). I think that's ok given the backend's type for which
pgstat_tracks_io_bktype() returns false. But now you make me doubt about 0001.
Double-checking the code now and my doubts are wrong.
Thanks for double checking.
I think that I would vote for a removal of the fields from pg_stat_wal
rather than a replacement in pg_stat_wal, for the following reasons:
- pg_stat_wal.wal_write is the same value as "select sum(writes)
from pg_stat_io where object = 'wal' and context = 'normal'" as these
are incremented in XLogWrite().
- Same argument about pg_stat_wal.wal_write_time with
pg_stat_io.write_time.
- issue_xlog_fsync() tells that pg_stat_wal.wal_sync_time and
sum(pg_stat_io.fsync_time) under object=wal and context=normal are the
same values.
- Same argument with the fsync counters pg_stat_wal.wal_sync and
pg_stat_io.fsyncs.
- Encourage monitoring tools to pull from pg_stat_io instead, where there is much
more context and granularity of the stats data.
Agree with all of the above + pgstat_tracks_io_bktype() returns false for backends
that do not generate WAL (so that we don't lose WAL information in pg_stat_io).
Regarding the GUC track_wal_io_timing, my take is that we'll live
better if we just let it go. It loses its meaning once pg_stat_wal
does not track the write and sync timings.
Yeah, done that way in the dedicated thread ([1]).
Anyway, it's probably better to move the 0001 discussion to a dedicated thread,
thoughts?
Yes. And we cannot really move forward with what we have here without
deciding about this part. The simplifications I can read from
v7-0002~v7-0004 are really nice. These make the implementation of WAL
stats at backend-level really simpler to think about.
yup.
The doc additions of v7-0001 about the description of what the 'wal'
object does in pg_stat_io are actually worth a change of their own?
We already track them in pg_stat_io.
Agree, done that way in the dedicated thread ([1]).
[1]: /messages/by-id/Z7RkQ0EfYaqqjgz/@ip-10-97-1-34.eu-west-3.compute.internal
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Tue, Feb 18, 2025 at 10:45:29AM +0000, Bertrand Drouvot wrote:
Agree, done that way in the dedicated thread ([1]).
[1]: /messages/by-id/Z7RkQ0EfYaqqjgz/@ip-10-97-1-34.eu-west-3.compute.internal
Thanks for splitting this part into its own thread.
--
Michael
Hi,
On Wed, Feb 19, 2025 at 07:28:49AM +0900, Michael Paquier wrote:
On Tue, Feb 18, 2025 at 10:45:29AM +0000, Bertrand Drouvot wrote:
Agree, done that way in the dedicated thread ([1]).
[1]: /messages/by-id/Z7RkQ0EfYaqqjgz/@ip-10-97-1-34.eu-west-3.compute.internal
Thanks for splitting this part into its own thread.
Now that 2421e9a51d2 is in, let's resume working in this thread. PFA a rebase to
make the CF bot happy. Nothing has changed since v7; v8 only removes "v7-0001"
(as part of 2421e9a51d2), so v8-000N is nothing but v7-000(N+1).
Regards,
--
Bertrand Drouvot
Attachments:
v8-0003-per-backend-WAL-statistics.patch
From e4cebfe49be300d37ddf54dd58a44c73c642f4ba Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 6 Jan 2025 10:00:00 +0000
Subject: [PATCH v8 3/3] per backend WAL statistics
Now that commit 9aea73fc61 added backend-level statistics to pgstats (and
per backend IO statistics) we can more easily add per backend statistics.
This commit adds per backend WAL statistics using the same layer as pg_stat_wal,
except that it is now possible to know how much WAL activity is happening in each
backend rather than an overall aggregate of all the activity. A function called
pg_stat_get_backend_wal() is added to access this data depending on the
PID of a backend.
The same limitation as in 9aea73fc61 persists, meaning that Auxiliary processes
are not included in this set of statistics.
XXX: bump catalog version
---
doc/src/sgml/monitoring.sgml | 19 ++++++
src/backend/utils/activity/pgstat_backend.c | 64 +++++++++++++++++++++
src/backend/utils/activity/pgstat_wal.c | 2 +
src/backend/utils/adt/pgstatfuncs.c | 52 ++++++++++++++++-
src/include/catalog/pg_proc.dat | 7 +++
src/include/pgstat.h | 13 +++--
src/include/utils/pgstat_internal.h | 3 +-
src/test/regress/expected/stats.out | 14 +++++
src/test/regress/sql/stats.sql | 6 ++
9 files changed, 171 insertions(+), 9 deletions(-)
14.0% doc/src/sgml/
34.5% src/backend/utils/activity/
25.9% src/backend/utils/adt/
7.7% src/include/catalog/
3.9% src/include/utils/
7.4% src/test/regress/expected/
5.6% src/test/regress/sql/
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3dfd059b7ee..6b5c9b23c3a 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4856,6 +4856,25 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para></entry>
</row>
+ <row>
+ <entry id="pg-stat-get-backend-wal" role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_stat_get_backend_wal</primary>
+ </indexterm>
+ <function>pg_stat_get_backend_wal</function> ( <type>integer</type> )
+ <returnvalue>record</returnvalue>
+ </para>
+ <para>
+ Returns WAL statistics about the backend with the specified
+ process ID. The output fields are exactly the same as the ones in the
+ <structname>pg_stat_wal</structname> view.
+ </para>
+ <para>
+ The function does not return WAL statistics for the checkpointer,
+ the background writer, the startup process and the autovacuum launcher.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
index 4a667e7019c..cf3fac678c2 100644
--- a/src/backend/utils/activity/pgstat_backend.c
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -35,6 +35,14 @@
*/
static PgStat_BackendPending PendingBackendStats;
+/*
+ * WAL usage counters saved from pgWalUsage at the previous call to
+ * pgstat_report_wal(). This is used to calculate how much WAL usage
+ * happens between pgstat_report_wal() calls, by subtracting
+ * the previous counters from the current ones.
+ */
+static WalUsage prevBackendWalUsage;
+
/*
* Utility routines to report I/O stats for backends, kept here to avoid
* exposing PendingBackendStats to the outside world.
@@ -131,6 +139,57 @@ pgstat_flush_backend_entry_io(PgStat_EntryRef *entry_ref)
MemSet(&PendingBackendStats.pending_io, 0, sizeof(PgStat_PendingIO));
}
+/*
+ * To determine whether WAL usage happened.
+ */
+static bool
+pgstat_backend_wal_have_pending(void)
+{
+ return pgWalUsage.wal_records != prevBackendWalUsage.wal_records;
+}
+
+/*
+ * Flush out locally pending backend WAL statistics. Locking is managed
+ * by the caller.
+ */
+static void
+pgstat_flush_backend_entry_wal(PgStat_EntryRef *entry_ref)
+{
+ PgStatShared_Backend *shbackendent;
+ PgStat_WalCounters *bktype_shstats;
+ WalUsage wal_usage_diff = {0};
+
+ /*
+ * This function can be called even if nothing at all has happened. Avoid
+ * taking lock for nothing in that case.
+ */
+ if (!pgstat_backend_wal_have_pending())
+ return;
+
+ shbackendent = (PgStatShared_Backend *) entry_ref->shared_stats;
+ bktype_shstats = &shbackendent->stats.wal_counters;
+
+ /*
+ * We don't update the WAL usage portion of the local WalStats elsewhere.
+ * Calculate how much WAL usage counters were increased by subtracting the
+ * previous counters from the current ones.
+ */
+ WalUsageAccumDiff(&wal_usage_diff, &pgWalUsage, &prevBackendWalUsage);
+
+#define WALSTAT_ACC(fld, var_to_add) \
+ (bktype_shstats->fld += var_to_add.fld)
+ WALSTAT_ACC(wal_buffers_full, wal_usage_diff);
+ WALSTAT_ACC(wal_records, wal_usage_diff);
+ WALSTAT_ACC(wal_fpi, wal_usage_diff);
+ WALSTAT_ACC(wal_bytes, wal_usage_diff);
+#undef WALSTAT_ACC
+
+ /*
+ * Save the current counters for the subsequent calculation of WAL usage.
+ */
+ prevBackendWalUsage = pgWalUsage;
+}
+
/*
* Flush out locally pending backend statistics
*
@@ -158,6 +217,9 @@ pgstat_flush_backend(bool nowait, bits32 flags)
if (flags & PGSTAT_BACKEND_FLUSH_IO)
pgstat_flush_backend_entry_io(entry_ref);
+ if (flags & PGSTAT_BACKEND_FLUSH_WAL)
+ pgstat_flush_backend_entry_wal(entry_ref);
+
pgstat_unlock_entry(entry_ref);
return false;
@@ -205,6 +267,8 @@ pgstat_create_backend(ProcNumber procnum)
pgstat_unlock_entry(entry_ref);
MemSet(&PendingBackendStats, 0, sizeof(PgStat_BackendPending));
+
+ prevBackendWalUsage = pgWalUsage;
}
/*
diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c
index 6d024872701..f268a610fb8 100644
--- a/src/backend/utils/activity/pgstat_wal.c
+++ b/src/backend/utils/activity/pgstat_wal.c
@@ -53,6 +53,8 @@ pgstat_report_wal(bool force)
/* flush wal stats */
pgstat_flush_wal(nowait);
+ pgstat_flush_backend(nowait, PGSTAT_BACKEND_FLUSH_WAL);
+
/* flush IO stats */
pgstat_flush_io(nowait);
}
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 9de14ffd449..6676245651c 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1634,8 +1634,8 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
/*
* pg_stat_wal_build_tuple
*
- * Helper routine for pg_stat_get_wal() returning one tuple based on the contents
- * of wal_counters.
+ * Helper routine for pg_stat_get_wal() and pg_stat_get_backend_wal() returning
+ * one tuple based on the contents of wal_counters.
*/
static Datum
pg_stat_wal_build_tuple(PgStat_WalCounters wal_counters,
@@ -1684,6 +1684,54 @@ pg_stat_wal_build_tuple(PgStat_WalCounters wal_counters,
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
}
+/*
+ * Returns WAL statistics for a backend with given PID.
+ */
+Datum
+pg_stat_get_backend_wal(PG_FUNCTION_ARGS)
+{
+ int pid;
+ PGPROC *proc;
+ ProcNumber procNumber;
+ PgStat_Backend *backend_stats;
+ PgStat_WalCounters bktype_stats;
+ PgBackendStatus *beentry;
+
+ pid = PG_GETARG_INT32(0);
+ proc = BackendPidGetProc(pid);
+
+ /*
+ * This could be an auxiliary process but these do not report backend
+ * statistics due to pgstat_tracks_backend_bktype(), so there is no need
+ * for an extra call to AuxiliaryPidGetProc().
+ */
+ if (!proc)
+ PG_RETURN_NULL();
+
+ procNumber = GetNumberFromPGProc(proc);
+
+ beentry = pgstat_get_beentry_by_proc_number(procNumber);
+ if (!beentry)
+ PG_RETURN_NULL();
+
+ backend_stats = pgstat_fetch_stat_backend(procNumber);
+ if (!backend_stats)
+ PG_RETURN_NULL();
+
+ /* if PID does not match, leave */
+ if (beentry->st_procpid != pid)
+ PG_RETURN_NULL();
+
+ /* backend may be gone, so recheck in case */
+ if (beentry->st_backendType == B_INVALID)
+ PG_RETURN_NULL();
+
+ bktype_stats = backend_stats->wal_counters;
+
+ /* save tuples with data from this PgStat_WalCounters */
+ return (pg_stat_wal_build_tuple(bktype_stats, backend_stats->stat_reset_timestamp));
+}
+
/*
* Returns statistics of WAL activity
*/
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index af9546de23d..399fccb1afb 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5954,6 +5954,13 @@
proargmodes => '{o,o,o,o,o}',
proargnames => '{wal_records,wal_fpi,wal_bytes,wal_buffers_full,stats_reset}',
prosrc => 'pg_stat_get_wal' },
+{ oid => '8037', descr => 'statistics: backend WAL activity',
+ proname => 'pg_stat_get_backend_wal', provolatile => 'v',
+ proparallel => 'r', prorettype => 'record', proargtypes => 'int4',
+ proallargtypes => '{int4,int8,int8,numeric,int8,timestamptz}',
+ proargmodes => '{i,o,o,o,o,o}',
+ proargnames => '{backend_pid,wal_records,wal_fpi,wal_bytes,wal_buffers_full,stats_reset}',
+ prosrc => 'pg_stat_get_backend_wal' },
{ oid => '6248', descr => 'statistics: information about WAL prefetching',
proname => 'pg_stat_get_recovery_prefetch', prorows => '1', proretset => 't',
provolatile => 'v', prorettype => 'record', proargtypes => '',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 438736870b3..8a706ddabab 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -340,12 +340,6 @@ typedef struct PgStat_IO
PgStat_BktypeIO stats[BACKEND_NUM_TYPES];
} PgStat_IO;
-typedef struct PgStat_Backend
-{
- TimestampTz stat_reset_timestamp;
- PgStat_BktypeIO io_stats;
-} PgStat_Backend;
-
/* ---------
* PgStat_BackendPending Non-flushed backend stats.
* ---------
@@ -488,6 +482,13 @@ typedef struct PgStat_WalStats
TimestampTz stat_reset_timestamp;
} PgStat_WalStats;
+typedef struct PgStat_Backend
+{
+ TimestampTz stat_reset_timestamp;
+ PgStat_BktypeIO io_stats;
+ PgStat_WalCounters wal_counters;
+} PgStat_Backend;
+
/*
* Functions in pgstat.c
*/
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 06dcea3f0dc..2385839d83f 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -622,7 +622,8 @@ extern void pgstat_archiver_snapshot_cb(void);
/* flags for pgstat_flush_backend() */
#define PGSTAT_BACKEND_FLUSH_IO (1 << 0) /* Flush I/O statistics */
-#define PGSTAT_BACKEND_FLUSH_ALL (PGSTAT_BACKEND_FLUSH_IO)
+#define PGSTAT_BACKEND_FLUSH_WAL (1 << 1) /* Flush WAL statistics */
+#define PGSTAT_BACKEND_FLUSH_ALL (PGSTAT_BACKEND_FLUSH_IO | PGSTAT_BACKEND_FLUSH_WAL)
extern bool pgstat_flush_backend(bool nowait, bits32 flags);
extern bool pgstat_backend_flush_cb(bool nowait);
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 093e6368dbb..b3c303c98cb 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -832,6 +832,8 @@ SELECT sessions > :db_stat_sessions FROM pg_stat_database WHERE datname = (SELEC
SELECT num_requested AS rqst_ckpts_before FROM pg_stat_checkpointer \gset
-- Test pg_stat_wal (and make a temp table so our temp schema exists)
SELECT wal_bytes AS wal_bytes_before FROM pg_stat_wal \gset
+-- Test pg_stat_get_backend_wal (and make a temp table so our temp schema exists)
+SELECT wal_bytes AS backend_wal_bytes_before from pg_stat_get_backend_wal(pg_backend_pid()) \gset
CREATE TEMP TABLE test_stats_temp AS SELECT 17;
DROP TABLE test_stats_temp;
-- Checkpoint twice: The checkpointer reports stats after reporting completion
@@ -851,6 +853,18 @@ SELECT wal_bytes > :wal_bytes_before FROM pg_stat_wal;
t
(1 row)
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush
+--------------------------
+
+(1 row)
+
+SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid());
+ ?column?
+----------
+ t
+(1 row)
+
-- Test pg_stat_get_backend_idset() and some allied functions.
-- In particular, verify that their notion of backend ID matches
-- our temp schema index.
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 0a44e14d9f4..ad3f7b7e66a 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -423,6 +423,9 @@ SELECT num_requested AS rqst_ckpts_before FROM pg_stat_checkpointer \gset
-- Test pg_stat_wal (and make a temp table so our temp schema exists)
SELECT wal_bytes AS wal_bytes_before FROM pg_stat_wal \gset
+-- Test pg_stat_get_backend_wal (and make a temp table so our temp schema exists)
+SELECT wal_bytes AS backend_wal_bytes_before from pg_stat_get_backend_wal(pg_backend_pid()) \gset
+
CREATE TEMP TABLE test_stats_temp AS SELECT 17;
DROP TABLE test_stats_temp;
@@ -435,6 +438,9 @@ CHECKPOINT;
SELECT num_requested > :rqst_ckpts_before FROM pg_stat_checkpointer;
SELECT wal_bytes > :wal_bytes_before FROM pg_stat_wal;
+SELECT pg_stat_force_next_flush();
+SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid());
+
-- Test pg_stat_get_backend_idset() and some allied functions.
-- In particular, verify that their notion of backend ID matches
-- our temp schema index.
--
2.34.1
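pgstat_flush_backend_entry_wal() in the patch above never resets pgWalUsage. Instead it keeps a private snapshot (prevBackendWalUsage), adds only the difference since that snapshot to the shared backend entry, and then refreshes the snapshot; pgstat_create_backend() seeds the snapshot so that earlier activity is not attributed to the new entry. The standalone sketch below models that pattern with simplified, hypothetical types and names; it computes the difference inline rather than through WalUsageAccumDiff().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for WalUsage (hypothetical type, same field names). */
typedef struct ToyWalUsage
{
	int64_t		wal_records;
	int64_t		wal_fpi;
	uint64_t	wal_bytes;
	int64_t		wal_buffers_full;
} ToyWalUsage;

static ToyWalUsage toyWalUsage;		/* running totals, like pgWalUsage */
static ToyWalUsage toyPrevWalUsage; /* snapshot, like prevBackendWalUsage */
static ToyWalUsage toySharedCounters;	/* the shared backend entry */

/* Entry creation: start from zero and snapshot the current totals. */
static void
toy_create_backend(void)
{
	toySharedCounters = (ToyWalUsage) {0};
	toyPrevWalUsage = toyWalUsage;
}

/* Same test as pgstat_backend_wal_have_pending(): any new WAL record? */
static bool
toy_wal_have_pending(void)
{
	return toyWalUsage.wal_records != toyPrevWalUsage.wal_records;
}

/* Add (current - snapshot) to the shared counters, then re-snapshot. */
static void
toy_flush_wal(void)
{
	if (!toy_wal_have_pending())
		return;					/* nothing happened: skip the work */

	toySharedCounters.wal_records += toyWalUsage.wal_records - toyPrevWalUsage.wal_records;
	toySharedCounters.wal_fpi += toyWalUsage.wal_fpi - toyPrevWalUsage.wal_fpi;
	toySharedCounters.wal_bytes += toyWalUsage.wal_bytes - toyPrevWalUsage.wal_bytes;
	toySharedCounters.wal_buffers_full +=
		toyWalUsage.wal_buffers_full - toyPrevWalUsage.wal_buffers_full;

	toyPrevWalUsage = toyWalUsage;
}
```

Because the snapshot is refreshed after every flush, calling the flush twice without new activity cannot double-count anything.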
v8-0002-Adding-a-new-PgStat_WalCounters-struct.patch
From a791f83c453e5e94d3508cde526b6d992512dc14 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Thu, 16 Jan 2025 15:06:01 +0000
Subject: [PATCH v8 2/3] Adding a new PgStat_WalCounters struct
This new struct contains only the counters related to the WAL statistics.
This will be used in a follow-up commit that uses the same structures but
for the PGSTAT_KIND_BACKEND statistics kind.
---
src/backend/utils/activity/pgstat_wal.c | 2 +-
src/backend/utils/adt/pgstatfuncs.c | 20 +++++++++++---------
src/include/pgstat.h | 7 ++++++-
src/tools/pgindent/typedefs.list | 1 +
4 files changed, 19 insertions(+), 11 deletions(-)
8.5% src/backend/utils/activity/
82.0% src/backend/utils/adt/
7.8% src/include/
diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c
index 4dc41a4a590..6d024872701 100644
--- a/src/backend/utils/activity/pgstat_wal.c
+++ b/src/backend/utils/activity/pgstat_wal.c
@@ -115,7 +115,7 @@ pgstat_wal_flush_cb(bool nowait)
return true;
#define WALSTAT_ACC(fld, var_to_add) \
- (stats_shmem->stats.fld += var_to_add.fld)
+ (stats_shmem->stats.wal_counters.fld += var_to_add.fld)
WALSTAT_ACC(wal_records, wal_usage_diff);
WALSTAT_ACC(wal_fpi, wal_usage_diff);
WALSTAT_ACC(wal_bytes, wal_usage_diff);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 620d60a0938..9de14ffd449 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1635,10 +1635,11 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
* pg_stat_wal_build_tuple
*
* Helper routine for pg_stat_get_wal() returning one tuple based on the contents
- * of wal_stats.
+ * of wal_counters.
*/
static Datum
-pg_stat_wal_build_tuple(PgStat_WalStats wal_stats)
+pg_stat_wal_build_tuple(PgStat_WalCounters wal_counters,
+ TimestampTz stat_reset_timestamp)
{
#define PG_STAT_WAL_COLS 5
TupleDesc tupdesc;
@@ -1662,20 +1663,20 @@ pg_stat_wal_build_tuple(PgStat_WalStats wal_stats)
BlessTupleDesc(tupdesc);
/* Fill values and NULLs */
- values[0] = Int64GetDatum(wal_stats.wal_records);
- values[1] = Int64GetDatum(wal_stats.wal_fpi);
+ values[0] = Int64GetDatum(wal_counters.wal_records);
+ values[1] = Int64GetDatum(wal_counters.wal_fpi);
/* Convert to numeric. */
- snprintf(buf, sizeof buf, UINT64_FORMAT, wal_stats.wal_bytes);
+ snprintf(buf, sizeof buf, UINT64_FORMAT, wal_counters.wal_bytes);
values[2] = DirectFunctionCall3(numeric_in,
CStringGetDatum(buf),
ObjectIdGetDatum(0),
Int32GetDatum(-1));
- values[3] = Int64GetDatum(wal_stats.wal_buffers_full);
+ values[3] = Int64GetDatum(wal_counters.wal_buffers_full);
- if (wal_stats.stat_reset_timestamp != 0)
- values[4] = TimestampTzGetDatum(wal_stats.stat_reset_timestamp);
+ if (stat_reset_timestamp != 0)
+ values[4] = TimestampTzGetDatum(stat_reset_timestamp);
else
nulls[4] = true;
@@ -1694,7 +1695,8 @@ pg_stat_get_wal(PG_FUNCTION_ARGS)
/* Get statistics about WAL activity */
wal_stats = pgstat_fetch_stat_wal();
- return (pg_stat_wal_build_tuple(*wal_stats));
+ return (pg_stat_wal_build_tuple(wal_stats->wal_counters,
+ wal_stats->stat_reset_timestamp));
}
/*
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index fc651d03cf9..438736870b3 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -474,12 +474,17 @@ typedef struct PgStat_StatTabEntry
PgStat_Counter total_autoanalyze_time;
} PgStat_StatTabEntry;
-typedef struct PgStat_WalStats
+typedef struct PgStat_WalCounters
{
PgStat_Counter wal_records;
PgStat_Counter wal_fpi;
uint64 wal_bytes;
PgStat_Counter wal_buffers_full;
+} PgStat_WalCounters;
+
+typedef struct PgStat_WalStats
+{
+ PgStat_WalCounters wal_counters;
TimestampTz stat_reset_timestamp;
} PgStat_WalStats;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e3e09a2207e..19d510c9ec3 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2188,6 +2188,7 @@ PgStat_SubXactStatus
PgStat_TableCounts
PgStat_TableStatus
PgStat_TableXactStatus
+PgStat_WalCounters
PgStat_WalStats
PgXmlErrorContext
PgXmlStrictness
--
2.34.1
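Taken together, v8-0001 and v8-0002 give pg_stat_wal_build_tuple() roughly the shape sketched below: counters that carry no timestamp, a reset timestamp passed separately (so that PgStat_Backend can keep a single one for both IO and WAL stats), and a zero timestamp reported as NULL. The types and the function here are simplified, hypothetical stand-ins, not the real PostgreSQL definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t ToyTimestampTz;	/* 0 means "never reset" here */

/* Counters only, no timestamp, so several stats kinds can embed them. */
typedef struct ToyWalCounters
{
	int64_t		wal_records;
	int64_t		wal_fpi;
	uint64_t	wal_bytes;
	int64_t		wal_buffers_full;
} ToyWalCounters;

/* Cluster-wide WAL stats: counters plus their own reset time. */
typedef struct ToyWalStats
{
	ToyWalCounters wal_counters;
	ToyTimestampTz stat_reset_timestamp;
} ToyWalStats;

/* Per-backend stats: one reset time shared by IO and WAL (IO omitted). */
typedef struct ToyBackendStats
{
	ToyTimestampTz stat_reset_timestamp;
	ToyWalCounters wal_counters;
} ToyBackendStats;

/* One output row; the timestamp column may be NULL. */
typedef struct ToyWalRow
{
	ToyWalCounters counters;
	ToyTimestampTz stats_reset;
	bool		stats_reset_isnull;
} ToyWalRow;

/* Same inputs as pg_stat_wal_build_tuple() after v8-0002. */
static ToyWalRow
toy_wal_build_row(ToyWalCounters counters, ToyTimestampTz stat_reset_timestamp)
{
	ToyWalRow	row = {0};

	row.counters = counters;
	if (stat_reset_timestamp != 0)
		row.stats_reset = stat_reset_timestamp;
	else
		row.stats_reset_isnull = true;	/* never reset: report NULL */
	return row;
}
```

Both callers then differ only in where they fetch the counters and the timestamp from.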
v8-0001-Extract-logic-filling-pg_stat_get_wal-s-tuple-int.patch
From a4a1dce54b8542c74e39117e77b2f1d005341501 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 6 Jan 2025 07:51:27 +0000
Subject: [PATCH v8 1/3] Extract logic filling pg_stat_get_wal()'s tuple into
its own routine
This commit adds pg_stat_wal_build_tuple(), a helper routine for
pg_stat_get_wal(), that fills its tuple based on the contents
of PgStat_WalStats. This will be used in a follow-up commit that uses
the same structures as pg_stat_wal for reporting, but for the PGSTAT_KIND_BACKEND
statistics kind.
---
src/backend/utils/adt/pgstatfuncs.c | 48 +++++++++++++++++++----------
1 file changed, 32 insertions(+), 16 deletions(-)
100.0% src/backend/utils/adt/
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 68e16e52ab6..620d60a0938 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1632,20 +1632,22 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
}
/*
- * Returns statistics of WAL activity
+ * pg_stat_wal_build_tuple
+ *
+ * Helper routine for pg_stat_get_wal() returning one tuple based on the contents
+ * of wal_stats.
*/
-Datum
-pg_stat_get_wal(PG_FUNCTION_ARGS)
+static Datum
+pg_stat_wal_build_tuple(PgStat_WalStats wal_stats)
{
-#define PG_STAT_GET_WAL_COLS 5
+#define PG_STAT_WAL_COLS 5
TupleDesc tupdesc;
- Datum values[PG_STAT_GET_WAL_COLS] = {0};
- bool nulls[PG_STAT_GET_WAL_COLS] = {0};
+ Datum values[PG_STAT_WAL_COLS] = {0};
+ bool nulls[PG_STAT_WAL_COLS] = {0};
char buf[256];
- PgStat_WalStats *wal_stats;
/* Initialise attributes information in the tuple descriptor */
- tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_WAL_COLS);
+ tupdesc = CreateTemplateTupleDesc(PG_STAT_WAL_COLS);
TupleDescInitEntry(tupdesc, (AttrNumber) 1, "wal_records",
INT8OID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 2, "wal_fpi",
@@ -1659,28 +1661,42 @@ pg_stat_get_wal(PG_FUNCTION_ARGS)
BlessTupleDesc(tupdesc);
- /* Get statistics about WAL activity */
- wal_stats = pgstat_fetch_stat_wal();
-
/* Fill values and NULLs */
- values[0] = Int64GetDatum(wal_stats->wal_records);
- values[1] = Int64GetDatum(wal_stats->wal_fpi);
+ values[0] = Int64GetDatum(wal_stats.wal_records);
+ values[1] = Int64GetDatum(wal_stats.wal_fpi);
/* Convert to numeric. */
- snprintf(buf, sizeof buf, UINT64_FORMAT, wal_stats->wal_bytes);
+ snprintf(buf, sizeof buf, UINT64_FORMAT, wal_stats.wal_bytes);
values[2] = DirectFunctionCall3(numeric_in,
CStringGetDatum(buf),
ObjectIdGetDatum(0),
Int32GetDatum(-1));
- values[3] = Int64GetDatum(wal_stats->wal_buffers_full);
+ values[3] = Int64GetDatum(wal_stats.wal_buffers_full);
- values[4] = TimestampTzGetDatum(wal_stats->stat_reset_timestamp);
+ if (wal_stats.stat_reset_timestamp != 0)
+ values[4] = TimestampTzGetDatum(wal_stats.stat_reset_timestamp);
+ else
+ nulls[4] = true;
/* Returns the record as Datum */
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
}
+/*
+ * Returns statistics of WAL activity
+ */
+Datum
+pg_stat_get_wal(PG_FUNCTION_ARGS)
+{
+ PgStat_WalStats *wal_stats;
+
+ /* Get statistics about WAL activity */
+ wal_stats = pgstat_fetch_stat_wal();
+
+ return (pg_stat_wal_build_tuple(*wal_stats));
+}
+
/*
* Returns statistics of SLRU caches.
*/
--
2.34.1
On Mon, Feb 24, 2025 at 09:07:39AM +0000, Bertrand Drouvot wrote:
Now that 2421e9a51d2 is in, let's resume working in this thread. PFA a rebase to
make the CF bot happy. Nothing has changed since v7; v8 only removes "v7-0001"
(as part of 2421e9a51d2), so v8-000N is nothing but v7-000(N+1).
v7-0001 looks sensible, so does v7-0002 with the introduction of
PgStat_WalCounters to tackle the fact that backend statistics need
only one reset_timestamp shared across IO and WAL stats.
+/*
+ * To determine whether WAL usage happened.
+ */
+static bool
+pgstat_backend_wal_have_pending(void)
+{
+ return pgWalUsage.wal_records != prevBackendWalUsage.wal_records;
+}
Okay for this pending data check.
--- a/src/backend/utils/activity/pgstat_wal.c
+++ b/src/backend/utils/activity/pgstat_wal.c
@@ -53,6 +53,8 @@ pgstat_report_wal(bool force)
/* flush wal stats */
pgstat_flush_wal(nowait);
+ pgstat_flush_backend(nowait, PGSTAT_BACKEND_FLUSH_WAL);
+
/* flush IO stats */
pgstat_flush_io(nowait);
Fine to stick that into pgstat_report_wal(), which is what is used everywhere
else. pgstat_flush_wal() could be static in pgstat_wal.c?
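For reference, the flag handling involved here is a plain bitmask: pgstat_report_wal() requests only PGSTAT_BACKEND_FLUSH_WAL, and PGSTAT_BACKEND_FLUSH_ALL combines both bits. The sketch below reuses the macro values from the patch's pgstat_internal.h hunk; the bits32 typedef and the toy_* routines are hypothetical stand-ins for the real flush functions.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t bits32;		/* stand-in for PostgreSQL's bits32 */

/* Flag values as defined in the patch's pgstat_internal.h hunk. */
#define PGSTAT_BACKEND_FLUSH_IO		(1 << 0)	/* Flush I/O statistics */
#define PGSTAT_BACKEND_FLUSH_WAL	(1 << 1)	/* Flush WAL statistics */
#define PGSTAT_BACKEND_FLUSH_ALL	(PGSTAT_BACKEND_FLUSH_IO | PGSTAT_BACKEND_FLUSH_WAL)

/* Hypothetical per-kind flush routines that just count their calls. */
static int	toy_io_flushes;
static int	toy_wal_flushes;

static void
toy_flush_entry_io(void)
{
	toy_io_flushes++;
}

static void
toy_flush_entry_wal(void)
{
	toy_wal_flushes++;
}

/* Each requested bit triggers its own sub-flush, independently. */
static void
toy_flush_backend(bits32 flags)
{
	if (flags & PGSTAT_BACKEND_FLUSH_IO)
		toy_flush_entry_io();
	if (flags & PGSTAT_BACKEND_FLUSH_WAL)
		toy_flush_entry_wal();
}
```

Keeping one bit per stats sub-kind means a caller such as pgstat_report_wal() can flush the WAL part without touching the pending IO data.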
+Datum
+pg_stat_get_backend_wal(PG_FUNCTION_ARGS)
+{
[...]
+ pid = PG_GETARG_INT32(0);
+ proc = BackendPidGetProc(pid);
+
+ /*
+ * This could be an auxiliary process but these do not report backend
+ * statistics due to pgstat_tracks_backend_bktype(), so there is no need
+ * for an extra call to AuxiliaryPidGetProc().
+ */
+ if (!proc)
+ PG_RETURN_NULL();
+
+ procNumber = GetNumberFromPGProc(proc);
+
+ beentry = pgstat_get_beentry_by_proc_number(procNumber);
+ if (!beentry)
+ PG_RETURN_NULL();
+
+ backend_stats = pgstat_fetch_stat_backend(procNumber);
+ if (!backend_stats)
+ PG_RETURN_NULL();
+
+ /* if PID does not match, leave */
+ if (beentry->st_procpid != pid)
+ PG_RETURN_NULL();
+
+ /* backend may be gone, so recheck in case */
+ if (beentry->st_backendType == B_INVALID)
+ PG_RETURN_NULL();
This is a block of code copy-pasted from pg_stat_get_backend_io().
This is complex, so perhaps it would be better to refactor that in a
single routine that returns PgStat_Backend? Then reuse the refactored
code in both pg_stat_get_backend_io() and the new
pg_stat_get_backend_wal().
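One possible shape for that refactoring, sketched with hypothetical simplified types: a single helper runs the PID-to-stats validation chain once and returns NULL on any failure, so that both SQL-callable functions can just test the result and PG_RETURN_NULL(). The helper name, the slot tables and the lookup below are assumptions for illustration, not the actual 0004 patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the real structures. */
typedef enum ToyBackendType
{
	TOY_B_INVALID = 0,
	TOY_B_BACKEND
} ToyBackendType;

typedef struct ToyBeEntry
{
	int			st_procpid;
	ToyBackendType st_backendType;
} ToyBeEntry;

typedef struct ToyBackendStats
{
	long		wal_records;
} ToyBackendStats;

#define TOY_MAX_PROCS 4

/* Slot-indexed tables, playing the role of ProcNumber-indexed state. */
static int	toy_proc_pids[TOY_MAX_PROCS];	/* 0 = free slot */
static ToyBeEntry toy_beentries[TOY_MAX_PROCS];
static ToyBackendStats *toy_stats[TOY_MAX_PROCS];	/* NULL = no entry */

/* Stand-in for BackendPidGetProc() + GetNumberFromPGProc(); -1 if absent. */
static int
toy_proc_number_from_pid(int pid)
{
	for (int i = 0; i < TOY_MAX_PROCS; i++)
		if (pid != 0 && toy_proc_pids[i] == pid)
			return i;
	return -1;
}

/* Hypothetical shared helper: validate once, return stats or NULL. */
static ToyBackendStats *
toy_fetch_backend_stats_by_pid(int pid)
{
	int			procNumber = toy_proc_number_from_pid(pid);
	const ToyBeEntry *beentry;
	ToyBackendStats *stats;

	if (procNumber < 0)
		return NULL;			/* no such regular backend */

	beentry = &toy_beentries[procNumber];
	stats = toy_stats[procNumber];
	if (stats == NULL)
		return NULL;			/* no backend stats entry */

	if (beentry->st_procpid != pid)
		return NULL;			/* slot now belongs to another process */

	if (beentry->st_backendType == TOY_B_INVALID)
		return NULL;			/* backend is gone */

	return stats;
}
```

Each caller would then keep only its own tuple-building step, which is where pg_stat_get_backend_io() and pg_stat_get_backend_wal() actually differ.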
+-- Test pg_stat_get_backend_wal (and make a temp table so our temp schema exists)
+SELECT wal_bytes AS backend_wal_bytes_before from pg_stat_get_backend_wal(pg_backend_pid()) \gset
+
CREATE TEMP TABLE test_stats_temp AS SELECT 17;
DROP TABLE test_stats_temp;
[...]
+SELECT pg_stat_force_next_flush();
+SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid());
That should be stable, as we're guaranteed to have records here.
--
Michael
Hi,
On Tue, Feb 25, 2025 at 03:50:38PM +0900, Michael Paquier wrote:
On Mon, Feb 24, 2025 at 09:07:39AM +0000, Bertrand Drouvot wrote:
Now that 2421e9a51d2 is in, let's resume working in this thread. PFA a rebase to
make the CF bot happy. Nothing has changed since v7; v8 only removes "v7-0001"
(as part of 2421e9a51d2), so v8-000N is nothing but v7-000(N+1).
v7-0001 looks sensible, so does v7-0002 with the introduction of
PgStat_WalCounters to tackle the fact that backend statistics need
only one reset_timestamp shared across IO and WAL stats.
Thanks for looking at it! (I guess you meant to say v8-0001 and v8-0002).
--- a/src/backend/utils/activity/pgstat_wal.c
+++ b/src/backend/utils/activity/pgstat_wal.c
@@ -53,6 +53,8 @@ pgstat_report_wal(bool force)
/* flush wal stats */
pgstat_flush_wal(nowait);
+ pgstat_flush_backend(nowait, PGSTAT_BACKEND_FLUSH_WAL);
+
/* flush IO stats */
pgstat_flush_io(nowait);
Fine to stick that into pgstat_report_wal(), which is what is used everywhere
else. pgstat_flush_wal() could be static in pgstat_wal.c?
hmm right. Not linked to this patch though, so done in a dedicated patch
in passing (v9-0001).
+Datum
+pg_stat_get_backend_wal(PG_FUNCTION_ARGS)
+{
[...]
+ pid = PG_GETARG_INT32(0);
+ proc = BackendPidGetProc(pid);
+
+ /*
+ * This could be an auxiliary process but these do not report backend
+ * statistics due to pgstat_tracks_backend_bktype(), so there is no need
+ * for an extra call to AuxiliaryPidGetProc().
+ */
+ if (!proc)
+ PG_RETURN_NULL();
+
+ procNumber = GetNumberFromPGProc(proc);
+
+ beentry = pgstat_get_beentry_by_proc_number(procNumber);
+ if (!beentry)
+ PG_RETURN_NULL();
+
+ backend_stats = pgstat_fetch_stat_backend(procNumber);
+ if (!backend_stats)
+ PG_RETURN_NULL();
+
+ /* if PID does not match, leave */
+ if (beentry->st_procpid != pid)
+ PG_RETURN_NULL();
+
+ /* backend may be gone, so recheck in case */
+ if (beentry->st_backendType == B_INVALID)
+ PG_RETURN_NULL();
This is a block of code copy-pasted from pg_stat_get_backend_io().
This is complex, so perhaps it would be better to refactor that in a
single routine that returns PgStat_Backend? Then reuse the refactored
code in both pg_stat_get_backend_io() and the new
pg_stat_get_backend_wal().
That makes full sense. Done in 0004 attached. Somewhat related to that, I have
a patch in progress to address some of Rahila's comments ([1]) (the one related
to the AuxiliaryPidGetProc() call is especially relevant since a051e71e28a, where
pgstat_tracks_backend_bktype() has been modified for B_WAL_RECEIVER, B_WAL_SUMMARIZER
and B_WAL_WRITER). I'll wait for 0004 to go in before sharing the patch.
+-- Test pg_stat_get_backend_wal (and make a temp table so our temp schema exists)
+SELECT wal_bytes AS backend_wal_bytes_before from pg_stat_get_backend_wal(pg_backend_pid()) \gset
+
CREATE TEMP TABLE test_stats_temp AS SELECT 17;
DROP TABLE test_stats_temp;
[...]
+SELECT pg_stat_force_next_flush();
+SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid());
That should be stable, as we're guaranteed to have records here.
Yup.
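The test relies on the explicit flush: a backend keeps its counters locally and publishes them to shared memory later, so reading its own entry right after generating WAL may still show the old value. A minimal C model of that visibility rule, assuming a single pending counter and a flush that only happens when forced (the real code also flushes on its own schedule); all names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

typedef struct
{
	unsigned long wal_bytes;
} WalCounters;

/* What a reader such as pg_stat_get_backend_wal() would see. */
static WalCounters shared_entry;
/* Counters accumulated by the backend but not yet published. */
static WalCounters local_pending;
/* Hypothetical stand-in for the effect of pg_stat_force_next_flush(). */
static bool force_next_flush;

static void
generate_wal(unsigned long bytes)
{
	local_pending.wal_bytes += bytes;
}

/* A "report" point; in this simplified model it publishes only when forced. */
static void
report_stats(void)
{
	if (!force_next_flush)
		return;
	shared_entry.wal_bytes += local_pending.wal_bytes;
	local_pending.wal_bytes = 0;
	force_next_flush = false;
}

static unsigned long
read_wal_bytes(void)
{
	return shared_entry.wal_bytes;
}
```

Because the before/after comparison only needs "strictly greater", it stays stable as long as the CREATE/DROP of the temp table generates at least one WAL record and the flush has happened.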
[1]: /messages/by-id/CAH2L28v9BwN8_y0k6FQ591=0g2Hj_esHLGj3bP38c9nmVykoiA@mail.gmail.com
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v9-0001-make-pgstat_flush_wal-static.patch
From ee2fb0f389161c8f706923aa58e227ab6a2c623f Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Tue, 25 Feb 2025 08:22:50 +0000
Subject: [PATCH v9 1/5] make pgstat_flush_wal() static
pgstat_flush_wal() is used only in pgstat_wal.c, so make it static.
---
src/backend/utils/activity/pgstat_wal.c | 3 ++-
src/include/utils/pgstat_internal.h | 2 --
2 files changed, 2 insertions(+), 3 deletions(-)
57.6% src/backend/utils/activity/
42.3% src/include/utils/
diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c
index 4dc41a4a590..b702891ed46 100644
--- a/src/backend/utils/activity/pgstat_wal.c
+++ b/src/backend/utils/activity/pgstat_wal.c
@@ -29,6 +29,7 @@
*/
static WalUsage prevWalUsage;
+static void pgstat_flush_wal(bool nowait);
/*
* Calculate how much WAL usage counters have increased and update
@@ -72,7 +73,7 @@ pgstat_fetch_stat_wal(void)
/*
* Simple wrapper of pgstat_wal_flush_cb()
*/
-void
+static void
pgstat_flush_wal(bool nowait)
{
(void) pgstat_wal_flush_cb(nowait);
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 06dcea3f0dc..36d228e3558 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -748,8 +748,6 @@ extern void pgstat_slru_snapshot_cb(void);
* Functions in pgstat_wal.c
*/
-extern void pgstat_flush_wal(bool nowait);
-
extern void pgstat_wal_init_backend_cb(void);
extern bool pgstat_wal_have_pending_cb(void);
extern bool pgstat_wal_flush_cb(bool nowait);
--
2.34.1
v9-0002-Extract-logic-filling-pg_stat_get_wal-s-tuple-int.patch
From f2c3f83228e0cb1909cdd43f732ecaa377e0ebeb Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 6 Jan 2025 07:51:27 +0000
Subject: [PATCH v9 2/5] Extract logic filling pg_stat_get_wal()'s tuple into
its own routine
This commit adds pg_stat_wal_build_tuple(), a helper routine for
pg_stat_get_wal(), that fills its tuple based on the contents
of PgStat_WalStats. This will be used in a follow-up commit that uses
the same structures as pg_stat_wal for reporting, but for the PGSTAT_KIND_BACKEND
statistics kind.
---
src/backend/utils/adt/pgstatfuncs.c | 48 +++++++++++++++++++----------
1 file changed, 32 insertions(+), 16 deletions(-)
100.0% src/backend/utils/adt/
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 68e16e52ab6..620d60a0938 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1632,20 +1632,22 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
}
/*
- * Returns statistics of WAL activity
+ * pg_stat_wal_build_tuple
+ *
+ * Helper routine for pg_stat_get_wal() returning one tuple based on the contents
+ * of wal_stats.
*/
-Datum
-pg_stat_get_wal(PG_FUNCTION_ARGS)
+static Datum
+pg_stat_wal_build_tuple(PgStat_WalStats wal_stats)
{
-#define PG_STAT_GET_WAL_COLS 5
+#define PG_STAT_WAL_COLS 5
TupleDesc tupdesc;
- Datum values[PG_STAT_GET_WAL_COLS] = {0};
- bool nulls[PG_STAT_GET_WAL_COLS] = {0};
+ Datum values[PG_STAT_WAL_COLS] = {0};
+ bool nulls[PG_STAT_WAL_COLS] = {0};
char buf[256];
- PgStat_WalStats *wal_stats;
/* Initialise attributes information in the tuple descriptor */
- tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_WAL_COLS);
+ tupdesc = CreateTemplateTupleDesc(PG_STAT_WAL_COLS);
TupleDescInitEntry(tupdesc, (AttrNumber) 1, "wal_records",
INT8OID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 2, "wal_fpi",
@@ -1659,28 +1661,42 @@ pg_stat_get_wal(PG_FUNCTION_ARGS)
BlessTupleDesc(tupdesc);
- /* Get statistics about WAL activity */
- wal_stats = pgstat_fetch_stat_wal();
-
/* Fill values and NULLs */
- values[0] = Int64GetDatum(wal_stats->wal_records);
- values[1] = Int64GetDatum(wal_stats->wal_fpi);
+ values[0] = Int64GetDatum(wal_stats.wal_records);
+ values[1] = Int64GetDatum(wal_stats.wal_fpi);
/* Convert to numeric. */
- snprintf(buf, sizeof buf, UINT64_FORMAT, wal_stats->wal_bytes);
+ snprintf(buf, sizeof buf, UINT64_FORMAT, wal_stats.wal_bytes);
values[2] = DirectFunctionCall3(numeric_in,
CStringGetDatum(buf),
ObjectIdGetDatum(0),
Int32GetDatum(-1));
- values[3] = Int64GetDatum(wal_stats->wal_buffers_full);
+ values[3] = Int64GetDatum(wal_stats.wal_buffers_full);
- values[4] = TimestampTzGetDatum(wal_stats->stat_reset_timestamp);
+ if (wal_stats.stat_reset_timestamp != 0)
+ values[4] = TimestampTzGetDatum(wal_stats.stat_reset_timestamp);
+ else
+ nulls[4] = true;
/* Returns the record as Datum */
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
}
+/*
+ * Returns statistics of WAL activity
+ */
+Datum
+pg_stat_get_wal(PG_FUNCTION_ARGS)
+{
+ PgStat_WalStats *wal_stats;
+
+ /* Get statistics about WAL activity */
+ wal_stats = pgstat_fetch_stat_wal();
+
+ return (pg_stat_wal_build_tuple(*wal_stats));
+}
+
/*
* Returns statistics of SLRU caches.
*/
--
2.34.1
v9-0003-Adding-a-new-PgStat_WalCounters-struct.patch
From 5a9a7f138128b8b36c566ab5c44eb3a37c357f88 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Thu, 16 Jan 2025 15:06:01 +0000
Subject: [PATCH v9 3/5] Adding a new PgStat_WalCounters struct
This new struct contains only the counters related to the WAL statistics.
This will be used in a follow-up commit that uses the same structures but
for the PGSTAT_KIND_BACKEND statistics kind.
---
src/backend/utils/activity/pgstat_wal.c | 2 +-
src/backend/utils/adt/pgstatfuncs.c | 20 +++++++++++---------
src/include/pgstat.h | 7 ++++++-
src/tools/pgindent/typedefs.list | 1 +
4 files changed, 19 insertions(+), 11 deletions(-)
8.5% src/backend/utils/activity/
82.0% src/backend/utils/adt/
7.8% src/include/
diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c
index b702891ed46..830b51a7b93 100644
--- a/src/backend/utils/activity/pgstat_wal.c
+++ b/src/backend/utils/activity/pgstat_wal.c
@@ -116,7 +116,7 @@ pgstat_wal_flush_cb(bool nowait)
return true;
#define WALSTAT_ACC(fld, var_to_add) \
- (stats_shmem->stats.fld += var_to_add.fld)
+ (stats_shmem->stats.wal_counters.fld += var_to_add.fld)
WALSTAT_ACC(wal_records, wal_usage_diff);
WALSTAT_ACC(wal_fpi, wal_usage_diff);
WALSTAT_ACC(wal_bytes, wal_usage_diff);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 620d60a0938..9de14ffd449 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1635,10 +1635,11 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
* pg_stat_wal_build_tuple
*
* Helper routine for pg_stat_get_wal() returning one tuple based on the contents
- * of wal_stats.
+ * of wal_counters.
*/
static Datum
-pg_stat_wal_build_tuple(PgStat_WalStats wal_stats)
+pg_stat_wal_build_tuple(PgStat_WalCounters wal_counters,
+ TimestampTz stat_reset_timestamp)
{
#define PG_STAT_WAL_COLS 5
TupleDesc tupdesc;
@@ -1662,20 +1663,20 @@ pg_stat_wal_build_tuple(PgStat_WalStats wal_stats)
BlessTupleDesc(tupdesc);
/* Fill values and NULLs */
- values[0] = Int64GetDatum(wal_stats.wal_records);
- values[1] = Int64GetDatum(wal_stats.wal_fpi);
+ values[0] = Int64GetDatum(wal_counters.wal_records);
+ values[1] = Int64GetDatum(wal_counters.wal_fpi);
/* Convert to numeric. */
- snprintf(buf, sizeof buf, UINT64_FORMAT, wal_stats.wal_bytes);
+ snprintf(buf, sizeof buf, UINT64_FORMAT, wal_counters.wal_bytes);
values[2] = DirectFunctionCall3(numeric_in,
CStringGetDatum(buf),
ObjectIdGetDatum(0),
Int32GetDatum(-1));
- values[3] = Int64GetDatum(wal_stats.wal_buffers_full);
+ values[3] = Int64GetDatum(wal_counters.wal_buffers_full);
- if (wal_stats.stat_reset_timestamp != 0)
- values[4] = TimestampTzGetDatum(wal_stats.stat_reset_timestamp);
+ if (stat_reset_timestamp != 0)
+ values[4] = TimestampTzGetDatum(stat_reset_timestamp);
else
nulls[4] = true;
@@ -1694,7 +1695,8 @@ pg_stat_get_wal(PG_FUNCTION_ARGS)
/* Get statistics about WAL activity */
wal_stats = pgstat_fetch_stat_wal();
- return (pg_stat_wal_build_tuple(*wal_stats));
+ return (pg_stat_wal_build_tuple(wal_stats->wal_counters,
+ wal_stats->stat_reset_timestamp));
}
/*
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index fc651d03cf9..438736870b3 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -474,12 +474,17 @@ typedef struct PgStat_StatTabEntry
PgStat_Counter total_autoanalyze_time;
} PgStat_StatTabEntry;
-typedef struct PgStat_WalStats
+typedef struct PgStat_WalCounters
{
PgStat_Counter wal_records;
PgStat_Counter wal_fpi;
uint64 wal_bytes;
PgStat_Counter wal_buffers_full;
+} PgStat_WalCounters;
+
+typedef struct PgStat_WalStats
+{
+ PgStat_WalCounters wal_counters;
TimestampTz stat_reset_timestamp;
} PgStat_WalStats;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e3e09a2207e..19d510c9ec3 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2188,6 +2188,7 @@ PgStat_SubXactStatus
PgStat_TableCounts
PgStat_TableStatus
PgStat_TableXactStatus
+PgStat_WalCounters
PgStat_WalStats
PgXmlErrorContext
PgXmlStrictness
--
2.34.1
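With PgStat_WalCounters split out, the tuple builder no longer needs to know where the counters came from. A standalone C sketch of that shape, using hypothetical names throughout: WalCounters, WalStats and BackendStats mirror PgStat_WalCounters, PgStat_WalStats and the PgStat_Backend layout that 0005 introduces, and WalRow stands in for the heap tuple that pg_stat_wal_build_tuple() forms (the real function also converts wal_bytes to numeric).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Microseconds since 2000-01-01 UTC, as in PostgreSQL. */
typedef int64_t TimestampTz;

/* Counters only (mirrors PgStat_WalCounters). */
typedef struct
{
	int64_t		wal_records;
	int64_t		wal_fpi;
	uint64_t	wal_bytes;
	int64_t		wal_buffers_full;
} WalCounters;

/* Cluster-wide view with its own reset time (mirrors PgStat_WalStats). */
typedef struct
{
	WalCounters wal_counters;
	TimestampTz stat_reset_timestamp;
} WalStats;

/* Per-backend entry whose reset time covers all its stats. */
typedef struct
{
	TimestampTz stat_reset_timestamp;
	WalCounters wal_counters;
} BackendStats;

/* Hypothetical flattened output row in place of the SQL tuple. */
typedef struct
{
	int64_t		wal_records;
	int64_t		wal_fpi;
	uint64_t	wal_bytes;
	int64_t		wal_buffers_full;
	TimestampTz stats_reset;
	bool		stats_reset_isnull;
} WalRow;

/* Shared builder: takes the counters and the reset time separately. */
static WalRow
build_wal_row(WalCounters c, TimestampTz reset)
{
	WalRow		row = {0};

	row.wal_records = c.wal_records;
	row.wal_fpi = c.wal_fpi;
	row.wal_bytes = c.wal_bytes;
	row.wal_buffers_full = c.wal_buffers_full;

	/* 0 means "never reset": report NULL rather than the epoch. */
	if (reset != 0)
		row.stats_reset = reset;
	else
		row.stats_reset_isnull = true;
	return row;
}

/* Caller for the cluster-wide view. */
static WalRow
get_wal(const WalStats *s)
{
	return build_wal_row(s->wal_counters, s->stat_reset_timestamp);
}

/* Caller for a single backend. */
static WalRow
get_backend_wal(const BackendStats *b)
{
	return build_wal_row(b->wal_counters, b->stat_reset_timestamp);
}
```

Passing the reset timestamp as its own argument is what lets a single builder serve both layouts, since the per-backend entry keeps one reset time for IO and WAL together.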
v9-0004-Add-the-pg_stat_get_backend_stats-helper-for-pg_s.patch
From ff26611c1f74cfc23b13e4607dcf7494cc807ae5 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Tue, 25 Feb 2025 09:03:55 +0000
Subject: [PATCH v9 4/5] Add the pg_stat_get_backend_stats() helper for
pg_stat_get_backend_io()
This commit adds pg_stat_get_backend_stats(), a helper routine for
pg_stat_get_backend_io(), that returns the backend stats based on a pid passed
as an argument.
This will be used in a follow-up commit that uses the same logic to return the
per backend WAL stats.
---
src/backend/utils/activity/pgstat_backend.c | 52 +++++++++++++++++++++
src/backend/utils/adt/pgstatfuncs.c | 30 +-----------
src/include/pgstat.h | 1 +
3 files changed, 54 insertions(+), 29 deletions(-)
58.1% src/backend/utils/activity/
38.0% src/backend/utils/adt/
3.7% src/include/
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
index 4a667e7019c..3284d97fa95 100644
--- a/src/backend/utils/activity/pgstat_backend.c
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -25,6 +25,8 @@
#include "postgres.h"
#include "storage/bufmgr.h"
+#include "storage/proc.h"
+#include "storage/procarray.h"
#include "utils/memutils.h"
#include "utils/pgstat_internal.h"
@@ -81,6 +83,56 @@ pgstat_fetch_stat_backend(ProcNumber procNumber)
return backend_entry;
}
+/*
+ * Returns statistics of a backend by pid.
+ *
+ * It adds extra checks as compared to pgstat_fetch_stat_backend() to ensure
+ * that the backend is not gone. Also, if not NULL, bktype is populated as
+ * pg_stat_get_backend_io() needs it.
+ */
+PgStat_Backend *
+pg_stat_get_backend_stats(int pid, BackendType *bktype)
+{
+
+ PGPROC *proc;
+ PgBackendStatus *beentry;
+ ProcNumber procNumber;
+ PgStat_Backend *backend_stats;
+
+ proc = BackendPidGetProc(pid);
+
+ /*
+ * This could be an auxiliary process but these do not report backend
+ * statistics due to pgstat_tracks_backend_bktype(), so there is no need
+ * for an extra call to AuxiliaryPidGetProc().
+ */
+ if (!proc)
+ return NULL;
+
+ procNumber = GetNumberFromPGProc(proc);
+
+ beentry = pgstat_get_beentry_by_proc_number(procNumber);
+ if (!beentry)
+ return NULL;
+
+ backend_stats = pgstat_fetch_stat_backend(procNumber);
+ if (!backend_stats)
+ return NULL;
+
+ /* if PID does not match, leave */
+ if (beentry->st_procpid != pid)
+ return NULL;
+
+ /* backend may be gone, so recheck in case */
+ if (beentry->st_backendType == B_INVALID)
+ return NULL;
+
+ if (bktype)
+ *bktype = beentry->st_backendType;
+
+ return backend_stats;
+}
+
/*
* Flush out locally pending backend IO statistics. Locking is managed
* by the caller.
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 9de14ffd449..13c91515480 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1576,46 +1576,18 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
ReturnSetInfo *rsinfo;
BackendType bktype;
int pid;
- PGPROC *proc;
- ProcNumber procNumber;
PgStat_Backend *backend_stats;
PgStat_BktypeIO *bktype_stats;
- PgBackendStatus *beentry;
InitMaterializedSRF(fcinfo, 0);
rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
pid = PG_GETARG_INT32(0);
- proc = BackendPidGetProc(pid);
-
- /*
- * This could be an auxiliary process but these do not report backend
- * statistics due to pgstat_tracks_backend_bktype(), so there is no need
- * for an extra call to AuxiliaryPidGetProc().
- */
- if (!proc)
- return (Datum) 0;
-
- procNumber = GetNumberFromPGProc(proc);
+ backend_stats = pg_stat_get_backend_stats(pid, &bktype);
- beentry = pgstat_get_beentry_by_proc_number(procNumber);
- if (!beentry)
- return (Datum) 0;
-
- backend_stats = pgstat_fetch_stat_backend(procNumber);
if (!backend_stats)
return (Datum) 0;
- bktype = beentry->st_backendType;
-
- /* if PID does not match, leave */
- if (beentry->st_procpid != pid)
- return (Datum) 0;
-
- /* backend may be gone, so recheck in case */
- if (bktype == B_INVALID)
- return (Datum) 0;
-
bktype_stats = &backend_stats->io_stats;
/*
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 438736870b3..3b71c3d4ed6 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -542,6 +542,7 @@ extern void pgstat_count_backend_io_op(IOObject io_object,
IOOp io_op, uint32 cnt,
uint64 bytes);
extern PgStat_Backend *pgstat_fetch_stat_backend(ProcNumber procNumber);
+extern PgStat_Backend *pg_stat_get_backend_stats(int pid, BackendType *bktype);
extern bool pgstat_tracks_backend_bktype(BackendType bktype);
extern void pgstat_create_backend(ProcNumber procnum);
--
2.34.1
v9-0005-per-backend-WAL-statistics.patch
From 84b2c89c0f1a719ddfe4cf87fc0f994e21667335 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 6 Jan 2025 10:00:00 +0000
Subject: [PATCH v9 5/5] per backend WAL statistics
Now that commit 9aea73fc61 added backend-level statistics to pgstats (and
per-backend IO statistics), we can more easily add per-backend statistics.
This commit adds per-backend WAL statistics using the same layer as pg_stat_wal,
except that it is now possible to know how much WAL activity is happening in each
backend rather than only an overall aggregate of all the activity. A function called
pg_stat_get_backend_wal() is added to access this data for a given backend
PID.
The same limitation as in 9aea73fc61 persists, meaning that auxiliary processes
are not included in this set of statistics.
XXX: bump catalog version
---
doc/src/sgml/monitoring.sgml | 19 ++++++
src/backend/utils/activity/pgstat_backend.c | 64 +++++++++++++++++++++
src/backend/utils/activity/pgstat_wal.c | 2 +
src/backend/utils/adt/pgstatfuncs.c | 26 ++++++++-
src/include/catalog/pg_proc.dat | 7 +++
src/include/pgstat.h | 13 +++--
src/include/utils/pgstat_internal.h | 3 +-
src/test/regress/expected/stats.out | 14 +++++
src/test/regress/sql/stats.sql | 6 ++
9 files changed, 145 insertions(+), 9 deletions(-)
15.9% doc/src/sgml/
39.4% src/backend/utils/activity/
15.5% src/backend/utils/adt/
8.8% src/include/catalog/
4.5% src/include/utils/
8.4% src/test/regress/expected/
6.4% src/test/regress/sql/
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3dfd059b7ee..6b5c9b23c3a 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4856,6 +4856,25 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para></entry>
</row>
+ <row>
+ <entry id="pg-stat-get-backend-wal" role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_stat_get_backend_wal</primary>
+ </indexterm>
+ <function>pg_stat_get_backend_wal</function> ( <type>integer</type> )
+ <returnvalue>record</returnvalue>
+ </para>
+ <para>
+ Returns WAL statistics about the backend with the specified
+ process ID. The output fields are exactly the same as the ones in the
+ <structname>pg_stat_wal</structname> view.
+ </para>
+ <para>
+ The function does not return WAL statistics for the checkpointer,
+ the background writer, the startup process and the autovacuum launcher.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
index 3284d97fa95..b49e8539324 100644
--- a/src/backend/utils/activity/pgstat_backend.c
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -37,6 +37,14 @@
*/
static PgStat_BackendPending PendingBackendStats;
+/*
+ * WAL usage counters saved from pgWalUsage at the previous call to
+ * pgstat_report_wal(). This is used to calculate how much WAL usage
+ * happens between pgstat_report_wal() calls, by subtracting
+ * the previous counters from the current ones.
+ */
+static WalUsage prevBackendWalUsage;
+
/*
* Utility routines to report I/O stats for backends, kept here to avoid
* exposing PendingBackendStats to the outside world.
@@ -183,6 +191,57 @@ pgstat_flush_backend_entry_io(PgStat_EntryRef *entry_ref)
MemSet(&PendingBackendStats.pending_io, 0, sizeof(PgStat_PendingIO));
}
+/*
+ * To determine whether WAL usage happened.
+ */
+static bool
+pgstat_backend_wal_have_pending(void)
+{
+ return pgWalUsage.wal_records != prevBackendWalUsage.wal_records;
+}
+
+/*
+ * Flush out locally pending backend WAL statistics. Locking is managed
+ * by the caller.
+ */
+static void
+pgstat_flush_backend_entry_wal(PgStat_EntryRef *entry_ref)
+{
+ PgStatShared_Backend *shbackendent;
+ PgStat_WalCounters *bktype_shstats;
+ WalUsage wal_usage_diff = {0};
+
+ /*
+ * This function can be called even if nothing at all has happened. Avoid
+ * taking lock for nothing in that case.
+ */
+ if (!pgstat_backend_wal_have_pending())
+ return;
+
+ shbackendent = (PgStatShared_Backend *) entry_ref->shared_stats;
+ bktype_shstats = &shbackendent->stats.wal_counters;
+
+ /*
+ * We don't update the WAL usage portion of the local WalStats elsewhere.
+ * Calculate how much WAL usage counters were increased by subtracting the
+ * previous counters from the current ones.
+ */
+ WalUsageAccumDiff(&wal_usage_diff, &pgWalUsage, &prevBackendWalUsage);
+
+#define WALSTAT_ACC(fld, var_to_add) \
+ (bktype_shstats->fld += var_to_add.fld)
+ WALSTAT_ACC(wal_buffers_full, wal_usage_diff);
+ WALSTAT_ACC(wal_records, wal_usage_diff);
+ WALSTAT_ACC(wal_fpi, wal_usage_diff);
+ WALSTAT_ACC(wal_bytes, wal_usage_diff);
+#undef WALSTAT_ACC
+
+ /*
+ * Save the current counters for the subsequent calculation of WAL usage.
+ */
+ prevBackendWalUsage = pgWalUsage;
+}
+
/*
* Flush out locally pending backend statistics
*
@@ -210,6 +269,9 @@ pgstat_flush_backend(bool nowait, bits32 flags)
if (flags & PGSTAT_BACKEND_FLUSH_IO)
pgstat_flush_backend_entry_io(entry_ref);
+ if (flags & PGSTAT_BACKEND_FLUSH_WAL)
+ pgstat_flush_backend_entry_wal(entry_ref);
+
pgstat_unlock_entry(entry_ref);
return false;
@@ -257,6 +319,8 @@ pgstat_create_backend(ProcNumber procnum)
pgstat_unlock_entry(entry_ref);
MemSet(&PendingBackendStats, 0, sizeof(PgStat_BackendPending));
+
+ prevBackendWalUsage = pgWalUsage;
}
/*
diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c
index 830b51a7b93..5051fa596b4 100644
--- a/src/backend/utils/activity/pgstat_wal.c
+++ b/src/backend/utils/activity/pgstat_wal.c
@@ -54,6 +54,8 @@ pgstat_report_wal(bool force)
/* flush wal stats */
pgstat_flush_wal(nowait);
+ pgstat_flush_backend(nowait, PGSTAT_BACKEND_FLUSH_WAL);
+
/* flush IO stats */
pgstat_flush_io(nowait);
}
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 13c91515480..4fca5b26fde 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1606,8 +1606,8 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
/*
* pg_stat_wal_build_tuple
*
- * Helper routine for pg_stat_get_wal() returning one tuple based on the contents
- * of wal_counters.
+ * Helper routine for pg_stat_get_wal() and pg_stat_get_backend_wal() returning
+ * one tuple based on the contents of wal_counters.
*/
static Datum
pg_stat_wal_build_tuple(PgStat_WalCounters wal_counters,
@@ -1656,6 +1656,28 @@ pg_stat_wal_build_tuple(PgStat_WalCounters wal_counters,
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
}
+/*
+ * Returns WAL statistics for a backend with given PID.
+ */
+Datum
+pg_stat_get_backend_wal(PG_FUNCTION_ARGS)
+{
+ int pid;
+ PgStat_Backend *backend_stats;
+ PgStat_WalCounters bktype_stats;
+
+ pid = PG_GETARG_INT32(0);
+ backend_stats = pg_stat_get_backend_stats(pid, NULL);
+
+ if (!backend_stats)
+ PG_RETURN_NULL();
+
+ bktype_stats = backend_stats->wal_counters;
+
+ /* save tuples with data from this PgStat_WalCounters */
+ return (pg_stat_wal_build_tuple(bktype_stats, backend_stats->stat_reset_timestamp));
+}
+
/*
* Returns statistics of WAL activity
*/
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index af9546de23d..399fccb1afb 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5954,6 +5954,13 @@
proargmodes => '{o,o,o,o,o}',
proargnames => '{wal_records,wal_fpi,wal_bytes,wal_buffers_full,stats_reset}',
prosrc => 'pg_stat_get_wal' },
+{ oid => '8037', descr => 'statistics: backend WAL activity',
+ proname => 'pg_stat_get_backend_wal', provolatile => 'v',
+ proparallel => 'r', prorettype => 'record', proargtypes => 'int4',
+ proallargtypes => '{int4,int8,int8,numeric,int8,timestamptz}',
+ proargmodes => '{i,o,o,o,o,o}',
+ proargnames => '{backend_pid,wal_records,wal_fpi,wal_bytes,wal_buffers_full,stats_reset}',
+ prosrc => 'pg_stat_get_backend_wal' },
{ oid => '6248', descr => 'statistics: information about WAL prefetching',
proname => 'pg_stat_get_recovery_prefetch', prorows => '1', proretset => 't',
provolatile => 'v', prorettype => 'record', proargtypes => '',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 3b71c3d4ed6..2665887b1e2 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -340,12 +340,6 @@ typedef struct PgStat_IO
PgStat_BktypeIO stats[BACKEND_NUM_TYPES];
} PgStat_IO;
-typedef struct PgStat_Backend
-{
- TimestampTz stat_reset_timestamp;
- PgStat_BktypeIO io_stats;
-} PgStat_Backend;
-
/* ---------
* PgStat_BackendPending Non-flushed backend stats.
* ---------
@@ -488,6 +482,13 @@ typedef struct PgStat_WalStats
TimestampTz stat_reset_timestamp;
} PgStat_WalStats;
+typedef struct PgStat_Backend
+{
+ TimestampTz stat_reset_timestamp;
+ PgStat_BktypeIO io_stats;
+ PgStat_WalCounters wal_counters;
+} PgStat_Backend;
+
/*
* Functions in pgstat.c
*/
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 36d228e3558..d5557e6e998 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -622,7 +622,8 @@ extern void pgstat_archiver_snapshot_cb(void);
/* flags for pgstat_flush_backend() */
#define PGSTAT_BACKEND_FLUSH_IO (1 << 0) /* Flush I/O statistics */
-#define PGSTAT_BACKEND_FLUSH_ALL (PGSTAT_BACKEND_FLUSH_IO)
+#define PGSTAT_BACKEND_FLUSH_WAL (1 << 1) /* Flush WAL statistics */
+#define PGSTAT_BACKEND_FLUSH_ALL (PGSTAT_BACKEND_FLUSH_IO | PGSTAT_BACKEND_FLUSH_WAL)
extern bool pgstat_flush_backend(bool nowait, bits32 flags);
extern bool pgstat_backend_flush_cb(bool nowait);
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 093e6368dbb..b3c303c98cb 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -832,6 +832,8 @@ SELECT sessions > :db_stat_sessions FROM pg_stat_database WHERE datname = (SELEC
SELECT num_requested AS rqst_ckpts_before FROM pg_stat_checkpointer \gset
-- Test pg_stat_wal (and make a temp table so our temp schema exists)
SELECT wal_bytes AS wal_bytes_before FROM pg_stat_wal \gset
+-- Test pg_stat_get_backend_wal (and make a temp table so our temp schema exists)
+SELECT wal_bytes AS backend_wal_bytes_before from pg_stat_get_backend_wal(pg_backend_pid()) \gset
CREATE TEMP TABLE test_stats_temp AS SELECT 17;
DROP TABLE test_stats_temp;
-- Checkpoint twice: The checkpointer reports stats after reporting completion
@@ -851,6 +853,18 @@ SELECT wal_bytes > :wal_bytes_before FROM pg_stat_wal;
t
(1 row)
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush
+--------------------------
+
+(1 row)
+
+SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid());
+ ?column?
+----------
+ t
+(1 row)
+
-- Test pg_stat_get_backend_idset() and some allied functions.
-- In particular, verify that their notion of backend ID matches
-- our temp schema index.
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 0a44e14d9f4..ad3f7b7e66a 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -423,6 +423,9 @@ SELECT num_requested AS rqst_ckpts_before FROM pg_stat_checkpointer \gset
-- Test pg_stat_wal (and make a temp table so our temp schema exists)
SELECT wal_bytes AS wal_bytes_before FROM pg_stat_wal \gset
+-- Test pg_stat_get_backend_wal (and make a temp table so our temp schema exists)
+SELECT wal_bytes AS backend_wal_bytes_before from pg_stat_get_backend_wal(pg_backend_pid()) \gset
+
CREATE TEMP TABLE test_stats_temp AS SELECT 17;
DROP TABLE test_stats_temp;
@@ -435,6 +438,9 @@ CHECKPOINT;
SELECT num_requested > :rqst_ckpts_before FROM pg_stat_checkpointer;
SELECT wal_bytes > :wal_bytes_before FROM pg_stat_wal;
+SELECT pg_stat_force_next_flush();
+SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid());
+
-- Test pg_stat_get_backend_idset() and some allied functions.
-- In particular, verify that their notion of backend ID matches
-- our temp schema index.
--
2.34.1
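The flush logic in v9-0005 is a delta scheme: pgWalUsage only grows, prevBackendWalUsage remembers its value at the last flush, and only the difference is added to the shared per-backend entry. A standalone C sketch of the same idea, assuming a plain struct in place of WalUsage, direct subtraction in place of WalUsageAccumDiff(), and hypothetical names everywhere; the two flag bits mirror PGSTAT_BACKEND_FLUSH_IO and PGSTAT_BACKEND_FLUSH_WAL:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the WalUsage fields the patch accumulates. */
typedef struct
{
	int64_t		wal_records;
	int64_t		wal_fpi;
	uint64_t	wal_bytes;
	int64_t		wal_buffers_full;
} WalUsage;

static WalUsage cur_usage;		/* like pgWalUsage: only ever increases */
static WalUsage prev_usage;		/* like prevBackendWalUsage */
static WalUsage shared_counters;	/* what readers of the entry see */

#define FLUSH_IO	(1u << 0)
#define FLUSH_WAL	(1u << 1)
#define FLUSH_ALL	(FLUSH_IO | FLUSH_WAL)

/* Start tracking: WAL generated before this point is not attributed. */
static void
create_backend_entry(void)
{
	prev_usage = cur_usage;
}

/* Same cheap test as the patch: any new record means pending WAL usage. */
static bool
wal_have_pending(void)
{
	return cur_usage.wal_records != prev_usage.wal_records;
}

static void
flush_wal(void)
{
	if (!wal_have_pending())
		return;

	/* Add only what happened since the previous flush. */
	shared_counters.wal_records += cur_usage.wal_records - prev_usage.wal_records;
	shared_counters.wal_fpi += cur_usage.wal_fpi - prev_usage.wal_fpi;
	shared_counters.wal_bytes += cur_usage.wal_bytes - prev_usage.wal_bytes;
	shared_counters.wal_buffers_full +=
		cur_usage.wal_buffers_full - prev_usage.wal_buffers_full;

	/* Remember the totals for the next delta. */
	prev_usage = cur_usage;
}

static void
flush_backend(unsigned int flags)
{
	/* FLUSH_IO handling is out of scope for this sketch. */
	if (flags & FLUSH_WAL)
		flush_wal();
}
```

Taking the snapshot when the entry is created (as pgstat_create_backend() does with prevBackendWalUsage) keeps earlier WAL out of the per-backend totals, and the wal_records comparison skips the accumulation when no new records were generated.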
On Tue, Feb 25, 2025 at 03:00:35PM +0000, Bertrand Drouvot wrote:
That makes complete sense. Done in the attached 0004. Somewhat related to that, I have
a patch in progress to address some of Rahila's comments ([1]) (the one related
to the AuxiliaryPidGetProc() call is especially relevant since a051e71e28a, where
pgstat_tracks_backend_bktype() was modified for B_WAL_RECEIVER, B_WAL_SUMMARIZER
and B_WAL_WRITER). I'll wait for 0004 to go in before sharing the patch.
Applied v9-0001 and v9-0003 as these were fine, with more
documentation added in pgstat.h for the new WAL structure and the
reason why it exists. I've noticed the difference with bktype in
v9-0004, as the WAL part does not need this information when generating
its tuple; OK here.
Doing v9-0003 after v9-0002 felt a bit odd, as it changes the
signature of pg_stat_wal_build_tuple() twice to adapt to the split of the
reset timestamp.
- values[4] = TimestampTzGetDatum(wal_stats->stat_reset_timestamp);
+ if (wal_stats.stat_reset_timestamp != 0)
+ values[4] = TimestampTzGetDatum(wal_stats.stat_reset_timestamp);
+ else
+ nulls[4] = true;
In patch v9-0002, is this nulls[4] required for the backend part?
--
Michael
Hi,
On Wed, Feb 26, 2025 at 04:52:13PM +0900, Michael Paquier wrote:
On Tue, Feb 25, 2025 at 03:00:35PM +0000, Bertrand Drouvot wrote:
That makes complete sense. Done in the attached 0004. Somewhat related to that, I have
a patch in progress to address some of Rahila's comments ([1]) (the one related
to the AuxiliaryPidGetProc() call is especially relevant since a051e71e28a, where
pgstat_tracks_backend_bktype() was modified for B_WAL_RECEIVER, B_WAL_SUMMARIZER
and B_WAL_WRITER). I'll wait for 0004 to go in before sharing the patch.
Applied v9-0001
I see that you removed pgstat_flush_wal() in d7cbeaf261d (instead of what 0001
was doing, i.e. making it static). Makes sense to me.
and v9-0003 as these were fine,
Thanks.
with more
documentation added in pgstat.h for the new WAL structure, and the
reason why it exists.
Saw that, looks good.
I've noticed the difference with bktype in
v9-0004, as the WAL part does not need this information when generating
its tuple; OK here.
Thx.
Doing v9-0003 after v9-0002 felt a bit odd, as it changes the
signature of pg_stat_wal_build_tuple() twice to adapt to the split of the
reset timestamp.
PFA a rebase.
- values[4] = TimestampTzGetDatum(wal_stats->stat_reset_timestamp);
+ if (wal_stats.stat_reset_timestamp != 0)
+ values[4] = TimestampTzGetDatum(wal_stats.stat_reset_timestamp);
+ else
+ nulls[4] = true;
In patch v9-0002, is this nulls[4] required for the backend part?
Yup. That's what we've done in pg_stat_io_build_tuples() too (ff7c40d7fd6).
Without this, a reset timestamp that was never set (i.e. 0) would be displayed
as "2000-01-01 00:00:00+00" in the stats_reset field of
pg_stat_get_backend_wal() and pg_stat_get_backend_io().
That was not needed for pg_stat_io and pg_stat_wal because their stats_reset
field is already non-NULL after initdb.
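To make the "2000-01-01 00:00:00+00" value concrete: TimestampTz counts microseconds from 2000-01-01 00:00:00 UTC, so a reset timestamp left at 0 is displayed as that epoch unless the column is explicitly set to NULL. A small standalone C sketch, assuming UTC output (the real display depends on the session TimeZone) and a hypothetical formatting helper, shows both behaviours:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Microseconds since 2000-01-01 00:00:00 UTC, as in PostgreSQL. */
typedef int64_t TimestampTz;

/* Seconds from the Unix epoch (1970-01-01) to 2000-01-01, both UTC. */
#define PG_EPOCH_UNIX_SECS INT64_C(946684800)

/*
 * Hypothetical helper: format ts as UTC text, or write "NULL" when ts is 0
 * and zero_is_null is set (the equivalent of the nulls[4] branch).
 */
static void
format_stats_reset(TimestampTz ts, bool zero_is_null, char *buf, size_t len)
{
	time_t		unix_secs;
	struct tm  *tm_utc;

	if (ts == 0 && zero_is_null)
	{
		snprintf(buf, len, "NULL");
		return;
	}

	/* Shift from the PostgreSQL epoch to the Unix epoch. */
	unix_secs = (time_t) (ts / 1000000 + PG_EPOCH_UNIX_SECS);
	tm_utc = gmtime(&unix_secs);
	strftime(buf, len, "%Y-%m-%d %H:%M:%S+00", tm_utc);
}
```

With zero_is_null unset, a value of 0 comes out as the PostgreSQL epoch, which is the misleading stats_reset that the NULL branch avoids for the per-backend functions.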
Regards,
--
Bertrand Drouvot
Attachments:
v10-0001-Extract-logic-filling-pg_stat_get_wal-s-tuple-in.patch
From ba78796701d3d4228342465e5d08429398ebdc9e Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 6 Jan 2025 07:51:27 +0000
Subject: [PATCH v10 1/3] Extract logic filling pg_stat_get_wal()'s tuple into
its own routine
This commit adds pg_stat_wal_build_tuple(), a helper routine for
pg_stat_get_wal(), that fills its tuple based on the contents
of PgStat_WalStats. This will be used in a follow-up commit that uses
the same structures as pg_stat_wal for reporting, but for the PGSTAT_KIND_BACKEND
statistics kind.
---
src/backend/utils/adt/pgstatfuncs.c | 44 ++++++++++++++++++++---------
1 file changed, 30 insertions(+), 14 deletions(-)
100.0% src/backend/utils/adt/
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 0ea41299e07..9de14ffd449 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1632,21 +1632,23 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
}
/*
- * Returns statistics of WAL activity
+ * pg_stat_wal_build_tuple
+ *
+ * Helper routine for pg_stat_get_wal() returning one tuple based on the contents
+ * of wal_counters.
*/
-Datum
-pg_stat_get_wal(PG_FUNCTION_ARGS)
+static Datum
+pg_stat_wal_build_tuple(PgStat_WalCounters wal_counters,
+ TimestampTz stat_reset_timestamp)
{
-#define PG_STAT_GET_WAL_COLS 5
+#define PG_STAT_WAL_COLS 5
TupleDesc tupdesc;
- Datum values[PG_STAT_GET_WAL_COLS] = {0};
- bool nulls[PG_STAT_GET_WAL_COLS] = {0};
+ Datum values[PG_STAT_WAL_COLS] = {0};
+ bool nulls[PG_STAT_WAL_COLS] = {0};
char buf[256];
- PgStat_WalStats *wal_stats;
- PgStat_WalCounters wal_counters;
/* Initialise attributes information in the tuple descriptor */
- tupdesc = CreateTemplateTupleDesc(PG_STAT_GET_WAL_COLS);
+ tupdesc = CreateTemplateTupleDesc(PG_STAT_WAL_COLS);
TupleDescInitEntry(tupdesc, (AttrNumber) 1, "wal_records",
INT8OID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 2, "wal_fpi",
@@ -1660,10 +1662,6 @@ pg_stat_get_wal(PG_FUNCTION_ARGS)
BlessTupleDesc(tupdesc);
- /* Get statistics about WAL activity */
- wal_stats = pgstat_fetch_stat_wal();
- wal_counters = wal_stats->wal_counters;
-
/* Fill values and NULLs */
values[0] = Int64GetDatum(wal_counters.wal_records);
values[1] = Int64GetDatum(wal_counters.wal_fpi);
@@ -1677,12 +1675,30 @@ pg_stat_get_wal(PG_FUNCTION_ARGS)
values[3] = Int64GetDatum(wal_counters.wal_buffers_full);
- values[4] = TimestampTzGetDatum(wal_stats->stat_reset_timestamp);
+ if (stat_reset_timestamp != 0)
+ values[4] = TimestampTzGetDatum(stat_reset_timestamp);
+ else
+ nulls[4] = true;
/* Returns the record as Datum */
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
}
+/*
+ * Returns statistics of WAL activity
+ */
+Datum
+pg_stat_get_wal(PG_FUNCTION_ARGS)
+{
+ PgStat_WalStats *wal_stats;
+
+ /* Get statistics about WAL activity */
+ wal_stats = pgstat_fetch_stat_wal();
+
+ return (pg_stat_wal_build_tuple(wal_stats->wal_counters,
+ wal_stats->stat_reset_timestamp));
+}
+
/*
* Returns statistics of SLRU caches.
*/
--
2.34.1
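The NULL handling added to pg_stat_wal_build_tuple() above matters because TimestampTz counts microseconds from the PostgreSQL epoch, 2000-01-01 00:00:00 UTC. A never-set stat_reset_timestamp of 0 would therefore be displayed as "2000-01-01 00:00:00+00" in a UTC session. The standalone sketch below illustrates this. SketchTimestampTz, format_pg_timestamp() and stats_reset_or_null() are hypothetical names, and the sketch uses the C library's gmtime()/strftime() rather than PostgreSQL's own timestamp output code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical stand-in for TimestampTz: microseconds since 2000-01-01 UTC. */
typedef int64_t SketchTimestampTz;

/* Seconds between the Unix epoch (1970-01-01) and 2000-01-01 00:00:00 UTC. */
#define SKETCH_PG_EPOCH_UNIX_SECS INT64_C(946684800)

/* Render ts as a UTC session would show it, e.g. "2000-01-01 00:00:00+00". */
static void
format_pg_timestamp(SketchTimestampTz ts, char *buf, size_t buflen)
{
    time_t      unix_secs = (time_t) (ts / 1000000 + SKETCH_PG_EPOCH_UNIX_SECS);
    struct tm  *tm_utc = gmtime(&unix_secs);

    assert(tm_utc != NULL);
    strftime(buf, buflen, "%Y-%m-%d %H:%M:%S+00", tm_utc);
}

/*
 * Returns false when the column should be NULL.  A zero timestamp means the
 * counters were never reset, mirroring the nulls[4] = true branch in
 * pg_stat_wal_build_tuple().
 */
static bool
stats_reset_or_null(SketchTimestampTz ts, char *buf, size_t buflen)
{
    if (ts == 0)
        return false;
    format_pg_timestamp(ts, buf, buflen);
    return true;
}
```

With ts == 0, format_pg_timestamp() yields exactly "2000-01-01 00:00:00+00", the misleading value mentioned later in this thread. stats_reset_or_null() reports NULL instead.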
v10-0002-Add-the-pg_stat_get_backend_stats-helper-for-pg_.patch
From f145c27a9aef12499bee907fbb6121be3aeca9ca Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Tue, 25 Feb 2025 09:03:55 +0000
Subject: [PATCH v10 2/3] Add the pg_stat_get_backend_stats() helper for
pg_stat_get_backend_io()
This commit adds pg_stat_get_backend_stats(), a helper routine for
pg_stat_get_backend_io() that returns the backend stats for the PID passed
as an argument.
A follow-up commit will reuse the same logic to return the per backend WAL
stats.
---
src/backend/utils/activity/pgstat_backend.c | 52 +++++++++++++++++++++
src/backend/utils/adt/pgstatfuncs.c | 30 +-----------
src/include/pgstat.h | 1 +
3 files changed, 54 insertions(+), 29 deletions(-)
58.1% src/backend/utils/activity/
38.0% src/backend/utils/adt/
3.7% src/include/
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
index 338da73a9a9..6f17e1ff39f 100644
--- a/src/backend/utils/activity/pgstat_backend.c
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -26,6 +26,8 @@
#include "access/xlog.h"
#include "storage/bufmgr.h"
+#include "storage/proc.h"
+#include "storage/procarray.h"
#include "utils/memutils.h"
#include "utils/pgstat_internal.h"
@@ -82,6 +84,56 @@ pgstat_fetch_stat_backend(ProcNumber procNumber)
return backend_entry;
}
+/*
+ * Returns statistics of a backend by pid.
+ *
+ * It adds extra checks as compared to pgstat_fetch_stat_backend() to ensure
+ * that the backend is not gone. Also, if not NULL, bktype is populated as
+ * pg_stat_get_backend_io() needs it.
+ */
+PgStat_Backend *
+pg_stat_get_backend_stats(int pid, BackendType *bktype)
+{
+
+ PGPROC *proc;
+ PgBackendStatus *beentry;
+ ProcNumber procNumber;
+ PgStat_Backend *backend_stats;
+
+ proc = BackendPidGetProc(pid);
+
+ /*
+ * This could be an auxiliary process but these do not report backend
+ * statistics due to pgstat_tracks_backend_bktype(), so there is no need
+ * for an extra call to AuxiliaryPidGetProc().
+ */
+ if (!proc)
+ return NULL;
+
+ procNumber = GetNumberFromPGProc(proc);
+
+ beentry = pgstat_get_beentry_by_proc_number(procNumber);
+ if (!beentry)
+ return NULL;
+
+ backend_stats = pgstat_fetch_stat_backend(procNumber);
+ if (!backend_stats)
+ return NULL;
+
+ /* if PID does not match, leave */
+ if (beentry->st_procpid != pid)
+ return NULL;
+
+ /* backend may be gone, so recheck in case */
+ if (beentry->st_backendType == B_INVALID)
+ return NULL;
+
+ if (bktype)
+ *bktype = beentry->st_backendType;
+
+ return backend_stats;
+}
+
/*
* Flush out locally pending backend IO statistics. Locking is managed
* by the caller.
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 9de14ffd449..13c91515480 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1576,46 +1576,18 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
ReturnSetInfo *rsinfo;
BackendType bktype;
int pid;
- PGPROC *proc;
- ProcNumber procNumber;
PgStat_Backend *backend_stats;
PgStat_BktypeIO *bktype_stats;
- PgBackendStatus *beentry;
InitMaterializedSRF(fcinfo, 0);
rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
pid = PG_GETARG_INT32(0);
- proc = BackendPidGetProc(pid);
-
- /*
- * This could be an auxiliary process but these do not report backend
- * statistics due to pgstat_tracks_backend_bktype(), so there is no need
- * for an extra call to AuxiliaryPidGetProc().
- */
- if (!proc)
- return (Datum) 0;
-
- procNumber = GetNumberFromPGProc(proc);
+ backend_stats = pg_stat_get_backend_stats(pid, &bktype);
- beentry = pgstat_get_beentry_by_proc_number(procNumber);
- if (!beentry)
- return (Datum) 0;
-
- backend_stats = pgstat_fetch_stat_backend(procNumber);
if (!backend_stats)
return (Datum) 0;
- bktype = beentry->st_backendType;
-
- /* if PID does not match, leave */
- if (beentry->st_procpid != pid)
- return (Datum) 0;
-
- /* backend may be gone, so recheck in case */
- if (bktype == B_INVALID)
- return (Datum) 0;
-
bktype_stats = &backend_stats->io_stats;
/*
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 67656264b62..e8e7d95b334 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -554,6 +554,7 @@ extern void pgstat_count_backend_io_op(IOObject io_object,
IOOp io_op, uint32 cnt,
uint64 bytes);
extern PgStat_Backend *pgstat_fetch_stat_backend(ProcNumber procNumber);
+extern PgStat_Backend *pg_stat_get_backend_stats(int pid, BackendType *bktype);
extern bool pgstat_tracks_backend_bktype(BackendType bktype);
extern void pgstat_create_backend(ProcNumber procnum);
--
2.34.1
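The checks in pg_stat_get_backend_stats() follow a lookup-then-revalidate pattern. The PID is first mapped to a proc number, and the status entry in that slot is then rechecked, because the slot may have been reused by another backend or the backend may have exited in the meantime. Below is a simplified, single-process C sketch of that pattern. All types, arrays and function names are hypothetical stand-ins for PGPROC, PgBackendStatus and the pgstats entry. The sketch ignores the locking and snapshot semantics of the real code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the real backend structures. */
typedef enum { SK_B_INVALID = 0, SK_B_BACKEND, SK_B_WAL_SENDER } SketchBackendType;

typedef struct { int pid; } SketchProc;            /* like PGPROC.pid */

typedef struct
{
    int               pid;          /* like st_procpid */
    SketchBackendType backend_type; /* like st_backendType */
} SketchBeEntry;

typedef struct { int64_t wal_records; } SketchBackendStats;

#define SK_MAX_BACKENDS 4

static SketchProc procs[SK_MAX_BACKENDS];
static SketchBeEntry beentries[SK_MAX_BACKENDS];
static SketchBackendStats stats[SK_MAX_BACKENDS];

/* Like BackendPidGetProc() + GetNumberFromPGProc(): -1 if not found. */
static int
sketch_pid_to_procnumber(int pid)
{
    for (int i = 0; i < SK_MAX_BACKENDS; i++)
        if (pid != 0 && procs[i].pid == pid)
            return i;
    return -1;
}

static SketchBackendStats *
sketch_get_backend_stats(int pid, SketchBackendType *bktype)
{
    int procnum = sketch_pid_to_procnumber(pid);

    if (procnum < 0)
        return NULL;                /* unknown PID (or auxiliary process) */
    assert(procnum < SK_MAX_BACKENDS);

    const SketchBeEntry *be = &beentries[procnum];

    if (be->pid != pid)
        return NULL;                /* slot recycled by another backend */
    if (be->backend_type == SK_B_INVALID)
        return NULL;                /* backend has exited */

    if (bktype != NULL)
        *bktype = be->backend_type; /* optional out-parameter */
    return &stats[procnum];
}
```

Passing NULL for bktype corresponds to how pg_stat_get_backend_wal() calls the helper, while pg_stat_get_backend_io() asks for the backend type.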
v10-0003-per-backend-WAL-statistics.patch
From 3a3d2d36b377338c61dfaf00d257f990e248465c Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 6 Jan 2025 10:00:00 +0000
Subject: [PATCH v10 3/3] per backend WAL statistics
Now that commit 9aea73fc61 added backend-level statistics to pgstats (and
per backend IO statistics), we can more easily add per backend statistics.
This commit adds per backend WAL statistics using the same layer as pg_stat_wal,
except that it is now possible to know how much WAL activity is happening in each
backend rather than only an overall aggregate of all the activity. A function
called pg_stat_get_backend_wal() is added to access this data for the backend
with a given PID.
The same limitation as in 9aea73fc61 persists, meaning that auxiliary processes
are not included in this set of statistics.
XXX: bump catalog version
---
doc/src/sgml/monitoring.sgml | 19 ++++++
src/backend/utils/activity/pgstat_backend.c | 64 +++++++++++++++++++++
src/backend/utils/activity/pgstat_wal.c | 1 +
src/backend/utils/adt/pgstatfuncs.c | 26 ++++++++-
src/include/catalog/pg_proc.dat | 7 +++
src/include/pgstat.h | 13 +++--
src/include/utils/pgstat_internal.h | 3 +-
src/test/regress/expected/stats.out | 14 +++++
src/test/regress/sql/stats.sql | 6 ++
9 files changed, 144 insertions(+), 9 deletions(-)
16.0% doc/src/sgml/
39.4% src/backend/utils/activity/
15.5% src/backend/utils/adt/
8.8% src/include/catalog/
4.5% src/include/utils/
8.4% src/test/regress/expected/
6.4% src/test/regress/sql/
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 9178f1d34ef..f4c37c811ba 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4860,6 +4860,25 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para></entry>
</row>
+ <row>
+ <entry id="pg-stat-get-backend-wal" role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_stat_get_backend_wal</primary>
+ </indexterm>
+ <function>pg_stat_get_backend_wal</function> ( <type>integer</type> )
+ <returnvalue>record</returnvalue>
+ </para>
+ <para>
+ Returns WAL statistics about the backend with the specified
+ process ID. The output fields are exactly the same as the ones in the
+ <structname>pg_stat_wal</structname> view.
+ </para>
+ <para>
+ The function does not return WAL statistics for the checkpointer,
+ the background writer, the startup process and the autovacuum launcher.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
index 6f17e1ff39f..c39b10fb3f2 100644
--- a/src/backend/utils/activity/pgstat_backend.c
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -38,6 +38,14 @@
*/
static PgStat_BackendPending PendingBackendStats;
+/*
+ * WAL usage counters saved from pgWalUsage at the previous call to
+ * pgstat_report_wal(). This is used to calculate how much WAL usage
+ * happens between pgstat_report_wal() calls, by subtracting
+ * the previous counters from the current ones.
+ */
+static WalUsage prevBackendWalUsage;
+
/*
* Utility routines to report I/O stats for backends, kept here to avoid
* exposing PendingBackendStats to the outside world.
@@ -184,6 +192,57 @@ pgstat_flush_backend_entry_io(PgStat_EntryRef *entry_ref)
MemSet(&PendingBackendStats.pending_io, 0, sizeof(PgStat_PendingIO));
}
+/*
+ * To determine whether WAL usage happened.
+ */
+static bool
+pgstat_backend_wal_have_pending(void)
+{
+ return pgWalUsage.wal_records != prevBackendWalUsage.wal_records;
+}
+
+/*
+ * Flush out locally pending backend WAL statistics. Locking is managed
+ * by the caller.
+ */
+static void
+pgstat_flush_backend_entry_wal(PgStat_EntryRef *entry_ref)
+{
+ PgStatShared_Backend *shbackendent;
+ PgStat_WalCounters *bktype_shstats;
+ WalUsage wal_usage_diff = {0};
+
+ /*
+ * This function can be called even if nothing at all has happened. Avoid
+ * taking lock for nothing in that case.
+ */
+ if (!pgstat_backend_wal_have_pending())
+ return;
+
+ shbackendent = (PgStatShared_Backend *) entry_ref->shared_stats;
+ bktype_shstats = &shbackendent->stats.wal_counters;
+
+ /*
+ * We don't update the WAL usage portion of the local WalStats elsewhere.
+ * Calculate how much WAL usage counters were increased by subtracting the
+ * previous counters from the current ones.
+ */
+ WalUsageAccumDiff(&wal_usage_diff, &pgWalUsage, &prevBackendWalUsage);
+
+#define WALSTAT_ACC(fld, var_to_add) \
+ (bktype_shstats->fld += var_to_add.fld)
+ WALSTAT_ACC(wal_buffers_full, wal_usage_diff);
+ WALSTAT_ACC(wal_records, wal_usage_diff);
+ WALSTAT_ACC(wal_fpi, wal_usage_diff);
+ WALSTAT_ACC(wal_bytes, wal_usage_diff);
+#undef WALSTAT_ACC
+
+ /*
+ * Save the current counters for the subsequent calculation of WAL usage.
+ */
+ prevBackendWalUsage = pgWalUsage;
+}
+
/*
* Flush out locally pending backend statistics
*
@@ -211,6 +270,9 @@ pgstat_flush_backend(bool nowait, bits32 flags)
if (flags & PGSTAT_BACKEND_FLUSH_IO)
pgstat_flush_backend_entry_io(entry_ref);
+ if (flags & PGSTAT_BACKEND_FLUSH_WAL)
+ pgstat_flush_backend_entry_wal(entry_ref);
+
pgstat_unlock_entry(entry_ref);
return false;
@@ -258,6 +320,8 @@ pgstat_create_backend(ProcNumber procnum)
pgstat_unlock_entry(entry_ref);
MemSet(&PendingBackendStats, 0, sizeof(PgStat_BackendPending));
+
+ prevBackendWalUsage = pgWalUsage;
}
/*
diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c
index 943be0cbeef..c1c2e6dc386 100644
--- a/src/backend/utils/activity/pgstat_wal.c
+++ b/src/backend/utils/activity/pgstat_wal.c
@@ -52,6 +52,7 @@ pgstat_report_wal(bool force)
/* flush wal stats */
(void) pgstat_wal_flush_cb(nowait);
+ pgstat_flush_backend(nowait, PGSTAT_BACKEND_FLUSH_WAL);
/* flush IO stats */
pgstat_flush_io(nowait);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 13c91515480..4fca5b26fde 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1606,8 +1606,8 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
/*
* pg_stat_wal_build_tuple
*
- * Helper routine for pg_stat_get_wal() returning one tuple based on the contents
- * of wal_counters.
+ * Helper routine for pg_stat_get_wal() and pg_stat_get_backend_wal() returning
+ * one tuple based on the contents of wal_counters.
*/
static Datum
pg_stat_wal_build_tuple(PgStat_WalCounters wal_counters,
@@ -1656,6 +1656,28 @@ pg_stat_wal_build_tuple(PgStat_WalCounters wal_counters,
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
}
+/*
+ * Returns WAL statistics for a backend with given PID.
+ */
+Datum
+pg_stat_get_backend_wal(PG_FUNCTION_ARGS)
+{
+ int pid;
+ PgStat_Backend *backend_stats;
+ PgStat_WalCounters bktype_stats;
+
+ pid = PG_GETARG_INT32(0);
+ backend_stats = pg_stat_get_backend_stats(pid, NULL);
+
+ if (!backend_stats)
+ PG_RETURN_NULL();
+
+ bktype_stats = backend_stats->wal_counters;
+
+ /* save tuples with data from this PgStat_WalCounters */
+ return (pg_stat_wal_build_tuple(bktype_stats, backend_stats->stat_reset_timestamp));
+}
+
/*
* Returns statistics of WAL activity
*/
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 1c1d96e0c7e..ed3eb823d5c 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5954,6 +5954,13 @@
proargmodes => '{o,o,o,o,o}',
proargnames => '{wal_records,wal_fpi,wal_bytes,wal_buffers_full,stats_reset}',
prosrc => 'pg_stat_get_wal' },
+{ oid => '8037', descr => 'statistics: backend WAL activity',
+ proname => 'pg_stat_get_backend_wal', provolatile => 'v',
+ proparallel => 'r', prorettype => 'record', proargtypes => 'int4',
+ proallargtypes => '{int4,int8,int8,numeric,int8,timestamptz}',
+ proargmodes => '{i,o,o,o,o,o}',
+ proargnames => '{backend_pid,wal_records,wal_fpi,wal_bytes,wal_buffers_full,stats_reset}',
+ prosrc => 'pg_stat_get_backend_wal' },
{ oid => '6248', descr => 'statistics: information about WAL prefetching',
proname => 'pg_stat_get_recovery_prefetch', prorows => '1', proretset => 't',
provolatile => 'v', prorettype => 'record', proargtypes => '',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index e8e7d95b334..aaddb7acdf6 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -340,12 +340,6 @@ typedef struct PgStat_IO
PgStat_BktypeIO stats[BACKEND_NUM_TYPES];
} PgStat_IO;
-typedef struct PgStat_Backend
-{
- TimestampTz stat_reset_timestamp;
- PgStat_BktypeIO io_stats;
-} PgStat_Backend;
-
/* ---------
* PgStat_BackendPending Non-flushed backend stats.
* ---------
@@ -500,6 +494,13 @@ typedef struct PgStat_WalStats
TimestampTz stat_reset_timestamp;
} PgStat_WalStats;
+typedef struct PgStat_Backend
+{
+ TimestampTz stat_reset_timestamp;
+ PgStat_BktypeIO io_stats;
+ PgStat_WalCounters wal_counters;
+} PgStat_Backend;
+
/*
* Functions in pgstat.c
*/
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 36d228e3558..d5557e6e998 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -622,7 +622,8 @@ extern void pgstat_archiver_snapshot_cb(void);
/* flags for pgstat_flush_backend() */
#define PGSTAT_BACKEND_FLUSH_IO (1 << 0) /* Flush I/O statistics */
-#define PGSTAT_BACKEND_FLUSH_ALL (PGSTAT_BACKEND_FLUSH_IO)
+#define PGSTAT_BACKEND_FLUSH_WAL (1 << 1) /* Flush WAL statistics */
+#define PGSTAT_BACKEND_FLUSH_ALL (PGSTAT_BACKEND_FLUSH_IO | PGSTAT_BACKEND_FLUSH_WAL)
extern bool pgstat_flush_backend(bool nowait, bits32 flags);
extern bool pgstat_backend_flush_cb(bool nowait);
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 093e6368dbb..b3c303c98cb 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -832,6 +832,8 @@ SELECT sessions > :db_stat_sessions FROM pg_stat_database WHERE datname = (SELEC
SELECT num_requested AS rqst_ckpts_before FROM pg_stat_checkpointer \gset
-- Test pg_stat_wal (and make a temp table so our temp schema exists)
SELECT wal_bytes AS wal_bytes_before FROM pg_stat_wal \gset
+-- Test pg_stat_get_backend_wal (and make a temp table so our temp schema exists)
+SELECT wal_bytes AS backend_wal_bytes_before from pg_stat_get_backend_wal(pg_backend_pid()) \gset
CREATE TEMP TABLE test_stats_temp AS SELECT 17;
DROP TABLE test_stats_temp;
-- Checkpoint twice: The checkpointer reports stats after reporting completion
@@ -851,6 +853,18 @@ SELECT wal_bytes > :wal_bytes_before FROM pg_stat_wal;
t
(1 row)
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush
+--------------------------
+
+(1 row)
+
+SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid());
+ ?column?
+----------
+ t
+(1 row)
+
-- Test pg_stat_get_backend_idset() and some allied functions.
-- In particular, verify that their notion of backend ID matches
-- our temp schema index.
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 0a44e14d9f4..ad3f7b7e66a 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -423,6 +423,9 @@ SELECT num_requested AS rqst_ckpts_before FROM pg_stat_checkpointer \gset
-- Test pg_stat_wal (and make a temp table so our temp schema exists)
SELECT wal_bytes AS wal_bytes_before FROM pg_stat_wal \gset
+-- Test pg_stat_get_backend_wal (and make a temp table so our temp schema exists)
+SELECT wal_bytes AS backend_wal_bytes_before from pg_stat_get_backend_wal(pg_backend_pid()) \gset
+
CREATE TEMP TABLE test_stats_temp AS SELECT 17;
DROP TABLE test_stats_temp;
@@ -435,6 +438,9 @@ CHECKPOINT;
SELECT num_requested > :rqst_ckpts_before FROM pg_stat_checkpointer;
SELECT wal_bytes > :wal_bytes_before FROM pg_stat_wal;
+SELECT pg_stat_force_next_flush();
+SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid());
+
-- Test pg_stat_get_backend_idset() and some allied functions.
-- In particular, verify that their notion of backend ID matches
-- our temp schema index.
--
2.34.1
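In this patch, per backend WAL statistics are not counted separately at WAL insertion time. Instead, the patch snapshots the process-wide pgWalUsage counters into prevBackendWalUsage and, at flush time, adds only the difference to the shared entry. The sketch below models that snapshot-and-diff approach together with the flag dispatch of pgstat_flush_backend(). SketchWalUsage, the SK_FLUSH_* flags and the sketch_* functions are hypothetical, and the locking around the flush is only indicated by comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for WalUsage: per-process counters that only grow. */
typedef struct
{
    int64_t     wal_records;
    int64_t     wal_fpi;
    uint64_t    wal_bytes;
    int64_t     wal_buffers_full;
} SketchWalUsage;

/* Hypothetical flags mirroring PGSTAT_BACKEND_FLUSH_IO, _WAL and _ALL. */
#define SK_FLUSH_IO   (1u << 0)
#define SK_FLUSH_WAL  (1u << 1)
#define SK_FLUSH_ALL  (SK_FLUSH_IO | SK_FLUSH_WAL)

static SketchWalUsage cur_usage;    /* like pgWalUsage */
static SketchWalUsage prev_usage;   /* like prevBackendWalUsage */
static SketchWalUsage shared_stats; /* like the shared per-backend entry */

/* Like pgstat_create_backend(): WAL generated earlier is not attributed. */
static void
sketch_create_backend(void)
{
    prev_usage = cur_usage;
}

/* Like pgstat_backend_wal_have_pending(): only wal_records is compared. */
static bool
sketch_wal_have_pending(void)
{
    return cur_usage.wal_records != prev_usage.wal_records;
}

/* Like pgstat_flush_backend_entry_wal(): accumulate only the delta. */
static void
sketch_flush_wal(void)
{
    if (!sketch_wal_have_pending())
        return;                     /* nothing new since the last flush */

    assert(cur_usage.wal_records >= prev_usage.wal_records);
    shared_stats.wal_records += cur_usage.wal_records - prev_usage.wal_records;
    shared_stats.wal_fpi += cur_usage.wal_fpi - prev_usage.wal_fpi;
    shared_stats.wal_bytes += cur_usage.wal_bytes - prev_usage.wal_bytes;
    shared_stats.wal_buffers_full +=
        cur_usage.wal_buffers_full - prev_usage.wal_buffers_full;

    prev_usage = cur_usage;         /* new baseline for the next flush */
}

/* Like pgstat_flush_backend(): each kind is flushed only if requested. */
static void
sketch_flush_backend(unsigned int flags)
{
    /* the real code holds the shared entry's lock from here ... */
    if (flags & SK_FLUSH_IO)
    {
        /* I/O statistics are out of scope for this sketch */
    }
    if (flags & SK_FLUSH_WAL)
        sketch_flush_wal();
    /* ... to here */
}
```

Updating the baseline after each flush is what prevents double counting when the flush routine is called repeatedly with no new WAL activity in between.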
On Wed, Feb 26, 2025 at 10:59:11AM +0000, Bertrand Drouvot wrote:
Yup. That's what we've done in pg_stat_io_build_tuples() too (ff7c40d7fd6).
Without this we'd get "2000-01-01 00:00:00+00" in the stats_reset field of
pg_stat_get_backend_wal() and pg_stat_get_backend_io().
Right, forgot about this part.
That was not needed for pg_stat_io and pg_stat_wal because the stats_reset field
was already non null after initdb.
0001 was OK, so done.
In 0002, couldn't it be better to have the pg_stat_get_backend_stats()
static in pgstatfuncs.c? In 0003, pg_stat_get_backend_wal() is also
in pgstatfuncs.c, meaning that all the callers of
pg_stat_get_backend_stats() would be in this file.
-typedef struct PgStat_Backend
-{
- TimestampTz stat_reset_timestamp;
- PgStat_BktypeIO io_stats;
-} PgStat_Backend;
-
/* ---------
* PgStat_BackendPending Non-flushed backend stats.
* ---------
In 0003, let's keep PgStat_BackendPending grouped with PgStat_Backend,
so it sounds better to move both of them after the WAL stats
structures.
--
Michael
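Moving PgStat_Backend is not purely cosmetic. It now embeds a PgStat_WalCounters by value, and C only accepts a by-value struct member whose type is already complete. That is presumably why the definition has to sit below the WAL structures in pgstat.h. Here is a trimmed-down sketch using hypothetical Sketch* types in place of the real pgstat.h definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed-down versions of the pgstat.h structures. */
typedef int64_t SketchTimestampTz;

typedef struct SketchBktypeIO
{
    int64_t     counts[4];
} SketchBktypeIO;

/* Must be defined before any struct that embeds it by value. */
typedef struct SketchWalCounters
{
    int64_t     wal_records;
    int64_t     wal_fpi;
    uint64_t    wal_bytes;
    int64_t     wal_buffers_full;
} SketchWalCounters;

typedef struct SketchWalStats
{
    SketchWalCounters wal_counters;
    SketchTimestampTz stat_reset_timestamp;
} SketchWalStats;

/*
 * Defined after SketchWalCounters: placing it above would fail to compile
 * ("incomplete type"), since a by-value member needs the full definition.
 * A pointer member would only need a forward declaration.
 */
typedef struct SketchBackend
{
    SketchTimestampTz stat_reset_timestamp;
    SketchBktypeIO    io_stats;
    SketchWalCounters wal_counters;
} SketchBackend;
```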
Hi,
On Thu, Feb 27, 2025 at 12:02:51PM +0900, Michael Paquier wrote:
> 0001 was OK, so done.
Thanks!
> In 0002, couldn't it be better to have the pg_stat_get_backend_stats()
> static in pgstatfuncs.c? In 0003, pg_stat_get_backend_wal() is also
> in pgstatfuncs.c, meaning that all the callers of
> pg_stat_get_backend_stats() would be in this file.
That's how I did it initially, but I decided to move it to pgstat_backend.c. The
reason was that it is entirely tied to "per backend" stats and that there is
no SQL API on top of it (whereas, I think, almost all the functions in
pgstatfuncs.c are exposed at the SQL level). Thoughts?
> -typedef struct PgStat_Backend
> -{
> - TimestampTz stat_reset_timestamp;
> - PgStat_BktypeIO io_stats;
> -} PgStat_Backend;
> -
> /* ---------
> * PgStat_BackendPending Non-flushed backend stats.
> * ---------
> In 0003, let's keep PgStat_BackendPending grouped with PgStat_Backend,
> so it sounds better to move both of them after the WAL stats
> structures.
Makes sense. I had not planned to submit a new patch version (implementing at
least the above) before getting your final thoughts on your first comment.
But since a rebase is needed anyway, please find attached a new version. It
only implements your last comment.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v11-0001-Add-the-pg_stat_get_backend_stats-helper-for-pg_.patch
From a98f27ccecd62867ed7aeb9be166488fffb4304d Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Tue, 25 Feb 2025 09:03:55 +0000
Subject: [PATCH v11 1/2] Add the pg_stat_get_backend_stats() helper for
pg_stat_get_backend_io()
This commit adds pg_stat_get_backend_stats(), a helper routine for
pg_stat_get_backend_io() that returns the backend stats for the PID passed
as an argument.
A follow-up commit will reuse the same logic to return the per backend WAL
stats.
---
src/backend/utils/activity/pgstat_backend.c | 52 +++++++++++++++++++++
src/backend/utils/adt/pgstatfuncs.c | 30 +-----------
src/include/pgstat.h | 1 +
3 files changed, 54 insertions(+), 29 deletions(-)
58.1% src/backend/utils/activity/
38.0% src/backend/utils/adt/
3.7% src/include/
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
index 338da73a9a9..6f17e1ff39f 100644
--- a/src/backend/utils/activity/pgstat_backend.c
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -26,6 +26,8 @@
#include "access/xlog.h"
#include "storage/bufmgr.h"
+#include "storage/proc.h"
+#include "storage/procarray.h"
#include "utils/memutils.h"
#include "utils/pgstat_internal.h"
@@ -82,6 +84,56 @@ pgstat_fetch_stat_backend(ProcNumber procNumber)
return backend_entry;
}
+/*
+ * Returns statistics of a backend by pid.
+ *
+ * It adds extra checks as compared to pgstat_fetch_stat_backend() to ensure
+ * that the backend is not gone. Also, if not NULL, bktype is populated as
+ * pg_stat_get_backend_io() needs it.
+ */
+PgStat_Backend *
+pg_stat_get_backend_stats(int pid, BackendType *bktype)
+{
+
+ PGPROC *proc;
+ PgBackendStatus *beentry;
+ ProcNumber procNumber;
+ PgStat_Backend *backend_stats;
+
+ proc = BackendPidGetProc(pid);
+
+ /*
+ * This could be an auxiliary process but these do not report backend
+ * statistics due to pgstat_tracks_backend_bktype(), so there is no need
+ * for an extra call to AuxiliaryPidGetProc().
+ */
+ if (!proc)
+ return NULL;
+
+ procNumber = GetNumberFromPGProc(proc);
+
+ beentry = pgstat_get_beentry_by_proc_number(procNumber);
+ if (!beentry)
+ return NULL;
+
+ backend_stats = pgstat_fetch_stat_backend(procNumber);
+ if (!backend_stats)
+ return NULL;
+
+ /* if PID does not match, leave */
+ if (beentry->st_procpid != pid)
+ return NULL;
+
+ /* backend may be gone, so recheck in case */
+ if (beentry->st_backendType == B_INVALID)
+ return NULL;
+
+ if (bktype)
+ *bktype = beentry->st_backendType;
+
+ return backend_stats;
+}
+
/*
* Flush out locally pending backend IO statistics. Locking is managed
* by the caller.
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index efb6d0032af..ea91c8fc9d5 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1576,46 +1576,18 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
ReturnSetInfo *rsinfo;
BackendType bktype;
int pid;
- PGPROC *proc;
- ProcNumber procNumber;
PgStat_Backend *backend_stats;
PgStat_BktypeIO *bktype_stats;
- PgBackendStatus *beentry;
InitMaterializedSRF(fcinfo, 0);
rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
pid = PG_GETARG_INT32(0);
- proc = BackendPidGetProc(pid);
-
- /*
- * This could be an auxiliary process but these do not report backend
- * statistics due to pgstat_tracks_backend_bktype(), so there is no need
- * for an extra call to AuxiliaryPidGetProc().
- */
- if (!proc)
- return (Datum) 0;
-
- procNumber = GetNumberFromPGProc(proc);
+ backend_stats = pg_stat_get_backend_stats(pid, &bktype);
- beentry = pgstat_get_beentry_by_proc_number(procNumber);
- if (!beentry)
- return (Datum) 0;
-
- backend_stats = pgstat_fetch_stat_backend(procNumber);
if (!backend_stats)
return (Datum) 0;
- bktype = beentry->st_backendType;
-
- /* if PID does not match, leave */
- if (beentry->st_procpid != pid)
- return (Datum) 0;
-
- /* backend may be gone, so recheck in case */
- if (bktype == B_INVALID)
- return (Datum) 0;
-
bktype_stats = &backend_stats->io_stats;
/*
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 67656264b62..e8e7d95b334 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -554,6 +554,7 @@ extern void pgstat_count_backend_io_op(IOObject io_object,
IOOp io_op, uint32 cnt,
uint64 bytes);
extern PgStat_Backend *pgstat_fetch_stat_backend(ProcNumber procNumber);
+extern PgStat_Backend *pg_stat_get_backend_stats(int pid, BackendType *bktype);
extern bool pgstat_tracks_backend_bktype(BackendType bktype);
extern void pgstat_create_backend(ProcNumber procnum);
--
2.34.1
v11-0002-per-backend-WAL-statistics.patch
From 9fe4e1ba92e5f270fdb5654142d13c3397dec993 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 6 Jan 2025 10:00:00 +0000
Subject: [PATCH v11 2/2] per backend WAL statistics
Now that commit 9aea73fc61 added backend-level statistics to pgstats (and
per backend IO statistics), we can more easily add per backend statistics.
This commit adds per backend WAL statistics using the same layer as pg_stat_wal,
except that it is now possible to know how much WAL activity is happening in each
backend rather than only an overall aggregate of all the activity. A function
called pg_stat_get_backend_wal() is added to access this data for the backend
with a given PID.
The same limitation as in 9aea73fc61 persists, meaning that auxiliary processes
are not included in this set of statistics.
XXX: bump catalog version
---
doc/src/sgml/monitoring.sgml | 19 ++++++
src/backend/utils/activity/pgstat_backend.c | 64 +++++++++++++++++++++
src/backend/utils/activity/pgstat_wal.c | 1 +
src/backend/utils/adt/pgstatfuncs.c | 26 ++++++++-
src/include/catalog/pg_proc.dat | 7 +++
src/include/pgstat.h | 37 ++++++------
src/include/utils/pgstat_internal.h | 3 +-
src/test/regress/expected/stats.out | 14 +++++
src/test/regress/sql/stats.sql | 6 ++
9 files changed, 156 insertions(+), 21 deletions(-)
16.0% doc/src/sgml/
39.4% src/backend/utils/activity/
15.5% src/backend/utils/adt/
8.8% src/include/catalog/
4.5% src/include/utils/
8.4% src/test/regress/expected/
6.4% src/test/regress/sql/
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 9178f1d34ef..f4c37c811ba 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4860,6 +4860,25 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para></entry>
</row>
+ <row>
+ <entry id="pg-stat-get-backend-wal" role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_stat_get_backend_wal</primary>
+ </indexterm>
+ <function>pg_stat_get_backend_wal</function> ( <type>integer</type> )
+ <returnvalue>record</returnvalue>
+ </para>
+ <para>
+ Returns WAL statistics about the backend with the specified
+ process ID. The output fields are exactly the same as the ones in the
+ <structname>pg_stat_wal</structname> view.
+ </para>
+ <para>
+ The function does not return WAL statistics for the checkpointer,
+ the background writer, the startup process and the autovacuum launcher.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
index 6f17e1ff39f..c39b10fb3f2 100644
--- a/src/backend/utils/activity/pgstat_backend.c
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -38,6 +38,14 @@
*/
static PgStat_BackendPending PendingBackendStats;
+/*
+ * WAL usage counters saved from pgWalUsage at the previous call to
+ * pgstat_report_wal(). This is used to calculate how much WAL usage
+ * happens between pgstat_report_wal() calls, by subtracting
+ * the previous counters from the current ones.
+ */
+static WalUsage prevBackendWalUsage;
+
/*
* Utility routines to report I/O stats for backends, kept here to avoid
* exposing PendingBackendStats to the outside world.
@@ -184,6 +192,57 @@ pgstat_flush_backend_entry_io(PgStat_EntryRef *entry_ref)
MemSet(&PendingBackendStats.pending_io, 0, sizeof(PgStat_PendingIO));
}
+/*
+ * To determine whether WAL usage happened.
+ */
+static bool
+pgstat_backend_wal_have_pending(void)
+{
+ return pgWalUsage.wal_records != prevBackendWalUsage.wal_records;
+}
+
+/*
+ * Flush out locally pending backend WAL statistics. Locking is managed
+ * by the caller.
+ */
+static void
+pgstat_flush_backend_entry_wal(PgStat_EntryRef *entry_ref)
+{
+ PgStatShared_Backend *shbackendent;
+ PgStat_WalCounters *bktype_shstats;
+ WalUsage wal_usage_diff = {0};
+
+ /*
+ * This function can be called even if nothing at all has happened. Avoid
+ * taking lock for nothing in that case.
+ */
+ if (!pgstat_backend_wal_have_pending())
+ return;
+
+ shbackendent = (PgStatShared_Backend *) entry_ref->shared_stats;
+ bktype_shstats = &shbackendent->stats.wal_counters;
+
+ /*
+ * We don't update the WAL usage portion of the local WalStats elsewhere.
+ * Calculate how much WAL usage counters were increased by subtracting the
+ * previous counters from the current ones.
+ */
+ WalUsageAccumDiff(&wal_usage_diff, &pgWalUsage, &prevBackendWalUsage);
+
+#define WALSTAT_ACC(fld, var_to_add) \
+ (bktype_shstats->fld += var_to_add.fld)
+ WALSTAT_ACC(wal_buffers_full, wal_usage_diff);
+ WALSTAT_ACC(wal_records, wal_usage_diff);
+ WALSTAT_ACC(wal_fpi, wal_usage_diff);
+ WALSTAT_ACC(wal_bytes, wal_usage_diff);
+#undef WALSTAT_ACC
+
+ /*
+ * Save the current counters for the subsequent calculation of WAL usage.
+ */
+ prevBackendWalUsage = pgWalUsage;
+}
+
/*
* Flush out locally pending backend statistics
*
@@ -211,6 +270,9 @@ pgstat_flush_backend(bool nowait, bits32 flags)
if (flags & PGSTAT_BACKEND_FLUSH_IO)
pgstat_flush_backend_entry_io(entry_ref);
+ if (flags & PGSTAT_BACKEND_FLUSH_WAL)
+ pgstat_flush_backend_entry_wal(entry_ref);
+
pgstat_unlock_entry(entry_ref);
return false;
@@ -258,6 +320,8 @@ pgstat_create_backend(ProcNumber procnum)
pgstat_unlock_entry(entry_ref);
MemSet(&PendingBackendStats, 0, sizeof(PgStat_BackendPending));
+
+ prevBackendWalUsage = pgWalUsage;
}
/*
diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c
index 943be0cbeef..c1c2e6dc386 100644
--- a/src/backend/utils/activity/pgstat_wal.c
+++ b/src/backend/utils/activity/pgstat_wal.c
@@ -52,6 +52,7 @@ pgstat_report_wal(bool force)
/* flush wal stats */
(void) pgstat_wal_flush_cb(nowait);
+ pgstat_flush_backend(nowait, PGSTAT_BACKEND_FLUSH_WAL);
/* flush IO stats */
pgstat_flush_io(nowait);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index ea91c8fc9d5..164a0bbb3d9 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1606,8 +1606,8 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
/*
* pg_stat_wal_build_tuple
*
- * Helper routine for pg_stat_get_wal() returning one tuple based on the
- * contents of wal_counters.
+ * Helper routine for pg_stat_get_wal() and pg_stat_get_backend_wal()
+ * returning one tuple based on the contents of wal_counters.
*/
static Datum
pg_stat_wal_build_tuple(PgStat_WalCounters wal_counters,
@@ -1656,6 +1656,28 @@ pg_stat_wal_build_tuple(PgStat_WalCounters wal_counters,
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
}
+/*
+ * Returns WAL statistics for a backend with given PID.
+ */
+Datum
+pg_stat_get_backend_wal(PG_FUNCTION_ARGS)
+{
+ int pid;
+ PgStat_Backend *backend_stats;
+ PgStat_WalCounters bktype_stats;
+
+ pid = PG_GETARG_INT32(0);
+ backend_stats = pg_stat_get_backend_stats(pid, NULL);
+
+ if (!backend_stats)
+ PG_RETURN_NULL();
+
+ bktype_stats = backend_stats->wal_counters;
+
+ /* save tuples with data from this PgStat_WalCounters */
+ return (pg_stat_wal_build_tuple(bktype_stats, backend_stats->stat_reset_timestamp));
+}
+
/*
* Returns statistics of WAL activity
*/
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cd9422d0bac..3e35f8b8e99 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5954,6 +5954,13 @@
proargmodes => '{o,o,o,o,o}',
proargnames => '{wal_records,wal_fpi,wal_bytes,wal_buffers_full,stats_reset}',
prosrc => 'pg_stat_get_wal' },
+{ oid => '8037', descr => 'statistics: backend WAL activity',
+ proname => 'pg_stat_get_backend_wal', provolatile => 'v',
+ proparallel => 'r', prorettype => 'record', proargtypes => 'int4',
+ proallargtypes => '{int4,int8,int8,numeric,int8,timestamptz}',
+ proargmodes => '{i,o,o,o,o,o}',
+ proargnames => '{backend_pid,wal_records,wal_fpi,wal_bytes,wal_buffers_full,stats_reset}',
+ prosrc => 'pg_stat_get_backend_wal' },
{ oid => '6248', descr => 'statistics: information about WAL prefetching',
proname => 'pg_stat_get_recovery_prefetch', prorows => '1', proretset => 't',
provolatile => 'v', prorettype => 'record', proargtypes => '',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index e8e7d95b334..6409ea23c84 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -340,24 +340,6 @@ typedef struct PgStat_IO
PgStat_BktypeIO stats[BACKEND_NUM_TYPES];
} PgStat_IO;
-typedef struct PgStat_Backend
-{
- TimestampTz stat_reset_timestamp;
- PgStat_BktypeIO io_stats;
-} PgStat_Backend;
-
-/* ---------
- * PgStat_BackendPending Non-flushed backend stats.
- * ---------
- */
-typedef struct PgStat_BackendPending
-{
- /*
- * Backend statistics store the same amount of IO data as PGSTAT_KIND_IO.
- */
- PgStat_PendingIO pending_io;
-} PgStat_BackendPending;
-
typedef struct PgStat_StatDBEntry
{
PgStat_Counter xact_commit;
@@ -500,6 +482,25 @@ typedef struct PgStat_WalStats
TimestampTz stat_reset_timestamp;
} PgStat_WalStats;
+typedef struct PgStat_Backend
+{
+ TimestampTz stat_reset_timestamp;
+ PgStat_BktypeIO io_stats;
+ PgStat_WalCounters wal_counters;
+} PgStat_Backend;
+
+/* ---------
+ * PgStat_BackendPending Non-flushed backend stats.
+ * ---------
+ */
+typedef struct PgStat_BackendPending
+{
+ /*
+ * Backend statistics store the same amount of IO data as PGSTAT_KIND_IO.
+ */
+ PgStat_PendingIO pending_io;
+} PgStat_BackendPending;
+
/*
* Functions in pgstat.c
*/
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 36d228e3558..d5557e6e998 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -622,7 +622,8 @@ extern void pgstat_archiver_snapshot_cb(void);
/* flags for pgstat_flush_backend() */
#define PGSTAT_BACKEND_FLUSH_IO (1 << 0) /* Flush I/O statistics */
-#define PGSTAT_BACKEND_FLUSH_ALL (PGSTAT_BACKEND_FLUSH_IO)
+#define PGSTAT_BACKEND_FLUSH_WAL (1 << 1) /* Flush WAL statistics */
+#define PGSTAT_BACKEND_FLUSH_ALL (PGSTAT_BACKEND_FLUSH_IO | PGSTAT_BACKEND_FLUSH_WAL)
extern bool pgstat_flush_backend(bool nowait, bits32 flags);
extern bool pgstat_backend_flush_cb(bool nowait);
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 093e6368dbb..b3c303c98cb 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -832,6 +832,8 @@ SELECT sessions > :db_stat_sessions FROM pg_stat_database WHERE datname = (SELEC
SELECT num_requested AS rqst_ckpts_before FROM pg_stat_checkpointer \gset
-- Test pg_stat_wal (and make a temp table so our temp schema exists)
SELECT wal_bytes AS wal_bytes_before FROM pg_stat_wal \gset
+-- Test pg_stat_get_backend_wal (and make a temp table so our temp schema exists)
+SELECT wal_bytes AS backend_wal_bytes_before from pg_stat_get_backend_wal(pg_backend_pid()) \gset
CREATE TEMP TABLE test_stats_temp AS SELECT 17;
DROP TABLE test_stats_temp;
-- Checkpoint twice: The checkpointer reports stats after reporting completion
@@ -851,6 +853,18 @@ SELECT wal_bytes > :wal_bytes_before FROM pg_stat_wal;
t
(1 row)
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush
+--------------------------
+
+(1 row)
+
+SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid());
+ ?column?
+----------
+ t
+(1 row)
+
-- Test pg_stat_get_backend_idset() and some allied functions.
-- In particular, verify that their notion of backend ID matches
-- our temp schema index.
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 0a44e14d9f4..ad3f7b7e66a 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -423,6 +423,9 @@ SELECT num_requested AS rqst_ckpts_before FROM pg_stat_checkpointer \gset
-- Test pg_stat_wal (and make a temp table so our temp schema exists)
SELECT wal_bytes AS wal_bytes_before FROM pg_stat_wal \gset
+-- Test pg_stat_get_backend_wal (and make a temp table so our temp schema exists)
+SELECT wal_bytes AS backend_wal_bytes_before from pg_stat_get_backend_wal(pg_backend_pid()) \gset
+
CREATE TEMP TABLE test_stats_temp AS SELECT 17;
DROP TABLE test_stats_temp;
@@ -435,6 +438,9 @@ CHECKPOINT;
SELECT num_requested > :rqst_ckpts_before FROM pg_stat_checkpointer;
SELECT wal_bytes > :wal_bytes_before FROM pg_stat_wal;
+SELECT pg_stat_force_next_flush();
+SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid());
+
-- Test pg_stat_get_backend_idset() and some allied functions.
-- In particular, verify that their notion of backend ID matches
-- our temp schema index.
--
2.34.1
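The new PGSTAT_BACKEND_FLUSH_WAL bit lets pgstat_report_wal() flush only the WAL part of the per-backend entry. PGSTAT_BACKEND_FLUSH_ALL covers both I/O and WAL. Below is a minimal standalone model of that flag dispatch. The names (MODEL_FLUSH_*, FlushLog, model_flush_backend) are hypothetical stand-ins, not PostgreSQL APIs. The real routine also looks up and locks the shared entry and honors nowait.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the flag bits in pgstat_internal.h */
#define MODEL_FLUSH_IO  (1u << 0)   /* flush I/O statistics */
#define MODEL_FLUSH_WAL (1u << 1)   /* flush WAL statistics */
#define MODEL_FLUSH_ALL (MODEL_FLUSH_IO | MODEL_FLUSH_WAL)

/* Records which per-kind flush routines ran, for illustration only. */
typedef struct FlushLog
{
    int io_flushes;
    int wal_flushes;
} FlushLog;

/*
 * Each flag bit selects one kind of pending per-backend statistics,
 * mirroring the "if (flags & ...)" tests in pgstat_flush_backend().
 */
static void
model_flush_backend(FlushLog *flog, uint32_t flags)
{
    /* Reject bits this model does not know about. */
    assert((flags & ~MODEL_FLUSH_ALL) == 0);

    if (flags & MODEL_FLUSH_IO)
        flog->io_flushes++;
    if (flags & MODEL_FLUSH_WAL)
        flog->wal_flushes++;
}
```

A design note: defining FLUSH_ALL as the OR of the individual bits means any future bit only needs to be added in one place for callers that flush everything.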
On Thu, Feb 27, 2025 at 07:47:09AM +0000, Bertrand Drouvot wrote:
That's how I did it initially but decided to move it to pgstat_backend.c. The
reason was that it's fully linked to "per backend" stats and that there is
no SQL api on top of it (while I think that's the case for almost all the ones
in pgstatfuncs.c). Thoughts?
Okay by me. With pgstat_fetch_stat_backend() there in parallel, why not
expose this part as well? Perhaps that could be useful for some
extension. I'd rather have out-of-core code do these lookups with the
same sanity checks in place for the procnumber and slot lookups.
The name was inconsistent with the rest of the file, so I have settled
on pgstat_fetch_stat_backend_by_pid() to be more consistent. A
second thing is to properly initialize bktype if defined by the
caller.
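The point about bktype is that an optional out-parameter should always be initialized when the caller passes a non-NULL pointer, even when the lookup fails. The sketch below is a hypothetical, simplified model of a lookup by PID with that property and with a "tracked backend type" sanity check. All names are illustrative. It is not the actual pgstat_fetch_stat_backend_by_pid(), which works with ProcNumbers and the shared statistics hash table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical backend types; not PostgreSQL's BackendType enum. */
typedef enum
{
    MODEL_B_INVALID = 0,
    MODEL_B_BACKEND,
    MODEL_B_CHECKPOINTER
} ModelBackendType;

typedef struct ModelBackendStats
{
    long wal_records;
} ModelBackendStats;

typedef struct ModelProc
{
    int pid;                    /* 0 means the slot is unused */
    ModelBackendType bktype;
    ModelBackendStats stats;
} ModelProc;

#define MODEL_NUM_PROCS 4

static ModelProc model_procs[MODEL_NUM_PROCS];

/* In this model only regular backends keep per-backend stats. */
static int
model_tracks_bktype(ModelBackendType bktype)
{
    return bktype == MODEL_B_BACKEND;
}

/*
 * Look up per-backend stats by PID.  If the caller passes a non-NULL
 * bktype it is always written, so it is never left undefined on failure.
 */
static ModelBackendStats *
model_fetch_stats_by_pid(int pid, ModelBackendType *bktype)
{
    if (bktype != NULL)
        *bktype = MODEL_B_INVALID;

    if (pid <= 0)
        return NULL;

    for (int i = 0; i < MODEL_NUM_PROCS; i++)
    {
        ModelProc *proc = &model_procs[i];

        if (proc->pid != pid)
            continue;
        if (bktype != NULL)
            *bktype = proc->bktype;
        /* Sanity check: only tracked backend types have stats. */
        if (!model_tracks_bktype(proc->bktype))
            return NULL;
        return &proc->stats;
    }
    return NULL;                /* no such process */
}
```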
Makes sense. I did not intend to submit a new patch version (implementing at least
the above) without getting your final thoughts on your first comment.
But since a rebase is needed anyway, please find attached a new version. It
just implements your last comment.
Attached is a rebased version of the rest.
--
Michael
Attachments:
v12-0001-per-backend-WAL-statistics.patch (text/x-diff; charset=us-ascii)
From 42fd2860e3f06d525936187f46aad06892073078 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 6 Jan 2025 10:00:00 +0000
Subject: [PATCH v12] per backend WAL statistics
Now that commit 9aea73fc61 added backend-level statistics to pgstats (and
per backend IO statistics) we can more easily add per backend statistics.
This commit adds per backend WAL statistics using the same layer as pg_stat_wal,
except that it is now possible to know how much WAL activity is happening in each
backend rather than an overall aggregate of all the activity. A function called
pg_stat_get_backend_wal() is added to access this data depending on the
PID of a backend.
The same limitation as in 9aea73fc61 persists, meaning that Auxiliary processes
are not included in this set of statistics.
XXX: bump catalog version
XXX: bump of stats file format not required, as backend stats do not
persist on disk.
---
src/include/catalog/pg_proc.dat | 7 +++
src/include/pgstat.h | 37 ++++++------
src/include/utils/pgstat_internal.h | 3 +-
src/backend/utils/activity/pgstat_backend.c | 64 +++++++++++++++++++++
src/backend/utils/activity/pgstat_wal.c | 1 +
src/backend/utils/adt/pgstatfuncs.c | 26 ++++++++-
src/test/regress/expected/stats.out | 14 +++++
src/test/regress/sql/stats.sql | 6 ++
doc/src/sgml/monitoring.sgml | 19 ++++++
9 files changed, 156 insertions(+), 21 deletions(-)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cd9422d0bacf..3e35f8b8e99a 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5954,6 +5954,13 @@
proargmodes => '{o,o,o,o,o}',
proargnames => '{wal_records,wal_fpi,wal_bytes,wal_buffers_full,stats_reset}',
prosrc => 'pg_stat_get_wal' },
+{ oid => '8037', descr => 'statistics: backend WAL activity',
+ proname => 'pg_stat_get_backend_wal', provolatile => 'v',
+ proparallel => 'r', prorettype => 'record', proargtypes => 'int4',
+ proallargtypes => '{int4,int8,int8,numeric,int8,timestamptz}',
+ proargmodes => '{i,o,o,o,o,o}',
+ proargnames => '{backend_pid,wal_records,wal_fpi,wal_bytes,wal_buffers_full,stats_reset}',
+ prosrc => 'pg_stat_get_backend_wal' },
{ oid => '6248', descr => 'statistics: information about WAL prefetching',
proname => 'pg_stat_get_recovery_prefetch', prorows => '1', proretset => 't',
provolatile => 'v', prorettype => 'record', proargtypes => '',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 4aad10b0b6d5..06359b9157d2 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -340,24 +340,6 @@ typedef struct PgStat_IO
PgStat_BktypeIO stats[BACKEND_NUM_TYPES];
} PgStat_IO;
-typedef struct PgStat_Backend
-{
- TimestampTz stat_reset_timestamp;
- PgStat_BktypeIO io_stats;
-} PgStat_Backend;
-
-/* ---------
- * PgStat_BackendPending Non-flushed backend stats.
- * ---------
- */
-typedef struct PgStat_BackendPending
-{
- /*
- * Backend statistics store the same amount of IO data as PGSTAT_KIND_IO.
- */
- PgStat_PendingIO pending_io;
-} PgStat_BackendPending;
-
typedef struct PgStat_StatDBEntry
{
PgStat_Counter xact_commit;
@@ -500,6 +482,25 @@ typedef struct PgStat_WalStats
TimestampTz stat_reset_timestamp;
} PgStat_WalStats;
+typedef struct PgStat_Backend
+{
+ TimestampTz stat_reset_timestamp;
+ PgStat_BktypeIO io_stats;
+ PgStat_WalCounters wal_counters;
+} PgStat_Backend;
+
+/* ---------
+ * PgStat_BackendPending Non-flushed backend stats.
+ * ---------
+ */
+typedef struct PgStat_BackendPending
+{
+ /*
+ * Backend statistics store the same amount of IO data as PGSTAT_KIND_IO.
+ */
+ PgStat_PendingIO pending_io;
+} PgStat_BackendPending;
+
/*
* Functions in pgstat.c
*/
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 36d228e3558b..d5557e6e998c 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -622,7 +622,8 @@ extern void pgstat_archiver_snapshot_cb(void);
/* flags for pgstat_flush_backend() */
#define PGSTAT_BACKEND_FLUSH_IO (1 << 0) /* Flush I/O statistics */
-#define PGSTAT_BACKEND_FLUSH_ALL (PGSTAT_BACKEND_FLUSH_IO)
+#define PGSTAT_BACKEND_FLUSH_WAL (1 << 1) /* Flush WAL statistics */
+#define PGSTAT_BACKEND_FLUSH_ALL (PGSTAT_BACKEND_FLUSH_IO | PGSTAT_BACKEND_FLUSH_WAL)
extern bool pgstat_flush_backend(bool nowait, bits32 flags);
extern bool pgstat_backend_flush_cb(bool nowait);
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
index 3c9ebbcd69c0..641ba27c95b4 100644
--- a/src/backend/utils/activity/pgstat_backend.c
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -38,6 +38,14 @@
*/
static PgStat_BackendPending PendingBackendStats;
+/*
+ * WAL usage counters saved from pgWalUsage at the previous call to
+ * pgstat_report_wal(). This is used to calculate how much WAL usage
+ * happens between pgstat_report_wal() calls, by subtracting
+ * the previous counters from the current ones.
+ */
+static WalUsage prevBackendWalUsage;
+
/*
* Utility routines to report I/O stats for backends, kept here to avoid
* exposing PendingBackendStats to the outside world.
@@ -185,6 +193,57 @@ pgstat_flush_backend_entry_io(PgStat_EntryRef *entry_ref)
MemSet(&PendingBackendStats.pending_io, 0, sizeof(PgStat_PendingIO));
}
+/*
+ * To determine whether WAL usage happened.
+ */
+static bool
+pgstat_backend_wal_have_pending(void)
+{
+ return pgWalUsage.wal_records != prevBackendWalUsage.wal_records;
+}
+
+/*
+ * Flush out locally pending backend WAL statistics. Locking is managed
+ * by the caller.
+ */
+static void
+pgstat_flush_backend_entry_wal(PgStat_EntryRef *entry_ref)
+{
+ PgStatShared_Backend *shbackendent;
+ PgStat_WalCounters *bktype_shstats;
+ WalUsage wal_usage_diff = {0};
+
+ /*
+ * This function can be called even if nothing at all has happened. Avoid
+ * taking lock for nothing in that case.
+ */
+ if (!pgstat_backend_wal_have_pending())
+ return;
+
+ shbackendent = (PgStatShared_Backend *) entry_ref->shared_stats;
+ bktype_shstats = &shbackendent->stats.wal_counters;
+
+ /*
+ * We don't update the WAL usage portion of the local WalStats elsewhere.
+ * Calculate how much WAL usage counters were increased by subtracting the
+ * previous counters from the current ones.
+ */
+ WalUsageAccumDiff(&wal_usage_diff, &pgWalUsage, &prevBackendWalUsage);
+
+#define WALSTAT_ACC(fld, var_to_add) \
+ (bktype_shstats->fld += var_to_add.fld)
+ WALSTAT_ACC(wal_buffers_full, wal_usage_diff);
+ WALSTAT_ACC(wal_records, wal_usage_diff);
+ WALSTAT_ACC(wal_fpi, wal_usage_diff);
+ WALSTAT_ACC(wal_bytes, wal_usage_diff);
+#undef WALSTAT_ACC
+
+ /*
+ * Save the current counters for the subsequent calculation of WAL usage.
+ */
+ prevBackendWalUsage = pgWalUsage;
+}
+
/*
* Flush out locally pending backend statistics
*
@@ -212,6 +271,9 @@ pgstat_flush_backend(bool nowait, bits32 flags)
if (flags & PGSTAT_BACKEND_FLUSH_IO)
pgstat_flush_backend_entry_io(entry_ref);
+ if (flags & PGSTAT_BACKEND_FLUSH_WAL)
+ pgstat_flush_backend_entry_wal(entry_ref);
+
pgstat_unlock_entry(entry_ref);
return false;
@@ -259,6 +321,8 @@ pgstat_create_backend(ProcNumber procnum)
pgstat_unlock_entry(entry_ref);
MemSet(&PendingBackendStats, 0, sizeof(PgStat_BackendPending));
+
+ prevBackendWalUsage = pgWalUsage;
}
/*
diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c
index 943be0cbeefd..c1c2e6dc3868 100644
--- a/src/backend/utils/activity/pgstat_wal.c
+++ b/src/backend/utils/activity/pgstat_wal.c
@@ -52,6 +52,7 @@ pgstat_report_wal(bool force)
/* flush wal stats */
(void) pgstat_wal_flush_cb(nowait);
+ pgstat_flush_backend(nowait, PGSTAT_BACKEND_FLUSH_WAL);
/* flush IO stats */
pgstat_flush_io(nowait);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 68830db8633d..85061e29bd1b 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1606,8 +1606,8 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
/*
* pg_stat_wal_build_tuple
*
- * Helper routine for pg_stat_get_wal() returning one tuple based on the
- * contents of wal_counters.
+ * Helper routine for pg_stat_get_wal() and pg_stat_get_backend_wal()
+ * returning one tuple based on the contents of wal_counters.
*/
static Datum
pg_stat_wal_build_tuple(PgStat_WalCounters wal_counters,
@@ -1656,6 +1656,28 @@ pg_stat_wal_build_tuple(PgStat_WalCounters wal_counters,
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
}
+/*
+ * Returns WAL statistics for a backend with given PID.
+ */
+Datum
+pg_stat_get_backend_wal(PG_FUNCTION_ARGS)
+{
+ int pid;
+ PgStat_Backend *backend_stats;
+ PgStat_WalCounters bktype_stats;
+
+ pid = PG_GETARG_INT32(0);
+ backend_stats = pgstat_fetch_stat_backend_by_pid(pid, NULL);
+
+ if (!backend_stats)
+ PG_RETURN_NULL();
+
+ bktype_stats = backend_stats->wal_counters;
+
+ /* save tuples with data from this PgStat_WalCounters */
+ return (pg_stat_wal_build_tuple(bktype_stats, backend_stats->stat_reset_timestamp));
+}
+
/*
* Returns statistics of WAL activity
*/
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 093e6368dbbe..b3c303c98cb5 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -832,6 +832,8 @@ SELECT sessions > :db_stat_sessions FROM pg_stat_database WHERE datname = (SELEC
SELECT num_requested AS rqst_ckpts_before FROM pg_stat_checkpointer \gset
-- Test pg_stat_wal (and make a temp table so our temp schema exists)
SELECT wal_bytes AS wal_bytes_before FROM pg_stat_wal \gset
+-- Test pg_stat_get_backend_wal (and make a temp table so our temp schema exists)
+SELECT wal_bytes AS backend_wal_bytes_before from pg_stat_get_backend_wal(pg_backend_pid()) \gset
CREATE TEMP TABLE test_stats_temp AS SELECT 17;
DROP TABLE test_stats_temp;
-- Checkpoint twice: The checkpointer reports stats after reporting completion
@@ -851,6 +853,18 @@ SELECT wal_bytes > :wal_bytes_before FROM pg_stat_wal;
t
(1 row)
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush
+--------------------------
+
+(1 row)
+
+SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid());
+ ?column?
+----------
+ t
+(1 row)
+
-- Test pg_stat_get_backend_idset() and some allied functions.
-- In particular, verify that their notion of backend ID matches
-- our temp schema index.
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 0a44e14d9f4a..ad3f7b7e66ae 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -423,6 +423,9 @@ SELECT num_requested AS rqst_ckpts_before FROM pg_stat_checkpointer \gset
-- Test pg_stat_wal (and make a temp table so our temp schema exists)
SELECT wal_bytes AS wal_bytes_before FROM pg_stat_wal \gset
+-- Test pg_stat_get_backend_wal (and make a temp table so our temp schema exists)
+SELECT wal_bytes AS backend_wal_bytes_before from pg_stat_get_backend_wal(pg_backend_pid()) \gset
+
CREATE TEMP TABLE test_stats_temp AS SELECT 17;
DROP TABLE test_stats_temp;
@@ -435,6 +438,9 @@ CHECKPOINT;
SELECT num_requested > :rqst_ckpts_before FROM pg_stat_checkpointer;
SELECT wal_bytes > :wal_bytes_before FROM pg_stat_wal;
+SELECT pg_stat_force_next_flush();
+SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid());
+
-- Test pg_stat_get_backend_idset() and some allied functions.
-- In particular, verify that their notion of backend ID matches
-- our temp schema index.
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 9178f1d34efd..f4c37c811ba0 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4860,6 +4860,25 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para></entry>
</row>
+ <row>
+ <entry id="pg-stat-get-backend-wal" role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_stat_get_backend_wal</primary>
+ </indexterm>
+ <function>pg_stat_get_backend_wal</function> ( <type>integer</type> )
+ <returnvalue>record</returnvalue>
+ </para>
+ <para>
+ Returns WAL statistics about the backend with the specified
+ process ID. The output fields are exactly the same as the ones in the
+ <structname>pg_stat_wal</structname> view.
+ </para>
+ <para>
+ The function does not return WAL statistics for the checkpointer,
+ the background writer, the startup process and the autovacuum launcher.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
--
2.47.2
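pgstat_flush_backend_entry_wal() does not keep separate pending WAL counters. It compares the process-wide pgWalUsage with the snapshot saved in prevBackendWalUsage, adds the difference to the shared entry, and then advances the snapshot. pgstat_create_backend() also seeds the snapshot, so WAL generated before the backend's entry exists is not attributed to it. The sketch below is a standalone model of that snapshot-and-diff scheme. The Model* names are hypothetical mirrors of WalUsage, pgWalUsage, prevBackendWalUsage and the shared wal_counters, and locking is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the WalUsage fields the patch accumulates. */
typedef struct ModelWalUsage
{
    int64_t  wal_records;
    int64_t  wal_fpi;
    uint64_t wal_bytes;
    int64_t  wal_buffers_full;
} ModelWalUsage;

static ModelWalUsage cur_usage;     /* stand-in for pgWalUsage */
static ModelWalUsage prev_usage;    /* stand-in for prevBackendWalUsage */
static ModelWalUsage shared_totals; /* stand-in for the shared wal_counters */

/* Seed the snapshot when the entry is created (cf. pgstat_create_backend()). */
static void
model_create_backend(void)
{
    shared_totals = (ModelWalUsage) {0};
    prev_usage = cur_usage;
}

/* Same cheap test as pgstat_backend_wal_have_pending(): any new records? */
static int
model_wal_have_pending(void)
{
    return cur_usage.wal_records != prev_usage.wal_records;
}

/* Add (current - previous) to the shared totals, then advance the snapshot. */
static void
model_flush_backend_wal(void)
{
    if (!model_wal_have_pending())
        return;

    shared_totals.wal_records += cur_usage.wal_records - prev_usage.wal_records;
    shared_totals.wal_fpi += cur_usage.wal_fpi - prev_usage.wal_fpi;
    shared_totals.wal_bytes += cur_usage.wal_bytes - prev_usage.wal_bytes;
    shared_totals.wal_buffers_full +=
        cur_usage.wal_buffers_full - prev_usage.wal_buffers_full;

    prev_usage = cur_usage;
}
```

Because only the snapshot moves, the global counters stay untouched and can still be consumed independently by the pg_stat_wal code path.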
Hi,
On Fri, Feb 28, 2025 at 11:34:14AM +0900, Michael Paquier wrote:
On Thu, Feb 27, 2025 at 07:47:09AM +0000, Bertrand Drouvot wrote:
That's how I did it initially but decided to move it to pgstat_backend.c. The
reason was that it's fully linked to "per backend" stats and that there is
no SQL api on top of it (while I think that's the case for almost all the ones
in pgstatfuncs.c). Thoughts?
Okay by me. With pgstat_fetch_stat_backend() there in parallel, why not
expose this part as well? Perhaps that could be useful for some
extension. I'd rather have out-of-core code do these lookups with the
same sanity checks in place for the procnumber and slot lookups.
Yeah, that's also a pro for it.
The name was inconsistent with the rest of the file, so I have settled
on pgstat_fetch_stat_backend_by_pid() to be more consistent.
Sounds good, thanks!
A
second thing is to properly initialize bktype if defined by the
caller.
Saw that in c2a50ac678e; makes sense.
Attached is a rebased version of the rest.
The rebased version looks ok.
Also attaching the patch I mentioned up-thread to address some of Rahila's
comments [1]: it adds an AuxiliaryPidGetProc() call in pgstat_fetch_stat_backend_by_pid()
and pg_stat_reset_backend_stats(). I think that makes sense since a051e71e28a
modified pgstat_tracks_backend_bktype() for B_WAL_RECEIVER, B_WAL_SUMMARIZER
and B_WAL_WRITER.
It does not look like it needs doc updates. Attached as 0002, as it's somewhat
unrelated to this thread (though I'm not sure it deserves its own thread).
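Regular backends and auxiliary processes are found through different lookups (BackendPidGetProc() vs AuxiliaryPidGetProc()). A PID-based lookup that only consults the first would therefore miss auxiliary processes that pgstat_tracks_backend_bktype() now tracks. The sketch below is a hypothetical model of the fallback described above. Two arrays stand in for the two lookups, and the tracked types (regular backend and WAL writer, but not checkpointer) are assumptions chosen to match the thread and the documentation text. None of these names are PostgreSQL APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical process types for this model only. */
typedef enum
{
    AUXM_BACKEND,
    AUXM_WAL_WRITER,
    AUXM_CHECKPOINTER
} AuxModelType;

typedef struct AuxModelProc
{
    int          pid;
    AuxModelType type;
} AuxModelProc;

/* Separate arrays model BackendPidGetProc() vs AuxiliaryPidGetProc(). */
static AuxModelProc regular_procs[] = {{101, AUXM_BACKEND}};
static AuxModelProc aux_procs[] = {{201, AUXM_WAL_WRITER}, {202, AUXM_CHECKPOINTER}};

static const AuxModelProc *
find_in(const AuxModelProc *procs, size_t n, int pid)
{
    for (size_t i = 0; i < n; i++)
        if (procs[i].pid == pid)
            return &procs[i];
    return NULL;
}

/* Which process types keep per-backend stats in this model. */
static int
aux_model_tracks(AuxModelType type)
{
    return type == AUXM_BACKEND || type == AUXM_WAL_WRITER;
}

/* Returns 1 if per-backend stats should be reported for this PID. */
static int
aux_model_has_backend_stats(int pid)
{
    const AuxModelProc *proc =
        find_in(regular_procs, sizeof(regular_procs) / sizeof(regular_procs[0]), pid);

    /* Fall back to auxiliary processes, mirroring the AuxiliaryPidGetProc() call in 0002. */
    if (proc == NULL)
        proc = find_in(aux_procs, sizeof(aux_procs) / sizeof(aux_procs[0]), pid);

    return proc != NULL && aux_model_tracks(proc->type);
}
```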
[1]: /messages/by-id/CAH2L28v9BwN8_y0k6FQ591=0g2Hj_esHLGj3bP38c9nmVykoiA@mail.gmail.com
[2]: /messages/by-id/Z8FMjlyNpNicucGa@paquier.xyz
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v12-0001-per-backend-WAL-statistics.patch (text/x-diff; charset=us-ascii)
From 38ca4b60869a6c96c22e6bdfd33cac07827cef88 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 6 Jan 2025 10:00:00 +0000
Subject: [PATCH v12 1/2] per backend WAL statistics
Now that commit 9aea73fc61 added backend-level statistics to pgstats (and
per backend IO statistics) we can more easily add per backend statistics.
This commit adds per backend WAL statistics using the same layer as pg_stat_wal,
except that it is now possible to know how much WAL activity is happening in each
backend rather than an overall aggregate of all the activity. A function called
pg_stat_get_backend_wal() is added to access this data depending on the
PID of a backend.
The same limitation as in 9aea73fc61 persists, meaning that Auxiliary processes
are not included in this set of statistics.
XXX: bump catalog version
XXX: bump of stats file format not required, as backend stats do not
persist on disk.
---
doc/src/sgml/monitoring.sgml | 19 ++++++
src/backend/utils/activity/pgstat_backend.c | 64 +++++++++++++++++++++
src/backend/utils/activity/pgstat_wal.c | 1 +
src/backend/utils/adt/pgstatfuncs.c | 26 ++++++++-
src/include/catalog/pg_proc.dat | 7 +++
src/include/pgstat.h | 37 ++++++------
src/include/utils/pgstat_internal.h | 3 +-
src/test/regress/expected/stats.out | 14 +++++
src/test/regress/sql/stats.sql | 6 ++
9 files changed, 156 insertions(+), 21 deletions(-)
15.9% doc/src/sgml/
39.3% src/backend/utils/activity/
15.6% src/backend/utils/adt/
8.8% src/include/catalog/
4.5% src/include/utils/
8.4% src/test/regress/expected/
6.4% src/test/regress/sql/
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 9178f1d34ef..f4c37c811ba 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4860,6 +4860,25 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para></entry>
</row>
+ <row>
+ <entry id="pg-stat-get-backend-wal" role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_stat_get_backend_wal</primary>
+ </indexterm>
+ <function>pg_stat_get_backend_wal</function> ( <type>integer</type> )
+ <returnvalue>record</returnvalue>
+ </para>
+ <para>
+ Returns WAL statistics about the backend with the specified
+ process ID. The output fields are exactly the same as the ones in the
+ <structname>pg_stat_wal</structname> view.
+ </para>
+ <para>
+ The function does not return WAL statistics for the checkpointer,
+ the background writer, the startup process and the autovacuum launcher.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
index 3c9ebbcd69c..641ba27c95b 100644
--- a/src/backend/utils/activity/pgstat_backend.c
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -38,6 +38,14 @@
*/
static PgStat_BackendPending PendingBackendStats;
+/*
+ * WAL usage counters saved from pgWalUsage at the previous call to
+ * pgstat_report_wal(). This is used to calculate how much WAL usage
+ * happens between pgstat_report_wal() calls, by subtracting
+ * the previous counters from the current ones.
+ */
+static WalUsage prevBackendWalUsage;
+
/*
* Utility routines to report I/O stats for backends, kept here to avoid
* exposing PendingBackendStats to the outside world.
@@ -185,6 +193,57 @@ pgstat_flush_backend_entry_io(PgStat_EntryRef *entry_ref)
MemSet(&PendingBackendStats.pending_io, 0, sizeof(PgStat_PendingIO));
}
+/*
+ * To determine whether WAL usage happened.
+ */
+static bool
+pgstat_backend_wal_have_pending(void)
+{
+ return pgWalUsage.wal_records != prevBackendWalUsage.wal_records;
+}
+
+/*
+ * Flush out locally pending backend WAL statistics. Locking is managed
+ * by the caller.
+ */
+static void
+pgstat_flush_backend_entry_wal(PgStat_EntryRef *entry_ref)
+{
+ PgStatShared_Backend *shbackendent;
+ PgStat_WalCounters *bktype_shstats;
+ WalUsage wal_usage_diff = {0};
+
+ /*
+ * This function can be called even if nothing at all has happened. Avoid
+ * taking lock for nothing in that case.
+ */
+ if (!pgstat_backend_wal_have_pending())
+ return;
+
+ shbackendent = (PgStatShared_Backend *) entry_ref->shared_stats;
+ bktype_shstats = &shbackendent->stats.wal_counters;
+
+ /*
+ * We don't update the WAL usage portion of the local WalStats elsewhere.
+ * Calculate how much WAL usage counters were increased by subtracting the
+ * previous counters from the current ones.
+ */
+ WalUsageAccumDiff(&wal_usage_diff, &pgWalUsage, &prevBackendWalUsage);
+
+#define WALSTAT_ACC(fld, var_to_add) \
+ (bktype_shstats->fld += var_to_add.fld)
+ WALSTAT_ACC(wal_buffers_full, wal_usage_diff);
+ WALSTAT_ACC(wal_records, wal_usage_diff);
+ WALSTAT_ACC(wal_fpi, wal_usage_diff);
+ WALSTAT_ACC(wal_bytes, wal_usage_diff);
+#undef WALSTAT_ACC
+
+ /*
+ * Save the current counters for the subsequent calculation of WAL usage.
+ */
+ prevBackendWalUsage = pgWalUsage;
+}
+
/*
* Flush out locally pending backend statistics
*
@@ -212,6 +271,9 @@ pgstat_flush_backend(bool nowait, bits32 flags)
if (flags & PGSTAT_BACKEND_FLUSH_IO)
pgstat_flush_backend_entry_io(entry_ref);
+ if (flags & PGSTAT_BACKEND_FLUSH_WAL)
+ pgstat_flush_backend_entry_wal(entry_ref);
+
pgstat_unlock_entry(entry_ref);
return false;
@@ -259,6 +321,8 @@ pgstat_create_backend(ProcNumber procnum)
pgstat_unlock_entry(entry_ref);
MemSet(&PendingBackendStats, 0, sizeof(PgStat_BackendPending));
+
+ prevBackendWalUsage = pgWalUsage;
}
/*
diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c
index 943be0cbeef..c1c2e6dc386 100644
--- a/src/backend/utils/activity/pgstat_wal.c
+++ b/src/backend/utils/activity/pgstat_wal.c
@@ -52,6 +52,7 @@ pgstat_report_wal(bool force)
/* flush wal stats */
(void) pgstat_wal_flush_cb(nowait);
+ pgstat_flush_backend(nowait, PGSTAT_BACKEND_FLUSH_WAL);
/* flush IO stats */
pgstat_flush_io(nowait);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 68830db8633..85061e29bd1 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1606,8 +1606,8 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
/*
* pg_stat_wal_build_tuple
*
- * Helper routine for pg_stat_get_wal() returning one tuple based on the
- * contents of wal_counters.
+ * Helper routine for pg_stat_get_wal() and pg_stat_get_backend_wal()
+ * returning one tuple based on the contents of wal_counters.
*/
static Datum
pg_stat_wal_build_tuple(PgStat_WalCounters wal_counters,
@@ -1656,6 +1656,28 @@ pg_stat_wal_build_tuple(PgStat_WalCounters wal_counters,
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
}
+/*
+ * Returns WAL statistics for a backend with given PID.
+ */
+Datum
+pg_stat_get_backend_wal(PG_FUNCTION_ARGS)
+{
+ int pid;
+ PgStat_Backend *backend_stats;
+ PgStat_WalCounters bktype_stats;
+
+ pid = PG_GETARG_INT32(0);
+ backend_stats = pgstat_fetch_stat_backend_by_pid(pid, NULL);
+
+ if (!backend_stats)
+ PG_RETURN_NULL();
+
+ bktype_stats = backend_stats->wal_counters;
+
+ /* save tuples with data from this PgStat_WalCounters */
+ return (pg_stat_wal_build_tuple(bktype_stats, backend_stats->stat_reset_timestamp));
+}
+
/*
* Returns statistics of WAL activity
*/
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cd9422d0bac..3e35f8b8e99 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5954,6 +5954,13 @@
proargmodes => '{o,o,o,o,o}',
proargnames => '{wal_records,wal_fpi,wal_bytes,wal_buffers_full,stats_reset}',
prosrc => 'pg_stat_get_wal' },
+{ oid => '8037', descr => 'statistics: backend WAL activity',
+ proname => 'pg_stat_get_backend_wal', provolatile => 'v',
+ proparallel => 'r', prorettype => 'record', proargtypes => 'int4',
+ proallargtypes => '{int4,int8,int8,numeric,int8,timestamptz}',
+ proargmodes => '{i,o,o,o,o,o}',
+ proargnames => '{backend_pid,wal_records,wal_fpi,wal_bytes,wal_buffers_full,stats_reset}',
+ prosrc => 'pg_stat_get_backend_wal' },
{ oid => '6248', descr => 'statistics: information about WAL prefetching',
proname => 'pg_stat_get_recovery_prefetch', prorows => '1', proretset => 't',
provolatile => 'v', prorettype => 'record', proargtypes => '',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 4aad10b0b6d..06359b9157d 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -340,24 +340,6 @@ typedef struct PgStat_IO
PgStat_BktypeIO stats[BACKEND_NUM_TYPES];
} PgStat_IO;
-typedef struct PgStat_Backend
-{
- TimestampTz stat_reset_timestamp;
- PgStat_BktypeIO io_stats;
-} PgStat_Backend;
-
-/* ---------
- * PgStat_BackendPending Non-flushed backend stats.
- * ---------
- */
-typedef struct PgStat_BackendPending
-{
- /*
- * Backend statistics store the same amount of IO data as PGSTAT_KIND_IO.
- */
- PgStat_PendingIO pending_io;
-} PgStat_BackendPending;
-
typedef struct PgStat_StatDBEntry
{
PgStat_Counter xact_commit;
@@ -500,6 +482,25 @@ typedef struct PgStat_WalStats
TimestampTz stat_reset_timestamp;
} PgStat_WalStats;
+typedef struct PgStat_Backend
+{
+ TimestampTz stat_reset_timestamp;
+ PgStat_BktypeIO io_stats;
+ PgStat_WalCounters wal_counters;
+} PgStat_Backend;
+
+/* ---------
+ * PgStat_BackendPending Non-flushed backend stats.
+ * ---------
+ */
+typedef struct PgStat_BackendPending
+{
+ /*
+ * Backend statistics store the same amount of IO data as PGSTAT_KIND_IO.
+ */
+ PgStat_PendingIO pending_io;
+} PgStat_BackendPending;
+
/*
* Functions in pgstat.c
*/
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 36d228e3558..d5557e6e998 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -622,7 +622,8 @@ extern void pgstat_archiver_snapshot_cb(void);
/* flags for pgstat_flush_backend() */
#define PGSTAT_BACKEND_FLUSH_IO (1 << 0) /* Flush I/O statistics */
-#define PGSTAT_BACKEND_FLUSH_ALL (PGSTAT_BACKEND_FLUSH_IO)
+#define PGSTAT_BACKEND_FLUSH_WAL (1 << 1) /* Flush WAL statistics */
+#define PGSTAT_BACKEND_FLUSH_ALL (PGSTAT_BACKEND_FLUSH_IO | PGSTAT_BACKEND_FLUSH_WAL)
extern bool pgstat_flush_backend(bool nowait, bits32 flags);
extern bool pgstat_backend_flush_cb(bool nowait);
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 093e6368dbb..b3c303c98cb 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -832,6 +832,8 @@ SELECT sessions > :db_stat_sessions FROM pg_stat_database WHERE datname = (SELEC
SELECT num_requested AS rqst_ckpts_before FROM pg_stat_checkpointer \gset
-- Test pg_stat_wal (and make a temp table so our temp schema exists)
SELECT wal_bytes AS wal_bytes_before FROM pg_stat_wal \gset
+-- Test pg_stat_get_backend_wal (and make a temp table so our temp schema exists)
+SELECT wal_bytes AS backend_wal_bytes_before from pg_stat_get_backend_wal(pg_backend_pid()) \gset
CREATE TEMP TABLE test_stats_temp AS SELECT 17;
DROP TABLE test_stats_temp;
-- Checkpoint twice: The checkpointer reports stats after reporting completion
@@ -851,6 +853,18 @@ SELECT wal_bytes > :wal_bytes_before FROM pg_stat_wal;
t
(1 row)
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush
+--------------------------
+
+(1 row)
+
+SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid());
+ ?column?
+----------
+ t
+(1 row)
+
-- Test pg_stat_get_backend_idset() and some allied functions.
-- In particular, verify that their notion of backend ID matches
-- our temp schema index.
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 0a44e14d9f4..ad3f7b7e66a 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -423,6 +423,9 @@ SELECT num_requested AS rqst_ckpts_before FROM pg_stat_checkpointer \gset
-- Test pg_stat_wal (and make a temp table so our temp schema exists)
SELECT wal_bytes AS wal_bytes_before FROM pg_stat_wal \gset
+-- Test pg_stat_get_backend_wal (and make a temp table so our temp schema exists)
+SELECT wal_bytes AS backend_wal_bytes_before from pg_stat_get_backend_wal(pg_backend_pid()) \gset
+
CREATE TEMP TABLE test_stats_temp AS SELECT 17;
DROP TABLE test_stats_temp;
@@ -435,6 +438,9 @@ CHECKPOINT;
SELECT num_requested > :rqst_ckpts_before FROM pg_stat_checkpointer;
SELECT wal_bytes > :wal_bytes_before FROM pg_stat_wal;
+SELECT pg_stat_force_next_flush();
+SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid());
+
-- Test pg_stat_get_backend_idset() and some allied functions.
-- In particular, verify that their notion of backend ID matches
-- our temp schema index.
--
2.34.1
v12-0002-Add-backend-type-check-in-pgstat_fetch_stat_back.patch (text/x-diff; charset=us-ascii)
From 146d2ed10e43053174e0688d80481a2beca5379b Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Fri, 28 Feb 2025 06:17:59 +0000
Subject: [PATCH v12 2/2] Add backend type check in
pgstat_fetch_stat_backend_by_pid() and pg_stat_reset_backend_stats()
This is especially relevant since a051e71e28a modified pgstat_tracks_backend_bktype()
for B_WAL_RECEIVER, B_WAL_SUMMARIZER and B_WAL_WRITER.
---
src/backend/utils/activity/pgstat_backend.c | 19 ++++++++-----------
src/backend/utils/activity/pgstat_wal.c | 1 +
src/backend/utils/adt/pgstatfuncs.c | 21 +++++++++++++++------
3 files changed, 24 insertions(+), 17 deletions(-)
46.3% src/backend/utils/activity/
53.6% src/backend/utils/adt/
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
index 641ba27c95b..13c4f3f29d6 100644
--- a/src/backend/utils/activity/pgstat_backend.c
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -111,18 +111,19 @@ pgstat_fetch_stat_backend_by_pid(int pid, BackendType *bktype)
if (bktype)
*bktype = B_INVALID;
- /*
- * This could be an auxiliary process but these do not report backend
- * statistics due to pgstat_tracks_backend_bktype(), so there is no need
- * for an extra call to AuxiliaryPidGetProc().
- */
+ /* This could be an auxiliary process */
if (!proc)
- return NULL;
+ {
+ proc = AuxiliaryPidGetProc(pid);
+ if (!proc)
+ return NULL;
+ }
procNumber = GetNumberFromPGProc(proc);
beentry = pgstat_get_beentry_by_proc_number(procNumber);
- if (!beentry)
+ /* Check if the backend type tracks statistics */
+ if (!beentry || !pgstat_tracks_backend_bktype(beentry->st_backendType))
return NULL;
backend_stats = pgstat_fetch_stat_backend(procNumber);
@@ -133,10 +134,6 @@ pgstat_fetch_stat_backend_by_pid(int pid, BackendType *bktype)
if (beentry->st_procpid != pid)
return NULL;
- /* backend may be gone, so recheck in case */
- if (beentry->st_backendType == B_INVALID)
- return NULL;
-
if (bktype)
*bktype = beentry->st_backendType;
diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c
index c1c2e6dc386..16a1ecb4d90 100644
--- a/src/backend/utils/activity/pgstat_wal.c
+++ b/src/backend/utils/activity/pgstat_wal.c
@@ -56,6 +56,7 @@ pgstat_report_wal(bool force)
/* flush IO stats */
pgstat_flush_io(nowait);
+ (void) pgstat_flush_backend(nowait, PGSTAT_BACKEND_FLUSH_IO);
}
/*
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 85061e29bd1..9ab0429e749 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1941,19 +1941,28 @@ Datum
pg_stat_reset_backend_stats(PG_FUNCTION_ARGS)
{
PGPROC *proc;
+ PgBackendStatus *beentry;
+ ProcNumber procNumber;
int backend_pid = PG_GETARG_INT32(0);
proc = BackendPidGetProc(backend_pid);
- /*
- * This could be an auxiliary process but these do not report backend
- * statistics due to pgstat_tracks_backend_bktype(), so there is no need
- * for an extra call to AuxiliaryPidGetProc().
- */
+ /* This could be an auxiliary process */
if (!proc)
+ {
+ proc = AuxiliaryPidGetProc(backend_pid);
+ if (!proc)
+ PG_RETURN_VOID();
+ }
+
+ procNumber = GetNumberFromPGProc(proc);
+
+ beentry = pgstat_get_beentry_by_proc_number(procNumber);
+ /* Check if the backend type tracks statistics */
+ if (!beentry || !pgstat_tracks_backend_bktype(beentry->st_backendType))
PG_RETURN_VOID();
- pgstat_reset(PGSTAT_KIND_BACKEND, InvalidOid, GetNumberFromPGProc(proc));
+ pgstat_reset(PGSTAT_KIND_BACKEND, InvalidOid, procNumber);
PG_RETURN_VOID();
}
--
2.34.1
On Fri, Feb 28, 2025 at 09:26:08AM +0000, Bertrand Drouvot wrote:
Also attaching the patch I mentioned up-thread to address some of Rahila's
comments ([1]): It adds an AuxiliaryPidGetProc() call in pgstat_fetch_stat_backend_by_pid()
and pg_stat_reset_backend_stats(). I think that fully makes sense since a051e71e28a
modified pgstat_tracks_backend_bktype() for B_WAL_RECEIVER, B_WAL_SUMMARIZER
and B_WAL_WRITER.
Oops, yes, you are right on this one. This change should have
happened earlier. The flow you are using in 0002 is similar to
pg_log_backend_memory_contexts(), which looks OK at a quick glance.
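For illustration, the lookup order used in 0002 can be modeled as follows. It checks regular backends first, then auxiliary processes, and finally whether the backend type is tracked. This is a self-contained sketch, not PostgreSQL code. ProcSketch, find_in(), tracks_backend_type() and lookup_tracked_proc() are hypothetical stand-ins for BackendPidGetProc(), AuxiliaryPidGetProc() and pgstat_tracks_backend_bktype(). Treating the checkpointer as untracked is an assumption made only for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical macro: number of elements in a fixed-size array. */
#define NPROCS(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical, simplified backend types (not PostgreSQL's BackendType). */
typedef enum
{
	BT_BACKEND,
	BT_WAL_WRITER,
	BT_CHECKPOINTER
} BackendTypeSketch;

typedef struct
{
	int			pid;
	BackendTypeSketch type;
} ProcSketch;

/* Hypothetical process tables standing in for the PGPROC arrays. */
static ProcSketch regular_procs[] = {{100, BT_BACKEND}};
static ProcSketch aux_procs[] = {{200, BT_WAL_WRITER}, {300, BT_CHECKPOINTER}};

/* Linear search of one table by PID; NULL if absent. */
static const ProcSketch *
find_in(const ProcSketch *procs, size_t n, int pid)
{
	for (size_t i = 0; i < n; i++)
		if (procs[i].pid == pid)
			return &procs[i];
	return NULL;
}

/* Assumption for this sketch only: the checkpointer is not tracked. */
static bool
tracks_backend_type(BackendTypeSketch type)
{
	return type != BT_CHECKPOINTER;
}

/*
 * Same shape as the 0002 flow: regular backends first, fall back to
 * auxiliary processes, then filter on whether the type is tracked.
 */
static const ProcSketch *
lookup_tracked_proc(int pid)
{
	const ProcSketch *proc;

	proc = find_in(regular_procs, NPROCS(regular_procs), pid);
	if (proc == NULL)
		proc = find_in(aux_procs, NPROCS(aux_procs), pid);
	if (proc == NULL || !tracks_backend_type(proc->type))
		return NULL;
	return proc;
}
```

In this model, PID 200 (an auxiliary WAL writer) is now found. PID 300 is found but filtered out by the backend-type check, so tweaking tracks_backend_type() alone changes the result.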
--
Michael
On Fri, Feb 28, 2025 at 09:26:08AM +0000, Bertrand Drouvot wrote:
Also attaching the patch I mentioned up-thread to address some of Rahila's
comments ([1]): It adds an AuxiliaryPidGetProc() call in pgstat_fetch_stat_backend_by_pid()
and pg_stat_reset_backend_stats(). I think that fully makes sense since a051e71e28a
modified pgstat_tracks_backend_bktype() for B_WAL_RECEIVER, B_WAL_SUMMARIZER
and B_WAL_WRITER.
Okay by me as it makes the code automatically more flexible if
pgstat_tracks_backend_bktype() gets tweaked, including the call of
pgstat_flush_backend() in pgstat_report_wal() so that the WAL writer is
able to report backend stats for its WAL I/O. Applied this part as of
3f1db99bfabb.
Something that's still not quite right is that the WAL receiver and
the WAL summarizer do not call pgstat_report_wal() at all, so we don't
report much data while we expect these processes to run continuously.
The location where to report stats for the WAL summarizer is simple:
even if the system is aggressive with WAL, this is never called more
than a couple of times per second, like the WAL writer:
@@ -1541,6 +1542,10 @@ summarizer_read_local_xlog_page(XLogReaderState *state,
* so we don't tight-loop.
*/
HandleWalSummarizerInterrupts();
+
+ /* report pending statistics to the cumulative stats system */
+ pgstat_report_wal(false);
+
summarizer_wait_for_wal();
At this location, the WAL summarizer would wait as there is no data to
read. The hot path is when we're reading a block.
The WAL receiver is a different story, because the WaitLatchOrSocket()
call in the main loop of WalReceiverMain() is *very* aggressive, and
it's easy to reach this code dozens of times each millisecond. In
short, we need to be careful, I think, based on how this is currently
written. My choice is then this path:
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -583,6 +583,10 @@ WalReceiverMain(const void *startup_data, size_t startup_data_len)
*/
bool requestReply = false;
+ /* report pending statistics to the cumulative stats system */
+ pgstat_report_wal(false);
+
/*
* Check if time since last receive from prim
This would update the stats only when the WAL receiver has nothing to
do or if wal_receiver_status_interval is reached, so we're not going
to storm pgstats with updates, still we get some data on a periodic
basis *because* wal_receiver_status_interval would make sure that the
path is taken even if we're under a lot of WAL pressure when sending
feedback messages back to the WAL sender. Of course this needs a
pretty good comment explaining the choice of this location. What do
you think?
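A rough model of the reporting cadence described above is sketched below. It assumes a simplified loop where each iteration either has incoming data or not, and where a status update becomes due every status_interval_ms. ReceiverSketch and receiver_tick() are hypothetical names. The report counter only stands in for pgstat_report_wal(false). The real WalReceiverMain() loop (wait timeouts, feedback messages, and so on) is more involved than this.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state for a simplified WAL receiver loop. */
typedef struct
{
	long		last_status_ms;		/* time of last status update */
	long		status_interval_ms;	/* like wal_receiver_status_interval */
	int			reports;			/* stands in for pgstat_report_wal() calls */
} ReceiverSketch;

/*
 * One loop iteration.  On the hot path (data arriving, no status update
 * due) nothing is reported.  Stats are reported when the receiver is idle
 * or when a status update is due, so reports still happen periodically
 * under heavy WAL traffic without flooding pgstats.
 */
static void
receiver_tick(ReceiverSketch *r, long now_ms, bool have_data)
{
	bool		status_due = (now_ms - r->last_status_ms) >= r->status_interval_ms;

	if (have_data && !status_due)
		return;

	r->reports++;
	if (status_due)
		r->last_status_ms = now_ms;
}
```

Under constant load, this sketch reports at most about once per status interval. When idle, it reports once per wakeup.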
It looks like it does not need doc updates. Attached as 0002 as it's somewhat
unrelated to this thread (but not sure it deserves its own dedicated thread though).
I'm wondering if we should lift the restrictions further for the processes listed
in pgstat_tracks_backend_bktype() and include B_AUTOVAC_LAUNCHER,
B_STARTUP, B_CHECKPOINTER and B_BG_WRITER at this stage, removing the
entire paragraph. Not sure if we really have to do that for this
release, but we could look at that separately.
With 3f1db99bfabb in place, wouldn't it be simpler to update
pgstat_report_wal() in v12-0001 so that we use PGSTAT_BACKEND_FLUSH_ALL
with one call of pgstat_flush_backend()? This saves one call for each
stats flush.
--
Michael
Hi,
On Mon, Mar 03, 2025 at 10:48:23AM +0900, Michael Paquier wrote:
On Fri, Feb 28, 2025 at 09:26:08AM +0000, Bertrand Drouvot wrote:
Also attaching the patch I mentioned up-thread to address some of Rahila's
comments ([1]): It adds an AuxiliaryPidGetProc() call in pgstat_fetch_stat_backend_by_pid()
and pg_stat_reset_backend_stats(). I think that fully makes sense since a051e71e28a
modified pgstat_tracks_backend_bktype() for B_WAL_RECEIVER, B_WAL_SUMMARIZER
and B_WAL_WRITER.
Okay by me as it makes the code automatically more flexible if
pgstat_tracks_backend_bktype() gets tweaked, including the call of
pgstat_flush_backend() in pgstat_report_wal() so that the WAL writer is
able to report backend stats for its WAL I/O. Applied this part as of
3f1db99bfabb.
Thanks!
Something that's still not quite right is that the WAL receiver and
the WAL summarizer do not call pgstat_report_wal() at all, so we don't
report much data while we expect these processes to run continuously.
The location where to report stats for the WAL summarizer is simple:
even if the system is aggressive with WAL, this is never called more
than a couple of times per second, like the WAL writer:
@@ -1541,6 +1542,10 @@ summarizer_read_local_xlog_page(XLogReaderState *state,
* so we don't tight-loop.
*/
HandleWalSummarizerInterrupts();
+
+ /* report pending statistics to the cumulative stats system */
+ pgstat_report_wal(false);
+
summarizer_wait_for_wal();
At this location, the WAL summarizer would wait as there is no data to
read. The hot path is when we're reading a block.
Did not look closely enough but that sounds right after a quick look.
The WAL receiver is a different story, because the WaitLatchOrSocket()
call in the main loop of WalReceiverMain() is *very* aggressive, and
it's easy to reach this code dozens of times each millisecond. In
short, we need to be careful, I think, based on how this is currently
written. My choice is then this path:
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -583,6 +583,10 @@ WalReceiverMain(const void *startup_data, size_t startup_data_len)
*/
bool requestReply = false;
+ /* report pending statistics to the cumulative stats system */
+ pgstat_report_wal(false);
+
/*
* Check if time since last receive from prim
This would update the stats only when the WAL receiver has nothing to
do or if wal_receiver_status_interval is reached, so we're not going
to storm pgstats with updates, still we get some data on a periodic
basis *because* wal_receiver_status_interval would make sure that the
path is taken even if we're under a lot of WAL pressure when sending
feedback messages back to the WAL sender. Of course this needs a
pretty good comment explaining the choice of this location. What do
you think?
Same as above, that sounds right after a quick look.
It looks like it does not need doc updates. Attached as 0002 as it's somewhat
unrelated to this thread (but not sure it deserves its own dedicated thread though).
I'm wondering if we should lift the restrictions further for the processes listed
in pgstat_tracks_backend_bktype() and include B_AUTOVAC_LAUNCHER,
B_STARTUP, B_CHECKPOINTER and B_BG_WRITER at this stage, removing the
entire paragraph. Not sure if we really have to do that for this
release, but we could look at that separately.
hm, do you mean updating the comment on top of pgstat_tracks_backend_bktype() or
updating the documentation?
With 3f1db99bfabb in place, wouldn't it be simpler to update
pgstat_report_wal() in v12-0001 so that we use PGSTAT_BACKEND_FLUSH_ALL
with one call of pgstat_flush_backend()? This saves one call for each
stats flush.
hmm, that would work as long as PGSTAT_BACKEND_FLUSH_ALL represents things
that need to be called from pgstat_report_wal(). But I think that opens the
door to issues should we add a new PGSTAT_BACKEND_FLUSH_XXX where XXX is not
related to pgstat_report_wal() at all. So, I'm tempted to keep it as it is.
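The trade-off can be made concrete with a minimal sketch of flag-based dispatch that mirrors PGSTAT_BACKEND_FLUSH_IO, PGSTAT_BACKEND_FLUSH_WAL and PGSTAT_BACKEND_FLUSH_ALL. The macro and function names below are hypothetical. The counters only record which flush routine would run. With explicit flags, the caller's scope stays fixed. A call using the ALL mask would silently pick up any flag added to that mask later.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits mirroring PGSTAT_BACKEND_FLUSH_* */
#define SKETCH_FLUSH_IO		(1u << 0)
#define SKETCH_FLUSH_WAL	(1u << 1)
#define SKETCH_FLUSH_ALL	(SKETCH_FLUSH_IO | SKETCH_FLUSH_WAL)

/* Record which (hypothetical) flush routines ran. */
static int	io_flushes;
static int	wal_flushes;

/* Dispatch to each flush routine whose bit is set in flags. */
static void
flush_backend_sketch(uint32_t flags)
{
	if (flags & SKETCH_FLUSH_IO)
		io_flushes++;
	if (flags & SKETCH_FLUSH_WAL)
		wal_flushes++;
}
```

Today, flush_backend_sketch(SKETCH_FLUSH_IO | SKETCH_FLUSH_WAL) and flush_backend_sketch(SKETCH_FLUSH_ALL) behave identically. They would diverge as soon as a third bit joins the ALL mask.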
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Hi,
On Mon, Mar 03, 2025 at 09:17:30AM +0000, Bertrand Drouvot wrote:
hmm, that would work as long as PGSTAT_BACKEND_FLUSH_ALL represents things
that need to be called from pgstat_report_wal(). But I think that opens the
door to issues should we add a new PGSTAT_BACKEND_FLUSH_XXX where XXX is not
related to pgstat_report_wal() at all. So, I'm tempted to keep it as it is.
I just realized that pgstat_flush_backend() is not correct in 0001. Indeed
we check:
"
if (pg_memory_is_all_zeros(&PendingBackendStats,
sizeof(struct PgStat_BackendPending)))
return false;
"
but the WAL stats are not part of PgStat_BackendPending, so we only check
for pending IO stats here. I'm not sure WAL stats could be non-empty while IO
stats are empty, but the attached now also takes care of pending WAL stats here.
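The following is a simplified model of both the issue and the fix. The "anything pending?" test has to consider both the pending IO struct and the WAL usage delta. The WAL flush then adds the difference since the previous snapshot and re-snapshots, as pgstat_flush_backend_entry_wal() does with prevBackendWalUsage. All types and functions below are hypothetical stand-ins that keep only three WAL counters. Having flush_sketch() return true when something was flushed is a convention of this sketch only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for WalUsage and PgStat_PendingIO. */
typedef struct
{
	int64_t		wal_records;
	int64_t		wal_fpi;
	uint64_t	wal_bytes;
} WalUsageSketch;

typedef struct
{
	int64_t		io_counts[4];
} PendingIOSketch;

static WalUsageSketch cur_wal;		/* like pgWalUsage: only grows */
static WalUsageSketch prev_wal;		/* like prevBackendWalUsage */
static PendingIOSketch pending_io;	/* like PendingBackendStats.pending_io */
static WalUsageSketch shared_wal;	/* like the shared backend entry */

static bool
all_zeros(const void *ptr, size_t len)
{
	const unsigned char *p = ptr;

	for (size_t i = 0; i < len; i++)
		if (p[i] != 0)
			return false;
	return true;
}

/* Any WAL activity since the last snapshot? */
static bool
wal_have_pending(void)
{
	return cur_wal.wal_records != prev_wal.wal_records;
}

static bool
flush_sketch(void)
{
	/* Checking pending_io alone would miss WAL-only activity. */
	if (all_zeros(&pending_io, sizeof(pending_io)) && !wal_have_pending())
		return false;

	/* Accumulate the delta since the previous flush, then re-snapshot. */
	shared_wal.wal_records += cur_wal.wal_records - prev_wal.wal_records;
	shared_wal.wal_fpi += cur_wal.wal_fpi - prev_wal.wal_fpi;
	shared_wal.wal_bytes += cur_wal.wal_bytes - prev_wal.wal_bytes;
	prev_wal = cur_wal;

	memset(&pending_io, 0, sizeof(pending_io));
	return true;
}
```

Because only the delta is added, repeated flushes never double-count. A flush with no new activity is a no-op.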
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v13-0001-per-backend-WAL-statistics.patch (text/x-diff; charset=us-ascii)
From 40100d26d7b6d69367de8670f5df75abbfcb0d57 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 6 Jan 2025 10:00:00 +0000
Subject: [PATCH v13] per backend WAL statistics
Now that commit 9aea73fc61 added backend-level statistics to pgstats (and
per backend IO statistics) we can more easily add per backend statistics.
This commit adds per backend WAL statistics using the same layer as pg_stat_wal,
except that it is now possible to know how much WAL activity is happening in each
backend rather than an overall aggregate of all the activity. A function called
pg_stat_get_backend_wal() is added to access this data depending on the
PID of a backend.
The same limitation as in 9aea73fc61 persists, meaning that Auxiliary processes
are not included in this set of statistics.
XXX: bump catalog version
XXX: bump of stats file format not required, as backend stats do not
persist on disk.
---
doc/src/sgml/monitoring.sgml | 19 ++++++
src/backend/utils/activity/pgstat_backend.c | 67 ++++++++++++++++++++-
src/backend/utils/activity/pgstat_wal.c | 1 +
src/backend/utils/adt/pgstatfuncs.c | 26 +++++++-
src/include/catalog/pg_proc.dat | 7 +++
src/include/pgstat.h | 37 ++++++------
src/include/utils/pgstat_internal.h | 3 +-
src/test/regress/expected/stats.out | 14 +++++
src/test/regress/sql/stats.sql | 6 ++
9 files changed, 158 insertions(+), 22 deletions(-)
15.5% doc/src/sgml/
41.0% src/backend/utils/activity/
15.2% src/backend/utils/adt/
8.5% src/include/catalog/
4.4% src/include/utils/
8.2% src/test/regress/expected/
6.2% src/test/regress/sql/
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 9178f1d34ef..f4c37c811ba 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4860,6 +4860,25 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para></entry>
</row>
+ <row>
+ <entry id="pg-stat-get-backend-wal" role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_stat_get_backend_wal</primary>
+ </indexterm>
+ <function>pg_stat_get_backend_wal</function> ( <type>integer</type> )
+ <returnvalue>record</returnvalue>
+ </para>
+ <para>
+ Returns WAL statistics about the backend with the specified
+ process ID. The output fields are exactly the same as the ones in the
+ <structname>pg_stat_wal</structname> view.
+ </para>
+ <para>
+ The function does not return WAL statistics for the checkpointer,
+ the background writer, the startup process and the autovacuum launcher.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
index a9343b7b59e..abacdaed330 100644
--- a/src/backend/utils/activity/pgstat_backend.c
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -38,6 +38,14 @@
*/
static PgStat_BackendPending PendingBackendStats;
+/*
+ * WAL usage counters saved from pgWalUsage at the previous call to
+ * pgstat_report_wal(). This is used to calculate how much WAL usage
+ * happens between pgstat_report_wal() calls, by subtracting
+ * the previous counters from the current ones.
+ */
+static WalUsage prevBackendWalUsage;
+
/*
* Utility routines to report I/O stats for backends, kept here to avoid
* exposing PendingBackendStats to the outside world.
@@ -184,6 +192,57 @@ pgstat_flush_backend_entry_io(PgStat_EntryRef *entry_ref)
MemSet(&PendingBackendStats.pending_io, 0, sizeof(PgStat_PendingIO));
}
+/*
+ * To determine whether WAL usage happened.
+ */
+static bool
+pgstat_backend_wal_have_pending(void)
+{
+ return pgWalUsage.wal_records != prevBackendWalUsage.wal_records;
+}
+
+/*
+ * Flush out locally pending backend WAL statistics. Locking is managed
+ * by the caller.
+ */
+static void
+pgstat_flush_backend_entry_wal(PgStat_EntryRef *entry_ref)
+{
+ PgStatShared_Backend *shbackendent;
+ PgStat_WalCounters *bktype_shstats;
+ WalUsage wal_usage_diff = {0};
+
+ /*
+ * This function can be called even if nothing at all has happened. Avoid
+ * taking lock for nothing in that case.
+ */
+ if (!pgstat_backend_wal_have_pending())
+ return;
+
+ shbackendent = (PgStatShared_Backend *) entry_ref->shared_stats;
+ bktype_shstats = &shbackendent->stats.wal_counters;
+
+ /*
+ * We don't update the WAL usage portion of the local WalStats elsewhere.
+ * Calculate how much WAL usage counters were increased by subtracting the
+ * previous counters from the current ones.
+ */
+ WalUsageAccumDiff(&wal_usage_diff, &pgWalUsage, &prevBackendWalUsage);
+
+#define WALSTAT_ACC(fld, var_to_add) \
+ (bktype_shstats->fld += var_to_add.fld)
+ WALSTAT_ACC(wal_buffers_full, wal_usage_diff);
+ WALSTAT_ACC(wal_records, wal_usage_diff);
+ WALSTAT_ACC(wal_fpi, wal_usage_diff);
+ WALSTAT_ACC(wal_bytes, wal_usage_diff);
+#undef WALSTAT_ACC
+
+ /*
+ * Save the current counters for the subsequent calculation of WAL usage.
+ */
+ prevBackendWalUsage = pgWalUsage;
+}
+
/*
* Flush out locally pending backend statistics
*
@@ -199,7 +258,8 @@ pgstat_flush_backend(bool nowait, bits32 flags)
return false;
if (pg_memory_is_all_zeros(&PendingBackendStats,
- sizeof(struct PgStat_BackendPending)))
+ sizeof(struct PgStat_BackendPending))
+ && !pgstat_backend_wal_have_pending())
return false;
entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_BACKEND, InvalidOid,
@@ -211,6 +271,9 @@ pgstat_flush_backend(bool nowait, bits32 flags)
if (flags & PGSTAT_BACKEND_FLUSH_IO)
pgstat_flush_backend_entry_io(entry_ref);
+ if (flags & PGSTAT_BACKEND_FLUSH_WAL)
+ pgstat_flush_backend_entry_wal(entry_ref);
+
pgstat_unlock_entry(entry_ref);
return false;
@@ -258,6 +321,8 @@ pgstat_create_backend(ProcNumber procnum)
pgstat_unlock_entry(entry_ref);
MemSet(&PendingBackendStats, 0, sizeof(PgStat_BackendPending));
+
+ prevBackendWalUsage = pgWalUsage;
}
/*
diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c
index 5d3da4b674e..16a1ecb4d90 100644
--- a/src/backend/utils/activity/pgstat_wal.c
+++ b/src/backend/utils/activity/pgstat_wal.c
@@ -52,6 +52,7 @@ pgstat_report_wal(bool force)
/* flush wal stats */
(void) pgstat_wal_flush_cb(nowait);
+ pgstat_flush_backend(nowait, PGSTAT_BACKEND_FLUSH_WAL);
/* flush IO stats */
pgstat_flush_io(nowait);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 0212d8d5906..f24dac483c7 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1606,8 +1606,8 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
/*
* pg_stat_wal_build_tuple
*
- * Helper routine for pg_stat_get_wal() returning one tuple based on the
- * contents of wal_counters.
+ * Helper routine for pg_stat_get_wal() and pg_stat_get_backend_wal()
+ * returning one tuple based on the contents of wal_counters.
*/
static Datum
pg_stat_wal_build_tuple(PgStat_WalCounters wal_counters,
@@ -1656,6 +1656,28 @@ pg_stat_wal_build_tuple(PgStat_WalCounters wal_counters,
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
}
+/*
+ * Returns WAL statistics for a backend with given PID.
+ */
+Datum
+pg_stat_get_backend_wal(PG_FUNCTION_ARGS)
+{
+ int pid;
+ PgStat_Backend *backend_stats;
+ PgStat_WalCounters bktype_stats;
+
+ pid = PG_GETARG_INT32(0);
+ backend_stats = pgstat_fetch_stat_backend_by_pid(pid, NULL);
+
+ if (!backend_stats)
+ PG_RETURN_NULL();
+
+ bktype_stats = backend_stats->wal_counters;
+
+ /* save tuples with data from this PgStat_WalCounters */
+ return (pg_stat_wal_build_tuple(bktype_stats, backend_stats->stat_reset_timestamp));
+}
+
/*
* Returns statistics of WAL activity
*/
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cd9422d0bac..3e35f8b8e99 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5954,6 +5954,13 @@
proargmodes => '{o,o,o,o,o}',
proargnames => '{wal_records,wal_fpi,wal_bytes,wal_buffers_full,stats_reset}',
prosrc => 'pg_stat_get_wal' },
+{ oid => '8037', descr => 'statistics: backend WAL activity',
+ proname => 'pg_stat_get_backend_wal', provolatile => 'v',
+ proparallel => 'r', prorettype => 'record', proargtypes => 'int4',
+ proallargtypes => '{int4,int8,int8,numeric,int8,timestamptz}',
+ proargmodes => '{i,o,o,o,o,o}',
+ proargnames => '{backend_pid,wal_records,wal_fpi,wal_bytes,wal_buffers_full,stats_reset}',
+ prosrc => 'pg_stat_get_backend_wal' },
{ oid => '6248', descr => 'statistics: information about WAL prefetching',
proname => 'pg_stat_get_recovery_prefetch', prorows => '1', proretset => 't',
provolatile => 'v', prorettype => 'record', proargtypes => '',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 4aad10b0b6d..06359b9157d 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -340,24 +340,6 @@ typedef struct PgStat_IO
PgStat_BktypeIO stats[BACKEND_NUM_TYPES];
} PgStat_IO;
-typedef struct PgStat_Backend
-{
- TimestampTz stat_reset_timestamp;
- PgStat_BktypeIO io_stats;
-} PgStat_Backend;
-
-/* ---------
- * PgStat_BackendPending Non-flushed backend stats.
- * ---------
- */
-typedef struct PgStat_BackendPending
-{
- /*
- * Backend statistics store the same amount of IO data as PGSTAT_KIND_IO.
- */
- PgStat_PendingIO pending_io;
-} PgStat_BackendPending;
-
typedef struct PgStat_StatDBEntry
{
PgStat_Counter xact_commit;
@@ -500,6 +482,25 @@ typedef struct PgStat_WalStats
TimestampTz stat_reset_timestamp;
} PgStat_WalStats;
+typedef struct PgStat_Backend
+{
+ TimestampTz stat_reset_timestamp;
+ PgStat_BktypeIO io_stats;
+ PgStat_WalCounters wal_counters;
+} PgStat_Backend;
+
+/* ---------
+ * PgStat_BackendPending Non-flushed backend stats.
+ * ---------
+ */
+typedef struct PgStat_BackendPending
+{
+ /*
+ * Backend statistics store the same amount of IO data as PGSTAT_KIND_IO.
+ */
+ PgStat_PendingIO pending_io;
+} PgStat_BackendPending;
+
/*
* Functions in pgstat.c
*/
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 36d228e3558..d5557e6e998 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -622,7 +622,8 @@ extern void pgstat_archiver_snapshot_cb(void);
/* flags for pgstat_flush_backend() */
#define PGSTAT_BACKEND_FLUSH_IO (1 << 0) /* Flush I/O statistics */
-#define PGSTAT_BACKEND_FLUSH_ALL (PGSTAT_BACKEND_FLUSH_IO)
+#define PGSTAT_BACKEND_FLUSH_WAL (1 << 1) /* Flush WAL statistics */
+#define PGSTAT_BACKEND_FLUSH_ALL (PGSTAT_BACKEND_FLUSH_IO | PGSTAT_BACKEND_FLUSH_WAL)
extern bool pgstat_flush_backend(bool nowait, bits32 flags);
extern bool pgstat_backend_flush_cb(bool nowait);
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 093e6368dbb..b3c303c98cb 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -832,6 +832,8 @@ SELECT sessions > :db_stat_sessions FROM pg_stat_database WHERE datname = (SELEC
SELECT num_requested AS rqst_ckpts_before FROM pg_stat_checkpointer \gset
-- Test pg_stat_wal (and make a temp table so our temp schema exists)
SELECT wal_bytes AS wal_bytes_before FROM pg_stat_wal \gset
+-- Test pg_stat_get_backend_wal (and make a temp table so our temp schema exists)
+SELECT wal_bytes AS backend_wal_bytes_before from pg_stat_get_backend_wal(pg_backend_pid()) \gset
CREATE TEMP TABLE test_stats_temp AS SELECT 17;
DROP TABLE test_stats_temp;
-- Checkpoint twice: The checkpointer reports stats after reporting completion
@@ -851,6 +853,18 @@ SELECT wal_bytes > :wal_bytes_before FROM pg_stat_wal;
t
(1 row)
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush
+--------------------------
+
+(1 row)
+
+SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid());
+ ?column?
+----------
+ t
+(1 row)
+
-- Test pg_stat_get_backend_idset() and some allied functions.
-- In particular, verify that their notion of backend ID matches
-- our temp schema index.
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 0a44e14d9f4..ad3f7b7e66a 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -423,6 +423,9 @@ SELECT num_requested AS rqst_ckpts_before FROM pg_stat_checkpointer \gset
-- Test pg_stat_wal (and make a temp table so our temp schema exists)
SELECT wal_bytes AS wal_bytes_before FROM pg_stat_wal \gset
+-- Test pg_stat_get_backend_wal (and make a temp table so our temp schema exists)
+SELECT wal_bytes AS backend_wal_bytes_before from pg_stat_get_backend_wal(pg_backend_pid()) \gset
+
CREATE TEMP TABLE test_stats_temp AS SELECT 17;
DROP TABLE test_stats_temp;
@@ -435,6 +438,9 @@ CHECKPOINT;
SELECT num_requested > :rqst_ckpts_before FROM pg_stat_checkpointer;
SELECT wal_bytes > :wal_bytes_before FROM pg_stat_wal;
+SELECT pg_stat_force_next_flush();
+SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid());
+
-- Test pg_stat_get_backend_idset() and some allied functions.
-- In particular, verify that their notion of backend ID matches
-- our temp schema index.
--
2.34.1
On Mon, Mar 03, 2025 at 09:17:30AM +0000, Bertrand Drouvot wrote:
On Mon, Mar 03, 2025 at 10:48:23AM +0900, Michael Paquier wrote:
Something that's still not quite right is that the WAL receiver and
the WAL summarizer do not call pgstat_report_wal() at all, so we don't
report much data while we expect these processes to run continuously.
The location where to report stats for the WAL summarizer is simple:
even if the system is aggressive with WAL, this is never called more
than a couple of times per second, like the WAL writer:
Same as above, that sounds right after a quick look.
Attached is a patch for this set of issues for the WAL receiver, the
WAL summarizer and the WAL writer. Another thing that we can do
better is to restrict pgstat_tracks_io_object() so that we don't report
rows for non-WAL IOObjects in the case of these three. Two tests are
added for the WAL receiver and WAL summarizer, checking that the stats
are gathered for both. For the WAL receiver, we have at least the
activity coming from one WAL segment created in the init context.
The WAL summarizer is more proactive with its reads in its
TAP test.
All that should be fixed before looking at the remaining patch for the
WAL stats at backend level, so what do you think about the attached?
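The filtering rule added to pgstat_tracks_io_object() can be summarized with the sketch below. The enums and the function are hypothetical subsets and stand-ins. The real function has many other conditions (IOContext, temporary relations, and more) that are omitted here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subsets of BackendType and IOObject. */
typedef enum
{
	BKT_CLIENT_BACKEND,
	BKT_WAL_RECEIVER,
	BKT_WAL_SUMMARIZER,
	BKT_WAL_WRITER
} BkTypeSketch;

typedef enum
{
	OBJ_RELATION,
	OBJ_TEMP_RELATION,
	OBJ_WAL
} IOObjectSketch;

/* Backend types that only perform IO on WAL. */
static bool
wal_only_bktype(BkTypeSketch bktype)
{
	return bktype == BKT_WAL_RECEIVER || bktype == BKT_WAL_SUMMARIZER ||
		bktype == BKT_WAL_WRITER;
}

/* Should a pg_stat_io row exist for this (backend type, object) pair? */
static bool
tracks_io_object_sketch(BkTypeSketch bktype, IOObjectSketch obj)
{
	if (wal_only_bktype(bktype) && obj != OBJ_WAL)
		return false;
	return true;
}
```

With such a rule, pg_stat_io shows only WAL rows for these three processes, which is what makes the view easier to read.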
I'm wondering if we should lift the restrictions further for the processes listed
in pgstat_tracks_backend_bktype() and include B_AUTOVAC_LAUNCHER,
B_STARTUP, B_CHECKPOINTER and B_BG_WRITER at this stage, removing the
entire paragraph. Not sure if we really have to do that for this
release, but we could look at that separately.
hm, do you mean updating the comment on top of pgstat_tracks_backend_bktype() or
updating the documentation?
My argument would be to make pgstat_tracks_backend_bktype() the same
as pgstat_io.c, and reflect that in the docs and the comments.
hmm, that would work as long as PGSTAT_BACKEND_FLUSH_ALL represents things
that need to be called from pgstat_report_wal(). But I think that opens the
door to issues should we add a new PGSTAT_BACKEND_FLUSH_XXX where XXX is not
related to pgstat_report_wal() at all. So, I'm tempted to keep it as it is.
OK, I can see your point here. Fine by me.
--
Michael
Attachments:
0001-Fix-some-gaps-with-pg_stat_wal-and-WAL-related-proce.patch (text/x-diff; charset=us-ascii)
From c1ca96900c8e719820f37408bfff83af027401b1 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Tue, 4 Mar 2025 09:15:08 +0900
Subject: [PATCH] Fix some gaps with pg_stat_wal and WAL-related processes
The WAL receiver and WAL summarizer processes each gain a call to
pgstat_report_wal(), to make sure that they report their WAL statistics.
pg_stat_io is adjusted so that these processes, along with the WAL writer,
do not report any rows when the IOObject is not WAL, making the view easier to use.
---
src/backend/postmaster/walsummarizer.c | 4 ++++
src/backend/replication/walreceiver.c | 10 ++++++++++
src/backend/utils/activity/pgstat_io.c | 12 +++++++++++-
src/bin/pg_walsummary/t/002_blocks.pl | 7 +++++++
src/test/recovery/t/001_stream_rep.pl | 7 +++++++
5 files changed, 39 insertions(+), 1 deletion(-)
diff --git a/src/backend/postmaster/walsummarizer.c b/src/backend/postmaster/walsummarizer.c
index f4d61c1f3bb8..17cb6984a01d 100644
--- a/src/backend/postmaster/walsummarizer.c
+++ b/src/backend/postmaster/walsummarizer.c
@@ -33,6 +33,7 @@
#include "common/blkreftable.h"
#include "libpq/pqsignal.h"
#include "miscadmin.h"
+#include "pgstat.h"
#include "postmaster/auxprocess.h"
#include "postmaster/interrupt.h"
#include "postmaster/walsummarizer.h"
@@ -1543,6 +1544,9 @@ summarizer_read_local_xlog_page(XLogReaderState *state,
HandleWalSummarizerInterrupts();
summarizer_wait_for_wal();
+ /* report pending statistics to the cumulative stats system */
+ pgstat_report_wal(false);
+
/* Recheck end-of-WAL. */
latest_lsn = GetLatestLSN(&latest_tli);
if (private_data->tli == latest_tli)
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 82f7302ff9fd..6cb22ec2bfd7 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -583,6 +583,16 @@ WalReceiverMain(const void *startup_data, size_t startup_data_len)
*/
bool requestReply = false;
+ /*
+ * Report pending statistics to the cumulative stats
+ * system. This location is useful for the report as it is
+ * not within a tight loop in the WAL receiver, which
+ * would bloat requests to pgstats, while also making sure
+ * that the reports happen at least each time a status
+ * update is sent.
+ */
+ pgstat_report_wal(false);
+
/*
* Check if time since last receive from primary has
* reached the configured limit.
diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index ba11545a17f3..eb5750255961 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -435,12 +435,22 @@ pgstat_tracks_io_object(BackendType bktype, IOObject io_object,
*/
no_temp_rel = bktype == B_AUTOVAC_LAUNCHER || bktype == B_BG_WRITER ||
bktype == B_CHECKPOINTER || bktype == B_AUTOVAC_WORKER ||
- bktype == B_STANDALONE_BACKEND || bktype == B_STARTUP;
+ bktype == B_STANDALONE_BACKEND || bktype == B_STARTUP ||
+ bktype == B_WAL_SUMMARIZER || bktype == B_WAL_WRITER ||
+ bktype == B_WAL_RECEIVER;
if (no_temp_rel && io_context == IOCONTEXT_NORMAL &&
io_object == IOOBJECT_TEMP_RELATION)
return false;
+ /*
+ * Some BackendTypes only perform IO under IOOBJECT_WAL, hence exclude all
+ * rows for all the other objects for these.
+ */
+ if ((bktype == B_WAL_SUMMARIZER || bktype == B_WAL_RECEIVER ||
+ bktype == B_WAL_WRITER) && io_object != IOOBJECT_WAL)
+ return false;
+
/*
* Some BackendTypes do not currently perform any IO in certain
* IOContexts, and, while it may not be inherently incorrect for them to
diff --git a/src/bin/pg_walsummary/t/002_blocks.pl b/src/bin/pg_walsummary/t/002_blocks.pl
index 27f29a3b0c68..270332780a45 100644
--- a/src/bin/pg_walsummary/t/002_blocks.pl
+++ b/src/bin/pg_walsummary/t/002_blocks.pl
@@ -46,6 +46,13 @@ SELECT EXISTS (
EOM
ok($result, "WAL summarization caught up after insert");
+# The WAL summarizer should have generated some IO statistics.
+my $stats_reads = $node1->safe_psql(
+ 'postgres',
+ qq{SELECT sum(reads) > 0 FROM pg_stat_io
+ WHERE backend_type = 'walsummarizer' AND object = 'wal'});
+is($stats_reads, 't', "WAL summarizer generates statistics for WAL reads");
+
# Find the highest LSN that is summarized on disk.
my $summarized_lsn = $node1->safe_psql('postgres', <<EOM);
SELECT MAX(end_lsn) AS summarized_lsn FROM pg_available_wal_summaries()
diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl
index ee57d234c861..3945f00ab884 100644
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -506,6 +506,13 @@ $node_standby_2->append_conf('postgresql.conf', "primary_slot_name = ''");
$node_standby_2->enable_streaming($node_primary);
$node_standby_2->reload;
+# The WAL receiver should have generated some IO statistics.
+my $stats_reads = $node_standby_1->safe_psql(
+ 'postgres',
+ qq{SELECT sum(writes) > 0 FROM pg_stat_io
+ WHERE backend_type = 'walreceiver' AND object = 'wal'});
+is($stats_reads, 't', "WAL receiver generates statistics for WAL writes");
+
# be sure do not streaming from cascade
$node_standby_1->stop;
--
2.47.2
Hi,
On Tue, Mar 04, 2025 at 09:28:27AM +0900, Michael Paquier wrote:
On Mon, Mar 03, 2025 at 09:17:30AM +0000, Bertrand Drouvot wrote:
On Mon, Mar 03, 2025 at 10:48:23AM +0900, Michael Paquier wrote:
Something that's still not quite right is that the WAL receiver and
the WAL summarizer do not call pgstat_report_wal() at all, so we don't
report much data and we expect these processes to run continuously.
The location where to report stats for the WAL summarizer is simple,
even if the system is aggressive with WAL this is never called more
than a couple of times per second, like the WAL writer:
Same as above, that sounds right after a quick look.
Attached is a patch for this set of issues for the WAL receiver, the
WAL summarizer and the WAL writer.
Thanks for the patch!
=== 1
@@ -1543,6 +1544,9 @@ summarizer_read_local_xlog_page(XLogReaderState *state,
HandleWalSummarizerInterrupts();
summarizer_wait_for_wal();
+ /* report pending statistics to the cumulative stats system */
+ pgstat_report_wal(false);
s/report/Report/ and s/system/system./? to be consistent with the other single
line comments around.
=== 2
+ /*
+ * Report pending statistics to the cumulative stats
+ * system. This location is useful for the report as it is
+ * not within a tight loop in the WAL receiver, which
+ * would bloat requests to pgstats, while also making sure
+ * that the reports happen at least each time a status
+ * update is sent.
+ */
Yeah, I also think that's the right location.
Nit: s/would/could/?
=== 3
+ /*
+ * Some BackendTypes only perform IO under IOOBJECT_WAL, hence exclude all
+ * rows for all the other objects for these.
+ */
+ if ((bktype == B_WAL_SUMMARIZER || bktype == B_WAL_RECEIVER ||
+ bktype == B_WAL_WRITER) && io_object != IOOBJECT_WAL)
+ return false;
I think that makes sense and it removes 15 lines out of 86. This function is
"hard" to read/parse from my point of view. Maybe we could re-write it in a
simpler way but that's outside the purpose of this thread.
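A minimal sketch of the early exit discussed in === 3, using simplified stand-in enums rather than the real BackendType and IOObject (`BkType`, `IoObject`, `is_wal_only_bktype()` and `tracks_io_object()` are hypothetical names). It only models the new WAL-only rule; every other check in pgstat_tracks_io_object() is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for BackendType and IOObject. */
typedef enum
{
	BK_CLIENT_BACKEND,
	BK_WAL_RECEIVER,
	BK_WAL_SUMMARIZER,
	BK_WAL_WRITER
} BkType;

typedef enum
{
	IO_OBJECT_RELATION,
	IO_OBJECT_TEMP_RELATION,
	IO_OBJECT_WAL
} IoObject;

/* Backend types that, per the patch, only perform IO on WAL. */
static bool
is_wal_only_bktype(BkType bktype)
{
	return bktype == BK_WAL_RECEIVER ||
		bktype == BK_WAL_SUMMARIZER ||
		bktype == BK_WAL_WRITER;
}

/*
 * WAL-only backend types get no pg_stat_io rows for any object other
 * than WAL; all other combinations fall through to "tracked" here.
 */
static bool
tracks_io_object(BkType bktype, IoObject io_object)
{
	if (is_wal_only_bktype(bktype) && io_object != IO_OBJECT_WAL)
		return false;
	return true;
}
```

Expressing the rule as one predicate over a named set of backend types is one way the function could become easier to read, though the real function also filters on IOContext.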
=== 4
+ WHERE backend_type = 'walsummarizer' AND object = 'wal'});
The object = 'wal' is not needed (thanks to === 3), maybe we can remove this
filtering?
Also what about adding a test to check that sum(reads) is NULL where object != 'wal'?
=== 5
Same remark as above for the WAL receiver (except that sum(writes) is NULL where
object != 'wal').
All that should be fixed before looking at the remaining patch for the
WAL stats at backend level
Not sure, as that would also prevent the other backend types from reporting their
WAL statistics if the above is not fixed.
, so what do you think about the attached?
That's pretty straightforward, so yeah, we can wait for it to go in before
moving forward with the WAL stats at backend level.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Tue, Mar 04, 2025 at 08:48:28AM +0000, Bertrand Drouvot wrote:
s/report/Report/ and s/system/system./? to be consistent with the other single
line comments around.
Right.
Yeah, I also think that's the right location.
We could be more optimal for the WAL receiver if we add more timestamp
calculations, I think, but that's a sensitive loop, and this is better
than no information anyway. If somebody has a better idea, feel free.
We could have an extra GUC to control that, but I'm feeling that we
should restructure the WAL receiver before that, perhaps leverage some
of its activity elsewhere (?).
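The "more timestamp calculations" idea could look roughly like the sketch below: a wrapper that forwards to the report only once a minimum interval has elapsed since the previous one. `maybe_report_wal()`, `report_wal_stub()` and `REPORT_MIN_INTERVAL_MS` are hypothetical names, not PostgreSQL APIs, and the caller is assumed to pass in a monotonic millisecond timestamp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical minimum delay between two reports, in milliseconds. */
#define REPORT_MIN_INTERVAL_MS 1000

static bool reported_once = false;
static int64_t last_report_ms = 0;

/* Number of times the stubbed report actually ran. */
static int reports_sent = 0;

/* Stand-in for pgstat_report_wal(false). */
static void
report_wal_stub(void)
{
	reports_sent++;
}

/*
 * Intended to be called from a hot loop: runs the report only if it has
 * never run, if the caller forces it, or if at least
 * REPORT_MIN_INTERVAL_MS elapsed since the last report.
 * Returns true when a report was sent.
 */
static bool
maybe_report_wal(int64_t now_ms, bool force)
{
	if (!force && reported_once &&
		now_ms - last_report_ms < REPORT_MIN_INTERVAL_MS)
		return false;

	report_wal_stub();
	last_report_ms = now_ms;
	reported_once = true;
	return true;
}
```

The price is a clock read on every iteration, which is the cost of touching a sensitive loop mentioned above; the patch instead piggybacks on the existing branch that sends status updates.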
+ /*
+ * Some BackendTypes only perform IO under IOOBJECT_WAL, hence exclude all
+ * rows for all the other objects for these.
+ */
+ if ((bktype == B_WAL_SUMMARIZER || bktype == B_WAL_RECEIVER ||
+ bktype == B_WAL_WRITER) && io_object != IOOBJECT_WAL)
+ return false;
I think that makes sense and it removes 15 lines out of 86. This function is
"hard" to read/parse from my point of view. Maybe we could re-write it in a
simpler way but that's outside the purpose of this thread.
One thing I am planning to do here to improve the situation is the
addition of a regression test that queries pg_stat_io for all the
combinations of backend_type, object and contexts that are now
allowed, to keep track of the number of tuples we have.
+ WHERE backend_type = 'walsummarizer' AND object = 'wal'});
The object = 'wal' is not needed (thanks to === 3), maybe we can remove this
filtering?
Also what about adding a test to check that sum(reads) is NULL where object != 'wal'?
Not sure it matters as long as we track the supported combinations.
We need something a bit more general here.
(I've actually found a different issue while looking at the WAL
receiver, which is a bit older than what we have here. Will post that
in a different thread with you in CC.)
--
Michael
Subject: Clarification Needed on WAL Pending Checks in Patchset
Hi,
Thank you for the patchset. I’ve spent some time learning and reviewing it
and have 2 comments. I'm new to PostgreSQL hacking, so please bear with me
if I make mistakes or say something that seems trivial.
I noticed that in patches prior to patch 6, the function
pgstat_backend_wal_have_pending() was implemented as follows:
/*
* To determine whether any WAL activity has occurred since last time, not
* only the number of generated WAL records but also the numbers of WAL
* writes and syncs need to be checked. Because even transactions that
* generate no WAL records can write or sync WAL data when flushing the
* data pages.
*/
static bool
pgstat_backend_wal_have_pending(void)
{
PgStat_PendingWalStats pending_wal;
pending_wal = PendingBackendStats.pending_wal;
return pgWalUsage.wal_records != prevBackendWalUsage.wal_records ||
pending_wal.wal_write != 0 || pending_wal.wal_sync != 0;
}
Starting with patch 7, it seems the implementation was simplified to:
/*
* To determine whether WAL usage happened.
*/
static bool
pgstat_backend_wal_have_pending(void)
{
return pgWalUsage.wal_records != prevBackendWalUsage.wal_records;
}
Meanwhile, the cluster-level check in the function
pgstat_wal_have_pending_cb() still performs the additional checks:
bool
pgstat_wal_have_pending_cb(void)
{
return pgWalUsage.wal_records != prevWalUsage.wal_records ||
PendingWalStats.wal_write != 0 ||
PendingWalStats.wal_sync != 0;
}
The difference lies in the removal of the checks for wal_write and wal_sync
from the per-backend function. I assume this may be due to factoring
out these counters, perhaps because they can now be extracted from
pg_stat_get_backend_io(). However, I’m not entirely sure I grasp the full
rationale behind this change.
Another one is on:
On Mon, Mar 3, 2025 at 18:47, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote:
Hi,
On Mon, Mar 03, 2025 at 09:17:30AM +0000, Bertrand Drouvot wrote:
hmm, that would work as long as PGSTAT_BACKEND_FLUSH_ALL represents things
that need to be called from pgstat_report_wal(). But I think that opens the
door to issues should we add a new PGSTAT_BACKEND_FLUSH_XXX where XXX is not
related to pgstat_report_wal() at all. So, I'm tempted to keep it as it is.
I just realized that pgstat_flush_backend() is not correct in 0001. Indeed
we check:
if (pg_memory_is_all_zeros(&PendingBackendStats,
sizeof(struct PgStat_BackendPending)))
return false;
but the WAL stats are not part of PgStat_BackendPending... So we only check
for IO pending stats here. I'm not sure WAL stats could be non-empty if IO
stats are empty, but the attached now also takes care of pending WAL stats here.
Regards,
--
Bertrand Drouvot
I've noticed that there's a function for checking if there are any backend
stats waiting for flush:
/*
* Check if there are any backend stats waiting for flush.
*/
bool
pgstat_backend_have_pending_cb(void)
{
return (!pg_memory_is_all_zeros(&PendingBackendStats,
sizeof(struct PgStat_BackendPending)));
}
[PGSTAT_KIND_BACKEND] = {
....
.have_static_pending_cb = pgstat_backend_have_pending_cb,
.flush_static_cb = pgstat_backend_flush_cb,
.reset_timestamp_cb = pgstat_backend_reset_timestamp_cb,
},
Should the following modification be applied to the above function as well?
Or should we modify the comment 'any backend stats' if we choose to check
I/O only?
@@ -199,7 +258,8 @@ pgstat_flush_backend(bool nowait, bits32 flags)
return false;
if (pg_memory_is_all_zeros(&PendingBackendStats,
- sizeof(struct PgStat_BackendPending)))
+ sizeof(struct PgStat_BackendPending))
+ && !pgstat_backend_wal_have_pending())
return false;
Best regards,
[Xuneng]
Hi,
On Wed, Mar 05, 2025 at 05:45:57PM +0800, Xuneng Zhou wrote:
Subject: Clarification Needed on WAL Pending Checks in Patchset
Hi,
Thank you for the patchset. I’ve spent some time learning and reviewing it
and have 2 comments.
Thanks for looking at it!
I noticed that in patches prior to patch 6, the function
pgstat_backend_wal_have_pending() was implemented as follows:
/*
* To determine whether any WAL activity has occurred since last time, not
* only the number of generated WAL records but also the numbers of WAL
* writes and syncs need to be checked. Because even transactions that
* generate no WAL records can write or sync WAL data when flushing the
* data pages.
*/
static bool
pgstat_backend_wal_have_pending(void)
{
PgStat_PendingWalStats pending_wal;
pending_wal = PendingBackendStats.pending_wal;
return pgWalUsage.wal_records != prevBackendWalUsage.wal_records ||
pending_wal.wal_write != 0 || pending_wal.wal_sync != 0;
}
Starting with patch 7, it seems the implementation was simplified to:
/*
* To determine whether WAL usage happened.
*/
static bool
pgstat_backend_wal_have_pending(void)
{
return pgWalUsage.wal_records != prevBackendWalUsage.wal_records;
}
That's right. This is due to 2421e9a51d2, which removed PgStat_PendingWalStats.
Meanwhile, the cluster-level check in the function
pgstat_wal_have_pending_cb() still performs the additional checks:
bool
pgstat_wal_have_pending_cb(void)
{
return pgWalUsage.wal_records != prevWalUsage.wal_records ||
PendingWalStats.wal_write != 0 ||
PendingWalStats.wal_sync != 0;
}
Not since 2421e9a51d2. It looks like you are looking at code prior to
2421e9a51d2.
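To illustrate the answer above: once the write and sync counters no longer live in a pending struct, "is there anything to flush?" reduces to comparing the live WAL record counter with the snapshot saved at the previous flush. This is a minimal sketch with simplified, hypothetical names (`WalUsageSketch`, `wal_have_pending()`, `wal_flush()`), not the actual pgstat code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for WalUsage: only the field the check reads. */
typedef struct
{
	int64_t wal_records;
} WalUsageSketch;

/* Live counter, bumped each time this process inserts a WAL record. */
static WalUsageSketch cur_usage;

/* Snapshot of cur_usage taken at the previous flush. */
static WalUsageSketch prev_usage;

/* Pending activity exists iff the counter moved since the last flush. */
static bool
wal_have_pending(void)
{
	return cur_usage.wal_records != prev_usage.wal_records;
}

/* Flushing "consumes" the delta by moving the snapshot forward. */
static void
wal_flush(void)
{
	prev_usage = cur_usage;
}
```

Because the snapshot only advances on flush, no separate pending counters need to be reset.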
Another one is on:
On Mon, Mar 3, 2025 at 18:47, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote:
Hi,
On Mon, Mar 03, 2025 at 09:17:30AM +0000, Bertrand Drouvot wrote:
hmm, that would work as long as PGSTAT_BACKEND_FLUSH_ALL represents things
that need to be called from pgstat_report_wal(). But I think that opens the
door to issues should we add a new PGSTAT_BACKEND_FLUSH_XXX where XXX is not
related to pgstat_report_wal() at all. So, I'm tempted to keep it as it is.
I just realized that pgstat_flush_backend() is not correct in 0001. Indeed
we check:
if (pg_memory_is_all_zeros(&PendingBackendStats,
sizeof(struct PgStat_BackendPending)))
return false;
but the WAL stats are not part of PgStat_BackendPending... So we only check
for IO pending stats here. I'm not sure WAL stats could be non-empty if IO
stats are empty, but the attached now also takes care of pending WAL stats here.
I've noticed that there's a function for checking if there are any backend
stats waiting for flush:
/*
* Check if there are any backend stats waiting for flush.
*/
bool
pgstat_backend_have_pending_cb(void)
{
return (!pg_memory_is_all_zeros(&PendingBackendStats,
sizeof(struct PgStat_BackendPending)));
}
That's right.
The reason I did not add the extra check there is that I plan to
replace the pg_memory_is_all_zeros() checks with a new global variable and also replace
the checks in pgstat_flush_backend() with a call to pgstat_backend_have_pending_cb()
(see 0002 in [1]). That means all of this would be perfectly clean if
0002 in [1] goes in.
But yeah, if 0002 in [1] does not go in, then your concern is valid, so I'm adding
the extra check in the attached.
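The direction described above (one predicate shared by the flush path and the have-pending callback) can be sketched as follows. IO stats live in a pending struct that is zeroed after each flush, while WAL stats are derived from a counter delta, so the shared predicate has to check both. `PendingSketch`, `memory_is_all_zeros()`, `backend_have_pending()` and `backend_flush()` are hypothetical simplifications, and returning true from `backend_flush()` when something was flushed is a convention of this sketch only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for PgStat_BackendPending (IO counters only). */
typedef struct
{
	int64_t io_reads;
	int64_t io_writes;
} PendingSketch;

static PendingSketch pending;
static int64_t wal_records_now;		/* live counter, like pgWalUsage */
static int64_t wal_records_prev;	/* snapshot taken at the last flush */

/* Local equivalent of pg_memory_is_all_zeros(). */
static bool
memory_is_all_zeros(const void *ptr, size_t len)
{
	const unsigned char *p = ptr;

	for (size_t i = 0; i < len; i++)
	{
		if (p[i] != 0)
			return false;
	}
	return true;
}

/* Single source of truth: pending IO counters OR a moved WAL counter. */
static bool
backend_have_pending(void)
{
	return !memory_is_all_zeros(&pending, sizeof(pending)) ||
		wal_records_now != wal_records_prev;
}

/* The flush path reuses the predicate instead of its own partial check. */
static bool
backend_flush(void)
{
	if (!backend_have_pending())
		return false;
	/* ... lock the shared entry and accumulate counters here ... */
	pending = (PendingSketch) {0};
	wal_records_prev = wal_records_now;
	return true;
}
```

With a single predicate, the mismatch raised in the review (flush checking IO only, callback documented as "any backend stats") cannot reappear when a new kind of pending data is added.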
Thanks for the review!
[1]: /messages/by-id/Z8WYf1jyy4MwOveQ@ip-10-97-1-34.eu-west-3.compute.internal
Regards,
--
Bertrand Drouvot
Attachments:
v14-0001-per-backend-WAL-statistics.patch (text/x-diff; charset=us-ascii)
From c9500ba1ed93f97aee2f9f4a97bcbfaaff998cad Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 6 Jan 2025 10:00:00 +0000
Subject: [PATCH v14] per backend WAL statistics
Now that commit 9aea73fc61 added backend-level statistics to pgstats (and
per backend IO statistics), we can more easily add per backend statistics.
This commit adds per backend WAL statistics using the same layer as pg_stat_wal,
except that it is now possible to know how much WAL activity is happening in each
backend rather than an overall aggregate of all the activity. A function called
pg_stat_get_backend_wal() is added to access this data depending on the
PID of a backend.
The same limitation as in 9aea73fc61 persists, meaning that auxiliary processes
are not included in this set of statistics.
XXX: bump catalog version
XXX: bump of stats file format not required, as backend stats do not
persist on disk.
---
doc/src/sgml/monitoring.sgml | 19 ++++++
src/backend/utils/activity/pgstat_backend.c | 70 ++++++++++++++++++++-
src/backend/utils/activity/pgstat_wal.c | 1 +
src/backend/utils/adt/pgstatfuncs.c | 26 +++++++-
src/include/catalog/pg_proc.dat | 7 +++
src/include/pgstat.h | 37 +++++------
src/include/utils/pgstat_internal.h | 3 +-
src/test/regress/expected/stats.out | 14 +++++
src/test/regress/sql/stats.sql | 6 ++
9 files changed, 160 insertions(+), 23 deletions(-)
15.1% doc/src/sgml/
42.6% src/backend/utils/activity/
14.7% src/backend/utils/adt/
8.3% src/include/catalog/
4.2% src/include/utils/
8.0% src/test/regress/expected/
6.1% src/test/regress/sql/
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 16646f560e8..b1710680705 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4866,6 +4866,25 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para></entry>
</row>
+ <row>
+ <entry id="pg-stat-get-backend-wal" role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_stat_get_backend_wal</primary>
+ </indexterm>
+ <function>pg_stat_get_backend_wal</function> ( <type>integer</type> )
+ <returnvalue>record</returnvalue>
+ </para>
+ <para>
+ Returns WAL statistics about the backend with the specified
+ process ID. The output fields are exactly the same as the ones in the
+ <structname>pg_stat_wal</structname> view.
+ </para>
+ <para>
+ The function does not return WAL statistics for the checkpointer,
+ the background writer, the startup process and the autovacuum launcher.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
index a9343b7b59e..01e072a9bfd 100644
--- a/src/backend/utils/activity/pgstat_backend.c
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -38,6 +38,14 @@
*/
static PgStat_BackendPending PendingBackendStats;
+/*
+ * WAL usage counters saved from pgWalUsage at the previous call to
+ * pgstat_report_wal(). This is used to calculate how much WAL usage
+ * happens between pgstat_report_wal() calls, by subtracting
+ * the previous counters from the current ones.
+ */
+static WalUsage prevBackendWalUsage;
+
/*
* Utility routines to report I/O stats for backends, kept here to avoid
* exposing PendingBackendStats to the outside world.
@@ -184,6 +192,57 @@ pgstat_flush_backend_entry_io(PgStat_EntryRef *entry_ref)
MemSet(&PendingBackendStats.pending_io, 0, sizeof(PgStat_PendingIO));
}
+/*
+ * To determine whether WAL usage happened.
+ */
+static bool
+pgstat_backend_wal_have_pending(void)
+{
+ return pgWalUsage.wal_records != prevBackendWalUsage.wal_records;
+}
+
+/*
+ * Flush out locally pending backend WAL statistics. Locking is managed
+ * by the caller.
+ */
+static void
+pgstat_flush_backend_entry_wal(PgStat_EntryRef *entry_ref)
+{
+ PgStatShared_Backend *shbackendent;
+ PgStat_WalCounters *bktype_shstats;
+ WalUsage wal_usage_diff = {0};
+
+ /*
+ * This function can be called even if nothing at all has happened. Avoid
+ * taking lock for nothing in that case.
+ */
+ if (!pgstat_backend_wal_have_pending())
+ return;
+
+ shbackendent = (PgStatShared_Backend *) entry_ref->shared_stats;
+ bktype_shstats = &shbackendent->stats.wal_counters;
+
+ /*
+ * We don't update the WAL usage portion of the local WalStats elsewhere.
+ * Calculate how much WAL usage counters were increased by subtracting the
+ * previous counters from the current ones.
+ */
+ WalUsageAccumDiff(&wal_usage_diff, &pgWalUsage, &prevBackendWalUsage);
+
+#define WALSTAT_ACC(fld, var_to_add) \
+ (bktype_shstats->fld += var_to_add.fld)
+ WALSTAT_ACC(wal_buffers_full, wal_usage_diff);
+ WALSTAT_ACC(wal_records, wal_usage_diff);
+ WALSTAT_ACC(wal_fpi, wal_usage_diff);
+ WALSTAT_ACC(wal_bytes, wal_usage_diff);
+#undef WALSTAT_ACC
+
+ /*
+ * Save the current counters for the subsequent calculation of WAL usage.
+ */
+ prevBackendWalUsage = pgWalUsage;
+}
+
/*
* Flush out locally pending backend statistics
*
@@ -199,7 +258,8 @@ pgstat_flush_backend(bool nowait, bits32 flags)
return false;
if (pg_memory_is_all_zeros(&PendingBackendStats,
- sizeof(struct PgStat_BackendPending)))
+ sizeof(struct PgStat_BackendPending))
+ && !pgstat_backend_wal_have_pending())
return false;
entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_BACKEND, InvalidOid,
@@ -211,6 +271,9 @@ pgstat_flush_backend(bool nowait, bits32 flags)
if (flags & PGSTAT_BACKEND_FLUSH_IO)
pgstat_flush_backend_entry_io(entry_ref);
+ if (flags & PGSTAT_BACKEND_FLUSH_WAL)
+ pgstat_flush_backend_entry_wal(entry_ref);
+
pgstat_unlock_entry(entry_ref);
return false;
@@ -223,7 +286,8 @@ bool
pgstat_backend_have_pending_cb(void)
{
return (!pg_memory_is_all_zeros(&PendingBackendStats,
- sizeof(struct PgStat_BackendPending)));
+ sizeof(struct PgStat_BackendPending))
+ || pgstat_backend_wal_have_pending());
}
/*
@@ -258,6 +322,8 @@ pgstat_create_backend(ProcNumber procnum)
pgstat_unlock_entry(entry_ref);
MemSet(&PendingBackendStats, 0, sizeof(PgStat_BackendPending));
+
+ prevBackendWalUsage = pgWalUsage;
}
/*
diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c
index 5d3da4b674e..16a1ecb4d90 100644
--- a/src/backend/utils/activity/pgstat_wal.c
+++ b/src/backend/utils/activity/pgstat_wal.c
@@ -52,6 +52,7 @@ pgstat_report_wal(bool force)
/* flush wal stats */
(void) pgstat_wal_flush_cb(nowait);
+ pgstat_flush_backend(nowait, PGSTAT_BACKEND_FLUSH_WAL);
/* flush IO stats */
pgstat_flush_io(nowait);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 9172e1cb9d2..662ce46cbc2 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1609,8 +1609,8 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
/*
* pg_stat_wal_build_tuple
*
- * Helper routine for pg_stat_get_wal() returning one tuple based on the
- * contents of wal_counters.
+ * Helper routine for pg_stat_get_wal() and pg_stat_get_backend_wal()
+ * returning one tuple based on the contents of wal_counters.
*/
static Datum
pg_stat_wal_build_tuple(PgStat_WalCounters wal_counters,
@@ -1659,6 +1659,28 @@ pg_stat_wal_build_tuple(PgStat_WalCounters wal_counters,
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
}
+/*
+ * Returns WAL statistics for a backend with given PID.
+ */
+Datum
+pg_stat_get_backend_wal(PG_FUNCTION_ARGS)
+{
+ int pid;
+ PgStat_Backend *backend_stats;
+ PgStat_WalCounters bktype_stats;
+
+ pid = PG_GETARG_INT32(0);
+ backend_stats = pgstat_fetch_stat_backend_by_pid(pid, NULL);
+
+ if (!backend_stats)
+ PG_RETURN_NULL();
+
+ bktype_stats = backend_stats->wal_counters;
+
+ /* save tuples with data from this PgStat_WalCounters */
+ return (pg_stat_wal_build_tuple(bktype_stats, backend_stats->stat_reset_timestamp));
+}
+
/*
* Returns statistics of WAL activity
*/
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cd9422d0bac..3e35f8b8e99 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5954,6 +5954,13 @@
proargmodes => '{o,o,o,o,o}',
proargnames => '{wal_records,wal_fpi,wal_bytes,wal_buffers_full,stats_reset}',
prosrc => 'pg_stat_get_wal' },
+{ oid => '8037', descr => 'statistics: backend WAL activity',
+ proname => 'pg_stat_get_backend_wal', provolatile => 'v',
+ proparallel => 'r', prorettype => 'record', proargtypes => 'int4',
+ proallargtypes => '{int4,int8,int8,numeric,int8,timestamptz}',
+ proargmodes => '{i,o,o,o,o,o}',
+ proargnames => '{backend_pid,wal_records,wal_fpi,wal_bytes,wal_buffers_full,stats_reset}',
+ prosrc => 'pg_stat_get_backend_wal' },
{ oid => '6248', descr => 'statistics: information about WAL prefetching',
proname => 'pg_stat_get_recovery_prefetch', prorows => '1', proretset => 't',
provolatile => 'v', prorettype => 'record', proargtypes => '',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 4aad10b0b6d..06359b9157d 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -340,24 +340,6 @@ typedef struct PgStat_IO
PgStat_BktypeIO stats[BACKEND_NUM_TYPES];
} PgStat_IO;
-typedef struct PgStat_Backend
-{
- TimestampTz stat_reset_timestamp;
- PgStat_BktypeIO io_stats;
-} PgStat_Backend;
-
-/* ---------
- * PgStat_BackendPending Non-flushed backend stats.
- * ---------
- */
-typedef struct PgStat_BackendPending
-{
- /*
- * Backend statistics store the same amount of IO data as PGSTAT_KIND_IO.
- */
- PgStat_PendingIO pending_io;
-} PgStat_BackendPending;
-
typedef struct PgStat_StatDBEntry
{
PgStat_Counter xact_commit;
@@ -500,6 +482,25 @@ typedef struct PgStat_WalStats
TimestampTz stat_reset_timestamp;
} PgStat_WalStats;
+typedef struct PgStat_Backend
+{
+ TimestampTz stat_reset_timestamp;
+ PgStat_BktypeIO io_stats;
+ PgStat_WalCounters wal_counters;
+} PgStat_Backend;
+
+/* ---------
+ * PgStat_BackendPending Non-flushed backend stats.
+ * ---------
+ */
+typedef struct PgStat_BackendPending
+{
+ /*
+ * Backend statistics store the same amount of IO data as PGSTAT_KIND_IO.
+ */
+ PgStat_PendingIO pending_io;
+} PgStat_BackendPending;
+
/*
* Functions in pgstat.c
*/
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 36d228e3558..d5557e6e998 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -622,7 +622,8 @@ extern void pgstat_archiver_snapshot_cb(void);
/* flags for pgstat_flush_backend() */
#define PGSTAT_BACKEND_FLUSH_IO (1 << 0) /* Flush I/O statistics */
-#define PGSTAT_BACKEND_FLUSH_ALL (PGSTAT_BACKEND_FLUSH_IO)
+#define PGSTAT_BACKEND_FLUSH_WAL (1 << 1) /* Flush WAL statistics */
+#define PGSTAT_BACKEND_FLUSH_ALL (PGSTAT_BACKEND_FLUSH_IO | PGSTAT_BACKEND_FLUSH_WAL)
extern bool pgstat_flush_backend(bool nowait, bits32 flags);
extern bool pgstat_backend_flush_cb(bool nowait);
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 093e6368dbb..b3c303c98cb 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -832,6 +832,8 @@ SELECT sessions > :db_stat_sessions FROM pg_stat_database WHERE datname = (SELEC
SELECT num_requested AS rqst_ckpts_before FROM pg_stat_checkpointer \gset
-- Test pg_stat_wal (and make a temp table so our temp schema exists)
SELECT wal_bytes AS wal_bytes_before FROM pg_stat_wal \gset
+-- Test pg_stat_get_backend_wal (and make a temp table so our temp schema exists)
+SELECT wal_bytes AS backend_wal_bytes_before from pg_stat_get_backend_wal(pg_backend_pid()) \gset
CREATE TEMP TABLE test_stats_temp AS SELECT 17;
DROP TABLE test_stats_temp;
-- Checkpoint twice: The checkpointer reports stats after reporting completion
@@ -851,6 +853,18 @@ SELECT wal_bytes > :wal_bytes_before FROM pg_stat_wal;
t
(1 row)
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush
+--------------------------
+
+(1 row)
+
+SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid());
+ ?column?
+----------
+ t
+(1 row)
+
-- Test pg_stat_get_backend_idset() and some allied functions.
-- In particular, verify that their notion of backend ID matches
-- our temp schema index.
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 0a44e14d9f4..ad3f7b7e66a 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -423,6 +423,9 @@ SELECT num_requested AS rqst_ckpts_before FROM pg_stat_checkpointer \gset
-- Test pg_stat_wal (and make a temp table so our temp schema exists)
SELECT wal_bytes AS wal_bytes_before FROM pg_stat_wal \gset
+-- Test pg_stat_get_backend_wal (and make a temp table so our temp schema exists)
+SELECT wal_bytes AS backend_wal_bytes_before from pg_stat_get_backend_wal(pg_backend_pid()) \gset
+
CREATE TEMP TABLE test_stats_temp AS SELECT 17;
DROP TABLE test_stats_temp;
@@ -435,6 +438,9 @@ CHECKPOINT;
SELECT num_requested > :rqst_ckpts_before FROM pg_stat_checkpointer;
SELECT wal_bytes > :wal_bytes_before FROM pg_stat_wal;
+SELECT pg_stat_force_next_flush();
+SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid());
+
-- Test pg_stat_get_backend_idset() and some allied functions.
-- In particular, verify that their notion of backend ID matches
-- our temp schema index.
--
2.34.1
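The core of pgstat_flush_backend_entry_wal() in v14 is a snapshot/diff/accumulate cycle: compute how much the process-local counters moved since the last flush, add only that delta to the shared per-backend counters, then move the snapshot forward. This is a minimal sketch with simplified, hypothetical types (`UsageSketch`, `CountersSketch`, `usage_diff()`, `flush_backend_wal()`); in the patch the diff is produced by WalUsageAccumDiff() into a zero-initialized WalUsage, and locking is handled by the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for WalUsage (process-local, only ever grows). */
typedef struct
{
	int64_t wal_records;
	int64_t wal_fpi;
	uint64_t wal_bytes;
	int64_t wal_buffers_full;
} UsageSketch;

/* Simplified stand-in for the shared per-backend PgStat_WalCounters. */
typedef struct
{
	int64_t wal_records;
	int64_t wal_fpi;
	uint64_t wal_bytes;
	int64_t wal_buffers_full;
} CountersSketch;

static UsageSketch cur;		/* plays the role of pgWalUsage */
static UsageSketch prev;	/* plays the role of prevBackendWalUsage */

/* Field-by-field difference now - before. */
static UsageSketch
usage_diff(const UsageSketch *now, const UsageSketch *before)
{
	UsageSketch d;

	d.wal_records = now->wal_records - before->wal_records;
	d.wal_fpi = now->wal_fpi - before->wal_fpi;
	d.wal_bytes = now->wal_bytes - before->wal_bytes;
	d.wal_buffers_full = now->wal_buffers_full - before->wal_buffers_full;
	return d;
}

/* Add only the activity since the last flush, then save the snapshot. */
static void
flush_backend_wal(CountersSketch *shared)
{
	UsageSketch d = usage_diff(&cur, &prev);

	shared->wal_records += d.wal_records;
	shared->wal_fpi += d.wal_fpi;
	shared->wal_bytes += d.wal_bytes;
	shared->wal_buffers_full += d.wal_buffers_full;
	prev = cur;
}
```

Saving the snapshot after each flush is what prevents double counting. The patch also sets prevBackendWalUsage = pgWalUsage in pgstat_create_backend(), so WAL generated before the backend entry existed is not attributed to it.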
Hi,
On 2025-03-05 13:03:07 +0000, Bertrand Drouvot wrote:
But yeah, if 0002 in [1] does not go in, then your concern is valid, so adding
the extra check in the attached.
This crashes in cfbot:
https://cirrus-ci.com/task/5111872610893824
[13:42:37.315] src/tools/ci/cores_backtrace.sh freebsd /tmp/cores
[13:42:37.620] dumping /tmp/cores/postgres.7656.core for /tmp/cirrus-ci-build/build/tmp_install/usr/local/pgsql/bin/postgres
[13:42:37.749] [New LWP 101860]
[13:42:37.831] Core was generated by `postgres: primary: checkpointer '.
[13:42:37.831] Program terminated with signal SIGABRT, Aborted.
[13:42:37.831] Sent by thr_kill() from pid 7656 and user 1003.
[13:42:37.831] #0 0x000000082c0f941a in thr_kill () from /lib/libc.so.7
[13:42:37.860]
[13:42:37.860] Thread 1 (LWP 101860):
[13:42:37.860] #0 0x000000082c0f941a in thr_kill () from /lib/libc.so.7
[13:42:37.860] No symbol table info available.
[13:42:37.860] #1 0x000000082c072e64 in raise () from /lib/libc.so.7
[13:42:37.860] No symbol table info available.
[13:42:37.860] #2 0x000000082c1236f9 in abort () from /lib/libc.so.7
[13:42:37.860] No symbol table info available.
[13:42:37.860] #3 0x0000000000ab2125 in ExceptionalCondition (conditionName=0x340512 "!pgStatLocal.shmem->is_shutdown", fileName=<optimized out>, lineNumber=lineNumber@entry=746) at ../src/backend/utils/error/assert.c:66
[13:42:37.860] No locals.
[13:42:37.860] #4 0x000000000096bcd4 in pgstat_report_stat (force=true) at ../src/backend/utils/activity/pgstat.c:746
[13:42:37.860] pending_since = 0
[13:42:37.860] last_flush = 794496967784484
[13:42:37.860] now = <optimized out>
[13:42:37.860] partial_flush = <optimized out>
[13:42:37.860] nowait = <optimized out>
[13:42:37.860] #5 0x000000000096bef9 in pgstat_shutdown_hook (code=<optimized out>, arg=<optimized out>) at ../src/backend/utils/activity/pgstat.c:616
[13:42:37.860] No locals.
[13:42:37.860] #6 0x00000000009221b1 in shmem_exit (code=code@entry=0) at ../src/backend/storage/ipc/ipc.c:243
[13:42:37.860] No locals.
[13:42:37.860] #7 0x00000000009220a8 in proc_exit_prepare (code=101860, code@entry=0) at ../src/backend/storage/ipc/ipc.c:198
[13:42:37.860] No locals.
[13:42:37.860] #8 0x0000000000921fef in proc_exit (code=code@entry=0) at ../src/backend/storage/ipc/ipc.c:111
[13:42:37.860] No locals.
[13:42:37.860] #9 0x00000000008a265e in CheckpointerMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at ../src/backend/postmaster/checkpointer.c:630
[13:42:37.860] local_sigjmp_buf = {{_sjb = {9052188, 34925197376, 34925197368, 34925197536, 12666224, 11, 1, 35301277696, 895, -2147815357, -1, 34359738369}}}
[13:42:37.860] checkpointer_context = <optimized out>
[13:42:37.860] #10 0x00000000008a35a5 in postmaster_child_launch (child_type=child_type@entry=B_CHECKPOINTER, child_slot=56, startup_data=startup_data@entry=0x0, startup_data_len=startup_data_len@entry=0, client_sock=client_sock@entry=0x0) at ../src/backend/postmaster/launch_backend.c:274
[13:42:37.860] pid = <optimized out>
[13:42:37.860] #11 0x00000000008a61a7 in StartChildProcess (type=type@entry=B_CHECKPOINTER) at ../src/backend/postmaster/postmaster.c:3905
[13:42:37.860] pmchild = 0x0
[13:42:37.860] pid = <optimized out>
[13:42:37.860] #12 0x00000000008a5d16 in PostmasterMain (argc=argc@entry=4, argv=argv@entry=0x821b43e68) at ../src/backend/postmaster/postmaster.c:1371
[13:42:37.860] userDoption = <optimized out>
[13:42:37.860] listen_addr_saved = false
[13:42:37.860] output_config_variable = <optimized out>
[13:42:37.860] opt = <optimized out>
[13:42:37.860] status = <optimized out>
[13:42:37.860] #13 0x00000000007da2e0 in main (argc=4, argv=0x821b43e68) at ../src/backend/main/main.c:230
[13:42:37.860] do_check_root = <optimized out>
[13:42:37.860] dispatch_option = DISPATCH_POSTMASTER
[13:42:37.870]
Hi,
On Wed, Mar 05, 2025 at 09:18:16AM -0500, Andres Freund wrote:
Hi,
On 2025-03-05 13:03:07 +0000, Bertrand Drouvot wrote:
But yeah, if 0002 in [1] does not go in, then your concern is valid, so adding
the extra check in the attached.
This crashes in cfbot:
Thanks for the report! I usually run make check-world locally and also
launch the CI tests on my GitHub repo before submitting patches. This time,
it was a one-line change (compared to v13), so I was confident enough not to
trigger those tests. Murphy's Law, I guess ;-)
So yeah, back to the issue: we have to pay more attention to the WAL stats
because pgWalUsage is "incremented" without any pgstat_tracks_backend_bktype()
check (that's not the case for the IO stats, whose counters are incremented taking
into account pgstat_tracks_backend_bktype()).
So for the moment, in the attached I "just" add a pgstat_tracks_backend_bktype()
check in pgstat_backend_have_pending_cb(), but I'm not sure I like it that much...
Will think more about it...
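To make the situation above concrete, here is a minimal, self-contained C model of it. It is a sketch, not PostgreSQL code: ProcKind, tracks_backend_kind(), create_backend_stats(), have_pending_unguarded(), have_pending_guarded() and the wal_usage/prev_wal_usage globals are hypothetical stand-ins for MyBackendType, pgstat_tracks_backend_bktype(), pgstat_create_backend(), pgstat_backend_have_pending_cb(), pgWalUsage and prevBackendWalUsage. It shows that the WAL counter moves in every process while the baseline snapshot is only taken for tracked process types, so the pending check has to consult the process type first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for BackendType; only the kinds needed here. */
typedef enum ProcKind
{
    PROC_CLIENT_BACKEND,
    PROC_CHECKPOINTER
} ProcKind;

/* Hypothetical stand-in for WalUsage: only the field the check reads. */
typedef struct WalUsageModel
{
    int64_t     wal_records;
} WalUsageModel;

static ProcKind my_kind;               /* models MyBackendType */
static WalUsageModel wal_usage;        /* models pgWalUsage: bumped by every process */
static WalUsageModel prev_wal_usage;   /* models prevBackendWalUsage */

/* Models pgstat_tracks_backend_bktype(): auxiliary processes are excluded. */
static bool
tracks_backend_kind(ProcKind kind)
{
    return kind == PROC_CLIENT_BACKEND;
}

/* Models backend-stats entry creation, which only happens for tracked kinds. */
static void
create_backend_stats(void)
{
    if (tracks_backend_kind(my_kind))
        prev_wal_usage = wal_usage;
}

/* The pending check without the process-type guard. */
static bool
have_pending_unguarded(void)
{
    return wal_usage.wal_records != prev_wal_usage.wal_records;
}

/* The pending check with the guard, mirroring the shape of the fix. */
static bool
have_pending_guarded(void)
{
    if (!tracks_backend_kind(my_kind))
        return false;
    return wal_usage.wal_records != prev_wal_usage.wal_records;
}
```

In this model, an untracked process such as the checkpointer generates WAL but never takes a baseline, so only the guarded check correctly reports "nothing pending" for it.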
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v15-0001-per-backend-WAL-statistics.patch (text/x-diff; charset=us-ascii)
From db6b4601108c4ec4a636d5085c30baad758b6f5e Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 6 Jan 2025 10:00:00 +0000
Subject: [PATCH v15] per backend WAL statistics
Now that commit 9aea73fc61 added backend-level statistics to pgstats (and
per backend IO statistics) we can more easily add per backend statistics.
This commit adds per backend WAL statistics using the same layer as pg_stat_wal,
except that it is now possible to know how much WAL activity is happening in each
backend rather than an overall aggregate of all the activity. A function called
pg_stat_get_backend_wal() is added to access this data depending on the
PID of a backend.
The same limitation as in 9aea73fc61 persists, meaning that Auxiliary processes
are not included in this set of statistics.
XXX: bump catalog version
XXX: bump of stats file format not required, as backend stats do not
persist on disk.
---
doc/src/sgml/monitoring.sgml | 19 ++++++
src/backend/utils/activity/pgstat_backend.c | 71 ++++++++++++++++++++-
src/backend/utils/activity/pgstat_wal.c | 1 +
src/backend/utils/adt/pgstatfuncs.c | 26 +++++++-
src/include/catalog/pg_proc.dat | 7 ++
src/include/pgstat.h | 37 +++++------
src/include/utils/pgstat_internal.h | 3 +-
src/test/regress/expected/stats.out | 14 ++++
src/test/regress/sql/stats.sql | 6 ++
9 files changed, 161 insertions(+), 23 deletions(-)
14.9% doc/src/sgml/
43.2% src/backend/utils/activity/
14.6% src/backend/utils/adt/
8.2% src/include/catalog/
4.2% src/include/utils/
7.9% src/test/regress/expected/
6.0% src/test/regress/sql/
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 16646f560e8..b1710680705 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4866,6 +4866,25 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para></entry>
</row>
+ <row>
+ <entry id="pg-stat-get-backend-wal" role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_stat_get_backend_wal</primary>
+ </indexterm>
+ <function>pg_stat_get_backend_wal</function> ( <type>integer</type> )
+ <returnvalue>record</returnvalue>
+ </para>
+ <para>
+ Returns WAL statistics about the backend with the specified
+ process ID. The output fields are exactly the same as the ones in the
+ <structname>pg_stat_wal</structname> view.
+ </para>
+ <para>
+ The function does not return WAL statistics for the checkpointer,
+ the background writer, the startup process and the autovacuum launcher.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
index a9343b7b59e..7be35e667dd 100644
--- a/src/backend/utils/activity/pgstat_backend.c
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -38,6 +38,14 @@
*/
static PgStat_BackendPending PendingBackendStats;
+/*
+ * WAL usage counters saved from pgWalUsage at the previous call to
+ * pgstat_report_wal(). This is used to calculate how much WAL usage
+ * happens between pgstat_report_wal() calls, by subtracting
+ * the previous counters from the current ones.
+ */
+static WalUsage prevBackendWalUsage;
+
/*
* Utility routines to report I/O stats for backends, kept here to avoid
* exposing PendingBackendStats to the outside world.
@@ -184,6 +192,57 @@ pgstat_flush_backend_entry_io(PgStat_EntryRef *entry_ref)
MemSet(&PendingBackendStats.pending_io, 0, sizeof(PgStat_PendingIO));
}
+/*
+ * To determine whether WAL usage happened.
+ */
+static bool
+pgstat_backend_wal_have_pending(void)
+{
+ return pgWalUsage.wal_records != prevBackendWalUsage.wal_records;
+}
+
+/*
+ * Flush out locally pending backend WAL statistics. Locking is managed
+ * by the caller.
+ */
+static void
+pgstat_flush_backend_entry_wal(PgStat_EntryRef *entry_ref)
+{
+ PgStatShared_Backend *shbackendent;
+ PgStat_WalCounters *bktype_shstats;
+ WalUsage wal_usage_diff = {0};
+
+ /*
+ * This function can be called even if nothing at all has happened. Avoid
+ * taking lock for nothing in that case.
+ */
+ if (!pgstat_backend_wal_have_pending())
+ return;
+
+ shbackendent = (PgStatShared_Backend *) entry_ref->shared_stats;
+ bktype_shstats = &shbackendent->stats.wal_counters;
+
+ /*
+ * We don't update the WAL usage portion of the local WalStats elsewhere.
+ * Calculate how much WAL usage counters were increased by subtracting the
+ * previous counters from the current ones.
+ */
+ WalUsageAccumDiff(&wal_usage_diff, &pgWalUsage, &prevBackendWalUsage);
+
+#define WALSTAT_ACC(fld, var_to_add) \
+ (bktype_shstats->fld += var_to_add.fld)
+ WALSTAT_ACC(wal_buffers_full, wal_usage_diff);
+ WALSTAT_ACC(wal_records, wal_usage_diff);
+ WALSTAT_ACC(wal_fpi, wal_usage_diff);
+ WALSTAT_ACC(wal_bytes, wal_usage_diff);
+#undef WALSTAT_ACC
+
+ /*
+ * Save the current counters for the subsequent calculation of WAL usage.
+ */
+ prevBackendWalUsage = pgWalUsage;
+}
+
/*
* Flush out locally pending backend statistics
*
@@ -199,7 +258,8 @@ pgstat_flush_backend(bool nowait, bits32 flags)
return false;
if (pg_memory_is_all_zeros(&PendingBackendStats,
- sizeof(struct PgStat_BackendPending)))
+ sizeof(struct PgStat_BackendPending))
+ && !pgstat_backend_wal_have_pending())
return false;
entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_BACKEND, InvalidOid,
@@ -211,6 +271,9 @@ pgstat_flush_backend(bool nowait, bits32 flags)
if (flags & PGSTAT_BACKEND_FLUSH_IO)
pgstat_flush_backend_entry_io(entry_ref);
+ if (flags & PGSTAT_BACKEND_FLUSH_WAL)
+ pgstat_flush_backend_entry_wal(entry_ref);
+
pgstat_unlock_entry(entry_ref);
return false;
@@ -223,7 +286,9 @@ bool
pgstat_backend_have_pending_cb(void)
{
return (!pg_memory_is_all_zeros(&PendingBackendStats,
- sizeof(struct PgStat_BackendPending)));
+ sizeof(struct PgStat_BackendPending))
+ || (pgstat_tracks_backend_bktype(MyBackendType) &&
+ pgstat_backend_wal_have_pending()));
}
/*
@@ -258,6 +323,8 @@ pgstat_create_backend(ProcNumber procnum)
pgstat_unlock_entry(entry_ref);
MemSet(&PendingBackendStats, 0, sizeof(PgStat_BackendPending));
+
+ prevBackendWalUsage = pgWalUsage;
}
/*
diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c
index 5d3da4b674e..16a1ecb4d90 100644
--- a/src/backend/utils/activity/pgstat_wal.c
+++ b/src/backend/utils/activity/pgstat_wal.c
@@ -52,6 +52,7 @@ pgstat_report_wal(bool force)
/* flush wal stats */
(void) pgstat_wal_flush_cb(nowait);
+ pgstat_flush_backend(nowait, PGSTAT_BACKEND_FLUSH_WAL);
/* flush IO stats */
pgstat_flush_io(nowait);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 9172e1cb9d2..662ce46cbc2 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1609,8 +1609,8 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
/*
* pg_stat_wal_build_tuple
*
- * Helper routine for pg_stat_get_wal() returning one tuple based on the
- * contents of wal_counters.
+ * Helper routine for pg_stat_get_wal() and pg_stat_get_backend_wal()
+ * returning one tuple based on the contents of wal_counters.
*/
static Datum
pg_stat_wal_build_tuple(PgStat_WalCounters wal_counters,
@@ -1659,6 +1659,28 @@ pg_stat_wal_build_tuple(PgStat_WalCounters wal_counters,
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
}
+/*
+ * Returns WAL statistics for a backend with given PID.
+ */
+Datum
+pg_stat_get_backend_wal(PG_FUNCTION_ARGS)
+{
+ int pid;
+ PgStat_Backend *backend_stats;
+ PgStat_WalCounters bktype_stats;
+
+ pid = PG_GETARG_INT32(0);
+ backend_stats = pgstat_fetch_stat_backend_by_pid(pid, NULL);
+
+ if (!backend_stats)
+ PG_RETURN_NULL();
+
+ bktype_stats = backend_stats->wal_counters;
+
+ /* save tuples with data from this PgStat_WalCounters */
+ return (pg_stat_wal_build_tuple(bktype_stats, backend_stats->stat_reset_timestamp));
+}
+
/*
* Returns statistics of WAL activity
*/
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cd9422d0bac..3e35f8b8e99 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5954,6 +5954,13 @@
proargmodes => '{o,o,o,o,o}',
proargnames => '{wal_records,wal_fpi,wal_bytes,wal_buffers_full,stats_reset}',
prosrc => 'pg_stat_get_wal' },
+{ oid => '8037', descr => 'statistics: backend WAL activity',
+ proname => 'pg_stat_get_backend_wal', provolatile => 'v',
+ proparallel => 'r', prorettype => 'record', proargtypes => 'int4',
+ proallargtypes => '{int4,int8,int8,numeric,int8,timestamptz}',
+ proargmodes => '{i,o,o,o,o,o}',
+ proargnames => '{backend_pid,wal_records,wal_fpi,wal_bytes,wal_buffers_full,stats_reset}',
+ prosrc => 'pg_stat_get_backend_wal' },
{ oid => '6248', descr => 'statistics: information about WAL prefetching',
proname => 'pg_stat_get_recovery_prefetch', prorows => '1', proretset => 't',
provolatile => 'v', prorettype => 'record', proargtypes => '',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 4aad10b0b6d..06359b9157d 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -340,24 +340,6 @@ typedef struct PgStat_IO
PgStat_BktypeIO stats[BACKEND_NUM_TYPES];
} PgStat_IO;
-typedef struct PgStat_Backend
-{
- TimestampTz stat_reset_timestamp;
- PgStat_BktypeIO io_stats;
-} PgStat_Backend;
-
-/* ---------
- * PgStat_BackendPending Non-flushed backend stats.
- * ---------
- */
-typedef struct PgStat_BackendPending
-{
- /*
- * Backend statistics store the same amount of IO data as PGSTAT_KIND_IO.
- */
- PgStat_PendingIO pending_io;
-} PgStat_BackendPending;
-
typedef struct PgStat_StatDBEntry
{
PgStat_Counter xact_commit;
@@ -500,6 +482,25 @@ typedef struct PgStat_WalStats
TimestampTz stat_reset_timestamp;
} PgStat_WalStats;
+typedef struct PgStat_Backend
+{
+ TimestampTz stat_reset_timestamp;
+ PgStat_BktypeIO io_stats;
+ PgStat_WalCounters wal_counters;
+} PgStat_Backend;
+
+/* ---------
+ * PgStat_BackendPending Non-flushed backend stats.
+ * ---------
+ */
+typedef struct PgStat_BackendPending
+{
+ /*
+ * Backend statistics store the same amount of IO data as PGSTAT_KIND_IO.
+ */
+ PgStat_PendingIO pending_io;
+} PgStat_BackendPending;
+
/*
* Functions in pgstat.c
*/
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 36d228e3558..d5557e6e998 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -622,7 +622,8 @@ extern void pgstat_archiver_snapshot_cb(void);
/* flags for pgstat_flush_backend() */
#define PGSTAT_BACKEND_FLUSH_IO (1 << 0) /* Flush I/O statistics */
-#define PGSTAT_BACKEND_FLUSH_ALL (PGSTAT_BACKEND_FLUSH_IO)
+#define PGSTAT_BACKEND_FLUSH_WAL (1 << 1) /* Flush WAL statistics */
+#define PGSTAT_BACKEND_FLUSH_ALL (PGSTAT_BACKEND_FLUSH_IO | PGSTAT_BACKEND_FLUSH_WAL)
extern bool pgstat_flush_backend(bool nowait, bits32 flags);
extern bool pgstat_backend_flush_cb(bool nowait);
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 093e6368dbb..b3c303c98cb 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -832,6 +832,8 @@ SELECT sessions > :db_stat_sessions FROM pg_stat_database WHERE datname = (SELEC
SELECT num_requested AS rqst_ckpts_before FROM pg_stat_checkpointer \gset
-- Test pg_stat_wal (and make a temp table so our temp schema exists)
SELECT wal_bytes AS wal_bytes_before FROM pg_stat_wal \gset
+-- Test pg_stat_get_backend_wal (and make a temp table so our temp schema exists)
+SELECT wal_bytes AS backend_wal_bytes_before from pg_stat_get_backend_wal(pg_backend_pid()) \gset
CREATE TEMP TABLE test_stats_temp AS SELECT 17;
DROP TABLE test_stats_temp;
-- Checkpoint twice: The checkpointer reports stats after reporting completion
@@ -851,6 +853,18 @@ SELECT wal_bytes > :wal_bytes_before FROM pg_stat_wal;
t
(1 row)
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush
+--------------------------
+
+(1 row)
+
+SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid());
+ ?column?
+----------
+ t
+(1 row)
+
-- Test pg_stat_get_backend_idset() and some allied functions.
-- In particular, verify that their notion of backend ID matches
-- our temp schema index.
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 0a44e14d9f4..ad3f7b7e66a 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -423,6 +423,9 @@ SELECT num_requested AS rqst_ckpts_before FROM pg_stat_checkpointer \gset
-- Test pg_stat_wal (and make a temp table so our temp schema exists)
SELECT wal_bytes AS wal_bytes_before FROM pg_stat_wal \gset
+-- Test pg_stat_get_backend_wal (and make a temp table so our temp schema exists)
+SELECT wal_bytes AS backend_wal_bytes_before from pg_stat_get_backend_wal(pg_backend_pid()) \gset
+
CREATE TEMP TABLE test_stats_temp AS SELECT 17;
DROP TABLE test_stats_temp;
@@ -435,6 +438,9 @@ CHECKPOINT;
SELECT num_requested > :rqst_ckpts_before FROM pg_stat_checkpointer;
SELECT wal_bytes > :wal_bytes_before FROM pg_stat_wal;
+SELECT pg_stat_force_next_flush();
+SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid());
+
-- Test pg_stat_get_backend_idset() and some allied functions.
-- In particular, verify that their notion of backend ID matches
-- our temp schema index.
--
2.34.1
Hi,
Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote on Wed, Mar 5, 2025 at 21:03:
Hi,
On Wed, Mar 05, 2025 at 05:45:57PM +0800, Xuneng Zhou wrote:
Subject: Clarification Needed on WAL Pending Checks in Patchset
Hi,
Thank you for the patchset. I’ve spent some time learning and reviewing it
and have 2 comments.
Thanks for looking at it!
I noticed that in patches prior to patch 6, the function
pgstat_backend_wal_have_pending() was implemented as follows:

/*
 * To determine whether any WAL activity has occurred since last time, not
 * only the number of generated WAL records but also the numbers of WAL
 * writes and syncs need to be checked. Because even transactions that
 * generate no WAL records can write or sync WAL data when flushing the
 * data pages.
 */
static bool
pgstat_backend_wal_have_pending(void)
{
    PgStat_PendingWalStats pending_wal;

    pending_wal = PendingBackendStats.pending_wal;

    return pgWalUsage.wal_records != prevBackendWalUsage.wal_records ||
        pending_wal.wal_write != 0 || pending_wal.wal_sync != 0;
}

Starting with patch 7, it seems the implementation was simplified to:

/*
 * To determine whether WAL usage happened.
 */
static bool
pgstat_backend_wal_have_pending(void)
{
    return pgWalUsage.wal_records != prevBackendWalUsage.wal_records;
}

That's right. This is due to 2421e9a51d2 that removed
PgStat_PendingWalStats.
Meanwhile, the cluster-level check in the function
pgstat_wal_have_pending_cb() still performs the additional checks:

bool
pgstat_wal_have_pending_cb(void)
{
    return pgWalUsage.wal_records != prevWalUsage.wal_records ||
        PendingWalStats.wal_write != 0 ||
        PendingWalStats.wal_sync != 0;
}

Not since 2421e9a51d2. It looks like you are looking at code prior to
2421e9a51d2.
Yeah, my local version is behind the main branch.
Another one is on:
Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote on Mon, Mar 3, 2025 at 18:47:
Hi,
On Mon, Mar 03, 2025 at 09:17:30AM +0000, Bertrand Drouvot wrote:
hmm, that would work as long as PGSTAT_BACKEND_FLUSH_ALL represents things
that need to be called from pgstat_report_wal(). But I think that opens the
door to issues should we add a new PGSTAT_BACKEND_FLUSH_XXX where XXX is not
related to pgstat_report_wal() at all. So, I'm tempted to keep it as it is.
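The trade-off above is between passing PGSTAT_BACKEND_FLUSH_ALL and passing only the WAL bit from pgstat_report_wal(). The sketch below models that bit-flag dispatch under simplifying assumptions: FLUSH_IO, FLUSH_WAL, FLUSH_ALL, flush_backend() and report_wal() are hypothetical stand-ins for the PGSTAT_BACKEND_FLUSH_* flags, pgstat_flush_backend() and pgstat_report_wal(), and counters replace the real flush work so the dispatch is observable.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirrors of the PGSTAT_BACKEND_FLUSH_* bits. */
#define FLUSH_IO    (1u << 0)
#define FLUSH_WAL   (1u << 1)
#define FLUSH_ALL   (FLUSH_IO | FLUSH_WAL)

/* Count how often each kind gets flushed. */
static int  io_flushes;
static int  wal_flushes;

/* Models pgstat_flush_backend(): flush only the kinds selected by flags. */
static void
flush_backend(uint32_t flags)
{
    if (flags & FLUSH_IO)
        io_flushes++;
    if (flags & FLUSH_WAL)
        wal_flushes++;
}

/* Models pgstat_report_wal(): asks for WAL only, never FLUSH_ALL. */
static void
report_wal(void)
{
    flush_backend(FLUSH_WAL);
}
```

Because report_wal() names its bit explicitly, adding a future FLUSH_XXX to FLUSH_ALL would not change what the WAL reporting path flushes, which is the concern raised above.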
I just realized that pgstat_flush_backend() is not correct in 0001. Indeed
we check:
"
if (pg_memory_is_all_zeros(&PendingBackendStats,
                           sizeof(struct PgStat_BackendPending)))
    return false;
"
but the WAL stats are not part of PgStat_BackendPending... So we only check
for IO pending stats here. I'm not sure WAL stats could be non-empty if IO
stats are empty, but the attached now also takes care of pending WAL stats
here.
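The early-exit condition discussed here can be modelled as below. This is a sketch under assumptions: memory_is_all_zeros() is a plain byte loop standing in for pg_memory_is_all_zeros(), and PendingModel and nothing_to_flush() are hypothetical. It shows why an all-zero IO pending struct is not enough to skip the flush: the WAL delta lives outside that struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for PgStat_BackendPending: IO counters only. */
typedef struct PendingModel
{
    int64_t     io_counts[4];
} PendingModel;

static PendingModel pending;        /* models PendingBackendStats */
static int64_t wal_records;         /* models pgWalUsage.wal_records */
static int64_t prev_wal_records;    /* models prevBackendWalUsage.wal_records */

/* Byte-wise check, in the spirit of pg_memory_is_all_zeros(). */
static bool
memory_is_all_zeros(const void *ptr, size_t len)
{
    const unsigned char *p = ptr;

    for (size_t i = 0; i < len; i++)
    {
        if (p[i] != 0)
            return false;
    }
    return true;
}

/* Skip the flush only when neither IO nor WAL stats are pending. */
static bool
nothing_to_flush(void)
{
    return memory_is_all_zeros(&pending, sizeof(pending)) &&
        wal_records == prev_wal_records;
}
```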
I've noticed that there's a function for checking if there are any backend
stats waiting for flush:

/*
 * Check if there are any backend stats waiting for flush.
 */
bool
pgstat_backend_have_pending_cb(void)
{
    return (!pg_memory_is_all_zeros(&PendingBackendStats,
                                    sizeof(struct PgStat_BackendPending)));
}

That's right.
The reason I did not add the extra check there is that I have in mind to
replace the pg_memory_is_all_zeros() checks with a new global variable and
also replace the checks in pgstat_flush_backend() with a call to
pgstat_backend_have_pending_cb() (see 0002 in [1]). It means that all of
that would be perfectly clean if 0002 in [1] goes in.
But yeah, if 0002 in [1] does not go in, then your concern is valid, so I'm
adding the extra check in the attached.
Thanks for the review!
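The idea of tracking pending stats with a global variable, rather than scanning the pending struct, could look roughly like the sketch below. It illustrates the idea only and is not the code of 0002 in [1]: backend_has_iostats, count_io_op(), have_pending() and flush() are hypothetical names, and flush() here simply returns whether it did any work (unlike pgstat_flush_backend(), whose return value has different semantics). Setting a flag at increment time makes the check cheap and lets the callback and the flush path share one function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static int64_t pending_io_ops;              /* pending IO counter (simplified) */
static bool backend_has_iostats = false;    /* hypothetical global "pending" flag */
static int64_t wal_records;                 /* models pgWalUsage.wal_records */
static int64_t prev_wal_records;            /* models prevBackendWalUsage.wal_records */
static int64_t shared_io_ops;               /* models the shared-memory entry */

/* Counting an IO op also records that something is pending. */
static void
count_io_op(void)
{
    pending_io_ops++;
    backend_has_iostats = true;
}

/* Single source of truth used by both the callback and the flush path. */
static bool
have_pending(void)
{
    return backend_has_iostats || wal_records != prev_wal_records;
}

/* Reuses have_pending() instead of re-checking the structs; returns true if it flushed. */
static bool
flush(void)
{
    if (!have_pending())
        return false;
    shared_io_ops += pending_io_ops;
    pending_io_ops = 0;
    backend_has_iostats = false;
    prev_wal_records = wal_records;
    return true;
}
```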
[1]: /messages/by-id/Z8WYf1jyy4MwOveQ@ip-10-97-1-34.eu-west-3.compute.internal
Regards,
That makes more sense; I'll take a look at thread [1]. Separating patches
helps clarify their purpose, but it may also fragment the overall
perspective :) Thanks a lot for your explanation!
--
Bertrand Drouvot
Hi,
On Wed, Mar 05, 2025 at 05:26:40PM +0000, Bertrand Drouvot wrote:
So yeah, back to the issue: we have to pay more attention to the WAL stats
because pgWalUsage is "incremented" without any pgstat_tracks_backend_bktype()
check (that's not the case for the IO stats, whose counters are incremented taking
into account pgstat_tracks_backend_bktype()).
So for the moment, in the attached I "just" add a pgstat_tracks_backend_bktype()
check in pgstat_backend_have_pending_cb(), but I'm not sure I like it that much...
Will think more about it...
After giving it more thought, I think that the way it's currently done makes sense.
There is no need to check pgstat_tracks_backend_bktype() while incrementing
pgWalUsage or to create a "BackendpgWalUsage" (that would be incremented based
on pgstat_tracks_backend_bktype()).
In pgstat_create_backend() (called only when pgstat_tracks_backend_bktype() is satisfied):
- PendingBackendStats is initialized to zero
- prevBackendWalUsage is initialized to pgWalUsage
But:
1. PendingBackendStats is "incremented" during IO operations, so it makes
sense to ensure that pgstat_tracks_backend_bktype() returns true in those functions
(i.e. pgstat_count_backend_io_op_time() and pgstat_count_backend_io_op()).
2. prevBackendWalUsage is not incremented; it's just there to compute the delta
with pgWalUsage in pgstat_flush_backend_entry_wal().
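The snapshot-and-delta scheme in points 1 and 2 can be sketched as follows. This is a simplified model, not the patch itself: WalUsageModel, accum_diff() (in the spirit of WalUsageAccumDiff()), shared_counters, create_backend() and flush_backend_wal() are hypothetical. prev_usage is never incremented; it only records the cumulative counters at creation time or at the last flush, so each flush adds exactly the WAL generated since then.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of WalUsage fields. */
typedef struct WalUsageModel
{
    int64_t     wal_records;
    int64_t     wal_fpi;
    uint64_t    wal_bytes;
} WalUsageModel;

static WalUsageModel wal_usage;        /* models pgWalUsage (cumulative, process-local) */
static WalUsageModel prev_usage;       /* models prevBackendWalUsage */
static WalUsageModel shared_counters;  /* models the shared per-backend entry */

/* dst += add - sub, field by field, in the spirit of WalUsageAccumDiff(). */
static void
accum_diff(WalUsageModel *dst, const WalUsageModel *add, const WalUsageModel *sub)
{
    /* Cumulative counters never go backwards. */
    assert(add->wal_records >= sub->wal_records);
    dst->wal_records += add->wal_records - sub->wal_records;
    dst->wal_fpi += add->wal_fpi - sub->wal_fpi;
    dst->wal_bytes += add->wal_bytes - sub->wal_bytes;
}

/* Models pgstat_create_backend(): take the baseline snapshot. */
static void
create_backend(void)
{
    prev_usage = wal_usage;
}

/* Models pgstat_flush_backend_entry_wal(): push the delta, then re-snapshot. */
static void
flush_backend_wal(void)
{
    WalUsageModel diff = {0};

    if (wal_usage.wal_records == prev_usage.wal_records)
        return;                 /* nothing new since the last snapshot */

    accum_diff(&diff, &wal_usage, &prev_usage);
    shared_counters.wal_records += diff.wal_records;
    shared_counters.wal_fpi += diff.wal_fpi;
    shared_counters.wal_bytes += diff.wal_bytes;
    prev_usage = wal_usage;
}
```

WAL generated before create_backend() is excluded by the baseline, and a second flush with no new records is a no-op.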
pgstat_flush_backend_entry_wal() is only called from pgstat_flush_backend(), which
does the pgstat_tracks_backend_bktype() check first. The same is true for
pgstat_flush_backend_entry_io().
So, I think it's fine the way it's done here. The missing part was in
pgstat_backend_have_pending_cb(), which has been fixed.
But this missing part was already there for the IO stats. It just has not manifested
yet, maybe because PendingBackendStats was full of zeroes by "luck".
Indeed, there is no reason for pgstat_backend_have_pending_cb() to return true if
pgstat_tracks_backend_bktype() is not satisfied.
So it deserves a dedicated patch to fix this already existing issue: 0001 attached.
Regards,
--
Bertrand Drouvot
Attachments:
v16-0001-Add-an-extra-check-in-pgstat_backend_have_pendin.patch (text/x-diff; charset=us-ascii)
From 38c718f8cd2c079219e5341e72e99664f150a581 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Thu, 6 Mar 2025 09:49:49 +0000
Subject: [PATCH v16 1/2] Add an extra check in
pgstat_backend_have_pending_cb()
There is no reason for pgstat_backend_have_pending_cb() to not check
for pgstat_tracks_backend_bktype(). It could wrongly report true should
PendingBackendStats not be full of zeroes.
---
src/backend/utils/activity/pgstat_backend.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
100.0% src/backend/utils/activity/
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
index a9343b7b59e..1d94d3176b2 100644
--- a/src/backend/utils/activity/pgstat_backend.c
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -222,8 +222,11 @@ pgstat_flush_backend(bool nowait, bits32 flags)
bool
pgstat_backend_have_pending_cb(void)
{
- return (!pg_memory_is_all_zeros(&PendingBackendStats,
- sizeof(struct PgStat_BackendPending)));
+ if (!pgstat_tracks_backend_bktype(MyBackendType))
+ return false;
+ else
+ return (!pg_memory_is_all_zeros(&PendingBackendStats,
+ sizeof(struct PgStat_BackendPending)));
}
/*
--
2.34.1
v16-0002-per-backend-WAL-statistics.patch (text/x-diff; charset=us-ascii)
From b3edf7c2bd448dc370847ba119ab9007b5b92b8f Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 6 Jan 2025 10:00:00 +0000
Subject: [PATCH v16 2/2] per backend WAL statistics
Now that commit 9aea73fc61 added backend-level statistics to pgstats (and
per backend IO statistics) we can more easily add per backend statistics.
This commit adds per backend WAL statistics using the same layer as pg_stat_wal,
except that it is now possible to know how much WAL activity is happening in each
backend rather than an overall aggregate of all the activity. A function called
pg_stat_get_backend_wal() is added to access this data depending on the
PID of a backend.
The same limitation as in 9aea73fc61 persists, meaning that Auxiliary processes
are not included in this set of statistics.
XXX: bump catalog version
XXX: bump of stats file format not required, as backend stats do not
persist on disk.
---
doc/src/sgml/monitoring.sgml | 19 ++++++
src/backend/utils/activity/pgstat_backend.c | 70 ++++++++++++++++++++-
src/backend/utils/activity/pgstat_wal.c | 1 +
src/backend/utils/adt/pgstatfuncs.c | 26 +++++++-
src/include/catalog/pg_proc.dat | 7 +++
src/include/pgstat.h | 37 +++++------
src/include/utils/pgstat_internal.h | 3 +-
src/test/regress/expected/stats.out | 14 +++++
src/test/regress/sql/stats.sql | 6 ++
9 files changed, 160 insertions(+), 23 deletions(-)
15.1% doc/src/sgml/
42.6% src/backend/utils/activity/
14.7% src/backend/utils/adt/
8.3% src/include/catalog/
4.2% src/include/utils/
8.0% src/test/regress/expected/
6.1% src/test/regress/sql/
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 16646f560e8..b1710680705 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4866,6 +4866,25 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para></entry>
</row>
+ <row>
+ <entry id="pg-stat-get-backend-wal" role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_stat_get_backend_wal</primary>
+ </indexterm>
+ <function>pg_stat_get_backend_wal</function> ( <type>integer</type> )
+ <returnvalue>record</returnvalue>
+ </para>
+ <para>
+ Returns WAL statistics about the backend with the specified
+ process ID. The output fields are exactly the same as the ones in the
+ <structname>pg_stat_wal</structname> view.
+ </para>
+ <para>
+ The function does not return WAL statistics for the checkpointer,
+ the background writer, the startup process and the autovacuum launcher.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
index 1d94d3176b2..b2c82f37301 100644
--- a/src/backend/utils/activity/pgstat_backend.c
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -38,6 +38,14 @@
*/
static PgStat_BackendPending PendingBackendStats;
+/*
+ * WAL usage counters saved from pgWalUsage at the previous call to
+ * pgstat_report_wal(). This is used to calculate how much WAL usage
+ * happens between pgstat_report_wal() calls, by subtracting
+ * the previous counters from the current ones.
+ */
+static WalUsage prevBackendWalUsage;
+
/*
* Utility routines to report I/O stats for backends, kept here to avoid
* exposing PendingBackendStats to the outside world.
@@ -184,6 +192,57 @@ pgstat_flush_backend_entry_io(PgStat_EntryRef *entry_ref)
MemSet(&PendingBackendStats.pending_io, 0, sizeof(PgStat_PendingIO));
}
+/*
+ * To determine whether WAL usage happened.
+ */
+static bool
+pgstat_backend_wal_have_pending(void)
+{
+ return pgWalUsage.wal_records != prevBackendWalUsage.wal_records;
+}
+
+/*
+ * Flush out locally pending backend WAL statistics. Locking is managed
+ * by the caller.
+ */
+static void
+pgstat_flush_backend_entry_wal(PgStat_EntryRef *entry_ref)
+{
+ PgStatShared_Backend *shbackendent;
+ PgStat_WalCounters *bktype_shstats;
+ WalUsage wal_usage_diff = {0};
+
+ /*
+ * This function can be called even if nothing at all has happened. Avoid
+ * taking lock for nothing in that case.
+ */
+ if (!pgstat_backend_wal_have_pending())
+ return;
+
+ shbackendent = (PgStatShared_Backend *) entry_ref->shared_stats;
+ bktype_shstats = &shbackendent->stats.wal_counters;
+
+ /*
+ * We don't update the WAL usage portion of the local WalStats elsewhere.
+ * Calculate how much WAL usage counters were increased by subtracting the
+ * previous counters from the current ones.
+ */
+ WalUsageAccumDiff(&wal_usage_diff, &pgWalUsage, &prevBackendWalUsage);
+
+#define WALSTAT_ACC(fld, var_to_add) \
+ (bktype_shstats->fld += var_to_add.fld)
+ WALSTAT_ACC(wal_buffers_full, wal_usage_diff);
+ WALSTAT_ACC(wal_records, wal_usage_diff);
+ WALSTAT_ACC(wal_fpi, wal_usage_diff);
+ WALSTAT_ACC(wal_bytes, wal_usage_diff);
+#undef WALSTAT_ACC
+
+ /*
+ * Save the current counters for the subsequent calculation of WAL usage.
+ */
+ prevBackendWalUsage = pgWalUsage;
+}
+
/*
* Flush out locally pending backend statistics
*
@@ -199,7 +258,8 @@ pgstat_flush_backend(bool nowait, bits32 flags)
return false;
if (pg_memory_is_all_zeros(&PendingBackendStats,
- sizeof(struct PgStat_BackendPending)))
+ sizeof(struct PgStat_BackendPending))
+ && !pgstat_backend_wal_have_pending())
return false;
entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_BACKEND, InvalidOid,
@@ -211,6 +271,9 @@ pgstat_flush_backend(bool nowait, bits32 flags)
if (flags & PGSTAT_BACKEND_FLUSH_IO)
pgstat_flush_backend_entry_io(entry_ref);
+ if (flags & PGSTAT_BACKEND_FLUSH_WAL)
+ pgstat_flush_backend_entry_wal(entry_ref);
+
pgstat_unlock_entry(entry_ref);
return false;
@@ -226,7 +289,8 @@ pgstat_backend_have_pending_cb(void)
return false;
else
return (!pg_memory_is_all_zeros(&PendingBackendStats,
- sizeof(struct PgStat_BackendPending)));
+ sizeof(struct PgStat_BackendPending))
+ || pgstat_backend_wal_have_pending());
}
/*
@@ -261,6 +325,8 @@ pgstat_create_backend(ProcNumber procnum)
pgstat_unlock_entry(entry_ref);
MemSet(&PendingBackendStats, 0, sizeof(PgStat_BackendPending));
+
+ prevBackendWalUsage = pgWalUsage;
}
/*
diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c
index 5d3da4b674e..16a1ecb4d90 100644
--- a/src/backend/utils/activity/pgstat_wal.c
+++ b/src/backend/utils/activity/pgstat_wal.c
@@ -52,6 +52,7 @@ pgstat_report_wal(bool force)
/* flush wal stats */
(void) pgstat_wal_flush_cb(nowait);
+ pgstat_flush_backend(nowait, PGSTAT_BACKEND_FLUSH_WAL);
/* flush IO stats */
pgstat_flush_io(nowait);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 9172e1cb9d2..662ce46cbc2 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1609,8 +1609,8 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
/*
* pg_stat_wal_build_tuple
*
- * Helper routine for pg_stat_get_wal() returning one tuple based on the
- * contents of wal_counters.
+ * Helper routine for pg_stat_get_wal() and pg_stat_get_backend_wal()
+ * returning one tuple based on the contents of wal_counters.
*/
static Datum
pg_stat_wal_build_tuple(PgStat_WalCounters wal_counters,
@@ -1659,6 +1659,28 @@ pg_stat_wal_build_tuple(PgStat_WalCounters wal_counters,
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
}
+/*
+ * Returns WAL statistics for a backend with given PID.
+ */
+Datum
+pg_stat_get_backend_wal(PG_FUNCTION_ARGS)
+{
+ int pid;
+ PgStat_Backend *backend_stats;
+ PgStat_WalCounters bktype_stats;
+
+ pid = PG_GETARG_INT32(0);
+ backend_stats = pgstat_fetch_stat_backend_by_pid(pid, NULL);
+
+ if (!backend_stats)
+ PG_RETURN_NULL();
+
+ bktype_stats = backend_stats->wal_counters;
+
+ /* save tuples with data from this PgStat_WalCounters */
+ return (pg_stat_wal_build_tuple(bktype_stats, backend_stats->stat_reset_timestamp));
+}
+
/*
* Returns statistics of WAL activity
*/
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 134b3dd8689..064d47f63a4 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5954,6 +5954,13 @@
proargmodes => '{o,o,o,o,o}',
proargnames => '{wal_records,wal_fpi,wal_bytes,wal_buffers_full,stats_reset}',
prosrc => 'pg_stat_get_wal' },
+{ oid => '8037', descr => 'statistics: backend WAL activity',
+ proname => 'pg_stat_get_backend_wal', provolatile => 'v',
+ proparallel => 'r', prorettype => 'record', proargtypes => 'int4',
+ proallargtypes => '{int4,int8,int8,numeric,int8,timestamptz}',
+ proargmodes => '{i,o,o,o,o,o}',
+ proargnames => '{backend_pid,wal_records,wal_fpi,wal_bytes,wal_buffers_full,stats_reset}',
+ prosrc => 'pg_stat_get_backend_wal' },
{ oid => '6248', descr => 'statistics: information about WAL prefetching',
proname => 'pg_stat_get_recovery_prefetch', prorows => '1', proretset => 't',
provolatile => 'v', prorettype => 'record', proargtypes => '',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 4aad10b0b6d..06359b9157d 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -340,24 +340,6 @@ typedef struct PgStat_IO
PgStat_BktypeIO stats[BACKEND_NUM_TYPES];
} PgStat_IO;
-typedef struct PgStat_Backend
-{
- TimestampTz stat_reset_timestamp;
- PgStat_BktypeIO io_stats;
-} PgStat_Backend;
-
-/* ---------
- * PgStat_BackendPending Non-flushed backend stats.
- * ---------
- */
-typedef struct PgStat_BackendPending
-{
- /*
- * Backend statistics store the same amount of IO data as PGSTAT_KIND_IO.
- */
- PgStat_PendingIO pending_io;
-} PgStat_BackendPending;
-
typedef struct PgStat_StatDBEntry
{
PgStat_Counter xact_commit;
@@ -500,6 +482,25 @@ typedef struct PgStat_WalStats
TimestampTz stat_reset_timestamp;
} PgStat_WalStats;
+typedef struct PgStat_Backend
+{
+ TimestampTz stat_reset_timestamp;
+ PgStat_BktypeIO io_stats;
+ PgStat_WalCounters wal_counters;
+} PgStat_Backend;
+
+/* ---------
+ * PgStat_BackendPending Non-flushed backend stats.
+ * ---------
+ */
+typedef struct PgStat_BackendPending
+{
+ /*
+ * Backend statistics store the same amount of IO data as PGSTAT_KIND_IO.
+ */
+ PgStat_PendingIO pending_io;
+} PgStat_BackendPending;
+
/*
* Functions in pgstat.c
*/
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 36d228e3558..d5557e6e998 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -622,7 +622,8 @@ extern void pgstat_archiver_snapshot_cb(void);
/* flags for pgstat_flush_backend() */
#define PGSTAT_BACKEND_FLUSH_IO (1 << 0) /* Flush I/O statistics */
-#define PGSTAT_BACKEND_FLUSH_ALL (PGSTAT_BACKEND_FLUSH_IO)
+#define PGSTAT_BACKEND_FLUSH_WAL (1 << 1) /* Flush WAL statistics */
+#define PGSTAT_BACKEND_FLUSH_ALL (PGSTAT_BACKEND_FLUSH_IO | PGSTAT_BACKEND_FLUSH_WAL)
extern bool pgstat_flush_backend(bool nowait, bits32 flags);
extern bool pgstat_backend_flush_cb(bool nowait);
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 093e6368dbb..b3c303c98cb 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -832,6 +832,8 @@ SELECT sessions > :db_stat_sessions FROM pg_stat_database WHERE datname = (SELEC
SELECT num_requested AS rqst_ckpts_before FROM pg_stat_checkpointer \gset
-- Test pg_stat_wal (and make a temp table so our temp schema exists)
SELECT wal_bytes AS wal_bytes_before FROM pg_stat_wal \gset
+-- Test pg_stat_get_backend_wal (and make a temp table so our temp schema exists)
+SELECT wal_bytes AS backend_wal_bytes_before from pg_stat_get_backend_wal(pg_backend_pid()) \gset
CREATE TEMP TABLE test_stats_temp AS SELECT 17;
DROP TABLE test_stats_temp;
-- Checkpoint twice: The checkpointer reports stats after reporting completion
@@ -851,6 +853,18 @@ SELECT wal_bytes > :wal_bytes_before FROM pg_stat_wal;
t
(1 row)
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush
+--------------------------
+
+(1 row)
+
+SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid());
+ ?column?
+----------
+ t
+(1 row)
+
-- Test pg_stat_get_backend_idset() and some allied functions.
-- In particular, verify that their notion of backend ID matches
-- our temp schema index.
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 0a44e14d9f4..ad3f7b7e66a 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -423,6 +423,9 @@ SELECT num_requested AS rqst_ckpts_before FROM pg_stat_checkpointer \gset
-- Test pg_stat_wal (and make a temp table so our temp schema exists)
SELECT wal_bytes AS wal_bytes_before FROM pg_stat_wal \gset
+-- Test pg_stat_get_backend_wal (and make a temp table so our temp schema exists)
+SELECT wal_bytes AS backend_wal_bytes_before from pg_stat_get_backend_wal(pg_backend_pid()) \gset
+
CREATE TEMP TABLE test_stats_temp AS SELECT 17;
DROP TABLE test_stats_temp;
@@ -435,6 +438,9 @@ CHECKPOINT;
SELECT num_requested > :rqst_ckpts_before FROM pg_stat_checkpointer;
SELECT wal_bytes > :wal_bytes_before FROM pg_stat_wal;
+SELECT pg_stat_force_next_flush();
+SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid());
+
-- Test pg_stat_get_backend_idset() and some allied functions.
-- In particular, verify that their notion of backend ID matches
-- our temp schema index.
--
2.34.1
On Thu, Mar 06, 2025 at 10:33:52AM +0000, Bertrand Drouvot wrote:
Indeed, there is no reason for pgstat_backend_have_pending_cb() to return true if
pgstat_tracks_backend_bktype() is not satisfied.

So it deserves a dedicated patch to fix this already existing issue:
0001 attached.
pgstat_backend_have_pending_cb(void)
{
- return (!pg_memory_is_all_zeros(&PendingBackendStats,
- sizeof(struct PgStat_BackendPending)));
+ if (!pgstat_tracks_backend_bktype(MyBackendType))
+ return false;
+ else
+ return (!pg_memory_is_all_zeros(&PendingBackendStats,
+ sizeof(struct PgStat_BackendPending)));
So, if I understand your point correctly, it is not a problem on HEAD
because we are never going to update PendingBackendStats in the
checkpointer as pgstat_count_backend_io_op[_time]() blocks any attempt
to do so. However, it becomes a problem with the WAL portion of the patch
because of the dependency on pgWalUsage, which may be updated by the
checkpointer, so the pending callback would happily report true in
this case. It would also become a problem if we add to the backend stats
a different portion that depends on something external.
An extra check based on pgstat_tracks_backend_bktype() makes sense in
pgstat_backend_have_pending_cb(), yes, forcing the stats reports not
to do stupid things in a process where we should not report stats.
Good catch from the sanity check in pgstat_report_stat(), even if this
is only a problem once we begin to use something other than
PendingBackendStats for the pending stats. This also has the merit of
making pgstat_report_stat() do less work.
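To make the ordering of the checks concrete, here is a minimal, hypothetical sketch (not PostgreSQL code). A process-wide counter standing in for pgWalUsage can move in any process, including the checkpointer, so the have-pending callback rejects untracked backend types before comparing the counter with its saved snapshot. All sketch_* names and the two-value backend enum are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for BackendType; only two kinds for the sketch. */
typedef enum
{
	SKETCH_BACKEND_CLIENT,
	SKETCH_BACKEND_CHECKPOINTER
} SketchBackendType;

/* Hypothetical stand-in for MyBackendType. */
static SketchBackendType MySketchBackendType = SKETCH_BACKEND_CLIENT;

/*
 * Hypothetical stand-in for pgWalUsage.wal_records: bumped by any process
 * that writes WAL, whether or not it tracks backend stats.
 */
static int64_t sketch_wal_records = 0;

/* Snapshot taken at the previous flush, like prevBackendWalUsage. */
static int64_t sketch_prev_wal_records = 0;

/* Plays the role of pgstat_tracks_backend_bktype(). */
static bool
sketch_tracks_backend_bktype(SketchBackendType bktype)
{
	return bktype == SKETCH_BACKEND_CLIENT;
}

/* Plays the role of pgstat_backend_have_pending_cb(). */
static bool
sketch_backend_have_pending_cb(void)
{
	/*
	 * Untracked process types never report pending backend stats, even
	 * when the external counter has moved.
	 */
	if (!sketch_tracks_backend_bktype(MySketchBackendType))
		return false;

	return sketch_wal_records != sketch_prev_wal_records;
}
```

Without the backend type test, a checkpointer that wrote WAL would see the counter differ from its snapshot and report pending data it is never allowed to flush.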
--
Michael
Hi,
On Fri, Mar 07, 2025 at 02:42:13PM +0900, Michael Paquier wrote:
On Thu, Mar 06, 2025 at 10:33:52AM +0000, Bertrand Drouvot wrote:
Indeed, there is no reason for pgstat_backend_have_pending_cb() to return true if
pgstat_tracks_backend_bktype() is not satisfied.

So it deserves a dedicated patch to fix this already existing issue:
0001 attached.
pgstat_backend_have_pending_cb(void)
{
- return (!pg_memory_is_all_zeros(&PendingBackendStats,
- sizeof(struct PgStat_BackendPending)));
+ if (!pgstat_tracks_backend_bktype(MyBackendType))
+ return false;
+ else
+ return (!pg_memory_is_all_zeros(&PendingBackendStats,
+ sizeof(struct PgStat_BackendPending)));
So, if I understand your point correctly, it is not a problem on HEAD
because we are never going to update PendingBackendStats in the
checkpointer as pgstat_count_backend_io_op[_time]() blocks any attempt
to do so.
I think this is wrong on HEAD because we initialize PendingBackendStats to
zeros in pgstat_create_backend() based on the backend type (pgstat_tracks_backend_bktype()).
But when it's time to flush, then pgstat_backend_have_pending_cb() checks
for zeros in PendingBackendStats *without* any check on the backend type.
I think the issue is "masked" on HEAD because PendingBackendStats is
probably automatically initialized with zeros (as being a static variable at
file scope).
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Fri, Mar 07, 2025 at 08:33:04AM +0000, Bertrand Drouvot wrote:
But when it's time to flush, then pgstat_backend_have_pending_cb() checks
for zeros in PendingBackendStats *without* any check on the backend type.
I think the issue is "masked" on HEAD because PendingBackendStats is
probably automatically initialized with zeros (as being a static variable at
file scope).
If this weren't true, we would have a lot of problems in more places
than this one. It does not hurt to add an initialization at the top
of pgstat_backend.c for PendingBackendStats, to document the
intention, while we're on it.
Did both things, and applied the result.
--
Michael
Hi,
On Sat, Mar 08, 2025 at 10:57:38AM +0900, Michael Paquier wrote:
On Fri, Mar 07, 2025 at 08:33:04AM +0000, Bertrand Drouvot wrote:
But when it's time to flush, then pgstat_backend_have_pending_cb() checks
for zeros in PendingBackendStats *without* any check on the backend type.
I think the issue is "masked" on HEAD because PendingBackendStats is
probably automatically initialized with zeros (as being a static variable at
file scope).

If this weren't true, we would have a lot of problems in more places
than this one.
Yeah, I fully agree, and I think that was fine. I just added "probably" as
cautious wording, like the "We assume this initializes to zeroes" comments
that were removed in 07e9e28b56d and 88f5ebbbee3, for example.
It does not hurt to add an initialization at the top
of pgstat_backend.c for PendingBackendStats, to document the
intention, while we're on it.
-static PgStat_BackendPending PendingBackendStats;
+static PgStat_BackendPending PendingBackendStats = {0};
Not sure about this change: I think that it would not always work should
PgStat_BackendPending contain padding. I mean, there is no guarantee that this
would initialize padding bytes (if any) to zeroes.
That would not be an issue should we only access the struct
fields in the code, but that's not the case as we're making use of
pg_memory_is_all_zeros() on it.
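A small, self-contained C sketch of the two cases in this discussion, using hypothetical names rather than PostgreSQL's. A file-scope static object with no initializer is zero-initialized by C (C11 6.7.9, padding included), and memset() clears every byte of an object regardless of its layout. A byte-wise check like pg_memory_is_all_zeros() depends on exactly that. The sketch deliberately does not try to settle what a "= {0}" initializer does to padding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical pending-stats struct. A char followed by an int64_t usually
 * leaves padding bytes between the two members.
 */
typedef struct SketchPending
{
	char		flag;
	int64_t		counter;
} SketchPending;

/* File-scope static with no initializer: every byte starts as zero. */
static SketchPending sketch_static_pending;

/*
 * Local byte-wise check in the spirit of pg_memory_is_all_zeros() (not the
 * PostgreSQL implementation). Unlike a member-by-member comparison, it also
 * inspects padding bytes.
 */
static bool
sketch_memory_is_all_zeros(const void *ptr, size_t len)
{
	const unsigned char *p = ptr;

	for (size_t i = 0; i < len; i++)
	{
		if (p[i] != 0)
			return false;
	}
	return true;
}

/*
 * Reset with memset(), which clears padding too, similar to the MemSet()
 * call done on PendingBackendStats in pgstat_create_backend().
 */
static void
sketch_reset_pending(SketchPending *pending)
{
	memset(pending, 0, sizeof(*pending));
}
```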
Regards,
--
Bertrand Drouvot
On Sat, Mar 08, 2025 at 07:53:04AM +0000, Bertrand Drouvot wrote:
That would not be an issue should we only access the struct
fields in the code, but that's not the case as we're making use of
pg_memory_is_all_zeros() on it.
It does not hurt to keep it as it is, honestly.
I've reviewed the last patch of the series, and noticed a couple of
inconsistent comments across it, and some indentation issues.
@@ -199,7 +258,8 @@ pgstat_flush_backend(bool nowait, bits32 flags)
return false;
if (pg_memory_is_all_zeros(&PendingBackendStats,
- sizeof(struct PgStat_BackendPending)))
+ sizeof(struct PgStat_BackendPending))
+ && !pgstat_backend_wal_have_pending())
return false;
I have one issue with pgstat_flush_backend() and the early exit check
done here. If for example we use FLUSH_WAL but there is some IO data
pending, we would lock the stats entry for nothing. We could also
return true even if there is no pending WAL data if the lock could not
be taken, which would be incorrect because there was no data to flush
to begin with. I think that this should be adjusted so that we limit
the entry locking based on the flags given as input, like in the
attached.
Thoughts?
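The flag-driven early exit proposed here can be sketched as follows. This is a simplified, hypothetical model: booleans stand in for the real pending-data checks and a counter stands in for the stats entry lock. Unlike the real pgstat_flush_backend(), the return value here simply reports whether the lock was taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical copies of the PGSTAT_BACKEND_FLUSH_* flags. */
#define SKETCH_FLUSH_IO		(1 << 0)
#define SKETCH_FLUSH_WAL	(1 << 1)
#define SKETCH_FLUSH_ALL	(SKETCH_FLUSH_IO | SKETCH_FLUSH_WAL)

/* Stand-ins for "some I/O / WAL data is pending". */
static bool sketch_io_pending = false;
static bool sketch_wal_pending = false;

/* Counts how often the (simulated) entry lock is taken. */
static int	sketch_lock_count = 0;

/* Returns true only if the entry lock was taken. */
static bool
sketch_flush_backend(uint32_t flags)
{
	bool		has_pending_data = false;

	/* Only look at the kinds of data the caller asked to flush. */
	if ((flags & SKETCH_FLUSH_IO) && sketch_io_pending)
		has_pending_data = true;
	if ((flags & SKETCH_FLUSH_WAL) && sketch_wal_pending)
		has_pending_data = true;

	/* Nothing requested is pending: skip the lock entirely. */
	if (!has_pending_data)
		return false;

	sketch_lock_count++;
	if (flags & SKETCH_FLUSH_IO)
		sketch_io_pending = false;
	if (flags & SKETCH_FLUSH_WAL)
		sketch_wal_pending = false;
	return true;
}
```

Two independent if tests or an if/else-if chain behave the same here, since all that matters is whether any requested kind of data is pending.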
--
Michael
Attachments:
v17-0001-per-backend-WAL-statistics.patch (text/x-diff)
From 353e6c9ff508f52c476ffe4536f1b8033b3ac996 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Mon, 10 Mar 2025 16:45:26 +0900
Subject: [PATCH v17] per backend WAL statistics
Now that commit 9aea73fc61 added backend-level statistics to pgstats (and
per backend IO statistics) we can more easily add per backend statistics.
This commit adds per backend WAL statistics using the same layer as pg_stat_wal,
except that it is now possible to know how much WAL activity is happening in each
backend rather than an overall aggregate of all the activity. A function called
pg_stat_get_backend_wal() is added to access this data depending on the
PID of a backend.
The same limitation as in 9aea73fc61 persists, meaning that Auxiliary processes
are not included in this set of statistics.
XXX: bump catalog version
XXX: bump of stats file format not required, as backend stats do not
persist on disk.
---
src/include/catalog/pg_proc.dat | 7 ++
src/include/pgstat.h | 41 +++++-----
src/include/utils/pgstat_internal.h | 3 +-
src/backend/utils/activity/pgstat_backend.c | 87 ++++++++++++++++++++-
src/backend/utils/activity/pgstat_wal.c | 1 +
src/backend/utils/adt/pgstatfuncs.c | 26 +++++-
src/test/regress/expected/stats.out | 17 +++-
src/test/regress/sql/stats.sql | 9 ++-
doc/src/sgml/monitoring.sgml | 19 +++++
9 files changed, 184 insertions(+), 26 deletions(-)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cede992b6e22..42e427f8fe87 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5973,6 +5973,13 @@
proargmodes => '{o,o,o,o,o}',
proargnames => '{wal_records,wal_fpi,wal_bytes,wal_buffers_full,stats_reset}',
prosrc => 'pg_stat_get_wal' },
+{ oid => '8037', descr => 'statistics: backend WAL activity',
+ proname => 'pg_stat_get_backend_wal', provolatile => 'v', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4',
+ proallargtypes => '{int4,int8,int8,numeric,int8,timestamptz}',
+ proargmodes => '{i,o,o,o,o,o}',
+ proargnames => '{backend_pid,wal_records,wal_fpi,wal_bytes,wal_buffers_full,stats_reset}',
+ prosrc => 'pg_stat_get_backend_wal' },
{ oid => '6248', descr => 'statistics: information about WAL prefetching',
proname => 'pg_stat_get_recovery_prefetch', prorows => '1', proretset => 't',
provolatile => 'v', prorettype => 'record', proargtypes => '',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 4aad10b0b6d5..def6b370ac11 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -340,24 +340,6 @@ typedef struct PgStat_IO
PgStat_BktypeIO stats[BACKEND_NUM_TYPES];
} PgStat_IO;
-typedef struct PgStat_Backend
-{
- TimestampTz stat_reset_timestamp;
- PgStat_BktypeIO io_stats;
-} PgStat_Backend;
-
-/* ---------
- * PgStat_BackendPending Non-flushed backend stats.
- * ---------
- */
-typedef struct PgStat_BackendPending
-{
- /*
- * Backend statistics store the same amount of IO data as PGSTAT_KIND_IO.
- */
- PgStat_PendingIO pending_io;
-} PgStat_BackendPending;
-
typedef struct PgStat_StatDBEntry
{
PgStat_Counter xact_commit;
@@ -500,6 +482,29 @@ typedef struct PgStat_WalStats
TimestampTz stat_reset_timestamp;
} PgStat_WalStats;
+/* -------
+ * PgStat_Backend Backend statistics
+ * -------
+ */
+typedef struct PgStat_Backend
+{
+ TimestampTz stat_reset_timestamp;
+ PgStat_BktypeIO io_stats;
+ PgStat_WalCounters wal_counters;
+} PgStat_Backend;
+
+/* ---------
+ * PgStat_BackendPending Non-flushed backend stats.
+ * ---------
+ */
+typedef struct PgStat_BackendPending
+{
+ /*
+ * Backend statistics store the same amount of IO data as PGSTAT_KIND_IO.
+ */
+ PgStat_PendingIO pending_io;
+} PgStat_BackendPending;
+
/*
* Functions in pgstat.c
*/
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 36d228e3558b..d5557e6e998c 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -622,7 +622,8 @@ extern void pgstat_archiver_snapshot_cb(void);
/* flags for pgstat_flush_backend() */
#define PGSTAT_BACKEND_FLUSH_IO (1 << 0) /* Flush I/O statistics */
-#define PGSTAT_BACKEND_FLUSH_ALL (PGSTAT_BACKEND_FLUSH_IO)
+#define PGSTAT_BACKEND_FLUSH_WAL (1 << 1) /* Flush WAL statistics */
+#define PGSTAT_BACKEND_FLUSH_ALL (PGSTAT_BACKEND_FLUSH_IO | PGSTAT_BACKEND_FLUSH_WAL)
extern bool pgstat_flush_backend(bool nowait, bits32 flags);
extern bool pgstat_backend_flush_cb(bool nowait);
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
index 6efbb650aa8d..a34a1b40060e 100644
--- a/src/backend/utils/activity/pgstat_backend.c
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -38,6 +38,14 @@
*/
static PgStat_BackendPending PendingBackendStats = {0};
+/*
+ * WAL usage counters saved from pgWalUsage at the previous call to
+ * pgstat_report_wal(). This is used to calculate how much WAL usage
+ * happens between pgstat_report_wal() calls, by subtracting the previous
+ * counters from the current ones.
+ */
+static WalUsage prevBackendWalUsage;
+
/*
* Utility routines to report I/O stats for backends, kept here to avoid
* exposing PendingBackendStats to the outside world.
@@ -184,6 +192,57 @@ pgstat_flush_backend_entry_io(PgStat_EntryRef *entry_ref)
MemSet(&PendingBackendStats.pending_io, 0, sizeof(PgStat_PendingIO));
}
+/*
+ * To determine whether WAL usage happened.
+ */
+static inline bool
+pgstat_backend_wal_have_pending(void)
+{
+ return pgWalUsage.wal_records != prevBackendWalUsage.wal_records;
+}
+
+/*
+ * Flush out locally pending backend WAL statistics. Locking is managed
+ * by the caller.
+ */
+static void
+pgstat_flush_backend_entry_wal(PgStat_EntryRef *entry_ref)
+{
+ PgStatShared_Backend *shbackendent;
+ PgStat_WalCounters *bktype_shstats;
+ WalUsage wal_usage_diff = {0};
+
+ /*
+ * This function can be called even if nothing at all has happened for WAL
+ * statistics. In this case, avoid unnecessarily modifying the stats
+ * entry.
+ */
+ if (!pgstat_backend_wal_have_pending())
+ return;
+
+ shbackendent = (PgStatShared_Backend *) entry_ref->shared_stats;
+ bktype_shstats = &shbackendent->stats.wal_counters;
+
+ /*
+ * Calculate how much WAL usage counters were increased by subtracting the
+ * previous counters from the current ones.
+ */
+ WalUsageAccumDiff(&wal_usage_diff, &pgWalUsage, &prevBackendWalUsage);
+
+#define WALSTAT_ACC(fld, var_to_add) \
+ (bktype_shstats->fld += var_to_add.fld)
+ WALSTAT_ACC(wal_buffers_full, wal_usage_diff);
+ WALSTAT_ACC(wal_records, wal_usage_diff);
+ WALSTAT_ACC(wal_fpi, wal_usage_diff);
+ WALSTAT_ACC(wal_bytes, wal_usage_diff);
+#undef WALSTAT_ACC
+
+ /*
+ * Save the current counters for the subsequent calculation of WAL usage.
+ */
+ prevBackendWalUsage = pgWalUsage;
+}
+
/*
* Flush out locally pending backend statistics
*
@@ -194,12 +253,23 @@ bool
pgstat_flush_backend(bool nowait, bits32 flags)
{
PgStat_EntryRef *entry_ref;
+ bool has_pending_data = false;
if (!pgstat_tracks_backend_bktype(MyBackendType))
return false;
- if (pg_memory_is_all_zeros(&PendingBackendStats,
- sizeof(struct PgStat_BackendPending)))
+ /* Some IO data pending? */
+ if ((flags & PGSTAT_BACKEND_FLUSH_IO) &&
+ !pg_memory_is_all_zeros(&PendingBackendStats,
+ sizeof(struct PgStat_BackendPending)))
+ has_pending_data = true;
+
+ /* Some WAL data pending? */
+ if ((flags & PGSTAT_BACKEND_FLUSH_WAL) &&
+ pgstat_backend_wal_have_pending())
+ has_pending_data = true;
+
+ if (!has_pending_data)
return false;
entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_BACKEND, InvalidOid,
@@ -211,6 +281,9 @@ pgstat_flush_backend(bool nowait, bits32 flags)
if (flags & PGSTAT_BACKEND_FLUSH_IO)
pgstat_flush_backend_entry_io(entry_ref);
+ if (flags & PGSTAT_BACKEND_FLUSH_WAL)
+ pgstat_flush_backend_entry_wal(entry_ref);
+
pgstat_unlock_entry(entry_ref);
return false;
@@ -226,7 +299,8 @@ pgstat_backend_have_pending_cb(void)
return false;
return (!pg_memory_is_all_zeros(&PendingBackendStats,
- sizeof(struct PgStat_BackendPending)));
+ sizeof(struct PgStat_BackendPending)) ||
+ pgstat_backend_wal_have_pending());
}
/*
@@ -261,6 +335,13 @@ pgstat_create_backend(ProcNumber procnum)
pgstat_unlock_entry(entry_ref);
MemSet(&PendingBackendStats, 0, sizeof(PgStat_BackendPending));
+
+ /*
+ * Initialize prevBackendWalUsage with pgWalUsage so that
+ * pgstat_backend_flush_cb() can calculate how much pgWalUsage counters
+ * are increased by subtracting prevBackendWalUsage from pgWalUsage.
+ */
+ prevBackendWalUsage = pgWalUsage;
}
/*
diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c
index 5d3da4b674e7..16a1ecb4d90d 100644
--- a/src/backend/utils/activity/pgstat_wal.c
+++ b/src/backend/utils/activity/pgstat_wal.c
@@ -52,6 +52,7 @@ pgstat_report_wal(bool force)
/* flush wal stats */
(void) pgstat_wal_flush_cb(nowait);
+ pgstat_flush_backend(nowait, PGSTAT_BACKEND_FLUSH_WAL);
/* flush IO stats */
pgstat_flush_io(nowait);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 9172e1cb9d23..662ce46cbc20 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1609,8 +1609,8 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
/*
* pg_stat_wal_build_tuple
*
- * Helper routine for pg_stat_get_wal() returning one tuple based on the
- * contents of wal_counters.
+ * Helper routine for pg_stat_get_wal() and pg_stat_get_backend_wal()
+ * returning one tuple based on the contents of wal_counters.
*/
static Datum
pg_stat_wal_build_tuple(PgStat_WalCounters wal_counters,
@@ -1659,6 +1659,28 @@ pg_stat_wal_build_tuple(PgStat_WalCounters wal_counters,
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
}
+/*
+ * Returns WAL statistics for a backend with given PID.
+ */
+Datum
+pg_stat_get_backend_wal(PG_FUNCTION_ARGS)
+{
+ int pid;
+ PgStat_Backend *backend_stats;
+ PgStat_WalCounters bktype_stats;
+
+ pid = PG_GETARG_INT32(0);
+ backend_stats = pgstat_fetch_stat_backend_by_pid(pid, NULL);
+
+ if (!backend_stats)
+ PG_RETURN_NULL();
+
+ bktype_stats = backend_stats->wal_counters;
+
+ /* save tuples with data from this PgStat_WalCounters */
+ return (pg_stat_wal_build_tuple(bktype_stats, backend_stats->stat_reset_timestamp));
+}
+
/*
* Returns statistics of WAL activity
*/
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 30d763c4aee8..f77caacc17dd 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -908,8 +908,11 @@ SELECT sessions > :db_stat_sessions FROM pg_stat_database WHERE datname = (SELEC
-- Test pg_stat_checkpointer checkpointer-related stats, together with pg_stat_wal
SELECT num_requested AS rqst_ckpts_before FROM pg_stat_checkpointer \gset
--- Test pg_stat_wal (and make a temp table so our temp schema exists)
+-- Test pg_stat_wal
SELECT wal_bytes AS wal_bytes_before FROM pg_stat_wal \gset
+-- Test pg_stat_get_backend_wal()
+SELECT wal_bytes AS backend_wal_bytes_before from pg_stat_get_backend_wal(pg_backend_pid()) \gset
+-- Make a temp table so our temp schema exists
CREATE TEMP TABLE test_stats_temp AS SELECT 17;
DROP TABLE test_stats_temp;
-- Checkpoint twice: The checkpointer reports stats after reporting completion
@@ -929,6 +932,18 @@ SELECT wal_bytes > :wal_bytes_before FROM pg_stat_wal;
t
(1 row)
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush
+--------------------------
+
+(1 row)
+
+SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid());
+ ?column?
+----------
+ t
+(1 row)
+
-- Test pg_stat_get_backend_idset() and some allied functions.
-- In particular, verify that their notion of backend ID matches
-- our temp schema index.
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 5e7ef20fef6e..c223800fd19c 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -426,9 +426,13 @@ SELECT sessions > :db_stat_sessions FROM pg_stat_database WHERE datname = (SELEC
-- Test pg_stat_checkpointer checkpointer-related stats, together with pg_stat_wal
SELECT num_requested AS rqst_ckpts_before FROM pg_stat_checkpointer \gset
--- Test pg_stat_wal (and make a temp table so our temp schema exists)
+-- Test pg_stat_wal
SELECT wal_bytes AS wal_bytes_before FROM pg_stat_wal \gset
+-- Test pg_stat_get_backend_wal()
+SELECT wal_bytes AS backend_wal_bytes_before from pg_stat_get_backend_wal(pg_backend_pid()) \gset
+
+-- Make a temp table so our temp schema exists
CREATE TEMP TABLE test_stats_temp AS SELECT 17;
DROP TABLE test_stats_temp;
@@ -441,6 +445,9 @@ CHECKPOINT;
SELECT num_requested > :rqst_ckpts_before FROM pg_stat_checkpointer;
SELECT wal_bytes > :wal_bytes_before FROM pg_stat_wal;
+SELECT pg_stat_force_next_flush();
+SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid());
+
-- Test pg_stat_get_backend_idset() and some allied functions.
-- In particular, verify that their notion of backend ID matches
-- our temp schema index.
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 16646f560e8d..b1710680705c 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4866,6 +4866,25 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para></entry>
</row>
+ <row>
+ <entry id="pg-stat-get-backend-wal" role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_stat_get_backend_wal</primary>
+ </indexterm>
+ <function>pg_stat_get_backend_wal</function> ( <type>integer</type> )
+ <returnvalue>record</returnvalue>
+ </para>
+ <para>
+ Returns WAL statistics about the backend with the specified
+ process ID. The output fields are exactly the same as the ones in the
+ <structname>pg_stat_wal</structname> view.
+ </para>
+ <para>
+ The function does not return WAL statistics for the checkpointer,
+ the background writer, the startup process and the autovacuum launcher.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
--
2.47.2
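The bookkeeping in pgstat_flush_backend_entry_wal() above boils down to accumulating the difference between a process-local cumulative counter and a snapshot taken at the previous flush. Below is a reduced sketch of that idea. Its hypothetical types stand in for WalUsage, pgWalUsage, prevBackendWalUsage and the shared wal_counters, and it writes the subtraction inline instead of calling WalUsageAccumDiff().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for WalUsage. */
typedef struct SketchWalUsage
{
	int64_t		wal_records;
	int64_t		wal_fpi;
	uint64_t	wal_bytes;
	int64_t		wal_buffers_full;
} SketchWalUsage;

/* Process-local cumulative counters, like pgWalUsage. */
static SketchWalUsage sketch_wal_usage;

/* Snapshot taken at the previous flush, like prevBackendWalUsage. */
static SketchWalUsage sketch_prev_wal_usage;

/* Accumulated counters, like the wal_counters field of PgStat_Backend. */
static SketchWalUsage sketch_shared_counters;

static void
sketch_flush_backend_wal(void)
{
	/* Same cheap test as pgstat_backend_wal_have_pending(). */
	if (sketch_wal_usage.wal_records == sketch_prev_wal_usage.wal_records)
		return;

	/* Add only what happened since the previous flush. */
#define SKETCH_ACC(fld) \
	(sketch_shared_counters.fld += sketch_wal_usage.fld - sketch_prev_wal_usage.fld)
	SKETCH_ACC(wal_records);
	SKETCH_ACC(wal_fpi);
	SKETCH_ACC(wal_bytes);
	SKETCH_ACC(wal_buffers_full);
#undef SKETCH_ACC

	/* Remember the current values for the next delta. */
	sketch_prev_wal_usage = sketch_wal_usage;
}
```

As in the patch, only wal_records decides whether anything is pending, and the snapshot is refreshed after each flush so the same WAL activity is never counted twice.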
Hi,
On Mon, Mar 10, 2025 at 04:46:53PM +0900, Michael Paquier wrote:
On Sat, Mar 08, 2025 at 07:53:04AM +0000, Bertrand Drouvot wrote:
That would not be an issue should we only access the struct
fields in the code, but that's not the case as we're making use of
pg_memory_is_all_zeros() on it.

It does not hurt to keep it as it is, honestly.
I believe that's actually worse than before. Before, padding bytes would "probably"
be set to zeros, while now that's certainly not always the case. I think that
we already removed this (see comments === 4 in [1]).
I've reviewed the last patch of the series
Thanks!
and noticed a couple of
inconsistent comments across it, and some indentation issues.
I think I ran pgindent though. Anyway, thanks for fixing those!
@@ -199,7 +258,8 @@ pgstat_flush_backend(bool nowait, bits32 flags)
return false;
if (pg_memory_is_all_zeros(&PendingBackendStats,
- sizeof(struct PgStat_BackendPending)))
+ sizeof(struct PgStat_BackendPending))
+ && !pgstat_backend_wal_have_pending())
return false;

I have one issue with pgstat_flush_backend() and the early exit check
done here. If for example we use FLUSH_WAL but there is some IO data
pending, we would lock the stats entry for nothing. We could also
return true even if there is no pending WAL data if the lock could not
be taken, which would be incorrect because there was no data to flush
to begin with. I think that this should be adjusted so that we limit
the entry locking based on the flags given as input, like in the
attached.
Yeah I agree this needs to be improved, thanks!
+ /* Some IO data pending? */
+ if ((flags & PGSTAT_BACKEND_FLUSH_IO) &&
+ !pg_memory_is_all_zeros(&PendingBackendStats,
+ sizeof(struct PgStat_BackendPending)))
+ has_pending_data = true;
If PgStat_BackendPending contains more than pending_io in the future, then
that would check for zeros over too large a memory region.
I think it's better to check for:
if (pg_memory_is_all_zeros(&PendingBackendStats.pending_io,
sizeof(struct PgStat_PendingIO)))
like in the attached. Or check on "backend_has_iostats" (if 0002 in [2] goes in).
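To illustrate the concern with a hypothetical future layout (all names below are made up for the sketch, and SketchPendingIO is far simpler than the real PgStat_PendingIO): once the pending struct grows a second member, a zero check over the whole struct reports I/O data as pending even when only the other member changed, while a check scoped to the pending_io member does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for PgStat_PendingIO. */
typedef struct SketchPendingIO
{
	int64_t		counts[4];
} SketchPendingIO;

/* Hypothetical future PgStat_BackendPending with a second member. */
typedef struct SketchBackendPending
{
	SketchPendingIO pending_io;
	int64_t		pending_other;
} SketchBackendPending;

static SketchBackendPending sketch_pending;

/* Byte-wise zero check in the spirit of pg_memory_is_all_zeros(). */
static bool
sketch_memory_is_all_zeros(const void *ptr, size_t len)
{
	const unsigned char *p = ptr;

	for (size_t i = 0; i < len; i++)
	{
		if (p[i] != 0)
			return false;
	}
	return true;
}

/* Scoped check: only the I/O member matters for an I/O flush. */
static bool
sketch_io_has_pending(void)
{
	return !sketch_memory_is_all_zeros(&sketch_pending.pending_io,
									   sizeof(SketchPendingIO));
}

/* Whole-struct check: also fires on unrelated members. */
static bool
sketch_whole_struct_has_pending(void)
{
	return !sketch_memory_is_all_zeros(&sketch_pending,
									   sizeof(SketchBackendPending));
}
```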
+ /* Some WAL data pending? */
+ if ((flags & PGSTAT_BACKEND_FLUSH_WAL) &&
+ pgstat_backend_wal_have_pending())
+ has_pending_data = true;
I think we can use "else if" here (done in the attached), as the second check is
not needed if has_pending_data is already set to true.
Those are the only two changes in the attached version compared to the previous
one.
Regards,
[1]: /messages/by-id/Z44vMD/rALy8pfVE@ip-10-97-1-34.eu-west-3.compute.internal
[2]: /messages/by-id/Z8WYf1jyy4MwOveQ@ip-10-97-1-34.eu-west-3.compute.internal
--
Bertrand Drouvot
Attachments:
v18-0001-per-backend-WAL-statistics.patch (text/x-diff)
From d93548d9b14dfba7c6ef3c8544c441ec29c12361 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Mon, 10 Mar 2025 16:45:26 +0900
Subject: [PATCH v18] per backend WAL statistics
Now that commit 9aea73fc61 added backend-level statistics to pgstats (and
per backend IO statistics) we can more easily add per backend statistics.
This commit adds per backend WAL statistics using the same layer as pg_stat_wal,
except that it is now possible to know how much WAL activity is happening in each
backend rather than an overall aggregate of all the activity. A function called
pg_stat_get_backend_wal() is added to access this data depending on the
PID of a backend.
The same limitation as in 9aea73fc61 persists, meaning that Auxiliary processes
are not included in this set of statistics.
XXX: bump catalog version
XXX: bump of stats file format not required, as backend stats do not
persist on disk.
---
doc/src/sgml/monitoring.sgml | 19 +++++
src/backend/utils/activity/pgstat_backend.c | 86 ++++++++++++++++++++-
src/backend/utils/activity/pgstat_wal.c | 1 +
src/backend/utils/adt/pgstatfuncs.c | 26 ++++++-
src/include/catalog/pg_proc.dat | 7 ++
src/include/pgstat.h | 41 +++++-----
src/include/utils/pgstat_internal.h | 3 +-
src/test/regress/expected/stats.out | 17 +++-
src/test/regress/sql/stats.sql | 9 ++-
9 files changed, 183 insertions(+), 26 deletions(-)
13.0% doc/src/sgml/
46.2% src/backend/utils/activity/
12.8% src/backend/utils/adt/
7.2% src/include/catalog/
3.7% src/include/utils/
8.4% src/test/regress/expected/
6.8% src/test/regress/sql/
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 16646f560e8..b1710680705 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4866,6 +4866,25 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</para></entry>
</row>
+ <row>
+ <entry id="pg-stat-get-backend-wal" role="func_table_entry"><para role="func_signature">
+ <indexterm>
+ <primary>pg_stat_get_backend_wal</primary>
+ </indexterm>
+ <function>pg_stat_get_backend_wal</function> ( <type>integer</type> )
+ <returnvalue>record</returnvalue>
+ </para>
+ <para>
+ Returns WAL statistics about the backend with the specified
+ process ID. The output fields are exactly the same as the ones in the
+ <structname>pg_stat_wal</structname> view.
+ </para>
+ <para>
+ The function does not return WAL statistics for the checkpointer,
+ the background writer, the startup process and the autovacuum launcher.
+ </para></entry>
+ </row>
+
<row>
<entry role="func_table_entry"><para role="func_signature">
<indexterm>
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
index 6efbb650aa8..f414a48c242 100644
--- a/src/backend/utils/activity/pgstat_backend.c
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -38,6 +38,14 @@
*/
static PgStat_BackendPending PendingBackendStats = {0};
+/*
+ * WAL usage counters saved from pgWalUsage at the previous call to
+ * pgstat_report_wal(). This is used to calculate how much WAL usage
+ * happens between pgstat_report_wal() calls, by subtracting the previous
+ * counters from the current ones.
+ */
+static WalUsage prevBackendWalUsage;
+
/*
* Utility routines to report I/O stats for backends, kept here to avoid
* exposing PendingBackendStats to the outside world.
@@ -184,6 +192,57 @@ pgstat_flush_backend_entry_io(PgStat_EntryRef *entry_ref)
MemSet(&PendingBackendStats.pending_io, 0, sizeof(PgStat_PendingIO));
}
+/*
+ * To determine whether WAL usage happened.
+ */
+static inline bool
+pgstat_backend_wal_have_pending(void)
+{
+ return pgWalUsage.wal_records != prevBackendWalUsage.wal_records;
+}
+
+/*
+ * Flush out locally pending backend WAL statistics. Locking is managed
+ * by the caller.
+ */
+static void
+pgstat_flush_backend_entry_wal(PgStat_EntryRef *entry_ref)
+{
+ PgStatShared_Backend *shbackendent;
+ PgStat_WalCounters *bktype_shstats;
+ WalUsage wal_usage_diff = {0};
+
+ /*
+ * This function can be called even if nothing at all has happened for WAL
+ * statistics. In this case, avoid unnecessarily modifying the stats
+ * entry.
+ */
+ if (!pgstat_backend_wal_have_pending())
+ return;
+
+ shbackendent = (PgStatShared_Backend *) entry_ref->shared_stats;
+ bktype_shstats = &shbackendent->stats.wal_counters;
+
+ /*
+ * Calculate how much WAL usage counters were increased by subtracting the
+ * previous counters from the current ones.
+ */
+ WalUsageAccumDiff(&wal_usage_diff, &pgWalUsage, &prevBackendWalUsage);
+
+#define WALSTAT_ACC(fld, var_to_add) \
+ (bktype_shstats->fld += var_to_add.fld)
+ WALSTAT_ACC(wal_buffers_full, wal_usage_diff);
+ WALSTAT_ACC(wal_records, wal_usage_diff);
+ WALSTAT_ACC(wal_fpi, wal_usage_diff);
+ WALSTAT_ACC(wal_bytes, wal_usage_diff);
+#undef WALSTAT_ACC
+
+ /*
+ * Save the current counters for the subsequent calculation of WAL usage.
+ */
+ prevBackendWalUsage = pgWalUsage;
+}
+
/*
* Flush out locally pending backend statistics
*
@@ -194,12 +253,22 @@ bool
pgstat_flush_backend(bool nowait, bits32 flags)
{
PgStat_EntryRef *entry_ref;
+ bool has_pending_data = false;
if (!pgstat_tracks_backend_bktype(MyBackendType))
return false;
- if (pg_memory_is_all_zeros(&PendingBackendStats,
- sizeof(struct PgStat_BackendPending)))
+ /* Some IO data pending? */
+ if ((flags & PGSTAT_BACKEND_FLUSH_IO) &&
+ !pg_memory_is_all_zeros(&PendingBackendStats.pending_io,
+ sizeof(struct PgStat_PendingIO)))
+ has_pending_data = true;
+ /* Some WAL data pending? */
+ else if ((flags & PGSTAT_BACKEND_FLUSH_WAL) &&
+ pgstat_backend_wal_have_pending())
+ has_pending_data = true;
+
+ if (!has_pending_data)
return false;
entry_ref = pgstat_get_entry_ref_locked(PGSTAT_KIND_BACKEND, InvalidOid,
@@ -211,6 +280,9 @@ pgstat_flush_backend(bool nowait, bits32 flags)
if (flags & PGSTAT_BACKEND_FLUSH_IO)
pgstat_flush_backend_entry_io(entry_ref);
+ if (flags & PGSTAT_BACKEND_FLUSH_WAL)
+ pgstat_flush_backend_entry_wal(entry_ref);
+
pgstat_unlock_entry(entry_ref);
return false;
@@ -226,7 +298,8 @@ pgstat_backend_have_pending_cb(void)
return false;
return (!pg_memory_is_all_zeros(&PendingBackendStats,
- sizeof(struct PgStat_BackendPending)));
+ sizeof(struct PgStat_BackendPending)) ||
+ pgstat_backend_wal_have_pending());
}
/*
@@ -261,6 +334,13 @@ pgstat_create_backend(ProcNumber procnum)
pgstat_unlock_entry(entry_ref);
MemSet(&PendingBackendStats, 0, sizeof(PgStat_BackendPending));
+
+ /*
+ * Initialize prevBackendWalUsage with pgWalUsage so that
+ * pgstat_backend_flush_cb() can calculate how much pgWalUsage counters
+ * are increased by subtracting prevBackendWalUsage from pgWalUsage.
+ */
+ prevBackendWalUsage = pgWalUsage;
}
/*
diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c
index 5d3da4b674e..16a1ecb4d90 100644
--- a/src/backend/utils/activity/pgstat_wal.c
+++ b/src/backend/utils/activity/pgstat_wal.c
@@ -52,6 +52,7 @@ pgstat_report_wal(bool force)
/* flush wal stats */
(void) pgstat_wal_flush_cb(nowait);
+ pgstat_flush_backend(nowait, PGSTAT_BACKEND_FLUSH_WAL);
/* flush IO stats */
pgstat_flush_io(nowait);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 9172e1cb9d2..662ce46cbc2 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1609,8 +1609,8 @@ pg_stat_get_backend_io(PG_FUNCTION_ARGS)
/*
* pg_stat_wal_build_tuple
*
- * Helper routine for pg_stat_get_wal() returning one tuple based on the
- * contents of wal_counters.
+ * Helper routine for pg_stat_get_wal() and pg_stat_get_backend_wal()
+ * returning one tuple based on the contents of wal_counters.
*/
static Datum
pg_stat_wal_build_tuple(PgStat_WalCounters wal_counters,
@@ -1659,6 +1659,28 @@ pg_stat_wal_build_tuple(PgStat_WalCounters wal_counters,
PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values, nulls)));
}
+/*
+ * Returns WAL statistics for a backend with given PID.
+ */
+Datum
+pg_stat_get_backend_wal(PG_FUNCTION_ARGS)
+{
+ int pid;
+ PgStat_Backend *backend_stats;
+ PgStat_WalCounters bktype_stats;
+
+ pid = PG_GETARG_INT32(0);
+ backend_stats = pgstat_fetch_stat_backend_by_pid(pid, NULL);
+
+ if (!backend_stats)
+ PG_RETURN_NULL();
+
+ bktype_stats = backend_stats->wal_counters;
+
+ /* save tuples with data from this PgStat_WalCounters */
+ return (pg_stat_wal_build_tuple(bktype_stats, backend_stats->stat_reset_timestamp));
+}
+
/*
* Returns statistics of WAL activity
*/
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cede992b6e2..42e427f8fe8 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5973,6 +5973,13 @@
proargmodes => '{o,o,o,o,o}',
proargnames => '{wal_records,wal_fpi,wal_bytes,wal_buffers_full,stats_reset}',
prosrc => 'pg_stat_get_wal' },
+{ oid => '8037', descr => 'statistics: backend WAL activity',
+ proname => 'pg_stat_get_backend_wal', provolatile => 'v', proparallel => 'r',
+ prorettype => 'record', proargtypes => 'int4',
+ proallargtypes => '{int4,int8,int8,numeric,int8,timestamptz}',
+ proargmodes => '{i,o,o,o,o,o}',
+ proargnames => '{backend_pid,wal_records,wal_fpi,wal_bytes,wal_buffers_full,stats_reset}',
+ prosrc => 'pg_stat_get_backend_wal' },
{ oid => '6248', descr => 'statistics: information about WAL prefetching',
proname => 'pg_stat_get_recovery_prefetch', prorows => '1', proretset => 't',
provolatile => 'v', prorettype => 'record', proargtypes => '',
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 4aad10b0b6d..def6b370ac1 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -340,24 +340,6 @@ typedef struct PgStat_IO
PgStat_BktypeIO stats[BACKEND_NUM_TYPES];
} PgStat_IO;
-typedef struct PgStat_Backend
-{
- TimestampTz stat_reset_timestamp;
- PgStat_BktypeIO io_stats;
-} PgStat_Backend;
-
-/* ---------
- * PgStat_BackendPending Non-flushed backend stats.
- * ---------
- */
-typedef struct PgStat_BackendPending
-{
- /*
- * Backend statistics store the same amount of IO data as PGSTAT_KIND_IO.
- */
- PgStat_PendingIO pending_io;
-} PgStat_BackendPending;
-
typedef struct PgStat_StatDBEntry
{
PgStat_Counter xact_commit;
@@ -500,6 +482,29 @@ typedef struct PgStat_WalStats
TimestampTz stat_reset_timestamp;
} PgStat_WalStats;
+/* -------
+ * PgStat_Backend Backend statistics
+ * -------
+ */
+typedef struct PgStat_Backend
+{
+ TimestampTz stat_reset_timestamp;
+ PgStat_BktypeIO io_stats;
+ PgStat_WalCounters wal_counters;
+} PgStat_Backend;
+
+/* ---------
+ * PgStat_BackendPending Non-flushed backend stats.
+ * ---------
+ */
+typedef struct PgStat_BackendPending
+{
+ /*
+ * Backend statistics store the same amount of IO data as PGSTAT_KIND_IO.
+ */
+ PgStat_PendingIO pending_io;
+} PgStat_BackendPending;
+
/*
* Functions in pgstat.c
*/
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 36d228e3558..d5557e6e998 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -622,7 +622,8 @@ extern void pgstat_archiver_snapshot_cb(void);
/* flags for pgstat_flush_backend() */
#define PGSTAT_BACKEND_FLUSH_IO (1 << 0) /* Flush I/O statistics */
-#define PGSTAT_BACKEND_FLUSH_ALL (PGSTAT_BACKEND_FLUSH_IO)
+#define PGSTAT_BACKEND_FLUSH_WAL (1 << 1) /* Flush WAL statistics */
+#define PGSTAT_BACKEND_FLUSH_ALL (PGSTAT_BACKEND_FLUSH_IO | PGSTAT_BACKEND_FLUSH_WAL)
extern bool pgstat_flush_backend(bool nowait, bits32 flags);
extern bool pgstat_backend_flush_cb(bool nowait);
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index 30d763c4aee..f77caacc17d 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -908,8 +908,11 @@ SELECT sessions > :db_stat_sessions FROM pg_stat_database WHERE datname = (SELEC
-- Test pg_stat_checkpointer checkpointer-related stats, together with pg_stat_wal
SELECT num_requested AS rqst_ckpts_before FROM pg_stat_checkpointer \gset
--- Test pg_stat_wal (and make a temp table so our temp schema exists)
+-- Test pg_stat_wal
SELECT wal_bytes AS wal_bytes_before FROM pg_stat_wal \gset
+-- Test pg_stat_get_backend_wal()
+SELECT wal_bytes AS backend_wal_bytes_before from pg_stat_get_backend_wal(pg_backend_pid()) \gset
+-- Make a temp table so our temp schema exists
CREATE TEMP TABLE test_stats_temp AS SELECT 17;
DROP TABLE test_stats_temp;
-- Checkpoint twice: The checkpointer reports stats after reporting completion
@@ -929,6 +932,18 @@ SELECT wal_bytes > :wal_bytes_before FROM pg_stat_wal;
t
(1 row)
+SELECT pg_stat_force_next_flush();
+ pg_stat_force_next_flush
+--------------------------
+
+(1 row)
+
+SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid());
+ ?column?
+----------
+ t
+(1 row)
+
-- Test pg_stat_get_backend_idset() and some allied functions.
-- In particular, verify that their notion of backend ID matches
-- our temp schema index.
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 5e7ef20fef6..c223800fd19 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -426,9 +426,13 @@ SELECT sessions > :db_stat_sessions FROM pg_stat_database WHERE datname = (SELEC
-- Test pg_stat_checkpointer checkpointer-related stats, together with pg_stat_wal
SELECT num_requested AS rqst_ckpts_before FROM pg_stat_checkpointer \gset
--- Test pg_stat_wal (and make a temp table so our temp schema exists)
+-- Test pg_stat_wal
SELECT wal_bytes AS wal_bytes_before FROM pg_stat_wal \gset
+-- Test pg_stat_get_backend_wal()
+SELECT wal_bytes AS backend_wal_bytes_before from pg_stat_get_backend_wal(pg_backend_pid()) \gset
+
+-- Make a temp table so our temp schema exists
CREATE TEMP TABLE test_stats_temp AS SELECT 17;
DROP TABLE test_stats_temp;
@@ -441,6 +445,9 @@ CHECKPOINT;
SELECT num_requested > :rqst_ckpts_before FROM pg_stat_checkpointer;
SELECT wal_bytes > :wal_bytes_before FROM pg_stat_wal;
+SELECT pg_stat_force_next_flush();
+SELECT wal_bytes > :backend_wal_bytes_before FROM pg_stat_get_backend_wal(pg_backend_pid());
+
-- Test pg_stat_get_backend_idset() and some allied functions.
-- In particular, verify that their notion of backend ID matches
-- our temp schema index.
--
2.34.1
Hi,
Thank you for working on this!
I just started reading the code and have a couple of questions.
I think that every time we flush IO or WAL stats, we want(?) to flush
backend stats as well, so would it make sense to move
pgstat_flush_backend() calls to inside of pgstat_flush_io() and
pgstat_wal_flush_cb()? I see that backend statistics are not collected
for some of the backend types but that is already checked in the
pgstat_flush_backend() with pgstat_tracks_backend_bktype().
Also, is there a chance that wal_bytes gets incremented without
wal_records getting incremented? I searched the code and did not find
any example of that but I just wanted to be sure. If there is a case
like that, then pgstat_backend_wal_have_pending() needs to check
wal_bytes instead of wal_records.
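Nazir's question can be checked against a simplified model. The sketch below is hypothetical: DemoWalUsage and the demo_* functions are not PostgreSQL code. It mirrors the patch's approach of diffing the current counters against a saved baseline, in the style of WalUsageAccumDiff() (dst += add - sub, per field). Like pgstat_backend_wal_have_pending(), it looks only at wal_records to decide whether anything is pending. As a result, a change in wal_bytes without a new record would go unnoticed, which is exactly the case the question raises.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for PostgreSQL's WalUsage. */
typedef struct DemoWalUsage
{
    int64_t  wal_records;
    int64_t  wal_fpi;
    uint64_t wal_bytes;
} DemoWalUsage;

/* Accumulate (add - sub) into dst, field by field, like WalUsageAccumDiff(). */
static void
demo_wal_usage_accum_diff(DemoWalUsage *dst, const DemoWalUsage *add,
                          const DemoWalUsage *sub)
{
    dst->wal_records += add->wal_records - sub->wal_records;
    dst->wal_fpi += add->wal_fpi - sub->wal_fpi;
    dst->wal_bytes += add->wal_bytes - sub->wal_bytes;
}

/* Same test as pgstat_backend_wal_have_pending(): compare record counts only. */
static bool
demo_wal_have_pending(const DemoWalUsage *cur, const DemoWalUsage *prev)
{
    return cur->wal_records != prev->wal_records;
}
```

With a baseline of {10, 1, 1000} and current counters of {13, 2, 1600}, the computed delta is 3 records, 1 FPI and 600 bytes. A hypothetical {10, 1, 1200}, where bytes moved but no record was added, would be reported as "nothing pending".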
--
Regards,
Nazir Bilal Yavuz
Microsoft
Hi,
On Mon, Mar 10, 2025 at 03:08:49PM +0300, Nazir Bilal Yavuz wrote:
Hi,
Thank you for working on this!
I just started reading the code and have a couple of questions.
Thanks for looking at it!
I think that every time we flush IO or WAL stats, we want(?) to flush
backend stats as well,
Yeah, I think that's happening anyway.
so would it make sense to move
pgstat_flush_backend() calls to inside of pgstat_flush_io() and
pgstat_wal_flush_cb()?
I don't think so, because pgstat_flush_backend() still needs to be called by the
pgstat_backend_flush_cb() (i.e. flush_static_cb) callback (I mean, I think it
makes sense to keep this callback around and to have it do "real" work).
So, for the WAL case for example, the backend WAL stats would be
flushed twice: once from the pgstat_wal_flush_cb() (i.e. flush_static_cb) callback
and once from the pgstat_backend_flush_cb() (another flush_static_cb) callback.
I think it's better to keep them separate and reason about them as distinct
types of stats (which they really are). I think we had the same kind of reasoning
while working on [1].
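The flag-based split can be modeled in isolation. This is a minimal, hypothetical sketch (none of the Demo* names exist in PostgreSQL). Its flag values copy the patch's PGSTAT_BACKEND_FLUSH_* bits. It also assumes, as in the patch's pgstat_flush_backend_entry_wal(), that the WAL flush advances its baseline snapshot every time it flushes. Under that assumption, flushing the WAL part from two different callers is redundant work rather than double counting, and each caller selects only the parts it owns.

```c
#include <assert.h>
#include <stdint.h>

/* Bit layout copied from the patch's PGSTAT_BACKEND_FLUSH_* flags. */
#define DEMO_FLUSH_IO  (1u << 0)
#define DEMO_FLUSH_WAL (1u << 1)
#define DEMO_FLUSH_ALL (DEMO_FLUSH_IO | DEMO_FLUSH_WAL)

typedef struct DemoWal
{
    int64_t  records;
    uint64_t bytes;
} DemoWal;

/* Hypothetical per-backend state; comments name what each field mimics. */
typedef struct DemoBackend
{
    DemoWal cur;        /* plays the role of pgWalUsage */
    DemoWal prev;       /* plays the role of prevBackendWalUsage */
    DemoWal shared;     /* plays the role of the shared wal_counters */
    int     io_flushes; /* how many times the IO part was flushed */
} DemoBackend;

static void
demo_flush_wal(DemoBackend *b)
{
    /* Nothing new since the last flush: leave the shared entry alone. */
    if (b->cur.records == b->prev.records)
        return;

    b->shared.records += b->cur.records - b->prev.records;
    b->shared.bytes += b->cur.bytes - b->prev.bytes;

    /* Advance the baseline so the same delta is never added twice. */
    b->prev = b->cur;
}

/* Flush only the parts selected by flags. */
static void
demo_flush_backend(DemoBackend *b, uint32_t flags)
{
    if (flags & DEMO_FLUSH_IO)
        b->io_flushes++;
    if (flags & DEMO_FLUSH_WAL)
        demo_flush_wal(b);
}
```

For example, a WAL-only flush (as from pgstat_report_wal()) followed by an "all" flush (as from the backend's flush_static_cb) leaves the shared WAL counters unchanged the second time, while the IO part is flushed once.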
I see that backend statistics are not collected
for some of the backend types but that is already checked in the
pgstat_flush_backend() with pgstat_tracks_backend_bktype().
Sorry, I don't get it. Do you have a question around that?
Also, is there a chance that wal_bytes gets incremented without
wal_records getting incremented? I searched the code and did not find
any example of that but I just wanted to be sure. If there is a case
like that, then pgstat_backend_wal_have_pending() needs to check
wal_bytes instead of wal_records.
I think that's fine. That's also how pgstat_wal_have_pending_cb() has been
refactored in 2421e9a51d2.
[1]: /messages/by-id/Z0QjeIkwC0HNI16K@ip-10-97-1-34.eu-west-3.compute.internal
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Hi,
On Mon, 10 Mar 2025 at 17:43, Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
On Mon, Mar 10, 2025 at 03:08:49PM +0300, Nazir Bilal Yavuz wrote:
I think that every time we flush IO or WAL stats, we want(?) to flush
backend stats as well,
Yeah, I think that's happening anyway.
so would it make sense to move
pgstat_flush_backend() calls to inside of pgstat_flush_io() and
pgstat_wal_flush_cb()?
I don't think so because pgstat_flush_backend() still needs to be called by the
pgstat_backend_flush_cb() (i.e flush_static_cb) callback (I mean I think this
makes sense to keep this callback around and that it does "really" something).
So for example, for the WAL case, that would mean the backend WAL stats would be
flushed twice: one time from pgstat_wal_flush_cb() (i.e flush_static_cb) callback
and one time from the pgstat_backend_flush_cb() (another flush_static_cb) callback.
I think it's better to keep them separate and reason as they are distinct
types of stats (which they really are). I think we had the same kind of reasoning
while working on [1].
I got it, that makes sense. Thanks for the explanation.
I see that backend statistics are not collected
for some of the backend types but that is already checked in the
pgstat_flush_backend() with pgstat_tracks_backend_bktype().
Sorry, I don't get it. Do you have a question around that?
Sorry for the confusion. I did not think of the explanation that you
did above. I was thinking that we do not want to call
pgstat_flush_backend(.., PGSTAT_BACKEND_FLUSH_IO) for the (let's say)
checkpointer as its backend statistics are not collected and that is
the reason why we do not want to put pgstat_flush_backend() inside of
pgstat_flush_io(). Your explanation made it clear now, no other
questions.
--
Regards,
Nazir Bilal Yavuz
Microsoft
On Mon, Mar 10, 2025 at 11:52:26AM +0000, Bertrand Drouvot wrote:
Hi,
On Mon, Mar 10, 2025 at 04:46:53PM +0900, Michael Paquier wrote:
On Sat, Mar 08, 2025 at 07:53:04AM +0000, Bertrand Drouvot wrote:
That would not be an issue should we only access the struct
fields in the code, but that's not the case as we're making use of
pg_memory_is_all_zeros() on it.
It does not hurt to keep it as it is, honestly.
I believe that's worse than before actually. Before padding bytes would "probably"
be set to zeros while now it's certainly not always the case. I think that
we already removed this (see comments === 4 in [1]).
We still apply the memset(), and the initialization is actually the
same.
I think it's better to check for:
if (pg_memory_is_all_zeros(&PendingBackendStats.pending_io,
sizeof(struct PgStat_PendingIO)))
like in the attached. Or check on "backend_has_iostats" (if 0002 in [2] goes in).
Yes, restricting this check to apply on the PgStat_PendingIO makes
sense.
I think we can use "else if" here (done in the attached) as it's not needed if
has_pending_data is already set to true.
Still the blocks with the comments become a bit weird if formulated
this way. Kept this one the same as v17.
And I guess that we're OK here, so applied. That should be the last
one.
--
Michael
Hi,
On Tue, Mar 11, 2025 at 09:06:27AM +0900, Michael Paquier wrote:
On Mon, Mar 10, 2025 at 11:52:26AM +0000, Bertrand Drouvot wrote:
Hi,
On Mon, Mar 10, 2025 at 04:46:53PM +0900, Michael Paquier wrote:
On Sat, Mar 08, 2025 at 07:53:04AM +0000, Bertrand Drouvot wrote:
That would not be an issue should we only access the struct
fields in the code, but that's not the case as we're making use of
pg_memory_is_all_zeros() on it.
It does not hurt to keep it as it is, honestly.
I believe that's worse than before actually. Before padding bytes would "probably"
be set to zeros while now it's certainly not always the case. I think that
we already removed this (see comments === 4 in [1]).
We still apply the memset(), and the initialization is actually the
same.
Yeah, currently there are no issues: there is no padding in the struct and memset()
is done.
That said, memset() is done only if pgstat_tracks_backend_bktype() returns
true (i.e. if pgstat_create_backend() is called).
That means that if, in the future, the struct is modified in such a way that
padding is added, then we could end up with non-zero padding bytes for the
backends for which pgstat_tracks_backend_bktype() returns false.
I think that could lead to racy conditions (even if, for the moment, I think that
all is fine as the other pgstat_tracks_backend_bktype() calls should protect us).
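The padding concern can be illustrated with a hypothetical struct that does have padding. In this sketch, the Demo* types and demo_memory_is_all_zeros() are illustrative stand-ins and not PostgreSQL code. The extra `flag` member exists only to force padding on common ABIs, which models the "struct modified in the future" case. After member-wise stores, the contents of padding bytes are not guaranteed, so a byte-wise check over the whole outer struct is unreliable. A check restricted to a padding-free sub-struct, which is the approach applied for pending_io, does not depend on padding at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical byte-wise check, similar in spirit to pg_memory_is_all_zeros(). */
static bool
demo_memory_is_all_zeros(const void *ptr, size_t len)
{
    const unsigned char *p = ptr;

    for (size_t i = 0; i < len; i++)
    {
        if (p[i] != 0)
            return false;
    }
    return true;
}

/* Hypothetical pending-IO block: an array of int64_t, so no internal padding. */
typedef struct DemoPendingIO
{
    int64_t counts[4];
} DemoPendingIO;

/* Hypothetical outer struct: 'flag' typically forces padding before pending_io. */
typedef struct DemoBackendPending
{
    char          flag;
    DemoPendingIO pending_io;
} DemoBackendPending;

/* Check only the padding-free sub-struct, as is now done for pending_io. */
static bool
demo_io_pending(const DemoBackendPending *p)
{
    return !demo_memory_is_all_zeros(&p->pending_io, sizeof(p->pending_io));
}

/* Zero every byte, padding included, the way MemSet() is used on PendingBackendStats. */
static void
demo_reset(DemoBackendPending *p)
{
    memset(p, 0, sizeof(*p));
}
```

Filling the struct with garbage, then zeroing only `flag` and `pending_io`, still yields "nothing pending" from demo_io_pending(), whatever the padding holds.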
And I guess that we're OK here,
Yup.
so applied.
Thanks!
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Michael Paquier <michael@paquier.xyz> writes:
And I guess that we're OK here, so applied. That should be the last
one.
Quite a few buildfarm members are not happy about the initialization
that 9a8dd2c5a added to PendingBackendStats. For instance [1] (https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=arowana&dt=2025-03-11%2004%3A59%3A16&stg=build):
gcc -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -O2 -I. -I. -I../../../../src/include -D_GNU_SOURCE -I/usr/include/libxml2 -c -o pgstat_backend.o pgstat_backend.c
pgstat_backend.c:39:1: warning: missing braces around initializer [-Wmissing-braces]
static PgStat_BackendPending PendingBackendStats = {0};
^
pgstat_backend.c:39:1: warning: (near initialization for ‘PendingBackendStats.pending_io’) [-Wmissing-braces]
I guess that more than one level of braces is needed for this to
be fully correct? Similar from ayu, batfish, boa, buri, demoiselle,
dhole, rhinoceros, shelduck, siskin.
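The brace question, and the alternative that was eventually applied, can be illustrated with a hypothetical nested struct. The Demo* names are not PostgreSQL's, and the two-dimensional array only loosely echoes the shape of the real counters. One option is to brace every aggregate level of an explicit zero initializer. The other is to drop the initializer entirely, which relies on the C rule that an object with static storage duration and no explicit initializer is zero-initialized, padding included (C11 6.7.9p10).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical nested layout: struct -> struct -> 2-D array. */
typedef struct DemoPendingIO
{
    int64_t counts[2][3];
} DemoPendingIO;

typedef struct DemoBackendPending
{
    DemoPendingIO pending_io;
} DemoBackendPending;

/*
 * A flat "= {0}" is valid C, but it is what some GCC versions flagged with
 * -Wmissing-braces. Bracing each level (outer struct, pending_io, counts,
 * counts[0]) spells out the same zero initialization.
 */
static DemoBackendPending pending_braced = {{{{0}}}};

/* No initializer: static storage duration alone guarantees all-zero bytes. */
static DemoBackendPending pending_no_init;

static bool
demo_all_zeros(const void *ptr, size_t len)
{
    const unsigned char *p = ptr;

    for (size_t i = 0; i < len; i++)
    {
        if (p[i] != 0)
            return false;
    }
    return true;
}
```

Both variants start out with every counter at zero. The uninitialized one also gives a byte-level guarantee for any padding a future layout might add, which is the argument for the approach that was applied.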
regards, tom lane
Hi,
On Tue, Mar 11, 2025 at 11:14:24PM -0400, Tom Lane wrote:
Michael Paquier <michael@paquier.xyz> writes:
And I guess that we're OK here, so applied. That should be the last
one.
Quite a few buildfarm members are not happy about the initialization
that 9a8dd2c5a added to PendingBackendStats. For instance [1]:
gcc -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -O2 -I. -I. -I../../../../src/include -D_GNU_SOURCE -I/usr/include/libxml2 -c -o pgstat_backend.o pgstat_backend.c
pgstat_backend.c:39:1: warning: missing braces around initializer [-Wmissing-braces]
static PgStat_BackendPending PendingBackendStats = {0};
^
pgstat_backend.c:39:1: warning: (near initialization for ‘PendingBackendStats.pending_io’) [-Wmissing-braces]
I guess that more than one level of braces is needed for this to
be fully correct?
Thanks for the report! I think that it's better to remove the PendingBackendStats
initializer (instead of adding extra braces). The reason is that I'm concerned
about padding bytes (that could be added to the struct in the future) not being
zeroed (see [1]).
Done that way in the attached.
[1]: /messages/by-id/Z8/W73+HVo+/pKHZ@ip-10-97-1-34.eu-west-3.compute.internal
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v1-0001-Remove-the-PendingBackendStats-initializer.patch
From d1178e65b5070f539677b1134224d6ca81fd0bbf Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Wed, 12 Mar 2025 05:14:21 +0000
Subject: [PATCH v1] Remove the PendingBackendStats initializer
9a8dd2c5a6d added an initializer to PendingBackendStats, but some GCC versions
report a -Wmissing-braces warning for it. Instead of adding extra braces,
remove the initializer: a static variable without an explicit initializer is
zero-initialized, padding bytes included.
Per buildfarm members ayu, batfish, boa, buri, demoiselle, dhole, rhinoceros,
shelduck and siskin.
---
src/backend/utils/activity/pgstat_backend.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
100.0% src/backend/utils/activity/
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
index da4c7edd772..a8cb54a7732 100644
--- a/src/backend/utils/activity/pgstat_backend.c
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -36,7 +36,7 @@
* reported within critical sections so we use static memory in order to avoid
* memory allocation.
*/
-static PgStat_BackendPending PendingBackendStats = {0};
+static PgStat_BackendPending PendingBackendStats;
/*
* WAL usage counters saved from pgWalUsage at the previous call to
--
2.34.1
On Wed, Mar 12, 2025 at 05:37:11AM +0000, Bertrand Drouvot wrote:
Thanks for the report! I think that it's better to remove the PendingBackendStats
initializer (instead of adding extra braces). The reason is that I'm concerned
about padding bytes (that could be added to the struct in the future) not being
zeroed (see [1]).
Okay. I am going to remove this initialization in a couple of minutes
as that's more annoying than I thought.
--
Michael
Hello Michael,
11.03.2025 02:06, Michael Paquier wrote:
And I guess that we're OK here, so applied. That should be the last
one.
Please try the following query:
BEGIN;
SET LOCAL stats_fetch_consistency = snapshot;
SELECT * FROM pg_stat_get_backend_wal(pg_backend_pid());
with sanitizers (or under Valgrind). When I run it, I get:
2025-03-28 18:38:08.259 UTC [3415399] LOG: statement: SELECT * FROM pg_stat_get_backend_wal(pg_backend_pid());
=================================================================
==3415399==ERROR: AddressSanitizer: heap-use-after-free on address 0x53100003c83c at pc 0x556e3d2d9967 bp 0x7ffda3cd2350
sp 0x7ffda3cd2340
READ of size 4 at 0x53100003c83c thread T0
#0 0x556e3d2d9966 in pgstat_fetch_stat_backend_by_pid .../src/backend/utils/activity/pgstat_backend.c:136
#1 0x556e3d53b671 in pg_stat_get_backend_wal .../src/backend/utils/adt/pgstatfuncs.c:1673
#2 0x556e3cb14045 in ExecMakeTableFunctionResult .../src/backend/executor/execSRF.c:234
#3 0x556e3cb6c0fd in FunctionNext .../src/backend/executor/nodeFunctionscan.c:94
#4 0x556e3cb171d2 in ExecScanFetch ../../../src/include/executor/execScan.h:126
#5 0x556e3cb171d2 in ExecScanExtended ../../../src/include/executor/execScan.h:170
#6 0x556e3cb171d2 in ExecScan .../src/backend/executor/execScan.c:59
#7 0x556e3cb6bbf7 in ExecFunctionScan .../src/backend/executor/nodeFunctionscan.c:269
#8 0x556e3cb0aba9 in ExecProcNodeFirst .../src/backend/executor/execProcnode.c:469
...
Reproduced starting from 76def4cdd.
Best regards,
Alexander Lakhin
Neon (https://neon.tech)
On Fri, Mar 28, 2025 at 09:00:00PM +0200, Alexander Lakhin wrote:
Please try the following query:
BEGIN;
SET LOCAL stats_fetch_consistency = snapshot;
SELECT * FROM pg_stat_get_backend_wal(pg_backend_pid());
with sanitizers (or under Valgrind). When I run it, I get:
2025-03-28 18:38:08.259 UTC [3415399] LOG: statement: SELECT * FROM pg_stat_get_backend_wal(pg_backend_pid());
=================================================================
==3415399==ERROR: AddressSanitizer: heap-use-after-free on address
0x53100003c83c at pc 0x556e3d2d9967 bp 0x7ffda3cd2350 sp 0x7ffda3cd2340
READ of size 4 at 0x53100003c83c thread T0
#0 0x556e3d2d9966 in pgstat_fetch_stat_backend_by_pid .../src/backend/utils/activity/pgstat_backend.c:136
#1 0x556e3d53b671 in pg_stat_get_backend_wal .../src/backend/utils/adt/pgstatfuncs.c:1673
#2 0x556e3cb14045 in ExecMakeTableFunctionResult .../src/backend/executor/execSRF.c:234
#3 0x556e3cb6c0fd in FunctionNext .../src/backend/executor/nodeFunctionscan.c:94
#4 0x556e3cb171d2 in ExecScanFetch ../../../src/include/executor/execScan.h:126
#5 0x556e3cb171d2 in ExecScanExtended ../../../src/include/executor/execScan.h:170
#6 0x556e3cb171d2 in ExecScan .../src/backend/executor/execScan.c:59
#7 0x556e3cb6bbf7 in ExecFunctionScan .../src/backend/executor/nodeFunctionscan.c:269
#8 0x556e3cb0aba9 in ExecProcNodeFirst .../src/backend/executor/execProcnode.c:469
...
Reproduced starting from 76def4cdd.
Thanks for the report. I have added an open item to not lose track of
this issue, and will get back to it when I can.
--
Michael
Hi,
On Sat, Mar 29, 2025 at 07:14:16AM +0900, Michael Paquier wrote:
On Fri, Mar 28, 2025 at 09:00:00PM +0200, Alexander Lakhin wrote:
Please try the following query:
BEGIN;
SET LOCAL stats_fetch_consistency = snapshot;
SELECT * FROM pg_stat_get_backend_wal(pg_backend_pid());
Thanks for the report! I'm able to reproduce it on my side. The issue can
also be triggered with pg_stat_get_backend_io().
The issue is that in pgstat_fetch_stat_backend_by_pid() (and with
stats_fetch_consistency set to snapshot) a call to
pgstat_clear_backend_activity_snapshot() is done:
#0 __memset_evex_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:250
#1 0x0000000001833bf2 in wipe_mem (ptr=0x632000018800, size=80800) at ../../../../src/include/utils/memdebug.h:42
#2 0x0000000001834c51 in AllocSetReset (context=0x619000003c80) at aset.c:586
#3 0x000000000184f32d in MemoryContextResetOnly (context=0x619000003c80) at mcxt.c:419
#4 0x0000000001834ede in AllocSetDelete (context=0x619000003c80) at aset.c:636
#5 0x000000000184f79b in MemoryContextDeleteOnly (context=0x619000003c80) at mcxt.c:528
#6 0x000000000184f5a9 in MemoryContextDelete (context=0x619000003c80) at mcxt.c:482
#7 0x0000000001361e84 in pgstat_clear_backend_activity_snapshot () at backend_status.c:541
#8 0x0000000001367f08 in pgstat_clear_snapshot () at pgstat.c:943
#9 0x0000000001368ac3 in pgstat_prep_snapshot () at pgstat.c:1121
#10 0x00000000013680b9 in pgstat_fetch_entry (kind=6, dboid=0, objid=0) at pgstat.c:961
#11 0x000000000136dd05 in pgstat_fetch_stat_backend (procNumber=0) at pgstat_backend.c:94
#12 0x000000000136de7d in pgstat_fetch_stat_backend_by_pid (pid=3294022, bktype=0x0) at pgstat_backend.c:136
*before* we check for "beentry->st_procpid != pid".
I think we can simply move the pgstat_fetch_stat_backend() call to the end
of pgstat_fetch_stat_backend_by_pid(), like in the attached. With this in place
the issue is fixed on my side.
Thoughts?
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v1-0001-Fix-heap-use-after-free-in-pgstat_fetch_stat_back.patch
From 1605f513ad691b463baacc00e3c305655525ea07 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 31 Mar 2025 07:02:34 +0000
Subject: [PATCH v1] Fix heap-use-after-free in
pgstat_fetch_stat_backend_by_pid()
With stats_fetch_consistency set to snapshot, the beentry is reset during
the pgstat_fetch_stat_backend() call, so move this call to the end of
pgstat_fetch_stat_backend_by_pid().
Reported-by: Alexander Lakhin <exclusion@gmail.com>
---
src/backend/utils/activity/pgstat_backend.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
100.0% src/backend/utils/activity/
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
index 187c5c76e1e..ec95c302af8 100644
--- a/src/backend/utils/activity/pgstat_backend.c
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -133,10 +133,6 @@ pgstat_fetch_stat_backend_by_pid(int pid, BackendType *bktype)
if (!pgstat_tracks_backend_bktype(beentry->st_backendType))
return NULL;
- backend_stats = pgstat_fetch_stat_backend(procNumber);
- if (!backend_stats)
- return NULL;
-
/* if PID does not match, leave */
if (beentry->st_procpid != pid)
return NULL;
@@ -144,6 +140,10 @@ pgstat_fetch_stat_backend_by_pid(int pid, BackendType *bktype)
if (bktype)
*bktype = beentry->st_backendType;
+ backend_stats = pgstat_fetch_stat_backend(procNumber);
+ if (!backend_stats)
+ return NULL;
+
return backend_stats;
}
--
2.34.1
On Mar 31, 2025, at 16:42, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote:
I think we can simply move the pgstat_fetch_stat_backend() call at the end
of pgstat_fetch_stat_backend_by_pid(), like in the attached. With this in place
the issue is fixed on my side.
Thanks for the patch, I’ll check all that next week.
--
Michael
On Mon, Mar 31, 2025 at 07:42:19AM +0000, Bertrand Drouvot wrote:
I think we can simply move the pgstat_fetch_stat_backend() call at the end
of pgstat_fetch_stat_backend_by_pid(), like in the attached. With this in place
the issue is fixed on my side.
Thoughts?
Confirmed. I agree that it is simpler to move all the accesses to
beentry before attempting to retrieve the pgstats entry.
One thing that itched me a bit in the patch is that we would set
bktype even if we don't have a pgstats entry. The two callers of
pgstat_fetch_stat_backend_by_pid() return tuples full of NULLs and
zeros in this case, discarding the backend type automatically, but
let's keep the API consistent and set the value to B_INVALID if
pgstat_fetch_stat_backend() returns NULL.
I have added a comment warning about not accessing beentry when
fetching the backend pgstats entry, and applied the result. Thanks
for the report, Alexander, and for the patch, Bertrand.
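The ordering rule applied here can be modeled outside PostgreSQL. In this hypothetical sketch, demo_fetch_stats() stands in for pgstat_fetch_stat_backend() in snapshot mode. Like the pgstat_clear_backend_activity_snapshot() frame in Bertrand's backtrace, it frees the activity snapshot before returning. Every beentry access is therefore completed before that call. The backend type is reset to an invalid value when no stats entry is found, mirroring the B_INVALID behavior described above. The constants, types and the fixed PID are all illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_B_INVALID 0    /* hypothetical stand-in for B_INVALID */
#define DEMO_B_BACKEND 1    /* hypothetical stand-in for a regular backend type */

typedef struct DemoBeEntry
{
    int procpid;
    int backend_type;
} DemoBeEntry;

typedef struct DemoStats
{
    long wal_records;
} DemoStats;

static DemoBeEntry *activity_snapshot = NULL;  /* local activity snapshot */
static DemoStats    stats_snapshot = {42};     /* stats snapshot */
static int          demo_stats_available = 1;

/* Builds the activity snapshot on demand; a fixed PID keeps the sketch small. */
static DemoBeEntry *
demo_get_beentry(void)
{
    if (activity_snapshot == NULL)
    {
        activity_snapshot = malloc(sizeof(DemoBeEntry));
        if (activity_snapshot == NULL)
            return NULL;
        activity_snapshot->procpid = 1234;
        activity_snapshot->backend_type = DEMO_B_BACKEND;
    }
    return activity_snapshot;
}

/* Like the snapshot path in the backtrace: the activity snapshot is freed first. */
static DemoStats *
demo_fetch_stats(void)
{
    free(activity_snapshot);
    activity_snapshot = NULL;
    return demo_stats_available ? &stats_snapshot : NULL;
}

static DemoStats *
demo_fetch_by_pid(int pid, int *bktype)
{
    DemoBeEntry *beentry = demo_get_beentry();
    DemoStats  *stats;

    /* All beentry accesses happen here, before the stats fetch. */
    if (beentry == NULL || beentry->procpid != pid)
        return NULL;

    if (bktype)
        *bktype = beentry->backend_type;

    /* beentry may point to freed memory after this call: do not touch it. */
    stats = demo_fetch_stats();
    if (stats == NULL && bktype)
        *bktype = DEMO_B_INVALID;

    return stats;
}
```

A lookup for the matching PID returns the stats and the backend type. A mismatched PID returns NULL before any stats fetch. A missing stats entry returns NULL with the type reset to the invalid value.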
--
Michael