Custom pgstat support performance regression for simple queries

Started by Andres Freund6 months ago14 messages
#1Andres Freund
andres@anarazel.de

Hi,

I wanted to run some performance tests for [1]/messages/by-id/aGKSzFlpQWSh/+2w@ip-10-97-1-34.eu-west-3.compute.internal and looked at the profile of a
workload with a lot of short queries. And was rather surprised to see
pgstat_report_stat() to be the top entry - that certainly didn't use to be the
case.

For the, obviously rather extreme, workload of a pgbench session just running
SELECT 1;
I see about a 6% regression from 17.

That seems too high, even for such an absurd workload.

It's one thing to have a loop over all potential stats kinds when we know
there are pending stats, that's not that frequent. But fc415edf8ca8 changed a
test consisting out of 3 checks of a variable and one direct function call to
a loop over an array with 256 elements, with a number of indirect function
calls - all of that happens *before* pgstat_report_stat() decides that there
are no stats to report.

It seems rather unsurprising that that causes a slowdown.

The pre-check is there to:
/* Don't expend a clock check if nothing to do */

but you made it way more expensive than a clock check would have been (not
counting old vmware installs or such, where clock checks take ages).

If I change the loop count to only be the builtin stats kinds, the overhead
goes away almost completely.

Greetings,

Andres Freund

[1]: /messages/by-id/aGKSzFlpQWSh/+2w@ip-10-97-1-34.eu-west-3.compute.internal

#2Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#1)
Re: Custom pgstat support performance regression for simple queries

On Tue, Jul 22, 2025 at 10:57:06AM -0400, Andres Freund wrote:

It seems rather unsurprising that that causes a slowdown.

The pre-check is there to:
/* Don't expend a clock check if nothing to do */

but you made it way more expensive than a clock check would have been (not
counting old vmware installs or such, where clock checks take ages).

If I change the loop count to only be the builtin stats kinds, the overhead
goes away almost completely.

Noted. I was wondering originally if the threshold for the builtin
and custom kinds should be lowered, as well. Perhaps that's just too
optimistic, so we can first lower things. Say PGSTAT_KIND_CUSTOM_MIN
to 32 and PGSTAT_KIND_MAX to 48 or so to begin with? I don't see a
strict policy here, but these would make the whole cheaper to begin
with, with enough margin for any new stats kinds.

Then, about the loop used here, I'd be OK to keep the past shortcuts
for the builtin fixed-sized stats kinds with the requirement that new
builtin stats kinds need to hack into pgstat_report_stat() themselves
on efficiency grounds, combined with a second loop for custom kinds,
taken only if pgstat_register_kind() has been used by tracking if
that's the case in a flag. Do you have a specific preference here?
--
Michael

#3Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#2)
Re: Custom pgstat support performance regression for simple queries

Hi,

On Wed, Jul 23, 2025 at 09:54:12AM +0900, Michael Paquier wrote:

Then, about the loop used here, I'd be OK to keep the past shortcuts
for the builtin fixed-sized stats kinds with the requirement that new
builtin stats kinds need to hack into pgstat_report_stat() themselves
on efficiency grounds

Maybe we could use a flag, say:

#define PGSTAT_PENDING_IO (1 << 0)
#define PGSTAT_PENDING_WAL (1 << 1)
#define PGSTAT_PENDING_SLRU (1 << 2)

and check for a pgstat_pending_mask in pgstat_report_stat() instead?

They would need to set pgstat_pending_mask accordingly when they flush, have
pending stats though.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#4Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#3)
Re: Custom pgstat support performance regression for simple queries

On Jul 23, 2025, at 16:33, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote:

Maybe we could use a flag, say:

#define PGSTAT_PENDING_IO (1 << 0)
#define PGSTAT_PENDING_WAL (1 << 1)
#define PGSTAT_PENDING_SLRU (1 << 2)

and check for a pgstat_pending_mask in pgstat_report_stat() instead?

They would need to set pgstat_pending_mask accordingly when they flush, have
pending stats though.

The point is just to say to the report path to move further if at least one of the fixed stats kinds has something to flush, then let the loop do its work across all the stats kinds if they have a flush callback, so we don’t really need to mix multiple numbers; we could just have a single boolean flag that any fixed-sized stats kinds can set to let the reporting know that some activity has happened. This would mean that custom fixed-sized kinds would need to interact with this flag as well, but that makes the fast-exit path dirty cheap with or without custom stats kinds.
--
Michael

#5Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#4)
Re: Custom pgstat support performance regression for simple queries

On Wed, Jul 23, 2025 at 05:09:54PM +0900, Michael Paquier wrote:

On Jul 23, 2025, at 16:33, Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote:

Maybe we could use a flag, say:

#define PGSTAT_PENDING_IO (1 << 0)
#define PGSTAT_PENDING_WAL (1 << 1)
#define PGSTAT_PENDING_SLRU (1 << 2)

and check for a pgstat_pending_mask in pgstat_report_stat() instead?

They would need to set pgstat_pending_mask accordingly when they flush, have
pending stats though.

The point is just to say to the report path to move further if at least one of
the fixed stats kinds has something to flush, then let the loop do its work
across all the stats kinds if they have a flush callback,

Right.

so we don’t really need to mix multiple numbers; we could just have a single
boolean flag that any fixed-sized stats kinds can set to let the reporting know
that some activity has happened.

That works to say "there are pending stats" but not well to say "there are no
pending stats".

Indeed, with a single boolean flag, then how could a stat say that it has nothing
pending anymore (when flushing) without saying "all the stats have nothing
pending" (while some may still have pending stats)?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#6Andres Freund
andres@anarazel.de
In reply to: Bertrand Drouvot (#5)
Re: Custom pgstat support performance regression for simple queries

Hi,

On 2025-07-23 10:23:53 +0000, Bertrand Drouvot wrote:

On Wed, Jul 23, 2025 at 05:09:54PM +0900, Michael Paquier wrote:

so we don’t really need to mix multiple numbers; we could just have a single
boolean flag that any fixed-sized stats kinds can set to let the reporting know
that some activity has happened.

That works to say "there are pending stats" but not well to say "there are no
pending stats".

Indeed, with a single boolean flag, then how could a stat say that it has nothing
pending anymore (when flushing) without saying "all the stats have nothing
pending" (while some may still have pending stats)?

I don't think that's a problem - reset that global flag after checking it at
the start of pgstat_report_stat() and set it to true if partial_flush is true
at the end of pgstat_report_stat().

Greetings,

Andres Freund

#7Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#2)
Re: Custom pgstat support performance regression for simple queries

Hi,

On 2025-07-23 09:54:12 +0900, Michael Paquier wrote:

On Tue, Jul 22, 2025 at 10:57:06AM -0400, Andres Freund wrote:

It seems rather unsurprising that that causes a slowdown.

The pre-check is there to:
/* Don't expend a clock check if nothing to do */

but you made it way more expensive than a clock check would have been (not
counting old vmware installs or such, where clock checks take ages).

If I change the loop count to only be the builtin stats kinds, the overhead
goes away almost completely.

Noted. I was wondering originally if the threshold for the builtin
and custom kinds should be lowered, as well. Perhaps that's just too
optimistic, so we can first lower things. Say PGSTAT_KIND_CUSTOM_MIN
to 32 and PGSTAT_KIND_MAX to 48 or so to begin with? I don't see a
strict policy here, but these would make the whole cheaper to begin
with, with enough margin for any new stats kinds.

Yes, 256 seems pointlessly large.

Then, about the loop used here, I'd be OK to keep the past shortcuts
for the builtin fixed-sized stats kinds with the requirement that new
builtin stats kinds need to hack into pgstat_report_stat() themselves
on efficiency grounds, combined with a second loop for custom kinds,
taken only if pgstat_register_kind() has been used by tracking if
that's the case in a flag. Do you have a specific preference here?

I think the downstream approach of having a flag that says if there are *any*
pending stats is better.

Greetings,

Andres Freund

#8Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#6)
Re: Custom pgstat support performance regression for simple queries

On Wed, Jul 23, 2025 at 11:59:11AM -0400, Andres Freund wrote:

On 2025-07-23 10:23:53 +0000, Bertrand Drouvot wrote:

Indeed, with a single boolean flag, then how could a stat say that it has nothing
pending anymore (when flushing) without saying "all the stats have nothing
pending" (while some may still have pending stats)?

I don't think that's a problem - reset that global flag after checking it at
the start of pgstat_report_stat() and set it to true if partial_flush is true
at the end of pgstat_report_stat().

That's one way of doing it. While looking at the code, I tend to
think that it is actually simpler to reset it once at the bottom of
pgstat_report_fixed, not touching it at all if we are in a
partial_flush state because we know that we will need to do one report
later on anyway, be it just for the stats in the dshash. If we do
that, we can also skip entirely the flush_static_cb calls if
pgstat_report_fixed is not set while doing the reports.
--
Michael

#9Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#7)
2 attachment(s)
Re: Custom pgstat support performance regression for simple queries

On Wed, Jul 23, 2025 at 12:00:55PM -0400, Andres Freund wrote:

On 2025-07-23 09:54:12 +0900, Michael Paquier wrote:

Noted. I was wondering originally if the threshold for the builtin
and custom kinds should be lowered, as well. Perhaps that's just too
optimistic, so we can first lower things. Say PGSTAT_KIND_CUSTOM_MIN
to 32 and PGSTAT_KIND_MAX to 48 or so to begin with? I don't see a
strict policy here, but these would make the whole cheaper to begin
with, with enough margin for any new stats kinds.

Yes, 256 seems pointlessly large.

I still cannot put my finger on what a good number should be, but I've
settled to something that I think should be good enough for an initial
release:
PGSTAT_KIND_CUSTOM_MIN 128 -> 24
PGSTAT_KIND_MAX 256 -> 32

These numbers mean that we have enough room for 7 more builtins kinds,
same for custom kinds. Even if this is an issue, this could always be
moved up at some point even across major releases, even if that would
mean for extension developers to switch to a different ID.

Then, about the loop used here, I'd be OK to keep the past shortcuts
for the builtin fixed-sized stats kinds with the requirement that new
builtin stats kinds need to hack into pgstat_report_stat() themselves
on efficiency grounds, combined with a second loop for custom kinds,
taken only if pgstat_register_kind() has been used by tracking if
that's the case in a flag. Do you have a specific preference here?

I think the downstream approach of having a flag that says if there are *any*
pending stats is better.

Works for me. I am finishing with the attached, meaning that
workloads that trigger no stats can take the fast-exit path with a
single boolean check. What do you think?
--
Michael

Attachments:

0001-Lower-bounds-of-pgstats-kinds.patchtext/x-diff; charset=us-asciiDownload
From 05ded16fe6ffd87835ff29b0136e46bb76a81687 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Thu, 24 Jul 2025 08:18:06 +0900
Subject: [PATCH 1/2] Lower bounds of pgstats kinds

This changes stats kinds to have the following bounds, making their
handling in core cheaper by default:
- PGSTAT_KIND_CUSTOM_MIN 128 -> 24
- PGSTAT_KIND_MAX 256 -> 32

This should be enough to leave some room for the following years for
built-in and custom stats kinds.  At least that should be enough to
start with this facility for extension developers.
---
 src/include/utils/pgstat_kind.h                           | 6 +++---
 src/test/modules/injection_points/injection_stats.c       | 2 +-
 src/test/modules/injection_points/injection_stats_fixed.c | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/include/utils/pgstat_kind.h b/src/include/utils/pgstat_kind.h
index f44169fd5a3c..eb5f0b3ae6db 100644
--- a/src/include/utils/pgstat_kind.h
+++ b/src/include/utils/pgstat_kind.h
@@ -18,7 +18,7 @@
 
 /* Range of IDs allowed, for built-in and custom kinds */
 #define PGSTAT_KIND_MIN	1		/* Minimum ID allowed */
-#define PGSTAT_KIND_MAX	256		/* Maximum ID allowed */
+#define PGSTAT_KIND_MAX	32		/* Maximum ID allowed */
 
 /* use 0 for INVALID, to catch zero-initialized data */
 #define PGSTAT_KIND_INVALID 0
@@ -46,7 +46,7 @@
 /* Custom stats kinds */
 
 /* Range of IDs allowed for custom stats kinds */
-#define PGSTAT_KIND_CUSTOM_MIN	128
+#define PGSTAT_KIND_CUSTOM_MIN	24
 #define PGSTAT_KIND_CUSTOM_MAX	PGSTAT_KIND_MAX
 #define PGSTAT_KIND_CUSTOM_SIZE	(PGSTAT_KIND_CUSTOM_MAX - PGSTAT_KIND_CUSTOM_MIN + 1)
 
@@ -55,7 +55,7 @@
  * development and have not reserved their own unique kind ID yet. See:
  * https://wiki.postgresql.org/wiki/CustomCumulativeStats
  */
-#define PGSTAT_KIND_EXPERIMENTAL	128
+#define PGSTAT_KIND_EXPERIMENTAL	24
 
 static inline bool
 pgstat_is_kind_builtin(PgStat_Kind kind)
diff --git a/src/test/modules/injection_points/injection_stats.c b/src/test/modules/injection_points/injection_stats.c
index 14903c629e0d..e3947b23ba57 100644
--- a/src/test/modules/injection_points/injection_stats.c
+++ b/src/test/modules/injection_points/injection_stats.c
@@ -59,7 +59,7 @@ static const PgStat_KindInfo injection_stats = {
 /*
  * Kind ID reserved for statistics of injection points.
  */
-#define PGSTAT_KIND_INJECTION	129
+#define PGSTAT_KIND_INJECTION	25
 
 /* Track if stats are loaded */
 static bool inj_stats_loaded = false;
diff --git a/src/test/modules/injection_points/injection_stats_fixed.c b/src/test/modules/injection_points/injection_stats_fixed.c
index 3d0c01bdd05a..bc54c79d190b 100644
--- a/src/test/modules/injection_points/injection_stats_fixed.c
+++ b/src/test/modules/injection_points/injection_stats_fixed.c
@@ -64,7 +64,7 @@ static const PgStat_KindInfo injection_stats_fixed = {
 /*
  * Kind ID reserved for statistics of injection points.
  */
-#define PGSTAT_KIND_INJECTION_FIXED	130
+#define PGSTAT_KIND_INJECTION_FIXED	26
 
 /* Track if fixed-numbered stats are loaded */
 static bool inj_fixed_loaded = false;
-- 
2.50.0

0002-Fix-performance-regression-with-pgstats-reports.patchtext/x-diff; charset=us-asciiDownload
From 6814ddec82e40a14d61ba0a5c93974c76640ab52 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Thu, 24 Jul 2025 09:03:53 +0900
Subject: [PATCH 2/2] Fix performance regression with pgstats reports

The introduction of the new callback to control the flush of pending
entries is proving to show up high in workloads that do not report any
stats (read only, no function calls, etc).

This commit switches the code to use a cheaper approach, with a single
boolean flag that can be switched to "true" by any fixed-numbered stats
kinds to force pgstat_report_stat() to go through one round of reports.
The flag can only be reset by pgstat_report_stat() once a full round of
reports is done.

The callback have_static_pending_cb can be removed as an effect of that.
---
 src/include/utils/pgstat_internal.h         | 30 +++++++++---------
 src/backend/access/transam/xlog.c           | 10 ++++++
 src/backend/utils/activity/pgstat.c         | 35 ++++++---------------
 src/backend/utils/activity/pgstat_backend.c | 14 ++-------
 src/backend/utils/activity/pgstat_io.c      | 10 +-----
 src/backend/utils/activity/pgstat_slru.c    | 10 +-----
 src/backend/utils/activity/pgstat_wal.c     | 20 ++++++------
 7 files changed, 49 insertions(+), 80 deletions(-)

diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index d5557e6e998c..c4c82c1235a6 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -295,19 +295,12 @@ typedef struct PgStat_KindInfo
 	 *
 	 * Returns true if some of the stats could not be flushed, due to lock
 	 * contention for example. Optional.
+	 *
+	 * "pgstat_report_fixed" needs to be set to trigger the flush of pending
+	 * stats.
 	 */
 	bool		(*flush_static_cb) (bool nowait);
 
-	/*
-	 * For fixed-numbered or variable-numbered statistics: Check for pending
-	 * stats in need of flush with flush_static_cb, when these do not use
-	 * PgStat_EntryRef->pending.
-	 *
-	 * Returns true if there are any stats pending for flush, triggering
-	 * flush_static_cb. Optional.
-	 */
-	bool		(*have_static_pending_cb) (void);
-
 	/*
 	 * For fixed-numbered statistics: Reset All.
 	 */
@@ -627,7 +620,6 @@ extern void pgstat_archiver_snapshot_cb(void);
 
 extern bool pgstat_flush_backend(bool nowait, bits32 flags);
 extern bool pgstat_backend_flush_cb(bool nowait);
-extern bool pgstat_backend_have_pending_cb(void);
 extern void pgstat_backend_reset_timestamp_cb(PgStatShared_Common *header,
 											  TimestampTz ts);
 
@@ -676,7 +668,6 @@ extern bool pgstat_function_flush_cb(PgStat_EntryRef *entry_ref, bool nowait);
 
 extern void pgstat_flush_io(bool nowait);
 
-extern bool pgstat_io_have_pending_cb(void);
 extern bool pgstat_io_flush_cb(bool nowait);
 extern void pgstat_io_init_shmem_cb(void *stats);
 extern void pgstat_io_reset_all_cb(TimestampTz ts);
@@ -738,7 +729,6 @@ extern PgStatShared_Common *pgstat_init_entry(PgStat_Kind kind,
  * Functions in pgstat_slru.c
  */
 
-extern bool pgstat_slru_have_pending_cb(void);
 extern bool pgstat_slru_flush_cb(bool nowait);
 extern void pgstat_slru_init_shmem_cb(void *stats);
 extern void pgstat_slru_reset_all_cb(TimestampTz ts);
@@ -750,7 +740,6 @@ extern void pgstat_slru_snapshot_cb(void);
  */
 
 extern void pgstat_wal_init_backend_cb(void);
-extern bool pgstat_wal_have_pending_cb(void);
 extern bool pgstat_wal_flush_cb(bool nowait);
 extern void pgstat_wal_init_shmem_cb(void *stats);
 extern void pgstat_wal_reset_all_cb(TimestampTz ts);
@@ -780,6 +769,19 @@ extern void pgstat_create_transactional(PgStat_Kind kind, Oid dboid, uint64 obji
 
 extern PGDLLIMPORT PgStat_LocalState pgStatLocal;
 
+/*
+ * Track if *any* pending fixed-numbered statistics should be flushed to
+ * shared memory.
+ *
+ * This flag can be switched to true by fixed-numbered statistics to let
+ * pgstat_report_stat() know if it needs to go through one round of
+ * reports, giving the possibility of a fast-path exit if there are no
+ * pending fixed-numbered statistics.
+ *
+ * Statistics callbacks should never reset this flag; pgstat_report_stat()
+ * is in charge of doing that.
+ */
+extern PGDLLIMPORT bool	pgstat_report_fixed;
 
 /*
  * Implementation of inline functions declared above.
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index eefffc4277a1..b0891998b243 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -96,6 +96,7 @@
 #include "utils/guc_hooks.h"
 #include "utils/guc_tables.h"
 #include "utils/injection_point.h"
+#include "utils/pgstat_internal.h"
 #include "utils/ps_status.h"
 #include "utils/relmapper.h"
 #include "utils/snapmgr.h"
@@ -1091,6 +1092,9 @@ XLogInsertRecord(XLogRecData *rdata,
 		pgWalUsage.wal_bytes += rechdr->xl_tot_len;
 		pgWalUsage.wal_records++;
 		pgWalUsage.wal_fpi += num_fpi;
+
+		/* Required for the flush of pending stats WAL data */
+		pgstat_report_fixed = true;
 	}
 
 	return EndPos;
@@ -2108,6 +2112,12 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli, bool opportunistic)
 					LWLockRelease(WALWriteLock);
 					pgWalUsage.wal_buffers_full++;
 					TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY_DONE();
+
+					/*
+					 * Required for the flush of pending stats WAL data, per
+					 * update of pgWalUsage.
+					 */
+					pgstat_report_fixed = true;
 				}
 			}
 		}
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index 8b57845e8709..0174df929481 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -212,6 +212,11 @@ int			pgstat_fetch_consistency = PGSTAT_FETCH_CONSISTENCY_CACHE;
 
 PgStat_LocalState pgStatLocal;
 
+/*
+ * Track pending reports for fixed-numbered stats, used by
+ * pgstat_report_stat().
+ */
+bool		pgstat_report_fixed = false;
 
 /* ----------
  * Local data
@@ -370,7 +375,6 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.shared_data_off = offsetof(PgStatShared_Backend, stats),
 		.shared_data_len = sizeof(((PgStatShared_Backend *) 0)->stats),
 
-		.have_static_pending_cb = pgstat_backend_have_pending_cb,
 		.flush_static_cb = pgstat_backend_flush_cb,
 		.reset_timestamp_cb = pgstat_backend_reset_timestamp_cb,
 	},
@@ -437,7 +441,6 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.shared_data_len = sizeof(((PgStatShared_IO *) 0)->stats),
 
 		.flush_static_cb = pgstat_io_flush_cb,
-		.have_static_pending_cb = pgstat_io_have_pending_cb,
 		.init_shmem_cb = pgstat_io_init_shmem_cb,
 		.reset_all_cb = pgstat_io_reset_all_cb,
 		.snapshot_cb = pgstat_io_snapshot_cb,
@@ -455,7 +458,6 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 		.shared_data_len = sizeof(((PgStatShared_SLRU *) 0)->stats),
 
 		.flush_static_cb = pgstat_slru_flush_cb,
-		.have_static_pending_cb = pgstat_slru_have_pending_cb,
 		.init_shmem_cb = pgstat_slru_init_shmem_cb,
 		.reset_all_cb = pgstat_slru_reset_all_cb,
 		.snapshot_cb = pgstat_slru_snapshot_cb,
@@ -474,7 +476,6 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
 
 		.init_backend_cb = pgstat_wal_init_backend_cb,
 		.flush_static_cb = pgstat_wal_flush_cb,
-		.have_static_pending_cb = pgstat_wal_have_pending_cb,
 		.init_shmem_cb = pgstat_wal_init_shmem_cb,
 		.reset_all_cb = pgstat_wal_reset_all_cb,
 		.snapshot_cb = pgstat_wal_snapshot_cb,
@@ -708,29 +709,10 @@ pgstat_report_stat(bool force)
 	}
 
 	/* Don't expend a clock check if nothing to do */
-	if (dlist_is_empty(&pgStatPending))
+	if (dlist_is_empty(&pgStatPending) &&
+		!pgstat_report_fixed)
 	{
-		bool		do_flush = false;
-
-		/* Check for pending stats */
-		for (PgStat_Kind kind = PGSTAT_KIND_MIN; kind <= PGSTAT_KIND_MAX; kind++)
-		{
-			const PgStat_KindInfo *kind_info = pgstat_get_kind_info(kind);
-
-			if (!kind_info)
-				continue;
-			if (!kind_info->have_static_pending_cb)
-				continue;
-
-			if (kind_info->have_static_pending_cb())
-			{
-				do_flush = true;
-				break;
-			}
-		}
-
-		if (!do_flush)
-			return 0;
+		return 0;
 	}
 
 	/*
@@ -815,6 +797,7 @@ pgstat_report_stat(bool force)
 	}
 
 	pending_since = 0;
+	pgstat_report_fixed = false;
 
 	return 0;
 }
diff --git a/src/backend/utils/activity/pgstat_backend.c b/src/backend/utils/activity/pgstat_backend.c
index 51256277e8d3..8714a85e2d93 100644
--- a/src/backend/utils/activity/pgstat_backend.c
+++ b/src/backend/utils/activity/pgstat_backend.c
@@ -66,6 +66,7 @@ pgstat_count_backend_io_op_time(IOObject io_object, IOContext io_context,
 				   io_time);
 
 	backend_has_iostats = true;
+	pgstat_report_fixed = true;
 }
 
 void
@@ -81,6 +82,7 @@ pgstat_count_backend_io_op(IOObject io_object, IOContext io_context,
 	PendingBackendStats.pending_io.bytes[io_object][io_context][io_op] += bytes;
 
 	backend_has_iostats = true;
+	pgstat_report_fixed = true;
 }
 
 /*
@@ -301,18 +303,6 @@ pgstat_flush_backend(bool nowait, bits32 flags)
 	return false;
 }
 
-/*
- * Check if there are any backend stats waiting for flush.
- */
-bool
-pgstat_backend_have_pending_cb(void)
-{
-	if (!pgstat_tracks_backend_bktype(MyBackendType))
-		return false;
-
-	return (backend_has_iostats || pgstat_backend_wal_have_pending());
-}
-
 /*
  * Callback to flush out locally pending backend statistics.
  *
diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c
index d8d26379a571..13ae57ed6498 100644
--- a/src/backend/utils/activity/pgstat_io.c
+++ b/src/backend/utils/activity/pgstat_io.c
@@ -80,6 +80,7 @@ pgstat_count_io_op(IOObject io_object, IOContext io_context, IOOp io_op,
 	pgstat_count_backend_io_op(io_object, io_context, io_op, cnt, bytes);
 
 	have_iostats = true;
+	pgstat_report_fixed = true;
 }
 
 /*
@@ -167,15 +168,6 @@ pgstat_fetch_stat_io(void)
 	return &pgStatLocal.snapshot.io;
 }
 
-/*
- * Check if there any IO stats waiting for flush.
- */
-bool
-pgstat_io_have_pending_cb(void)
-{
-	return have_iostats;
-}
-
 /*
  * Simpler wrapper of pgstat_io_flush_cb()
  */
diff --git a/src/backend/utils/activity/pgstat_slru.c b/src/backend/utils/activity/pgstat_slru.c
index b9e940dde45b..7bd8744accb0 100644
--- a/src/backend/utils/activity/pgstat_slru.c
+++ b/src/backend/utils/activity/pgstat_slru.c
@@ -143,15 +143,6 @@ pgstat_get_slru_index(const char *name)
 	return (SLRU_NUM_ELEMENTS - 1);
 }
 
-/*
- * Check if there are any SLRU stats entries waiting for flush.
- */
-bool
-pgstat_slru_have_pending_cb(void)
-{
-	return have_slrustats;
-}
-
 /*
  * Flush out locally pending SLRU stats entries
  *
@@ -247,6 +238,7 @@ get_slru_entry(int slru_idx)
 	Assert((slru_idx >= 0) && (slru_idx < SLRU_NUM_ELEMENTS));
 
 	have_slrustats = true;
+	pgstat_report_fixed = true;
 
 	return &pending_SLRUStats[slru_idx];
 }
diff --git a/src/backend/utils/activity/pgstat_wal.c b/src/backend/utils/activity/pgstat_wal.c
index 16a1ecb4d90d..0d04480d2f6d 100644
--- a/src/backend/utils/activity/pgstat_wal.c
+++ b/src/backend/utils/activity/pgstat_wal.c
@@ -71,6 +71,15 @@ pgstat_fetch_stat_wal(void)
 	return &pgStatLocal.snapshot.wal;
 }
 
+/*
+ * To determine whether WAL usage happened.
+ */
+static inline bool
+pgstat_wal_have_pending(void)
+{
+	return pgWalUsage.wal_records != prevWalUsage.wal_records;
+}
+
 /*
  * Calculate how much WAL usage counters have increased by subtracting the
  * previous counters from the current ones.
@@ -92,7 +101,7 @@ pgstat_wal_flush_cb(bool nowait)
 	 * This function can be called even if nothing at all has happened. Avoid
 	 * taking lock for nothing in that case.
 	 */
-	if (!pgstat_wal_have_pending_cb())
+	if (!pgstat_wal_have_pending())
 		return false;
 
 	/*
@@ -136,15 +145,6 @@ pgstat_wal_init_backend_cb(void)
 	prevWalUsage = pgWalUsage;
 }
 
-/*
- * To determine whether WAL usage happened.
- */
-bool
-pgstat_wal_have_pending_cb(void)
-{
-	return pgWalUsage.wal_records != prevWalUsage.wal_records;
-}
-
 void
 pgstat_wal_init_shmem_cb(void *stats)
 {
-- 
2.50.0

#10Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#9)
Re: Custom pgstat support performance regression for simple queries

Hi,

On Thu, Jul 24, 2025 at 09:34:45AM +0900, Michael Paquier wrote:

On Wed, Jul 23, 2025 at 12:00:55PM -0400, Andres Freund wrote:

On 2025-07-23 09:54:12 +0900, Michael Paquier wrote:

Noted. I was wondering originally if the threshold for the builtin
and custom kinds should be lowered, as well. Perhaps that's just too
optimistic, so we can first lower things. Say PGSTAT_KIND_CUSTOM_MIN
to 32 and PGSTAT_KIND_MAX to 48 or so to begin with? I don't see a
strict policy here, but these would make the whole cheaper to begin
with, with enough margin for any new stats kinds.

Yes, 256 seems pointlessly large.

I still cannot put my finger on what a good number should be, but I've
settled to something that I think should be good enough for an initial
release:
PGSTAT_KIND_CUSTOM_MIN 128 -> 24
PGSTAT_KIND_MAX 256 -> 32

These numbers mean that we have enough room for 7 more builtins kinds,

11 for builtins kinds? (from 13 to 23)

same for custom kinds.

9 for custom kinds including experimental? (24 -> 32)

I think that gives some room (like we'll almost double the builtins kinds if we
were to use them all).

Even if this is an issue, this could always be
moved up at some point even across major releases, even if that would
mean for extension developers to switch to a different ID.

Yeah...

  * development and have not reserved their own unique kind ID yet. See:
  * https://wiki.postgresql.org/wiki/CustomCumulativeStats
  */
-#define PGSTAT_KIND_EXPERIMENTAL       128
+#define PGSTAT_KIND_EXPERIMENTAL       24

Note that we'll have to update the wiki too.

Then, about the loop used here, I'd be OK to keep the past shortcuts
for the builtin fixed-sized stats kinds with the requirement that new
builtin stats kinds need to hack into pgstat_report_stat() themselves
on efficiency grounds, combined with a second loop for custom kinds,
taken only if pgstat_register_kind() has been used by tracking if
that's the case in a flag. Do you have a specific preference here?

I think the downstream approach of having a flag that says if there are *any*
pending stats is better.

Works for me. I am finishing with the attached, meaning that
workloads that trigger no stats can take the fast-exit path with a
single boolean check. What do you think?

That would work but I'm a bit worried about:

- it's easy to miss and error prone to update pgstat_report_fixed (for new code
path, new builtin fixed-sized stats kinds)

- it's easy to miss for extensions that they have to update this global variable.
Indeed, the cb have_static_pending_cb has been removed (there is the new comment
in PgStat_KindInfo though, but I've the feeling it's easier to miss as compared
to the cb)

What about?

- create a global variable but that should be used only by extensions? Say,
pgstat_report_custom_fixed

- for builtin fixed-sized, just rely on have_iostats, have_slrustats and friends

Then in pgstat_report_stat():

- do the check on have_iostats, have_slrustats and friends and on
pgstat_report_custom_fixed

- reset pgstat_report_custom_fixed

That way I think it's less error prone for the builtin stats and more clear
for the extensions (as at least "custom" is also part of the global variable
name and the check in pgstat_report_stat() would make it clear that we rely on
the custom global variable).

Thoughts?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#11Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#10)
Re: Custom pgstat support performance regression for simple queries

On Thu, Jul 24, 2025 at 07:38:46AM +0000, Bertrand Drouvot wrote:

On Thu, Jul 24, 2025 at 09:34:45AM +0900, Michael Paquier wrote:

These numbers mean that we have enough room for 7 more builtins kinds,

11 for builtins kinds? (from 13 to 23)

9 for custom kinds including experimental? (24 -> 32)

I think that gives some room (like we'll almost double the builtins kinds if we
were to use them all).

Yes. Problems with a lack of fingers and caffeine, perhaps.

* development and have not reserved their own unique kind ID yet. See:
* https://wiki.postgresql.org/wiki/CustomCumulativeStats
*/
-#define PGSTAT_KIND_EXPERIMENTAL       128
+#define PGSTAT_KIND_EXPERIMENTAL       24

Note that we'll have to update the wiki too.

Yep, that was on my TODO list.

What about?

- create a global variable but that should be used only by extensions? Say,
pgstat_report_custom_fixed

- for builtin fixed-sized, just rely on have_iostats, have_slrustats and friends

Then in pgstat_report_stat():

- do the check on have_iostats, have_slrustats and friends and on
pgstat_report_custom_fixed

- reset pgstat_report_custom_fixed

That way I think it's less error prone for the builtin stats and more clear
for the extensions (as at least "custom" is also part of the global variable
name and the check in pgstat_report_stat() would make it clear that we rely on
the custom global variable).

Thoughts?

FWIW, I see more benefits in the simplicity of a single flag to govern
both builtin and custom kinds, but I can see that this may come down
to personal taste. A single flag is simpler here, and cheap.

What have_static_pending_cb was originally aiming for is the removal
of any knowledge specific to stats kinds in the central pgstats APIs,
which is why the flags specific to IO, SLRU and WAL have been moved
out into their own files.

Letting custom stats manipulate pgstat_report_fixed or invent a new
pgstat_report_custom_fixed would be logically the same thing, but
this comes down to the fact that we still want to decide if a report
should be triggered if any of the fixed-numbered stats want to let
pgstat_report_stat() do one round of pending stats flush.

Having a second flag would keep this abstraction level intact. Now
that leads to overcomplicating the API for a small gain in
debuggability to know which part of the subsystem wants the report to
happen at early stages of pgstat_report_stat(). If we want to know
exactly which stats kind wants the flush to happen, it may be better
to reuse your idea of one single uint32 or uint64 with one bit
allocated for each stats kind to have this information available in
pgstat_report_fixed(). However, we already know that with the
flush_static_cb callback, as dedicated paths can be taken for each
stats kinds. Once again, this would require stats kind to set their
bit to force a round of reports, gaining more information before we
try to call each flush_static_cb.

I am going to apply the patch to lower the bounds in a bit, to begin
with, then adjust the wiki.
--
Michael

#12Bertrand Drouvot
bertranddrouvot.pg@gmail.com
In reply to: Michael Paquier (#11)
Re: Custom pgstat support performance regression for simple queries

Hi,

On Fri, Jul 25, 2025 at 10:38:59AM +0900, Michael Paquier wrote:

On Thu, Jul 24, 2025 at 07:38:46AM +0000, Bertrand Drouvot wrote:

What about?

- create a global variable but that should be used only by extensions? Say,
pgstat_report_custom_fixed

- for builtin fixed-sized, just rely on have_iostats, have_slrustats and friends

Then in pgstat_report_stat():

- do the check on have_iostats, have_slrustats and friends and on
pgstat_report_custom_fixed

- reset pgstat_report_custom_fixed

That way I think it's less error prone for the builtin stats and more clear
for the extensions (as at least "custom" is also part of the global variable
name and the check in pgstat_report_stat() would make it clear that we rely on
the custom global variable).

Thoughts?

FWIW, I see more benefits in the simplicity of a single flag to govern
both builtin and custom kinds, but I can see that this may come down
to personal taste. A single flag is simpler here, and cheap.

What have_static_pending_cb was originally aiming for is the removal
of any knowledge specific to stats kinds in the central pgstats APIs,
which is why the flags specific to IO, SLRU and WAL have been moved
out into their own files.

Yes, with my idea we'd need to move them back.

Letting custom stats manipulate pgstat_report_fixed or invent a new
pgstat_report_custom_fixed would be logically the same thing, but
this comes down to the fact that we still want to decide if a report
should be triggered if any of the fixed-numbered stats want to let
pgstat_report_stat() do one round of pending stats flush.

Yes.

Having a second flag would keep this abstraction level intact.

Not a second, just one global "pgstat_report_custom_fixed" and the specifics
flags for IO, SLRU, something like:

if (dlist_is_empty(&pgStatPending) &&
!have_iostats &&
!have_slrustats &&
!pgstat_report_custom_fixed &&
!pgstat_have_pending_wal())

Now
that leads to overcomplicating the API

Not sure.

for a small gain in
debuggability to know which part of the subsystem wants the report to
happen at early stages of pgstat_report_stat(). If we want to know
exactly which stats kind wants the flush to happen, it may be better
to reuse your idea of one single uint32 or uint64 with one bit
allocated for each stats kind to have this information available in
pgstat_report_fixed(). However, we already know that with the
flush_static_cb callback, as dedicated paths can be taken for each
stats kinds. Once again, this would require stats kind to set their
bit to force a round of reports, gaining more information before we
try to call each flush_static_cb.

Yeah. I'm fine with one single flag as you proposed, I just have the feeling
it's more error prone.

As an example:

@@ -1091,6 +1092,9 @@ XLogInsertRecord(XLogRecData *rdata,
                pgWalUsage.wal_bytes += rechdr->xl_tot_len;
                pgWalUsage.wal_records++;
                pgWalUsage.wal_fpi += num_fpi;
+
+               /* Required for the flush of pending stats WAL data */
+               pgstat_report_fixed = true;
        }
        return EndPos;
@@ -2108,6 +2112,12 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli, bool opportunistic)
                                        LWLockRelease(WALWriteLock);
                                        pgWalUsage.wal_buffers_full++;
                                        TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY_DONE();
+
+                                       /*
+                                        * Required for the flush of pending stats WAL data, per
+                                        * update of pgWalUsage.
+                                        */
+                                       pgstat_report_fixed = true;
                                }

This was not needed before fc415edf8ca8, and I think it was better to use
pgstat_have_pending_wal() to not have to set a flag to every places pgWalUsage.XX
changes.

Having said that, I think the single global flag patch is pretty straightforward
and LGTM.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#13Michael Paquier
michael@paquier.xyz
In reply to: Bertrand Drouvot (#12)
Re: Custom pgstat support performance regression for simple queries

On Fri, Jul 25, 2025 at 08:57:29AM +0000, Bertrand Drouvot wrote:

This was not needed before fc415edf8ca8, and I think it was better to use
pgstat_have_pending_wal() to not have to set a flag to every places pgWalUsage.XX
changes.

At the end, this cost makes the separation of the code layers a bit
better, by having pgstat.c not know about the specifics around the
fixed-numbered stats, so..

Having said that, I think the single global flag patch is pretty straightforward
and LGTM.

I have used that and applied it down to v18, closing the open item.
--
Michael

#14Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#13)
Re: Custom pgstat support performance regression for simple queries

On 2025-07-28 08:18:01 +0900, Michael Paquier wrote:

I have used that and applied it down to v18, closing the open item.

Thanks!