per backend WAL statistics
Hi hackers,
Now that commit 9aea73fc61 added backend-level statistics to pgstats (and
per backend IO statistics), we can more easily add per backend statistics.
Please find attached a patch to implement $SUBJECT.
It's using the same layer as pg_stat_wal, except that it is now possible to know
how much WAL activity is happening in each backend rather than an overall
aggregate of all the activity.
A function called pg_stat_get_backend_wal() is added to access this data
depending on the PID of a backend.
=== Outcome ===
With this in place, one could for example:
1. Get the WAL statistics for a backend pid:
postgres=# select * from pg_stat_get_backend_wal(473278);
-[ RECORD 1 ]----+---------
wal_records | 300008
wal_fpi | 7
wal_bytes | 17753097
wal_buffers_full | 933
wal_write | 937
wal_sync | 2
wal_write_time | 0
wal_sync_time | 0
stats_reset |
2. Get the wal_bytes generated by application:
postgres=# SELECT application_name, wal_bytes, round(100 * wal_bytes/sum(wal_bytes) over(),2) AS "%"
FROM (SELECT application_name, sum(wal_bytes) AS wal_bytes
FROM pg_stat_activity, pg_stat_get_backend_wal(pid)
WHERE wal_bytes != 0 GROUP BY application_name);
application_name | wal_bytes | %
------------------+-----------+-------
app1 | 17708761 | 39.95
app2 | 26614797 | 60.05
(2 rows)
3. Get the wal_bytes generated by database:
postgres=# SELECT datname, wal_bytes, round(100 * wal_bytes/sum(wal_bytes) over(),2) AS "%"
FROM (SELECT datname, sum(wal_bytes) AS wal_bytes
FROM pg_stat_activity, pg_stat_get_backend_wal(pid)
WHERE wal_bytes != 0 GROUP BY datname);
datname | wal_bytes | %
---------+-----------+-------
db1 | 35461858 | 80.01
db2 | 8861700 | 19.99
(2 rows)
and much more...
=== Implementation ===
The same limitation as in 9aea73fc61 persists, meaning that Auxiliary processes
are not included in this set of statistics.
The patch is made of 3 sub-patches:
0001: to extract the logic filling pg_stat_get_wal()'s tuple into its own routine.
It adds pg_stat_wal_build_tuple(), a helper routine for pg_stat_get_wal(), that
fills its tuple based on the contents of PgStat_WalStats. Same idea as ff7c40d7fd.
0002: PGSTAT_KIND_BACKEND code refactoring. It refactors some come related to per
backend statistics. It makes the code more generic or more IO statistics focused
as it will be used in 0003 that will introduce per backend WAL statistics. It does
not add any new feature, that's 100% code refactoring to ease 0003 review.
0003: it adds the per backend WAL statistics and the new pg_stat_get_backend_wal()
function, documentation and related test.
=== Remarks ===
R1:
0003 does not rely on pgstat_prep_backend_pending() for its pending statistics
but on a new PendingBackendWalStats variable. The reason is that the pending wal
statistics are incremented in a critical section (see XLogWrite(), and so
a call to pgstat_prep_pending_entry() could trigger a failed assertion:
MemoryContextAllocZero()->"CritSectionCount == 0 || (context)->allowInCritSection"
R2:
Instead of relying on a new PendingBackendWalStats, we could rely on the
existing PendingWalStats variable. But that would complicate the flush of
per backend and existing wal stats as that would need some coordination. I think
that it's better that each kind has its own pending variable.
R3:
Instead of incrementing the PendingBackendWalStats members individually we could
also "just" assign the PendingWalStats ones once incremented. I thought it's
better to make them "fully independent" though.
R4:
0002 introduces a new PgStat_BackendPending struct. Due to R1, that's not needed
per say but could have been if pgstat_prep_backend_pending() would have been
used. I keep this change as we may want to add more per backend stats in the future.
Looking forward to your feedback,
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Tue, Jan 07, 2025 at 08:48:51AM +0000, Bertrand Drouvot wrote:
Now that commit 9aea73fc61 added backend-level statistics to pgstats (and
per backend IO statistics), we can more easily add per backend statistics.Please find attached a patch to implement $SUBJECT.
I've looked at v1-0002 and v1-0001.
+static void
+pgstat_flush_io_entry(PgStat_EntryRef *entry_ref, bool nowait, bool need_lock)
{
- PgStatShared_Backend *shbackendioent;
- PgStat_BackendPendingIO *pendingent;
+ PgStatShared_Backend *shbackendent;
+ PgStat_BackendPending *pendingent;
PgStat_BktypeIO *bktype_shstats;
+ PgStat_BackendPendingIO *pending_io;
- if (!pgstat_lock_entry(entry_ref, nowait))
- return false;
+ if (need_lock && !pgstat_lock_entry(entry_ref, nowait))
+ return;
The addition of need_lock at this level leads to a result that seems a
bit confusing, where pgstat_backend_flush_cb() passes "false" because
it locks the entry by itself as an effect of v1-0003 with the new area
for WAL. Wouldn't it be cleaner to do an extra pgstat_[un]lock_entry
dance in pgstat_backend_flush_io() instead? Another approach I can
think of that would be slightly cleaner to me is to pass a bits32 to a
single routine that would control if WAL stats, I/O stats or both
should be flushed, keeping pgstat_flush_backend() as name with an
extra argument to decide which parts of the stats should be flushed.
-PgStat_BackendPendingIO *
+PgStat_BackendPending *
This rename makes sense.
#define PG_STAT_GET_WAL_COLS 9
TupleDesc tupdesc;
Datum values[PG_STAT_GET_WAL_COLS] = {0};
bool nulls[PG_STAT_GET_WAL_COLS] = {0};
It feels unnatural to have a PG_STAT_GET_WAL_COLS while it would not
only relate to this function anymore.
--
Michael
Hi,
On Wed, Jan 08, 2025 at 03:21:26PM +0900, Michael Paquier wrote:
On Tue, Jan 07, 2025 at 08:48:51AM +0000, Bertrand Drouvot wrote:
Now that commit 9aea73fc61 added backend-level statistics to pgstats (and
per backend IO statistics), we can more easily add per backend statistics.Please find attached a patch to implement $SUBJECT.
I've looked at v1-0002 and v1-0001.
Thanks for looking at it!
+static void +pgstat_flush_io_entry(PgStat_EntryRef *entry_ref, bool nowait, bool need_lock) { - PgStatShared_Backend *shbackendioent; - PgStat_BackendPendingIO *pendingent; + PgStatShared_Backend *shbackendent; + PgStat_BackendPending *pendingent; PgStat_BktypeIO *bktype_shstats; + PgStat_BackendPendingIO *pending_io;- if (!pgstat_lock_entry(entry_ref, nowait)) - return false; + if (need_lock && !pgstat_lock_entry(entry_ref, nowait)) + return;The addition of need_lock at this level leads to a result that seems a
bit confusing, where pgstat_backend_flush_cb() passes "false" because
it locks the entry by itself as an effect of v1-0003 with the new area
for WAL. Wouldn't it be cleaner to do an extra pgstat_[un]lock_entry
dance in pgstat_backend_flush_io() instead? Another approach I can
think of that would be slightly cleaner to me is to pass a bits32 to a
single routine that would control if WAL stats, I/O stats or both
should be flushed, keeping pgstat_flush_backend() as name with an
extra argument to decide which parts of the stats should be flushed.
Yeah, that's more elegant as it also means that the main callback will not change
(should we add even more stats in the future). Done that way in v2 attached.
-PgStat_BackendPendingIO * +PgStat_BackendPending *This rename makes sense.
#define PG_STAT_GET_WAL_COLS 9
TupleDesc tupdesc;
Datum values[PG_STAT_GET_WAL_COLS] = {0};
bool nulls[PG_STAT_GET_WAL_COLS] = {0};It feels unnatural to have a PG_STAT_GET_WAL_COLS while it would not
only relate to this function anymore.
Has been renamed to PG_STAT_WAL_COLS in the attached.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Wed, Jan 08, 2025 at 11:11:59AM +0000, Bertrand Drouvot wrote:
Yeah, that's more elegant as it also means that the main callback will not change
(should we add even more stats in the future). Done that way in v2 attached.
I've put my hands on v2-0002 to begin with something.
+/* flag bits for different types of statistics to flush */
+#define PGSTAT_FLUSH_IO (1 << 0) /* Flush I/O statistics */
+#define PGSTAT_FLUSH_ALL (PGSTAT_FLUSH_IO)
These are located and used only in pgstat_backend.c. It seems to me
that we'd better declare them in pgstat_internal.h and extend the
existing pgstat_flush_backend() with an argument so as callers can do
what they want.
+ /* Get our own entry_ref if not provided */
+ if (!entry_ref)
+ entry_ref = pgstat_get_entry_ref(PGSTAT_KIND_BACKEND, InvalidOid,
+ MyProcNumber, false, NULL);
This relates to the previous remark, actually, where I think that it
is cleaner to have pgstat_flush_backend() do pgstat_get_entry_ref(),
same way as HEAD, and just pass down the flags. pgstat_flush_backend
cannot call directly pgstat_backend_flush_cb(), of course, so I've
settled down to a new pgstat_flush_backend_entry() that handles the
entry locking. This comes at the cost of pgstat_flush_backend_entry()
requiring an extra pgstat_tracks_backend_bktype(), which is not a big
issue, and the patch gets a bit shorter.
--
Michael
Attachments:
v3-0002-PGSTAT_KIND_BACKEND-code-refactoring.patchtext/x-diff; charset=us-asciiDownload+68-32
Hi,
On Thu, Jan 09, 2025 at 01:03:15PM +0900, Michael Paquier wrote:
On Wed, Jan 08, 2025 at 11:11:59AM +0000, Bertrand Drouvot wrote:
Yeah, that's more elegant as it also means that the main callback will not change
(should we add even more stats in the future). Done that way in v2 attached.I've put my hands on v2-0002 to begin with something.
+/* flag bits for different types of statistics to flush */ +#define PGSTAT_FLUSH_IO (1 << 0) /* Flush I/O statistics */ +#define PGSTAT_FLUSH_ALL (PGSTAT_FLUSH_IO)These are located and used only in pgstat_backend.c. It seems to me
that we'd better declare them in pgstat_internal.h and extend the
existing pgstat_flush_backend() with an argument so as callers can do
what they want.+ /* Get our own entry_ref if not provided */ + if (!entry_ref) + entry_ref = pgstat_get_entry_ref(PGSTAT_KIND_BACKEND, InvalidOid, + MyProcNumber, false, NULL);This relates to the previous remark, actually, where I think that it
is cleaner to have pgstat_flush_backend() do pgstat_get_entry_ref(),
same way as HEAD, and just pass down the flags.
I see, so you keep pgstat_flush_backend() calls (with an extra arg) and remove
the new "pgstat_backend_flush_io()" function.
This comes at the cost of pgstat_flush_backend_entry()
requiring an extra pgstat_tracks_backend_bktype(), which is not a big
issue, and the patch gets a bit shorter.
Yeah, all of the above is fine by me.
PFA v3 which is v2 refactoring with your proposed above changes.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Thu, Jan 09, 2025 at 07:05:54AM +0000, Bertrand Drouvot wrote:
PFA v3 which is v2 refactoring with your proposed above changes.
An extra thing I have finished by doing is removing
PgStat_BackendPendingIO, then applied the change. It was useful when
returned as a result of pgstat_prep_backend_pending(), but not so much
with the new PgStat_BackendPending that includes all the pending stats
data.
--
Michael
Hi,
Michael Paquier wrote:
An extra thing I have finished by doing is removing
PgStat_BackendPendingIO, then applied the change. It was useful when
returned as a result of pgstat_prep_backend_pending(), but not so much
with the new PgStat_BackendPending that includes all the pending stats
data.
Yeah, makes sense, thanks!
Please find attached v4 taking into account 2c14037bb5.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Fri, Jan 10, 2025 at 09:40:38AM +0000, Bertrand Drouvot wrote:
Please find attached v4 taking into account 2c14037bb5.
+} PgStat_WalCounters;
+
+typedef struct PgStat_WalStats
+{
+ PgStat_WalCounters wal_counters;
I know that's a nit, but perhaps that could be a patch of its own,
pushing that to pg_stat_wal_build_tuple() to reduce the diffs in the
main patch.
- * Find or create a local PgStat_BackendPending entry for proc number.
+ * Find or create a local PgStat_BackendPendingIO entry for proc number.
Seems like you are undoing a change here.
+ * WAL pending statistics are incremented inside a critical section
+ * (see XLogWrite()), so we can't use pgstat_prep_pending_entry() and we rely on
+ * PendingBackendWalStats instead.
+ */
+extern PGDLLIMPORT PgStat_PendingWalStats PendingBackendWalStats;
Hmm. This makes me wonder if we should rethink a bit the way pending
entries are retrieved and if we should do it beforehand for the WAL
paths to avoid allocations in some critical sections. Isn't that also
because we avoid calling pgstat_prep_backend_pending() for the I/O
case as only backends are supported now, discarding cases like the
checkpointer where I/O could happen in a critical path? As a whole,
the approach taken by the patch is not really consistent with the
rest.
--
Michael
Hi,
On Wed, Jan 15, 2025 at 03:11:32PM +0900, Michael Paquier wrote:
On Fri, Jan 10, 2025 at 09:40:38AM +0000, Bertrand Drouvot wrote:
Please find attached v4 taking into account 2c14037bb5.
+} PgStat_WalCounters; + +typedef struct PgStat_WalStats +{ + PgStat_WalCounters wal_counters;I know that's a nit, but perhaps that could be a patch of its own,
pushing that to pg_stat_wal_build_tuple() to reduce the diffs in the
main patch.
Done in 0003 attached.
- * Find or create a local PgStat_BackendPending entry for proc number. + * Find or create a local PgStat_BackendPendingIO entry for proc number.Seems like you are undoing a change here.
Arf, nice catch, hopefully that's only the comment.. Fixed in the attached.
+ * WAL pending statistics are incremented inside a critical section + * (see XLogWrite()), so we can't use pgstat_prep_pending_entry() and we rely on + * PendingBackendWalStats instead. + */ +extern PGDLLIMPORT PgStat_PendingWalStats PendingBackendWalStats;Hmm. This makes me wonder if we should rethink a bit the way pending
entries are retrieved and if we should do it beforehand for the WAL
paths to avoid allocations in some critical sections. Isn't that also
because we avoid calling pgstat_prep_backend_pending() for the I/O
case as only backends are supported now, discarding cases like the
checkpointer where I/O could happen in a critical path? As a whole,
the approach taken by the patch is not really consistent with the
rest.
I agree that's better to have a generic solution and to be consistent with
the other variable-numbered stats.
The attached is implementing in 0001 the proposition done in [1]/messages/by-id/Z4d_eggsxtBEdJAG@paquier.xyz, i.e:
1. It adds a new allow_critical_section to PgStat_KindInfo for pgstats kinds
2. It ensures to set temporarly allowInCritSection to true when needed
Note that for safety reason 0001 does set allowInCritSection back to false
unconditionally (means not checking again for allow_critical_section).
While it avoids the failed assertion mentioned above and in [1]/messages/by-id/Z4d_eggsxtBEdJAG@paquier.xyz (on
the MemoryContextAllocZero() call), the TAP tests are still failing with a new
failed assertion.
If you apply the whole patch series attached, you'll see that:
make -C src/bin/pg_rewind check PROVE_TESTS=t/001_basic.pl
is failing with something like:
TRAP: failed Assert("CritSectionCount == 0"), File: "mcxt.c", Line: 1107, PID: 3295726
pg18/bin/postgres(ExceptionalCondition+0xbb)[0x59668bee1f6d]
pg18/bin/postgres(MemoryContextCreate+0x46)[0x59668bf2a8fe]
pg18/bin/postgres(AllocSetContextCreateInternal+0x1df)[0x59668bf1bb11]
pg18/bin/postgres(pgstat_prep_pending_entry+0x86)[0x59668bcff8cc]
pg18/bin/postgres(pgstat_prep_backend_pending+0x2b)[0x59668bd024a9]
This one is more problematic because we are in MemoryContextCreate() so that the
"workaround" above does not help. I need to put more thoughts on it but already
sharing the issue here (as also discussed in [1]/messages/by-id/Z4d_eggsxtBEdJAG@paquier.xyz).
[1]: /messages/by-id/Z4d_eggsxtBEdJAG@paquier.xyz
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v5-0001-Add-allow_critical_section-to-PgStat_KindInfo-for.patchtext/x-diff; charset=us-asciiDownload+36-1
v5-0002-Extract-logic-filling-pg_stat_get_wal-s-tuple-int.patchtext/x-diff; charset=us-asciiDownload+36-21
v5-0003-Adding-a-new-PgStat_WalCounters-struct.patchtext/x-diff; charset=us-asciiDownload+24-17
v5-0004-per-backend-WAL-statistics.patchtext/x-diff; charset=us-asciiDownload+227-30
Hi,
On 2025-01-16 15:59:31 +0000, Bertrand Drouvot wrote:
On Wed, Jan 15, 2025 at 03:11:32PM +0900, Michael Paquier wrote:
On Fri, Jan 10, 2025 at 09:40:38AM +0000, Bertrand Drouvot wrote: + * WAL pending statistics are incremented inside a critical section + * (see XLogWrite()), so we can't use pgstat_prep_pending_entry() and we rely on + * PendingBackendWalStats instead. + */ +extern PGDLLIMPORT PgStat_PendingWalStats PendingBackendWalStats;Hmm. This makes me wonder if we should rethink a bit the way pending
entries are retrieved and if we should do it beforehand for the WAL
paths to avoid allocations in some critical sections. Isn't that also
because we avoid calling pgstat_prep_backend_pending() for the I/O
case as only backends are supported now, discarding cases like the
checkpointer where I/O could happen in a critical path? As a whole,
the approach taken by the patch is not really consistent with the
rest.
I agree that's better to have a generic solution and to be consistent with
the other variable-numbered stats.The attached is implementing in 0001 the proposition done in [1], i.e:
1. It adds a new allow_critical_section to PgStat_KindInfo for pgstats kinds
2. It ensures to set temporarly allowInCritSection to true when neededNote that for safety reason 0001 does set allowInCritSection back to false
unconditionally (means not checking again for allow_critical_section).
This is a preposterously bad idea. The restriction to not allocate memory in
critical sections exists for a reason, why on earth should this code be
allowed to just opt out of the restriction of not allowing memory allocations
in critical sections?
The only cases where we can somewhat safely allocate memory in critical
sections is when using memory contexts with pre-reserved memory, where there's
a pretty low bound on how much memory is going to be needed. E.g. logging a
message inside a critical section, where elog.c can reset ErrorContext
afterwards.
Greetings,
Andres Freund
Hi,
On Thu, Jan 16, 2025 at 11:38:47AM -0500, Andres Freund wrote:
Hi,
On 2025-01-16 15:59:31 +0000, Bertrand Drouvot wrote:
On Wed, Jan 15, 2025 at 03:11:32PM +0900, Michael Paquier wrote:
On Fri, Jan 10, 2025 at 09:40:38AM +0000, Bertrand Drouvot wrote: + * WAL pending statistics are incremented inside a critical section + * (see XLogWrite()), so we can't use pgstat_prep_pending_entry() and we rely on + * PendingBackendWalStats instead. + */ +extern PGDLLIMPORT PgStat_PendingWalStats PendingBackendWalStats;Hmm. This makes me wonder if we should rethink a bit the way pending
entries are retrieved and if we should do it beforehand for the WAL
paths to avoid allocations in some critical sections. Isn't that also
because we avoid calling pgstat_prep_backend_pending() for the I/O
case as only backends are supported now, discarding cases like the
checkpointer where I/O could happen in a critical path? As a whole,
the approach taken by the patch is not really consistent with the
rest.I agree that's better to have a generic solution and to be consistent with
the other variable-numbered stats.The attached is implementing in 0001 the proposition done in [1], i.e:
1. It adds a new allow_critical_section to PgStat_KindInfo for pgstats kinds
2. It ensures to set temporarly allowInCritSection to true when neededNote that for safety reason 0001 does set allowInCritSection back to false
unconditionally (means not checking again for allow_critical_section).This is a preposterously bad idea. The restriction to not allocate memory in
critical sections exists for a reason,
Thanks for sharing your thoughts on it. In [1]/messages/by-id/66efowskppsns35v5u2m7k4sdnl7yoz5bo64tdjwq7r5lhplrz@y7dme5xwh2r5, you said:
"
My view is that for IO stats no memory allocation should be required - that
used to be the case and should be the case again
"
So, do you think that the initial proposal that has been made here (See R1. in
[2]: /messages/by-id/Z3zqc4o09dM/Ezyz@ip-10-97-1-34.eu-west-3.compute.internal
"
0003 does not rely on pgstat_prep_backend_pending() for its pending statistics
but on a new PendingBackendWalStats variable. The reason is that the pending wal
statistics are incremented in a critical section (see XLogWrite(), and so
a call to pgstat_prep_pending_entry() could trigger a failed assertion:
MemoryContextAllocZero()->"CritSectionCount == 0 || (context)->allowInCritSection"
"
and implemented up to v4 is a viable approach?
[1]: /messages/by-id/66efowskppsns35v5u2m7k4sdnl7yoz5bo64tdjwq7r5lhplrz@y7dme5xwh2r5
[2]: /messages/by-id/Z3zqc4o09dM/Ezyz@ip-10-97-1-34.eu-west-3.compute.internal
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Hi,
On 2025-01-16 17:11:09 +0000, Bertrand Drouvot wrote:
On Thu, Jan 16, 2025 at 11:38:47AM -0500, Andres Freund wrote:
Hi,
On 2025-01-16 15:59:31 +0000, Bertrand Drouvot wrote:
On Wed, Jan 15, 2025 at 03:11:32PM +0900, Michael Paquier wrote:
On Fri, Jan 10, 2025 at 09:40:38AM +0000, Bertrand Drouvot wrote: + * WAL pending statistics are incremented inside a critical section + * (see XLogWrite()), so we can't use pgstat_prep_pending_entry() and we rely on + * PendingBackendWalStats instead. + */ +extern PGDLLIMPORT PgStat_PendingWalStats PendingBackendWalStats;Hmm. This makes me wonder if we should rethink a bit the way pending
entries are retrieved and if we should do it beforehand for the WAL
paths to avoid allocations in some critical sections. Isn't that also
because we avoid calling pgstat_prep_backend_pending() for the I/O
case as only backends are supported now, discarding cases like the
checkpointer where I/O could happen in a critical path? As a whole,
the approach taken by the patch is not really consistent with the
rest.I agree that's better to have a generic solution and to be consistent with
the other variable-numbered stats.The attached is implementing in 0001 the proposition done in [1], i.e:
1. It adds a new allow_critical_section to PgStat_KindInfo for pgstats kinds
2. It ensures to set temporarly allowInCritSection to true when neededNote that for safety reason 0001 does set allowInCritSection back to false
unconditionally (means not checking again for allow_critical_section).This is a preposterously bad idea. The restriction to not allocate memory in
critical sections exists for a reason,Thanks for sharing your thoughts on it. In [1], you said:
"
My view is that for IO stats no memory allocation should be required - that
used to be the case and should be the case again
"So, do you think that the initial proposal that has been made here (See R1. in
[2]) i.e make use of a new PendingBackendWalStats variable:
Well, I think this first needs be fixed for for the IO stats change made in
commit 9aea73fc61d
Author: Michael Paquier <michael@paquier.xyz>
Date: 2024-12-19 13:19:22 +0900
Add backend-level statistics to pgstats
Once we have a pattern to model after, we can apply the same scheme here.
"
0003 does not rely on pgstat_prep_backend_pending() for its pending statistics
but on a new PendingBackendWalStats variable. The reason is that the pending wal
statistics are incremented in a critical section (see XLogWrite(), and so
a call to pgstat_prep_pending_entry() could trigger a failed assertion:
MemoryContextAllocZero()->"CritSectionCount == 0 || (context)->allowInCritSection"
"and implemented up to v4 is a viable approach?
Yes-ish. I think it would be better to make it slightly more general than
that, handling this for all types of backend stats, not just for WAL.
Greetings,
Andres Freund
On Thu, Jan 16, 2025 at 12:44:20PM -0500, Andres Freund wrote:
On 2025-01-16 17:11:09 +0000, Bertrand Drouvot wrote:
So, do you think that the initial proposal that has been made here (See R1. in
[2]) i.e make use of a new PendingBackendWalStats variable:Well, I think this first needs be fixed for for the IO stats change made in
Once we have a pattern to model after, we can apply the same scheme here.
Okay, thanks for the input. I was not sure what you intended
originally with all this part of the backend code, and how much would
be acceptable. The line is clear now.
0003 does not rely on pgstat_prep_backend_pending() for its pending statistics
but on a new PendingBackendWalStats variable. The reason is that the pending wal
statistics are incremented in a critical section (see XLogWrite(), and so
a call to pgstat_prep_pending_entry() could trigger a failed assertion:
MemoryContextAllocZero()->"CritSectionCount == 0 || (context)->allowInCritSection"
"and implemented up to v4 is a viable approach?
Yes-ish. I think it would be better to make it slightly more general than
that, handling this for all types of backend stats, not just for WAL.
Agreed to use the same concept for all these parts of the backend
stats kind rather than two of them. Will send a reply on the original
backend I/O thread as well.
--
Michael
Hi,
On Fri, Jan 17, 2025 at 08:43:57AM +0900, Michael Paquier wrote:
On Thu, Jan 16, 2025 at 12:44:20PM -0500, Andres Freund wrote:
On 2025-01-16 17:11:09 +0000, Bertrand Drouvot wrote:
So, do you think that the initial proposal that has been made here (See R1. in
[2]) i.e make use of a new PendingBackendWalStats variable:Well, I think this first needs be fixed for for the IO stats change made in
Once we have a pattern to model after, we can apply the same scheme here.
Okay, thanks for the input. I was not sure what you intended
originally with all this part of the backend code, and how much would
be acceptable. The line is clear now.0003 does not rely on pgstat_prep_backend_pending() for its pending statistics
but on a new PendingBackendWalStats variable. The reason is that the pending wal
statistics are incremented in a critical section (see XLogWrite(), and so
a call to pgstat_prep_pending_entry() could trigger a failed assertion:
MemoryContextAllocZero()->"CritSectionCount == 0 || (context)->allowInCritSection"
"and implemented up to v4 is a viable approach?
Yes-ish. I think it would be better to make it slightly more general than
that, handling this for all types of backend stats, not just for WAL.Agreed to use the same concept for all these parts of the backend
stats kind rather than two of them. Will send a reply on the original
backend I/O thread as well.
PFA v6 that now relies on the new PendingBackendStats variable introduced in
4feba03d8b9.
Remark: I moved PendingBackendStats back to pgstat.h because I think that the
"simple" pending stats increment that we are adding in xlog.c are not worth
an extra function call overhead (while it made more sense for the more complex IO
stats handling). So PendingBackendStats is now visible to the outside world like
PendingWalStats and friends.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Tue, Jan 21, 2025 at 07:19:55AM +0000, Bertrand Drouvot wrote:
PFA v6 that now relies on the new PendingBackendStats variable introduced in
4feba03d8b9.Remark: I moved PendingBackendStats back to pgstat.h because I think that the
"simple" pending stats increment that we are adding in xlog.c are not worth
an extra function call overhead (while it made more sense for the more complex IO
stats handling). So PendingBackendStats is now visible to the outside world like
PendingWalStats and friends.
You are re-doing here a pattern I was trying to avoid so as we don't
copy-paste more checks based on pgstat_tracks_backend_bktype more than
necessary. I am wondering if we should think harder about the
interface used to register WAL stats, and make it more consistent with
the way pg_stat_io is handled, avoiding the hardcoded attribute
numbers if we have an enum to control which field to update in some
input routine.
As we have only five counters in PgStat_PendingWalStats, the result
you have is not that invasive, true.
Are you sure that the interactions between pgWalUsage, prevWalUsage
and prevBackendWalUsage are correct? As far I got it from a code
read, prevWalUsage, prevBackendWalUsage and their local trackings in
pgstat_backend.c and pgstat_wal.c rely on instrument.c as the primary
source, as pgWalUsage can never be reset. Is that right?
--
Michael
Hi,
On Thu, Jan 23, 2025 at 05:05:30PM +0900, Michael Paquier wrote:
On Tue, Jan 21, 2025 at 07:19:55AM +0000, Bertrand Drouvot wrote:
PFA v6 that now relies on the new PendingBackendStats variable introduced in
4feba03d8b9.Remark: I moved PendingBackendStats back to pgstat.h because I think that the
"simple" pending stats increment that we are adding in xlog.c are not worth
an extra function call overhead (while it made more sense for the more complex IO
stats handling). So PendingBackendStats is now visible to the outside world like
PendingWalStats and friends.You are re-doing here a pattern I was trying to avoid so as we don't
copy-paste more checks based on pgstat_tracks_backend_bktype more than
necessary.
I'm not sure I get it. pgstat_tracks_backend_bktype() is also called in
pgstat_count_backend_io_op() and pgstat_count_backend_io_op_time(). What issue
do you see with the extra calls part of this patch?
I am wondering if we should think harder about the
interface used to register WAL stats, and make it more consistent with
the way pg_stat_io is handled, avoiding the hardcoded attribute
numbers if we have an enum to control which field to update in some
input routine.
Not sure as WAL stats just tracks a single dimension unlike IO stats which track
both IOObject and IOContext. What would be the benefit(s)?
As we have only five counters in PgStat_PendingWalStats, the result
you have is not that invasive, true.
And only one dimension.
Are you sure that the interactions between pgWalUsage, prevWalUsage
and prevBackendWalUsage are correct?
I think so and according to my testing I can see WalUsage values
that correlate nicely between pg_stat_wal() and pg_stat_get_backend_wal().
As far I got it from a code
read, prevWalUsage, prevBackendWalUsage and their local trackings in
pgstat_backend.c and pgstat_wal.c rely on instrument.c as the primary
source, as pgWalUsage can never be reset. Is that right?
yeah, IIUC pgWalUsage acts as the primary source that both prevWalUsage and
prevBackendWalUsage diff against to calculate incremental stats.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Hi,
Thank you for the patchset. Having per-backend WAL statistics,
in addition to cluster-wide ones, is useful.
I had a few comments while looking at v6-0003-* patch.
+ /*
+ * This could be an auxiliary process but these do not report backend
+ * statistics due to pgstat_tracks_backend_bktype(), so there is no need
+ * for an extra call to AuxiliaryPidGetProc().
+ */
+ if (!proc)
+ PG_RETURN_NULL();
Maybe an explicit call to AuxiliaryPidGetProc() followed by a check
for pgstat_tracks_backend_bktype() would be more maintainable.
Since the processes tracked by AuxiliaryPidGetProc and
pgstat_tracks_backend_bktype might diverge in future.
On that note, it is not clear to me why the WAL writer statistics are not
included in per backend
wal statistics? I understand the same limitation currently exists in
pgstats_track_io_bktype(), but why does that need to be extended to
WAL statistics?
+ <primary>pg_stat_get_backend_wal</primary>
+ </indexterm>
+ <function>pg_stat_get_backend_wal</function> (
<type>integer</type> )
+ <returnvalue>record</returnvalue>
+ </para>
Should the naming describe what is being returned more clearly?
Something like pg_stat_get_backend_wal_activity()? Currently it
suggests that it returns a backend's WAL, which is not the case.
+ if (pgstat_tracks_backend_bktype(MyBackendType))
+ {
+ PendingBackendStats.pending_wal.wal_write++;
+
+ if (track_wal_io_timing)
+ INSTR_TIME_ACCUM_DIFF(PendingBackendStats.pending_wal.wal_write_time,
+ end, start);
+ }
At the risk of nitpicking, may I suggest moving the above code, which is
under the
track_wal_io_timing check, to the existing check before this added chunk?
This way, all code related to track_wal_io_timing will be grouped together,
closer to where the "end" variable is computed.
Thank you,
Rahila Syed
On Tue, Jan 21, 2025 at 12:50 PM Bertrand Drouvot <
bertranddrouvot.pg@gmail.com> wrote:
Show quoted text
Hi,
On Fri, Jan 17, 2025 at 08:43:57AM +0900, Michael Paquier wrote:
On Thu, Jan 16, 2025 at 12:44:20PM -0500, Andres Freund wrote:
On 2025-01-16 17:11:09 +0000, Bertrand Drouvot wrote:
So, do you think that the initial proposal that has been made here
(See R1. in
[2]) i.e make use of a new PendingBackendWalStats variable:
Well, I think this first needs be fixed for for the IO stats change
made in
Once we have a pattern to model after, we can apply the same scheme
here.
Okay, thanks for the input. I was not sure what you intended
originally with all this part of the backend code, and how much would
be acceptable. The line is clear now.0003 does not rely on pgstat_prep_backend_pending() for its pending
statistics
but on a new PendingBackendWalStats variable. The reason is that the
pending wal
statistics are incremented in a critical section (see XLogWrite(),
and so
a call to pgstat_prep_pending_entry() could trigger a failed
assertion:
MemoryContextAllocZero()->"CritSectionCount == 0 ||
(context)->allowInCritSection"
"
and implemented up to v4 is a viable approach?
Yes-ish. I think it would be better to make it slightly more general
than
that, handling this for all types of backend stats, not just for WAL.
Agreed to use the same concept for all these parts of the backend
stats kind rather than two of them. Will send a reply on the original
backend I/O thread as well.PFA v6 that now relies on the new PendingBackendStats variable introduced
in
4feba03d8b9.Remark: I moved PendingBackendStats back to pgstat.h because I think that
the
"simple" pending stats increment that we are adding in xlog.c are not worth
an extra function call overhead (while it made more sense for the more
complex IO
stats handling). So PendingBackendStats is now visible to the outside
world like
PendingWalStats and friends.Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Hi,
On Wed, Jan 29, 2025 at 01:14:09PM +0530, Rahila Syed wrote:
Hi,
Thank you for the patchset. Having per-backend WAL statistics,
in addition to cluster-wide ones, is useful.
Thanks for looking at it!
I had a few comments while looking at v6-0003-* patch.
+ /* + * This could be an auxiliary process but these do not report backend + * statistics due to pgstat_tracks_backend_bktype(), so there is no need + * for an extra call to AuxiliaryPidGetProc(). + */ + if (!proc) + PG_RETURN_NULL();Maybe an explicit call to AuxiliaryPidGetProc() followed by a check
for pgstat_tracks_backend_bktype() would be more maintainable.
Since the processes tracked by AuxiliaryPidGetProc and
pgstat_tracks_backend_bktype might diverge in future.
I think that could make sense but that might need a separate thread as this
is not only related to this patch (already done that way in pg_stat_reset_backend_stats()
and pg_stat_get_backend_io()).
On that note, it is not clear to me why the WAL writer statistics are not
included in per backend
wal statistics? I understand the same limitation currently exists in
pgstats_track_io_bktype(), but why does that need to be extended to
WAL statistics?
WAL writer might be fine but that would not add that much value here because
it's going to appear anyway in pg_stat_io once Nazir's patch [1]/messages/by-id/CAN55FZ3AiQ+ZMxUuXnBpd0Rrh1YhwJ5FudkHg=JU0P+-W8T4Vg@mail.gmail.com gets in.
+ <primary>pg_stat_get_backend_wal</primary> + </indexterm> + <function>pg_stat_get_backend_wal</function> ( <type>integer</type> ) + <returnvalue>record</returnvalue> + </para> Should the naming describe what is being returned more clearly? Something like pg_stat_get_backend_wal_activity()? Currently it suggests that it returns a backend's WAL, which is not the case.
Not sure. It aligns with pg_stat_get_backend_io() and the "stat" in its name
suggests this is related to stats.
+ if (pgstat_tracks_backend_bktype(MyBackendType)) + { + PendingBackendStats.pending_wal.wal_write++; + + if (track_wal_io_timing) + INSTR_TIME_ACCUM_DIFF(PendingBackendStats.pending_wal.wal_write_time, + end, start); + } At the risk of nitpicking, may I suggest moving the above code, which is under the track_wal_io_timing check, to the existing check before this added chunk? This way, all code related to track_wal_io_timing will be grouped together, closer to where the "end" variable is computed.
I think we're waiting [1]/messages/by-id/CAN55FZ3AiQ+ZMxUuXnBpd0Rrh1YhwJ5FudkHg=JU0P+-W8T4Vg@mail.gmail.com to be in before moving forward with this patch. I think
that [1]/messages/by-id/CAN55FZ3AiQ+ZMxUuXnBpd0Rrh1YhwJ5FudkHg=JU0P+-W8T4Vg@mail.gmail.com also touches this part of the code. I'll keep your remark in mind and
see if it still makes sense once [1]/messages/by-id/CAN55FZ3AiQ+ZMxUuXnBpd0Rrh1YhwJ5FudkHg=JU0P+-W8T4Vg@mail.gmail.com gets in.
[1]: /messages/by-id/CAN55FZ3AiQ+ZMxUuXnBpd0Rrh1YhwJ5FudkHg=JU0P+-W8T4Vg@mail.gmail.com
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Hi,
On Thu, Jan 23, 2025 at 09:57:50AM +0000, Bertrand Drouvot wrote:
On Thu, Jan 23, 2025 at 05:05:30PM +0900, Michael Paquier wrote:
As far I got it from a code
read, prevWalUsage, prevBackendWalUsage and their local trackings in
pgstat_backend.c and pgstat_wal.c rely on instrument.c as the primary
source, as pgWalUsage can never be reset. Is that right?yeah, IIUC pgWalUsage acts as the primary source that both prevWalUsage and
prevBackendWalUsage diff against to calculate incremental stats.
Now that a051e71e28a is in, I think that we can reduce the scope of this patch
(i.e reduce the number of stats provided by pg_stat_get_backend_wal()).
I think we can keep:
wal_records
wal_fpi
wal_bytes (because it differs from write_bytes in pg_stat_get_backend_io())
wal_buffers_full
The first 3 are in the WalUsage struct.
I think that:
wal_write (and wal_write_time)
wal_sync (and wal_sync_time)
can be extracted from pg_stat_get_backend_io(), so there is no need to duplicate
this information. The same comment could be done for pg_stat_wal and pg_stat_io
though, but pg_stat_wal already exists so removing fields has not the same
effect.
What are you thoughts about keeping in pg_stat_get_backend_wal() only the
4 stats mentioned above?
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Tue, Feb 04, 2025 at 08:49:41AM +0000, Bertrand Drouvot wrote:
I think that:
wal_write (and wal_write_time)
wal_sync (and wal_sync_time)
Right. We are not able to get this data from XLogWrite() and
issue_xlog_fsync(), so there is no need to duplicate that anymore in
your patch.
can be extracted from pg_stat_get_backend_io(), so there is no need to duplicate
this information. The same comment could be done for pg_stat_wal and pg_stat_io
though, but pg_stat_wal already exists so removing fields has not the same
effect.What are you thoughts about keeping in pg_stat_get_backend_wal() only the
4 stats mentioned above?
wal_buffers_full is incremented in AdvanceXLInsertBuffer(), part of
PendingWalStats. wal_records, wal_fpi and wal_bytes are part of the
instrumentation field. It looks to me that if you discard the
wal_buffers_full part, the implementation of the data in the backend
could just be tied to the fields coming from WalUsage.
Actually, could it actually be useful to have wal_buffers_full be
available in WalUsage, so as it would show up in EXPLAIN in a
per-query basis with show_wal_usage()? Consolidating that would make
what you are trying it a bit easier, because we would have the
WalUsage and the pg_stat_io parts without any of the PendingWalStats
part. And it is just a counter not that expensive to handle, like the
data for records, fpis and bytes. This extra information could be
useful to have in the context of an EXPLAIN.
--
Michael