Add per-backend lock statistics
Hi hackers,
Now that we have global lock statistics since 4019f725f5d, it could be useful
to have the same kind of information on a per-backend basis.
Indeed, pg_stat_lock gives us cluster-wide aggregates: total waits, total wait
time, total fast-path exceeded across all backends since last reset.
When we see high numbers, we can't answer:
- Which backend is affected the most?
- Is it one backend affected or many?
- Is a specific application or connection pool suffering?
- After a specific workload/application is improved, did its lock behavior
improve?
With per-backend lock stats, we could:
1/ Isolate problematic sessions. We can correlate locks behavior with specific
PIDs visible in pg_stat_activity: identify the exact application_name or user
experiencing lock waits.
2/ Debug live contention. During an incident, we could pinpoint which backends
are experiencing fast-path exhaustion or lock waits without having to reset
global stats and lose history.
3/ Define workload characterization. Different backend types may have very
different lock profiles. Per-backend stats would let us see this directly.
4/ Compare before/after per session. We could measure a single backend's lock
behavior across a specific operation, which is impossible with global counters
that include metrics from all other backends.
IO and WAL stats already have per-backend counterparts (pg_stat_get_backend_io(),
pg_stat_get_backend_wal()). Lock stats are the same class of operational data:
having them only at the global level is an inconsistency that limits observability.
As far the technical implementation:
This data can be retrieved with a new system function called
pg_stat_get_backend_lock(), that returns one tuple per lock type based on the PID
provided in input.
pgstat_flush_backend() gains a new flag value, able to control the flush of the
lock stats.
This patch relies mostly on the infrastructure provided by 9aea73fc61d4, that
has introduced backend statistics.
The overhead (2 functions calls and counters increments) on the hot path (normal
lock acquisition) is zero: counters are only incremented on paths that are already
"slow" (post deadlock timeout waits, fast-path slot exhaustion) and does not add
that much memory per-backend: PgStat_PendingLock is 288 bytes.
The patch is made of 2 sub-patches:
0001: Refactor pg_stat_get_lock() to use a helper function
Extract the tuple-building logic from pg_stat_get_lock() into a new
static helper pg_stat_lock_build_tuples(). This is in preparation for
pg_stat_get_backend_lock() which will reuse the same helper, following
the pattern established by pg_stat_io_build_tuples() for IO stats and
pg_stat_wal_build_tuple() for WAL stats.
0002: Add per-backend lock statistics
As discussed above.
Looking forward to your feedback,
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Wed Jun 3, 2026 at 1:59 PM UTC, Bertrand Drouvot wrote:
Hi hackers,
Now that we have global lock statistics since 4019f725f5d, it could be useful
to have the same kind of information on a per-backend basis.Indeed, pg_stat_lock gives us cluster-wide aggregates: total waits, total wait
time, total fast-path exceeded across all backends since last reset.When we see high numbers, we can't answer:
- Which backend is affected the most?
- Is it one backend affected or many?
- Is a specific application or connection pool suffering?
- After a specific workload/application is improved, did its lock behavior
improve?With per-backend lock stats, we could:
1/ Isolate problematic sessions. We can correlate locks behavior with specific
PIDs visible in pg_stat_activity: identify the exact application_name or user
experiencing lock waits.2/ Debug live contention. During an incident, we could pinpoint which backends
are experiencing fast-path exhaustion or lock waits without having to reset
global stats and lose history.3/ Define workload characterization. Different backend types may have very
different lock profiles. Per-backend stats would let us see this directly.4/ Compare before/after per session. We could measure a single backend's lock
behavior across a specific operation, which is impossible with global counters
that include metrics from all other backends.IO and WAL stats already have per-backend counterparts (pg_stat_get_backend_io(),
pg_stat_get_backend_wal()). Lock stats are the same class of operational data:
having them only at the global level is an inconsistency that limits observability.
The motivation makes sense to me.
As far the technical implementation:
This data can be retrieved with a new system function called
pg_stat_get_backend_lock(), that returns one tuple per lock type based on the PID
provided in input.pgstat_flush_backend() gains a new flag value, able to control the flush of the
lock stats.This patch relies mostly on the infrastructure provided by 9aea73fc61d4, that
has introduced backend statistics.The overhead (2 functions calls and counters increments) on the hot path (normal
lock acquisition) is zero: counters are only incremented on paths that are already
"slow" (post deadlock timeout waits, fast-path slot exhaustion) and does not add
that much memory per-backend: PgStat_PendingLock is 288 bytes.The patch is made of 2 sub-patches:
0001: Refactor pg_stat_get_lock() to use a helper function
+static void +pg_stat_lock_build_tuples(ReturnSetInfo *rsinfo, + PgStat_LockEntry *lock_stats, + TimestampTz stat_reset_timestamp)
I think that the alignment of the second and third arguments could be
off by one. They should line up with the capital R in ReturnSetInfo.
- values[i] = TimestampTzGetDatum(lock_stats->stat_reset_timestamp); + if (stat_reset_timestamp != 0) + values[i] = TimestampTzGetDatum(stat_reset_timestamp); + else + nulls[i] = true;
It's not super clear to me why this changed in the first patch. Perhaps
it is meant to be in the second patch? I see in the second patch that we
use the stat_reset_timestamp from the backend stats instead of the lock
stats in pg_stat_get_backend_lock(). The motivation makes sense. It
might be cleaner to move the change into patch 2.
0002: Add per-backend lock statistics
+ Returns lock statistics about the backend with the specified + process ID. The output fields are exactly the same as the ones in the + <structname>pg_stat_lock</structname> view.
It probably makes sense to link to pg_stat_lock here.
Other than the few comments I had, this patchset looks good. It follows
patterns that were already established with the per-backend IO and WAL
stats.
--
Tristan Partin
PostgreSQL Contributors Team
AWS (https://aws.amazon.com)
Hi,
On Thu, Jun 04, 2026 at 09:32:39PM +0000, Tristan Partin wrote:
On Wed Jun 3, 2026 at 1:59 PM UTC, Bertrand Drouvot wrote:
The motivation makes sense to me.
Thanks for looking at it and sharing your thoughts!
0001: Refactor pg_stat_get_lock() to use a helper function
+static void +pg_stat_lock_build_tuples(ReturnSetInfo *rsinfo, + PgStat_LockEntry *lock_stats, + TimestampTz stat_reset_timestamp)I think that the alignment of the second and third arguments could be
off by one. They should line up with the capital R in ReturnSetInfo.
They look ok to me in the C file, what about you?
- values[i] = TimestampTzGetDatum(lock_stats->stat_reset_timestamp); + if (stat_reset_timestamp != 0) + values[i] = TimestampTzGetDatum(stat_reset_timestamp); + else + nulls[i] = true;It's not super clear to me why this changed in the first patch.
It's to make less "noise" in the second patch and keep the second patch focusing
only on the "new feature". It's to ease to review but could be merged before
being pushed would the commiter decides to do so.
0002: Add per-backend lock statistics
+ Returns lock statistics about the backend with the specified + process ID. The output fields are exactly the same as the ones in the + <structname>pg_stat_lock</structname> view.It probably makes sense to link to pg_stat_lock here.
Not sure as that would not be consistent with pg_stat_get_backend_io and
pg_stat_get_backend_wal descriptions in monitoring.sgml.
Other than the few comments I had, this patchset looks good. It follows
patterns that were already established with the per-backend IO and WAL
stats.
Thanks!
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Fri Jun 5, 2026 at 8:29 AM UTC, Bertrand Drouvot wrote:
Hi,
On Thu, Jun 04, 2026 at 09:32:39PM +0000, Tristan Partin wrote:
On Wed Jun 3, 2026 at 1:59 PM UTC, Bertrand Drouvot wrote:
0001: Refactor pg_stat_get_lock() to use a helper function
+static void +pg_stat_lock_build_tuples(ReturnSetInfo *rsinfo, + PgStat_LockEntry *lock_stats, + TimestampTz stat_reset_timestamp)I think that the alignment of the second and third arguments could be
off by one. They should line up with the capital R in ReturnSetInfo.
Probably just my editor being weird if it looks good to you!
They look ok to me in the C file, what about you?
- values[i] = TimestampTzGetDatum(lock_stats->stat_reset_timestamp); + if (stat_reset_timestamp != 0) + values[i] = TimestampTzGetDatum(stat_reset_timestamp); + else + nulls[i] = true;It's not super clear to me why this changed in the first patch.
It's to make less "noise" in the second patch and keep the second patch focusing
only on the "new feature". It's to ease to review but could be merged before
being pushed would the commiter decides to do so.
Sounds good. To me it made the review a little more difficult, but
I understand the motivation.
0002: Add per-backend lock statistics
+ Returns lock statistics about the backend with the specified + process ID. The output fields are exactly the same as the ones in the + <structname>pg_stat_lock</structname> view.It probably makes sense to link to pg_stat_lock here.
Not sure as that would not be consistent with pg_stat_get_backend_io and
pg_stat_get_backend_wal descriptions in monitoring.sgml.
Makes sense. Maybe we can update that in a future documentation update.
I'll go ahead and submit something in a separate thread.
--
Tristan Partin
PostgreSQL Contributors Team
AWS (https://aws.amazon.com)
On Wed, Jun 03, 2026 at 01:58:41PM +0000, Bertrand Drouvot wrote:
0001: Refactor pg_stat_get_lock() to use a helper function
Extract the tuple-building logic from pg_stat_get_lock() into a new
static helper pg_stat_lock_build_tuples(). This is in preparation for
pg_stat_get_backend_lock() which will reuse the same helper, following
the pattern established by pg_stat_io_build_tuples() for IO stats and
pg_stat_wal_build_tuple() for WAL stats.
- values[i] = TimestampTzGetDatum(lock_stats->stat_reset_timestamp);
+ if (stat_reset_timestamp != 0)
+ values[i] = TimestampTzGetDatum(stat_reset_timestamp);
+ else
+ nulls[i] = true;
Wait a minute here. I was wondering for a couple of minutes if we
should do that on HEAD as well, but we have reset_after_failure that
would set it to a nice value for the persistent part of the data..
That looks OK.
0002: Add per-backend lock statistics
+/* used by pgstat_lock.c for lock stats tracked in backends */
+extern void pgstat_count_backend_lock_waits(uint8 locktag_type, long msecs);
+extern void pgstat_count_backend_lock_fastpath_exceeded(uint8 locktag_type);
extern PgStat_Backend *pgstat_fetch_stat_backend(ProcNumber procNumber);
Nit. pgstat_fetch_stat_backend() and routines listed below are not
related to pgstat_lock.c. Add a newline perhaps?
I don't see much popping out on a closer read of 0002 (well, we've
discussed this patch and being able to see the balancing of lock
acquisitions across live backends is something that can be handy). As
far as I can see, you rely on the same infra as what has been done for
IO and WAL. Nice to see backend_has_lockstats being kept local to
pgstat_backend.c.
--
Michael
Hi,
On Wed, Jun 24, 2026 at 02:57:47PM +0900, Michael Paquier wrote:
On Wed, Jun 03, 2026 at 01:58:41PM +0000, Bertrand Drouvot wrote:
0001: Refactor pg_stat_get_lock() to use a helper function
Extract the tuple-building logic from pg_stat_get_lock() into a new
static helper pg_stat_lock_build_tuples(). This is in preparation for
pg_stat_get_backend_lock() which will reuse the same helper, following
the pattern established by pg_stat_io_build_tuples() for IO stats and
pg_stat_wal_build_tuple() for WAL stats.- values[i] = TimestampTzGetDatum(lock_stats->stat_reset_timestamp); + if (stat_reset_timestamp != 0) + values[i] = TimestampTzGetDatum(stat_reset_timestamp); + else + nulls[i] = true;Wait a minute here. I was wondering for a couple of minutes if we
should do that on HEAD as well, but we have reset_after_failure that
would set it to a nice value for the persistent part of the data..
Right and per-backend stats are not written to disk, hence the need here.
0002: Add per-backend lock statistics
+/* used by pgstat_lock.c for lock stats tracked in backends */ +extern void pgstat_count_backend_lock_waits(uint8 locktag_type, long msecs); +extern void pgstat_count_backend_lock_fastpath_exceeded(uint8 locktag_type); extern PgStat_Backend *pgstat_fetch_stat_backend(ProcNumber procNumber);Nit. pgstat_fetch_stat_backend() and routines listed below are not
related to pgstat_lock.c. Add a newline perhaps?
Yeah. This is not introduced by the patch, as it's currently not related to
pgstat_io.c on HEAD either, but let's clean it in passing. Done in v2 (that's
the only change compared to v1).
far as I can see, you rely on the same infra as what has been done for
IO and WAL.
Right.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Hi Bertrand-san,
I tested the patch locally and did not find any functional issue.
I have two suggestions below.
== Doc suggestion ==
When the workload uses parallel scans, pg_stat_lock.fastpath_exceeded
grows more than what pg_stat_get_backend_lock shows for any
individual pid. The gap is the parallel workers' contribution:
each worker locks the valid subplans independently, accumulates into
its own per-backend entry, and the entry is dropped at worker exit
-- so the contribution is not folded into the leader's per-backend
view.
This is a property of the per-backend stats infrastructure rather
than something this patch introduces, but since one of the stated
motivations is "Isolate problematic sessions", users may
intuitively expect parallel-worker contributions to be visible
under the leader's pid. A short note in the docs of the per-backend
functions clarifying that
parallel-worker contributions are not aggregated into the leader's
entry would help avoid that misunderstanding.
== Column suggestion for pg_stat_lock ==
pg_stat_io has a backend_type column, which lets users still see
parallel-worker contributions in aggregate (via WHERE
backend_type='background worker') after workers exit. pg_stat_lock
has only locktype, so worker contributions blend into the relation
row and cannot be separated even in aggregate.
This may be out of scope for the present patch, but I wonder if
adding a backend_type axis to pg_stat_lock could be considered in a
follow-up patch. It would give an alternative attribution path
(similar to pg_stat_io's backend_type column) when per-backend
statistics cannot help.
Regards,
Tatsuya Kawata
Hi Kawata-san,
On Wed, Jun 24, 2026 at 10:49:44PM +0900, Tatsuya Kawata wrote:
Hi Bertrand-san,
I tested the patch locally and did not find any functional issue.
Thanks!
I have two suggestions below.
== Doc suggestion ==
When the workload uses parallel scans, pg_stat_lock.fastpath_exceeded
grows more than what pg_stat_get_backend_lock shows for any
individual pid. The gap is the parallel workers' contribution:
each worker locks the valid subplans independently, accumulates into
its own per-backend entry, and the entry is dropped at worker exit
-- so the contribution is not folded into the leader's per-backend
view.This is a property of the per-backend stats infrastructure rather
than something this patch introduces, but since one of the stated
motivations is "Isolate problematic sessions", users may
intuitively expect parallel-worker contributions to be visible
under the leader's pid. A short note in the docs of the per-backend
functions clarifying that
parallel-worker contributions are not aggregated into the leader's
entry would help avoid that misunderstanding.
That's right, and the same could be said for per-backend I/O and WAL stats.
The stats are flushed when the transaction finish and then are visible from that
moment. The stats are gone once the backend exit. For parallel workers this window
is very short (between the flush and the exit) so that we can say that their stats
are not visible in practice.
I think that flushing statistics within running transactions [1]/messages/by-id/CAA5RZ0uA-4qcD3+2hjcE_-zQUBhvWf5foPM2vzYneFKrJLsBDQ@mail.gmail.com could help to
see what's going on for parallel workers too.
That said, I'm not sure the doc needs any clarifications given that those functions
take a PID as parameter and that they state something like "Returns I/O
/WAL statistics about the backend with the specified process ID".
== Column suggestion for pg_stat_lock ==
pg_stat_io has a backend_type column, which lets users still see
parallel-worker contributions in aggregate (via WHERE
backend_type='background worker') after workers exit. pg_stat_lock
has only locktype, so worker contributions blend into the relation
row and cannot be separated even in aggregate.This may be out of scope for the present patch, but I wonder if
adding a backend_type axis to pg_stat_lock could be considered in a
follow-up patch. It would give an alternative attribution path
(similar to pg_stat_io's backend_type column) when per-backend
statistics cannot help.
It's not related to this thread so that might be worth a dedicated one but I'm
not sure that would be more actionable while consuming more resources.
[1]: /messages/by-id/CAA5RZ0uA-4qcD3+2hjcE_-zQUBhvWf5foPM2vzYneFKrJLsBDQ@mail.gmail.com
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com