Doc: clarify possibility of ephemeral discrepancies between state and wait_event in pg_stat_activity

Started by Alex Friedman, 11 months ago, 7 messages
#1 Alex Friedman
alexf01@gmail.com
1 attachment(s)

Hi,

This small doc change patch is following up on a past discussion about
discrepancies between state and wait_event in pg_stat_activity:

/messages/by-id/ab1c0a7d-e789-5ef5-1180-42708ac6fe2d@postgrespro.ru

As this kind of question is raised by PG users from time to time, the goal is to
clarify that such discrepancies are to be expected. The attached patch reuses
Robert Haas's eloquent wording from his response in the above thread. I've tried
to keep it short and to the point, but it can be made more verbose if needed.
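
To illustrate the kind of discrepancy meant here, below is a toy sketch in Python. It is not PostgreSQL code: the BackendStatus class, the handle_one_query function and the order of the field updates are all assumptions made up for the example. It only shows that when two fields are updated one after the other with no synchronization, an observer that reads between two updates can see a combination that never describes a stable situation.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple


@dataclass
class BackendStatus:
    """Hypothetical stand-in for the fields one backend reports."""
    state: str = "idle"
    wait_event: Optional[str] = "ClientRead"


def handle_one_query(status: BackendStatus) -> Iterator[None]:
    """Update the fields one at a time, yielding after each write.

    The update order is an assumption for illustration only; every
    yield marks a point where a concurrent reader could take a sample.
    """
    status.wait_event = None          # the wait for the client ends
    yield
    status.state = "active"           # the query starts executing
    yield
    status.state = "idle"             # the query is finished
    yield
    status.wait_event = "ClientRead"  # waiting for the client again
    yield


def observe_every_step() -> List[Tuple[str, Optional[str]]]:
    """Sample (state, wait_event) after every single field update."""
    status = BackendStatus()
    observed = []
    for _ in handle_one_query(status):
        observed.append((status.state, status.wait_event))
    return observed


if __name__ == "__main__":
    for state, wait_event in observe_every_step():
        print(f"state={state!r} wait_event={wait_event!r}")
```

In this sketch the first and third samples show an idle backend with no wait event, a combination that exists only between two updates. Making each pair of updates atomic would remove it, at the cost of the extra synchronization that the proposed wording says the system avoids.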

Best regards,

Alex Friedman

Attachments:

v1-0001-Clarify-possibility-of-ephemeral-discrepancies-be.patch (text/plain; charset=UTF-8)
From 3cab620d67d200ff4ccb1870f63cbf75a50d0df6 Mon Sep 17 00:00:00 2001
From: Alex Friedman <alexf01@gmail.com>
Date: Wed, 26 Feb 2025 19:59:59 +0200
Subject: [PATCH v1] Clarify possibility of ephemeral discrepancies between
 state and wait_event in pg_stat_activity.

---
 doc/src/sgml/monitoring.sgml | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 9178f1d34ef..57fcd8ab52b 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1016,7 +1016,9 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
     it may or may not be <literal>waiting</literal> on some event.  If the state
     is <literal>active</literal> and <structfield>wait_event</structfield> is non-null, it
     means that a query is being executed, but is being blocked somewhere
-    in the system.
+    in the system. To keep the reporting low-overhead, the system uses very lightweight
+    synchronization. As a result, ephemeral discrepancies between <structfield>wait_event</structfield>
+    and <structfield>state</structfield> are possible by nature.
    </para>
   </note>
 
-- 
2.41.0

#2 Sami Imseih
samimseih@gmail.com
In reply to: Alex Friedman (#1)
Re: Doc: clarify possibility of ephemeral discrepancies between state and wait_event in pg_stat_activity

I am not sure if the wait_event vs state relationship needs to
be documented specifically. I can think of another discrepancy
such as query_id = NULL and state = active, which occurs when
the query is still being parsed and jumbled and a query_id is not yet
available. There are probably other ephemeral discrepancies
across all these fields.

Another common pattern is joining pg_stat_activity and pg_locks,
and that will have the same problem. Of course, these are different
views being joined, so maybe there isn't an expectation of 100%
accuracy, but worth calling this out as well.
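
One way a monitoring script might cope with such short-lived mismatches, whether they come from a single view or from a join, is to act on a finding only after it shows up in several consecutive samples. The Python sketch below assumes each sample has already been reduced to a set of flagged keys (for example pids); the persistent_findings function and its min_consecutive parameter are hypothetical names for this illustration and do not exist in PostgreSQL or in any client library.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set


def persistent_findings(
    samples: Iterable[Set[Hashable]],
    min_consecutive: int = 3,
) -> Set[Hashable]:
    """Return the keys flagged in at least min_consecutive samples in a row.

    Each element of samples is the set of keys (e.g. pids) that looked
    suspicious in one poll of the views. A key that disappears from a
    sample has its streak reset, so one-off glitches are filtered out.
    """
    if min_consecutive < 1:
        raise ValueError("min_consecutive must be at least 1")

    streak: Dict[Hashable, int] = defaultdict(int)
    confirmed: Set[Hashable] = set()
    for flagged in samples:
        # Forget keys that were not flagged in this round.
        for key in list(streak):
            if key not in flagged:
                del streak[key]
        # Extend the streak of every key flagged in this round.
        for key in flagged:
            streak[key] += 1
            if streak[key] >= min_consecutive:
                confirmed.add(key)
    return confirmed


if __name__ == "__main__":
    polls = [{101, 202}, {202}, {202, 303}, {303}]
    print(persistent_findings(polls, min_consecutive=3))
```

With the four polls above, only key 202 is flagged three times in a row. The threshold trades detection latency for fewer false alarms, and a suitable value depends on the polling interval.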

If we do need to document anything, which I am not convinced we should,
it should be more generic.

--

Sami Imseih
Amazon Web Services (AWS)

#3 Alex Friedman
alexf01@gmail.com
In reply to: Sami Imseih (#2)
1 attachment(s)
Re: Doc: clarify possibility of ephemeral discrepancies between state and wait_event in pg_stat_activity

On 26/02/2025 22:00, Sami Imseih wrote:

> If we do need to document anything, which I am not convinced we should,
> it should be more generic.

Thanks for the feedback, I've attached a v2 patch which has wording that's a bit
more generic.

It's also worth noting that pg_locks already has a full paragraph explaining
inconsistencies, so in my opinion it's worth it at least mentioning something
similar here for pg_stat_activity.

Best regards,

Alex Friedman

Attachments:

v2-0001-Clarify-possibility-of-ephemeral-discrepancies-be.patch (text/plain; charset=UTF-8)
From fbbfc623e16ed97176c0ccf0ebc534d118e9f252 Mon Sep 17 00:00:00 2001
From: Alex Friedman <alexf01@gmail.com>
Date: Wed, 26 Feb 2025 19:59:59 +0200
Subject: [PATCH v2] Clarify possibility of ephemeral discrepancies between
 state and wait_event in pg_stat_activity.

---
 doc/src/sgml/monitoring.sgml | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 9178f1d34ef..de49769d407 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1016,7 +1016,11 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
     it may or may not be <literal>waiting</literal> on some event.  If the state
     is <literal>active</literal> and <structfield>wait_event</structfield> is non-null, it
     means that a query is being executed, but is being blocked somewhere
-    in the system.
+    in the system. To keep the reporting low-overhead, the system uses very lightweight
+    synchronization. As a result, ephemeral discrepancies between the view's columns,
+    for example between <structfield>wait_event</structfield> and
+    <structfield>state</structfield>, or between <structfield>state</structfield> and
+    <structfield>query_id</structfield>, are possible by nature.
    </para>
   </note>
 
-- 
2.41.0

#4 Sami Imseih
samimseih@gmail.com
In reply to: Alex Friedman (#3)
Re: Doc: clarify possibility of ephemeral discrepancies between state and wait_event in pg_stat_activity

> It's also worth noting that pg_locks already has a full paragraph explaining
> inconsistencies, so in my opinion it's worth it at least mentioning something
> similar here for pg_stat_activity.

Yes, that is a different inconsistency from the one I was referring to with
regard to a join between pg_locks and pg_stat_activity, but I do
agree that it is worth calling out the expectation for pg_stat_activity.

> Thanks for the feedback, I've attached a v2 patch which has wording that's a bit
> more generic.

A few comments. I don't like the use of "lightweight" here as it is usually
referring to LWLocks (lightweight locks), which can cause confusion. Also, if
we are going to mention specific examples, I think we will need to explain
further what the discrepancy will look like. What about we do something much
more simplified, such as the below:

"""
To keep the reporting overhead low, the system does not attempt to synchronize
activity data for a backend. As a result, ephemeral discrepancies may exist
between the view’s columns.
"""

--
Sami Imseih
Amazon Web Services (AWS)

#5 Alex Friedman
alexf01@gmail.com
In reply to: Sami Imseih (#4)
1 attachment(s)
Re: Doc: clarify possibility of ephemeral discrepancies between state and wait_event in pg_stat_activity

> discrepancy will look like. What about we do something much more
> simplified, such as the below:
>
> """
> To keep the reporting overhead low, the system does not attempt to synchronize
> activity data for a backend. As a result, ephemeral discrepancies may exist
> between the view’s columns.
> """

Yes, I believe it makes sense to make it more generic. Attached v3 with a slight
tweak:

+    in the system. To keep the reporting overhead low, the system does not attempt to
+    synchronize different aspects of activity data for a backend. As a result, ephemeral
+    discrepancies may exist between the view's columns.

Best regards,

Alex Friedman

Attachments:

v3-0001-Clarify-possibility-of-ephemeral-discrepancies-be.patch (text/plain; charset=UTF-8)
From 58de88469f6201ae698ee34debcdec028526a72a Mon Sep 17 00:00:00 2001
From: Alex Friedman <alexf01@gmail.com>
Date: Wed, 26 Feb 2025 19:59:59 +0200
Subject: [PATCH v3] Clarify possibility of ephemeral discrepancies between
 state and wait_event in pg_stat_activity.

---
 doc/src/sgml/monitoring.sgml | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 9178f1d34ef..0e34b3509b8 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1016,7 +1016,9 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
     it may or may not be <literal>waiting</literal> on some event.  If the state
     is <literal>active</literal> and <structfield>wait_event</structfield> is non-null, it
     means that a query is being executed, but is being blocked somewhere
-    in the system.
+    in the system. To keep the reporting overhead low, the system does not attempt to
+    synchronize different aspects of activity data for a backend. As a result, ephemeral
+    discrepancies may exist between the view's columns.
    </para>
   </note>
 
-- 
2.41.0

#6 Sami Imseih
samimseih@gmail.com
In reply to: Alex Friedman (#5)
Re: Doc: clarify possibility of ephemeral discrepancies between state and wait_event in pg_stat_activity

Thanks for the update. This LGTM! and I will mark as RFC.

--
Sami

#7 Michael Paquier
michael@paquier.xyz
In reply to: Sami Imseih (#6)
Re: Doc: clarify possibility of ephemeral discrepancies between state and wait_event in pg_stat_activity

On Mon, Mar 03, 2025 at 11:35:15AM -0600, Sami Imseih wrote:

> Thanks for the update. This LGTM! and I will mark as RFC.

Yes, agreed that there is no specific need to be precise about the
attributes that can become inconsistent, as this would also depend on
the addition of more states, or even more attributes.

The wording of v3 is OK by me, so applied. Perhaps this could be
tweaked more, so if there are any comments, feel free.
--
Michael