Flush some statistics within running transactions
Hi hackers,
Long running transactions can accumulate significant statistics (WAL, IO, ...)
that remain unflushed until the transaction ends. This delays visibility of
resource usage in monitoring views like pg_stat_io and pg_stat_wal.
This patch series introduces the ability to $SUBJECT (suggested in [1]) to:
- improve monitoring of long running transactions
- avoid missing places where we should flush statistics (like the one fixed in
039549d70f6)
The patch series is made of 3 sub-patches:
0001: Add pgstat_report_anytime_stat() for periodic stats flushing
It introduces pgstat_report_anytime_stat(), which flushes non transactional
statistics even inside active transactions. A new timeout handler fires every
second to call this function, ensuring timely stats visibility without waiting
for transaction completion.
Implementation details:
- Add PgStat_FlushBehavior enum to classify stats kinds:
* FLUSH_ANYTIME: Stats that can always be flushed (WAL, IO, ...)
* FLUSH_AT_TXN_BOUNDARY: Stats requiring transaction boundaries
- Modify pgstat_flush_pending_entries() and pgstat_flush_fixed_stats() to accept
a boolean anytime_only parameter:
* When false: flushes all stats (existing behavior)
* When true: flushes only FLUSH_ANYTIME stats and skips FLUSH_AT_TXN_BOUNDARY
stats
- Register ANYTIME_STATS_UPDATE_TIMEOUT that fires every 1 second, calling
pgstat_report_anytime_stat(false)
Remarks:
- The force parameter in pgstat_report_anytime_stat() is currently unused (always
called with force=false) but reserved for future use cases requiring immediate flushing.
The 1 second flush interval is currently hardcoded, but we could imagine increasing
it or making it configurable. I ran some benchmarks and did not observe any noticeable
performance regression, even with a large number of pending entries.
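As a rough, user-level illustration of what 0001 changes (not part of the patch; the
table name "t" is just an example), one could run something like this with the patch
applied and watch the counters move while the transaction is still open:

```
-- Session 1: long-running transaction generating WAL and IO
BEGIN;
INSERT INTO t SELECT generate_series(1, 1000000);
-- ... the transaction stays open, no COMMIT yet ...

-- Session 2: without 0001 these counters would only move once session 1 ends;
-- with the 1 second timeout they should advance while it is still running
SELECT wal_records, wal_bytes FROM pg_stat_wal;
SELECT pg_sleep(2);
SELECT wal_records, wal_bytes FROM pg_stat_wal;
```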
0002: Remove useless calls to flush some stats
Now that some stats can be flushed outside of transaction boundaries, remove the
now-redundant explicit flush calls. Those calls were only in place because,
before 0001, stats were flushed solely at transaction boundaries.
Remarks:
- it reverts 039549d70f6 (keeping only its tests)
- it can't be done for the checkpointer and bgworker, for example, because they
don't have a flush callback to call
- it can't be done for auxiliary processes (the walsummarizer, for example) because
they currently do not register the new timeout handler
- we may want to improve the current behavior to "fix" the two points above
0003: Add FLUSH_MIXED support and implement it for RELATION stats
This patch extends the non transactional stats infrastructure to support statistics
kinds with mixed transaction behavior: some fields are transactional (e.g., tuple
inserts/updates/deletes) while others are non transactional (e.g., sequential scans,
blocks read, ...).
It introduces FLUSH_MIXED as a third flush behavior type, alongside FLUSH_ANYTIME
and FLUSH_AT_TXN_BOUNDARY. For FLUSH_MIXED kinds, a new flush_anytime_cb callback
enables partial flushing of only the non transactional fields during running
transactions.
Some tests are also added.
Implementation details:
- Add FLUSH_MIXED to PgStat_FlushBehavior enum
- Add flush_anytime_cb to PgStat_KindInfo for partial flushing callback
- Update pgstat_flush_pending_entries() to call flush_anytime_cb for
FLUSH_MIXED entries when in anytime_only mode
- Keep FLUSH_MIXED entries in the pending list after partial flush, as
transactional fields still need to be flushed at transaction boundary
RELATION stats make use of FLUSH_MIXED:
- Change RELATION from FLUSH_AT_TXN_BOUNDARY to FLUSH_MIXED
- Implement pgstat_relation_flush_anytime_cb() to flush only read-related
stats: numscans, tuples_returned, tuples_fetched, blocks_fetched,
blocks_hit
- Clear these fields after flushing to prevent double counting when
pgstat_relation_flush_cb() runs at transaction commit
- Transactional stats (tuples_inserted, tuples_updated, tuples_deleted,
live_tuples, dead_tuples) remain pending until transaction boundary
Remark:
We could also imagine adding a new flush_anytime_static_cb() callback for
future FLUSH_MIXED fixed-amount stats.
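To illustrate the RELATION behavior, here is a condensed, two-session version of what
the added isolation test checks, using the regular monitoring view rather than the
pg_stat_get_*() functions (assuming a table test_stat_tab as in the test):

```
-- Session 1: scan a table inside a transaction that stays open
BEGIN;
SELECT count(*) FROM test_stat_tab;
-- ... no COMMIT yet ...

-- Session 2: after more than PGSTAT_ANYTIME_FLUSH_INTERVAL (1s), the read-related
-- counters should already be visible, while the transactional ones only move once
-- session 1 commits
SELECT pg_sleep(1.5);
SELECT seq_scan, seq_tup_read, n_tup_ins, n_tup_upd, n_tup_del
  FROM pg_stat_user_tables
 WHERE relname = 'test_stat_tab';
```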
[1]: /messages/by-id/erpzwxoptqhuptdrtehqydzjapvroumkhh7lc6poclbhe7jk7l@l3yfsq5q4pw7
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v1-0001-Add-pgstat_report_anytime_stat-for-periodic-stats.patch (text/x-diff)
From 2acc48f3c101b3230090abb53b0e05cc1d8af85f Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 5 Jan 2026 09:41:39 +0000
Subject: [PATCH v1 1/3] Add pgstat_report_anytime_stat() for periodic stats
flushing
Long running transactions can accumulate significant statistics (WAL, IO, ...)
that remain unflushed until the transaction ends. This delays visibility of
resource usage in monitoring views like pg_stat_io and pg_stat_wal.
This commit introduces pgstat_report_anytime_stat(), which flushes
non transactional statistics even inside active transactions. A new timeout
handler fires every second to call this function, ensuring timely stats visibility
without waiting for transaction completion.
Implementation details:
- Add PgStat_FlushBehavior enum to classify stats kinds:
* FLUSH_ANYTIME: Stats that can always be flushed (WAL, IO, ...)
* FLUSH_AT_TXN_BOUNDARY: Stats requiring transaction boundaries
- Modify pgstat_flush_pending_entries() and pgstat_flush_fixed_stats()
to accept a boolean anytime_only parameter:
* When false: flushes all stats (existing behavior)
* When true: flushes only FLUSH_ANYTIME stats and skips FLUSH_AT_TXN_BOUNDARY stats
- Register ANYTIME_STATS_UPDATE_TIMEOUT that fires every 1 second, calling
pgstat_report_anytime_stat(false)
The force parameter in pgstat_report_anytime_stat() is currently unused (always
called with force=false) but reserved for future use cases requiring immediate
flushing.
---
src/backend/tcop/postgres.c | 18 +++++
src/backend/utils/activity/pgstat.c | 119 ++++++++++++++++++++++++----
src/backend/utils/init/globals.c | 1 +
src/backend/utils/init/postinit.c | 15 ++++
src/include/miscadmin.h | 1 +
src/include/pgstat.h | 4 +
src/include/utils/pgstat_internal.h | 11 +++
src/include/utils/timeout.h | 1 +
src/tools/pgindent/typedefs.list | 1 +
9 files changed, 154 insertions(+), 17 deletions(-)
9.7% src/backend/tcop/
70.2% src/backend/utils/activity/
9.3% src/backend/utils/init/
6.0% src/include/utils/
4.3% src/include/
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index e54bf1e760f..6a91543f80a 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3530,6 +3530,24 @@ ProcessInterrupts(void)
pgstat_report_stat(true);
}
+ /*
+ * Flush stats outside of transaction boundary if the timeout fired.
+ * Unlike transactional stats, these can be flushed even inside a running
+ * transaction.
+ */
+ if (AnytimeStatsUpdateTimeoutPending)
+ {
+ AnytimeStatsUpdateTimeoutPending = false;
+
+ /* Skip if completely idle */
+ if (!DoingCommandRead || IsTransactionOrTransactionBlock())
+ pgstat_report_anytime_stat(false);
+
+ /* Schedule next timeout */
+ enable_timeout_after(ANYTIME_STATS_UPDATE_TIMEOUT,
+ PGSTAT_ANYTIME_FLUSH_INTERVAL);
+ }
+
if (ProcSignalBarrierPending)
ProcessProcSignalBarrier();
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index 11bb71cad5a..f7942e47475 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -187,7 +187,8 @@ static void pgstat_init_snapshot_fixed(void);
static void pgstat_reset_after_failure(void);
-static bool pgstat_flush_pending_entries(bool nowait);
+static bool pgstat_flush_pending_entries(bool nowait, bool anytime_only);
+static bool pgstat_flush_fixed_stats(bool nowait, bool anytime_only);
static void pgstat_prep_snapshot(void);
static void pgstat_build_snapshot(void);
@@ -288,6 +289,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = false,
.write_to_file = true,
+ .flush_behavior = FLUSH_AT_TXN_BOUNDARY,
/* so pg_stat_database entries can be seen in all databases */
.accessed_across_databases = true,
@@ -305,6 +307,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = false,
.write_to_file = true,
+ .flush_behavior = FLUSH_AT_TXN_BOUNDARY,
.shared_size = sizeof(PgStatShared_Relation),
.shared_data_off = offsetof(PgStatShared_Relation, stats),
@@ -321,6 +324,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = false,
.write_to_file = true,
+ .flush_behavior = FLUSH_AT_TXN_BOUNDARY,
.shared_size = sizeof(PgStatShared_Function),
.shared_data_off = offsetof(PgStatShared_Function, stats),
@@ -336,6 +340,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = false,
.write_to_file = true,
+ .flush_behavior = FLUSH_AT_TXN_BOUNDARY,
.accessed_across_databases = true,
@@ -353,6 +358,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = false,
.write_to_file = true,
+ .flush_behavior = FLUSH_AT_TXN_BOUNDARY,
/* so pg_stat_subscription_stats entries can be seen in all databases */
.accessed_across_databases = true,
@@ -370,6 +376,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = false,
.write_to_file = false,
+ .flush_behavior = FLUSH_ANYTIME,
.accessed_across_databases = true,
@@ -388,6 +395,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = true,
.write_to_file = true,
+ .flush_behavior = FLUSH_ANYTIME,
.snapshot_ctl_off = offsetof(PgStat_Snapshot, archiver),
.shared_ctl_off = offsetof(PgStat_ShmemControl, archiver),
@@ -404,6 +412,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = true,
.write_to_file = true,
+ .flush_behavior = FLUSH_ANYTIME,
.snapshot_ctl_off = offsetof(PgStat_Snapshot, bgwriter),
.shared_ctl_off = offsetof(PgStat_ShmemControl, bgwriter),
@@ -420,6 +429,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = true,
.write_to_file = true,
+ .flush_behavior = FLUSH_ANYTIME,
.snapshot_ctl_off = offsetof(PgStat_Snapshot, checkpointer),
.shared_ctl_off = offsetof(PgStat_ShmemControl, checkpointer),
@@ -436,6 +446,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = true,
.write_to_file = true,
+ .flush_behavior = FLUSH_ANYTIME,
.snapshot_ctl_off = offsetof(PgStat_Snapshot, io),
.shared_ctl_off = offsetof(PgStat_ShmemControl, io),
@@ -453,6 +464,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = true,
.write_to_file = true,
+ .flush_behavior = FLUSH_ANYTIME,
.snapshot_ctl_off = offsetof(PgStat_Snapshot, slru),
.shared_ctl_off = offsetof(PgStat_ShmemControl, slru),
@@ -470,6 +482,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = true,
.write_to_file = true,
+ .flush_behavior = FLUSH_ANYTIME,
.snapshot_ctl_off = offsetof(PgStat_Snapshot, wal),
.shared_ctl_off = offsetof(PgStat_ShmemControl, wal),
@@ -775,23 +788,11 @@ pgstat_report_stat(bool force)
partial_flush = false;
/* flush of variable-numbered stats tracked in pending entries list */
- partial_flush |= pgstat_flush_pending_entries(nowait);
+ partial_flush |= pgstat_flush_pending_entries(nowait, false);
/* flush of other stats kinds */
if (pgstat_report_fixed)
- {
- for (PgStat_Kind kind = PGSTAT_KIND_MIN; kind <= PGSTAT_KIND_MAX; kind++)
- {
- const PgStat_KindInfo *kind_info = pgstat_get_kind_info(kind);
-
- if (!kind_info)
- continue;
- if (!kind_info->flush_static_cb)
- continue;
-
- partial_flush |= kind_info->flush_static_cb(nowait);
- }
- }
+ partial_flush |= pgstat_flush_fixed_stats(nowait, false);
last_flush = now;
@@ -1345,9 +1346,14 @@ pgstat_delete_pending_entry(PgStat_EntryRef *entry_ref)
/*
* Flush out pending variable-numbered stats.
+ *
+ * If anytime_only is true, only flushes FLUSH_ANYTIME entries.
+ * This is safe to call inside transactions.
+ *
+ * If anytime_only is false, flushes all entries.
*/
static bool
-pgstat_flush_pending_entries(bool nowait)
+pgstat_flush_pending_entries(bool nowait, bool anytime_only)
{
bool have_pending = false;
dlist_node *cur = NULL;
@@ -1377,6 +1383,20 @@ pgstat_flush_pending_entries(bool nowait)
Assert(!kind_info->fixed_amount);
Assert(kind_info->flush_pending_cb != NULL);
+ /* Skip transactional stats if we're in anytime_only mode */
+ if (anytime_only && kind_info->flush_behavior == FLUSH_AT_TXN_BOUNDARY)
+ {
+ have_pending = true;
+
+ if (dlist_has_next(&pgStatPending, cur))
+ next = dlist_next_node(&pgStatPending, cur);
+ else
+ next = NULL;
+
+ cur = next;
+ continue;
+ }
+
/* flush the stats, if possible */
did_flush = kind_info->flush_pending_cb(entry_ref, nowait);
@@ -1397,11 +1417,42 @@ pgstat_flush_pending_entries(bool nowait)
cur = next;
}
- Assert(dlist_is_empty(&pgStatPending) == !have_pending);
+ /*
+ * When in anytime_only mode, the list may not be empty because
+ * FLUSH_AT_TXN_BOUNDARY entries were skipped.
+ */
+ Assert(!anytime_only || dlist_is_empty(&pgStatPending) == !have_pending);
return have_pending;
}
+/*
+ * Flush fixed-amount stats.
+ *
+ * If anytime_only is true, only flushes FLUSH_ANYTIME stats (safe inside transactions).
+ * If anytime_only is false, flushes all stats with flush_static_cb.
+ */
+static bool
+pgstat_flush_fixed_stats(bool nowait, bool anytime_only)
+{
+ bool partial_flush = false;
+
+ for (PgStat_Kind kind = PGSTAT_KIND_MIN; kind <= PGSTAT_KIND_MAX; kind++)
+ {
+ const PgStat_KindInfo *kind_info = pgstat_get_kind_info(kind);
+
+ if (!kind_info || !kind_info->flush_static_cb)
+ continue;
+
+ /* Skip transactional stats if we're in anytime_only mode */
+ if (anytime_only && kind_info->flush_behavior == FLUSH_AT_TXN_BOUNDARY)
+ continue;
+
+ partial_flush |= kind_info->flush_static_cb(nowait);
+ }
+
+ return partial_flush;
+}
/* ------------------------------------------------------------
* Helper / infrastructure functions
@@ -2119,3 +2170,37 @@ assign_stats_fetch_consistency(int newval, void *extra)
if (pgstat_fetch_consistency != newval)
force_stats_snapshot_clear = true;
}
+
+/*
+ * Flush non-transactional stats
+ *
+ * This is safe to call even inside a transaction. It only flushes stats
+ * kinds marked as FLUSH_ANYTIME.
+ *
+ * This allows long running transactions to report activity without waiting
+ * for transaction to finish.
+ */
+void
+pgstat_report_anytime_stat(bool force)
+{
+ bool nowait = !force;
+
+ pgstat_assert_is_up();
+
+ /*
+ * Exit if no pending stats at all. This avoids unnecessary work when
+ * backends are idle or in sessions without stats accumulation.
+ *
+ * Note: This check isn't precise as there might be only transactional
+ * stats pending, which we'll skip during the flush. However, maintaining
+ * precise tracking would add complexity that does not seem worth it from
+ * a performance point of view (no noticeable performance regression has
+ * been observed with the current implementation).
+ */
+ if (dlist_is_empty(&pgStatPending) && !pgstat_report_fixed)
+ return;
+
+ /* Flush stats outside of transaction boundary */
+ pgstat_flush_pending_entries(nowait, true);
+ pgstat_flush_fixed_stats(nowait, true);
+}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 36ad708b360..ad44826c39e 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -40,6 +40,7 @@ volatile sig_atomic_t IdleSessionTimeoutPending = false;
volatile sig_atomic_t ProcSignalBarrierPending = false;
volatile sig_atomic_t LogMemoryContextPending = false;
volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
+volatile sig_atomic_t AnytimeStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
volatile uint32 CritSectionCount = 0;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 3f401faf3de..cb0f6aecad1 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -82,6 +82,7 @@ static void TransactionTimeoutHandler(void);
static void IdleSessionTimeoutHandler(void);
static void IdleStatsUpdateTimeoutHandler(void);
static void ClientCheckTimeoutHandler(void);
+static void AnytimeStatsUpdateTimeoutHandler(void);
static bool ThereIsAtLeastOneRole(void);
static void process_startup_options(Port *port, bool am_superuser);
static void process_settings(Oid databaseid, Oid roleid);
@@ -765,6 +766,9 @@ InitPostgres(const char *in_dbname, Oid dboid,
RegisterTimeout(CLIENT_CONNECTION_CHECK_TIMEOUT, ClientCheckTimeoutHandler);
RegisterTimeout(IDLE_STATS_UPDATE_TIMEOUT,
IdleStatsUpdateTimeoutHandler);
+ RegisterTimeout(ANYTIME_STATS_UPDATE_TIMEOUT,
+ AnytimeStatsUpdateTimeoutHandler);
+ enable_timeout_after(ANYTIME_STATS_UPDATE_TIMEOUT, PGSTAT_ANYTIME_FLUSH_INTERVAL);
}
/*
@@ -1446,3 +1450,14 @@ ThereIsAtLeastOneRole(void)
return result;
}
+
+/*
+ * Timeout handler for flushing non-transactional stats.
+ */
+static void
+AnytimeStatsUpdateTimeoutHandler(void)
+{
+ AnytimeStatsUpdateTimeoutPending = true;
+ InterruptPending = true;
+ SetLatch(MyLatch);
+}
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index db559b39c4d..8aeb9628871 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t AnytimeStatsUpdateTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index fff7ecc2533..86e65397614 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -35,6 +35,9 @@
/* Default directory to store temporary statistics data in */
#define PG_STAT_TMP_DIR "pg_stat_tmp"
+/* When to call pgstat_report_anytime_stat() again */
+#define PGSTAT_ANYTIME_FLUSH_INTERVAL 1000
+
/* Values for track_functions GUC variable --- order is significant! */
typedef enum TrackFunctionsLevel
{
@@ -533,6 +536,7 @@ extern void pgstat_initialize(void);
/* Functions called from backends */
extern long pgstat_report_stat(bool force);
+extern void pgstat_report_anytime_stat(bool force);
extern void pgstat_force_next_flush(void);
extern void pgstat_reset_counters(void);
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 9b8fbae00ed..02f4f13fc0f 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -224,6 +224,14 @@ typedef struct PgStat_SubXactStatus
PgStat_TableXactStatus *first; /* head of list for this subxact */
} PgStat_SubXactStatus;
+/*
+ * Flush behavior for statistics kinds.
+ */
+typedef enum PgStat_FlushBehavior
+{
+ FLUSH_ANYTIME, /* All fields can flush anytime */
+ FLUSH_AT_TXN_BOUNDARY, /* All fields need transaction boundary */
+} PgStat_FlushBehavior;
/*
* Metadata for a specific kind of statistics.
@@ -251,6 +259,9 @@ typedef struct PgStat_KindInfo
*/
bool track_entry_count:1;
+ /* Flush behavior */
+ PgStat_FlushBehavior flush_behavior;
+
/*
* The size of an entry in the shared stats hash table (pointed to by
* PgStatShared_HashEntry->body). For fixed-numbered statistics, this is
diff --git a/src/include/utils/timeout.h b/src/include/utils/timeout.h
index 0965b590b34..10723bb664c 100644
--- a/src/include/utils/timeout.h
+++ b/src/include/utils/timeout.h
@@ -35,6 +35,7 @@ typedef enum TimeoutId
IDLE_SESSION_TIMEOUT,
IDLE_STATS_UPDATE_TIMEOUT,
CLIENT_CONNECTION_CHECK_TIMEOUT,
+ ANYTIME_STATS_UPDATE_TIMEOUT,
STARTUP_PROGRESS_TIMEOUT,
/* First user-definable timeout reason */
USER_TIMEOUT,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 09e7f1d420e..9aabb325f16 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2261,6 +2261,7 @@ PgStat_Counter
PgStat_EntryRef
PgStat_EntryRefHashEntry
PgStat_FetchConsistency
+PgStat_FlushBehavior
PgStat_FunctionCallUsage
PgStat_FunctionCounts
PgStat_HashKey
--
2.34.1
v1-0002-Remove-useless-calls-to-flush-some-stats.patch (text/x-diff)
From 2c9e50c45138319660c5aa6860873ffdeebb7a67 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Tue, 6 Jan 2026 11:06:31 +0000
Subject: [PATCH v1 2/3] Remove useless calls to flush some stats
Now that some stats can be flushed outside of transaction boundaries, remove
useless calls to report/flush some stats. Those calls were in place because
before commit <XXXX> stats were flushed only at transaction boundaries.
Note that:
- it reverts 039549d70f6 (it just keeps its tests)
- it can't be done for checkpointer and bgworker for example because they don't
have a flush callback to call
- it can't be done for auxiliary process (walsummarizer for example) because they
currently do not register the new timeout handler
---
src/backend/replication/walreceiver.c | 10 ------
src/backend/replication/walsender.c | 36 ++------------------
src/backend/utils/activity/pgstat_relation.c | 13 -------
3 files changed, 2 insertions(+), 57 deletions(-)
75.3% src/backend/replication/
24.6% src/backend/utils/activity/
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index a41453530a1..266379c780a 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -553,16 +553,6 @@ WalReceiverMain(const void *startup_data, size_t startup_data_len)
*/
bool requestReply = false;
- /*
- * Report pending statistics to the cumulative stats
- * system. This location is useful for the report as it
- * is not within a tight loop in the WAL receiver, to
- * avoid bloating pgstats with requests, while also making
- * sure that the reports happen each time a status update
- * is sent.
- */
- pgstat_report_wal(false);
-
/*
* Check if time since last receive from primary has
* reached the configured limit.
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 1ab09655a70..c33185bd337 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -94,14 +94,10 @@
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/pg_lsn.h"
-#include "utils/pgstat_internal.h"
#include "utils/ps_status.h"
#include "utils/timeout.h"
#include "utils/timestamp.h"
-/* Minimum interval used by walsender for stats flushes, in ms */
-#define WALSENDER_STATS_FLUSH_INTERVAL 1000
-
/*
* Maximum data payload in a WAL data message. Must be >= XLOG_BLCKSZ.
*
@@ -1826,7 +1822,6 @@ WalSndWaitForWal(XLogRecPtr loc)
int wakeEvents;
uint32 wait_event = 0;
static XLogRecPtr RecentFlushPtr = InvalidXLogRecPtr;
- TimestampTz last_flush = 0;
/*
* Fast path to avoid acquiring the spinlock in case we already know we
@@ -1847,7 +1842,6 @@ WalSndWaitForWal(XLogRecPtr loc)
{
bool wait_for_standby_at_stop = false;
long sleeptime;
- TimestampTz now;
/* Clear any already-pending wakeups */
ResetLatch(MyLatch);
@@ -1958,8 +1952,7 @@ WalSndWaitForWal(XLogRecPtr loc)
* new WAL to be generated. (But if we have nothing to send, we don't
* want to wake on socket-writable.)
*/
- now = GetCurrentTimestamp();
- sleeptime = WalSndComputeSleeptime(now);
+ sleeptime = WalSndComputeSleeptime(GetCurrentTimestamp());
wakeEvents = WL_SOCKET_READABLE;
@@ -1968,15 +1961,6 @@ WalSndWaitForWal(XLogRecPtr loc)
Assert(wait_event != 0);
- /* Report IO statistics, if needed */
- if (TimestampDifferenceExceeds(last_flush, now,
- WALSENDER_STATS_FLUSH_INTERVAL))
- {
- pgstat_flush_io(false);
- (void) pgstat_flush_backend(false, PGSTAT_BACKEND_FLUSH_IO);
- last_flush = now;
- }
-
WalSndWait(wakeEvents, sleeptime, wait_event);
}
@@ -2879,8 +2863,6 @@ WalSndCheckTimeOut(void)
static void
WalSndLoop(WalSndSendDataCallback send_data)
{
- TimestampTz last_flush = 0;
-
/*
* Initialize the last reply timestamp. That enables timeout processing
* from hereon.
@@ -2975,9 +2957,6 @@ WalSndLoop(WalSndSendDataCallback send_data)
* WalSndWaitForWal() handle any other blocking; idle receivers need
* its additional actions. For physical replication, also block if
* caught up; its send_data does not block.
- *
- * The IO statistics are reported in WalSndWaitForWal() for the
- * logical WAL senders.
*/
if ((WalSndCaughtUp && send_data != XLogSendLogical &&
!streamingDoneSending) ||
@@ -2985,7 +2964,6 @@ WalSndLoop(WalSndSendDataCallback send_data)
{
long sleeptime;
int wakeEvents;
- TimestampTz now;
if (!streamingDoneReceiving)
wakeEvents = WL_SOCKET_READABLE;
@@ -2996,21 +2974,11 @@ WalSndLoop(WalSndSendDataCallback send_data)
* Use fresh timestamp, not last_processing, to reduce the chance
* of reaching wal_sender_timeout before sending a keepalive.
*/
- now = GetCurrentTimestamp();
- sleeptime = WalSndComputeSleeptime(now);
+ sleeptime = WalSndComputeSleeptime(GetCurrentTimestamp());
if (pq_is_send_pending())
wakeEvents |= WL_SOCKET_WRITEABLE;
- /* Report IO statistics, if needed */
- if (TimestampDifferenceExceeds(last_flush, now,
- WALSENDER_STATS_FLUSH_INTERVAL))
- {
- pgstat_flush_io(false);
- (void) pgstat_flush_backend(false, PGSTAT_BACKEND_FLUSH_IO);
- last_flush = now;
- }
-
/* Sleep until something happens or we time out */
WalSndWait(wakeEvents, sleeptime, WAIT_EVENT_WAL_SENDER_MAIN);
}
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index bc8c43b96aa..feae2ae5f44 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -260,15 +260,6 @@ pgstat_report_vacuum(Relation rel, PgStat_Counter livetuples,
}
pgstat_unlock_entry(entry_ref);
-
- /*
- * Flush IO statistics now. pgstat_report_stat() will flush IO stats,
- * however this will not be called until after an entire autovacuum cycle
- * is done -- which will likely vacuum many relations -- or until the
- * VACUUM command has processed all tables and committed.
- */
- pgstat_flush_io(false);
- (void) pgstat_flush_backend(false, PGSTAT_BACKEND_FLUSH_IO);
}
/*
@@ -360,10 +351,6 @@ pgstat_report_analyze(Relation rel,
}
pgstat_unlock_entry(entry_ref);
-
- /* see pgstat_report_vacuum() */
- pgstat_flush_io(false);
- (void) pgstat_flush_backend(false, PGSTAT_BACKEND_FLUSH_IO);
}
/*
--
2.34.1
v1-0003-Add-FLUSH_MIXED-support-and-implement-it-for-RELA.patch (text/x-diff)
From af71e6472727b4e18ca369e21e2b4667d4cd172b Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Thu, 8 Jan 2026 09:17:38 +0000
Subject: [PATCH v1 3/3] Add FLUSH_MIXED support and implement it for RELATION
stats
This commit extends the non transactional stats infrastructure to support statistics
kinds with mixed transaction behavior: some fields are transactional (e.g., tuple
inserts/updates/deletes) while others are non transactional (e.g., sequential scans
blocks read, ...).
It introduces FLUSH_MIXED as a third flush behavior type, alongside FLUSH_ANYTIME
and FLUSH_AT_TXN_BOUNDARY. For FLUSH_MIXED kinds, a new flush_anytime_cb callback
enables partial flushing of only the non transactional fields during running
transactions.
Some tests are also added.
Implementation details:
- Add FLUSH_MIXED to PgStat_FlushBehavior enum
- Add flush_anytime_cb to PgStat_KindInfo for partial flushing callback
- Update pgstat_flush_pending_entries() to call flush_anytime_cb for
FLUSH_MIXED entries when in anytime_only mode
- Keep FLUSH_MIXED entries in the pending list after partial flush, as
transactional fields still need to be flushed at transaction boundary
RELATION stats are making use of FLUSH_MIXED:
- Change RELATION from TXN_ALL to FLUSH_MIXED
- Implement pgstat_relation_flush_anytime_cb() to flush only read related
stats: numscans, tuples_returned, tuples_fetched, blocks_fetched,
blocks_hit
- Clear these fields after flushing to prevent double counting when
pgstat_relation_flush_cb() runs at transaction commit
- Transactional stats (tuples_inserted, tuples_updated, tuples_deleted,
live_tuples, dead_tuples) remain pending until transaction boundary
Remark:
We could also imagine adding a new flush_anytime_static_cb() callback for
future FLUSH_MIXED fixed amount stats.
---
src/backend/utils/activity/pgstat.c | 36 ++++++---
src/backend/utils/activity/pgstat_relation.c | 82 ++++++++++++++++++++
src/include/utils/pgstat_internal.h | 8 ++
src/test/isolation/expected/stats.out | 40 ++++++++++
src/test/isolation/expected/stats_1.out | 40 ++++++++++
src/test/isolation/specs/stats.spec | 12 +++
6 files changed, 209 insertions(+), 9 deletions(-)
56.6% src/backend/utils/activity/
4.7% src/include/utils/
34.0% src/test/isolation/expected/
4.6% src/test/isolation/specs/
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index f7942e47475..191e0ceac88 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -307,7 +307,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = false,
.write_to_file = true,
- .flush_behavior = FLUSH_AT_TXN_BOUNDARY,
+ .flush_behavior = FLUSH_MIXED,
.shared_size = sizeof(PgStatShared_Relation),
.shared_data_off = offsetof(PgStatShared_Relation, stats),
@@ -315,6 +315,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.pending_size = sizeof(PgStat_TableStatus),
.flush_pending_cb = pgstat_relation_flush_cb,
+ .flush_anytime_cb = pgstat_relation_flush_anytime_cb,
.delete_pending_cb = pgstat_relation_delete_pending_cb,
.reset_timestamp_cb = pgstat_relation_reset_timestamp_cb,
},
@@ -1347,10 +1348,11 @@ pgstat_delete_pending_entry(PgStat_EntryRef *entry_ref)
/*
* Flush out pending variable-numbered stats.
*
- * If anytime_only is true, only flushes FLUSH_ANYTIME entries.
+ * If anytime_only is true, only flushes FLUSH_ANYTIME and FLUSH_MIXED entries,
+ * using flush_anytime_cb for FLUSH_MIXED.
* This is safe to call inside transactions.
*
- * If anytime_only is false, flushes all entries.
+ * If anytime_only is false, flushes all entries using flush_pending_cb.
*/
static bool
pgstat_flush_pending_entries(bool nowait, bool anytime_only)
@@ -1378,6 +1380,7 @@ pgstat_flush_pending_entries(bool nowait, bool anytime_only)
PgStat_Kind kind = key.kind;
const PgStat_KindInfo *kind_info = pgstat_get_kind_info(kind);
bool did_flush;
+ bool is_partial_flush = false;
dlist_node *next;
Assert(!kind_info->fixed_amount);
@@ -1397,8 +1400,21 @@ pgstat_flush_pending_entries(bool nowait, bool anytime_only)
continue;
}
- /* flush the stats, if possible */
- did_flush = kind_info->flush_pending_cb(entry_ref, nowait);
+ /* flush the stats (with the appropriate callback), if possible */
+ if (anytime_only &&
+ kind_info->flush_behavior == FLUSH_MIXED &&
+ kind_info->flush_anytime_cb != NULL)
+ {
+ /* Partial flush of non-transactional fields only */
+ did_flush = kind_info->flush_anytime_cb(entry_ref, nowait);
+ is_partial_flush = true;
+ }
+ else
+ {
+ /* Full flush */
+ did_flush = kind_info->flush_pending_cb(entry_ref, nowait);
+ is_partial_flush = false;
+ }
Assert(did_flush || nowait);
@@ -1408,8 +1424,8 @@ pgstat_flush_pending_entries(bool nowait, bool anytime_only)
else
next = NULL;
- /* if successfully flushed, remove entry */
- if (did_flush)
+ /* if successfull non partial flush, remove entry */
+ if (did_flush && !is_partial_flush)
pgstat_delete_pending_entry(entry_ref);
else
have_pending = true;
@@ -1418,8 +1434,10 @@ pgstat_flush_pending_entries(bool nowait, bool anytime_only)
}
/*
- * When in anytime_only mode, the list may not be empty because
- * FLUSH_AT_TXN_BOUNDARY entries were skipped.
+ * When in anytime_only mode, the list may not be empty even after
+ * successful flushes because FLUSH_AT_TXN_BOUNDARY entries were skipped
+ * or FLUSH_MIXED entries had partial flushes and remain for transaction
+ * boundary.
*/
Assert(!anytime_only || dlist_is_empty(&pgStatPending) == !have_pending);
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index feae2ae5f44..6d6f333039e 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -887,6 +887,88 @@ pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
return true;
}
+/*
+ * Flush only non-transactional relation stats.
+ *
+ * This is called periodically during running transactions to make some
+ * statistics visible without waiting for the transaction to finish.
+ *
+ * Transactional stats (inserts/updates/deletes and their effects on live/dead
+ * tuple counts) remain in pending until the transaction ends, at which point
+ * pgstat_relation_flush_cb() will flush them.
+ *
+ * If nowait is true and the lock could not be immediately acquired, returns
+ * false without flushing the entry. Otherwise returns true.
+ */
+bool
+pgstat_relation_flush_anytime_cb(PgStat_EntryRef *entry_ref, bool nowait)
+{
+ Oid dboid;
+ PgStat_TableStatus *lstats; /* pending stats entry */
+ PgStatShared_Relation *shtabstats;
+ PgStat_StatTabEntry *tabentry; /* table entry of shared stats */
+ PgStat_StatDBEntry *dbentry; /* pending database entry */
+ bool has_nontxn_stats = false;
+
+ dboid = entry_ref->shared_entry->key.dboid;
+ lstats = (PgStat_TableStatus *) entry_ref->pending;
+ shtabstats = (PgStatShared_Relation *) entry_ref->shared_stats;
+
+ /*
+ * Check if there are any non-transactional stats to flush. Avoid
+ * unnecessarily locking the entry if nothing accumulated.
+ */
+ if (lstats->counts.numscans > 0 ||
+ lstats->counts.tuples_returned > 0 ||
+ lstats->counts.tuples_fetched > 0 ||
+ lstats->counts.blocks_fetched > 0 ||
+ lstats->counts.blocks_hit > 0)
+ has_nontxn_stats = true;
+
+ if (!has_nontxn_stats)
+ return true;
+
+ if (!pgstat_lock_entry(entry_ref, nowait))
+ return false;
+
+ /* Add only the non-transactional values to the shared entry */
+ tabentry = &shtabstats->stats;
+
+ tabentry->numscans += lstats->counts.numscans;
+ if (lstats->counts.numscans)
+ {
+ TimestampTz t = GetCurrentTimestamp();
+
+ if (t > tabentry->lastscan)
+ tabentry->lastscan = t;
+ }
+ tabentry->tuples_returned += lstats->counts.tuples_returned;
+ tabentry->tuples_fetched += lstats->counts.tuples_fetched;
+ tabentry->blocks_fetched += lstats->counts.blocks_fetched;
+ tabentry->blocks_hit += lstats->counts.blocks_hit;
+
+ pgstat_unlock_entry(entry_ref);
+
+ /* Also update the corresponding fields in database stats */
+ dbentry = pgstat_prep_database_pending(dboid);
+ dbentry->tuples_returned += lstats->counts.tuples_returned;
+ dbentry->tuples_fetched += lstats->counts.tuples_fetched;
+ dbentry->blocks_fetched += lstats->counts.blocks_fetched;
+ dbentry->blocks_hit += lstats->counts.blocks_hit;
+
+ /*
+ * Clear the flushed fields from pending stats to prevent double-counting
+ * when pgstat_relation_flush_cb() runs at transaction boundary.
+ */
+ lstats->counts.numscans = 0;
+ lstats->counts.tuples_returned = 0;
+ lstats->counts.tuples_fetched = 0;
+ lstats->counts.blocks_fetched = 0;
+ lstats->counts.blocks_hit = 0;
+
+ return true;
+}
+
void
pgstat_relation_delete_pending_cb(PgStat_EntryRef *entry_ref)
{
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 02f4f13fc0f..85d92f4c945 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -231,6 +231,7 @@ typedef enum PgStat_FlushBehavior
{
FLUSH_ANYTIME, /* All fields can flush anytime */
FLUSH_AT_TXN_BOUNDARY, /* All fields need transaction boundary */
+ FLUSH_MIXED, /* MIXED so needs callbacks */
} PgStat_FlushBehavior;
/*
@@ -262,6 +263,12 @@ typedef struct PgStat_KindInfo
/* Flush behavior */
PgStat_FlushBehavior flush_behavior;
+ /*
+ * For PGSTAT_FLUSH_MIXED kinds: callback to flush only some fields. If
+ * NULL for a MIXED kind, treated as PGSTAT_FLUSH_AT_TXN_BOUNDARY.
+ */
+ bool (*flush_anytime_cb) (PgStat_EntryRef *entry_ref, bool nowait);
+
/*
* The size of an entry in the shared stats hash table (pointed to by
* PgStatShared_HashEntry->body). For fixed-numbered statistics, this is
@@ -774,6 +781,7 @@ extern void AtPrepare_PgStat_Relations(PgStat_SubXactStatus *xact_state);
extern void PostPrepare_PgStat_Relations(PgStat_SubXactStatus *xact_state);
extern bool pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait);
+extern bool pgstat_relation_flush_anytime_cb(PgStat_EntryRef *entry_ref, bool nowait);
extern void pgstat_relation_delete_pending_cb(PgStat_EntryRef *entry_ref);
extern void pgstat_relation_reset_timestamp_cb(PgStatShared_Common *header, TimestampTz ts);
diff --git a/src/test/isolation/expected/stats.out b/src/test/isolation/expected/stats.out
index cfad309ccf3..6d62b30e4a7 100644
--- a/src/test/isolation/expected/stats.out
+++ b/src/test/isolation/expected/stats.out
@@ -2245,6 +2245,46 @@ seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum
(1 row)
+starting permutation: s2_begin s2_table_select s1_sleep s1_table_stats s2_table_drop s2_commit
+pg_stat_force_next_flush
+------------------------
+
+(1 row)
+
+step s2_begin: BEGIN;
+step s2_table_select: SELECT * FROM test_stat_tab ORDER BY key, value;
+key|value
+---+-----
+k0 | 1
+(1 row)
+
+step s1_sleep: SELECT pg_sleep(1.5);
+pg_sleep
+--------
+
+(1 row)
+
+step s1_table_stats:
+ SELECT
+ pg_stat_get_numscans(tso.oid) AS seq_scan,
+ pg_stat_get_tuples_returned(tso.oid) AS seq_tup_read,
+ pg_stat_get_tuples_inserted(tso.oid) AS n_tup_ins,
+ pg_stat_get_tuples_updated(tso.oid) AS n_tup_upd,
+ pg_stat_get_tuples_deleted(tso.oid) AS n_tup_del,
+ pg_stat_get_live_tuples(tso.oid) AS n_live_tup,
+ pg_stat_get_dead_tuples(tso.oid) AS n_dead_tup,
+ pg_stat_get_vacuum_count(tso.oid) AS vacuum_count
+ FROM test_stat_oid AS tso
+ WHERE tso.name = 'test_stat_tab'
+
+seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum_count
+--------+------------+---------+---------+---------+----------+----------+------------
+ 1| 1| 1| 0| 0| 1| 0| 0
+(1 row)
+
+step s2_table_drop: DROP TABLE test_stat_tab;
+step s2_commit: COMMIT;
+
starting permutation: s1_track_counts_off s1_table_stats s1_track_counts_on
pg_stat_force_next_flush
------------------------
diff --git a/src/test/isolation/expected/stats_1.out b/src/test/isolation/expected/stats_1.out
index e1d937784cb..2fade10e817 100644
--- a/src/test/isolation/expected/stats_1.out
+++ b/src/test/isolation/expected/stats_1.out
@@ -2253,6 +2253,46 @@ seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum
(1 row)
+starting permutation: s2_begin s2_table_select s1_sleep s1_table_stats s2_table_drop s2_commit
+pg_stat_force_next_flush
+------------------------
+
+(1 row)
+
+step s2_begin: BEGIN;
+step s2_table_select: SELECT * FROM test_stat_tab ORDER BY key, value;
+key|value
+---+-----
+k0 | 1
+(1 row)
+
+step s1_sleep: SELECT pg_sleep(1.5);
+pg_sleep
+--------
+
+(1 row)
+
+step s1_table_stats:
+ SELECT
+ pg_stat_get_numscans(tso.oid) AS seq_scan,
+ pg_stat_get_tuples_returned(tso.oid) AS seq_tup_read,
+ pg_stat_get_tuples_inserted(tso.oid) AS n_tup_ins,
+ pg_stat_get_tuples_updated(tso.oid) AS n_tup_upd,
+ pg_stat_get_tuples_deleted(tso.oid) AS n_tup_del,
+ pg_stat_get_live_tuples(tso.oid) AS n_live_tup,
+ pg_stat_get_dead_tuples(tso.oid) AS n_dead_tup,
+ pg_stat_get_vacuum_count(tso.oid) AS vacuum_count
+ FROM test_stat_oid AS tso
+ WHERE tso.name = 'test_stat_tab'
+
+seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum_count
+--------+------------+---------+---------+---------+----------+----------+------------
+ 0| 0| 1| 0| 0| 1| 0| 0
+(1 row)
+
+step s2_table_drop: DROP TABLE test_stat_tab;
+step s2_commit: COMMIT;
+
starting permutation: s1_track_counts_off s1_table_stats s1_track_counts_on
pg_stat_force_next_flush
------------------------
diff --git a/src/test/isolation/specs/stats.spec b/src/test/isolation/specs/stats.spec
index da16710da0f..1b0168e6176 100644
--- a/src/test/isolation/specs/stats.spec
+++ b/src/test/isolation/specs/stats.spec
@@ -50,6 +50,8 @@ step s1_rollback { ROLLBACK; }
step s1_prepare_a { PREPARE TRANSACTION 'a'; }
step s1_commit_prepared_a { COMMIT PREPARED 'a'; }
step s1_rollback_prepared_a { ROLLBACK PREPARED 'a'; }
+# Has to be greater than PGSTAT_ANYTIME_FLUSH_INTERVAL
+step s1_sleep { SELECT pg_sleep(1.5); }
# Function stats steps
step s1_ff { SELECT pg_stat_force_next_flush(); }
@@ -138,6 +140,7 @@ step s2_commit { COMMIT; }
step s2_commit_prepared_a { COMMIT PREPARED 'a'; }
step s2_rollback_prepared_a { ROLLBACK PREPARED 'a'; }
step s2_ff { SELECT pg_stat_force_next_flush(); }
+step s2_table_drop { DROP TABLE test_stat_tab; }
# Function stats steps
step s2_track_funcs_all { SET track_functions = 'all'; }
@@ -435,6 +438,15 @@ permutation
s1_table_drop
s1_table_stats
+### Check that some stats are updated (seq_scan and seq_tup_read)
+### while the transaction is still running
+permutation
+ s2_begin
+ s2_table_select
+ s1_sleep
+ s1_table_stats
+ s2_table_drop
+ s2_commit
### Check that we don't count changes with track counts off, but allow access
### to prior stats
--
2.34.1
Hi,
Thanks for these patches!
I took a quick look at the patches and I have some general comments.
Long running transactions can accumulate significant statistics (WAL, IO, ...)
that remain unflushed until the transaction ends. This delays visibility of
resource usage in monitoring views like pg_stat_io and pg_stat_wal.
+1. I do think this is a good idea. Long-running transactions cause accumulated
stats to appear as spikes in monitoring tools rather than as gradual activity.
This would help level out, though not eliminate, those artificial spikes.
The 1 second flush interval is currently hardcoded but we could imagine increase
it or make it configurable.
Someone may want to turn this off as well. I think a GUC will be needed.
RELATION stats are making use of FLUSH_MIXED:
stats: numscans, tuples_returned, tuples_fetched, blocks_fetched,
blocks_hit
I’m concerned that fields being temporarily out of sync might impact monitoring
calculations, if the formula is dealing with fields that have
different flush strategies.
That said, minor discrepancies are usually tolerable for monitoring
data analysis.
For the numscans, should we not also update the scan timestamp?
--
Sami Imseih
Amazon Web Services (AWS)
Hi,
On Wed, Jan 14, 2026 at 09:54:17PM -0600, Sami Imseih wrote:
I took a quick look at the patches and I have some general comments.
Thanks!
Long running transactions can accumulate significant statistics (WAL, IO, ...)
that remain unflushed until the transaction ends. This delays visibility of
resource usage in monitoring views like pg_stat_io and pg_stat_wal.

+1. I do think this is a good idea. Long-running transactions cause accumulated
stats to appear as spikes in monitoring tools rather than as gradual activity.
This would help level out, though not eliminate, those artificial spikes.
Yeah.
The 1 second flush interval is currently hardcoded but we could imagine increase
it or make it configurable.

Someone may want to turn this off as well. I think a GUC will be needed.
I gave this more thought and I wonder if this should be configurable at all.
I mean, we don't do it for PGSTAT_MIN_INTERVAL, PGSTAT_MAX_INTERVAL and
PGSTAT_IDLE_INTERVAL. We could imagine making it configurable if it produced a
noticeable performance impact, but that's not what I observed.
RELATION stats are making use of FLUSH_MIXED:
stats: numscans, tuples_returned, tuples_fetched, blocks_fetched,
blocks_hit

I’m concerned that fields being temporarily out of sync might impact monitoring
calculations, if the formula is dealing with fields that have
different flush strategies.
That's a good point. Maybe we should document the fields flush strategy?
That said, minor discrepancies are usually tolerable for monitoring
data analysis.

For the numscans, should we not also update the scan timestamp?
The problem is that we could not call GetCurrentTransactionStopTimestamp(), so
we would need to call GetCurrentTimestamp() instead. I'm not sure that calling
GetCurrentTimestamp() every second would be a real issue though, and if it is
maybe we could increase this 1s value.
That said I agree that having seq_scan being updated and not last_seq_scan is not
that great.
Maybe we should keep this in mind and see what to do depending on where this thread
is going (I mean, if the current proposed design has to be changed).
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
The 1 second flush interval is currently hardcoded but we could imagine increase
it or make it configurable.

Someone may want to turn this off as well. I think a GUC will be needed.
I gave this more thoughts and I wonder if this should be configurable at all.
I mean, we don't do it for PGSTAT_MIN_INTERVAL, PGSTAT_MAX_INTERVAL and
PGSTAT_IDLE_INTERVAL. We could imagine make it configurable if it produces
noticeable performance impact but that's not what I observed.
Is there a reason we need a new constant (PGSTAT_ANYTIME_FLUSH_INTERVAL)
for anytime flushes and can't rely on the existing PGSTAT_MIN_INTERVAL?
Also, how did you benchmark? I am less concerned about long-running
transactions and background processes, and more about short, high-concurrency
transactions seeing additional overhead due to the additional flushing. Is the
latter a concern?
stats: numscans, tuples_returned, tuples_fetched, blocks_fetched,
blocks_hit

I’m concerned that fields being temporarily out of sync might impact monitoring
calculations, if the formula is dealing with fields that have
different flush strategies.

That's a good point. Maybe we should document the fields flush strategy?
Yeah, we will need to document this.
That said, minor discrepancies are usually tolerable for monitoring
data analysis.

For the numscans, should we not also update the scan timestamp?
The problem is that we could not call GetCurrentTransactionStopTimestamp(), so
we would need to call GetCurrentTimestamp() instead. I'm not sure that calling
GetCurrentTimestamp() every second would be a real issue though, and if it is
maybe we could increase this 1s value.
That said I agree that having seq_scan being updated and not last_seq_scan is not
that great.
with v3, I checked by running seq scans in a long-running transaction,
and I observed both of these values being updated at the same time. I think
this is OK.
# pgstat_relation_flush_anytime_cb
```
tabentry->numscans += lstats->counts.numscans;
if (lstats->counts.numscans)
{
TimestampTz t = GetCurrentTimestamp();
if (t > tabentry->lastscan)
tabentry->lastscan = t;
}
```
and
# pgstat_relation_flush_cb
```
if (lstats->counts.numscans)
{
TimestampTz t = GetCurrentTransactionStopTimestamp();
if (t > tabentry->lastscan)
tabentry->lastscan = t;
}
```
--
Sami Imseih
Amazon Web Services (AWS)
with v3 , I checked by running seq scans in a long running transaction,
Sorry I mean 0003
--
Sami Imseih
Amazon Web Services (AWS)
Hi,
On Thu, Jan 15, 2026 at 11:25:18AM -0600, Sami Imseih wrote:
The 1 second flush interval is currently hardcoded but we could imagine increase
it or make it configurable.

Someone may want to turn this off as well. I think a GUC will be needed.
I gave this more thoughts and I wonder if this should be configurable at all.
I mean, we don't do it for PGSTAT_MIN_INTERVAL, PGSTAT_MAX_INTERVAL and
PGSTAT_IDLE_INTERVAL. We could imagine make it configurable if it produces
noticeable performance impact but that's not what I observed.

Is there a reason we need a new constant (PGSTAT_ANYTIME_FLUSH_INTERVAL)
for anytime flushes and can't rely on the existing PGSTAT_MIN_INTERVAL?
It currently gives flexibility for testing. If we agree that 1s is the right value
and that it should not be configurable, then yeah, we could replace it with
PGSTAT_MIN_INTERVAL.
Also, How did you benchmark? I am less concerned about long running
transactions,
background processes and more about short/high concurrency transactions seeing
additional overhead due to additional flushing. Is that latter a concern?
I ran 3 kinds of tests:
1/
pgbench -c 32 -j 4 -T 60 -f short.sql -n -r $DB
with short.sql:
\set t1 random(1, 100)
\set t2 random(1, 100)
\set t3 random(1, 100)
\set t4 random(1, 100)
\set t5 random(1, 100)
\set t6 random(1, 100)
\set t7 random(1, 100)
\set t8 random(1, 100)
\set t9 random(1, 100)
\set t10 random(1, 100)
\set row random(1, 1000)
BEGIN;
UPDATE t:t1 SET val = val + 1 WHERE id = :row;
UPDATE t:t2 SET val = val + 1 WHERE id = :row;
UPDATE t:t3 SET val = val + 1 WHERE id = :row;
UPDATE t:t4 SET val = val + 1 WHERE id = :row;
UPDATE t:t5 SET val = val + 1 WHERE id = :row;
UPDATE t:t6 SET val = val + 1 WHERE id = :row;
UPDATE t:t7 SET val = val + 1 WHERE id = :row;
UPDATE t:t8 SET val = val + 1 WHERE id = :row;
UPDATE t:t9 SET val = val + 1 WHERE id = :row;
UPDATE t:t10 SET val = val + 1 WHERE id = :row;
COMMIT;
2/
psql $DB -f long.sql
with long.sql:
DO $$
BEGIN
FOR i IN 1..100 LOOP
EXECUTE format('TRUNCATE TABLE t%s', i);
EXECUTE format('INSERT INTO t%s SELECT generate_series(1, 1000000)', i);
EXECUTE format('UPDATE t%s SET val = val + 1', i);
EXECUTE format('SELECT COUNT(1) FROM t%s', i);
END LOOP;
END $$;
3/
pgbench -i -s 50 $DB
pgbench -c 32 -j 4 -T 60 -N -n -r $DB
I don't think this feature could add a noticeable performance impact, so the tests
have been that simple. Do you think we should worry more?
I’m concerned that fields being temporarily out of sync might impact monitoring
calculations, if the formula is dealing with fields that have
different flush strategies.That's a good point. Maybe we should document the fields flush strategy?
Yeah, we will need to document this.
Will do in the next version.
I checked by running seq scans in a long running transaction,
and I observed both for these values being updated at the same time. I think
this is OK.
I do think the same.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
I took a look at 0001 in depth.
I don't think this feature could add a noticeable performance impact, so the tests
have been that simple. Do you think we should worry more?
One observation is that there's no coordination between ANYTIME and
TXN_BOUNDARY flushes. While PGSTAT_MIN_INTERVAL prevents a backend from
flushing more than once per second, a backend can still perform both an
ANYTIME flush and a TXN_BOUNDARY flush within the same 1-second window.
Not saying this will be a real problem in the real world, but we definitely
took measures in the current implementation to avoid this scenario.
A few other comments on 0001
+ /* Skip if completely idle */
+ if (!DoingCommandRead || IsTransactionOrTransactionBlock())
+ pgstat_report_anytime_stat(false);
Does this need to be conditional? Worst case, we return right away with an empty
list. Best case, we are consistently flushing.
+ /*
+ * When in anytime_only mode, the list may not be empty because
+ * FLUSH_AT_TXN_BOUNDARY entries were skipped.
+ */
+ Assert(!anytime_only || dlist_is_empty(&pgStatPending) ==
!have_pending);
Checking for !anytime_only is unnecessary here.
"list_is_empty(&pgStatPending) == !have_pending"
should be true regardless of ANYTIME or TXN_BOUNDARY, right?
Below are a couple of edits for comments I felt would improve
readability of the code.
1/
/*
- * Flush non-transactional stats
- *
- * This is safe to call even inside a transaction. It only flushes stats
- * kinds marked as FLUSH_ANYTIME.
- *
- * This allows long running transactions to report activity without waiting
- * for transaction to finish.
+ * Flushes only FLUSH_ANYTIME stats using non-blocking locks. Transactional
+ * stats (FLUSH_AT_TXN_BOUNDARY) remain pending until transaction boundary.
+ * Safe to call inside transactions.
*/
2/
 typedef enum PgStat_FlushBehavior
 {
-	FLUSH_ANYTIME,			/* All fields can flush anytime */
-	FLUSH_AT_TXN_BOUNDARY,	/* All fields need transaction boundary */
+	FLUSH_ANYTIME,			/* All fields can be flushed anytime,
+							 * including within transactions */
+	FLUSH_AT_TXN_BOUNDARY,	/* All fields can only be flushed at
+							 * transaction boundary */
 } PgStat_FlushBehavior;
I will start looking at the remaining patches next.
--
Sami Imseih
Amazon Web Services (AWS)
Hi,
On Fri, Jan 16, 2026 at 10:44:48AM -0600, Sami Imseih wrote:
I took a look at 0001 in depth.
Thanks!
I don't think this feature could add a noticeable performance impact, so the tests
have been that simple. Do you think we should worry more?

One observation is there's no coordination between ANYTIME and
TXN_BOUNDARY flushes. While PGSTAT_MIN_INTERVAL
prevents a backend from flushing more than once per second, a backend can
still perform both an ANYTIME flush and a TXN_BOUNDARY flush within
the same 1-second window. Not saying this will be a real problem in
the real-world,
but we definitely took measures in the current implementation to avoid
this scenario.
Right. I think that the PGSTAT_MIN_INTERVAL throttling was put in place to prevent
flushing too frequently when the backend has a high commit rate. But here, while
it's true that we don't follow that rule (meaning a backend could flush more than
once per second), that would be at most twice (given that ANYTIME flushes every
second). So, I'm not sure that this single extra flush is worth worrying about.
Plus, coordinating the two would certainly need an extra GetCurrentTimestamp()
call, so I'm not sure it's worth it.
A few other comments on 0001
+ /* Skip if completely idle */
+ if (!DoingCommandRead || IsTransactionOrTransactionBlock())
+ pgstat_report_anytime_stat(false);

Does this need to be conditional? worst case, we return right away with an empty
list. Best case, is we are consistently flushing.
Yeah, I think we could remove this check and just rely on the ones in
pgstat_report_anytime_stat(). Done in the attached.
+ Assert(!anytime_only || dlist_is_empty(&pgStatPending) ==
!have_pending);

Checking for !anytime_only is unnecessary here.
"list_is_empty(&pgStatPending) == !have_pending"
should be true regardless of ANYTIME or TXN_BOUNDARY, right?
Right, thanks for catching it, it was remaining garbage from my dev iterations.
Below are a couple of edits for comments I felt would improve
readability of the code.
Done as suggested.
I will start looking at the remaining patches next.
Thanks!
Note that I also updated the doc in 0003 for the stats that have mixed fields.
BTW, I think that we could also mark the Function stats kind as flush-anytime,
thoughts?
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v2-0001-Add-pgstat_report_anytime_stat-for-periodic-stats.patch (text/x-diff)
From 605cae0291397047b09aa025b742bdcaf9bdd528 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 5 Jan 2026 09:41:39 +0000
Subject: [PATCH v2 1/3] Add pgstat_report_anytime_stat() for periodic stats
flushing
Long running transactions can accumulate significant statistics (WAL, IO, ...)
that remain unflushed until the transaction ends. This delays visibility of
resource usage in monitoring views like pg_stat_io and pg_stat_wal.
This commit introduces pgstat_report_anytime_stat(), which flushes
non transactional statistics even inside active transactions. A new timeout
handler fires every second to call this function, ensuring timely stats visibility
without waiting for transaction completion.
Implementation details:
- Add PgStat_FlushBehavior enum to classify stats kinds:
* FLUSH_ANYTIME: Stats that can always be flushed (WAL, IO, ...)
* FLUSH_AT_TXN_BOUNDARY: Stats requiring transaction boundaries
- Modify pgstat_flush_pending_entries() and pgstat_flush_fixed_stats()
to accept a boolean anytime_only parameter:
* When false: flushes all stats (existing behavior)
* When true: flushes only FLUSH_ANYTIME stats and skips FLUSH_AT_TXN_BOUNDARY stats
- Register ANYTIME_STATS_UPDATE_TIMEOUT that fires every 1 second, calling
pgstat_report_anytime_stat(false)
The force parameter in pgstat_report_anytime_stat() is currently unused (always
called with force=false) but reserved for future use cases requiring immediate
flushing.
---
src/backend/tcop/postgres.c | 16 ++++
src/backend/utils/activity/pgstat.c | 113 ++++++++++++++++++++++++----
src/backend/utils/init/globals.c | 1 +
src/backend/utils/init/postinit.c | 15 ++++
src/include/miscadmin.h | 1 +
src/include/pgstat.h | 4 +
src/include/utils/pgstat_internal.h | 13 ++++
src/include/utils/timeout.h | 1 +
src/tools/pgindent/typedefs.list | 1 +
9 files changed, 149 insertions(+), 16 deletions(-)
8.2% src/backend/tcop/
69.3% src/backend/utils/activity/
9.7% src/backend/utils/init/
7.7% src/include/utils/
4.5% src/include/
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index e54bf1e760f..9c4a9078ee0 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3530,6 +3530,22 @@ ProcessInterrupts(void)
pgstat_report_stat(true);
}
+ /*
+ * Flush stats outside of transaction boundary if the timeout fired.
+ * Unlike transactional stats, these can be flushed even inside a running
+ * transaction.
+ */
+ if (AnytimeStatsUpdateTimeoutPending)
+ {
+ AnytimeStatsUpdateTimeoutPending = false;
+
+ pgstat_report_anytime_stat(false);
+
+ /* Schedule next timeout */
+ enable_timeout_after(ANYTIME_STATS_UPDATE_TIMEOUT,
+ PGSTAT_ANYTIME_FLUSH_INTERVAL);
+ }
+
if (ProcSignalBarrierPending)
ProcessProcSignalBarrier();
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index 11bb71cad5a..0f45a7d165e 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -187,7 +187,8 @@ static void pgstat_init_snapshot_fixed(void);
static void pgstat_reset_after_failure(void);
-static bool pgstat_flush_pending_entries(bool nowait);
+static bool pgstat_flush_pending_entries(bool nowait, bool anytime_only);
+static bool pgstat_flush_fixed_stats(bool nowait, bool anytime_only);
static void pgstat_prep_snapshot(void);
static void pgstat_build_snapshot(void);
@@ -288,6 +289,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = false,
.write_to_file = true,
+ .flush_behavior = FLUSH_AT_TXN_BOUNDARY,
/* so pg_stat_database entries can be seen in all databases */
.accessed_across_databases = true,
@@ -305,6 +307,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = false,
.write_to_file = true,
+ .flush_behavior = FLUSH_AT_TXN_BOUNDARY,
.shared_size = sizeof(PgStatShared_Relation),
.shared_data_off = offsetof(PgStatShared_Relation, stats),
@@ -321,6 +324,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = false,
.write_to_file = true,
+ .flush_behavior = FLUSH_AT_TXN_BOUNDARY,
.shared_size = sizeof(PgStatShared_Function),
.shared_data_off = offsetof(PgStatShared_Function, stats),
@@ -336,6 +340,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = false,
.write_to_file = true,
+ .flush_behavior = FLUSH_AT_TXN_BOUNDARY,
.accessed_across_databases = true,
@@ -353,6 +358,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = false,
.write_to_file = true,
+ .flush_behavior = FLUSH_AT_TXN_BOUNDARY,
/* so pg_stat_subscription_stats entries can be seen in all databases */
.accessed_across_databases = true,
@@ -370,6 +376,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = false,
.write_to_file = false,
+ .flush_behavior = FLUSH_ANYTIME,
.accessed_across_databases = true,
@@ -388,6 +395,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = true,
.write_to_file = true,
+ .flush_behavior = FLUSH_ANYTIME,
.snapshot_ctl_off = offsetof(PgStat_Snapshot, archiver),
.shared_ctl_off = offsetof(PgStat_ShmemControl, archiver),
@@ -404,6 +412,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = true,
.write_to_file = true,
+ .flush_behavior = FLUSH_ANYTIME,
.snapshot_ctl_off = offsetof(PgStat_Snapshot, bgwriter),
.shared_ctl_off = offsetof(PgStat_ShmemControl, bgwriter),
@@ -420,6 +429,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = true,
.write_to_file = true,
+ .flush_behavior = FLUSH_ANYTIME,
.snapshot_ctl_off = offsetof(PgStat_Snapshot, checkpointer),
.shared_ctl_off = offsetof(PgStat_ShmemControl, checkpointer),
@@ -436,6 +446,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = true,
.write_to_file = true,
+ .flush_behavior = FLUSH_ANYTIME,
.snapshot_ctl_off = offsetof(PgStat_Snapshot, io),
.shared_ctl_off = offsetof(PgStat_ShmemControl, io),
@@ -453,6 +464,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = true,
.write_to_file = true,
+ .flush_behavior = FLUSH_ANYTIME,
.snapshot_ctl_off = offsetof(PgStat_Snapshot, slru),
.shared_ctl_off = offsetof(PgStat_ShmemControl, slru),
@@ -470,6 +482,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = true,
.write_to_file = true,
+ .flush_behavior = FLUSH_ANYTIME,
.snapshot_ctl_off = offsetof(PgStat_Snapshot, wal),
.shared_ctl_off = offsetof(PgStat_ShmemControl, wal),
@@ -775,23 +788,11 @@ pgstat_report_stat(bool force)
partial_flush = false;
/* flush of variable-numbered stats tracked in pending entries list */
- partial_flush |= pgstat_flush_pending_entries(nowait);
+ partial_flush |= pgstat_flush_pending_entries(nowait, false);
/* flush of other stats kinds */
if (pgstat_report_fixed)
- {
- for (PgStat_Kind kind = PGSTAT_KIND_MIN; kind <= PGSTAT_KIND_MAX; kind++)
- {
- const PgStat_KindInfo *kind_info = pgstat_get_kind_info(kind);
-
- if (!kind_info)
- continue;
- if (!kind_info->flush_static_cb)
- continue;
-
- partial_flush |= kind_info->flush_static_cb(nowait);
- }
- }
+ partial_flush |= pgstat_flush_fixed_stats(nowait, false);
last_flush = now;
@@ -1345,9 +1346,14 @@ pgstat_delete_pending_entry(PgStat_EntryRef *entry_ref)
/*
* Flush out pending variable-numbered stats.
+ *
+ * If anytime_only is true, only flushes FLUSH_ANYTIME entries.
+ * This is safe to call inside transactions.
+ *
+ * If anytime_only is false, flushes all entries.
*/
static bool
-pgstat_flush_pending_entries(bool nowait)
+pgstat_flush_pending_entries(bool nowait, bool anytime_only)
{
bool have_pending = false;
dlist_node *cur = NULL;
@@ -1377,6 +1383,20 @@ pgstat_flush_pending_entries(bool nowait)
Assert(!kind_info->fixed_amount);
Assert(kind_info->flush_pending_cb != NULL);
+ /* Skip transactional stats if we're in anytime_only mode */
+ if (anytime_only && kind_info->flush_behavior == FLUSH_AT_TXN_BOUNDARY)
+ {
+ have_pending = true;
+
+ if (dlist_has_next(&pgStatPending, cur))
+ next = dlist_next_node(&pgStatPending, cur);
+ else
+ next = NULL;
+
+ cur = next;
+ continue;
+ }
+
/* flush the stats, if possible */
did_flush = kind_info->flush_pending_cb(entry_ref, nowait);
@@ -1397,11 +1417,42 @@ pgstat_flush_pending_entries(bool nowait)
cur = next;
}
+ /*
+ * When in anytime_only mode, the list may not be empty because
+ * FLUSH_AT_TXN_BOUNDARY entries were skipped.
+ */
Assert(dlist_is_empty(&pgStatPending) == !have_pending);
return have_pending;
}
+/*
+ * Flush fixed-amount stats.
+ *
+ * If anytime_only is true, only flushes FLUSH_ANYTIME stats (safe inside transactions).
+ * If anytime_only is false, flushes all stats with flush_static_cb.
+ */
+static bool
+pgstat_flush_fixed_stats(bool nowait, bool anytime_only)
+{
+ bool partial_flush = false;
+
+ for (PgStat_Kind kind = PGSTAT_KIND_MIN; kind <= PGSTAT_KIND_MAX; kind++)
+ {
+ const PgStat_KindInfo *kind_info = pgstat_get_kind_info(kind);
+
+ if (!kind_info || !kind_info->flush_static_cb)
+ continue;
+
+ /* Skip transactional stats if we're in anytime_only mode */
+ if (anytime_only && kind_info->flush_behavior == FLUSH_AT_TXN_BOUNDARY)
+ continue;
+
+ partial_flush |= kind_info->flush_static_cb(nowait);
+ }
+
+ return partial_flush;
+}
/* ------------------------------------------------------------
* Helper / infrastructure functions
@@ -2119,3 +2170,33 @@ assign_stats_fetch_consistency(int newval, void *extra)
if (pgstat_fetch_consistency != newval)
force_stats_snapshot_clear = true;
}
+
+/*
+ * Flushes only FLUSH_ANYTIME stats using non-blocking locks. Transactional
+ * stats (FLUSH_AT_TXN_BOUNDARY) remain pending until transaction boundary.
+ * Safe to call inside transactions.
+ */
+void
+pgstat_report_anytime_stat(bool force)
+{
+ bool nowait = !force;
+
+ pgstat_assert_is_up();
+
+ /*
+ * Exit if no pending stats at all. This avoids unnecessary work when
+ * backends are idle or in sessions without stats accumulation.
+ *
+ * Note: This check isn't precise as there might be only transactional
+ * stats pending, which we'll skip during the flush. However, maintaining
+ * precise tracking would add complexity that does not seem worth it from
+ * a performance point of view (no noticeable performance regression has
+ * been observed with the current implementation).
+ */
+ if (dlist_is_empty(&pgStatPending) && !pgstat_report_fixed)
+ return;
+
+ /* Flush stats outside of transaction boundary */
+ pgstat_flush_pending_entries(nowait, true);
+ pgstat_flush_fixed_stats(nowait, true);
+}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 36ad708b360..ad44826c39e 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -40,6 +40,7 @@ volatile sig_atomic_t IdleSessionTimeoutPending = false;
volatile sig_atomic_t ProcSignalBarrierPending = false;
volatile sig_atomic_t LogMemoryContextPending = false;
volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
+volatile sig_atomic_t AnytimeStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
volatile uint32 CritSectionCount = 0;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 3f401faf3de..cb0f6aecad1 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -82,6 +82,7 @@ static void TransactionTimeoutHandler(void);
static void IdleSessionTimeoutHandler(void);
static void IdleStatsUpdateTimeoutHandler(void);
static void ClientCheckTimeoutHandler(void);
+static void AnytimeStatsUpdateTimeoutHandler(void);
static bool ThereIsAtLeastOneRole(void);
static void process_startup_options(Port *port, bool am_superuser);
static void process_settings(Oid databaseid, Oid roleid);
@@ -765,6 +766,9 @@ InitPostgres(const char *in_dbname, Oid dboid,
RegisterTimeout(CLIENT_CONNECTION_CHECK_TIMEOUT, ClientCheckTimeoutHandler);
RegisterTimeout(IDLE_STATS_UPDATE_TIMEOUT,
IdleStatsUpdateTimeoutHandler);
+ RegisterTimeout(ANYTIME_STATS_UPDATE_TIMEOUT,
+ AnytimeStatsUpdateTimeoutHandler);
+ enable_timeout_after(ANYTIME_STATS_UPDATE_TIMEOUT, PGSTAT_ANYTIME_FLUSH_INTERVAL);
}
/*
@@ -1446,3 +1450,14 @@ ThereIsAtLeastOneRole(void)
return result;
}
+
+/*
+ * Timeout handler for flushing non-transactional stats.
+ */
+static void
+AnytimeStatsUpdateTimeoutHandler(void)
+{
+ AnytimeStatsUpdateTimeoutPending = true;
+ InterruptPending = true;
+ SetLatch(MyLatch);
+}
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index db559b39c4d..8aeb9628871 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t AnytimeStatsUpdateTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index fff7ecc2533..86e65397614 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -35,6 +35,9 @@
/* Default directory to store temporary statistics data in */
#define PG_STAT_TMP_DIR "pg_stat_tmp"
+/* When to call pgstat_report_anytime_stat() again */
+#define PGSTAT_ANYTIME_FLUSH_INTERVAL 1000
+
/* Values for track_functions GUC variable --- order is significant! */
typedef enum TrackFunctionsLevel
{
@@ -533,6 +536,7 @@ extern void pgstat_initialize(void);
/* Functions called from backends */
extern long pgstat_report_stat(bool force);
+extern void pgstat_report_anytime_stat(bool force);
extern void pgstat_force_next_flush(void);
extern void pgstat_reset_counters(void);
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 9b8fbae00ed..63feae640d1 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -224,6 +224,16 @@ typedef struct PgStat_SubXactStatus
PgStat_TableXactStatus *first; /* head of list for this subxact */
} PgStat_SubXactStatus;
+/*
+ * Flush behavior for statistics kinds.
+ */
+typedef enum PgStat_FlushBehavior
+{
+ FLUSH_ANYTIME, /* All fields can be flushed anytime,
+ * including within transactions */
+ FLUSH_AT_TXN_BOUNDARY, /* All fields can only be flushed at
+ * transaction boundary */
+} PgStat_FlushBehavior;
/*
* Metadata for a specific kind of statistics.
@@ -251,6 +261,9 @@ typedef struct PgStat_KindInfo
*/
bool track_entry_count:1;
+ /* Flush behavior */
+ PgStat_FlushBehavior flush_behavior;
+
/*
* The size of an entry in the shared stats hash table (pointed to by
* PgStatShared_HashEntry->body). For fixed-numbered statistics, this is
diff --git a/src/include/utils/timeout.h b/src/include/utils/timeout.h
index 0965b590b34..10723bb664c 100644
--- a/src/include/utils/timeout.h
+++ b/src/include/utils/timeout.h
@@ -35,6 +35,7 @@ typedef enum TimeoutId
IDLE_SESSION_TIMEOUT,
IDLE_STATS_UPDATE_TIMEOUT,
CLIENT_CONNECTION_CHECK_TIMEOUT,
+ ANYTIME_STATS_UPDATE_TIMEOUT,
STARTUP_PROGRESS_TIMEOUT,
/* First user-definable timeout reason */
USER_TIMEOUT,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 3f3a888fd0e..610b35a9b31 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2268,6 +2268,7 @@ PgStat_Counter
PgStat_EntryRef
PgStat_EntryRefHashEntry
PgStat_FetchConsistency
+PgStat_FlushBehavior
PgStat_FunctionCallUsage
PgStat_FunctionCounts
PgStat_HashKey
--
2.34.1
v2-0002-Remove-useless-calls-to-flush-some-stats.patch (text/x-diff)
From e012b211f86cba606375c8730f12a4d25dae93d4 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Tue, 6 Jan 2026 11:06:31 +0000
Subject: [PATCH v2 2/3] Remove useless calls to flush some stats
Now that some stats can be flushed outside of transaction boundaries, remove
useless calls to report/flush some stats. Those calls were in place because
before commit <XXXX> stats were flushed only at transaction boundaries.
Note that:
- it reverts 039549d70f6 (it just keeps its tests)
- it can't be done for checkpointer and bgworker for example because they don't
have a flush callback to call
- it can't be done for auxiliary process (walsummarizer for example) because they
currently do not register the new timeout handler
---
src/backend/replication/walreceiver.c | 10 ------
src/backend/replication/walsender.c | 36 ++------------------
src/backend/utils/activity/pgstat_relation.c | 13 -------
3 files changed, 2 insertions(+), 57 deletions(-)
75.3% src/backend/replication/
24.6% src/backend/utils/activity/
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index a41453530a1..266379c780a 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -553,16 +553,6 @@ WalReceiverMain(const void *startup_data, size_t startup_data_len)
*/
bool requestReply = false;
- /*
- * Report pending statistics to the cumulative stats
- * system. This location is useful for the report as it
- * is not within a tight loop in the WAL receiver, to
- * avoid bloating pgstats with requests, while also making
- * sure that the reports happen each time a status update
- * is sent.
- */
- pgstat_report_wal(false);
-
/*
* Check if time since last receive from primary has
* reached the configured limit.
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 1ab09655a70..c33185bd337 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -94,14 +94,10 @@
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/pg_lsn.h"
-#include "utils/pgstat_internal.h"
#include "utils/ps_status.h"
#include "utils/timeout.h"
#include "utils/timestamp.h"
-/* Minimum interval used by walsender for stats flushes, in ms */
-#define WALSENDER_STATS_FLUSH_INTERVAL 1000
-
/*
* Maximum data payload in a WAL data message. Must be >= XLOG_BLCKSZ.
*
@@ -1826,7 +1822,6 @@ WalSndWaitForWal(XLogRecPtr loc)
int wakeEvents;
uint32 wait_event = 0;
static XLogRecPtr RecentFlushPtr = InvalidXLogRecPtr;
- TimestampTz last_flush = 0;
/*
* Fast path to avoid acquiring the spinlock in case we already know we
@@ -1847,7 +1842,6 @@ WalSndWaitForWal(XLogRecPtr loc)
{
bool wait_for_standby_at_stop = false;
long sleeptime;
- TimestampTz now;
/* Clear any already-pending wakeups */
ResetLatch(MyLatch);
@@ -1958,8 +1952,7 @@ WalSndWaitForWal(XLogRecPtr loc)
* new WAL to be generated. (But if we have nothing to send, we don't
* want to wake on socket-writable.)
*/
- now = GetCurrentTimestamp();
- sleeptime = WalSndComputeSleeptime(now);
+ sleeptime = WalSndComputeSleeptime(GetCurrentTimestamp());
wakeEvents = WL_SOCKET_READABLE;
@@ -1968,15 +1961,6 @@ WalSndWaitForWal(XLogRecPtr loc)
Assert(wait_event != 0);
- /* Report IO statistics, if needed */
- if (TimestampDifferenceExceeds(last_flush, now,
- WALSENDER_STATS_FLUSH_INTERVAL))
- {
- pgstat_flush_io(false);
- (void) pgstat_flush_backend(false, PGSTAT_BACKEND_FLUSH_IO);
- last_flush = now;
- }
-
WalSndWait(wakeEvents, sleeptime, wait_event);
}
@@ -2879,8 +2863,6 @@ WalSndCheckTimeOut(void)
static void
WalSndLoop(WalSndSendDataCallback send_data)
{
- TimestampTz last_flush = 0;
-
/*
* Initialize the last reply timestamp. That enables timeout processing
* from hereon.
@@ -2975,9 +2957,6 @@ WalSndLoop(WalSndSendDataCallback send_data)
* WalSndWaitForWal() handle any other blocking; idle receivers need
* its additional actions. For physical replication, also block if
* caught up; its send_data does not block.
- *
- * The IO statistics are reported in WalSndWaitForWal() for the
- * logical WAL senders.
*/
if ((WalSndCaughtUp && send_data != XLogSendLogical &&
!streamingDoneSending) ||
@@ -2985,7 +2964,6 @@ WalSndLoop(WalSndSendDataCallback send_data)
{
long sleeptime;
int wakeEvents;
- TimestampTz now;
if (!streamingDoneReceiving)
wakeEvents = WL_SOCKET_READABLE;
@@ -2996,21 +2974,11 @@ WalSndLoop(WalSndSendDataCallback send_data)
* Use fresh timestamp, not last_processing, to reduce the chance
* of reaching wal_sender_timeout before sending a keepalive.
*/
- now = GetCurrentTimestamp();
- sleeptime = WalSndComputeSleeptime(now);
+ sleeptime = WalSndComputeSleeptime(GetCurrentTimestamp());
if (pq_is_send_pending())
wakeEvents |= WL_SOCKET_WRITEABLE;
- /* Report IO statistics, if needed */
- if (TimestampDifferenceExceeds(last_flush, now,
- WALSENDER_STATS_FLUSH_INTERVAL))
- {
- pgstat_flush_io(false);
- (void) pgstat_flush_backend(false, PGSTAT_BACKEND_FLUSH_IO);
- last_flush = now;
- }
-
/* Sleep until something happens or we time out */
WalSndWait(wakeEvents, sleeptime, WAIT_EVENT_WAL_SENDER_MAIN);
}
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index bc8c43b96aa..feae2ae5f44 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -260,15 +260,6 @@ pgstat_report_vacuum(Relation rel, PgStat_Counter livetuples,
}
pgstat_unlock_entry(entry_ref);
-
- /*
- * Flush IO statistics now. pgstat_report_stat() will flush IO stats,
- * however this will not be called until after an entire autovacuum cycle
- * is done -- which will likely vacuum many relations -- or until the
- * VACUUM command has processed all tables and committed.
- */
- pgstat_flush_io(false);
- (void) pgstat_flush_backend(false, PGSTAT_BACKEND_FLUSH_IO);
}
/*
@@ -360,10 +351,6 @@ pgstat_report_analyze(Relation rel,
}
pgstat_unlock_entry(entry_ref);
-
- /* see pgstat_report_vacuum() */
- pgstat_flush_io(false);
- (void) pgstat_flush_backend(false, PGSTAT_BACKEND_FLUSH_IO);
}
/*
--
2.34.1
v2-0003-Add-FLUSH_MIXED-support-and-implement-it-for-RELA.patch (text/x-diff)
From d71d13b8e5a938a8a94362121fa937f9026fb51a Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 19 Jan 2026 06:27:55 +0000
Subject: [PATCH v2 3/3] Add FLUSH_MIXED support and implement it for RELATION
stats
This commit extends the non transactional stats infrastructure to support statistics
kinds with mixed transaction behavior: some fields are transactional (e.g., tuple
inserts/updates/deletes) while others are non transactional (e.g., sequential scans
blocks read, ...).
It introduces FLUSH_MIXED as a third flush behavior type, alongside FLUSH_ANYTIME
and FLUSH_AT_TXN_BOUNDARY. For FLUSH_MIXED kinds, a new flush_anytime_cb callback
enables partial flushing of only the non transactional fields during running
transactions.
Some tests are also added.
Implementation details:
- Add FLUSH_MIXED to PgStat_FlushBehavior enum
- Add flush_anytime_cb to PgStat_KindInfo for partial flushing callback
- Update pgstat_flush_pending_entries() to call flush_anytime_cb for
FLUSH_MIXED entries when in anytime_only mode
- Keep FLUSH_MIXED entries in the pending list after partial flush, as
transactional fields still need to be flushed at transaction boundary
RELATION stats are making use of FLUSH_MIXED:
- Change RELATION from TXN_ALL to FLUSH_MIXED
- Implement pgstat_relation_flush_anytime_cb() to flush only read related
stats: numscans, tuples_returned, tuples_fetched, blocks_fetched,
blocks_hit
- Clear these fields after flushing to prevent double counting when
pgstat_relation_flush_cb() runs at transaction commit
- Transactional stats (tuples_inserted, tuples_updated, tuples_deleted,
live_tuples, dead_tuples) remain pending until transaction boundary
Remark:
We could also imagine adding a new flush_anytime_static_cb() callback for
future FLUSH_MIXED fixed amount stats.
---
doc/src/sgml/monitoring.sgml | 30 +++++++
src/backend/utils/activity/pgstat.c | 36 ++++++---
src/backend/utils/activity/pgstat_relation.c | 82 ++++++++++++++++++++
src/include/utils/pgstat_internal.h | 9 +++
src/test/isolation/expected/stats.out | 40 ++++++++++
src/test/isolation/expected/stats_1.out | 40 ++++++++++
src/test/isolation/specs/stats.spec | 12 +++
7 files changed, 237 insertions(+), 12 deletions(-)
14.5% doc/src/sgml/
47.2% src/backend/utils/activity/
4.7% src/include/utils/
29.4% src/test/isolation/expected/
4.0% src/test/isolation/specs/
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 817fd9f4ca7..15b55016b66 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3730,6 +3730,16 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</tgroup>
</table>
+ <note>
+ <para>
+ All the statistics are updated while the transactions are in progress, except
+ for <structfield>xact_commit</structfield>, <structfield>xact_rollback</structfield>,
+ <structfield>tup_inserted</structfield>, <structfield>tup_updated</structfield> and
+ <structfield>tup_deleted</structfield> that are updated only when the transactions
+ finish.
+ </para>
+ </note>
+
</sect2>
<sect2 id="monitoring-pg-stat-database-conflicts-view">
@@ -4186,6 +4196,17 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</tgroup>
</table>
+ <note>
+ <para>
+ The <structfield>seq_scan</structfield>, <structfield>last_seq_scan</structfield>,
+ <structfield>seq_tup_read</structfield>, <structfield>idx_scan</structfield>,
+ <structfield>last_idx_scan</structfield> and <structfield>idx_tup_fetch</structfield>
+ are updated while the transactions are in progress. This means that we can see
+ those statistics being updated without having to wait until the transaction
+ finishes.
+ </para>
+ </note>
+
</sect2>
<sect2 id="monitoring-pg-stat-all-indexes-view">
@@ -4367,6 +4388,15 @@ description | Waiting for a newly initialized WAL file to reach durable storage
tuples (see <xref linkend="indexes-multicolumn"/>).
</para>
</note>
+ <note>
+ <para>
+ The <structfield>idx_scan</structfield>, <structfield>last_idx_scan</structfield>,
+ <structfield>idx_tup_read</structfield> and <structfield>idx_tup_fetch</structfield>
+ are updated while the transactions are in progress. This means that we can see
+ those statistics being updated without having to wait until the transaction
+ finishes.
+ </para>
+ </note>
<tip>
<para>
<command>EXPLAIN ANALYZE</command> outputs the total number of index
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index 0f45a7d165e..5b93683ea9b 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -289,7 +289,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = false,
.write_to_file = true,
- .flush_behavior = FLUSH_AT_TXN_BOUNDARY,
+ .flush_behavior = FLUSH_ANYTIME,
/* so pg_stat_database entries can be seen in all databases */
.accessed_across_databases = true,
@@ -307,7 +307,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = false,
.write_to_file = true,
- .flush_behavior = FLUSH_AT_TXN_BOUNDARY,
+ .flush_behavior = FLUSH_MIXED,
.shared_size = sizeof(PgStatShared_Relation),
.shared_data_off = offsetof(PgStatShared_Relation, stats),
@@ -315,6 +315,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.pending_size = sizeof(PgStat_TableStatus),
.flush_pending_cb = pgstat_relation_flush_cb,
+ .flush_anytime_cb = pgstat_relation_flush_anytime_cb,
.delete_pending_cb = pgstat_relation_delete_pending_cb,
.reset_timestamp_cb = pgstat_relation_reset_timestamp_cb,
},
@@ -1347,10 +1348,11 @@ pgstat_delete_pending_entry(PgStat_EntryRef *entry_ref)
/*
* Flush out pending variable-numbered stats.
*
- * If anytime_only is true, only flushes FLUSH_ANYTIME entries.
+ * If anytime_only is true, only flushes FLUSH_ANYTIME and FLUSH_MIXED entries,
+ * using flush_anytime_cb for FLUSH_MIXED.
* This is safe to call inside transactions.
*
- * If anytime_only is false, flushes all entries.
+ * If anytime_only is false, flushes all entries using flush_pending_cb.
*/
static bool
pgstat_flush_pending_entries(bool nowait, bool anytime_only)
@@ -1378,6 +1380,7 @@ pgstat_flush_pending_entries(bool nowait, bool anytime_only)
PgStat_Kind kind = key.kind;
const PgStat_KindInfo *kind_info = pgstat_get_kind_info(kind);
bool did_flush;
+ bool is_partial_flush = false;
dlist_node *next;
Assert(!kind_info->fixed_amount);
@@ -1397,8 +1400,21 @@ pgstat_flush_pending_entries(bool nowait, bool anytime_only)
continue;
}
- /* flush the stats, if possible */
- did_flush = kind_info->flush_pending_cb(entry_ref, nowait);
+ /* flush the stats (with the appropriate callback), if possible */
+ if (anytime_only &&
+ kind_info->flush_behavior == FLUSH_MIXED &&
+ kind_info->flush_anytime_cb != NULL)
+ {
+ /* Partial flush of non-transactional fields only */
+ did_flush = kind_info->flush_anytime_cb(entry_ref, nowait);
+ is_partial_flush = true;
+ }
+ else
+ {
+ /* Full flush */
+ did_flush = kind_info->flush_pending_cb(entry_ref, nowait);
+ is_partial_flush = false;
+ }
Assert(did_flush || nowait);
@@ -1408,8 +1424,8 @@ pgstat_flush_pending_entries(bool nowait, bool anytime_only)
else
next = NULL;
- /* if successfully flushed, remove entry */
- if (did_flush)
+ /* if successful and not a partial flush, remove entry */
+ if (did_flush && !is_partial_flush)
pgstat_delete_pending_entry(entry_ref);
else
have_pending = true;
@@ -1417,10 +1433,6 @@ pgstat_flush_pending_entries(bool nowait, bool anytime_only)
cur = next;
}
- /*
- * When in anytime_only mode, the list may not be empty because
- * FLUSH_AT_TXN_BOUNDARY entries were skipped.
- */
Assert(dlist_is_empty(&pgStatPending) == !have_pending);
return have_pending;
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index feae2ae5f44..6d6f333039e 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -887,6 +887,88 @@ pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
return true;
}
+/*
+ * Flush only non-transactional relation stats.
+ *
+ * This is called periodically during running transactions to make some
+ * statistics visible without waiting for the transaction to finish.
+ *
+ * Transactional stats (inserts/updates/deletes and their effects on live/dead
+ * tuple counts) remain in pending until the transaction ends, at which point
+ * pgstat_relation_flush_cb() will flush them.
+ *
+ * If nowait is true and the lock could not be immediately acquired, returns
+ * false without flushing the entry. Otherwise returns true.
+ */
+bool
+pgstat_relation_flush_anytime_cb(PgStat_EntryRef *entry_ref, bool nowait)
+{
+ Oid dboid;
+ PgStat_TableStatus *lstats; /* pending stats entry */
+ PgStatShared_Relation *shtabstats;
+ PgStat_StatTabEntry *tabentry; /* table entry of shared stats */
+ PgStat_StatDBEntry *dbentry; /* pending database entry */
+ bool has_nontxn_stats = false;
+
+ dboid = entry_ref->shared_entry->key.dboid;
+ lstats = (PgStat_TableStatus *) entry_ref->pending;
+ shtabstats = (PgStatShared_Relation *) entry_ref->shared_stats;
+
+ /*
+ * Check if there are any non-transactional stats to flush. Avoid
+ * unnecessarily locking the entry if nothing accumulated.
+ */
+ if (lstats->counts.numscans > 0 ||
+ lstats->counts.tuples_returned > 0 ||
+ lstats->counts.tuples_fetched > 0 ||
+ lstats->counts.blocks_fetched > 0 ||
+ lstats->counts.blocks_hit > 0)
+ has_nontxn_stats = true;
+
+ if (!has_nontxn_stats)
+ return true;
+
+ if (!pgstat_lock_entry(entry_ref, nowait))
+ return false;
+
+ /* Add only the non-transactional values to the shared entry */
+ tabentry = &shtabstats->stats;
+
+ tabentry->numscans += lstats->counts.numscans;
+ if (lstats->counts.numscans)
+ {
+ TimestampTz t = GetCurrentTimestamp();
+
+ if (t > tabentry->lastscan)
+ tabentry->lastscan = t;
+ }
+ tabentry->tuples_returned += lstats->counts.tuples_returned;
+ tabentry->tuples_fetched += lstats->counts.tuples_fetched;
+ tabentry->blocks_fetched += lstats->counts.blocks_fetched;
+ tabentry->blocks_hit += lstats->counts.blocks_hit;
+
+ pgstat_unlock_entry(entry_ref);
+
+ /* Also update the corresponding fields in database stats */
+ dbentry = pgstat_prep_database_pending(dboid);
+ dbentry->tuples_returned += lstats->counts.tuples_returned;
+ dbentry->tuples_fetched += lstats->counts.tuples_fetched;
+ dbentry->blocks_fetched += lstats->counts.blocks_fetched;
+ dbentry->blocks_hit += lstats->counts.blocks_hit;
+
+ /*
+ * Clear the flushed fields from pending stats to prevent double-counting
+ * when pgstat_relation_flush_cb() runs at transaction boundary.
+ */
+ lstats->counts.numscans = 0;
+ lstats->counts.tuples_returned = 0;
+ lstats->counts.tuples_fetched = 0;
+ lstats->counts.blocks_fetched = 0;
+ lstats->counts.blocks_hit = 0;
+
+ return true;
+}
+
void
pgstat_relation_delete_pending_cb(PgStat_EntryRef *entry_ref)
{
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 63feae640d1..c80b8162b37 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -233,6 +233,8 @@ typedef enum PgStat_FlushBehavior
* including within transactions */
FLUSH_AT_TXN_BOUNDARY, /* All fields can only be flushed at
* transaction boundary */
+ FLUSH_MIXED, /* Mix of fields that can be flushed anytime
+ * or only at transaction boundary */
} PgStat_FlushBehavior;
/*
@@ -264,6 +266,12 @@ typedef struct PgStat_KindInfo
/* Flush behavior */
PgStat_FlushBehavior flush_behavior;
+ /*
+ * For PGSTAT_FLUSH_MIXED kinds: callback to flush only some fields. If
+ * NULL for a MIXED kind, treated as PGSTAT_FLUSH_AT_TXN_BOUNDARY.
+ */
+ bool (*flush_anytime_cb) (PgStat_EntryRef *entry_ref, bool nowait);
+
/*
* The size of an entry in the shared stats hash table (pointed to by
* PgStatShared_HashEntry->body). For fixed-numbered statistics, this is
@@ -776,6 +784,7 @@ extern void AtPrepare_PgStat_Relations(PgStat_SubXactStatus *xact_state);
extern void PostPrepare_PgStat_Relations(PgStat_SubXactStatus *xact_state);
extern bool pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait);
+extern bool pgstat_relation_flush_anytime_cb(PgStat_EntryRef *entry_ref, bool nowait);
extern void pgstat_relation_delete_pending_cb(PgStat_EntryRef *entry_ref);
extern void pgstat_relation_reset_timestamp_cb(PgStatShared_Common *header, TimestampTz ts);
diff --git a/src/test/isolation/expected/stats.out b/src/test/isolation/expected/stats.out
index cfad309ccf3..6d62b30e4a7 100644
--- a/src/test/isolation/expected/stats.out
+++ b/src/test/isolation/expected/stats.out
@@ -2245,6 +2245,46 @@ seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum
(1 row)
+starting permutation: s2_begin s2_table_select s1_sleep s1_table_stats s2_table_drop s2_commit
+pg_stat_force_next_flush
+------------------------
+
+(1 row)
+
+step s2_begin: BEGIN;
+step s2_table_select: SELECT * FROM test_stat_tab ORDER BY key, value;
+key|value
+---+-----
+k0 | 1
+(1 row)
+
+step s1_sleep: SELECT pg_sleep(1.5);
+pg_sleep
+--------
+
+(1 row)
+
+step s1_table_stats:
+ SELECT
+ pg_stat_get_numscans(tso.oid) AS seq_scan,
+ pg_stat_get_tuples_returned(tso.oid) AS seq_tup_read,
+ pg_stat_get_tuples_inserted(tso.oid) AS n_tup_ins,
+ pg_stat_get_tuples_updated(tso.oid) AS n_tup_upd,
+ pg_stat_get_tuples_deleted(tso.oid) AS n_tup_del,
+ pg_stat_get_live_tuples(tso.oid) AS n_live_tup,
+ pg_stat_get_dead_tuples(tso.oid) AS n_dead_tup,
+ pg_stat_get_vacuum_count(tso.oid) AS vacuum_count
+ FROM test_stat_oid AS tso
+ WHERE tso.name = 'test_stat_tab'
+
+seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum_count
+--------+------------+---------+---------+---------+----------+----------+------------
+ 1| 1| 1| 0| 0| 1| 0| 0
+(1 row)
+
+step s2_table_drop: DROP TABLE test_stat_tab;
+step s2_commit: COMMIT;
+
starting permutation: s1_track_counts_off s1_table_stats s1_track_counts_on
pg_stat_force_next_flush
------------------------
diff --git a/src/test/isolation/expected/stats_1.out b/src/test/isolation/expected/stats_1.out
index e1d937784cb..2fade10e817 100644
--- a/src/test/isolation/expected/stats_1.out
+++ b/src/test/isolation/expected/stats_1.out
@@ -2253,6 +2253,46 @@ seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum
(1 row)
+starting permutation: s2_begin s2_table_select s1_sleep s1_table_stats s2_table_drop s2_commit
+pg_stat_force_next_flush
+------------------------
+
+(1 row)
+
+step s2_begin: BEGIN;
+step s2_table_select: SELECT * FROM test_stat_tab ORDER BY key, value;
+key|value
+---+-----
+k0 | 1
+(1 row)
+
+step s1_sleep: SELECT pg_sleep(1.5);
+pg_sleep
+--------
+
+(1 row)
+
+step s1_table_stats:
+ SELECT
+ pg_stat_get_numscans(tso.oid) AS seq_scan,
+ pg_stat_get_tuples_returned(tso.oid) AS seq_tup_read,
+ pg_stat_get_tuples_inserted(tso.oid) AS n_tup_ins,
+ pg_stat_get_tuples_updated(tso.oid) AS n_tup_upd,
+ pg_stat_get_tuples_deleted(tso.oid) AS n_tup_del,
+ pg_stat_get_live_tuples(tso.oid) AS n_live_tup,
+ pg_stat_get_dead_tuples(tso.oid) AS n_dead_tup,
+ pg_stat_get_vacuum_count(tso.oid) AS vacuum_count
+ FROM test_stat_oid AS tso
+ WHERE tso.name = 'test_stat_tab'
+
+seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum_count
+--------+------------+---------+---------+---------+----------+----------+------------
+ 0| 0| 1| 0| 0| 1| 0| 0
+(1 row)
+
+step s2_table_drop: DROP TABLE test_stat_tab;
+step s2_commit: COMMIT;
+
starting permutation: s1_track_counts_off s1_table_stats s1_track_counts_on
pg_stat_force_next_flush
------------------------
diff --git a/src/test/isolation/specs/stats.spec b/src/test/isolation/specs/stats.spec
index da16710da0f..1b0168e6176 100644
--- a/src/test/isolation/specs/stats.spec
+++ b/src/test/isolation/specs/stats.spec
@@ -50,6 +50,8 @@ step s1_rollback { ROLLBACK; }
step s1_prepare_a { PREPARE TRANSACTION 'a'; }
step s1_commit_prepared_a { COMMIT PREPARED 'a'; }
step s1_rollback_prepared_a { ROLLBACK PREPARED 'a'; }
+# Has to be greater than PGSTAT_ANYTIME_FLUSH_INTERVAL
+step s1_sleep { SELECT pg_sleep(1.5); }
# Function stats steps
step s1_ff { SELECT pg_stat_force_next_flush(); }
@@ -138,6 +140,7 @@ step s2_commit { COMMIT; }
step s2_commit_prepared_a { COMMIT PREPARED 'a'; }
step s2_rollback_prepared_a { ROLLBACK PREPARED 'a'; }
step s2_ff { SELECT pg_stat_force_next_flush(); }
+step s2_table_drop { DROP TABLE test_stat_tab; }
# Function stats steps
step s2_track_funcs_all { SET track_functions = 'all'; }
@@ -435,6 +438,15 @@ permutation
s1_table_drop
s1_table_stats
+### Check that some stats are updated (seq_scan and seq_tup_read)
+### while the transaction is still running
+permutation
+ s2_begin
+ s2_table_select
+ s1_sleep
+ s1_table_stats
+ s2_table_drop
+ s2_commit
### Check that we don't count changes with track counts off, but allow access
### to prior stats
--
2.34.1
Thanks for the updates!
I don't think this feature could add a noticeable performance impact, so the tests
have been that simple. Do you think we should worry more?

One observation is there's no coordination between ANYTIME and
TXN_BOUNDARY flushes. While PGSTAT_MIN_INTERVAL
prevents a backend from flushing more than once per second, a backend can
still perform both an ANYTIME flush and a TXN_BOUNDARY flush within
the same 1-second window. Not saying this will be a real problem in
the real world, but we definitely took measures in the current
implementation to avoid this scenario.

Right. I think that the PGSTAT_MIN_INTERVAL throttling was put in place to prevent
flushing too frequently when the backend has a high commit rate. But here, while
it's true that we don't follow that rule (meaning a backend could flush more than
once per second), that would be at most twice (given that ANYTIME flushes every
second). So, I'm not sure that this single extra flush is worth worrying about.
Plus we'd certainly need an extra GetCurrentTimestamp() call, so I'm not sure it's
worth it.
Yeah, all PGSTAT_MIN_INTERVAL does is throttle pgstat_flush_pending_entries.
Even in the current state, it does not limit how many kinds are flushed, etc.
I consider the ANYTIME flushes the same as just adding another stats kind.
So, I am not really worried about either.
I have some more comments:
-- v2-0001
#1.
+/* When to call pgstat_report_anytime_stat() again */
+#define PGSTAT_ANYTIME_FLUSH_INTERVAL 1000
+
We should just use PGSTAT_MIN_INTERVAL.
#2.
Instead of ".flush_behavior", maybe ".flush_mode"? "mode" in the name is better
for configuration fields.
#3.
+/*
+ * Flush behavior for statistics kinds.
+ */
+typedef enum PgStat_FlushBehavior
+{
+	FLUSH_ANYTIME,			/* All fields can be flushed anytime,
+							 * including within transactions */
+	FLUSH_AT_TXN_BOUNDARY,	/* All fields can only be flushed at
+							 * transaction boundary */
+} PgStat_FlushBehavior;
FLUSH_AT_TXN_BOUNDARY should be the first value in PgStat_FlushBehavior.
Otherwise kinds (built-in or custom) that do not specify a flush_behavior
will default to FLUSH_ANYTIME. I don't think this is what we want.
FLUSH_AT_TXN_BOUNDARY should be the default.
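In other words, something along these lines (keeping the v2 names; the point is
that the zero-valued member becomes the implicit default for any kind that leaves
flush_behavior unset in its PgStat_KindInfo initializer):
```
typedef enum PgStat_FlushBehavior
{
	FLUSH_AT_TXN_BOUNDARY,		/* all fields can only be flushed at
								 * transaction boundary; zero value, so this
								 * is what kinds that don't set
								 * .flush_behavior get by default */
	FLUSH_ANYTIME,				/* all fields can be flushed anytime,
								 * including within transactions */
} PgStat_FlushBehavior;
```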
#4. Can we add a test here? Maybe generate some WAL inside a long running
transaction and make sure the stats are updated after > 1 second.
-- v2-0002
No comments for this one. With ANYTIME, indeed those flushes are not needed.
-- v2-0003
#1. Should we maybe make this a bit longer? Maybe 2 or 3 seconds?
May make the tests slightly longer, but maybe better for test stability.
```
+step s1_sleep: SELECT pg_sleep(1.5);
+pg_sleep
+--------
```
#2.
+ /*
+ * Check if there are any non-transactional stats to flush. Avoid
+ * unnecessarily locking the entry if nothing accumulated.
+ */
+ if (lstats->counts.numscans > 0 ||
+ lstats->counts.tuples_returned > 0 ||
+ lstats->counts.tuples_fetched > 0 ||
+ lstats->counts.blocks_fetched > 0 ||
+ lstats->counts.blocks_hit > 0)
+ has_nontxn_stats = true;
+
+ if (!has_nontxn_stats)
+ return true;
Can we just do this without a has_nontxn_stats?
This is also the same pattern as a regular flush, although in that case
`pg_memory_is_all_zeros` is used.
```
if (lstats->counts.numscans == 0 &&
lstats->counts.tuples_returned == 0 &&
lstats->counts.tuples_fetched == 0 &&
lstats->counts.blocks_fetched == 0 &&
lstats->counts.blocks_hit == 0)
return true;
```
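For context, the zero-entry short-circuit in the regular flush path looks roughly
like this (a sketch; the exact condition in pgstat_relation_flush_cb may differ).
It checks the whole counts struct, which is why the partial flush above needs the
explicit per-field condition instead:
```
	/* regular flush path: nothing at all accumulated for this entry */
	if (pg_memory_is_all_zeros(&lstats->counts, sizeof(lstats->counts)))
		return true;
```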
#3.
+ are updated while the transactions are in progress. This means that we can see
+ those statistics being updated without having to wait until the transaction
+ finishes.
+ </para>
The "This means ...... " line used several times does not add value, IMO.
"are updated while the transactions are in progress." is sufficient.
#4.
+ <note>
+ <para>
+ All the statistics are updated while the transactions are in progress, except
+ for <structfield>xact_commit</structfield>, <structfield>xact_rollback</structfield>,
+ <structfield>tup_inserted</structfield>, <structfield>tup_updated</structfield> and
+ <structfield>tup_deleted</structfield> that are updated only when the transactions
+ finish.
+ </para>
+ </note>
pgstat_relation_flush_anytime_cb() flushes only these 5 fields, so only the
fields below are "updated while the transactions are in progress", right?
numscans
tuples_returned
tuples_fetched
blocks_fetched
blocks_hit
--
Sami Imseih
Amazon Web Services (AWS)
Hello
@@ -264,6 +266,12 @@ typedef struct PgStat_KindInfo
/* Flush behavior */
PgStat_FlushBehavior flush_behavior;
+ /*
+ * For PGSTAT_FLUSH_MIXED kinds: callback to flush only some fields. If
+ * NULL for a MIXED kind, treated as PGSTAT_FLUSH_AT_TXN_BOUNDARY.
+ */
+ bool (*flush_anytime_cb) (PgStat_EntryRef *entry_ref, bool nowait);
+
The comment seems to use incorrect names; shouldn't they be FLUSH_MIXED and
FLUSH_AT_TXN_BOUNDARY, without the PGSTAT_ prefix?
Hi,
On Tue, Jan 20, 2026 at 01:27:55PM -0600, Sami Imseih wrote:
I have some more comments:
Thanks!
-- v2-0001
#1.
+/* When to call pgstat_report_anytime_stat() again */
+#define PGSTAT_ANYTIME_FLUSH_INTERVAL 1000
+

We should just use PGSTAT_MIN_INTERVAL.
Okay, done. We can still switch to a dedicated one if we feel the need later on.
#2.
instead of ".flush_behavior", maybe ".flush_mode"? "mode" in the name is better
for configuration fields.
Sounds good.
#3.
FLUSH_AT_TXN_BOUNDARY should be the first value in PgStat_FlushBehavior.
Otherwise kinds ( built-in or custom ) that do not specify a flush_behavior
will default to FLUSH_ANYTIME. I don't think this is what we want.
FLUSH_AT_TXN_BOUNDARY should be the default.
Good point, agreed and done.
#4. Can we add a test here? Maybe generate some wal inside a long
running transaction and
make sure the stats are updated after > 1 second
I'm not sure; that's also somewhat the purpose of 0002 (with 039549d70f6 being
reverted).
0001 and 0002 could be merged and pushed as one commit. That said I'm not opposed
if you feel strongly about it.
-- v2-0003
#1. Should we maybe make this a bit longer? maybe 2 or 3 seconds?
May make the tests slightly longer, but maybe better for test stability.
```
+step s1_sleep: SELECT pg_sleep(1.5);
+pg_sleep
+--------
```
Not sure, we could increase if we see the test failing.
#2.

+ /*
+ * Check if there are any non-transactional stats to flush. Avoid
+ * unnecessarily locking the entry if nothing accumulated.
+ */
+ if (lstats->counts.numscans > 0 ||
+ lstats->counts.tuples_returned > 0 ||
+ lstats->counts.tuples_fetched > 0 ||
+ lstats->counts.blocks_fetched > 0 ||
+ lstats->counts.blocks_hit > 0)
+ has_nontxn_stats = true;
+
+ if (!has_nontxn_stats)
+ return true;
Can we just do this without a has_nontxn_stats?
Yeah.
#3.

+ are updated while the transactions are in progress. This means that we can see
+ those statistics being updated without having to wait until the transaction
+ finishes.
+ </para>

The "This means ...... " line used several times does not add value, IMO.
"are updated while the transactions are in progress." is sufficient.
Removed.
#4.

+ <note>
+ <para>
+ All the statistics are updated while the transactions are in progress, except
+ for <structfield>xact_commit</structfield>, <structfield>xact_rollback</structfield>,
+ <structfield>tup_inserted</structfield>, <structfield>tup_updated</structfield> and
+ <structfield>tup_deleted</structfield> that are updated only when the transactions
+ finish.
+ </para>
+ </note>

pgstat_relation_flush_anytime_cb() flushes only these 5 fields, so only the
fields below are "updated while the transactions are in progress", right?

numscans
tuples_returned
tuples_fetched
blocks_fetched
blocks_hit
No, 0003 also changes the flush mode for the database KIND. All the fields that
I mentioned are inherited from relation stats and are flushed only at transaction
boundaries (so they don't appear in pg_stat_database until the transaction
finishes). Does that make sense? (If the database kind is not switched to
flush any time, then none of them would appear while the transaction is in
progress, not even the ones inherited from relation stats.)
PFA v3, which also takes care of Zsolt's comment (thanks!) made up-thread.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Attachments:
v3-0001-Add-pgstat_report_anytime_stat-for-periodic-stats.patch (text/x-diff)
From b9652e6e6031ff88f9b789f99c5c19a38fa61d0c Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 5 Jan 2026 09:41:39 +0000
Subject: [PATCH v3 1/3] Add pgstat_report_anytime_stat() for periodic stats
flushing
Long running transactions can accumulate significant statistics (WAL, IO, ...)
that remain unflushed until the transaction ends. This delays visibility of
resource usage in monitoring views like pg_stat_io and pg_stat_wal.
This commit introduces pgstat_report_anytime_stat(), which flushes
non transactional statistics even inside active transactions. A new timeout
handler fires every second to call this function, ensuring timely stats visibility
without waiting for transaction completion.
Implementation details:
- Add PgStat_FlushMode enum to classify stats kinds:
* FLUSH_ANYTIME: Stats that can always be flushed (WAL, IO, ...)
* FLUSH_AT_TXN_BOUNDARY: Stats requiring transaction boundaries
- Modify pgstat_flush_pending_entries() and pgstat_flush_fixed_stats()
to accept a boolean anytime_only parameter:
* When false: flushes all stats (existing behavior)
* When true: flushes only FLUSH_ANYTIME stats and skips FLUSH_AT_TXN_BOUNDARY stats
- This relies on the existing PGSTAT_MIN_INTERVAL to fire every 1 second, calling
pgstat_report_anytime_stat(false)
The force parameter in pgstat_report_anytime_stat() is currently unused (always
called with force=false) but reserved for future use cases requiring immediate
flushing.
---
src/backend/tcop/postgres.c | 16 ++++
src/backend/utils/activity/pgstat.c | 111 +++++++++++++++++++++++-----
src/backend/utils/init/globals.c | 1 +
src/backend/utils/init/postinit.c | 15 ++++
src/include/miscadmin.h | 1 +
src/include/pgstat.h | 4 +
src/include/utils/pgstat_internal.h | 16 ++++
src/include/utils/timeout.h | 1 +
src/tools/pgindent/typedefs.list | 1 +
9 files changed, 148 insertions(+), 18 deletions(-)
8.1% src/backend/tcop/
68.4% src/backend/utils/activity/
9.6% src/backend/utils/init/
9.2% src/include/utils/
4.1% src/include/
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index e54bf1e760f..132fae61423 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3530,6 +3530,22 @@ ProcessInterrupts(void)
pgstat_report_stat(true);
}
+ /*
+ * Flush stats outside of transaction boundary if the timeout fired.
+ * Unlike transactional stats, these can be flushed even inside a running
+ * transaction.
+ */
+ if (AnytimeStatsUpdateTimeoutPending)
+ {
+ AnytimeStatsUpdateTimeoutPending = false;
+
+ pgstat_report_anytime_stat(false);
+
+ /* Schedule next timeout */
+ enable_timeout_after(ANYTIME_STATS_UPDATE_TIMEOUT,
+ PGSTAT_MIN_INTERVAL);
+ }
+
if (ProcSignalBarrierPending)
ProcessProcSignalBarrier();
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index 11bb71cad5a..ab4d9088a9a 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -122,8 +122,6 @@
* ----------
*/
-/* minimum interval non-forced stats flushes.*/
-#define PGSTAT_MIN_INTERVAL 1000
/* how long until to block flushing pending stats updates */
#define PGSTAT_MAX_INTERVAL 60000
/* when to call pgstat_report_stat() again, even when idle */
@@ -187,7 +185,8 @@ static void pgstat_init_snapshot_fixed(void);
static void pgstat_reset_after_failure(void);
-static bool pgstat_flush_pending_entries(bool nowait);
+static bool pgstat_flush_pending_entries(bool nowait, bool anytime_only);
+static bool pgstat_flush_fixed_stats(bool nowait, bool anytime_only);
static void pgstat_prep_snapshot(void);
static void pgstat_build_snapshot(void);
@@ -288,6 +287,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = false,
.write_to_file = true,
+ .flush_mode = FLUSH_AT_TXN_BOUNDARY,
/* so pg_stat_database entries can be seen in all databases */
.accessed_across_databases = true,
@@ -305,6 +305,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = false,
.write_to_file = true,
+ .flush_mode = FLUSH_AT_TXN_BOUNDARY,
.shared_size = sizeof(PgStatShared_Relation),
.shared_data_off = offsetof(PgStatShared_Relation, stats),
@@ -321,6 +322,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = false,
.write_to_file = true,
+ .flush_mode = FLUSH_AT_TXN_BOUNDARY,
.shared_size = sizeof(PgStatShared_Function),
.shared_data_off = offsetof(PgStatShared_Function, stats),
@@ -336,6 +338,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = false,
.write_to_file = true,
+ .flush_mode = FLUSH_AT_TXN_BOUNDARY,
.accessed_across_databases = true,
@@ -353,6 +356,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = false,
.write_to_file = true,
+ .flush_mode = FLUSH_AT_TXN_BOUNDARY,
/* so pg_stat_subscription_stats entries can be seen in all databases */
.accessed_across_databases = true,
@@ -370,6 +374,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = false,
.write_to_file = false,
+ .flush_mode = FLUSH_ANYTIME,
.accessed_across_databases = true,
@@ -388,6 +393,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = true,
.write_to_file = true,
+ .flush_mode = FLUSH_ANYTIME,
.snapshot_ctl_off = offsetof(PgStat_Snapshot, archiver),
.shared_ctl_off = offsetof(PgStat_ShmemControl, archiver),
@@ -404,6 +410,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = true,
.write_to_file = true,
+ .flush_mode = FLUSH_ANYTIME,
.snapshot_ctl_off = offsetof(PgStat_Snapshot, bgwriter),
.shared_ctl_off = offsetof(PgStat_ShmemControl, bgwriter),
@@ -420,6 +427,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = true,
.write_to_file = true,
+ .flush_mode = FLUSH_ANYTIME,
.snapshot_ctl_off = offsetof(PgStat_Snapshot, checkpointer),
.shared_ctl_off = offsetof(PgStat_ShmemControl, checkpointer),
@@ -436,6 +444,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = true,
.write_to_file = true,
+ .flush_mode = FLUSH_ANYTIME,
.snapshot_ctl_off = offsetof(PgStat_Snapshot, io),
.shared_ctl_off = offsetof(PgStat_ShmemControl, io),
@@ -453,6 +462,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = true,
.write_to_file = true,
+ .flush_mode = FLUSH_ANYTIME,
.snapshot_ctl_off = offsetof(PgStat_Snapshot, slru),
.shared_ctl_off = offsetof(PgStat_ShmemControl, slru),
@@ -470,6 +480,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = true,
.write_to_file = true,
+ .flush_mode = FLUSH_ANYTIME,
.snapshot_ctl_off = offsetof(PgStat_Snapshot, wal),
.shared_ctl_off = offsetof(PgStat_ShmemControl, wal),
@@ -775,23 +786,11 @@ pgstat_report_stat(bool force)
partial_flush = false;
/* flush of variable-numbered stats tracked in pending entries list */
- partial_flush |= pgstat_flush_pending_entries(nowait);
+ partial_flush |= pgstat_flush_pending_entries(nowait, false);
/* flush of other stats kinds */
if (pgstat_report_fixed)
- {
- for (PgStat_Kind kind = PGSTAT_KIND_MIN; kind <= PGSTAT_KIND_MAX; kind++)
- {
- const PgStat_KindInfo *kind_info = pgstat_get_kind_info(kind);
-
- if (!kind_info)
- continue;
- if (!kind_info->flush_static_cb)
- continue;
-
- partial_flush |= kind_info->flush_static_cb(nowait);
- }
- }
+ partial_flush |= pgstat_flush_fixed_stats(nowait, false);
last_flush = now;
@@ -1345,9 +1344,14 @@ pgstat_delete_pending_entry(PgStat_EntryRef *entry_ref)
/*
* Flush out pending variable-numbered stats.
+ *
+ * If anytime_only is true, only flushes FLUSH_ANYTIME entries.
+ * This is safe to call inside transactions.
+ *
+ * If anytime_only is false, flushes all entries.
*/
static bool
-pgstat_flush_pending_entries(bool nowait)
+pgstat_flush_pending_entries(bool nowait, bool anytime_only)
{
bool have_pending = false;
dlist_node *cur = NULL;
@@ -1377,6 +1381,20 @@ pgstat_flush_pending_entries(bool nowait)
Assert(!kind_info->fixed_amount);
Assert(kind_info->flush_pending_cb != NULL);
+ /* Skip transactional stats if we're in anytime_only mode */
+ if (anytime_only && kind_info->flush_mode == FLUSH_AT_TXN_BOUNDARY)
+ {
+ have_pending = true;
+
+ if (dlist_has_next(&pgStatPending, cur))
+ next = dlist_next_node(&pgStatPending, cur);
+ else
+ next = NULL;
+
+ cur = next;
+ continue;
+ }
+
/* flush the stats, if possible */
did_flush = kind_info->flush_pending_cb(entry_ref, nowait);
@@ -1402,6 +1420,33 @@ pgstat_flush_pending_entries(bool nowait)
return have_pending;
}
+/*
+ * Flush fixed-amount stats.
+ *
+ * If anytime_only is true, only flushes FLUSH_ANYTIME stats (safe inside transactions).
+ * If anytime_only is false, flushes all stats with flush_static_cb.
+ */
+static bool
+pgstat_flush_fixed_stats(bool nowait, bool anytime_only)
+{
+ bool partial_flush = false;
+
+ for (PgStat_Kind kind = PGSTAT_KIND_MIN; kind <= PGSTAT_KIND_MAX; kind++)
+ {
+ const PgStat_KindInfo *kind_info = pgstat_get_kind_info(kind);
+
+ if (!kind_info || !kind_info->flush_static_cb)
+ continue;
+
+ /* Skip transactional stats if we're in anytime_only mode */
+ if (anytime_only && kind_info->flush_mode == FLUSH_AT_TXN_BOUNDARY)
+ continue;
+
+ partial_flush |= kind_info->flush_static_cb(nowait);
+ }
+
+ return partial_flush;
+}
/* ------------------------------------------------------------
* Helper / infrastructure functions
@@ -2119,3 +2164,33 @@ assign_stats_fetch_consistency(int newval, void *extra)
if (pgstat_fetch_consistency != newval)
force_stats_snapshot_clear = true;
}
+
+/*
+ * Flushes only FLUSH_ANYTIME stats using non-blocking locks. Transactional
+ * stats (FLUSH_AT_TXN_BOUNDARY) remain pending until transaction boundary.
+ * Safe to call inside transactions.
+ */
+void
+pgstat_report_anytime_stat(bool force)
+{
+ bool nowait = !force;
+
+ pgstat_assert_is_up();
+
+ /*
+ * Exit if no pending stats at all. This avoids unnecessary work when
+ * backends are idle or in sessions without stats accumulation.
+ *
+ * Note: This check isn't precise as there might be only transactional
+ * stats pending, which we'll skip during the flush. However, maintaining
+ * precise tracking would add complexity that does not seem worth it from
+ * a performance point of view (no noticeable performance regression has
+ * been observed with the current implementation).
+ */
+ if (dlist_is_empty(&pgStatPending) && !pgstat_report_fixed)
+ return;
+
+ /* Flush stats outside of transaction boundary */
+ pgstat_flush_pending_entries(nowait, true);
+ pgstat_flush_fixed_stats(nowait, true);
+}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index 36ad708b360..ad44826c39e 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -40,6 +40,7 @@ volatile sig_atomic_t IdleSessionTimeoutPending = false;
volatile sig_atomic_t ProcSignalBarrierPending = false;
volatile sig_atomic_t LogMemoryContextPending = false;
volatile sig_atomic_t IdleStatsUpdateTimeoutPending = false;
+volatile sig_atomic_t AnytimeStatsUpdateTimeoutPending = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 QueryCancelHoldoffCount = 0;
volatile uint32 CritSectionCount = 0;
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 3f401faf3de..6076f531c4a 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -82,6 +82,7 @@ static void TransactionTimeoutHandler(void);
static void IdleSessionTimeoutHandler(void);
static void IdleStatsUpdateTimeoutHandler(void);
static void ClientCheckTimeoutHandler(void);
+static void AnytimeStatsUpdateTimeoutHandler(void);
static bool ThereIsAtLeastOneRole(void);
static void process_startup_options(Port *port, bool am_superuser);
static void process_settings(Oid databaseid, Oid roleid);
@@ -765,6 +766,9 @@ InitPostgres(const char *in_dbname, Oid dboid,
RegisterTimeout(CLIENT_CONNECTION_CHECK_TIMEOUT, ClientCheckTimeoutHandler);
RegisterTimeout(IDLE_STATS_UPDATE_TIMEOUT,
IdleStatsUpdateTimeoutHandler);
+ RegisterTimeout(ANYTIME_STATS_UPDATE_TIMEOUT,
+ AnytimeStatsUpdateTimeoutHandler);
+ enable_timeout_after(ANYTIME_STATS_UPDATE_TIMEOUT, PGSTAT_MIN_INTERVAL);
}
/*
@@ -1446,3 +1450,14 @@ ThereIsAtLeastOneRole(void)
return result;
}
+
+/*
+ * Timeout handler for flushing non-transactional stats.
+ */
+static void
+AnytimeStatsUpdateTimeoutHandler(void)
+{
+ AnytimeStatsUpdateTimeoutPending = true;
+ InterruptPending = true;
+ SetLatch(MyLatch);
+}
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index db559b39c4d..8aeb9628871 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -96,6 +96,7 @@ extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
+extern PGDLLIMPORT volatile sig_atomic_t AnytimeStatsUpdateTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index fff7ecc2533..1651f16f966 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -35,6 +35,9 @@
/* Default directory to store temporary statistics data in */
#define PG_STAT_TMP_DIR "pg_stat_tmp"
+/* Minimum interval between non-forced stats flushes, in ms */
+#define PGSTAT_MIN_INTERVAL 1000
+
/* Values for track_functions GUC variable --- order is significant! */
typedef enum TrackFunctionsLevel
{
@@ -533,6 +536,7 @@ extern void pgstat_initialize(void);
/* Functions called from backends */
extern long pgstat_report_stat(bool force);
+extern void pgstat_report_anytime_stat(bool force);
extern void pgstat_force_next_flush(void);
extern void pgstat_reset_counters(void);
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 9b8fbae00ed..46ce90c9624 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -224,6 +224,19 @@ typedef struct PgStat_SubXactStatus
PgStat_TableXactStatus *first; /* head of list for this subxact */
} PgStat_SubXactStatus;
+/*
+ * Flush mode for statistics kinds.
+ *
+ * FLUSH_AT_TXN_BOUNDARY has to be listed first because we want it to be the
+ * default (zero-initialized) value.
+ */
+typedef enum PgStat_FlushMode
+{
+ FLUSH_AT_TXN_BOUNDARY, /* All fields can only be flushed at
+ * transaction boundary */
+ FLUSH_ANYTIME, /* All fields can be flushed anytime,
+ * including within transactions */
+} PgStat_FlushMode;
/*
* Metadata for a specific kind of statistics.
@@ -251,6 +264,9 @@ typedef struct PgStat_KindInfo
*/
bool track_entry_count:1;
+ /* Flush mode */
+ PgStat_FlushMode flush_mode;
+
/*
* The size of an entry in the shared stats hash table (pointed to by
* PgStatShared_HashEntry->body). For fixed-numbered statistics, this is
diff --git a/src/include/utils/timeout.h b/src/include/utils/timeout.h
index 0965b590b34..10723bb664c 100644
--- a/src/include/utils/timeout.h
+++ b/src/include/utils/timeout.h
@@ -35,6 +35,7 @@ typedef enum TimeoutId
IDLE_SESSION_TIMEOUT,
IDLE_STATS_UPDATE_TIMEOUT,
CLIENT_CONNECTION_CHECK_TIMEOUT,
+ ANYTIME_STATS_UPDATE_TIMEOUT,
STARTUP_PROGRESS_TIMEOUT,
/* First user-definable timeout reason */
USER_TIMEOUT,
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 3f3a888fd0e..d3912b43fdc 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2268,6 +2268,7 @@ PgStat_Counter
PgStat_EntryRef
PgStat_EntryRefHashEntry
PgStat_FetchConsistency
+PgStat_FlushMode
PgStat_FunctionCallUsage
PgStat_FunctionCounts
PgStat_HashKey
--
2.34.1
v3-0002-Remove-useless-calls-to-flush-some-stats.patch
From 6bce329c6597b7bbf72dfa1674e1f0e796a7621d Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Tue, 6 Jan 2026 11:06:31 +0000
Subject: [PATCH v3 2/3] Remove useless calls to flush some stats
Now that some stats can be flushed outside of transaction boundaries, remove
useless calls to report/flush some stats. Those calls were in place because
before commit <XXXX> stats were flushed only at transaction boundaries.
Note that:
- it reverts 039549d70f6 (it just keeps its tests)
- it can't be done for checkpointer and bgworker for example because they don't
have a flush callback to call
- it can't be done for auxiliary process (walsummarizer for example) because they
currently do not register the new timeout handler
---
src/backend/replication/walreceiver.c | 10 ------
src/backend/replication/walsender.c | 36 ++------------------
src/backend/utils/activity/pgstat_relation.c | 13 -------
3 files changed, 2 insertions(+), 57 deletions(-)
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index a41453530a1..266379c780a 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -553,16 +553,6 @@ WalReceiverMain(const void *startup_data, size_t startup_data_len)
*/
bool requestReply = false;
- /*
- * Report pending statistics to the cumulative stats
- * system. This location is useful for the report as it
- * is not within a tight loop in the WAL receiver, to
- * avoid bloating pgstats with requests, while also making
- * sure that the reports happen each time a status update
- * is sent.
- */
- pgstat_report_wal(false);
-
/*
* Check if time since last receive from primary has
* reached the configured limit.
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 1ab09655a70..c33185bd337 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -94,14 +94,10 @@
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/pg_lsn.h"
-#include "utils/pgstat_internal.h"
#include "utils/ps_status.h"
#include "utils/timeout.h"
#include "utils/timestamp.h"
-/* Minimum interval used by walsender for stats flushes, in ms */
-#define WALSENDER_STATS_FLUSH_INTERVAL 1000
-
/*
* Maximum data payload in a WAL data message. Must be >= XLOG_BLCKSZ.
*
@@ -1826,7 +1822,6 @@ WalSndWaitForWal(XLogRecPtr loc)
int wakeEvents;
uint32 wait_event = 0;
static XLogRecPtr RecentFlushPtr = InvalidXLogRecPtr;
- TimestampTz last_flush = 0;
/*
* Fast path to avoid acquiring the spinlock in case we already know we
@@ -1847,7 +1842,6 @@ WalSndWaitForWal(XLogRecPtr loc)
{
bool wait_for_standby_at_stop = false;
long sleeptime;
- TimestampTz now;
/* Clear any already-pending wakeups */
ResetLatch(MyLatch);
@@ -1958,8 +1952,7 @@ WalSndWaitForWal(XLogRecPtr loc)
* new WAL to be generated. (But if we have nothing to send, we don't
* want to wake on socket-writable.)
*/
- now = GetCurrentTimestamp();
- sleeptime = WalSndComputeSleeptime(now);
+ sleeptime = WalSndComputeSleeptime(GetCurrentTimestamp());
wakeEvents = WL_SOCKET_READABLE;
@@ -1968,15 +1961,6 @@ WalSndWaitForWal(XLogRecPtr loc)
Assert(wait_event != 0);
- /* Report IO statistics, if needed */
- if (TimestampDifferenceExceeds(last_flush, now,
- WALSENDER_STATS_FLUSH_INTERVAL))
- {
- pgstat_flush_io(false);
- (void) pgstat_flush_backend(false, PGSTAT_BACKEND_FLUSH_IO);
- last_flush = now;
- }
-
WalSndWait(wakeEvents, sleeptime, wait_event);
}
@@ -2879,8 +2863,6 @@ WalSndCheckTimeOut(void)
static void
WalSndLoop(WalSndSendDataCallback send_data)
{
- TimestampTz last_flush = 0;
-
/*
* Initialize the last reply timestamp. That enables timeout processing
* from hereon.
@@ -2975,9 +2957,6 @@ WalSndLoop(WalSndSendDataCallback send_data)
* WalSndWaitForWal() handle any other blocking; idle receivers need
* its additional actions. For physical replication, also block if
* caught up; its send_data does not block.
- *
- * The IO statistics are reported in WalSndWaitForWal() for the
- * logical WAL senders.
*/
if ((WalSndCaughtUp && send_data != XLogSendLogical &&
!streamingDoneSending) ||
@@ -2985,7 +2964,6 @@ WalSndLoop(WalSndSendDataCallback send_data)
{
long sleeptime;
int wakeEvents;
- TimestampTz now;
if (!streamingDoneReceiving)
wakeEvents = WL_SOCKET_READABLE;
@@ -2996,21 +2974,11 @@ WalSndLoop(WalSndSendDataCallback send_data)
* Use fresh timestamp, not last_processing, to reduce the chance
* of reaching wal_sender_timeout before sending a keepalive.
*/
- now = GetCurrentTimestamp();
- sleeptime = WalSndComputeSleeptime(now);
+ sleeptime = WalSndComputeSleeptime(GetCurrentTimestamp());
if (pq_is_send_pending())
wakeEvents |= WL_SOCKET_WRITEABLE;
- /* Report IO statistics, if needed */
- if (TimestampDifferenceExceeds(last_flush, now,
- WALSENDER_STATS_FLUSH_INTERVAL))
- {
- pgstat_flush_io(false);
- (void) pgstat_flush_backend(false, PGSTAT_BACKEND_FLUSH_IO);
- last_flush = now;
- }
-
/* Sleep until something happens or we time out */
WalSndWait(wakeEvents, sleeptime, WAIT_EVENT_WAL_SENDER_MAIN);
}
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index bc8c43b96aa..feae2ae5f44 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -260,15 +260,6 @@ pgstat_report_vacuum(Relation rel, PgStat_Counter livetuples,
}
pgstat_unlock_entry(entry_ref);
-
- /*
- * Flush IO statistics now. pgstat_report_stat() will flush IO stats,
- * however this will not be called until after an entire autovacuum cycle
- * is done -- which will likely vacuum many relations -- or until the
- * VACUUM command has processed all tables and committed.
- */
- pgstat_flush_io(false);
- (void) pgstat_flush_backend(false, PGSTAT_BACKEND_FLUSH_IO);
}
/*
@@ -360,10 +351,6 @@ pgstat_report_analyze(Relation rel,
}
pgstat_unlock_entry(entry_ref);
-
- /* see pgstat_report_vacuum() */
- pgstat_flush_io(false);
- (void) pgstat_flush_backend(false, PGSTAT_BACKEND_FLUSH_IO);
}
/*
--
2.34.1
v3-0003-Add-FLUSH_MIXED-support-and-implement-it-for-RELA.patch
From 121f38f0fd1105a15d7acf8c25db23a4196b84da Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Date: Mon, 19 Jan 2026 06:27:55 +0000
Subject: [PATCH v3 3/3] Add FLUSH_MIXED support and implement it for RELATION
stats
This commit extends the non transactional stats infrastructure to support statistics
kinds with mixed transaction behavior: some fields are transactional (e.g., tuple
inserts/updates/deletes) while others are non transactional (e.g., sequential scans
blocks read, ...).
It introduces FLUSH_MIXED as a third flush mode type, alongside FLUSH_ANYTIME
and FLUSH_AT_TXN_BOUNDARY. For FLUSH_MIXED kinds, a new flush_anytime_cb callback
enables partial flushing of only the non transactional fields during running
transactions.
Some tests are also added.
Implementation details:
- Add FLUSH_MIXED to PgStat_FlushMode enum
- Add flush_anytime_cb to PgStat_KindInfo for partial flushing callback
- Update pgstat_flush_pending_entries() to call flush_anytime_cb for
FLUSH_MIXED entries when in anytime_only mode
- Keep FLUSH_MIXED entries in the pending list after partial flush, as
transactional fields still need to be flushed at transaction boundary
RELATION stats are making use of FLUSH_MIXED:
- Change RELATION from FLUSH_AT_TXN_BOUNDARY to FLUSH_MIXED
- Implement pgstat_relation_flush_anytime_cb() to flush only read related
stats: numscans, tuples_returned, tuples_fetched, blocks_fetched,
blocks_hit
- Clear these fields after flushing to prevent double counting when
pgstat_relation_flush_cb() runs at transaction commit
- Transactional stats (tuples_inserted, tuples_updated, tuples_deleted,
live_tuples, dead_tuples) remain pending until transaction boundary
The DATABASE kind is also changed from FLUSH_AT_TXN_BOUNDARY to FLUSH_ANYTIME, so
that some stats inherited from relations stats are also visible while the transaction
is in progress.
Remark:
We could also imagine adding a new flush_anytime_static_cb() callback for
future FLUSH_MIXED fixed amount stats.
---
doc/src/sgml/monitoring.sgml | 26 +++++++
src/backend/utils/activity/pgstat.c | 32 ++++++--
src/backend/utils/activity/pgstat_relation.c | 78 ++++++++++++++++++++
src/include/utils/pgstat_internal.h | 9 +++
src/test/isolation/expected/stats.out | 40 ++++++++++
src/test/isolation/expected/stats_1.out | 40 ++++++++++
src/test/isolation/specs/stats.spec | 12 +++
7 files changed, 229 insertions(+), 8 deletions(-)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 817fd9f4ca7..94fd2b76136 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3730,6 +3730,16 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</tgroup>
</table>
+ <note>
+ <para>
+ All the statistics are updated while transactions are in progress, except
+ for <structfield>xact_commit</structfield>, <structfield>xact_rollback</structfield>,
+ <structfield>tup_inserted</structfield>, <structfield>tup_updated</structfield> and
+ <structfield>tup_deleted</structfield>, which are updated only when the transactions
+ finish.
+ </para>
+ </note>
+
</sect2>
<sect2 id="monitoring-pg-stat-database-conflicts-view">
@@ -4186,6 +4196,15 @@ description | Waiting for a newly initialized WAL file to reach durable storage
</tgroup>
</table>
+ <note>
+ <para>
+ The <structfield>seq_scan</structfield>, <structfield>last_seq_scan</structfield>,
+ <structfield>seq_tup_read</structfield>, <structfield>idx_scan</structfield>,
+ <structfield>last_idx_scan</structfield> and <structfield>idx_tup_fetch</structfield>
+ columns are updated while transactions are in progress.
+ </para>
+ </note>
+
</sect2>
<sect2 id="monitoring-pg-stat-all-indexes-view">
@@ -4367,6 +4386,13 @@ description | Waiting for a newly initialized WAL file to reach durable storage
tuples (see <xref linkend="indexes-multicolumn"/>).
</para>
</note>
+ <note>
+ <para>
+ The <structfield>idx_scan</structfield>, <structfield>last_idx_scan</structfield>,
+ <structfield>idx_tup_read</structfield> and <structfield>idx_tup_fetch</structfield>
+ columns are updated while transactions are in progress.
+ </para>
+ </note>
<tip>
<para>
<command>EXPLAIN ANALYZE</command> outputs the total number of index
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index ab4d9088a9a..6733a739c56 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -287,7 +287,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = false,
.write_to_file = true,
- .flush_mode = FLUSH_AT_TXN_BOUNDARY,
+ .flush_mode = FLUSH_ANYTIME,
/* so pg_stat_database entries can be seen in all databases */
.accessed_across_databases = true,
@@ -305,7 +305,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.fixed_amount = false,
.write_to_file = true,
- .flush_mode = FLUSH_AT_TXN_BOUNDARY,
+ .flush_mode = FLUSH_MIXED,
.shared_size = sizeof(PgStatShared_Relation),
.shared_data_off = offsetof(PgStatShared_Relation, stats),
@@ -313,6 +313,7 @@ static const PgStat_KindInfo pgstat_kind_builtin_infos[PGSTAT_KIND_BUILTIN_SIZE]
.pending_size = sizeof(PgStat_TableStatus),
.flush_pending_cb = pgstat_relation_flush_cb,
+ .flush_anytime_cb = pgstat_relation_flush_anytime_cb,
.delete_pending_cb = pgstat_relation_delete_pending_cb,
.reset_timestamp_cb = pgstat_relation_reset_timestamp_cb,
},
@@ -1345,10 +1346,11 @@ pgstat_delete_pending_entry(PgStat_EntryRef *entry_ref)
/*
* Flush out pending variable-numbered stats.
*
- * If anytime_only is true, only flushes FLUSH_ANYTIME entries.
+ * If anytime_only is true, only flushes FLUSH_ANYTIME and FLUSH_MIXED entries,
+ * using flush_anytime_cb for FLUSH_MIXED.
* This is safe to call inside transactions.
*
- * If anytime_only is false, flushes all entries.
+ * If anytime_only is false, flushes all entries using flush_pending_cb.
*/
static bool
pgstat_flush_pending_entries(bool nowait, bool anytime_only)
@@ -1376,6 +1378,7 @@ pgstat_flush_pending_entries(bool nowait, bool anytime_only)
PgStat_Kind kind = key.kind;
const PgStat_KindInfo *kind_info = pgstat_get_kind_info(kind);
bool did_flush;
+ bool is_partial_flush = false;
dlist_node *next;
Assert(!kind_info->fixed_amount);
@@ -1395,8 +1398,21 @@ pgstat_flush_pending_entries(bool nowait, bool anytime_only)
continue;
}
- /* flush the stats, if possible */
- did_flush = kind_info->flush_pending_cb(entry_ref, nowait);
+ /* flush the stats (with the appropriate callback), if possible */
+ if (anytime_only &&
+ kind_info->flush_mode == FLUSH_MIXED &&
+ kind_info->flush_anytime_cb != NULL)
+ {
+ /* Partial flush of non-transactional fields only */
+ did_flush = kind_info->flush_anytime_cb(entry_ref, nowait);
+ is_partial_flush = true;
+ }
+ else
+ {
+ /* Full flush */
+ did_flush = kind_info->flush_pending_cb(entry_ref, nowait);
+ is_partial_flush = false;
+ }
Assert(did_flush || nowait);
@@ -1406,8 +1422,8 @@ pgstat_flush_pending_entries(bool nowait, bool anytime_only)
else
next = NULL;
- /* if successfully flushed, remove entry */
- if (did_flush)
+ /* if successful and not a partial flush, remove the entry */
+ if (did_flush && !is_partial_flush)
pgstat_delete_pending_entry(entry_ref);
else
have_pending = true;
diff --git a/src/backend/utils/activity/pgstat_relation.c b/src/backend/utils/activity/pgstat_relation.c
index feae2ae5f44..d6b799c4354 100644
--- a/src/backend/utils/activity/pgstat_relation.c
+++ b/src/backend/utils/activity/pgstat_relation.c
@@ -887,6 +887,84 @@ pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait)
return true;
}
+/*
+ * Flush only non-transactional relation stats.
+ *
+ * This is called periodically during running transactions to make some
+ * statistics visible without waiting for the transaction to finish.
+ *
+ * Transactional stats (inserts/updates/deletes and their effects on live/dead
+ * tuple counts) remain in pending until the transaction ends, at which point
+ * pgstat_relation_flush_cb() will flush them.
+ *
+ * If nowait is true and the lock could not be immediately acquired, returns
+ * false without flushing the entry. Otherwise returns true.
+ */
+bool
+pgstat_relation_flush_anytime_cb(PgStat_EntryRef *entry_ref, bool nowait)
+{
+ Oid dboid;
+ PgStat_TableStatus *lstats; /* pending stats entry */
+ PgStatShared_Relation *shtabstats;
+ PgStat_StatTabEntry *tabentry; /* table entry of shared stats */
+ PgStat_StatDBEntry *dbentry; /* pending database entry */
+
+ dboid = entry_ref->shared_entry->key.dboid;
+ lstats = (PgStat_TableStatus *) entry_ref->pending;
+ shtabstats = (PgStatShared_Relation *) entry_ref->shared_stats;
+
+ /*
+ * Check if there are any non-transactional stats to flush. Avoid
+ * unnecessarily locking the entry if nothing accumulated.
+ */
+ if (!(lstats->counts.numscans > 0 ||
+ lstats->counts.tuples_returned > 0 ||
+ lstats->counts.tuples_fetched > 0 ||
+ lstats->counts.blocks_fetched > 0 ||
+ lstats->counts.blocks_hit > 0))
+ return true;
+
+ if (!pgstat_lock_entry(entry_ref, nowait))
+ return false;
+
+ /* Add only the non-transactional values to the shared entry */
+ tabentry = &shtabstats->stats;
+
+ tabentry->numscans += lstats->counts.numscans;
+ if (lstats->counts.numscans)
+ {
+ TimestampTz t = GetCurrentTimestamp();
+
+ if (t > tabentry->lastscan)
+ tabentry->lastscan = t;
+ }
+ tabentry->tuples_returned += lstats->counts.tuples_returned;
+ tabentry->tuples_fetched += lstats->counts.tuples_fetched;
+ tabentry->blocks_fetched += lstats->counts.blocks_fetched;
+ tabentry->blocks_hit += lstats->counts.blocks_hit;
+
+ pgstat_unlock_entry(entry_ref);
+
+ /* Also update the corresponding fields in database stats */
+ dbentry = pgstat_prep_database_pending(dboid);
+ dbentry->tuples_returned += lstats->counts.tuples_returned;
+ dbentry->tuples_fetched += lstats->counts.tuples_fetched;
+ dbentry->blocks_fetched += lstats->counts.blocks_fetched;
+ dbentry->blocks_hit += lstats->counts.blocks_hit;
+
+ /*
+ * Clear the flushed fields from pending stats to prevent double-counting
+ * when pgstat_relation_flush_cb() runs at transaction boundary.
+ */
+ lstats->counts.numscans = 0;
+ lstats->counts.tuples_returned = 0;
+ lstats->counts.tuples_fetched = 0;
+ lstats->counts.blocks_fetched = 0;
+ lstats->counts.blocks_hit = 0;
+
+ return true;
+}
+
void
pgstat_relation_delete_pending_cb(PgStat_EntryRef *entry_ref)
{
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 46ce90c9624..5f339f6d2ef 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -236,6 +236,8 @@ typedef enum PgStat_FlushMode
* transaction boundary */
FLUSH_ANYTIME, /* All fields can be flushed anytime,
* including within transactions */
+ FLUSH_MIXED, /* Mix of fields that can be flushed anytime
+ * or only at transaction boundary */
} PgStat_FlushMode;
/*
@@ -267,6 +269,12 @@ typedef struct PgStat_KindInfo
/* Flush mode */
PgStat_FlushMode flush_mode;
+ /*
+ * For FLUSH_MIXED kinds: callback to flush only some fields. If NULL for
+ * a MIXED kind, treated as FLUSH_AT_TXN_BOUNDARY.
+ */
+ bool (*flush_anytime_cb) (PgStat_EntryRef *entry_ref, bool nowait);
+
/*
* The size of an entry in the shared stats hash table (pointed to by
* PgStatShared_HashEntry->body). For fixed-numbered statistics, this is
@@ -779,6 +787,7 @@ extern void AtPrepare_PgStat_Relations(PgStat_SubXactStatus *xact_state);
extern void PostPrepare_PgStat_Relations(PgStat_SubXactStatus *xact_state);
extern bool pgstat_relation_flush_cb(PgStat_EntryRef *entry_ref, bool nowait);
+extern bool pgstat_relation_flush_anytime_cb(PgStat_EntryRef *entry_ref, bool nowait);
extern void pgstat_relation_delete_pending_cb(PgStat_EntryRef *entry_ref);
extern void pgstat_relation_reset_timestamp_cb(PgStatShared_Common *header, TimestampTz ts);
diff --git a/src/test/isolation/expected/stats.out b/src/test/isolation/expected/stats.out
index cfad309ccf3..6d62b30e4a7 100644
--- a/src/test/isolation/expected/stats.out
+++ b/src/test/isolation/expected/stats.out
@@ -2245,6 +2245,46 @@ seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum
(1 row)
+starting permutation: s2_begin s2_table_select s1_sleep s1_table_stats s2_table_drop s2_commit
+pg_stat_force_next_flush
+------------------------
+
+(1 row)
+
+step s2_begin: BEGIN;
+step s2_table_select: SELECT * FROM test_stat_tab ORDER BY key, value;
+key|value
+---+-----
+k0 | 1
+(1 row)
+
+step s1_sleep: SELECT pg_sleep(1.5);
+pg_sleep
+--------
+
+(1 row)
+
+step s1_table_stats:
+ SELECT
+ pg_stat_get_numscans(tso.oid) AS seq_scan,
+ pg_stat_get_tuples_returned(tso.oid) AS seq_tup_read,
+ pg_stat_get_tuples_inserted(tso.oid) AS n_tup_ins,
+ pg_stat_get_tuples_updated(tso.oid) AS n_tup_upd,
+ pg_stat_get_tuples_deleted(tso.oid) AS n_tup_del,
+ pg_stat_get_live_tuples(tso.oid) AS n_live_tup,
+ pg_stat_get_dead_tuples(tso.oid) AS n_dead_tup,
+ pg_stat_get_vacuum_count(tso.oid) AS vacuum_count
+ FROM test_stat_oid AS tso
+ WHERE tso.name = 'test_stat_tab'
+
+seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum_count
+--------+------------+---------+---------+---------+----------+----------+------------
+ 1| 1| 1| 0| 0| 1| 0| 0
+(1 row)
+
+step s2_table_drop: DROP TABLE test_stat_tab;
+step s2_commit: COMMIT;
+
starting permutation: s1_track_counts_off s1_table_stats s1_track_counts_on
pg_stat_force_next_flush
------------------------
diff --git a/src/test/isolation/expected/stats_1.out b/src/test/isolation/expected/stats_1.out
index e1d937784cb..2fade10e817 100644
--- a/src/test/isolation/expected/stats_1.out
+++ b/src/test/isolation/expected/stats_1.out
@@ -2253,6 +2253,46 @@ seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum
(1 row)
+starting permutation: s2_begin s2_table_select s1_sleep s1_table_stats s2_table_drop s2_commit
+pg_stat_force_next_flush
+------------------------
+
+(1 row)
+
+step s2_begin: BEGIN;
+step s2_table_select: SELECT * FROM test_stat_tab ORDER BY key, value;
+key|value
+---+-----
+k0 | 1
+(1 row)
+
+step s1_sleep: SELECT pg_sleep(1.5);
+pg_sleep
+--------
+
+(1 row)
+
+step s1_table_stats:
+ SELECT
+ pg_stat_get_numscans(tso.oid) AS seq_scan,
+ pg_stat_get_tuples_returned(tso.oid) AS seq_tup_read,
+ pg_stat_get_tuples_inserted(tso.oid) AS n_tup_ins,
+ pg_stat_get_tuples_updated(tso.oid) AS n_tup_upd,
+ pg_stat_get_tuples_deleted(tso.oid) AS n_tup_del,
+ pg_stat_get_live_tuples(tso.oid) AS n_live_tup,
+ pg_stat_get_dead_tuples(tso.oid) AS n_dead_tup,
+ pg_stat_get_vacuum_count(tso.oid) AS vacuum_count
+ FROM test_stat_oid AS tso
+ WHERE tso.name = 'test_stat_tab'
+
+seq_scan|seq_tup_read|n_tup_ins|n_tup_upd|n_tup_del|n_live_tup|n_dead_tup|vacuum_count
+--------+------------+---------+---------+---------+----------+----------+------------
+ 0| 0| 1| 0| 0| 1| 0| 0
+(1 row)
+
+step s2_table_drop: DROP TABLE test_stat_tab;
+step s2_commit: COMMIT;
+
starting permutation: s1_track_counts_off s1_table_stats s1_track_counts_on
pg_stat_force_next_flush
------------------------
diff --git a/src/test/isolation/specs/stats.spec b/src/test/isolation/specs/stats.spec
index da16710da0f..1b0168e6176 100644
--- a/src/test/isolation/specs/stats.spec
+++ b/src/test/isolation/specs/stats.spec
@@ -50,6 +50,8 @@ step s1_rollback { ROLLBACK; }
step s1_prepare_a { PREPARE TRANSACTION 'a'; }
step s1_commit_prepared_a { COMMIT PREPARED 'a'; }
step s1_rollback_prepared_a { ROLLBACK PREPARED 'a'; }
+# Has to be greater than PGSTAT_MIN_INTERVAL (1s)
+step s1_sleep { SELECT pg_sleep(1.5); }
# Function stats steps
step s1_ff { SELECT pg_stat_force_next_flush(); }
@@ -138,6 +140,7 @@ step s2_commit { COMMIT; }
step s2_commit_prepared_a { COMMIT PREPARED 'a'; }
step s2_rollback_prepared_a { ROLLBACK PREPARED 'a'; }
step s2_ff { SELECT pg_stat_force_next_flush(); }
+step s2_table_drop { DROP TABLE test_stat_tab; }
# Function stats steps
step s2_track_funcs_all { SET track_functions = 'all'; }
@@ -435,6 +438,15 @@ permutation
s1_table_drop
s1_table_stats
+### Check that some stats are updated (seq_scan and seq_tup_read)
+### while the transaction is still running
+permutation
+ s2_begin
+ s2_table_select
+ s1_sleep
+ s1_table_stats
+ s2_table_drop
+ s2_commit
### Check that we don't count changes with track counts off, but allow access
### to prior stats
--
2.34.1
Thanks for the updated patches!
No, 0003 also changes the flush mode for the database KIND. All the fields that
I mentioned are inherited from relations stats and are flushed only at transaction
boundaries (so they don't appear in pg_stat_database until the transaction
finishes). Does that make sense?
Yes, I understand it clearly now.
But, the Note under pg_stat_database reads like this:
"All the statistics are updated while the transactions are in progress,
except for xact_commit, xact_rollback, tup_inserted, tup_updated
and tup_deleted that are updated only when the transactions finish."
But that is not true for all pg_stat_database fields, such as session_time,
active_time, idle_in_transaction_time, etc. From what I can tell, some of these
fields are only updated when the connection is closed. For example,
in one session run "select pg_sleep(10)" and in another session monitor
pg_stat_database.active_time. That will not be updated until the session
is closed.
This is because these are not relation stats, which makes sense. The
Note section should elaborate more on this, right?
--
Sami Imseih
Amazon Web Services (AWS)
On Wed, Jan 21, 2026 at 10:34:09AM +0000, Bertrand Drouvot wrote:
No, 0003 also changes the flush mode for the database KIND. All the fields that
I mentioned are inherited from relations stats and are flushed only at transaction
boundaries (so they don't appear in pg_stat_database until the transaction
finishes). Does that make sense? (if the database kind is not switched to
flush any time then none would appear while the transaction is in progress, even
the ones inherited from relations stats).

PFA v3, also taking care of Zsolt's comment (thanks!) done up-thread.
While reading through 0001, I got to wonder which properties
and/or assumptions of a stats kind one has to rely on to decide what
flush_mode should be set to. To put it more simply, why don't we just
do a periodic pgstat_report_stat(false) call that would flush all the
stats for all stats kinds based on the new timeout registered,
expanding a bit the flush we currently do when idle in
ProcessInterrupts()? It seems that one point of contention should be
that we should be careful with entries in the shmem hash table that
have been created in a transactional way, but we may already flush
them while we are in a transaction state, no? Are there any fields in
a stats kind that we may not want to flush? If yes, it sounds to
me that it would be better to document these in the structures to
explain the reason why a flush mode is chosen over the other.
I am also not convinced that we have to be that aggressive with these
extra flushes. The target is long-running analytical queries, that
could take minutes or even hours. Using the same value as
PGSTAT_IDLE_INTERVAL (10s), perhaps renaming the value while on it,
would be a more natural fit. A 1s vs 10s report interval does not
really matter for long analytical queries, where I'd imagine data
being picked up on at least a 30s interval, at the shortest. Of
course, one may want to get a more "live" representation of the data
with more aggressive flushes, but is the extra granularity really helpful
for long-running queries, given that it stresses the shmem state more?
--
Michael
No, 0003 also changes the flush mode for the database KIND. All the fields that
I mentioned are inherited from relations stats and are flushed only at transaction
boundaries (so they don't appear in pg_stat_database until the transaction
finishes). Does that make sense? (if the database kind is not switched to
flush any time then none would appear while the transaction is in progress, even
the ones inherited from relations stats).

PFA v3, also taking care of Zsolt's comment (thanks!) done up-thread.
While reading through 0001, I got to question on which properties
and/or assumptions of a stats kind one has to rely on to decide to
what flush_mode should be set. To put is simpler, why don't we just
do a periodic pgstat_report_stat(false) call that would flush all the
stats for all stats kinds based on the new timeout registered,
expanding a bit the flush we currently do when idle in
ProcessInterrupts()?
There are some important cases in which we would want to
distinguish between a "transaction boundary" flush vs an
"anytime" flush.
For example, xact_commit/rollback. I would want those
fields to be in sync with tuples_inserted/updated/deleted
to allow for accurate calculations like number of inserts
per commit, etc.
Another one would be n_mod_since_analyze; that should
only be updated after commit (and not after a rollback). Otherwise,
it may throw autoanalyze threshold calculations way off. Same
for n_dead_tup and autovacuum.
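For context, the analyze trigger is based on that counter; a rough paraphrase
of the check done in autovacuum.c (not the exact code) shows why a premature
flush would matter:

/*
 * Rough paraphrase of the autoanalyze trigger computed in
 * relation_needs_vacanalyze() (autovacuum.c):
 *
 *   anlthresh = autovacuum_analyze_threshold
 *             + autovacuum_analyze_scale_factor * reltuples;
 *   doanalyze = (n_mod_since_analyze > anlthresh);
 *
 * If n_mod_since_analyze were flushed before the transaction's fate is
 * known, doanalyze could become true based on changes that may still be
 * rolled back.
 */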
I am also not convinced that we have to be that aggressive with these
extra flushes. The target is long-running analytical queries, that
could take minutes or even hours. Using the same value as
PGSTAT_IDLE_INTERVAL (10s),
PGSTAT_IDLE_INTERVAL is flushing an idle backend every 10 seconds
IIUC. So this value only applies when outside of a transaction.
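For reference, a rough sketch of how that works today, paraphrased and
simplified from PostgresMain() (the real code also handles the
aborted-transaction case, omitted here):

static bool idle_stats_update_pending = false;	/* flag kept in postgres.c */

/*
 * Paraphrased/simplified: the stats report and IDLE_STATS_UPDATE_TIMEOUT
 * are only armed when the backend goes idle outside of a transaction
 * block, so PGSTAT_IDLE_INTERVAL never kicks in while a transaction is
 * running.
 */
if (!IsTransactionOrTransactionBlock())
{
	long		stats_timeout;

	stats_timeout = pgstat_report_stat(false);
	if (stats_timeout > 0)
	{
		if (!idle_stats_update_pending)
		{
			idle_stats_update_pending = true;
			enable_timeout_after(IDLE_STATS_UPDATE_TIMEOUT, stats_timeout);
		}
	}
	else if (idle_stats_update_pending)
	{
		/* everything got flushed, no need for the timeout anymore */
		disable_timeout(IDLE_STATS_UPDATE_TIMEOUT, false);
		idle_stats_update_pending = false;
	}
}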
A 1s vs 10s report interval does not really matter for long analytical queries.
Sure, Bertrand mentioned early in the thread that the anytime flushes
could be made configurable. Perhaps that is a good idea: we could
default to something large like a 10s interval for anytime flushes, but allow
the user to configure more frequent flushes (although I would think
that 1 sec is the minimum we should allow).
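If we go that route, the GUC itself should be pretty mechanical; a hypothetical
sketch of a guc_tables.c entry (the name, backing variable and bounds below are
made up for illustration):

/* hypothetical entry in ConfigureNamesInt[], guc_tables.c */
{
	{"stats_anytime_flush_interval", PGC_USERSET, STATS_CUMULATIVE,
		gettext_noop("Sets the minimum interval between statistics flushes done within transactions."),
		NULL,
		GUC_UNIT_MS
	},
	&pgstat_anytime_flush_interval,	/* hypothetical variable */
	10000, 1000, INT_MAX,			/* default 10s, floor of 1s */
	NULL, NULL, NULL
},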
--
Sami Imseih
Amazon Web Services (AWS)
On Thu, Jan 22, 2026 at 10:41 AM Sami Imseih <samimseih@gmail.com> wrote:
No, 0003 also changes the flush mode for the database KIND. All the fields that
I mentioned are inherited from relations stats and are flushed only at transaction
boundaries (so they don't appear in pg_stat_database until the transaction
finishes). Does that make sense? (if the database kind is not switched to
flush any time then none would appear while the transaction is in progress, even
the ones inherited from relations stats).

PFA v3, also taking care of Zsolt's comment (thanks!) done up-thread.
While reading through 0001, I got to question on which properties
and/or assumptions of a stats kind one has to rely on to decide to
what flush_mode should be set. To put is simpler, why don't we just
do a periodic pgstat_report_stat(false) call that would flush all the
stats for all stats kinds based on the new timeout registered,
expanding a bit the flush we currently do when idle in
ProcessInterrupts()?

There are some important cases in which we would want to
distinguish between a "transaction boundary" flush vs an
"anytime" flush.

For example, xact_commit/rollback. I would want those
fields to be in sync with tuples_inserted/updated/deleted
to allow for accurate calculations like number of inserts
per commit, etc.

Another one would be n_mod_since_analyze, That should
only be updated after commit (or not after rollback). Otherwise,
it may throw autovanalyze threshold calculations way off. Same
for n_dead_tup and autovacuum.

I am also not convinced that we have to be that aggressive with these
extra flushes. The target is long-running analytical queries, that
could take minutes or even hours. Using the same value as
PGSTAT_IDLE_INTERVAL (10s),

PGSTAT_IDLE_INTERVAL is flushing an idle backend every 10 seconds
IIUC. So this value only applies when outside of a transaction.

A 1s vs 10s report interval does not really matter for long analytical queries.
Sure, Bertrand mentioned early in the thread that the anytime flushes
could be made configurable. Perhaps that is a good idea where we can
default with something large like 10s intervals for anytime flushes, but allow
the user to configure a more frequent flushes ( although I would think
that 1 sec is the minimum we should allow ).
+1 on adding an option to control the interval. With a fixed interval
(for example, 1s), log_lock_waits messages could be emitted that frequently,
which may be annoying for some users.
Of course, it would be even better if these periodic wakeups did not trigger
log_lock_waits messages at all, though.
Regards,
--
Fujii Masao
On Wed, Jan 21, 2026 at 07:41:30PM -0600, Sami Imseih wrote:
Another one would be n_mod_since_analyze, That should
only be updated after commit (or not after rollback). Otherwise,
it may throw autovanalyze threshold calculations way off. Same
for n_dead_tup and autovacuum.
Point taken. It sounds like it is going to be super important to
document in the patch these kinds of current expectations, so that one
does not flip the flush mode one way or another incorrectly, or
assign an incorrect flush mode when adding a new stats kind. It's
probably worth documenting that the end-of-transaction flush should be
the default norm, while the out-of-transaction case should be an
exception one needs to be careful of.
Sure, Bertrand mentioned early in the thread that the anytime flushes
could be made configurable. Perhaps that is a good idea where we can
default with something large like 10s intervals for anytime flushes, but allow
the user to configure a more frequent flushes ( although I would think
that 1 sec is the minimum we should allow ).
Sure, I am just mentioning that we should not be that aggressive for
everybody. If this can be made configurable on a call-basis, even if
it means a new GUC, that may be better in the long run.
--
Michael
Hi,
On Thu, Jan 22, 2026 at 10:56:48AM +0900, Fujii Masao wrote:
On Thu, Jan 22, 2026 at 10:41 AM Sami Imseih <samimseih@gmail.com> wrote:
Sure, Bertrand mentioned early in the thread that the anytime flushes
could be made configurable. Perhaps that is a good idea where we can
default with something large like 10s intervals for anytime flushes, but allow
the user to configure a more frequent flushes ( although I would think
that 1 sec is the minimum we should allow ).

+1 on adding an option to control the interval. With a fixed interval
(for example, 1s), log_lock_waits messages could be emitted that frequently,
which may be annoying for some users.

Of course, it would be even better if these periodic wakeups did not trigger
log_lock_waits messages at all, though.
pgstat_report_anytime_stat() is called with the force parameter set to false,
which means that the flushes are done with nowait = true, meaning that
LWLockConditionalAcquire() is used. In that case, do you still see cases where
log_lock_waits messages could be triggered by the new flush?
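For reference, that nowait path boils down to a conditional lock acquisition,
roughly like this (paraphrased from pgstat_lock_entry() in pgstat_shmem.c):

/*
 * Paraphrase of pgstat_lock_entry(): with nowait = true the shared entry
 * lock is only taken if it is immediately available, so the periodic
 * flush never waits on a contended entry (it simply retries later).
 */
bool
pgstat_lock_entry(PgStat_EntryRef *entry_ref, bool nowait)
{
	LWLock	   *lock = &entry_ref->shared_stats->lock;

	if (nowait)
		return LWLockConditionalAcquire(lock, LW_EXCLUSIVE);

	LWLockAcquire(lock, LW_EXCLUSIVE);
	return true;
}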
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Hi,
On Thu, Jan 22, 2026 at 11:28:06AM +0900, Michael Paquier wrote:
On Wed, Jan 21, 2026 at 07:41:30PM -0600, Sami Imseih wrote:
Another one would be n_mod_since_analyze, That should
only be updated after commit (or not after rollback). Otherwise,
it may throw autovanalyze threshold calculations way off. Same
for n_dead_tup and autovacuum.

Point taken. It sounds like it is going to be super important to
document in the patch these kind of current expectations, so as one
does not flip the flush mode one way or another incorrectly, or
assigns an incorrect flush mode when adding a new stats kind. It's
probably worth documenting that the end-of-transaction flush should be
the default norm, while the out-of-transaction case should be an
exception one needs to be careful of.
Agreed, I'll add more explanations around that.
Sure, Bertrand mentioned early in the thread that the anytime flushes
could be made configurable. Perhaps that is a good idea where we can
default with something large like 10s intervals for anytime flushes, but allow
the user to configure a more frequent flushes ( although I would think
that 1 sec is the minimum we should allow ).

Sure, I am just mentioning that we should not be that aggressive for
everybody.
I'm not opposed to increasing the flush interval, but I suppose most of the monitoring
tools are sampling at a 1s frequency. So, if we set the flush interval to, say, 10s,
that would result in "spikes" every 10s. That's misleading, because it's not a
spike in activity, it's a delay in reporting.

I think that would make sense if we expected the 1s interval to have a negative
impact, but that's not what I expect, nor what I have observed.
If this can be made configurable on a call-basis, even if
it means a new GUC, that may be better in the long run.
If we think that the 1s interval is a problem, we could go in that direction.
Though it might be better to hardcode a larger value instead of letting the users
set values that could be problematic.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Hi,
On Wed, Jan 21, 2026 at 05:41:13PM -0600, Sami Imseih wrote:
Thanks for the updated patches!
No, 0003 also changes the flush mode for the database KIND. All the fields that
I mentioned are inherited from relations stats and are flushed only at transaction
boundaries (so they don't appear in pg_stat_database until the transaction
finishes). Does that make sense?

yes, I understand it clearly now.
But, the Note under pg_stat_database reads like this:
"All the statistics are updated while the transactions are in progress,
except for xact_commit, xact_rollback, tup_inserted, tup_updated
and tup_deleted that are updated only when the transactions finish."But that is not true for all pg_stat_database fields, such as session_time,
active_time, idle_in_transaction_time, etc. From what I can tell some of their
fields are updated when the connection is closed. For example
in one session run "select pg_sleep(10)" and in another session monitor
pg_stat_database.active_time. That will not be updated until the session
is closed.

This is because these are not relation stats, which makes sense. The
Note section should elaborate more on this, right?
Yeah, so, while pgstat_database_flush_cb() is now called every second (if there
are pending stats), not all the stats would have their pending entries updated.
For example, pgstat_update_dbstats() updates some of them: xact_commit, xact_rollback,
blk_read_time, blk_write_time, session_time, active_time and idle_in_transaction_time
but only at transaction boundaries. Indeed, pgstat_update_dbstats() is only called
during pgstat_report_stat() and not during pgstat_report_anytime_stat().
I think that we could:
1. Update the doc as you suggest
or
2. Call a modified version of pgstat_update_dbstats() in pgstat_report_anytime_stat()
that would update blk_read_time, blk_write_time, session_time, active_time and
idle_in_transaction_time but that would require an extra GetCurrentTimestamp()
call.
or
3. Call a modified version of pgstat_update_dbstats() in pgstat_report_anytime_stat()
that would update the same fields as in 2. except session_time, avoiding the need
for an extra GetCurrentTimestamp() call (rough sketch below).
I'm tempted to vote for 1. as I'm not sure of the added value of 2. and 3.,
thoughts?
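For illustration only, a minimal sketch of what 3. could look like (the function
below does not exist; field and accumulator names are borrowed from
pgstat_update_dbstats() and are only indicative):

/*
 * Hypothetical trimmed-down variant of pgstat_update_dbstats() that could
 * be called from pgstat_report_anytime_stat().  It deliberately skips
 * xact_commit, xact_rollback and session_time (the latter to avoid an
 * extra GetCurrentTimestamp() call).
 */
static void
pgstat_update_dbstats_anytime(void)
{
	PgStat_StatDBEntry *dbentry = pgstat_prep_database_pending(MyDatabaseId);

	/* I/O timings can be made visible mid-transaction */
	dbentry->blk_read_time += pgStatBlockReadTime;
	dbentry->blk_write_time += pgStatBlockWriteTime;
	pgStatBlockReadTime = 0;
	pgStatBlockWriteTime = 0;

	/* active/idle-in-transaction time, but not session_time */
	dbentry->active_time += pgStatActiveTime;
	dbentry->idle_in_transaction_time += pgStatTransactionIdleTime;
	pgStatActiveTime = 0;
	pgStatTransactionIdleTime = 0;
}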
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Thu, Jan 22, 2026 at 4:43 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
Hi,
On Thu, Jan 22, 2026 at 10:56:48AM +0900, Fujii Masao wrote:
On Thu, Jan 22, 2026 at 10:41 AM Sami Imseih <samimseih@gmail.com> wrote:
Sure, Bertrand mentioned early in the thread that the anytime flushes
could be made configurable. Perhaps that is a good idea where we can
default with something large like 10s intervals for anytime flushes, but allow
the user to configure a more frequent flushes ( although I would think
that 1 sec is the minimum we should allow ).

+1 on adding an option to control the interval. With a fixed interval
(for example, 1s), log_lock_waits messages could be emitted that frequently,
which may be annoying for some users.

Of course, it would be even better if these periodic wakeups did not trigger
log_lock_waits messages at all, though.

pgstat_report_anytime_stat() is called with the force parameter set to false,
means that the flushes are done with nowait = true means that LWLockConditionalAcquire()
is used. In that case, do you still see cases where log_lock_waits messages could
be triggered due to the new flush?
I haven't read the patch in detail yet, but after applying patch 0001 and
causing a lock wait (for example, using the steps below), I observed that
log_lock_waits messages are emitted every second.
[session 1]
create table tbl as select id from generate_series(1, 10) id;
begin;
select * from tbl where id = 1 for update;
[session 2]
begin;
select * from tbl where id = 1 for update;
With this setup, the following messages were logged once per second:
LOG: process 72199 still waiting for ShareLock on transaction 771
after 63034.119 ms
DETAIL: Process holding the lock: 72190. Wait queue: 72199.
Regards,
--
Fujii Masao
Hi,
On Thu, Jan 22, 2026 at 09:12:18PM +0900, Fujii Masao wrote:
On Thu, Jan 22, 2026 at 4:43 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:

pgstat_report_anytime_stat() is called with the force parameter set to false,
means that the flushes are done with nowait = true means that LWLockConditionalAcquire()
is used. In that case, do you still see cases where log_lock_waits messages could
be triggered due to the new flush?I haven't read the patch in detail yet, but after applying patch 0001 and
causing a lock wait (for example, using the steps below), I observed that
log_lock_waits messages are emitted every second.

[session 1]
create table tbl as select id from generate_series(1, 10) id;
begin;
select * from tbl where id = 1 for update;

[session 2]
begin;
select * from tbl where id = 1 for update;

With this setup, the following messages were logged once per second:
LOG: process 72199 still waiting for ShareLock on transaction 771
after 63034.119 ms
DETAIL: Process holding the lock: 72190. Wait queue: 72199.
Thanks!
I see, the WaitLatch() in ProcSleep() is "woken up" every 1s due to the
enable_timeout_after(ANYTIME_STATS_UPDATE_TIMEOUT,...) being set unconditionally
in ProcessInterrupts(). We need to be more restrictive as to when to enable the
timeout; I'll fix that in the next version.
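Just thinking out loud, being more restrictive could take a shape like the
sketch below, though the lock-wait case you reported may need more than this
(pgstat_have_pending_stats() is a hypothetical helper standing for "pending
entries exist or pgstat_report_fixed is set"):

/*
 * Hypothetical sketch: only re-arm the anytime-stats timeout when the
 * backend actually has something pending, so that backends sitting in
 * e.g. ProcSleep() are not woken up every second for nothing.
 */
if (AnytimeStatsUpdateTimeoutPending)
{
	AnytimeStatsUpdateTimeoutPending = false;
	pgstat_report_anytime_stat(false);

	if (pgstat_have_pending_stats())	/* hypothetical helper */
		enable_timeout_after(ANYTIME_STATS_UPDATE_TIMEOUT,
							 PGSTAT_MIN_INTERVAL);
}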
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
For example, pgstat_update_dbstats() updates some of them: xact_commit, xact_rollback,
blk_read_time, blk_write_time, session_time, active_time and idle_in_transaction_time
but only at transaction boundaries. Indeed, pgstat_update_dbstats() is only called
during pgstat_report_stat() and not during pgstat_report_anytime_stat().

I think that we could:
1. Update the doc as you suggest
I am thinking the _time related fields are OK to be non-anytime fields,
since they have overhead and can also be actively monitored from
pg_stat_activity if someone really needs real-time information.
The other session related counters don't need special consideration,
and the parallel counters are anytime.
So, the documentation can mention that the _time related fields are flushed
only at their appropriate times.
Maybe something general like this:
"Some statistics are updated while a transaction is in progress.
Statistics that either do
not depend on transactions or require transactional consistency are
updated only
when the transaction ends. Statistics that require transactional consistency
include xact_commit, xact_rollback, tup_inserted, tup_updated, and tup_deleted."
What do you think?
--
Sami Imseih
Amazon Web Services (AWS)