Is a pg_stat_force_next_flush() call sufficient for regression tests?

Started by Tomas Vondraover 2 years ago3 messages
#1Tomas Vondra
tomas.vondra@enterprisedb.com

Hi,

I noticed eelpout failed with this in stats.sql, in the pg_stat_io tests
added a couple months ago [1]https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=eelpout&dt=2023-07-03%2011%3A09%3A13:

@@ -1415,7 +1415,7 @@
        :io_sum_vac_strategy_after_reuses >
:io_sum_vac_strategy_before_reuses;
  ?column? | ?column?
 ----------+----------
- t        | t
+ t        | f
 (1 row)

The failure seems completely unrelated to the new commit, so this seems
like some randomness / timing issue. The failing bit does this:

----------------------------
VACUUM (PARALLEL 0, BUFFER_USAGE_LIMIT 128) test_io_vac_strategy;
SELECT pg_stat_force_next_flush();
SELECT sum(reuses) AS reuses, sum(reads) AS reads
FROM pg_stat_io WHERE context = 'vacuum' \gset io_sum_vac_strategy_after_
SELECT :io_sum_vac_strategy_after_reads > :io_sum_vac_strategy_before_reads,
:io_sum_vac_strategy_after_reuses >
:io_sum_vac_strategy_before_reuses;
----------------------------

So I'm wondering if pg_stat_force_next_flush() is enough - AFAICS this
only sets some flag for the *next* pgstat_report_stat() call, but how do
we know that happens before the query execution?

Shouldn't there be something like pg_stat_flush() that actually does the
flushing, instead of just setting the flag?

regards

[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=eelpout&dt=2023-07-03%2011%3A09%3A13
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=eelpout&dt=2023-07-03%2011%3A09%3A13

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#2Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Tomas Vondra (#1)
2 attachment(s)
Re: Is a pg_stat_force_next_flush() call sufficient for regression tests?

At Mon, 3 Jul 2023 15:45:52 +0200, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote in

So I'm wondering if pg_stat_force_next_flush() is enough - AFAICS this
only sets some flag for the *next* pgstat_report_stat() call, but how do
we know that happens before the query execution?

Shouldn't there be something like pg_stat_flush() that actually does the
flushing, instead of just setting the flag?

The reason for the function is that pg_stat_flush() is supposed not to
be called within a transaction. AFAICS pg_stat_force_next_flush()
takes effect after a successfull transaction end and before the next
command execution.

To verify this, I put in an assertion to check that the flag gets
consumed before reading of pg_stat_io (a.diff), then ran pgbench with
the attached custom script. As expected, it didn't fire at all during
several trials. When I wrapped all lines in t.sql within a
begin-commit block, the assertion fired off immediately as a matter of
course.

Is there any chance concurrent backends or some other things can
actually hinder the backend from reusing buffers?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

a.difftext/x-patch; charset=us-asciiDownload
diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index d743fc0b28..6529e737ea 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -225,7 +225,7 @@ static dlist_head pgStatPending = DLIST_STATIC_INIT(pgStatPending);
  * Force the next stats flush to happen regardless of
  * PGSTAT_MIN_INTERVAL. Useful in test scripts.
  */
-static bool pgStatForceNextFlush = false;
+bool pgStatForceNextFlush = false;
 
 /*
  * Force-clear existing snapshot before next use when stats_fetch_consistency
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 2a4c8ef87f..7f60b2834a 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -1352,7 +1352,9 @@ pg_stat_get_io(PG_FUNCTION_ARGS)
 	ReturnSetInfo *rsinfo;
 	PgStat_IO  *backends_io_stats;
 	Datum		reset_time;
+	extern bool pgStatForceNextFlush;
 
+	Assert(!pgStatForceNextFlush);
 	InitMaterializedSRF(fcinfo, 0);
 	rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
 
t.sqltext/plain; charset=us-asciiDownload
#3Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Kyotaro Horiguchi (#2)
Re: Is a pg_stat_force_next_flush() call sufficient for regression tests?

On 7/4/23 04:29, Kyotaro Horiguchi wrote:

At Mon, 3 Jul 2023 15:45:52 +0200, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote in

So I'm wondering if pg_stat_force_next_flush() is enough - AFAICS this
only sets some flag for the *next* pgstat_report_stat() call, but how do
we know that happens before the query execution?

Shouldn't there be something like pg_stat_flush() that actually does the
flushing, instead of just setting the flag?

The reason for the function is that pg_stat_flush() is supposed not to
be called within a transaction. AFAICS pg_stat_force_next_flush()
takes effect after a successfull transaction end and before the next
command execution.

Sure, if we're supposed to report the stats only at the end of a
transaction, that makes sense. But then why didn't that happen here?

To verify this, I put in an assertion to check that the flag gets
consumed before reading of pg_stat_io (a.diff), then ran pgbench with
the attached custom script. As expected, it didn't fire at all during
several trials. When I wrapped all lines in t.sql within a
begin-commit block, the assertion fired off immediately as a matter of
course.

If I understand correctly, this just verifies that

1) if everything goes well, we report the stats at the end of the
transaction (otherwise the case without BEGIN/COMMIT would fail)

2) we don't report stats when in a transaction (with the BEGIN/COMMIT)

But the eelpout failure clearly suggests this may misbehave.

Is there any chance concurrent backends or some other things can
actually hinder the backend from reusing buffers?

No idea. I'm not very familiar with the reworked pgstat system, but
either the pgstat_report_stat() was not called for some reason, or it
decided there's nothing to report (i.e. have_iostats==false). Not sure
why would that happen.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company