stats.sql might fail due to shared buffers also used by parallel tests

Started by Alexander Lakhin, about 1 year ago, 12 messages
#1 Alexander Lakhin
exclusion@gmail.com
1 attachment(s)

Hello hackers,

I'd like to share my investigation of one mysterious stats.sql failure
that occurred in December: [1].
The diff of the failure is:
--- C:/prog/bf/root/HEAD/pgsql/src/test/regress/expected/stats.out 2024-09-18 19:31:14.665516500 +0000
+++ C:/prog/bf/root/HEAD/pgsql.build/testrun/recovery/027_stream_regress/data/results/stats.out 2024-12-17 
09:57:08.944588500 +0000
@@ -1291,7 +1291,7 @@
  SELECT :io_sum_shared_after_writes > :io_sum_shared_before_writes;
   ?column?
  ----------
- t
+ f
  (1 row)

027_stream_regress_primary.log contains:
2024-12-17 09:57:06.778 UTC [8568:109] pg_regress/stats LOG: statement: SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
      FROM pg_stat_io
      WHERE object = 'relation'
2024-12-17 09:57:06.779 UTC [8568:110] pg_regress/stats LOG: statement: CREATE TABLE test_io_shared(a int);
2024-12-17 09:57:06.780 UTC [8568:111] pg_regress/stats LOG: statement: INSERT INTO test_io_shared SELECT i FROM
generate_series(1,100)i;
...
2024-12-17 09:57:06.782 UTC [8568:115] pg_regress/stats LOG: statement: CHECKPOINT;
2024-12-17 09:57:06.794 UTC [3664:40] LOG:  checkpoint starting: immediate force wait
2024-12-17 09:57:06.856 UTC [3664:41] LOG:  checkpoint complete: wrote 0 buffers (0.0%), wrote 1 SLRU buffers; 0 WAL
file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.001 s, total=0.062 s; sync files=0, longest=0.000 s,
average=0.000 s; distance=1875 kB, estimate=52682 kB; lsn=0/14A2F410, redo lsn=0/14A2F3B8
2024-12-17 09:57:06.857 UTC [8568:116] pg_regress/stats LOG: statement: CHECKPOINT;
2024-12-17 09:57:06.857 UTC [3664:42] LOG:  checkpoint starting: immediate force wait
2024-12-17 09:57:06.859 UTC [3664:43] LOG:  checkpoint complete: wrote 0 buffers (0.0%), wrote 0 SLRU buffers; 0 WAL
file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.001 s, total=0.002 s; sync files=0, longest=0.000 s,
average=0.000 s; distance=0 kB, estimate=47414 kB; lsn=0/14A2F4E0, redo lsn=0/14A2F488
2024-12-17 09:57:06.859 UTC [8568:117] pg_regress/stats LOG: statement: SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
      FROM pg_stat_io
      WHERE object = 'relation'
2024-12-17 09:57:06.860 UTC [8568:118] pg_regress/stats LOG: statement: SELECT 77693 > 77693;

The corresponding test case (added by 10a082bf7, dated 2023-02-11) is:
SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
  FROM pg_stat_io
  WHERE object = 'relation' \gset io_sum_shared_before_
...
CREATE TABLE test_io_shared(a int);
INSERT INTO test_io_shared SELECT i FROM generate_series(1,100)i;
...
CHECKPOINT;
CHECKPOINT;
...
SELECT sum(writes) AS writes, sum(fsyncs) AS fsyncs
  FROM pg_stat_get_backend_io(pg_backend_pid())
  WHERE object = 'relation' \gset my_io_sum_shared_after_
SELECT :my_io_sum_shared_after_writes >= :my_io_sum_shared_before_writes;

So, as we can see from the log above, both checkpoints wrote 0 buffers;
moreover, no other process/backend wrote any buffers either.

After a series of experiments, I found that this situation is possible when
another backend steals shared buffers from the checkpointer, without making
new buffers dirty, and then delays flushing its statistics.
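
(For diagnosing cases like this, a query along the following lines, against
the pg_stat_io view available since v16, can show which backend types actually
performed the relation writes; in this scenario they are attributed to the
buffer-stealing client backend, whose counters get flushed only later. This is
just an illustrative diagnostic, not part of the test.)

-- Illustrative diagnostic only: which backend types wrote out relation
-- buffers, and in which IO context.
SELECT backend_type, context, sum(writes) AS total_writes
  FROM pg_stat_io
 WHERE object = 'relation'
 GROUP BY backend_type, context
 ORDER BY total_writes DESC NULLS LAST;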

Please find attached a reproducer for this issue: I've added two sleeps
to stats.sql and reduced parallel_schedule, but I believe these are legitimate
changes that don't essentially affect the test case.

With bufchurn.sql running concurrently, and with TEMP_CONFIG containing
shared_buffers = 1MB (as in the 027_stream_regress test in question), I can
see exactly the same failure:
$ echo 'shared_buffers = 1MB' >/tmp/extra.config; TEMP_CONFIG=/tmp/extra.config make -s check
# +++ regress check in src/test/regress +++
# initializing database system by copying initdb template
# using temp instance on port 65312 with PID 129700
ok 1         - test_setup                                356 ms
ok 2         - create_index                             1002 ms
# parallel group (2 tests):  stats bufchurn
not ok 3     + stats                                   21176 ms
...

diff -U3 .../src/test/regress/expected/stats.out .../src/test/regress/results/stats.out
--- .../src/test/regress/expected/stats.out  2025-01-06 08:03:07.720275980 +0200
+++ .../src/test/regress/results/stats.out   2025-01-06 08:15:34.602170721 +0200
@@ -1321,7 +1321,7 @@
  SELECT :io_sum_shared_after_writes > :io_sum_shared_before_writes;
   ?column?
  ----------
- t
+ f
  (1 row)
...

[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2024-12-17%2008%3A59%3A44

Best regards,
Alexander

Attachments:

stats-bufs-write-repro.patch (text/x-patch)
diff --git a/src/test/regress/expected/bufchurn.out b/src/test/regress/expected/bufchurn.out
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/src/test/regress/expected/stats.out b/src/test/regress/expected/stats.out
index a0317b7208..651898b9f4 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -834,6 +834,12 @@ SELECT num_requested AS rqst_ckpts_before FROM pg_stat_checkpointer \gset
 SELECT wal_bytes AS wal_bytes_before FROM pg_stat_wal \gset
 CREATE TEMP TABLE test_stats_temp AS SELECT 17;
 DROP TABLE test_stats_temp;
+SELECT pg_sleep(10);
+ pg_sleep 
+----------
+ 
+(1 row)
+
 -- Checkpoint twice: The checkpointer reports stats after reporting completion
 -- of the checkpoint. But after a second checkpoint we'll see at least the
 -- results of the first.
@@ -1298,6 +1304,12 @@ SELECT :my_io_sum_shared_after_extends > :my_io_sum_shared_before_extends;
  t
 (1 row)
 
+SELECT pg_sleep(10);
+ pg_sleep 
+----------
+ 
+(1 row)
+
 -- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
 -- and fsyncs in the global stats (usually not for the backend).
 -- See comment above for rationale for two explicit CHECKPOINTs.
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 1edd9e45eb..2142842a31 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -1,138 +1,4 @@
-# ----------
-# src/test/regress/parallel_schedule
-#
-# Most test scripts can be run after running just test_setup and possibly
-# create_index.  Exceptions to this rule are documented below.
-#
-# By convention, we put no more than twenty tests in any one parallel group;
-# this limits the number of connections needed to run the tests.
-# ----------
-
-# required setup steps
 test: test_setup
-
-# ----------
-# The first group of parallel tests
-# ----------
-test: boolean char name varchar text int2 int4 int8 oid float4 float8 bit numeric txid uuid enum money rangetypes pg_lsn regproc
-
-# ----------
-# The second group of parallel tests
-# multirangetypes depends on rangetypes
-# multirangetypes shouldn't run concurrently with type_sanity
-# ----------
-test: strings md5 numerology point lseg line box path polygon circle date time timetz timestamp timestamptz interval inet macaddr macaddr8 multirangetypes
-
-# ----------
-# Another group of parallel tests
-# geometry depends on point, lseg, line, box, path, polygon, circle
-# horology depends on date, time, timetz, timestamp, timestamptz, interval
-# ----------
-test: geometry horology tstypes regex type_sanity opr_sanity misc_sanity comments expressions unicode xid mvcc database stats_import
-
-# ----------
-# Load huge amounts of data
-# We should split the data files into single files and then
-# execute two copy tests in parallel, to check that copy itself
-# is concurrent safe.
-# ----------
-test: copy copyselect copydml copyencoding insert insert_conflict
-
-# ----------
-# More groups of parallel tests
-# Note: many of the tests in later groups depend on create_index
-# ----------
-test: create_function_c create_misc create_operator create_procedure create_table create_type create_schema
-test: create_index create_index_spgist create_view index_including index_including_gist
-
-# ----------
-# Another group of parallel tests
-# ----------
-test: create_aggregate create_function_sql create_cast constraints triggers select inherit typed_table vacuum drop_if_exists updatable_views roleattributes create_am hash_func errors infinite_recurse
-
-# ----------
-# sanity_check does a vacuum, affecting the sort order of SELECT *
-# results. So it should not run parallel to other tests.
-# ----------
-test: sanity_check
-
-# ----------
-# Another group of parallel tests
-# aggregates depends on create_aggregate
-# join depends on create_misc
-# ----------
-test: select_into select_distinct select_distinct_on select_implicit select_having subselect union case join aggregates transactions random portals arrays btree_index hash_index update delete namespace prepared_xacts
-
-# ----------
-# Another group of parallel tests
-# ----------
-test: brin gin gist spgist privileges init_privs security_label collate matview lock replica_identity rowsecurity object_address tablesample groupingsets drop_operator password identity generated_stored join_hash
-
-# ----------
-# Additional BRIN tests
-# ----------
-test: brin_bloom brin_multi
-
-# ----------
-# Another group of parallel tests
-# psql depends on create_am
-# amutils depends on geometry, create_index_spgist, hash_index, brin
-# ----------
-test: create_table_like alter_generic alter_operator misc async dbsize merge misc_functions sysviews tsrf tid tidscan tidrangescan collate.utf8 collate.icu.utf8 incremental_sort create_role without_overlaps
-
-# collate.linux.utf8 and collate.icu.utf8 tests cannot be run in parallel with each other
-test: rules psql psql_crosstab amutils stats_ext collate.linux.utf8 collate.windows.win1252
-
-# ----------
-# Run these alone so they don't run out of parallel workers
-# select_parallel depends on create_misc
-# ----------
-test: select_parallel
-test: write_parallel
-test: vacuum_parallel
-
-# no relation related tests can be put in this group
-test: publication subscription
-
-# ----------
-# Another group of parallel tests
-# select_views depends on create_view
-# ----------
-test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combocid tsearch tsdicts foreign_data window xmlmap functional_deps advisory_lock indirect_toast equivclass
-
-# ----------
-# Another group of parallel tests (JSON related)
-# ----------
-test: json jsonb json_encoding jsonpath jsonpath_encoding jsonb_jsonpath sqljson sqljson_queryfuncs sqljson_jsontable
-
-# ----------
-# Another group of parallel tests
-# with depends on create_misc
-# NB: temp.sql does a reconnect which transiently uses 2 connections,
-# so keep this parallel group to at most 19 tests
-# ----------
-test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion truncate alter_table sequence polymorphism rowtypes returning largeobject with xml
-
-# ----------
-# Another group of parallel tests
-#
-# The stats test resets stats, so nothing else needing stats access can be in
-# this group.
-# ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate
-
-# event_trigger depends on create_am and cannot run concurrently with
-# any test that runs DDL
-# oidjoins is read-only, though, and should run late for best coverage
-test: oidjoins event_trigger
-
-# event_trigger_login cannot run concurrently with any other tests because
-# on-login event handling could catch connection of a concurrent test.
-test: event_trigger_login
-
-# this test also uses event triggers, so likewise run it by itself
-test: fast_default
-
-# run tablespace test at the end because it drops the tablespace created during
-# setup that other tests may use.
-test: tablespace
+test: create_index
+test: stats bufchurn
+# test: ministats bufchurn
diff --git a/src/test/regress/sql/bufchurn.sql b/src/test/regress/sql/bufchurn.sql
new file mode 100644
index 0000000000..c29ac4acf9
--- /dev/null
+++ b/src/test/regress/sql/bufchurn.sql
@@ -0,0 +1,22 @@
+SELECT pg_sleep(2);
+CREATE TABLE t0(i int, t text);
+ALTER TABLE t0 ALTER t SET STORAGE PLAIN;
+INSERT INTO t0 SELECT g, g || repeat('x', 8000) FROM generate_series(1, 256) g;
+
+CREATE TABLE t1(i int, t text);
+ALTER TABLE t1 ALTER t SET STORAGE PLAIN;
+INSERT INTO t1 SELECT g, g || repeat('x', 8000) FROM generate_series(1, 256) g;
+
+EXPLAIN ANALYZE SELECT SUM(i) FROM t0 GROUP BY t;
+EXPLAIN ANALYZE SELECT SUM(i) FROM t1 GROUP BY t;
+
+\c -
+SELECT 'SELECT pg_sleep(1);' FROM generate_series(1, 10)
+\gexec
+
+BEGIN;
+SELECT 'EXPLAIN ANALYZE SELECT SUM(i) FROM t0; EXPLAIN ANALYZE SELECT SUM(i) FROM t1;' FROM generate_series(1, 20)
+\gexec
+
+SELECT pg_sleep(10);
+ROLLBACK;
diff --git a/src/test/regress/sql/stats.sql b/src/test/regress/sql/stats.sql
index 399c72bbcf..fcec1be377 100644
--- a/src/test/regress/sql/stats.sql
+++ b/src/test/regress/sql/stats.sql
@@ -426,6 +426,7 @@ SELECT wal_bytes AS wal_bytes_before FROM pg_stat_wal \gset
 CREATE TEMP TABLE test_stats_temp AS SELECT 17;
 DROP TABLE test_stats_temp;
 
+SELECT pg_sleep(10);
 -- Checkpoint twice: The checkpointer reports stats after reporting completion
 -- of the checkpoint. But after a second checkpoint we'll see at least the
 -- results of the first.
@@ -632,6 +633,7 @@ SELECT sum(extends) AS my_io_sum_shared_after_extends
   WHERE context = 'normal' AND object = 'relation' \gset
 SELECT :my_io_sum_shared_after_extends > :my_io_sum_shared_before_extends;
 
+SELECT pg_sleep(10);
 -- After a checkpoint, there should be some additional IOCONTEXT_NORMAL writes
 -- and fsyncs in the global stats (usually not for the backend).
 -- See comment above for rationale for two explicit CHECKPOINTs.
#2 Shlok Kyal
shlok.kyal.oss@gmail.com
In reply to: Alexander Lakhin (#1)
Re: stats.sql might fail due to shared buffers also used by parallel tests

Hi Hackers,

I have also encountered a similar buildfarm failure [1].

| 1/1 + subscription 142 ms FAIL
1/1 postgresql:regress / regress/regress ERROR 284.85s exit status 1

diff --strip-trailing-cr -U3
c:/build-farm-local/buildroot/HEAD/pgsql/src/test/regress/expected/subscription.out
c:/build-farm-local/buildroot/HEAD/pgsql.build/testrun/regress/regress/results/subscription.out
--- c:/build-farm-local/buildroot/HEAD/pgsql/src/test/regress/expected/subscription.out
2025-06-28 20:13:02 +0900
+++ c:/build-farm-local/buildroot/HEAD/pgsql.build/testrun/regress/regress/results/subscription.out
2025-06-28 20:35:21 +0900
@@ -70,7 +70,7 @@
 SELECT :'prev_stats_reset' < stats_reset FROM
pg_stat_subscription_stats WHERE subname = 'regress_testsub';
  ?column?
 ----------
- t
+ f
 (1 row)

The logs around this statement do not show anything unexpected:
2025-06-28 20:35:20.857 JST client backend[6092]
pg_regress/subscription STATEMENT: CREATE SUBSCRIPTION
regress_testsub CONNECTION 'testconn' PUBLICATION testpub;
2025-06-28 20:35:20.857 JST client backend[6092]
pg_regress/subscription ERROR: publication name "foo" used more than
once
2025-06-28 20:35:20.857 JST client backend[6092]
pg_regress/subscription STATEMENT: CREATE SUBSCRIPTION
regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION
foo, testpub, foo WITH (connect = false);
2025-06-28 20:35:20.858 JST client backend[6092]
pg_regress/subscription WARNING: subscription was created, but is not
connected
2025-06-28 20:35:20.858 JST client backend[6092]
pg_regress/subscription HINT: To initiate replication, you must
manually create the replication slot, enable the subscription, and
refresh the subscription.
2025-06-28 20:35:20.861 JST client backend[6092]
pg_regress/subscription ERROR: subscription "regress_testsub" already
exists

I looked at the test in subscription.sql. The statements are:
-- Reset the stats again and check if the new reset_stats is updated.
SELECT stats_reset as prev_stats_reset FROM pg_stat_subscription_stats
WHERE subname = 'regress_testsub' \gset
SELECT pg_stat_reset_subscription_stats(oid) FROM pg_subscription
WHERE subname = 'regress_testsub';
SELECT :'prev_stats_reset' < stats_reset FROM
pg_stat_subscription_stats WHERE subname = 'regress_testsub';

So, I suspect one of the following causes:
1) The time difference between 'prev_stats_reset' and the current
'stats_reset' value is less than 1 microsecond.
'stats_reset' is of type 'timestamp with time zone' and its content looks
like '2025-06-30 21:01:07.925253+05:30'. So if the time difference between
'prev_stats_reset' and the current 'stats_reset' is less than 1 microsecond,
the query "SELECT :'prev_stats_reset' < stats_reset FROM
pg_stat_subscription_stats WHERE subname = 'regress_testsub'" might return
'false' instead of 'true' (a quick way to probe this is sketched below).
But I was not able to reproduce such a scenario despite multiple attempts.
Even on high-end machines, it takes at least a few microseconds. Also, there
are multiple places where we do a similar timestamp comparison in 'stats.sql',
and I didn't find any other failure related to such a case. So I feel this
is not the cause.

2) The pg_stat_reset_subscription_stats(oid) function did not reset the stats.
We have a shared hash, 'pgStatLocal.shared_hash'. If the entry reference
(for the subscription) is not found while executing
pg_stat_reset_subscription_stats(oid), it may not be able to reset the
stats; maybe somehow this shared hash is getting dropped.
Also, it could be failing for the same reason as Alexander reproduced in
another case [2].
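
Regarding 1), here is a rough probe (purely illustrative, not part of any
test) to see whether two back-to-back clock readings can ever collapse to
the same microsecond value on a given machine:

-- Illustrative only: count how often two consecutive clock readings are
-- equal at the microsecond precision carried by PostgreSQL timestamps.
SELECT count(*) FILTER (WHERE t1 = t2) AS equal_pairs,
       count(*)                        AS total_pairs
  FROM (SELECT clock_timestamp() AS t1, clock_timestamp() AS t2
          FROM generate_series(1, 100000)) s;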

[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamerkop&dt=2025-06-28%2011%3A02%3A30
[2]: /messages/by-id/fe0391a8-dfa9-41c3-bf1c-7ea058e40f30@gmail.com

Thanks and Regards,
Shlok Kyal

#3 Alexander Lakhin
exclusion@gmail.com
In reply to: Shlok Kyal (#2)
Re: stats.sql might fail due to shared buffers also used by parallel tests

Hello Shlok,

03.07.2025 09:54, Shlok Kyal wrote:

I have also encountered a similar buildfarm failure [1].

| 1/1 + subscription 142 ms FAIL
1/1 postgresql:regress / regress/regress ERROR 284.85s exit status 1

diff --strip-trailing-cr -U3
c:/build-farm-local/buildroot/HEAD/pgsql/src/test/regress/expected/subscription.out
c:/build-farm-local/buildroot/HEAD/pgsql.build/testrun/regress/regress/results/subscription.out
--- c:/build-farm-local/buildroot/HEAD/pgsql/src/test/regress/expected/subscription.out
2025-06-28 20:13:02 +0900
+++ c:/build-farm-local/buildroot/HEAD/pgsql.build/testrun/regress/regress/results/subscription.out
2025-06-28 20:35:21 +0900
@@ -70,7 +70,7 @@
SELECT :'prev_stats_reset' < stats_reset FROM
pg_stat_subscription_stats WHERE subname = 'regress_testsub';
?column?
----------
- t
+ f
(1 row)

Thank you for your attention to this! Yes, I also noticed that failure,
but have not been able to reproduce it yet.

So according to me, I suspect the following causes
1) The time difference between 'prev_stats_reset' and current
'stats_reset' value is less than 1 microseconds.
'stats_reset' is of type 'timestamp with time zone' and the content of
it is like: '2025-06-30 21:01:07.925253+05:30'. So if the time
difference between 'prev_stats_reset' and current 'stats_reset' is
less than 1 microseconds. The query 'SELECT :'prev_stats_reset' <
stats_reset FROM pg_stat_subscription_stats WHERE subname =
'regress_testsub'' might return 'false' instead of 'true'.
But I was not able to reproduce such a scenario after multiple
testing. Even in high end machines, it takes at least a few
microseconds. Also there are multiple cases where we did similar
timestamp comparison in 'stats.sql' as well. And, I didn't find any
other failure related to such case. So, I feel this is not possible.

Did you try that on Windows (hamerkop is a Windows animal)? IIUC,
GetCurrentTimestamp() -> gettimeofday() is implemented on Windows via
GetSystemTimePreciseAsFileTime(), and it has 100 ns resolution,
according to [1].

I tried it on a Windows VM and saw the following numbers:
SELECT :'prev_stats_reset' < stats_reset, :'prev_stats_reset', stats_reset FROM pg_stat_subscription_stats WHERE subname
= 'regress_testsub';
 ?column? |              ?column?               | stats_reset
----------+-------------------------------------+-------------------------------------
 t        | Fri Jul 04 11:46:10.224323 2025 PDT | Fri Jul 04 11:46:10.224975 2025 PDT
 (1 row)

that is, the difference is about 650 microseconds; if the VM were a few
hundred times faster, or the test ran on bare metal, maybe I'd see the test fail.

Although the overall duration of the subscription test doesn't differ much:
My VM:
# parallel group (2 tests):  subscription publication
ok 164       + publication                               806 ms
ok 165       + subscription                              210 ms
vs
hamerkop:
# parallel group (2 tests):  subscription publication
ok 164       + publication                              2383 ms
not ok 165   + subscription                              142 ms

So probably the failure is caused by something else.

2) pg_stat_reset_subscription_stats(oid) function did not reset the stats.
We have a shared hash 'pgStatLocal.shared_hash'. If the entry
reference (for the subscription) is not found while executing
'pg_stat_reset_subscription_stats(oid)'. It may not be able to reset
the stats. Maybe somehow this shared hash is getting dropped..
Also, it could be failing due to the same reason as Alexander has

I don't think 2) is relevant here, because shared buffers shouldn't affect
subscription's statistics.

[1]: https://stackoverflow.com/questions/4444090/measuring-time-with-a-resolution-of-microseconds-in-c

Best regards,
Alexander

#4 Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Alexander Lakhin (#3)
RE: stats.sql might fail due to shared buffers also used by parallel tests

Dear Alexander,

So according to me, I suspect the following causes
1) The time difference between 'prev_stats_reset' and current
'stats_reset' value is less than 1 microseconds.
'stats_reset' is of type 'timestamp with time zone' and the content of
it is like: '2025-06-30 21:01:07.925253+05:30'. So if the time
difference between 'prev_stats_reset' and current 'stats_reset' is
less than 1 microseconds. The query 'SELECT :'prev_stats_reset' <
stats_reset FROM pg_stat_subscription_stats WHERE subname =
'regress_testsub'' might return 'false' instead of 'true'.
But I was not able to reproduce such a scenario after multiple
testing. Even in high end machines, it takes at least a few
microseconds. Also there are multiple cases where we did similar
timestamp comparison in 'stats.sql' as well. And, I didn't find any
other failure related to such case. So, I feel this is not possible.

Did you try that on Windows (hamerkop is a Windows animal)? IIUC,
GetCurrentTimestamp() -> gettimeofday() implemented on Windows via
GetSystemTimePreciseAsFileTime(), and it has 100ns resolution,

Hmm. I'm not familiar with the Windows environment, but I have some doubts about that.

GetSystemTimePreciseAsFileTime() returns a FILETIME structure, which represents
UTC time in 100-nanosecond intervals [1]. The Stack Overflow answer seemed to
refer to that. However, the documentation for GetSystemTimePreciseAsFileTime()
only says that the resolution is < 1 us [2]. Also, the MS doc [3] does not say
that GetSystemTimePreciseAsFileTime() returns monotonically increasing values.
Another API, QueryPerformanceCounter(), does seem to be monotonic.

A somewhat older document [4] also raised this possibility:

```
Consecutive calls may return the same result. The call time is less than the
smallest increment of the system time. The granularity is in the sub-microsecond
regime. The function may be used for time measurements but some care has to be
taken: Time differences may be ZERO.
```

Also, what if the system clock is modified during the test via NTP?

2) pg_stat_reset_subscription_stats(oid) function did not reset the stats.
We have a shared hash 'pgStatLocal.shared_hash'. If the entry
reference (for the subscription) is not found while executing
'pg_stat_reset_subscription_stats(oid)'. It may not be able to reset
the stats. Maybe somehow this shared hash is getting dropped..
Also, it could be failing due to the same reason as Alexander has

I don't think 2) is relevant here, because shared buffers shouldn't affect
subscription's statistics.

To confirm: we do not consider the possibility that pgstat_get_entry_ref() returns
NULL, right?

[1]: https://learn.microsoft.com/en-us/windows/win32/api/minwinbase/ns-minwinbase-filetime
[2]: https://learn.microsoft.com/en-us/windows/win32/api/sysinfoapi/nf-sysinfoapi-getsystemtimepreciseasfiletime
[3]: https://learn.microsoft.com/en-us/windows/win32/sysinfo/acquiring-high-resolution-time-stamps
[4]: http://www.windowstimestamp.com/description#:~:text=2.1.4.2.%C2%A0%C2%A0Desktop%20Applications%3A%20GetSystemTimePreciseAsFileTime()

Best regards,
Hayato Kuroda
FUJITSU LIMITED

#5 Alexander Lakhin
exclusion@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#4)
Re: stats.sql might fail due to shared buffers also used by parallel tests

Hello Kuroda-san,

Thank you for your attention to this!

15.07.2025 10:33, Hayato Kuroda (Fujitsu) wrote:

GetSystemTimePreciseAsFileTime() returns FILETIME structure, which represents the
time UTC with 100-nanosecod intervals [1]. The stack overflow seemed to refer it.
However, the document for GetSystemTimePreciseAsFileTime() says that the
resolution is < 1 us [2]. Also, MS doc [3] does not say that
GetSystemTimePreciseAsFileTime() returns value monotonically.
Another API QueryPerformanceCounter() seems to have the monotony.

A bit old document [4] also raised the possibility:

```
Consecutive calls may return the same result. The call time is less than the
smallest increment of the system time. The granularity is in the sub-microsecond
regime. The function may be used for time measurements but some care has to be
taken: Time differences may be ZERO.
```

Also, what if the system clock is modified during the test via NTP?

Yeah, I made a simple test for GetSystemTimePreciseAsFileTime() and
confirmed that in my VM it provides sub-microsecond precision. Regarding
NTP, I think the second failure of this ilk [1] makes this cause close to
impossible. (Can't wait for the third one to gather more information.)

2) pg_stat_reset_subscription_stats(oid) function did not reset the stats.
We have a shared hash 'pgStatLocal.shared_hash'. If the entry
reference (for the subscription) is not found while executing
'pg_stat_reset_subscription_stats(oid)'. It may not be able to reset
the stats. Maybe somehow this shared hash is getting dropped..
Also, it could be failing due to the same reason as Alexander has

I don't think 2) is relevant here, because shared buffers shouldn't affect
subscription's statistics.

To confirm; we do not consider the possibility that pgstat_get_entry_ref() returns
NULL right?

I've run a simple experiment with a modification like this:
@@ -1078,6 +1078,7 @@ pgstat_reset_entry(PgStat_Kind kind, Oid dboid, uint64 objid, TimestampTz ts)
     Assert(!pgstat_get_kind_info(kind)->fixed_amount);

     entry_ref = pgstat_get_entry_ref(kind, dboid, objid, false, NULL);
+if (rand() % 3 == 0) entry_ref = NULL;
     if (!entry_ref || entry_ref->shared_entry->dropped)

and got several failures like:
--- .../postgresql/src/test/regress/expected/subscription.out 2025-04-25 10:27:32.851554400 -0700
+++ .../postgresql/build/testrun/regress/regress/results/subscription.out 2025-07-20 00:05:05.667903300 -0700
@@ -56,7 +56,7 @@
  SELECT subname, stats_reset IS NULL stats_reset_is_null FROM pg_stat_subscription_stats WHERE subname = 'regress_testsub';
       subname     | stats_reset_is_null
  -----------------+---------------------
- regress_testsub | f
+ regress_testsub | t
  (1 row)

 -- Reset the stats again and check if the new reset_stats is updated.
@@ -68,11 +68,9 @@
 (1 row)

  SELECT :'prev_stats_reset' < stats_reset FROM pg_stat_subscription_stats WHERE subname = 'regress_testsub';
- ?column?
-----------
- t
-(1 row)
-
+ERROR:  syntax error at or near ":"
+LINE 1: SELECT :'prev_stats_reset' < stats_reset FROM pg_stat_subscr...
+
--- .../postgresql/src/test/regress/expected/stats.out    2025-04-25 10:27:36.930322500 -0700
+++ .../postgresql/build/testrun/regress/regress/results/stats.out 2025-07-20 00:05:19.579864900 -0700
@@ -1720,7 +1720,7 @@
  SELECT :my_io_stats_pre_reset > :my_io_stats_post_backend_reset;
   ?column?
  ----------
- t
+ f
  (1 row)
...

Thus, if there were some issue with pgstat_get_entry_ref(), it should
be specific to subscriptions and show up in that place only (given the
information we have now).

So I still suspect some Windows-specific or machine-specific peculiarity.

Nagata-san, could you please share the configuration of hamerkop? If it's
running inside a VM, what virtualization software is used?

[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamerkop&dt=2025-07-09%2011%3A02%3A23

Best regards,
Alexander

#6 Yugo Nagata
nagata@sraoss.co.jp
In reply to: Alexander Lakhin (#5)
Re: stats.sql might fail due to shared buffers also used by parallel tests

Hello Alexander,

Nagata-san, could you please share the configuration of hamerkop? If it's
running inside VM, what virtualization software is used?

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamerkop&dt=2025-07-09%2011%3A02%3A23

I am not the person in charge of it, so I added the responsible email address to the CC list.
Takatsuka-san, could you please take care of this?

Regards,
Yugo Nagata

--
Yugo Nagata <nagata@sraoss.co.jp>

#7 Alexander Lakhin
exclusion@gmail.com
In reply to: Alexander Lakhin (#5)
Re: stats.sql might fail due to shared buffers also used by parallel tests

Hello Kuroda-san,

20.07.2025 11:00, Alexander Lakhin wrote:

Yeah, I made a simple test for GetSystemTimePreciseAsFileTime() and
confirmed that in my VM it provides sub-microsecond precision. Regarding
NTP, I think the second failure of this ilk [1] makes this cause close to
impossible. (Can't wait for the third one to gather more information.)

And here it is [1]:
diff --strip-trailing-cr -U3 c:/build-farm-local/buildroot/HEAD/pgsql/src/test/isolation/expected/stats.out 
c:/build-farm-local/buildroot/HEAD/pgsql.build/testrun/isolation/isolation/results/stats.out
--- c:/build-farm-local/buildroot/HEAD/pgsql/src/test/isolation/expected/stats.out 2025-07-22 20:08:30 +0900
+++ c:/build-farm-local/buildroot/HEAD/pgsql.build/testrun/isolation/isolation/results/stats.out 2025-07-22 20:30:47 +0900
@@ -3729,7 +3729,7 @@
  name |pg_stat_get_function_calls|total_above_zero|self_above_zero
  --------------+--------------------------+----------------+---------------
-test_stat_func|                         1|t |t
+test_stat_func|                         1|f |f
  (1 row)

Not related to subscriptions this time, but still related to pg_stat and
time measurement.

There was a similar-looking failure on caiman, which runs continuously
updated Fedora: [2].

I could not reproduce that one either, but I suspected the OS kernel:
it had 6.14.0-0.rc3.20250219git6537cfb395f3.31.fc43.x86_64 at the moment
of the failure and was updated to 6.14.0-0.rc4.36.fc43.x86_64 on 2025-02-27.
There have been no failures of this kind since then.

So I think we could observe such anomalies if, say, the OS kernel can't
read the system clock in time (stalls for a millisecond when accessing it)...

[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamerkop&dt=2025-07-22%2011%3A02%3A15
[2]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=caiman&dt=2025-02-23%2006%3A54%3A54

Best regards,
Alexander

#8 TAKATSUKA Haruka
harukat@sraoss.co.jp
In reply to: Alexander Lakhin (#1)
Re: [Buildfarm:84] Re: stats.sql might fail due to shared buffers also used by parallel tests

Hello Alexander,

On Wed, 23 Jul 2025 00:55:37 +0000
Yugo Nagata - Buildfarm <buildfarm@ml.sraoss.co.jp> wrote:

Nagata-san, could you please share the configuration of hamerkop? If it's
running inside VM, what virtualization software is used?

It's VMware ESXi 7.0.3 (21930508).
This Windows instance has auto-synchronization with 'time.windows.com' enabled.
VMware Tools 12.1.5 build 20735119 is installed in the VM.

Thanks,
Haruka Takatsuka

#9 Hayato Kuroda (Fujitsu)
kuroda.hayato@fujitsu.com
In reply to: Alexander Lakhin (#7)
RE: stats.sql might fail due to shared buffers also used by parallel tests

Dear Alexander,

And here it is [1]:
diff --strip-trailing-cr -U3 c:/build-farm-local/buildroot/HEAD/pgsql/src/test/isolation/expected/stats.out
c:/build-farm-local/buildroot/HEAD/pgsql.build/testrun/isolation/isolation/results/stats.out
--- c:/build-farm-local/buildroot/HEAD/pgsql/src/test/isolation/expected/stats.out 2025-07-22 20:08:30 +0900
+++ c:/build-farm-local/buildroot/HEAD/pgsql.build/testrun/isolation/isolation/results/stats.out 2025-07-22 20:30:47 +0900
@@ -3729,7 +3729,7 @@

name |pg_stat_get_function_calls|total_above_zero|self_above_zero
--------------+--------------------------+----------------+---------------
-test_stat_func|                         1|t |t
+test_stat_func|                         1|f |f
(1 row)

Not related to subscriptions this time, but still related to pg_stat and
time measurement.

It looks to me like we measured the execution time of the function in
milliseconds, but it was "zero", right?

So I think we could observe such anomalies if, say, the OS kernel can't
read system clock in time (stalls for a millisecond when accessing it)...

I feel the same. But if so, how should we fix the tests? Must we remove
everything that assumes the clock is monotonic? (One illustrative option for
the subscription case is sketched below.)
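
For illustration only, and assuming we were willing to accept a weaker check,
the subscription comparison quoted upthread could tolerate an equal reset
timestamp instead of requiring a strictly newer one:

-- Sketch only, not a proposal: accept stats_reset being equal to the previous
-- value, at the cost of no longer proving that the reset advanced it.
SELECT :'prev_stats_reset' <= stats_reset
  FROM pg_stat_subscription_stats WHERE subname = 'regress_testsub';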

Best regards,
Hayato Kuroda
FUJITSU LIMITED

#10 Alexander Lakhin
exclusion@gmail.com
In reply to: TAKATSUKA Haruka (#8)
Re: [Buildfarm:84] Re: stats.sql might fail due to shared buffers also used by parallel tests

Hello Takatsuka-san,

23.07.2025 08:48, TAKATSUKA Haruka wrote:

Hello Alexander,

On Wed, 23 Jul 2025 00:55:37 +0000
Yugo Nagata - Buildfarm<buildfarm@ml.sraoss.co.jp> wrote:

Nagata-san, could you please share the configuration of hamerkop? If it's
running inside VM, what virtualization software is used?

It's vmware ESXi 7.0.3 (21930508).
This Windows enable auto-synchronize with 'time.windows.com'.
VMware Tools 12.1.5 build 20735119 is installed in the VM.

Thank you for your prompt response!

I used VirtualBox, not VMware, when trying to reproduce the issue; maybe
that's why I failed. It looks like VMware has its own specifics related to
timekeeping: [1].

Maybe you could try tools.syncTime = "0" by any chance?

There is also an interesting note in VMware Tools 11 docs [2]:
If the clock on the guest operating system is ahead of the clock on the
host, VMware Tools causes the clock on the guest to run more slowly until
the clocks are synchronized.

But still it's hard to say without experimentation whether this can cause
the observed effect.

[1]: https://knowledge.broadcom.com/external/article?legacyId=1318
[2]: https://techdocs.broadcom.com/us/en/vmware-cis/vsphere/tools/11-1-0/vmware-tools-administration-11-1-0/configuring-vmware-tools-components/using-vmware-tools-configuration-utility/enable-periodic-time-synchronization.html

Best regards,
Alexander

#11 TAKATSUKA Haruka
harukat@sraoss.co.jp
In reply to: Alexander Lakhin (#10)
Re: [Buildfarm:84] Re: stats.sql might fail due to shared buffers also used by parallel tests

On Wed, 23 Jul 2025 23:00:00 +0300
Alexander Lakhin <exclusion@gmail.com> wrote:

{snip}

Nagata-san, could you please share the configuration of hamerkop? If it's
running inside VM, what virtualization software is used?

It's vmware ESXi 7.0.3 (21930508).
This Windows enable auto-synchronize with 'time.windows.com'.
VMware Tools 12.1.5 build 20735119 is installed in the VM.

Thank you for your prompt response!

I used VirtualBox, not VMWare, when trying to reproduce the issue, maybe
that's the reason why I failed. It looks like VMWare has it's own
specifics related to timekeeping: [1].

Maybe you could try tools.syncTime = "0" by any chance?

It has already been set to tools.syncTime = "0".
I confirmed the following GUI setting:

VM Options
VMware Tools
Time "Synchronize guest time with host": unchecked

Thanks,
Haruka Takatsuka

#12 Alexander Lakhin
exclusion@gmail.com
In reply to: Hayato Kuroda (Fujitsu) (#9)
1 attachment(s)
Re: stats.sql might fail due to shared buffers also used by parallel tests

Hello Kuroda-san and Takatsuka-san,

24.07.2025 03:49, TAKATSUKA Haruka wrote:

{snip}

Maybe you could try tools.syncTime = "0" by any chance?

It has been already tools.syncTime = "0" so far.
I confirmed the following GUI setting.
...

23.07.2025 09:15, Hayato Kuroda (Fujitsu) wrote:

It looks like for me that we measured the execution time of the function in
millisecond but it was "zero", right?

Yes, my understanding is the same.

So I think we could observe such anomalies if, say, the OS kernel can't
read system clock in time (stalls for a millisecond when accessing it)...

I also feel like that. But if so, how should we fix tests? We must remove all
stuff which assumes the time is monotonic?

From what Takatsuka-san shared about hamerkop's configuration, I still
suspect there could be some platform specifics there. I've found another
interesting read on the subject, which describes the effects of CPU
pressure and mentions other low-level parameters, e.g.
monitor_control.virtual_rdtsc: [1].

Perhaps some experiments could be performed there to measure the maximum
timer resolution (e.g., with the simple program attached).

I also observed a failure of pg_stat_statements on an ARMv7 device in the past:
--- .../contrib/pg_stat_statements/expected/entry_timestamp.out 2024-04-11 07:20:32.563588101 +0300
+++ .../contrib/pg_stat_statements/results/entry_timestamp.out 2024-04-15 11:16:00.217396694 +0300
@@ -45,7 +45,7 @@
  WHERE query LIKE '%STMTTS%';
   total | minmax_plan_zero | minmax_exec_zero | minmax_stats_since_after_ref | stats_since_after_ref
  -------+------------------+------------------+------------------------------+-----------------------
-     2 |                0 |                0 |                            0 |                     0
+     2 |                0 |                1 |                            0 |                     0
  (1 row)

with clocksource = 32k_counter, which gave me a maximum resolution of
0.030517 sec.

So if we choose to fix the tests, it's not clear to me what lowest timer
resolution we should consider acceptable.
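
(As a rough and purely illustrative probe, not tied to any particular test,
one could estimate the granularity a given machine delivers at the SQL level
from the smallest nonzero step between consecutive clock readings:)

-- Illustrative only: effective clock_timestamp() granularity, estimated as
-- the smallest nonzero difference between two consecutive readings.
SELECT min(t2 - t1) FILTER (WHERE t2 > t1) AS min_nonzero_step,
       count(*) FILTER (WHERE t2 = t1)     AS equal_pairs
  FROM (SELECT clock_timestamp() AS t1, clock_timestamp() AS t2
          FROM generate_series(1, 100000)) s;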

[1]: https://www.vmware.com/docs/vmware_timekeeping

Best regards,
Alexander

Attachments:

test-gst.c (text/x-csrc)