Regression test postgres_fdw might fail due to autovacuum
Hello hackers,
A recent buildfarm failure [1]https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2025-03-14%2003%3A47%3A50 with the following diagnostics:
72/72 postgresql:postgres_fdw-running / postgres_fdw-running/regress ERROR 19.04s exit status 1
postgres_fdw-running/regress/results/postgres_fdw.out
--- /home/bf/bf-build/culicidae/HEAD/pgsql/contrib/postgres_fdw/expected/postgres_fdw.out 2025-03-11 15:21:27.681846597
+0000
+++ /home/bf/bf-build/culicidae/HEAD/pgsql.build/testrun/postgres_fdw-running/regress/results/postgres_fdw.out
2025-03-14 04:02:32.573999799 +0000
@@ -6392,6 +6392,7 @@
UPDATE ft2 SET c3 = 'bar' WHERE postgres_fdw_abs(c1) > 2000 RETURNING *;
c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
------+----+-----+----+----+----+------------+----
+ 2010 | 0 | bar | | | | ft2 |
2001 | 1 | bar | | | | ft2 |
2002 | 2 | bar | | | | ft2 |
2003 | 3 | bar | | | | ft2 |
@@ -6401,7 +6402,6 @@
2007 | 7 | bar | | | | ft2 |
2008 | 8 | bar | | | | ft2 |
2009 | 9 | bar | | | | ft2 |
- 2010 | 0 | bar | | | | ft2 |
(10 rows)
EXPLAIN (verbose, costs off)
shows that the UPDATE ... RETURNING query might be unstable due to lack of
ORDER BY.
I managed to reproduce this failure locally with the attached patch
applied to make the test repeatable and with:
sed 's/REGRESS = postgres_fdw.*/REGRESS = $(shell printf "postgres_fdw %.0s" `seq 50`)/' -i contrib/postgres_fdw/Makefile
(Running 10 instances of the test in parallel eases reproducing as well.)
I also added SELECT ctid, * FROM ft2; just after the query in question and
found out that the results of the SELECT more unstable, but for the UPDATE
to produce the difference, the tuple with c1 == 2010 must reside on a new
page. For example:
--- /home/vagrant/postgresql/contrib/postgres_fdw_2/expected/postgres_fdw.out 2025-03-22 05:21:39.414773284 +0000
+++ /home/vagrant/postgresql/contrib/postgres_fdw_2/results/postgres_fdw.out 2025-03-23 04:43:34.608281935 +0000
@@ -6399,6 +6399,7 @@
UPDATE ft2 SET c3 = 'bar' WHERE postgres_fdw_abs(c1) > 2000 RETURNING *;
c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8
------+----+-----+----+----+----+------------+----
+ 2010 | 0 | bar | | | | ft2 |
2001 | 1 | bar | | | | ft2 |
2002 | 2 | bar | | | | ft2 |
2003 | 3 | bar | | | | ft2 |
@@ -6408,7 +6409,6 @@
2007 | 7 | bar | | | | ft2 |
2008 | 8 | bar | | | | ft2 |
2009 | 9 | bar | | | | ft2 |
- 2010 | 0 | bar | | | | ft2 |
SELECT ctid, * FROM ft2;
@@ -6470,15 +6470,6 @@
(0,103) | 57 | 407 | 00057_update7 | Fri Feb 27 00:00:00 1970 PST | Fri Feb 27 00:00:00 1970 | 7 | 7
| foo
(0,104) | 67 | 407 | 00067_update7 | Mon Mar 09 00:00:00 1970 PST | Mon Mar 09 00:00:00 1970 | 7 | 7
| foo
(0,105) | 77 | 407 | 00077_update7 | Thu Mar 19 00:00:00 1970 PST | Thu Mar 19 00:00:00 1970 | 7 | 7
| foo
- (0,106) | 9 | 509 | 00009_update9 | Sat Jan 10 00:00:00 1970 PST | Sat Jan 10 00:00:00 1970 | 9 | ft2
| foo
- (0,107) | 19 | 509 | 00019_update9 | Tue Jan 20 00:00:00 1970 PST | Tue Jan 20 00:00:00 1970 | 9 | ft2
| foo
- (0,108) | 29 | 509 | 00029_update9 | Fri Jan 30 00:00:00 1970 PST | Fri Jan 30 00:00:00 1970 | 9 | ft2
| foo
- (0,109) | 39 | 509 | 00039_update9 | Mon Feb 09 00:00:00 1970 PST | Mon Feb 09 00:00:00 1970 | 9 | ft2
| foo
- (0,110) | 49 | 509 | 00049_update9 | Thu Feb 19 00:00:00 1970 PST | Thu Feb 19 00:00:00 1970 | 9 | ft2
| foo
- (0,111) | 59 | 509 | 00059_update9 | Sun Mar 01 00:00:00 1970 PST | Sun Mar 01 00:00:00 1970 | 9 | ft2
| foo
- (0,112) | 69 | 509 | 00069_update9 | Wed Mar 11 00:00:00 1970 PST | Wed Mar 11 00:00:00 1970 | 9 | ft2
| foo
- (0,113) | 79 | 509 | 00079_update9 | Sat Mar 21 00:00:00 1970 PST | Sat Mar 21 00:00:00 1970 | 9 | ft2
| foo
- (0,114) | 89 | 509 | 00089_update9 | Tue Mar 31 00:00:00 1970 PST | Tue Mar 31 00:00:00 1970 | 9 | ft2
| foo
(1,1) | 98 | 8 | 00098 | Thu Apr 09 00:00:00 1970 PST | Thu Apr 09 00:00:00 1970 | 8 | 8
| foo
(1,3) | 100 | 0 | 00100 | Thu Jan 01 00:00:00 1970 PST | Thu Jan 01 00:00:00 1970 | 0 | 0
| foo
(1,4) | 101 | 1 | 00101 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1
| foo
...
- (12,137) | 2007 | 7 | bar | | | | ft2 |
- (12,138) | 2008 | 8 | bar | | | | ft2 |
- (12,139) | 2009 | 9 | bar | | | | ft2 |
- (12,140) | 2010 | 0 | bar | | | | ft2 |
+ (11,80) | 9 | 509 | 00009_update9 | Sat Jan 10 00:00:00 1970 PST | Sat Jan 10 00:00:00 1970 | 9 | ft2
| foo
+ (11,81) | 19 | 509 | 00019_update9 | Tue Jan 20 00:00:00 1970 PST | Tue Jan 20 00:00:00 1970 | 9 | ft2
| foo
+ (11,82) | 29 | 509 | 00029_update9 | Fri Jan 30 00:00:00 1970 PST | Fri Jan 30 00:00:00 1970 | 9 | ft2
| foo
+ (11,83) | 39 | 509 | 00039_update9 | Mon Feb 09 00:00:00 1970 PST | Mon Feb 09 00:00:00 1970 | 9 | ft2
| foo
...
+ (13,126) | 1013 | 403 | 0001300013_update3 | | | | ft2 |
+ (13,127) | 1019 | 609 | 0001900019_update9 | | | | ft2 |
+ (13,128) | 1103 | 503 | ccc_update3 | | | | ft2 |
+ (14,1) | 2010 | 0 | bar | | | | ft2 |
(829 rows)
That is, for all the UPDATE failures I could see, the tuple with c1 == 2010
has ctid == (14, 1).
Interestingly enough, with "log_autovacuum_min_duration = 0" added to
TEMP_CONFIG, I can't see "automatic vacuum/analyze" messages related
to ft2/ "S 1"."T 1", so autovacuum somehow affects contents of this table
indirectly.
With autovacuum = off, all of these fluctuations go away.
[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2025-03-14%2003%3A47%3A50
Best regards,
Alexander Lakhin
Neon (https://neon.tech)
Attachments:
postgres_fdw-repeatable.patchtext/x-patch; charset=UTF-8; name=postgres_fdw-repeatable.patchDownload
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index bb4ed3059c4..7e6b9035571 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1,6 +1,13 @@
-- ===================================================================
-- create FDW objects
-- ===================================================================
+SET client_min_messages TO 'warning';
+DROP EXTENSION IF EXISTS postgres_fdw CASCADE;
+DROP SCHEMA IF EXISTS "S 1", import_source, import_dest1, import_dest2, import_dest3, import_dest4, import_dest5 CASCADE;
+DROP TYPE IF EXISTS user_enum, "Colors", typ1 CASCADE;
+DROP TABLE IF EXISTS loct_empty, loct3, child_tbl, loc1, gloc1, trig_null, loct4, loc2, fprt1, fprt1_p1, fprt1_p2, fprt2, fprt2_p1, fprt2_p2, pagg_tab, pagg_tab_p1, pagg_tab_p2, pagg_tab_p3, ploc1, ploc2 CASCADE;
+DROP FUNCTION IF EXISTS trigger_func, trig_row_before_insupdate, trig_null, br_insert_trigfunc CASCADE;
+RESET client_min_messages;
CREATE EXTENSION postgres_fdw;
CREATE SERVER testserver1 FOREIGN DATA WRAPPER postgres_fdw;
DO $d$
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index d45e9f8ab52..b3731df6528 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2,6 +2,14 @@
-- create FDW objects
-- ===================================================================
+SET client_min_messages TO 'warning';
+DROP EXTENSION IF EXISTS postgres_fdw CASCADE;
+DROP SCHEMA IF EXISTS "S 1", import_source, import_dest1, import_dest2, import_dest3, import_dest4, import_dest5 CASCADE;
+DROP TYPE IF EXISTS user_enum, "Colors", typ1 CASCADE;
+DROP TABLE IF EXISTS loct_empty, loct3, child_tbl, loc1, gloc1, trig_null, loct4, loc2, fprt1, fprt1_p1, fprt1_p2, fprt2, fprt2_p1, fprt2_p2, pagg_tab, pagg_tab_p1, pagg_tab_p2, pagg_tab_p3, ploc1, ploc2 CASCADE;
+DROP FUNCTION IF EXISTS trigger_func, trig_row_before_insupdate, trig_null, br_insert_trigfunc CASCADE;
+RESET client_min_messages;
+
CREATE EXTENSION postgres_fdw;
CREATE SERVER testserver1 FOREIGN DATA WRAPPER postgres_fdw;
On 2025/03/23 23:00, Alexander Lakhin wrote:
Hello hackers,
A recent buildfarm failure [1] with the following diagnostics:
72/72 postgresql:postgres_fdw-running / postgres_fdw-running/regress ERROR 19.04s exit status 1postgres_fdw-running/regress/results/postgres_fdw.out --- /home/bf/bf-build/culicidae/HEAD/pgsql/contrib/postgres_fdw/expected/postgres_fdw.out 2025-03-11 15:21:27.681846597 +0000 +++ /home/bf/bf-build/culicidae/HEAD/pgsql.build/testrun/postgres_fdw-running/regress/results/postgres_fdw.out 2025-03-14 04:02:32.573999799 +0000 @@ -6392,6 +6392,7 @@ UPDATE ft2 SET c3 = 'bar' WHERE postgres_fdw_abs(c1) > 2000 RETURNING *; c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8 ------+----+-----+----+----+----+------------+---- + 2010 | 0 | bar | | | | ft2 | 2001 | 1 | bar | | | | ft2 | 2002 | 2 | bar | | | | ft2 | 2003 | 3 | bar | | | | ft2 | @@ -6401,7 +6402,6 @@ 2007 | 7 | bar | | | | ft2 | 2008 | 8 | bar | | | | ft2 | 2009 | 9 | bar | | | | ft2 | - 2010 | 0 | bar | | | | ft2 | (10 rows)EXPLAIN (verbose, costs off)
shows that the UPDATE ... RETURNING query might be unstable due to lack of
ORDER BY.I managed to reproduce this failure locally with the attached patch
applied to make the test repeatable and with:
sed 's/REGRESS = postgres_fdw.*/REGRESS = $(shell printf "postgres_fdw %.0s" `seq 50`)/' -i contrib/postgres_fdw/Makefile
(Running 10 instances of the test in parallel eases reproducing as well.)I also added SELECT ctid, * FROM ft2; just after the query in question and found out that the results of the SELECT more unstable, but for the UPDATE to produce the difference, the tuple with c1 == 2010 must reside on a new page. For example: --- /home/vagrant/postgresql/contrib/postgres_fdw_2/expected/postgres_fdw.out 2025-03-22 05:21:39.414773284 +0000 +++ /home/vagrant/postgresql/contrib/postgres_fdw_2/results/postgres_fdw.out 2025-03-23 04:43:34.608281935 +0000 @@ -6399,6 +6399,7 @@ UPDATE ft2 SET c3 = 'bar' WHERE postgres_fdw_abs(c1) > 2000 RETURNING *; c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8 ------+----+-----+----+----+----+------------+---- + 2010 | 0 | bar | | | | ft2 | 2001 | 1 | bar | | | | ft2 | 2002 | 2 | bar | | | | ft2 | 2003 | 3 | bar | | | | ft2 | @@ -6408,7 +6409,6 @@ 2007 | 7 | bar | | | | ft2 | 2008 | 8 | bar | | | | ft2 | 2009 | 9 | bar | | | | ft2 | - 2010 | 0 | bar | | | | ft2 |SELECT ctid, * FROM ft2; @@ -6470,15 +6470,6 @@ (0,103) | 57 | 407 | 00057_update7 | Fri Feb 27 00:00:00 1970 PST | Fri Feb 27 00:00:00 1970 | 7 | 7 | foo (0,104) | 67 | 407 | 00067_update7 | Mon Mar 09 00:00:00 1970 PST | Mon Mar 09 00:00:00 1970 | 7 | 7 | foo (0,105) | 77 | 407 | 00077_update7 | Thu Mar 19 00:00:00 1970 PST | Thu Mar 19 00:00:00 1970 | 7 | 7 | foo - (0,106) | 9 | 509 | 00009_update9 | Sat Jan 10 00:00:00 1970 PST | Sat Jan 10 00:00:00 1970 | 9 | ft2 | foo - (0,107) | 19 | 509 | 00019_update9 | Tue Jan 20 00:00:00 1970 PST | Tue Jan 20 00:00:00 1970 | 9 | ft2 | foo - (0,108) | 29 | 509 | 00029_update9 | Fri Jan 30 00:00:00 1970 PST | Fri Jan 30 00:00:00 1970 | 9 | ft2 | foo - (0,109) | 39 | 509 | 00039_update9 | Mon Feb 09 00:00:00 1970 PST | Mon Feb 09 00:00:00 1970 | 9 | ft2 | foo - (0,110) | 49 | 509 | 00049_update9 | Thu Feb 19 00:00:00 1970 PST | Thu Feb 19 00:00:00 1970 | 9 | ft2 | foo - (0,111) | 59 | 509 | 00059_update9 | Sun Mar 01 00:00:00 1970 PST | Sun Mar 01 00:00:00 1970 | 9 | ft2 | foo - (0,112) | 69 | 509 | 00069_update9 | Wed Mar 11 00:00:00 1970 PST | Wed Mar 11 00:00:00 1970 | 9 | ft2 | foo - (0,113) | 79 | 509 | 00079_update9 | Sat Mar 21 00:00:00 1970 PST | Sat Mar 21 00:00:00 1970 | 9 | ft2 | foo - (0,114) | 89 | 509 | 00089_update9 | Tue Mar 31 00:00:00 1970 PST | Tue Mar 31 00:00:00 1970 | 9 | ft2 | foo (1,1) | 98 | 8 | 00098 | Thu Apr 09 00:00:00 1970 PST | Thu Apr 09 00:00:00 1970 | 8 | 8 | foo (1,3) | 100 | 0 | 00100 | Thu Jan 01 00:00:00 1970 PST | Thu Jan 01 00:00:00 1970 | 0 | 0 | foo (1,4) | 101 | 1 | 00101 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1 | foo ...- (12,137) | 2007 | 7 | bar | | | | ft2 | - (12,138) | 2008 | 8 | bar | | | | ft2 | - (12,139) | 2009 | 9 | bar | | | | ft2 | - (12,140) | 2010 | 0 | bar | | | | ft2 | + (11,80) | 9 | 509 | 00009_update9 | Sat Jan 10 00:00:00 1970 PST | Sat Jan 10 00:00:00 1970 | 9 | ft2 | foo + (11,81) | 19 | 509 | 00019_update9 | Tue Jan 20 00:00:00 1970 PST | Tue Jan 20 00:00:00 1970 | 9 | ft2 | foo + (11,82) | 29 | 509 | 00029_update9 | Fri Jan 30 00:00:00 1970 PST | Fri Jan 30 00:00:00 1970 | 9 | ft2 | foo + (11,83) | 39 | 509 | 00039_update9 | Mon Feb 09 00:00:00 1970 PST | Mon Feb 09 00:00:00 1970 | 9 | ft2 | foo ... + (13,126) | 1013 | 403 | 0001300013_update3 | | | | ft2 | + (13,127) | 1019 | 609 | 0001900019_update9 | | | | ft2 | + (13,128) | 1103 | 503 | ccc_update3 | | | | ft2 | + (14,1) | 2010 | 0 | bar | | | | ft2 | (829 rows)That is, for all the UPDATE failures I could see, the tuple with c1 == 2010
has ctid == (14, 1).Interestingly enough, with "log_autovacuum_min_duration = 0" added to
TEMP_CONFIG, I can't see "automatic vacuum/analyze" messages related
to ft2/ "S 1"."T 1", so autovacuum somehow affects contents of this table
indirectly.
Thanks for reporting and investigating this issue!
With autovacuum = off, all of these fluctuations go away.
So you are suggesting disabling autovacuum during the postgres_fdw regression test?
Just my idea to stabilize the test with "RETURNING *" is to use WITH, like this:
WITH tmp AS (UPDATE ... RETURNING *) SELECT * FROM tmp ORDER BY ...
Thought?
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Sun, Mar 23, 2025 at 10:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
Interestingly enough, with "log_autovacuum_min_duration = 0" added to
TEMP_CONFIG, I can't see "automatic vacuum/analyze" messages related
to ft2/ "S 1"."T 1", so autovacuum somehow affects contents of this table
indirectly.With autovacuum = off, all of these fluctuations go away.
If autovacuum is somehow corrupting the table, then isn't there a bug
that needs to be fixed? Also, are you only able to reproduce this on
master?
- Melanie
Hello Masao-san and Melanie,
Thank you for your attention to this!
23.03.2025 19:26, Fujii Masao wrote:
With autovacuum = off, all of these fluctuations go away.
So you are suggesting disabling autovacuum during the postgres_fdw regression test?
No, I'd prefer to have the test stable with working autovacuum. I just
wanted to point out that this failure is definitely caused by autovacuum
despite the lack of corresponding messages in the postmaster log.
Just my idea to stabilize the test with "RETURNING *" is to use WITH, like this:
WITH tmp AS (UPDATE ... RETURNING *) SELECT * FROM tmp ORDER BY ...
Thought?
Yeah, maybe such a trick will do.
23.03.2025 19:30, Melanie Plageman wrote:
On Sun, Mar 23, 2025 at 10:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
With autovacuum = off, all of these fluctuations go away.
If autovacuum is somehow corrupting the table, then isn't there a bug
that needs to be fixed? Also, are you only able to reproduce this on
master?
Yes, I reproduced this on master, but I don't think the table is
corrupted in this case, it just contains tuples in other order.
Best regards,
Alexander Lakhin
Neon (https://neon.tech)
Hi,
On 2025-03-24 02:26:35 +0900, Fujii Masao wrote:
With autovacuum = off, all of these fluctuations go away.
So you are suggesting disabling autovacuum during the postgres_fdw regression test?
I don't think that's a good fix in this case.
Just my idea to stabilize the test with "RETURNING *" is to use WITH, like this:
WITH tmp AS (UPDATE ... RETURNING *) SELECT * FROM tmp ORDER BY ...
+1
On 2025-03-23 16:00:00 +0200, Alexander Lakhin wrote:
Interestingly enough, with "log_autovacuum_min_duration = 0" added to
TEMP_CONFIG, I can't see "automatic vacuum/analyze" messages related
to ft2/ "S 1"."T 1", so autovacuum somehow affects contents of this table
indirectly.With autovacuum = off, all of these fluctuations go away.
Hm. Is it possible that autovacuum started but didn't finish?
FWIW, I observed some oddities around autovacuum=0 recently, but didn't have
time to track the source down yet. E.g. running pg_trgm's tests against a
server with autovacuum=0 results in test failures due to plan changes, despite
no AVs getting launched.
Greetings,
Andres