A failure in prepared_xacts test
Yesterday I noticed a failure on cirrus-ci for the 'Right Semi Join'
patch. The failure can be found at [1]https://api.cirrus-ci.com/v1/artifact/task/6220592364388352/testrun/build/testrun/regress-running/regress/regression.diffs, and it looks like:
--- /tmp/cirrus-ci-build/src/test/regress/expected/prepared_xacts.out
2024-04-27 00:41:25.831297000 +0000
+++
/tmp/cirrus-ci-build/build/testrun/regress-running/regress/results/prepared_xacts.out
2024-04-27 00:45:50.261369000 +0000
@@ -83,8 +83,9 @@
SELECT gid FROM pg_prepared_xacts;
gid
------
+ gxid
foo3
-(1 row)
+(2 rows)
Upon closer look, it seems that this issue is not caused by the patch
about 'Right Semi Join', because this query, which initially included
two left joins, can actually be reduced to a function scan after
removing these two useless left joins. It seems that no semi-joins
would be involved.
EXPLAIN SELECT gid FROM pg_prepared_xacts;
QUERY PLAN
----------------------------------------------------------------------------
Function Scan on pg_prepared_xact p (cost=0.00..10.00 rows=1000 width=32)
(1 row)
Does anyone have any clue to this failure?
FWIW, after another run of this test, the failure just disappears. Does
it suggest that the test case is flaky?
[1]: https://api.cirrus-ci.com/v1/artifact/task/6220592364388352/testrun/build/testrun/regress-running/regress/regression.diffs
https://api.cirrus-ci.com/v1/artifact/task/6220592364388352/testrun/build/testrun/regress-running/regress/regression.diffs
Thanks
Richard
On Mon, Apr 29, 2024 at 09:12:48AM +0800, Richard Guo wrote:
Does anyone have any clue to this failure?
FWIW, after another run of this test, the failure just disappears. Does
it suggest that the test case is flaky?
If you grep the source tree, you'd notice that a prepared transaction
named gxid only exists in the 2PC tests of ECPG, in
src/interfaces/ecpg/test/sql/twophase.pgc. So the origin of the
failure comes from a race condition due to test parallelization,
because the scan of pg_prepared_xacts affects all databases with
installcheck, and in your case it means that the scan of
pg_prepared_xacts was running in parallel of the ECPG tests with an
installcheck.
The only location in the whole tree where we want to do predictible
scans of pg_prepared_xacts is prepared_xacts.sql, so rather than
playing with 2PC transactions across a bunch of tests, I think that we
should do two things, both touching prepared_xacts.sql:
- The 2PC transactions run in the main regression test suite should
use names that would be unlikely used elsewhere.
- Limit the scans of pg_prepared_xacts on these name patterns to avoid
interferences.
See for example the attached with both expected outputs updated
depending on the value set for max_prepared_transactions in the
backend. There may be an argument in back-patching that, but I don't
recall seeing this failure in the CI, so perhaps that's not worth
bothering with. What do you think?
--
Michael
Attachments:
2pc-tests.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/test/regress/expected/prepared_xacts.out b/src/test/regress/expected/prepared_xacts.out
index ba8e3ccc6c..515a2ada9d 100644
--- a/src/test/regress/expected/prepared_xacts.out
+++ b/src/test/regress/expected/prepared_xacts.out
@@ -17,7 +17,7 @@ SELECT * FROM pxtest1;
bbb
(1 row)
-PREPARE TRANSACTION 'foo1';
+PREPARE TRANSACTION 'regress_foo1';
SELECT * FROM pxtest1;
foobar
--------
@@ -25,21 +25,21 @@ SELECT * FROM pxtest1;
(1 row)
-- Test pg_prepared_xacts system view
-SELECT gid FROM pg_prepared_xacts;
- gid
-------
- foo1
+SELECT gid FROM pg_prepared_xacts WHERE gid ~ '^regress_' ORDER BY gid;
+ gid
+--------------
+ regress_foo1
(1 row)
-- Test ROLLBACK PREPARED
-ROLLBACK PREPARED 'foo1';
+ROLLBACK PREPARED 'regress_foo1';
SELECT * FROM pxtest1;
foobar
--------
aaa
(1 row)
-SELECT gid FROM pg_prepared_xacts;
+SELECT gid FROM pg_prepared_xacts WHERE gid ~ '^regress_' ORDER BY gid;
gid
-----
(0 rows)
@@ -54,14 +54,14 @@ SELECT * FROM pxtest1;
ddd
(2 rows)
-PREPARE TRANSACTION 'foo2';
+PREPARE TRANSACTION 'regress_foo2';
SELECT * FROM pxtest1;
foobar
--------
aaa
(1 row)
-COMMIT PREPARED 'foo2';
+COMMIT PREPARED 'regress_foo2';
SELECT * FROM pxtest1;
foobar
--------
@@ -79,18 +79,18 @@ SELECT * FROM pxtest1;
eee
(2 rows)
-PREPARE TRANSACTION 'foo3';
-SELECT gid FROM pg_prepared_xacts;
- gid
-------
- foo3
+PREPARE TRANSACTION 'regress_foo3';
+SELECT gid FROM pg_prepared_xacts WHERE gid ~ '^regress_' ORDER BY gid;
+ gid
+--------------
+ regress_foo3
(1 row)
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
INSERT INTO pxtest1 VALUES ('fff');
-- This should fail, because the gid foo3 is already in use
-PREPARE TRANSACTION 'foo3';
-ERROR: transaction identifier "foo3" is already in use
+PREPARE TRANSACTION 'regress_foo3';
+ERROR: transaction identifier "regress_foo3" is already in use
SELECT * FROM pxtest1;
foobar
--------
@@ -98,7 +98,7 @@ SELECT * FROM pxtest1;
ddd
(2 rows)
-ROLLBACK PREPARED 'foo3';
+ROLLBACK PREPARED 'regress_foo3';
SELECT * FROM pxtest1;
foobar
--------
@@ -116,11 +116,11 @@ SELECT * FROM pxtest1;
eee
(2 rows)
-PREPARE TRANSACTION 'foo4';
-SELECT gid FROM pg_prepared_xacts;
- gid
-------
- foo4
+PREPARE TRANSACTION 'regress_foo4';
+SELECT gid FROM pg_prepared_xacts WHERE gid ~ '^regress_' ORDER BY gid;
+ gid
+--------------
+ regress_foo4
(1 row)
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
@@ -136,15 +136,15 @@ INSERT INTO pxtest1 VALUES ('fff');
ERROR: could not serialize access due to read/write dependencies among transactions
DETAIL: Reason code: Canceled on identification as a pivot, during write.
HINT: The transaction might succeed if retried.
-PREPARE TRANSACTION 'foo5';
-SELECT gid FROM pg_prepared_xacts;
- gid
-------
- foo4
+PREPARE TRANSACTION 'regress_foo5';
+SELECT gid FROM pg_prepared_xacts WHERE gid ~ '^regress_' ORDER BY gid;
+ gid
+--------------
+ regress_foo4
(1 row)
-ROLLBACK PREPARED 'foo4';
-SELECT gid FROM pg_prepared_xacts;
+ROLLBACK PREPARED 'regress_foo4';
+SELECT gid FROM pg_prepared_xacts WHERE gid ~ '^regress_' ORDER BY gid;
gid
-----
(0 rows)
@@ -165,7 +165,7 @@ SELECT pg_advisory_xact_lock_shared(1);
(1 row)
-PREPARE TRANSACTION 'foo6'; -- fails
+PREPARE TRANSACTION 'regress_foo6'; -- fails
ERROR: cannot PREPARE while holding both session-level and transaction-level locks on the same object
-- Test subtransactions
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
@@ -176,7 +176,7 @@ BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
ROLLBACK TO a;
SAVEPOINT b;
INSERT INTO pxtest2 VALUES (3);
-PREPARE TRANSACTION 'regress-one';
+PREPARE TRANSACTION 'regress_sub1';
CREATE TABLE pxtest3(fff int);
-- Test shared invalidation
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
@@ -192,7 +192,7 @@ BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
1
(1 row)
-PREPARE TRANSACTION 'regress-two';
+PREPARE TRANSACTION 'regress_sub2';
-- No such cursor
FETCH 1 FROM foo;
ERROR: cursor "foo" does not exist
@@ -202,11 +202,11 @@ ERROR: relation "pxtest2" does not exist
LINE 1: SELECT * FROM pxtest2;
^
-- There should be two prepared transactions
-SELECT gid FROM pg_prepared_xacts;
- gid
--------------
- regress-one
- regress-two
+SELECT gid FROM pg_prepared_xacts WHERE gid ~ '^regress_' ORDER BY gid;
+ gid
+--------------
+ regress_sub1
+ regress_sub2
(2 rows)
-- pxtest3 should be locked because of the pending DROP
@@ -217,11 +217,11 @@ rollback;
-- Disconnect, we will continue testing in a different backend
\c -
-- There should still be two prepared transactions
-SELECT gid FROM pg_prepared_xacts;
- gid
--------------
- regress-one
- regress-two
+SELECT gid FROM pg_prepared_xacts WHERE gid ~ '^regress_' ORDER BY gid;
+ gid
+--------------
+ regress_sub1
+ regress_sub2
(2 rows)
-- pxtest3 should still be locked because of the pending DROP
@@ -230,7 +230,7 @@ lock table pxtest3 in access share mode nowait;
ERROR: could not obtain lock on relation "pxtest3"
rollback;
-- Commit table creation
-COMMIT PREPARED 'regress-one';
+COMMIT PREPARED 'regress_sub1';
\d pxtest2
Table "public.pxtest2"
Column | Type | Collation | Nullable | Default
@@ -245,20 +245,20 @@ SELECT * FROM pxtest2;
(2 rows)
-- There should be one prepared transaction
-SELECT gid FROM pg_prepared_xacts;
- gid
--------------
- regress-two
+SELECT gid FROM pg_prepared_xacts WHERE gid ~ '^regress_' ORDER BY gid;
+ gid
+--------------
+ regress_sub2
(1 row)
-- Commit table drop
-COMMIT PREPARED 'regress-two';
+COMMIT PREPARED 'regress_sub2';
SELECT * FROM pxtest3;
ERROR: relation "pxtest3" does not exist
LINE 1: SELECT * FROM pxtest3;
^
-- There should be no prepared transactions
-SELECT gid FROM pg_prepared_xacts;
+SELECT gid FROM pg_prepared_xacts WHERE gid ~ '^regress_' ORDER BY gid;
gid
-----
(0 rows)
diff --git a/src/test/regress/expected/prepared_xacts_1.out b/src/test/regress/expected/prepared_xacts_1.out
index 2cd50ad947..7168f86bf9 100644
--- a/src/test/regress/expected/prepared_xacts_1.out
+++ b/src/test/regress/expected/prepared_xacts_1.out
@@ -17,7 +17,7 @@ SELECT * FROM pxtest1;
bbb
(1 row)
-PREPARE TRANSACTION 'foo1';
+PREPARE TRANSACTION 'regress_foo1';
ERROR: prepared transactions are disabled
HINT: Set max_prepared_transactions to a nonzero value.
SELECT * FROM pxtest1;
@@ -27,21 +27,21 @@ SELECT * FROM pxtest1;
(1 row)
-- Test pg_prepared_xacts system view
-SELECT gid FROM pg_prepared_xacts;
+SELECT gid FROM pg_prepared_xacts WHERE gid ~ '^regress_' ORDER BY gid;
gid
-----
(0 rows)
-- Test ROLLBACK PREPARED
-ROLLBACK PREPARED 'foo1';
-ERROR: prepared transaction with identifier "foo1" does not exist
+ROLLBACK PREPARED 'regress_foo1';
+ERROR: prepared transaction with identifier "regress_foo1" does not exist
SELECT * FROM pxtest1;
foobar
--------
aaa
(1 row)
-SELECT gid FROM pg_prepared_xacts;
+SELECT gid FROM pg_prepared_xacts WHERE gid ~ '^regress_' ORDER BY gid;
gid
-----
(0 rows)
@@ -56,7 +56,7 @@ SELECT * FROM pxtest1;
ddd
(2 rows)
-PREPARE TRANSACTION 'foo2';
+PREPARE TRANSACTION 'regress_foo2';
ERROR: prepared transactions are disabled
HINT: Set max_prepared_transactions to a nonzero value.
SELECT * FROM pxtest1;
@@ -65,8 +65,8 @@ SELECT * FROM pxtest1;
aaa
(1 row)
-COMMIT PREPARED 'foo2';
-ERROR: prepared transaction with identifier "foo2" does not exist
+COMMIT PREPARED 'regress_foo2';
+ERROR: prepared transaction with identifier "regress_foo2" does not exist
SELECT * FROM pxtest1;
foobar
--------
@@ -82,10 +82,10 @@ SELECT * FROM pxtest1;
aaa
(1 row)
-PREPARE TRANSACTION 'foo3';
+PREPARE TRANSACTION 'regress_foo3';
ERROR: prepared transactions are disabled
HINT: Set max_prepared_transactions to a nonzero value.
-SELECT gid FROM pg_prepared_xacts;
+SELECT gid FROM pg_prepared_xacts WHERE gid ~ '^regress_' ORDER BY gid;
gid
-----
(0 rows)
@@ -93,7 +93,7 @@ SELECT gid FROM pg_prepared_xacts;
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
INSERT INTO pxtest1 VALUES ('fff');
-- This should fail, because the gid foo3 is already in use
-PREPARE TRANSACTION 'foo3';
+PREPARE TRANSACTION 'regress_foo3';
ERROR: prepared transactions are disabled
HINT: Set max_prepared_transactions to a nonzero value.
SELECT * FROM pxtest1;
@@ -102,8 +102,8 @@ SELECT * FROM pxtest1;
aaa
(1 row)
-ROLLBACK PREPARED 'foo3';
-ERROR: prepared transaction with identifier "foo3" does not exist
+ROLLBACK PREPARED 'regress_foo3';
+ERROR: prepared transaction with identifier "regress_foo3" does not exist
SELECT * FROM pxtest1;
foobar
--------
@@ -119,10 +119,10 @@ SELECT * FROM pxtest1;
aaa
(1 row)
-PREPARE TRANSACTION 'foo4';
+PREPARE TRANSACTION 'regress_foo4';
ERROR: prepared transactions are disabled
HINT: Set max_prepared_transactions to a nonzero value.
-SELECT gid FROM pg_prepared_xacts;
+SELECT gid FROM pg_prepared_xacts WHERE gid ~ '^regress_' ORDER BY gid;
gid
-----
(0 rows)
@@ -136,17 +136,17 @@ SELECT * FROM pxtest1;
-- This should fail, because the two transactions have a write-skew anomaly
INSERT INTO pxtest1 VALUES ('fff');
-PREPARE TRANSACTION 'foo5';
+PREPARE TRANSACTION 'regress_foo5';
ERROR: prepared transactions are disabled
HINT: Set max_prepared_transactions to a nonzero value.
-SELECT gid FROM pg_prepared_xacts;
+SELECT gid FROM pg_prepared_xacts WHERE gid ~ '^regress_' ORDER BY gid;
gid
-----
(0 rows)
-ROLLBACK PREPARED 'foo4';
-ERROR: prepared transaction with identifier "foo4" does not exist
-SELECT gid FROM pg_prepared_xacts;
+ROLLBACK PREPARED 'regress_foo4';
+ERROR: prepared transaction with identifier "regress_foo4" does not exist
+SELECT gid FROM pg_prepared_xacts WHERE gid ~ '^regress_' ORDER BY gid;
gid
-----
(0 rows)
@@ -167,7 +167,7 @@ SELECT pg_advisory_xact_lock_shared(1);
(1 row)
-PREPARE TRANSACTION 'foo6'; -- fails
+PREPARE TRANSACTION 'regress_foo6'; -- fails
ERROR: prepared transactions are disabled
HINT: Set max_prepared_transactions to a nonzero value.
-- Test subtransactions
@@ -179,7 +179,7 @@ BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
ROLLBACK TO a;
SAVEPOINT b;
INSERT INTO pxtest2 VALUES (3);
-PREPARE TRANSACTION 'regress-one';
+PREPARE TRANSACTION 'regress_sub1';
ERROR: prepared transactions are disabled
HINT: Set max_prepared_transactions to a nonzero value.
CREATE TABLE pxtest3(fff int);
@@ -197,7 +197,7 @@ BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
1
(1 row)
-PREPARE TRANSACTION 'regress-two';
+PREPARE TRANSACTION 'regress_sub2';
ERROR: prepared transactions are disabled
HINT: Set max_prepared_transactions to a nonzero value.
-- No such cursor
@@ -209,7 +209,7 @@ ERROR: relation "pxtest2" does not exist
LINE 1: SELECT * FROM pxtest2;
^
-- There should be two prepared transactions
-SELECT gid FROM pg_prepared_xacts;
+SELECT gid FROM pg_prepared_xacts WHERE gid ~ '^regress_' ORDER BY gid;
gid
-----
(0 rows)
@@ -221,7 +221,7 @@ rollback;
-- Disconnect, we will continue testing in a different backend
\c -
-- There should still be two prepared transactions
-SELECT gid FROM pg_prepared_xacts;
+SELECT gid FROM pg_prepared_xacts WHERE gid ~ '^regress_' ORDER BY gid;
gid
-----
(0 rows)
@@ -231,29 +231,29 @@ begin;
lock table pxtest3 in access share mode nowait;
rollback;
-- Commit table creation
-COMMIT PREPARED 'regress-one';
-ERROR: prepared transaction with identifier "regress-one" does not exist
+COMMIT PREPARED 'regress_sub1';
+ERROR: prepared transaction with identifier "regress_sub1" does not exist
\d pxtest2
SELECT * FROM pxtest2;
ERROR: relation "pxtest2" does not exist
LINE 1: SELECT * FROM pxtest2;
^
-- There should be one prepared transaction
-SELECT gid FROM pg_prepared_xacts;
+SELECT gid FROM pg_prepared_xacts WHERE gid ~ '^regress_' ORDER BY gid;
gid
-----
(0 rows)
-- Commit table drop
-COMMIT PREPARED 'regress-two';
-ERROR: prepared transaction with identifier "regress-two" does not exist
+COMMIT PREPARED 'regress_sub2';
+ERROR: prepared transaction with identifier "regress_sub2" does not exist
SELECT * FROM pxtest3;
fff
-----
(0 rows)
-- There should be no prepared transactions
-SELECT gid FROM pg_prepared_xacts;
+SELECT gid FROM pg_prepared_xacts WHERE gid ~ '^regress_' ORDER BY gid;
gid
-----
(0 rows)
diff --git a/src/test/regress/sql/prepared_xacts.sql b/src/test/regress/sql/prepared_xacts.sql
index 2f0bb55bb4..ade3a2672a 100644
--- a/src/test/regress/sql/prepared_xacts.sql
+++ b/src/test/regress/sql/prepared_xacts.sql
@@ -17,30 +17,30 @@ INSERT INTO pxtest1 VALUES ('aaa');
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
UPDATE pxtest1 SET foobar = 'bbb' WHERE foobar = 'aaa';
SELECT * FROM pxtest1;
-PREPARE TRANSACTION 'foo1';
+PREPARE TRANSACTION 'regress_foo1';
SELECT * FROM pxtest1;
-- Test pg_prepared_xacts system view
-SELECT gid FROM pg_prepared_xacts;
+SELECT gid FROM pg_prepared_xacts WHERE gid ~ '^regress_' ORDER BY gid;
-- Test ROLLBACK PREPARED
-ROLLBACK PREPARED 'foo1';
+ROLLBACK PREPARED 'regress_foo1';
SELECT * FROM pxtest1;
-SELECT gid FROM pg_prepared_xacts;
+SELECT gid FROM pg_prepared_xacts WHERE gid ~ '^regress_' ORDER BY gid;
-- Test COMMIT PREPARED
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
INSERT INTO pxtest1 VALUES ('ddd');
SELECT * FROM pxtest1;
-PREPARE TRANSACTION 'foo2';
+PREPARE TRANSACTION 'regress_foo2';
SELECT * FROM pxtest1;
-COMMIT PREPARED 'foo2';
+COMMIT PREPARED 'regress_foo2';
SELECT * FROM pxtest1;
@@ -48,19 +48,19 @@ SELECT * FROM pxtest1;
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
UPDATE pxtest1 SET foobar = 'eee' WHERE foobar = 'ddd';
SELECT * FROM pxtest1;
-PREPARE TRANSACTION 'foo3';
+PREPARE TRANSACTION 'regress_foo3';
-SELECT gid FROM pg_prepared_xacts;
+SELECT gid FROM pg_prepared_xacts WHERE gid ~ '^regress_' ORDER BY gid;
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
INSERT INTO pxtest1 VALUES ('fff');
-- This should fail, because the gid foo3 is already in use
-PREPARE TRANSACTION 'foo3';
+PREPARE TRANSACTION 'regress_foo3';
SELECT * FROM pxtest1;
-ROLLBACK PREPARED 'foo3';
+ROLLBACK PREPARED 'regress_foo3';
SELECT * FROM pxtest1;
@@ -68,22 +68,22 @@ SELECT * FROM pxtest1;
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
UPDATE pxtest1 SET foobar = 'eee' WHERE foobar = 'ddd';
SELECT * FROM pxtest1;
-PREPARE TRANSACTION 'foo4';
+PREPARE TRANSACTION 'regress_foo4';
-SELECT gid FROM pg_prepared_xacts;
+SELECT gid FROM pg_prepared_xacts WHERE gid ~ '^regress_' ORDER BY gid;
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT * FROM pxtest1;
-- This should fail, because the two transactions have a write-skew anomaly
INSERT INTO pxtest1 VALUES ('fff');
-PREPARE TRANSACTION 'foo5';
+PREPARE TRANSACTION 'regress_foo5';
-SELECT gid FROM pg_prepared_xacts;
+SELECT gid FROM pg_prepared_xacts WHERE gid ~ '^regress_' ORDER BY gid;
-ROLLBACK PREPARED 'foo4';
+ROLLBACK PREPARED 'regress_foo4';
-SELECT gid FROM pg_prepared_xacts;
+SELECT gid FROM pg_prepared_xacts WHERE gid ~ '^regress_' ORDER BY gid;
-- Clean up
DROP TABLE pxtest1;
@@ -92,7 +92,7 @@ DROP TABLE pxtest1;
BEGIN;
SELECT pg_advisory_lock(1);
SELECT pg_advisory_xact_lock_shared(1);
-PREPARE TRANSACTION 'foo6'; -- fails
+PREPARE TRANSACTION 'regress_foo6'; -- fails
-- Test subtransactions
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
@@ -103,7 +103,7 @@ BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
ROLLBACK TO a;
SAVEPOINT b;
INSERT INTO pxtest2 VALUES (3);
-PREPARE TRANSACTION 'regress-one';
+PREPARE TRANSACTION 'regress_sub1';
CREATE TABLE pxtest3(fff int);
@@ -116,7 +116,7 @@ BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
DECLARE foo CURSOR FOR SELECT * FROM pxtest4;
-- Fetch 1 tuple, keeping the cursor open
FETCH 1 FROM foo;
-PREPARE TRANSACTION 'regress-two';
+PREPARE TRANSACTION 'regress_sub2';
-- No such cursor
FETCH 1 FROM foo;
@@ -125,7 +125,7 @@ FETCH 1 FROM foo;
SELECT * FROM pxtest2;
-- There should be two prepared transactions
-SELECT gid FROM pg_prepared_xacts;
+SELECT gid FROM pg_prepared_xacts WHERE gid ~ '^regress_' ORDER BY gid;
-- pxtest3 should be locked because of the pending DROP
begin;
@@ -136,7 +136,7 @@ rollback;
\c -
-- There should still be two prepared transactions
-SELECT gid FROM pg_prepared_xacts;
+SELECT gid FROM pg_prepared_xacts WHERE gid ~ '^regress_' ORDER BY gid;
-- pxtest3 should still be locked because of the pending DROP
begin;
@@ -144,19 +144,19 @@ lock table pxtest3 in access share mode nowait;
rollback;
-- Commit table creation
-COMMIT PREPARED 'regress-one';
+COMMIT PREPARED 'regress_sub1';
\d pxtest2
SELECT * FROM pxtest2;
-- There should be one prepared transaction
-SELECT gid FROM pg_prepared_xacts;
+SELECT gid FROM pg_prepared_xacts WHERE gid ~ '^regress_' ORDER BY gid;
-- Commit table drop
-COMMIT PREPARED 'regress-two';
+COMMIT PREPARED 'regress_sub2';
SELECT * FROM pxtest3;
-- There should be no prepared transactions
-SELECT gid FROM pg_prepared_xacts;
+SELECT gid FROM pg_prepared_xacts WHERE gid ~ '^regress_' ORDER BY gid;
-- Clean up
DROP TABLE pxtest2;
Hello Richard,
29.04.2024 04:12, Richard Guo wrote:
Does anyone have any clue to this failure?
FWIW, after another run of this test, the failure just disappears. Does
it suggest that the test case is flaky?
I think this could be caused by the ecpg test twophase executed
simultaneously with the test prepared_xacts thanks to meson's jobs
parallelization.
Best regards,
Alexander
Michael Paquier <michael@paquier.xyz> writes:
If you grep the source tree, you'd notice that a prepared transaction
named gxid only exists in the 2PC tests of ECPG, in
src/interfaces/ecpg/test/sql/twophase.pgc. So the origin of the
failure comes from a race condition due to test parallelization,
because the scan of pg_prepared_xacts affects all databases with
installcheck, and in your case it means that the scan of
pg_prepared_xacts was running in parallel of the ECPG tests with an
installcheck.
Up to now, we've only worried about whether tests running in parallel
within a single test suite can interact. It's quite scary to think
that the meson setup has expanded the possibility of interactions
to our entire source tree. Maybe that was a bad idea and we should
fix the meson infrastructure to not do that. I fear that otherwise,
we'll get bit regularly by very-low-probability bugs of this kind.
regards, tom lane
On Mon, Apr 29, 2024 at 01:11:00AM -0400, Tom Lane wrote:
Up to now, we've only worried about whether tests running in parallel
within a single test suite can interact. It's quite scary to think
that the meson setup has expanded the possibility of interactions
to our entire source tree. Maybe that was a bad idea and we should
fix the meson infrastructure to not do that. I fear that otherwise,
we'll get bit regularly by very-low-probability bugs of this kind.
I don't disagree with your point, still I'm not sure that this can be
made entirely bullet-proof. Anyway, I think that we should still
improve this test and make it more robust for parallel operations:
installcheck fails equally on HEAD if there is a prepared transaction
on the backend where the tests run, and that seems like a bad idea to
me to rely on cluster-wide scans for what should be a "local" test.
--
Michael
Hello Tom and Michael,
29.04.2024 08:11, Tom Lane wrote:
Michael Paquier <michael@paquier.xyz> writes:
If you grep the source tree, you'd notice that a prepared transaction
named gxid only exists in the 2PC tests of ECPG, in
src/interfaces/ecpg/test/sql/twophase.pgc. So the origin of the
failure comes from a race condition due to test parallelization,
because the scan of pg_prepared_xacts affects all databases with
installcheck, and in your case it means that the scan of
pg_prepared_xacts was running in parallel of the ECPG tests with an
installcheck.Up to now, we've only worried about whether tests running in parallel
within a single test suite can interact. It's quite scary to think
that the meson setup has expanded the possibility of interactions
to our entire source tree. Maybe that was a bad idea and we should
fix the meson infrastructure to not do that. I fear that otherwise,
we'll get bit regularly by very-low-probability bugs of this kind.
Yes, I'm afraid of the same. For example, the test failure [1]https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rorqual&dt=2024-04-17%2016%3A33%3A23 is of that
ilk, I guess.
[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rorqual&dt=2024-04-17%2016%3A33%3A23
Best regards,
Alexander
Michael Paquier <michael@paquier.xyz> writes:
I don't disagree with your point, still I'm not sure that this can be
made entirely bullet-proof. Anyway, I think that we should still
improve this test and make it more robust for parallel operations:
installcheck fails equally on HEAD if there is a prepared transaction
on the backend where the tests run, and that seems like a bad idea to
me to rely on cluster-wide scans for what should be a "local" test.
True, it's antithetical to the point of an "installcheck" test if
unrelated actions in another database can break it. So I'm fine
with tightening up prepared_xacts's query. I just wonder how far
we want to try to carry this.
(BTW, on the same logic, should ecpg's twophase.pgc be using a
prepared-transaction name that's less generic than "gxid"?)
regards, tom lane
On Mon, Apr 29, 2024 at 01:32:40AM -0400, Tom Lane wrote:
(BTW, on the same logic, should ecpg's twophase.pgc be using a
prepared-transaction name that's less generic than "gxid"?)
I've hesitated a few seconds about that before sending my patch, but
refrained because this stuff does not care about the contents of
pg_prepared_xacts. I'd be OK to use something like an "ecpg_regress"
or something similar there.
--
Michael
On Mon, Apr 29, 2024 at 1:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Up to now, we've only worried about whether tests running in parallel
within a single test suite can interact. It's quite scary to think
that the meson setup has expanded the possibility of interactions
to our entire source tree. Maybe that was a bad idea and we should
fix the meson infrastructure to not do that. I fear that otherwise,
we'll get bit regularly by very-low-probability bugs of this kind.
I have the same concern. I suspect that the scan of pg_prepared_xacts
is not the only test that could cause problems when running in parallel
to other tests from the entire source tree.
Thanks
Richard
On Mon, Apr 29, 2024 at 2:58 PM Michael Paquier <michael@paquier.xyz> wrote:
On Mon, Apr 29, 2024 at 01:32:40AM -0400, Tom Lane wrote:
(BTW, on the same logic, should ecpg's twophase.pgc be using a
prepared-transaction name that's less generic than "gxid"?)I've hesitated a few seconds about that before sending my patch, but
refrained because this stuff does not care about the contents of
pg_prepared_xacts. I'd be OK to use something like an "ecpg_regress"
or something similar there.
I noticed that some TAP tests from recovery and subscription would
select the count from pg_prepared_xacts. I wonder if these tests would
be affected if there are any prepared transactions on the backend.
Thanks
Richard
On Mon, Apr 29, 2024 at 05:11:19PM +0800, Richard Guo wrote:
I noticed that some TAP tests from recovery and subscription would
select the count from pg_prepared_xacts. I wonder if these tests would
be affected if there are any prepared transactions on the backend.
TAP tests run in isolation of the rest with their own clusters
initialized from a copy initdb'd (rather than initdb because that's
much cheaper), so these scans are OK left alone.
--
Michael
Richard Guo <guofenglinux@gmail.com> writes:
I noticed that some TAP tests from recovery and subscription would
select the count from pg_prepared_xacts. I wonder if these tests would
be affected if there are any prepared transactions on the backend.
TAP tests shouldn't be at risk, because there is no "make
installcheck" equivalent for them. Each TAP test creates its own
database instance (or maybe several), so that instance won't have
anything else going on.
regards, tom lane
On Mon, Apr 29, 2024 at 09:45:16AM -0400, Tom Lane wrote:
TAP tests shouldn't be at risk, because there is no "make
installcheck" equivalent for them. Each TAP test creates its own
database instance (or maybe several), so that instance won't have
anything else going on.
There are a few more 2PC transactions in test_decoding (no
installcheck), temp.sql, test_extensions.sql and pg_stat_statements's
utility.sql (no installcheck) but their GIDs are not that bad.
twophase_stream.sql has a GID "test1", which is kind of generic, but
it won't run in parallel. At the end, only addressing the
prepared_xacts.sql and the ECPG bits looked enough to me, so I've
tweaked these with 7e61e4cc7cfc and called it a day.
I'd be curious about any discussion involving the structure of the
meson tests.
--
Michael
On Mon, Apr 29, 2024 at 9:45 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Richard Guo <guofenglinux@gmail.com> writes:
I noticed that some TAP tests from recovery and subscription would
select the count from pg_prepared_xacts. I wonder if these tests would
be affected if there are any prepared transactions on the backend.TAP tests shouldn't be at risk, because there is no "make
installcheck" equivalent for them. Each TAP test creates its own
database instance (or maybe several), so that instance won't have
anything else going on.
Thank you for the explanation. I wasn't aware of this before.
Thanks
Richard
On Tue, Apr 30, 2024 at 6:43 AM Michael Paquier <michael@paquier.xyz> wrote:
I'd be curious about any discussion involving the structure of the
meson tests.
+1. I'm kind of worried that the expansion of parallelization could
lead to more instances of instability. Alexander mentioned one such
case at [1]/messages/by-id/cbf0156f-5aa1-91db-5802-82435dda03e6@gmail.com. I haven't looked into it though.
[1]: /messages/by-id/cbf0156f-5aa1-91db-5802-82435dda03e6@gmail.com
/messages/by-id/cbf0156f-5aa1-91db-5802-82435dda03e6@gmail.com
Thanks
Richard
Richard Guo <guofenglinux@gmail.com> writes:
+1. I'm kind of worried that the expansion of parallelization could
lead to more instances of instability. Alexander mentioned one such
case at [1]. I haven't looked into it though.
[1] /messages/by-id/cbf0156f-5aa1-91db-5802-82435dda03e6@gmail.com
The mechanism there is pretty obvious: a plancache flush happened
at just the wrong (right?) time and caused the output to change,
as indeed the comment acknowledges:
-- currently, this fails due to cached plan for "r.f1 + 1" expression
-- (but if debug_discard_caches is on, it will succeed)
I wonder if we shouldn't just remove that test case as being
too unstable -- especially since it's not proving much anyway.
regards, tom lane
Hi all,
Attached is a patch that fixes bug when calling strncmp function, in
which case the third argument (authmethod - strchr(authmethod, ' '))
may be negative, which is not as expected..
With Regards,
Jingxian Li.
Attachments:
v1-0001-Fix-bug-when-calling-strncmp-in-check_authmethod_valid.patchapplication/octet-stream; name=v1-0001-Fix-bug-when-calling-strncmp-in-check_authmethod_valid.patchDownload
From 6270167720c70b765422053089a5faf4ad11f36f Mon Sep 17 00:00:00 2001
From: Jingxian Li <aqktjcm@qq.com>
Date: Tue, 30 Apr 2024 10:08:02 +0800
Subject: [PATCH] fix a code error when calling strncmp function, in which case
the third argument (authmethod - strchr(authmethod, ' ')) may be negative,
which is not as expected.
---
src/bin/initdb/initdb.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 30e17bd1d1..17a8df0e29 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2514,7 +2514,7 @@ check_authmethod_valid(const char *authmethod, const char *const *valid_methods,
return;
/* with space = param */
if (strchr(authmethod, ' '))
- if (strncmp(authmethod, *p, (authmethod - strchr(authmethod, ' '))) == 0)
+ if (strncmp(authmethod, *p, (strchr(authmethod, ' ') - authmethod)) == 0)
return;
}
--
2.37.1.windows.1
On Tue, Apr 30, 2024 at 10:41 AM Jingxian Li <aqktjcm@qq.com> wrote:
Attached is a patch that fixes bug when calling strncmp function, in
which case the third argument (authmethod - strchr(authmethod, ' '))
may be negative, which is not as expected..
Nice catch. I think you're right from a quick glance.
Thanks
Richard
On 30 Apr 2024, at 04:41, Jingxian Li <aqktjcm@qq.com> wrote:
Attached is a patch that fixes bug when calling strncmp function, in
which case the third argument (authmethod - strchr(authmethod, ' '))
may be negative, which is not as expected..
The calculation is indeed incorrect, but the lack of complaints of it being
broken made me wonder why this exist in the first place. This dates back to
e7029b212755, just shy of 2 decades old, which added --auth with support for
strings with auth-options to ident and pam like --auth 'pam <servicename>' and
'ident sameuser'. Support for options to ident was removed in 01c1a12a5bb4 but
options to pam is still supported (although not documented), but was AFAICT
broken in commit 8a02339e9ba3 some 12 years ago with this strncmp().
- if (strncmp(authmethod, *p, (authmethod - strchr(authmethod, ' '))) == 0)
+ if (strncmp(authmethod, *p, (strchr(authmethod, ' ') - authmethod)) == 0)
This with compare "pam postgresql" with "pam" and not "pam " so the length
should be "(strchr(authmethod, ' ') - authmethod + 1)" since "pam " is a method
separate from "pam" in auth_methods_{host|local}. We don't want to allow "md5
" as that's not a method in the array of valid methods.
But, since it's been broken in all supported versions of postgres and has
AFAICT never been documented to exist, should we fix it or just remove it? We
don't support auth-options for any other methods, like clientcert to cert for
example. If we fix it we should also document that it works IMHO.
--
Daniel Gustafsson
On 29.04.24 07:11, Tom Lane wrote:
Up to now, we've only worried about whether tests running in parallel
within a single test suite can interact. It's quite scary to think
that the meson setup has expanded the possibility of interactions
to our entire source tree. Maybe that was a bad idea and we should
fix the meson infrastructure to not do that. I fear that otherwise,
we'll get bit regularly by very-low-probability bugs of this kind.
I don't think there is anything fundamentally different in the
parallelism setups of the make-based and the meson-based tests. There
are just different implementation details that might affect the likely
orderings and groupings.
Hi Daniel,
Thank you for explaining the ins and outs of this problem.
On 2024/4/30 17:14, Daniel Gustafsson wrote:
On 30 Apr 2024, at 04:41, Jingxian Li <aqktjcm@qq.com> wrote:
Attached is a patch that fixes bug when calling strncmp function, in
which case the third argument (authmethod - strchr(authmethod, ' '))
may be negative, which is not as expected..The calculation is indeed incorrect, but the lack of complaints of it being
broken made me wonder why this exist in the first place. This dates back to
e7029b212755, just shy of 2 decades old, which added --auth with support for
strings with auth-options to ident and pam like --auth 'pam <servicename>' and
'ident sameuser'. Support for options to ident was removed in 01c1a12a5bb4 but
options to pam is still supported (although not documented), but was AFAICT
broken in commit 8a02339e9ba3 some 12 years ago with this strncmp().- if (strncmp(authmethod, *p, (authmethod - strchr(authmethod, ' '))) == 0) + if (strncmp(authmethod, *p, (strchr(authmethod, ' ') - authmethod)) == 0)This with compare "pam postgresql" with "pam" and not "pam " so the length
should be "(strchr(authmethod, ' ') - authmethod + 1)" since "pam " is a method
separate from "pam" in auth_methods_{host|local}. We don't want to allow "md5
" as that's not a method in the array of valid methods.But, since it's been broken in all supported versions of postgres and has
AFAICT never been documented to exist, should we fix it or just remove it? We
don't support auth-options for any other methods, like clientcert to cert for
example. If we fix it we should also document that it works IMHO.
You mentioned that auth-options are not supported for auth methods except pam,
but I found that some methods (such as ldap and radius etc.) also requires aut-options,
and there are no corresponding auth methods ending with space (such as "ldap " and
radius ") present in auth_methods_host and auth_methods_local arrays.
--
Jingxian Li
On 7 May 2024, at 06:46, Jingxian Li <aqktjcm@qq.com> wrote:
But, since it's been broken in all supported versions of postgres and has
AFAICT never been documented to exist, should we fix it or just remove it? We
don't support auth-options for any other methods, like clientcert to cert for
example. If we fix it we should also document that it works IMHO.You mentioned that auth-options are not supported for auth methods except pam,
but I found that some methods (such as ldap and radius etc.) also requires aut-options,
and there are no corresponding auth methods ending with space (such as "ldap " and
radius ") present in auth_methods_host and auth_methods_local arrays.
Correct, only pam and ident were ever supported (yet not documented) and ident
was removed a long time ago.
Searching the archives I was unable to find any complaints, and this has been
broken for the entire window of supported releases, so I propose we remove it
as per the attached patch. If anyone is keen on making this work again for all
the types where it makes sense, it can be resurrected (probably with a better
implementation).
Any objections to fixing this in 17 by removing it? (cc:ing Michael from the RMT)
--
Daniel Gustafsson
Attachments:
v2-0001-Remove-auth-options-support-from-initdb.patchapplication/octet-stream; name=v2-0001-Remove-auth-options-support-from-initdb.patch; x-unix-mode=0644Download
From 17cc0811b7eebd93f8e063a5b025180c95badf56 Mon Sep 17 00:00:00 2001
From: Daniel Gustafsson <dgustafsson@postgresql.org>
Date: Mon, 13 May 2024 09:55:48 +0200
Subject: [PATCH v2] Remove auth-options support from initdb
When --auth was added to initdb in commit e7029b212755 it had support
for auth options separated by space from the auth type, like:
--auth pam <servicename>
--auth ident sameuser
Passing an option to the ident auth type was removed in 01c1a12a5bb4
which left the pam auth-options support in place. 8a02339e9ba3 broke
this by inverting a calculation in the strncmp arguments, which went
unnoticed for a long time. The ability to pass options to the auth
type was never documented.
Rather than fixing the support for an undocumented feature which has
been broken for all supported versions, and which only supports one
out of many auth types which can take options, it is removed.
Reported-by: Jingxian Li <aqktjcm@qq.com>
Discussion: https://postgr.es/m/tencent_29731C7C7E6A2F9FB807C3A1DC3D81293C06@qq.com
---
src/bin/initdb/initdb.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 30e17bd1d1..5e89b3c8e8 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -102,7 +102,7 @@ static const char *const auth_methods_host[] = {
"sspi",
#endif
#ifdef USE_PAM
- "pam", "pam ",
+ "pam",
#endif
#ifdef USE_BSD_AUTH
"bsd",
@@ -118,7 +118,7 @@ static const char *const auth_methods_host[] = {
static const char *const auth_methods_local[] = {
"trust", "reject", "scram-sha-256", "md5", "password", "peer", "radius",
#ifdef USE_PAM
- "pam", "pam ",
+ "pam",
#endif
#ifdef USE_BSD_AUTH
"bsd",
@@ -2512,10 +2512,6 @@ check_authmethod_valid(const char *authmethod, const char *const *valid_methods,
{
if (strcmp(authmethod, *p) == 0)
return;
- /* with space = param */
- if (strchr(authmethod, ' '))
- if (strncmp(authmethod, *p, (authmethod - strchr(authmethod, ' '))) == 0)
- return;
}
pg_fatal("invalid authentication method \"%s\" for \"%s\" connections",
--
2.39.3 (Apple Git-146)
Hi,
Searching the archives I was unable to find any complaints, and this has been
broken for the entire window of supported releases, so I propose we remove it
as per the attached patch. If anyone is keen on making this work again for all
the types where it makes sense, it can be resurrected (probably with a better
implementation).Any objections to fixing this in 17 by removing it? (cc:ing Michael from the RMT)
+1 Something that is not documented or used by anyone (apparently) and
is broken should just be removed.
--
Best regards,
Aleksander Alekseev
On Mon, May 13, 2024 at 01:01:21PM +0300, Aleksander Alekseev wrote:
Any objections to fixing this in 17 by removing it? (cc:ing Michael from the RMT)
+1 Something that is not documented or used by anyone (apparently) and
is broken should just be removed.
8a02339e9ba3 sounds like an argument good enough to prove there is no
demand in the field for being able to support options through initdb
--auth, and this does not concern only pam. If somebody is interested
in that, that could always be done later. My take is that this would
be simpler if implemented through a separate option, leaving the
checks between the options and the auth method up to the postmaster
when loading pg_hba.conf at startup.
Hence, no objections to clean up that now. Thanks for asking.
--
Michael