installcheck-world concurrency issues
Hi,
while working on installcheck support with meson, that currently running
installcheck-world fails regularly with meson and occasionally with make.
A way to quite reliably reproduce this with make is
make -s -j48 -C contrib/ USE_MODULE_DB=1 installcheck-adminpack-recurse installcheck-passwordcheck-recurse
that will fail with diffs like:
diff -du10 /home/andres/src/postgresql/contrib/passwordcheck/expected/passwordcheck.out /home/andres/build/postgres/dev-assert/vpath/contrib/passwordcheck/res>
--- /home/andres/src/postgresql/contrib/passwordcheck/expected/passwordcheck.out 2022-10-03 15:56:57.900326662 -0700
+++ /home/andres/build/postgres/dev-assert/vpath/contrib/passwordcheck/results/passwordcheck.out 2022-10-03 15:56:59.930329973 -0700
@@ -1,19 +1,22 @@
LOAD 'passwordcheck';
CREATE USER regress_user1;
-- ok
ALTER USER regress_user1 PASSWORD 'a_nice_long_password';
+ERROR: tuple concurrently deleted
-- error: too short
ALTER USER regress_user1 PASSWORD 'tooshrt';
-ERROR: password is too short
+ERROR: role "regress_user1" does not exist
-- error: contains user name
ALTER USER regress_user1 PASSWORD 'xyzregress_user1';
-ERROR: password must not contain user name
+ERROR: role "regress_user1" does not exist
-- error: contains only letters
LOAD 'passwordcheck';
CREATE USER regress_user1;
-- ok
ALTER USER regress_user1 PASSWORD 'a_nice_long_password';
+ERROR: tuple concurrently deleted
-- error: too short
ALTER USER regress_user1 PASSWORD 'tooshrt';
-ERROR: password is too short
+ERROR: role "regress_user1" does not exist
-- error: contains user name
That's not surprising, given the common name of "regress_user1".
The attached patch fixes a number of instances of this issue. With it I got
through ~5 iterations of installcheck-world on ac, and >30 iterations with
meson.
There's a few further roles that seem to pose some danger goign forward:
./contrib/file_fdw/sql/file_fdw.sql:CREATE ROLE regress_no_priv_user LOGIN; -- has priv but no user mapping
./contrib/postgres_fdw/sql/postgres_fdw.sql:CREATE ROLE regress_view_owner SUPERUSER;
./contrib/postgres_fdw/sql/postgres_fdw.sql:CREATE ROLE regress_nosuper NOSUPERUSER;
./contrib/passwordcheck/sql/passwordcheck.sql:CREATE USER regress_passwordcheck_user1;
./contrib/citext/sql/create_index_acl.sql:CREATE ROLE regress_minimal;
./src/test/modules/test_rls_hooks/sql/test_rls_hooks.sql:CREATE ROLE regress_r1;
./src/test/modules/test_rls_hooks/sql/test_rls_hooks.sql:CREATE ROLE regress_s1;
./src/test/modules/test_oat_hooks/sql/test_oat_hooks.sql:CREATE ROLE regress_role_joe;
./src/test/modules/test_oat_hooks/sql/test_oat_hooks.sql:CREATE USER regress_test_user;
./src/test/modules/unsafe_tests/sql/rolenames.sql:CREATE ROLE regress_testrol0 SUPERUSER LOGIN;
./src/test/modules/unsafe_tests/sql/rolenames.sql:CREATE ROLE regress_testrolx SUPERUSER LOGIN;
./src/test/modules/unsafe_tests/sql/rolenames.sql:CREATE ROLE regress_testrol2 SUPERUSER;
./src/test/modules/unsafe_tests/sql/rolenames.sql:CREATE ROLE regress_testrol1 SUPERUSER LOGIN IN ROLE regress_testrol2;
./src/test/modules/unsafe_tests/sql/rolenames.sql:CREATE ROLE regress_role_haspriv;
./src/test/modules/unsafe_tests/sql/rolenames.sql:CREATE ROLE regress_role_nopriv;
./src/test/modules/unsafe_tests/sql/guc_privs.sql:CREATE ROLE regress_admin SUPERUSER;
./src/test/modules/test_ddl_deparse/sql/alter_function.sql:CREATE ROLE regress_alter_function_role;
BTW, shouldn't src/test/modules/unsafe_tests use the PG_TEST_EXTRA mechanism
somehow? Seems not great to run it as part of installcheck-world, if we don't
want to run it as part of installcheck.
A second issue I noticed is that advisory_lock.sql often fails, because the
pg_locks queries don't restrict to the current database. Patch attached.
I haven't seen that with autoconf installcheck-world, presumably because of
this:
# There are too many interdependencies between the subdirectories, so
# don't attempt parallel make here.
.NOTPARALLEL:
With those two patches applied, I got through 10 iterations of running all
regress / isolation tests concurrently with meson without failures.
I attached the meson patch as well, but just because I used it to to get to
these patches.
Greetings,
Andres Freund
Attachments:
On Mon, Oct 03, 2022 at 04:41:11PM -0700, Andres Freund wrote:
There's a few further roles that seem to pose some danger goign forward:
I have never seen that myself, but 0001 is a nice cleanup.
generated.sql includes a user named "regress_user11". Perhaps that's
worth renaming while on it?
BTW, shouldn't src/test/modules/unsafe_tests use the PG_TEST_EXTRA mechanism
somehow? Seems not great to run it as part of installcheck-world, if we don't
want to run it as part of installcheck.c
Indeed.
A second issue I noticed is that advisory_lock.sql often fails, because the
pg_locks queries don't restrict to the current database. Patch attached.
As in prepared_xacts.sql or just advisory locks taken in an installed
cluster? Or both?
I attached the meson patch as well, but just because I used it to to get to
these patches.
I am still studying a lot of this area, but it seems like all the
spots requiring a custom configuration (aka NO_INSTALLCHECK) are
covered. --setup running is working here with 0003.
--
Michael
Hi,
On 2022-10-04 17:05:40 +0900, Michael Paquier wrote:
On Mon, Oct 03, 2022 at 04:41:11PM -0700, Andres Freund wrote:
There's a few further roles that seem to pose some danger goign forward:
I have never seen that myself, but 0001 is a nice cleanup.
generated.sql includes a user named "regress_user11". Perhaps that's
worth renaming while on it?
I think regress_* without a "namespace" is what's src/test/regress uses, so
there's not really a need?
A second issue I noticed is that advisory_lock.sql often fails, because the
pg_locks queries don't restrict to the current database. Patch attached.As in prepared_xacts.sql or just advisory locks taken in an installed
cluster? Or both?
There's various isolation tests, including several in src/test/isolation, that
use advisory locks.
prepared_xacts.sql shouldn't be an issue, because it's scheduled in a separate
group.
I attached the meson patch as well, but just because I used it to to get to
these patches.I am still studying a lot of this area, but it seems like all the
spots requiring a custom configuration (aka NO_INSTALLCHECK) are
covered. --setup running is working here with 0003.
Thanks for checking.
Greetings,
Andres Freund
On 04.10.22 01:41, Andres Freund wrote:
BTW, shouldn't src/test/modules/unsafe_tests use the PG_TEST_EXTRA mechanism
somehow? Seems not great to run it as part of installcheck-world, if we don't
want to run it as part of installcheck.
I think there are different levels and kinds of unsafeness. The ssl and
kerberos tests start open server processes on your machine. The
modules/unsafe_tests just make a mess of your postgres instance. The
latter isn't a problem when run against a temp instance.
Hi,
On 2022-10-05 08:16:37 +0200, Peter Eisentraut wrote:
On 04.10.22 01:41, Andres Freund wrote:
BTW, shouldn't src/test/modules/unsafe_tests use the PG_TEST_EXTRA mechanism
somehow? Seems not great to run it as part of installcheck-world, if we don't
want to run it as part of installcheck.I think there are different levels and kinds of unsafeness. The ssl and
kerberos tests start open server processes on your machine. The
modules/unsafe_tests just make a mess of your postgres instance. The latter
isn't a problem when run against a temp instance.
I agree - but I suspect our definition of danger is reversed. For me breaking
an existing cluster is a lot more likely to incur "real world" danger than
starting a throway instance listening to tcp on localhost...
Greetings,
Andres Freund
On Tue, Oct 04, 2022 at 11:35:53AM -0700, Andres Freund wrote:
On 2022-10-04 17:05:40 +0900, Michael Paquier wrote:
I am still studying a lot of this area, but it seems like all the
spots requiring a custom configuration (aka NO_INSTALLCHECK) are
covered. --setup running is working here with 0003.Thanks for checking.
For the archives' sake: this has been applied as of 6a20b04 and
c3315a7.
--
Michael
On Thu, Oct 06, 2022 at 01:52:46PM +0900, Michael Paquier wrote:
For the archives' sake: this has been applied as of 6a20b04 and
c3315a7.
Corey (added in CC.), has noticed that the issue fixed by c3315a7 in
16~ for advisory locks is not complicated to reach, leading to
failures in some of our automated internal stuff. A cherry-pick of
c3315a7 works cleanly across 12~15. Would there be any objections if
I were to backpatch this part down to 12?
The problems fixed by 6a20b04 have not really been an issue here,
hence I'd rather let things be as they are for the conflicting role
names.
--
Michael
On Tue, Sep 24, 2024 at 11:24:46AM +0900, Michael Paquier wrote:
Corey (added in CC.), has noticed that the issue fixed by c3315a7 in
16~ for advisory locks is not complicated to reach, leading to
failures in some of our automated internal stuff. A cherry-pick of
c3315a7 works cleanly across 12~15. Would there be any objections if
I were to backpatch this part down to 12?The problems fixed by 6a20b04 have not really been an issue here,
hence I'd rather let things be as they are for the conflicting role
names.
Hearing nothing, done.
--
Michael