ALTER DEFAULT PRIVILEGES is buggy, and so is its testing

Started by Tom Laneover 7 years ago2 messageshackers
Jump to latest
#1Tom Lane
tgl@sss.pgh.pa.us

Buildfarm members crake and xenodermus recently fell over with
very similar symptoms in the pg_upgrade test:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2018-11-07%2021%3A47%3A29

pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 6350; 0 0 ACL SCHEMA "regress_rls_schema" andrew
pg_restore: [archiver (db)] could not execute query: ERROR: role "33757" does not exist
Command was: GRANT ALL ON SCHEMA "regress_rls_schema" TO "33757";

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=xenodermus&dt=2018-11-08%2023%3A00%3A01

pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 5357; 0 0 ACL SCHEMA "mvtest_mvschema" bf
pg_restore: [archiver (db)] could not execute query: ERROR: role "33894" does not exist
Command was: GRANT ALL ON SCHEMA "mvtest_mvschema" TO "33894";

The bogus numeric user "names" are presumably there because aclitemout
will just fall back to printing a referenced role's OID if it fails to
look up the role's name; hence, what we're looking at here is grants
made to since-dropped roles, which somehow didn't get cleaned up.

Now, looking at the regression test scripts that create these particular
schemas, that's rather mind-boggling: they never do any grant or revoke
at all on those schemas, so how'd there come to be any ACL entries?

I think that the answer is that these particular scripts run in parallel
with the "privileges" script, in which we find this:

ALTER DEFAULT PRIVILEGES GRANT ALL ON SCHEMAS TO regress_priv_user2;

What seems to be happening is

1. Given the right concurrent timing, these schemas absorb a GRANT
to regress_priv_user2 during the window in which the above is active.

2. The GRANT apparently doesn't get entered into pg_shdepend, because
when privileges.sql drops the role, there is no complaint. (It's easy
to reproduce this bug without any parallelism.)

3. The regression tests proper don't notice the dangling ACL, but
dump/restore sure does.

The reason we are just seeing this now, no doubt, is Andres having
changed the pg_upgrade test to run the core regression tests in
parallel. I'm sure it's happened a whole lot before without anyone
noticing.

So it seems like testing ALTER DEFAULT PRIVILEGES in a script that
runs in parallel with anything else is a damfool idea. But there
is also a live bug of failure to correctly record privileges granted
this way. Without that bug, we'd have had our noses rubbed in the
parallel race conditions long ago.

regards, tom lane

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#1)
Re: ALTER DEFAULT PRIVILEGES is buggy, and so is its testing

I wrote:

So it seems like testing ALTER DEFAULT PRIVILEGES in a script that
runs in parallel with anything else is a damfool idea. But there
is also a live bug of failure to correctly record privileges granted
this way. Without that bug, we'd have had our noses rubbed in the
parallel race conditions long ago.

It turns out the bug only applies to schemas and types; the other
callers of get_user_default_acl() already had ad-hoc code to deal
with the problem. I made that a bit less ad-hoc by creating a
subroutine to do the work and documenting the need to call it.

To fix the tests' race condition, it seems to be sufficient to
wrap a transaction around the section of privileges.sql that
grants and then revokes global default privileges. The other
uses of ALTER DEFAULT PRIVILEGES in the tests are targeted narrowly
enough to ensure that concurrent tests shouldn't pick them up.

In the attached proposed patch for HEAD, since I had to adjust
the API of GenerateTypeDependencies anyway, I changed it to pass
the new pg_type row and get most of the old arguments out of that,
as the argument list was getting pretty unwieldy. I suppose we
could make a back-branch patch that doesn't change that API and
instead makes the callers responsible for recording ACL dependencies,
but ugh. Does anyone think it's likely that third-party code is
calling GenerateTypeDependencies?

regards, tom lane

Attachments:

fix-default-privileges-dependencies-1.patchtext/x-diff; charset=us-ascii; name=fix-default-privileges-dependencies-1.patchDownload+210-182