Invalid dump file after drop of role that previously created extension containing a table.

Started by Aleš Zelenýover 2 years ago4 messagesbugs
Jump to latest
#1Aleš Zelený
zeleny.ales@gmail.com

Hello,

The bug is reproducible on PG16, but we have detected it during
pg_upgrade 14 -> 15.
Testcase is provided (Pg 14, 15, 16, Ubuntu 22.04).

The upgrade failure and the test case description will follow:

Check run was OK:
=================
-bash-4.2$ /usr/pgsql-15/bin/pg_upgrade -d /pgsql/cluster/14/data/ -D
/pgsql/cluster/15/data/ -b /usr/pgsql-14/bin -B /usr/pgsql-15/bin
--link --check
Performing Consistency Checks
-----------------------------
Checking cluster versions ok
Checking database user is the install user ok
Checking database connection settings ok
Checking for prepared transactions ok
Checking for system-defined composite types in user tables ok
Checking for reg* data types in user tables ok
Checking for contrib/isn with bigint-passing mismatch ok
Checking for presence of required libraries ok
Checking database user is the install user ok
Checking for prepared transactions ok
Checking for new cluster tablespace directories ok

*Clusters are compatible*

The upgrade itself fails (--retain was used to allow the next analysis steps):
==============================================================================
-bash-4.2$ /usr/pgsql-15/bin/pg_upgrade -d /pgsql/cluster/14/data/ -D
/pgsql/cluster/15/data/ -b /usr/pgsql-14/bin -B /usr/pgsql-15/bin
--link --retain
Performing Consistency Checks
-----------------------------
Checking cluster versions ok
Checking database user is the install user ok
Checking database connection settings ok
Checking for prepared transactions ok
Checking for system-defined composite types in user tables ok
Checking for reg* data types in user tables ok
Checking for contrib/isn with bigint-passing mismatch ok
Creating dump of global objects ok
Creating dump of database schemas
ok
Checking for presence of required libraries ok
Checking database user is the install user ok
Checking for prepared transactions ok
Checking for new cluster tablespace directories ok

If pg_upgrade fails after this point, you must re-initdb the
new cluster before continuing.

Performing Upgrade
------------------
Analyzing all rows in the new cluster ok
Freezing all rows in the new cluster ok
Deleting files from new pg_xact ok
Copying old pg_xact to new server ok
Setting oldest XID for new cluster ok
Setting next transaction ID and epoch for new cluster ok
Deleting files from new pg_multixact/offsets ok
Copying old pg_multixact/offsets to new server ok
Deleting files from new pg_multixact/members ok
Copying old pg_multixact/members to new server ok
Setting next multixact ID and offset for new cluster ok
Resetting WAL archives ok
Setting frozenxid and minmxid counters in new cluster ok
Restoring global objects in the new cluster ok
Restoring database schemas in the new cluster
betsys
*failure*

Consult the last few lines of
"/pgsql/cluster/15/data/pg_upgrade_output.d/20230921T191028.441/log/pg_upgrade_dump_16451.log"
for
the probable cause of the failure.
Failure, exiting

The error message in the log:
=============================
pg_restore: creating ACL "cron.SEQUENCE "jobid_seq""
pg_restore: creating ACL "cron.TABLE "job""
pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 6678; 0 0 ACL TABLE "job" sazky
pg_restore: error: could not execute query: ERROR: role "16441" does not exist
Command was: SELECT pg_catalog.binary_upgrade_set_record_init_privs(true);

Get schema-only SQL script from the dump file whose import has failed
during pg_upgrade:
========================================================================================
/usr/pgsql-15/bin/pg_restore -s -f
/pgsql/cluster/15/data/pg_upgrade_output.d/20230921T191028.441/dump/pg_upgrade_dump_16451.schema_only.sql
/pgsql/cluster/15/data/pg_upgrade_output.d/20230921T191028.441/dump/pg_upgrade_dump_16451.custom

Check for the "16441" string in the SQL file produced from the dump
created by pg_upgrade:
==========================================================================================
-bash-4.2$ grep "16441"
/pgsql/cluster/15/data/pg_upgrade_output.d/20230921T191028.441/dump/pg_upgrade_dump_16451.schema_only.sql
GRANT ALL ON TABLE "cron"."job" TO "16441";
REVOKE ALL ON TABLE "cron"."job" FROM "16441";

Observation:
============
The pg_dump beyond other tables reads data from the
pg_catalog.pg_init_privs table to construct the above-listed commands
to preserve ACLs from the database being upgraded (actually, these are
part of ordinary pg_dump anyway).

In our case, I've realized that 16441 is an OID value for a previously
dropped login role (database user).

Testcase description:
====================
1) An extension (I've used pg_cron as an example because it contains a
table) is created by a database user (login role), and the initial
privileges at extension creation are stored for the extension object
(table in my test case) in the pg_catalog.pg_init_privs table.
2) Change the database user objects ownership from step 1 to another
database user -> this step keeps the pg_catalog.pg_init_privs table
content for the extension table from step 1 untouched.
3) Drop the database user used in step 1 and as its entry is deleted
from the catalog, all that remains is the OID of the deleted database
user in the pg_catalog.pg_init_privs table, later used by pg_dump.

I was able to reproduce the issue on several PostgreSQL versions:
=================================================================

PostgreSQL 16.0 (Ubuntu 16.0-1.pgdg22.04+1) on x86_64-pc-linux-gnu,
compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
PostgreSQL 15.4 (Ubuntu 15.4-2.pgdg22.04+1) on x86_64-pc-linux-gnu,
compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
PostgreSQL 14.9 (Ubuntu 14.9-1.pgdg22.04+1) on x86_64-pc-linux-gnu,
compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit

Result from the test case:
==========================
--------------------------------------------------------------------------------
Print DUMP differences before and after drop of role user1
--------------------------------------------------------------------------------
80c80
< REVOKE ALL ON FUNCTION cron.alter_job(job_id bigint, schedule text,
command text, database text, username text, active boolean) FROM
user1;
---

REVOKE ALL ON FUNCTION cron.alter_job(job_id bigint, schedule text, command text, database text, username text, active boolean) FROM "16384";

88c88
< REVOKE ALL ON FUNCTION cron.schedule_in_database(job_name text,
schedule text, command text, database text, username text, active
boolean) FROM user1;
---

REVOKE ALL ON FUNCTION cron.schedule_in_database(job_name text, schedule text, command text, database text, username text, active boolean) FROM "16384";

96c96
< REVOKE ALL ON SEQUENCE cron.jobid_seq FROM user1;
---

REVOKE ALL ON SEQUENCE cron.jobid_seq FROM "16384";

106c106
< REVOKE ALL ON TABLE cron.job FROM user1;
---

REVOKE ALL ON TABLE cron.job FROM "16384";

116c116
< REVOKE ALL ON SEQUENCE cron.runid_seq FROM user1;
---

REVOKE ALL ON SEQUENCE cron.runid_seq FROM "16384";

126c126
< REVOKE ALL ON TABLE cron.job_run_details FROM user1;
---

REVOKE ALL ON TABLE cron.job_run_details FROM "16384";

The test case script and log files are attached.

Kind regards Ales Zeleny

Attachments:

testcase_files.tar.gzapplication/gzip; name=testcase_files.tar.gzDownload
#2Stephen Frost
sfrost@snowman.net
In reply to: Aleš Zelený (#1)
Re: Invalid dump file after drop of role that previously created extension containing a table.

Greetings,

* Aleš Zelený (zeleny.ales@gmail.com) wrote:

Testcase description:
====================
1) An extension (I've used pg_cron as an example because it contains a
table) is created by a database user (login role), and the initial
privileges at extension creation are stored for the extension object
(table in my test case) in the pg_catalog.pg_init_privs table.
2) Change the database user objects ownership from step 1 to another
database user -> this step keeps the pg_catalog.pg_init_privs table
content for the extension table from step 1 untouched.
3) Drop the database user used in step 1 and as its entry is deleted
from the catalog, all that remains is the OID of the deleted database
user in the pg_catalog.pg_init_privs table, later used by pg_dump.

Hrmpf. Yeah, seems like if we're going to allow extensions and
extension objects to be impacted by REASSIGN OWNED and such then we need
to be sure to update pg_init_privs accordingly. At least that's my
first thought seeing this. Would welcome thoughts from others on this
though.

Thanks,

Stephen

#3Aleš Zelený
zeleny.ales@gmail.com
In reply to: Stephen Frost (#2)
Re: Invalid dump file after drop of role that previously created extension containing a table.

Hello,

my personal understanding of the pg_init_privs usage is to achieve some
level of idempotency (some level -> it perfectly works at least if the role
that created the extension is the same one that is importing the dump -
other combinations are producing more or less different results).

So the pg_dump attempts to combine the past time state from pg_init_privs
with the present state at the dump creation time.
Revoking all privileges from extension tables (and other objects) might
solve it partially if all privileges for the extension objects are dumped
and therefore applied on restore. Sadly, except (at least the following
one) an issue - the dump file does not contain the extension version so if
a newer extension version is created during restoration from a dump file
there might be additional privileges defined by the new extension version
and thus it is a bad idea to "revoke all" since they will be lost.

While it might take some time to get a sustainable solution, a non-ideal
workaround I could imagine could be that the dump will check whether the
role(s) stored in the pg_init_privs still exists at the time of dump
creation and in case they are no longer the revoke statement will not be
stored in the dump.

Kind regards Ales Zeleny

čt 21. 9. 2023 v 21:30 odesílatel Stephen Frost <sfrost@snowman.net> napsal:

Show quoted text

Greetings,

* Aleš Zelený (zeleny.ales@gmail.com) wrote:

Testcase description:
====================
1) An extension (I've used pg_cron as an example because it contains a
table) is created by a database user (login role), and the initial
privileges at extension creation are stored for the extension object
(table in my test case) in the pg_catalog.pg_init_privs table.
2) Change the database user objects ownership from step 1 to another
database user -> this step keeps the pg_catalog.pg_init_privs table
content for the extension table from step 1 untouched.
3) Drop the database user used in step 1 and as its entry is deleted
from the catalog, all that remains is the OID of the deleted database
user in the pg_catalog.pg_init_privs table, later used by pg_dump.

Hrmpf. Yeah, seems like if we're going to allow extensions and
extension objects to be impacted by REASSIGN OWNED and such then we need
to be sure to update pg_init_privs accordingly. At least that's my
first thought seeing this. Would welcome thoughts from others on this
though.

Thanks,

Stephen

#4Aleš Zelený
zeleny.ales@gmail.com
In reply to: Aleš Zelený (#3)
Re: Invalid dump file after drop of role that previously created extension containing a table.

Hello,

during other upgrades, I've faced the same issue (with different
extensions) on 3 out of 9 databases I've been upgrading (of course, it
happened in the production environment).

So far, I've decided to add a check query to the database upgrade workflow:

select initprivs::text::aclitem[] from pg_init_privs;

It fails in case a dropped role OID is persisted in the
pg_catalog.pg_init_privs on databases being upgraded, would it make sense
to extend pg_upgrade check mode by a query like this (probably better
one...)?

Kind regards Ales Zeleny

pá 22. 9. 2023 v 14:56 odesílatel Aleš Zelený <zeleny.ales@gmail.com>
napsal:

Show quoted text

Hello,

my personal understanding of the pg_init_privs usage is to achieve some
level of idempotency (some level -> it perfectly works at least if the role
that created the extension is the same one that is importing the dump -
other combinations are producing more or less different results).

So the pg_dump attempts to combine the past time state from pg_init_privs
with the present state at the dump creation time.
Revoking all privileges from extension tables (and other objects) might
solve it partially if all privileges for the extension objects are dumped
and therefore applied on restore. Sadly, except (at least the following
one) an issue - the dump file does not contain the extension version so if
a newer extension version is created during restoration from a dump file
there might be additional privileges defined by the new extension version
and thus it is a bad idea to "revoke all" since they will be lost.

While it might take some time to get a sustainable solution, a non-ideal
workaround I could imagine could be that the dump will check whether the
role(s) stored in the pg_init_privs still exists at the time of dump
creation and in case they are no longer the revoke statement will not be
stored in the dump.

Kind regards Ales Zeleny

čt 21. 9. 2023 v 21:30 odesílatel Stephen Frost <sfrost@snowman.net>
napsal:

Greetings,

* Aleš Zelený (zeleny.ales@gmail.com) wrote:

Testcase description:
====================
1) An extension (I've used pg_cron as an example because it contains a
table) is created by a database user (login role), and the initial
privileges at extension creation are stored for the extension object
(table in my test case) in the pg_catalog.pg_init_privs table.
2) Change the database user objects ownership from step 1 to another
database user -> this step keeps the pg_catalog.pg_init_privs table
content for the extension table from step 1 untouched.
3) Drop the database user used in step 1 and as its entry is deleted
from the catalog, all that remains is the OID of the deleted database
user in the pg_catalog.pg_init_privs table, later used by pg_dump.

Hrmpf. Yeah, seems like if we're going to allow extensions and
extension objects to be impacted by REASSIGN OWNED and such then we need
to be sure to update pg_init_privs accordingly. At least that's my
first thought seeing this. Would welcome thoughts from others on this
though.

Thanks,

Stephen