pg_upgrade allows itself to be run twice

Started by Justin Pryzbyalmost 4 years ago9 messageshackers
Jump to latest
#1Justin Pryzby
pryzby@telsasoft.com

Now that I've gotten your attention..

I expect pg_upgrade to fail if I run it twice in a row.

It would be reasonable if it were to fail during the "--check" phase,
maybe by failing like this:
| New cluster database "..." is not empty: found relation "..."

But that fails to happen if the cluster has neither tables nor matviews, in
which case, it passes the check phase and then fails like this:

Performing Upgrade
------------------
Analyzing all rows in the new cluster ok
Freezing all rows in the new cluster ok
Deleting files from new pg_xact ok
Copying old pg_clog to new server ok
Setting oldest XID for new cluster ok
Setting next transaction ID and epoch for new cluster ok
Deleting files from new pg_multixact/offsets ok
Copying old pg_multixact/offsets to new server ok
Deleting files from new pg_multixact/members ok
Copying old pg_multixact/members to new server ok
Setting next multixact ID and offset for new cluster ok
Resetting WAL archives ok
Setting frozenxid and minmxid counters in new cluster connection to server on socket "/home/pryzbyj/src/postgres/.s.PGSQL.50432" failed: FATAL: could not open relation with OID 2610
Failure, exiting

I'll concede that a cluster which has no tables sounds a lot like a toy, but I
find it disturbing that nothing prevents running the process twice, up to the
point that it's evidently corrupted the catalog.

While looking at this, I noticed that starting postgres --single immediately
after initdb allows creating objects with OIDs below FirstNormalObjectId
(thereby defeating pg_upgrade's check). I'm not familiar with the behavioral
differences of single user mode, and couldn't find anything in the
documentation.

#2Michael Paquier
michael@paquier.xyz
In reply to: Justin Pryzby (#1)
Re: pg_upgrade allows itself to be run twice

On Sat, Jun 25, 2022 at 11:04:37AM -0500, Justin Pryzby wrote:

I expect pg_upgrade to fail if I run it twice in a row.

Yep.

It would be reasonable if it were to fail during the "--check" phase,
maybe by failing like this:
| New cluster database "..." is not empty: found relation "..."

So, we get a complaint that the new cluster is not empty after one
pg_upgrade run with a new command of pg_upgrade, with or without
--check. This happens in check_new_cluster(), where we'd fatal if a
relation uses a namespace different than pg_catalog.

But that fails to happen if the cluster has neither tables nor matviews, in
which case, it passes the check phase and then fails like this:

Indeed, as of get_rel_infos().

I'll concede that a cluster which has no tables sounds a lot like a toy, but I
find it disturbing that nothing prevents running the process twice, up to the
point that it's evidently corrupted the catalog.

I have heard of cases where instances were only used with a set of
foreign tables, for example. Not sure that this is spread enough to
worry about, but this would fail as much as your case.

While looking at this, I noticed that starting postgres --single immediately
after initdb allows creating objects with OIDs below FirstNormalObjectId
(thereby defeating pg_upgrade's check). I'm not familiar with the behavioral
differences of single user mode, and couldn't find anything in the
documentation.

This one comes from NextOID in the control data file after a fresh
initdb, and GetNewObjectId() would enforce that in a postmaster
environment to be FirstNormalObjectId when assigning the first user
OID. Would you imply an extra step at the end of initdb to update the
control data file of the new cluster to reflect FirstNormalObjectId?
--
Michael

#3Justin Pryzby
pryzby@telsasoft.com
In reply to: Michael Paquier (#2)
Re: pg_upgrade allows itself to be run twice

On Wed, Jun 29, 2022 at 01:17:33PM +0900, Michael Paquier wrote:

On Sat, Jun 25, 2022 at 11:04:37AM -0500, Justin Pryzby wrote:

I'll concede that a cluster which has no tables sounds a lot like a toy, but I
find it disturbing that nothing prevents running the process twice, up to the
point that it's evidently corrupted the catalog.

I have heard of cases where instances were only used with a set of
foreign tables, for example. Not sure that this is spread enough to
worry about, but this would fail as much as your case.

foreign tables have OIDs too.

ts=# SELECT * FROM pg_class WHERE relkind ='f';
oid | 1554544611

While looking at this, I noticed that starting postgres --single immediately
after initdb allows creating objects with OIDs below FirstNormalObjectId
(thereby defeating pg_upgrade's check). I'm not familiar with the behavioral
differences of single user mode, and couldn't find anything in the
documentation.

This one comes from NextOID in the control data file after a fresh
initdb, and GetNewObjectId() would enforce that in a postmaster
environment to be FirstNormalObjectId when assigning the first user
OID. Would you imply an extra step at the end of initdb to update the
control data file of the new cluster to reflect FirstNormalObjectId?

I added a call to reset xlog, similar to what's in pg_upgrade.
Unfortunately, I don't see an easy way to silence it.

--
Justin

Attachments:

0001-wip-initdb-should-advance-the-OID-counter-to-FirstNo.patchtext/x-diff; charset=us-asciiDownload+12-32
#4Justin Pryzby
pryzby@telsasoft.com
In reply to: Justin Pryzby (#3)
Re: pg_upgrade allows itself to be run twice

rebased and updated

Robert thought that it might be reasonable for someone to initdb, and
then connect and make some modifications, and then pg_upgrade.
/messages/by-id/CA+TgmoYwaXh_wRRa2CqL4XpM4r6YEbq1+ec=+8b7Dr7-T_tT+Q@mail.gmail.com

But the DBs are dropped by pg_upgrade, so that seems to serve no
purpose, except for shared relations (and global objects?). In the case
of shared relations, it seems unsafe (even though my test didn't cause
an immediate explosion).

So rather than continuing to allow arbitrary changes between initdb and
pg_upgrade, I propose to prohibit all changes. I'd consider allowing an
"inclusive" list of allowable changes, if someone were to propose such a
thing - but since DBs are dropped, I'm not sure what it could include.

Attachments:

v2-0001-wip-initdb-should-advance-the-OID-counter-to-Firs.patchtext/x-diff; charset=us-asciiDownload+16-17
#5Peter Eisentraut
peter_e@gmx.net
In reply to: Justin Pryzby (#3)
Re: pg_upgrade allows itself to be run twice

On 07.07.22 08:22, Justin Pryzby wrote:

This one comes from NextOID in the control data file after a fresh
initdb, and GetNewObjectId() would enforce that in a postmaster
environment to be FirstNormalObjectId when assigning the first user
OID. Would you imply an extra step at the end of initdb to update the
control data file of the new cluster to reflect FirstNormalObjectId?

I added a call to reset xlog, similar to what's in pg_upgrade.
Unfortunately, I don't see an easy way to silence it.

I think it would be better to update the control file directly instead
of going through pg_resetwal. (See
src/include/common/controldata_utils.h for the required functions.)

However, I don't know whether we need to add special provisions that
guard against people using postgres --single in complicated ways. Many
consider the single-user mode deprecated outside of initdb use.

#6Justin Pryzby
pryzby@telsasoft.com
In reply to: Peter Eisentraut (#5)
Re: pg_upgrade allows itself to be run twice

On Tue, Nov 01, 2022 at 01:54:35PM +0100, Peter Eisentraut wrote:

On 07.07.22 08:22, Justin Pryzby wrote:

This one comes from NextOID in the control data file after a fresh
initdb, and GetNewObjectId() would enforce that in a postmaster
environment to be FirstNormalObjectId when assigning the first user
OID. Would you imply an extra step at the end of initdb to update the
control data file of the new cluster to reflect FirstNormalObjectId?

I added a call to reset xlog, similar to what's in pg_upgrade.
Unfortunately, I don't see an easy way to silence it.

I think it would be better to update the control file directly instead of
going through pg_resetwal. (See src/include/common/controldata_utils.h for
the required functions.)

However, I don't know whether we need to add special provisions that guard
against people using postgres --single in complicated ways. Many consider
the single-user mode deprecated outside of initdb use.

Thanks for looking.

One other thing I noticed (by accident!) is that pg_upgrade doesn't
prevent itself from trying to upgrade a cluster on top of itself:

| $ /usr/pgsql-15/bin/initdb -D pg15.dat -N
| $ /usr/pgsql-15/bin/pg_upgrade -D pg15.dat -d pg15.dat -b /usr/pgsql-15/bin
| Performing Upgrade
| ------------------
| Analyzing all rows in the new cluster ok
| Freezing all rows in the new cluster ok
| Deleting files from new pg_xact ok
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| Copying old pg_xact to new server
| *failure*
|
| Consult the last few lines of "pg15.dat/pg_upgrade_output.d/20221101T055916.486/log/pg_upgrade_utility.log" for

| command: cp -Rf "pg15.dat/pg_xact" "pg15.dat/pg_xact" >> "pg15.dat/pg_upgrade_output.d/20221101T055916.486/log/pg_upgrade_utility.log" 2>&1
| cp: cannot stat 'pg15.dat/pg_xact': No such file or directory

This may be of little concern since it's upgrading a version to itself, which
only applies to developers.

--
Justin

#7Peter Eisentraut
peter_e@gmx.net
In reply to: Justin Pryzby (#6)
Re: pg_upgrade allows itself to be run twice

On 01.11.22 14:07, Justin Pryzby wrote:

On Tue, Nov 01, 2022 at 01:54:35PM +0100, Peter Eisentraut wrote:

On 07.07.22 08:22, Justin Pryzby wrote:

This one comes from NextOID in the control data file after a fresh
initdb, and GetNewObjectId() would enforce that in a postmaster
environment to be FirstNormalObjectId when assigning the first user
OID. Would you imply an extra step at the end of initdb to update the
control data file of the new cluster to reflect FirstNormalObjectId?

I added a call to reset xlog, similar to what's in pg_upgrade.
Unfortunately, I don't see an easy way to silence it.

I think it would be better to update the control file directly instead of
going through pg_resetwal. (See src/include/common/controldata_utils.h for
the required functions.)

However, I don't know whether we need to add special provisions that guard
against people using postgres --single in complicated ways. Many consider
the single-user mode deprecated outside of initdb use.

Thanks for looking.

I think the above is a "returned with feedback" at this point.

One other thing I noticed (by accident!) is that pg_upgrade doesn't
prevent itself from trying to upgrade a cluster on top of itself:

| $ /usr/pgsql-15/bin/initdb -D pg15.dat -N
| $ /usr/pgsql-15/bin/pg_upgrade -D pg15.dat -d pg15.dat -b /usr/pgsql-15/bin
| Performing Upgrade
| ------------------
| Analyzing all rows in the new cluster ok
| Freezing all rows in the new cluster ok
| Deleting files from new pg_xact ok
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| Copying old pg_xact to new server
| *failure*
|
| Consult the last few lines of "pg15.dat/pg_upgrade_output.d/20221101T055916.486/log/pg_upgrade_utility.log" for

| command: cp -Rf "pg15.dat/pg_xact" "pg15.dat/pg_xact" >> "pg15.dat/pg_upgrade_output.d/20221101T055916.486/log/pg_upgrade_utility.log" 2>&1
| cp: cannot stat 'pg15.dat/pg_xact': No such file or directory

This may be of little concern since it's upgrading a version to itself, which
only applies to developers.

I think this would be worth addressing nonetheless, for robustness. For
comparison, "cp" and "mv" will error if you give source and destination
that are the same file.

#8Justin Pryzby
pryzby@telsasoft.com
In reply to: Peter Eisentraut (#7)
Re: pg_upgrade allows itself to be run twice

On Thu, Dec 01, 2022 at 10:30:16AM +0100, Peter Eisentraut wrote:

On 01.11.22 14:07, Justin Pryzby wrote:

On Tue, Nov 01, 2022 at 01:54:35PM +0100, Peter Eisentraut wrote:

On 07.07.22 08:22, Justin Pryzby wrote:

This one comes from NextOID in the control data file after a fresh
initdb, and GetNewObjectId() would enforce that in a postmaster
environment to be FirstNormalObjectId when assigning the first user
OID. Would you imply an extra step at the end of initdb to update the
control data file of the new cluster to reflect FirstNormalObjectId?

I added a call to reset xlog, similar to what's in pg_upgrade.
Unfortunately, I don't see an easy way to silence it.

I think it would be better to update the control file directly instead of
going through pg_resetwal. (See src/include/common/controldata_utils.h for
the required functions.)

However, I don't know whether we need to add special provisions that guard
against people using postgres --single in complicated ways. Many consider
the single-user mode deprecated outside of initdb use.

Thanks for looking.

To be clear, I don't think it's worth doing anything special just to
avoid creating special OIDs if someone runs postgres --single
immediately after initdb.

However, setting FirstNormalOid in initdb itself (rather than in the
next invocation of postgres, if it isn't in single-user-mode) was the
mechanism by which to avoid the original problem that pg_upgrade can be
run twice, if there are no non-system relations. That still seems
desirable to fix somehow.

I think the above is a "returned with feedback" at this point.

One other thing I noticed (by accident!) is that pg_upgrade doesn't
prevent itself from trying to upgrade a cluster on top of itself:

| command: cp -Rf "pg15.dat/pg_xact" "pg15.dat/pg_xact" >> "pg15.dat/pg_upgrade_output.d/20221101T055916.486/log/pg_upgrade_utility.log" 2>&1
| cp: cannot stat 'pg15.dat/pg_xact': No such file or directory

This may be of little concern since it's upgrading a version to itself, which
only applies to developers.

I think this would be worth addressing nonetheless, for robustness. For
comparison, "cp" and "mv" will error if you give source and destination that
are the same file.

I addressed this in 002.

--
Justin

Attachments:

0001-wip-initdb-should-advance-the-OID-counter-to-FirstNo.patchtext/x-diff; charset=us-asciiDownload+17-17
0002-pg_upgrade-prohibit-attempts-to-upgrade-a-cluster-on.patchtext/x-diff; charset=us-asciiDownload+4-1
#9Michael Paquier
michael@paquier.xyz
In reply to: Justin Pryzby (#8)
Re: pg_upgrade allows itself to be run twice

On Fri, Dec 16, 2022 at 07:38:09AM -0600, Justin Pryzby wrote:

However, setting FirstNormalOid in initdb itself (rather than in the
next invocation of postgres, if it isn't in single-user-mode) was the
mechanism by which to avoid the original problem that pg_upgrade can be
run twice, if there are no non-system relations. That still seems
desirable to fix somehow.

+       if (new_cluster.controldata.chkpnt_nxtoid != FirstNormalObjectId)
+               pg_fatal("New cluster is not pristine: OIDs have been assigned since initdb (%u != %u)\n",
+                       new_cluster.controldata.chkpnt_nxtoid, FirstNormalObjectId);

On wraparound FirstNormalObjectId would be the first value we use for
nextOid. Okay, that's very unlikely going to happen, still, strictly
speaking, that could be incorrect.

I think this would be worth addressing nonetheless, for robustness. For
comparison, "cp" and "mv" will error if you give source and destination that
are the same file.

I addressed this in 002.

+       if (strcmp(make_absolute_path(old_cluster.pgdata),
+                  make_absolute_path(new_cluster.pgdata)) == 0)
+               pg_fatal("cannot upgrade a cluster on top of itself");

Shouldn't this be done after adjust_data_dir(), which is the point
where we'll know the actual data folders we are working on by querying
postgres -C data_directory?
--
Michael