pg_upgrade's interaction with pg_resetwal seems confusing
In pg_upgrade, we reset WAL archives (remove WAL), transaction id,
etc. in copy_xact_xlog_xid() for the new cluster. Then, we create new
objects in the new cluster, and again towards the end of the upgrade
we invoke pg_resetwal with the -o option to reset the next OID. Now,
along with resetting the OID, pg_resetwal will again reset the WAL. I
am not sure whether that is intentional; it may not cause any problem
today, but it seems redundant to reset the WAL again.
However, this can be problematic for the ongoing work to upgrade the
logical replication slots [1]. We want to create/migrate logical slots
in the new cluster before calling the final pg_resetwal (which resets
the next OID) to ensure that there is no new WAL inserted by
background processes or otherwise between resetwal location and
creation of slots. So, we thought that we would compute the next WAL
location by doing a computation similar to what pg_resetwal does to
pick a new WAL start location, create slots using that location, and
pass the same to pg_resetwal via the -l option. However, that doesn't
work because pg_resetwal uses the passed -l value only as a hint and
can reset to a later WAL location if one is present, which can remove
the WAL position we had chosen as restart_lsn (the point from which to
start reading WAL) for the slots.
So, we came up with another idea that we will reset the WAL just
before creating slots and use that location to create slots and then
invent a new option in pg_resetwal where it won't reset the WAL.
Now, as mentioned in the first paragraph, it seems we anyway don't
need to reset the WAL at the end when setting the next OID for the new
cluster with the -o option. If that is true, then I think even without
slots work it will be helpful to have such an option in pg_resetwal.
Thoughts?
[1]: https://commitfest.postgresql.org/45/4273/
--
With Regards,
Amit Kapila.
On Thu, Oct 12, 2023 at 7:17 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
Now, as mentioned in the first paragraph, it seems we anyway don't
need to reset the WAL at the end when setting the next OID for the new
cluster with the -o option. If that is true, then I think even without
slots work it will be helpful to have such an option in pg_resetwal.

Thoughts?
I wonder if we should instead provide a way to reset the OID counter
with a function call inside the database, gated by IsBinaryUpgrade.
Having something like pg_resetwal --but-dont-actually-reset-the-wal
seems both self-contradictory and vulnerable to abuse that we might be
better off not inviting.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Fri, Oct 13, 2023 at 12:00 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Oct 12, 2023 at 7:17 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
Now, as mentioned in the first paragraph, it seems we anyway don't
need to reset the WAL at the end when setting the next OID for the new
cluster with the -o option. If that is true, then I think even without
slots work it will be helpful to have such an option in pg_resetwal.

Thoughts?
I wonder if we should instead provide a way to reset the OID counter
with a function call inside the database, gated by IsBinaryUpgrade.
I think the challenge in doing so would be that when the server is
running, a concurrent checkpoint can also update the OID counter value
in the control file. See the code below:
CreateCheckPoint()
{
...
LWLockAcquire(OidGenLock, LW_SHARED);
checkPoint.nextOid = ShmemVariableCache->nextOid;
if (!shutdown)
checkPoint.nextOid += ShmemVariableCache->oidCount;
LWLockRelease(OidGenLock);
...
UpdateControlFile()
...
}
Now, we can try to pass some startup options like checkpoint_timeout
with a large value to ensure that a checkpoint won't interfere, but I
am not sure that would be bulletproof. Instead, how about allowing
pg_upgrade to update the control file of the new cluster (with the
required value of OID) following the same method as pg_resetwal does
in RewriteControlFile()?
Having something like pg_resetwal --but-dont-actually-reset-the-wal
seems both self-contradictory and vulnerable to abuse that we might be
better off not inviting.
Fair point.
--
With Regards,
Amit Kapila.
On Fri, Oct 13, 2023 at 9:29 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Oct 13, 2023 at 12:00 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Oct 12, 2023 at 7:17 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
Now, as mentioned in the first paragraph, it seems we anyway don't
need to reset the WAL at the end when setting the next OID for the new
cluster with the -o option. If that is true, then I think even without
slots work it will be helpful to have such an option in pg_resetwal.

Thoughts?
I wonder if we should instead provide a way to reset the OID counter
with a function call inside the database, gated by IsBinaryUpgrade.

I think the challenge in doing so would be that when the server is
running, a concurrent checkpoint can also update the OID counter value
in the control file. See below code:

CreateCheckPoint()
{
...
LWLockAcquire(OidGenLock, LW_SHARED);
checkPoint.nextOid = ShmemVariableCache->nextOid;
if (!shutdown)
checkPoint.nextOid += ShmemVariableCache->oidCount;
LWLockRelease(OidGenLock);
...
UpdateControlFile()
...
}
But is this a problem? Basically, we will set the
ShmemVariableCache->nextOid counter to the value that we want in the
IsBinaryUpgrade-specific function, and then the shutdown checkpoint
will flush that value to the control file, and that is what we want, no?
I mean, instead of resetwal directly modifying the control file, we
will modify that value in the server using the binary_upgrade function
and then have that value flushed to disk by the shutdown checkpoint.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Fri, Oct 13, 2023 at 10:37 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Fri, Oct 13, 2023 at 9:29 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Oct 13, 2023 at 12:00 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Oct 12, 2023 at 7:17 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
Now, as mentioned in the first paragraph, it seems we anyway don't
need to reset the WAL at the end when setting the next OID for the new
cluster with the -o option. If that is true, then I think even without
slots work it will be helpful to have such an option in pg_resetwal.

Thoughts?
I wonder if we should instead provide a way to reset the OID counter
with a function call inside the database, gated by IsBinaryUpgrade.

I think the challenge in doing so would be that when the server is
running, a concurrent checkpoint can also update the OID counter value
in the control file. See below code:

CreateCheckPoint()
{
...
LWLockAcquire(OidGenLock, LW_SHARED);
checkPoint.nextOid = ShmemVariableCache->nextOid;
if (!shutdown)
checkPoint.nextOid += ShmemVariableCache->oidCount;
LWLockRelease(OidGenLock);
...
UpdateControlFile()
...
}

But is this a problem? basically, we will set the
ShmemVariableCache->nextOid counter to the value that we want in the
IsBinaryUpgrade-specific function. And then the shutdown checkpoint
will flush that value to the control file and that is what we want no?
I think that can work. Basically, we need to do something like what
SetNextObjectId() does and then let the shutdown checkpoint update the
actual value in the control file.
I mean instead of resetwal directly modifying the control file we
will modify that value in the server using the binary_upgrade function
and then have that value flush to the disk by shutdown checkpoint.
True, the alternative to consider is to let pg_upgrade update the
control file by itself with the required value of OID. The point I am
slightly worried about doing via server-side function is that some
online and/or shutdown checkpoint can update other values in the
control file as well whereas if we do via pg_upgrade, we can have
better control over just updating the OID.
--
With Regards,
Amit Kapila.
On Fri, Oct 13, 2023 at 11:07 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
But is this a problem? basically, we will set the
ShmemVariableCache->nextOid counter to the value that we want in the
IsBinaryUpgrade-specific function. And then the shutdown checkpoint
will flush that value to the control file and that is what we want no?

I think that can work. Basically, we need to do something like what
SetNextObjectId() does and then let the shutdown checkpoint update the
actual value in the control file.
Right.
I mean instead of resetwal directly modifying the control file we
will modify that value in the server using the binary_upgrade function
and then have that value flush to the disk by shutdown checkpoint.

True, the alternative to consider is to let pg_upgrade update the
control file by itself with the required value of OID. The point I am
slightly worried about doing via server-side function is that some
online and or shutdown checkpoint can update other values in the
control file as well whereas if we do via pg_upgrade, we can have
better control over just updating the OID.
Yeah, that's a valid point.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Fri, Oct 13, 2023 at 2:03 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Fri, Oct 13, 2023 at 11:07 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
But is this a problem? basically, we will set the
ShmemVariableCache->nextOid counter to the value that we want in the
IsBinaryUpgrade-specific function. And then the shutdown checkpoint
will flush that value to the control file and that is what we want no?

I think that can work. Basically, we need to do something like what
SetNextObjectId() does and then let the shutdown checkpoint update the
actual value in the control file.

Right.
I mean instead of resetwal directly modifying the control file we
will modify that value in the server using the binary_upgrade function
and then have that value flush to the disk by shutdown checkpoint.

True, the alternative to consider is to let pg_upgrade update the
control file by itself with the required value of OID. The point I am
slightly worried about doing via server-side function is that some
online and or shutdown checkpoint can update other values in the
control file as well whereas if we do via pg_upgrade, we can have
better control over just updating the OID.

Yeah, thats a valid point.
But OTOH, just updating the control file via pg_upgrade may not be
sufficient because we should keep the shutdown checkpoint record also
updated with that value. See how pg_resetwal achieves it via
RewriteControlFile() and WriteEmptyXLOG(). So, considering both
approaches, it seems better to go with a server-side function as Robert
suggested.
--
With Regards,
Amit Kapila.
Dear hackers,
I mean instead of resetwal directly modifying the control file we
will modify that value in the server using the binary_upgrade function
and then have that value flush to the disk by shutdown checkpoint.

True, the alternative to consider is to let pg_upgrade update the
control file by itself with the required value of OID. The point I am
slightly worried about doing via server-side function is that some
online and or shutdown checkpoint can update other values in the
control file as well whereas if we do via pg_upgrade, we can have
better control over just updating the OID.

Yeah, thats a valid point.
But OTOH, just updating the control file via pg_upgrade may not be
sufficient because we should keep the shutdown checkpoint record also
updated with that value. See how pg_resetwal achieves it via
RewriteControlFile() and WriteEmptyXLOG(). So, considering both the
approaches it seems better to go with a server-side function as Robert
suggested.
Based on this discussion, I implemented a server-side approach. The attached patch
adds an upgrade function which sets ShmemVariableCache->nextOid. It is called at
the end of the upgrade. The comments and the name of issue_warnings_and_set_wal_level()
are also updated because they had become outdated.
Best Regards,
Hayato Kuroda
FUJITSU LIMITED
Attachments:
0001-pg_upgrade-use-upgrade-function-to-restore-OID.patch (application/octet-stream)
From 7b90ae99398d1876ac4b1a1a7906342e124f5312 Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Fri, 13 Oct 2023 05:34:08 +0000
Subject: [PATCH] pg_upgrade: use upgrade function to restore OID
Previously, the OID counter is restored by invoking pg_resetwal with the -o
option, at the end of upgrade. This is not problematic for now, but WAL removals
are not happy if we want to uprade logical replication slot. Therefore, a new
upgrade function is introduced to reset next OID.
Note that we only update the on-memory data to avoid concurrent update of
control with the chekpointer. It is harmless becasue the value would be set at
shutdown checkpoint.
---
src/backend/utils/adt/pg_upgrade_support.c | 17 +++++++++++
src/bin/pg_upgrade/check.c | 34 ++++++++++++++++++----
src/bin/pg_upgrade/pg_upgrade.c | 13 +--------
src/bin/pg_upgrade/pg_upgrade.h | 2 +-
src/include/catalog/pg_proc.dat | 4 +++
5 files changed, 52 insertions(+), 18 deletions(-)
diff --git a/src/backend/utils/adt/pg_upgrade_support.c b/src/backend/utils/adt/pg_upgrade_support.c
index 0186636d9f..ae3339ee31 100644
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -261,3 +261,20 @@ binary_upgrade_set_missing_value(PG_FUNCTION_ARGS)
PG_RETURN_VOID();
}
+
+Datum
+binary_upgrade_set_next_oid(PG_FUNCTION_ARGS)
+{
+ Oid nextOid = PG_GETARG_OID(0);
+
+ CHECK_IS_BINARY_UPGRADE;
+
+ LWLockAcquire(OidGenLock, LW_EXCLUSIVE);
+
+ ShmemVariableCache->nextOid = nextOid;
+ ShmemVariableCache->oidCount = 0;
+
+ LWLockRelease(OidGenLock);
+
+ PG_RETURN_VOID();
+}
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 21a0ff9e42..5824520ea2 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -245,17 +245,41 @@ report_clusters_compatible(void)
}
+static void
+set_next_oid_to_new_cluster(Oid next_oid)
+{
+ DbInfo *db = &new_cluster.dbarr.dbs[0];
+ PGconn *conn;
+
+ prep_status("Setting next OID for new cluster");
+
+ conn = connectToServer(&new_cluster, db->db_name);
+
+ PQclear(executeQueryOrDie(conn, "SELECT binary_upgrade_set_next_oid(%u);",
+ next_oid));
+ PQfinish(conn);
+
+ check_ok();
+}
+
+
void
-issue_warnings_and_set_wal_level(void)
+issue_warnings_and_set_oid(void)
{
/*
- * We unconditionally start/stop the new server because pg_resetwal -o set
- * wal_level to 'minimum'. If the user is upgrading standby servers using
- * the rsync instructions, they will need pg_upgrade to write its final
- * WAL record showing wal_level as 'replica'.
+ * We unconditionally start/stop the new server because the OID counter
+ * will be restored from the old server.
*/
start_postmaster(&new_cluster, true);
+ /*
+ * Assuming OIDs are only used in system tables, there is no need to
+ * restore the OID counter because we have not transferred any OIDs from
+ * the old system, but we do it anyway just in case. We do it late here
+ * because there is no need to have the schema load use new oids.
+ */
+ set_next_oid_to_new_cluster(old_cluster.controldata.chkpnt_nxtoid);
+
/* Reindex hash indexes for old < 10.0 */
if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
old_9_6_invalidate_hash_indexes(&new_cluster, false);
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 96bfb67167..d28c543b75 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -175,17 +175,6 @@ main(int argc, char **argv)
transfer_all_new_tablespaces(&old_cluster.dbarr, &new_cluster.dbarr,
old_cluster.pgdata, new_cluster.pgdata);
- /*
- * Assuming OIDs are only used in system tables, there is no need to
- * restore the OID counter because we have not transferred any OIDs from
- * the old system, but we do it anyway just in case. We do it late here
- * because there is no need to have the schema load use new oids.
- */
- prep_status("Setting next OID for new cluster");
- exec_prog(UTILITY_LOG_FILE, NULL, true, true,
- "\"%s/pg_resetwal\" -o %u \"%s\"",
- new_cluster.bindir, old_cluster.controldata.chkpnt_nxtoid,
- new_cluster.pgdata);
check_ok();
if (user_opts.do_sync)
@@ -201,7 +190,7 @@ main(int argc, char **argv)
create_script_for_old_cluster_deletion(&deletion_script_file_name);
- issue_warnings_and_set_wal_level();
+ issue_warnings_and_set_oid();
pg_log(PG_REPORT,
"\n"
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 842f3b6cd3..8003950a8c 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -345,7 +345,7 @@ void output_check_banner(bool live_check);
void check_and_dump_old_cluster(bool live_check);
void check_new_cluster(void);
void report_clusters_compatible(void);
-void issue_warnings_and_set_wal_level(void);
+void issue_warnings_and_set_oid(void);
void output_completion_banner(char *deletion_script_file_name);
void check_cluster_versions(void);
void check_cluster_compatibility(bool live_check);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 72ea4aa8b8..1f0d1bddc2 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11379,6 +11379,10 @@
proname => 'binary_upgrade_set_next_pg_tablespace_oid', provolatile => 'v',
proparallel => 'u', prorettype => 'void', proargtypes => 'oid',
prosrc => 'binary_upgrade_set_next_pg_tablespace_oid' },
+{ oid => '8046', descr => 'for use by pg_upgrade',
+ proname => 'binary_upgrade_set_next_oid', provolatile => 'v',
+ proparallel => 'u', prorettype => 'void', proargtypes => 'oid',
+ prosrc => 'binary_upgrade_set_next_oid' },
# conversion functions
{ oid => '4302',
--
2.27.0
On Fri, 13 Oct 2023 at 17:15, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
Dear hackers,
I mean instead of resetwal directly modifying the control file we
will modify that value in the server using the binary_upgrade function
and then have that value flush to the disk by shutdown checkpoint.

True, the alternative to consider is to let pg_upgrade update the
control file by itself with the required value of OID. The point I am
slightly worried about doing via server-side function is that some
online and or shutdown checkpoint can update other values in the
control file as well whereas if we do via pg_upgrade, we can have
better control over just updating the OID.

Yeah, thats a valid point.
But OTOH, just updating the control file via pg_upgrade may not be
sufficient because we should keep the shutdown checkpoint record also
updated with that value. See how pg_resetwal achieves it via
RewriteControlFile() and WriteEmptyXLOG(). So, considering both the
approaches it seems better to go with a server-side function as Robert
suggested.

Based on these discussion, I implemented a server-side approach. An attached patch
adds a upgrade function which sets ShmemVariableCache->nextOid. It is called at
the end of the upgrade. Comments and name of issue_warnings_and_set_wal_level()
is also updated because they become outdated.
Few comments:
1) Most of the code in binary_upgrade_set_next_oid is similar to
SetNextObjectId, but SetNextObjectId has the error handling to report
an error if an invalid nextOid is specified:
if (ShmemVariableCache->nextOid > nextOid)
elog(ERROR, "too late to advance OID counter to %u, it is now %u",
nextOid, ShmemVariableCache->nextOid);
Has this check been left out of the binary_upgrade_set_next_oid function
intentionally? Did you leave it out because it would be dead code? If so,
should we have an assert for it here?
2) How about changing issue_warnings_and_set_oid function name to
issue_warnings_and_set_next_oid?
void
-issue_warnings_and_set_wal_level(void)
+issue_warnings_and_set_oid(void)
{
3) We have removed these comments; is there any change to the rsync
instructions? If so, we could update the comments accordingly.
- * We unconditionally start/stop the new server because pg_resetwal -o set
- * wal_level to 'minimum'. If the user is upgrading standby servers using
- * the rsync instructions, they will need pg_upgrade to write its final
- * WAL record showing wal_level as 'replica'.
Regards,
Vignesh
Dear Vignesh,
Thank you for reviewing! PSA new version.
Few comments:
1) Most of the code in binary_upgrade_set_next_oid is similar to
SetNextObjectId, but SetNextObjectId has the error handling to report
an error if an invalid nextOid is specified:
if (ShmemVariableCache->nextOid > nextOid)
elog(ERROR, "too late to advance OID counter to %u, it is now %u",
nextOid, ShmemVariableCache->nextOid);

Is this check been left out from binary_upgrade_set_next_oid function
intentionally? Have you left this because it could be like a dead
code. If so, should we have an assert for this here?
Yeah, they were removed intentionally, but on rethinking it, they could be
combined: the ereport() calls would simply be skipped in upgrade mode. Thoughts?

Regarding the first ereport(ERROR), it just requires that we are doing initdb.
As for the second ereport(ERROR), it requires that the next OID is not rolled back.
That restriction seems OK during initialization, but it is not appropriate for
upgrading: wraparound of the OID counter might have occurred on the old cluster,
but we try to restore the counter anyway.
2) How about changing issue_warnings_and_set_oid function name to
issue_warnings_and_set_next_oid?
void
-issue_warnings_and_set_wal_level(void)
+issue_warnings_and_set_oid(void)
{
Fixed.
3) We have removed these comments, is there any change to the rsync
instructions? If so we could update the comments accordingly.
- * We unconditionally start/stop the new server because pg_resetwal -o set
- * wal_level to 'minimum'. If the user is upgrading standby servers using
- * the rsync instructions, they will need pg_upgrade to write its final
- * WAL record showing wal_level as 'replica'.
Hmm, I thought the comments about rsync seemed outdated, so I removed them. I
still think they are not needed. Since controlfile->wal_level is not updated to
'minimal' anymore, the unconditional startup is not required for physical standbys.
Best Regards,
Hayato Kuroda
FUJITSU LIMITED
Attachments:
v2-0001-pg_upgrade-use-upgrade-function-to-restore-OID.patch (application/octet-stream)
From a63cc9c2e6bbd95f753794237489b5dc5e36b40d Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Fri, 13 Oct 2023 05:34:08 +0000
Subject: [PATCH v2] pg_upgrade: use upgrade function to restore OID
Previously, the OID counter is restored by invoking pg_resetwal with the -o
option, at the end of upgrade. This is not problematic for now, but WAL removals
are not happy if we want to uprade logical replication slot. Therefore, a new
upgrade function is introduced to reset next OID.
Note that we only update the on-memory data to avoid concurrent update of
control with the chekpointer. It is harmless becasue the value would be set at
shutdown checkpoint.
---
src/backend/access/transam/varsup.c | 12 ++++----
src/backend/utils/adt/pg_upgrade_support.c | 12 ++++++++
src/bin/pg_upgrade/check.c | 34 ++++++++++++++++++----
src/bin/pg_upgrade/pg_upgrade.c | 13 +--------
src/bin/pg_upgrade/pg_upgrade.h | 2 +-
src/include/access/transam.h | 1 +
src/include/catalog/pg_proc.dat | 4 +++
7 files changed, 54 insertions(+), 24 deletions(-)
diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
index 334adac09e..d18ef4c7cf 100644
--- a/src/backend/access/transam/varsup.c
+++ b/src/backend/access/transam/varsup.c
@@ -589,20 +589,20 @@ GetNewObjectId(void)
/*
* SetNextObjectId
*
- * This may only be called during initdb; it advances the OID counter
- * to the specified value.
+ * This may only be called during initdb or pg_upgrade; it advances the OID
+ * counter to the specified value.
*/
-static void
+void
SetNextObjectId(Oid nextOid)
{
- /* Safety check, this is only allowable during initdb */
- if (IsPostmasterEnvironment)
+ /* Safety check, this is only allowable during initdb or pg_upgrade */
+ if (!IsBinaryUpgrade && IsPostmasterEnvironment)
elog(ERROR, "cannot advance OID counter anymore");
/* Taking the lock is, therefore, just pro forma; but do it anyway */
LWLockAcquire(OidGenLock, LW_EXCLUSIVE);
- if (ShmemVariableCache->nextOid > nextOid)
+ if (!IsBinaryUpgrade && ShmemVariableCache->nextOid > nextOid)
elog(ERROR, "too late to advance OID counter to %u, it is now %u",
nextOid, ShmemVariableCache->nextOid);
diff --git a/src/backend/utils/adt/pg_upgrade_support.c b/src/backend/utils/adt/pg_upgrade_support.c
index 0186636d9f..0695564ea4 100644
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -261,3 +261,15 @@ binary_upgrade_set_missing_value(PG_FUNCTION_ARGS)
PG_RETURN_VOID();
}
+
+Datum
+binary_upgrade_set_next_oid(PG_FUNCTION_ARGS)
+{
+ Oid nextOid = PG_GETARG_OID(0);
+
+ CHECK_IS_BINARY_UPGRADE;
+
+ SetNextObjectId(nextOid);
+
+ PG_RETURN_VOID();
+}
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 21a0ff9e42..9aa0051941 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -245,17 +245,41 @@ report_clusters_compatible(void)
}
+static void
+set_next_oid_to_new_cluster(Oid next_oid)
+{
+ DbInfo *db = &new_cluster.dbarr.dbs[0];
+ PGconn *conn;
+
+ prep_status("Setting next OID for new cluster");
+
+ conn = connectToServer(&new_cluster, db->db_name);
+
+ PQclear(executeQueryOrDie(conn, "SELECT binary_upgrade_set_next_oid(%u);",
+ next_oid));
+ PQfinish(conn);
+
+ check_ok();
+}
+
+
void
-issue_warnings_and_set_wal_level(void)
+issue_warnings_and_set_next_oid(void)
{
/*
- * We unconditionally start/stop the new server because pg_resetwal -o set
- * wal_level to 'minimum'. If the user is upgrading standby servers using
- * the rsync instructions, they will need pg_upgrade to write its final
- * WAL record showing wal_level as 'replica'.
+ * We unconditionally start/stop the new server because the OID counter
+ * will be restored from the old server.
*/
start_postmaster(&new_cluster, true);
+ /*
+ * Assuming OIDs are only used in system tables, there is no need to
+ * restore the OID counter because we have not transferred any OIDs from
+ * the old system, but we do it anyway just in case. We do it late here
+ * because there is no need to have the schema load use new oids.
+ */
+ set_next_oid_to_new_cluster(old_cluster.controldata.chkpnt_nxtoid);
+
/* Reindex hash indexes for old < 10.0 */
if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
old_9_6_invalidate_hash_indexes(&new_cluster, false);
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 96bfb67167..f406306139 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -175,17 +175,6 @@ main(int argc, char **argv)
transfer_all_new_tablespaces(&old_cluster.dbarr, &new_cluster.dbarr,
old_cluster.pgdata, new_cluster.pgdata);
- /*
- * Assuming OIDs are only used in system tables, there is no need to
- * restore the OID counter because we have not transferred any OIDs from
- * the old system, but we do it anyway just in case. We do it late here
- * because there is no need to have the schema load use new oids.
- */
- prep_status("Setting next OID for new cluster");
- exec_prog(UTILITY_LOG_FILE, NULL, true, true,
- "\"%s/pg_resetwal\" -o %u \"%s\"",
- new_cluster.bindir, old_cluster.controldata.chkpnt_nxtoid,
- new_cluster.pgdata);
check_ok();
if (user_opts.do_sync)
@@ -201,7 +190,7 @@ main(int argc, char **argv)
create_script_for_old_cluster_deletion(&deletion_script_file_name);
- issue_warnings_and_set_wal_level();
+ issue_warnings_and_set_next_oid();
pg_log(PG_REPORT,
"\n"
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 842f3b6cd3..cb924c7be7 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -345,7 +345,7 @@ void output_check_banner(bool live_check);
void check_and_dump_old_cluster(bool live_check);
void check_new_cluster(void);
void report_clusters_compatible(void);
-void issue_warnings_and_set_wal_level(void);
+void issue_warnings_and_set_next_oid(void);
void output_completion_banner(char *deletion_script_file_name);
void check_cluster_versions(void);
void check_cluster_compatibility(bool live_check);
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index f5af6d3055..33d329ccce 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -294,6 +294,7 @@ extern void AdvanceOldestClogXid(TransactionId oldest_datfrozenxid);
extern bool ForceTransactionIdLimitUpdate(void);
extern Oid GetNewObjectId(void);
extern void StopGeneratingPinnedObjectIds(void);
+extern void SetNextObjectId(Oid nextOid);
#ifdef USE_ASSERT_CHECKING
extern void AssertTransactionIdInAllowableRange(TransactionId xid);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 72ea4aa8b8..1f0d1bddc2 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11379,6 +11379,10 @@
proname => 'binary_upgrade_set_next_pg_tablespace_oid', provolatile => 'v',
proparallel => 'u', prorettype => 'void', proargtypes => 'oid',
prosrc => 'binary_upgrade_set_next_pg_tablespace_oid' },
+{ oid => '8046', descr => 'for use by pg_upgrade',
+ proname => 'binary_upgrade_set_next_oid', provolatile => 'v',
+ proparallel => 'u', prorettype => 'void', proargtypes => 'oid',
+ prosrc => 'binary_upgrade_set_next_oid' },
# conversion functions
{ oid => '4302',
--
2.27.0
Note that this patch falsifies the comment in SetNextObjectId that
taking the lock is pro forma only -- it no longer is, since in upgrade
mode there can be multiple subprocesses running -- so I think it should
be updated.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
On Mon, 16 Oct 2023 at 10:33, Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
Dear Vignesh,
Thank you for reviewing! PSA new version.
Few comments:
1) Most of the code in binary_upgrade_set_next_oid is similar to
SetNextObjectId, but SetNextObjectId has the error handling to report
an error if an invalid nextOid is specified:
if (ShmemVariableCache->nextOid > nextOid)
elog(ERROR, "too late to advance OID counter to %u, it is now %u",
nextOid, ShmemVariableCache->nextOid);

Is this check been left out from binary_upgrade_set_next_oid function
intentionally? Have you left this because it could be like a dead
code. If so, should we have an assert for this here?

Yeah, they were removed intentionally, but I did rethink that they could be
combined. ereport() would be skipped during the upgrade mode. Thought?

Regarding the first ereport(ERROR), it just requires that we are doing initdb.
As for the second ereport(ERROR), it requires that next OID is not rollbacked.
The restriction seems OK during the initialization, but it is not appropriate for
upgrading: wraparound of OID counter might be occurred on old cluster but we try
to restore the counter anyway.

2) How about changing issue_warnings_and_set_oid function name to
issue_warnings_and_set_next_oid?
void
-issue_warnings_and_set_wal_level(void)
+issue_warnings_and_set_oid(void)
{

Fixed.
3) We have removed these comments, is there any change to the rsync
instructions? If so we could update the comments accordingly.
- * We unconditionally start/stop the new server because pg_resetwal -o set
- * wal_level to 'minimum'. If the user is upgrading standby servers using
- * the rsync instructions, they will need pg_upgrade to write its final
- * WAL record showing wal_level as 'replica'.

Hmm, I thought comments for rsync seemed outdated so that removed. I still think
this is not needed. Since controlfile->wal_level is not updated to 'minimal'
anymore, the unconditional startup is not required for physical standby.
We can update the commit message with those details; it will help readers
understand that this is intentional.
There are a couple of typos in the new patch:
1) "uprade logical replication slot" should be "upgrade logical
replication slot":
Previously, the OID counter is restored by invoking pg_resetwal with the -o
option, at the end of upgrade. This is not problematic for now, but WAL removals
are not happy if we want to uprade logical replication slot. Therefore, a new
upgrade function is introduced to reset next OID.
2) "becasue the value" should be "because the value":
Note that we only update the on-memory data to avoid concurrent update of
control with the chekpointer. It is harmless becasue the value would be set at
shutdown checkpoint.
Regards,
Vignesh
Dear Alvaro,
Thank you for updating! PSA new version.
Note that this patch falsifies the comment in SetNextObjectId that
taking the lock is pro forma only -- it no longer is, since in upgrade
mode there can be multiple subprocesses running -- so I think it should
be updated.
Indeed, some comments were updated.
Best Regards,
Hayato Kuroda
FUJITSU LIMITED
Attachments:
v3-0001-pg_upgrade-use-upgrade-function-to-restore-OID.patch
From d5e22cdf3d6c8187562b583ae54bd2844b0df16d Mon Sep 17 00:00:00 2001
From: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Date: Fri, 13 Oct 2023 05:34:08 +0000
Subject: [PATCH v3] pg_upgrade: use upgrade function to restore OID
Previously, the OID counter was restored by invoking pg_resetwal with the -o
option at the end of the upgrade. This is not problematic for now, but the WAL
removal it performs gets in the way if we want to upgrade logical replication
slots.
Based on the above, this commit adds a new upgrade function to reset the next OID.
SetNextObjectId() has a ratchet mechanism to avoid a rollback of OIDs, but it is
ignored during the upgrade: wraparound of the OID counter might occur on the old
cluster, but pg_upgrade tries to restore the counter to the new cluster anyway.
Also, an obsolete comment was removed. The control file will not be updated, so
we do not have to take care of anything for the primary server.
Note that we only update the in-memory data, to avoid a concurrent update of
the control file with the checkpointer. It is harmless because the value will
be set at the shutdown checkpoint.
---
src/backend/access/transam/varsup.c | 23 ++++++++++-----
src/backend/utils/adt/pg_upgrade_support.c | 12 ++++++++
src/bin/pg_upgrade/check.c | 34 ++++++++++++++++++----
src/bin/pg_upgrade/pg_upgrade.c | 13 +--------
src/bin/pg_upgrade/pg_upgrade.h | 2 +-
src/include/access/transam.h | 1 +
src/include/catalog/pg_proc.dat | 4 +++
7 files changed, 64 insertions(+), 25 deletions(-)
diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
index abfee48317..6e5013eea3 100644
--- a/src/backend/access/transam/varsup.c
+++ b/src/backend/access/transam/varsup.c
@@ -589,20 +589,29 @@ GetNewObjectId(void)
/*
* SetNextObjectId
*
- * This may only be called during initdb; it advances the OID counter
- * to the specified value.
+ * This may only be called during initdb or pg_upgrade; it advances the OID
+ * counter to the specified value.
*/
-static void
+void
SetNextObjectId(Oid nextOid)
{
- /* Safety check, this is only allowable during initdb */
- if (IsPostmasterEnvironment)
+ /* Safety check, this is only allowable during initdb or pg_upgrade */
+ if (!IsBinaryUpgrade && IsPostmasterEnvironment)
elog(ERROR, "cannot advance OID counter anymore");
- /* Taking the lock is, therefore, just pro forma; but do it anyway */
+ /*
+ * Take the lock. This may not be needed during initdb, but do it
+ * anyway.
+ */
LWLockAcquire(OidGenLock, LW_EXCLUSIVE);
- if (ShmemVariableCache->nextOid > nextOid)
+ /*
+ * Make sure that the input is greater than the current next OID. This must
+ * be skipped during pg_upgrade: wraparound of the OID counter might
+ * occur on the old cluster, but pg_upgrade tries to restore the counter to
+ * the new cluster anyway.
+ */
+ if (!IsBinaryUpgrade && ShmemVariableCache->nextOid > nextOid)
elog(ERROR, "too late to advance OID counter to %u, it is now %u",
nextOid, ShmemVariableCache->nextOid);
diff --git a/src/backend/utils/adt/pg_upgrade_support.c b/src/backend/utils/adt/pg_upgrade_support.c
index 0186636d9f..0695564ea4 100644
--- a/src/backend/utils/adt/pg_upgrade_support.c
+++ b/src/backend/utils/adt/pg_upgrade_support.c
@@ -261,3 +261,15 @@ binary_upgrade_set_missing_value(PG_FUNCTION_ARGS)
PG_RETURN_VOID();
}
+
+Datum
+binary_upgrade_set_next_oid(PG_FUNCTION_ARGS)
+{
+ Oid nextOid = PG_GETARG_OID(0);
+
+ CHECK_IS_BINARY_UPGRADE;
+
+ SetNextObjectId(nextOid);
+
+ PG_RETURN_VOID();
+}
diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 21a0ff9e42..9aa0051941 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -245,17 +245,41 @@ report_clusters_compatible(void)
}
+static void
+set_next_oid_to_new_cluster(Oid next_oid)
+{
+ DbInfo *db = &new_cluster.dbarr.dbs[0];
+ PGconn *conn;
+
+ prep_status("Setting next OID for new cluster");
+
+ conn = connectToServer(&new_cluster, db->db_name);
+
+ PQclear(executeQueryOrDie(conn, "SELECT binary_upgrade_set_next_oid(%u);",
+ next_oid));
+ PQfinish(conn);
+
+ check_ok();
+}
+
+
void
-issue_warnings_and_set_wal_level(void)
+issue_warnings_and_set_next_oid(void)
{
/*
- * We unconditionally start/stop the new server because pg_resetwal -o set
- * wal_level to 'minimum'. If the user is upgrading standby servers using
- * the rsync instructions, they will need pg_upgrade to write its final
- * WAL record showing wal_level as 'replica'.
+ * We unconditionally start/stop the new server because the OID counter
+ * will be restored from the old server.
*/
start_postmaster(&new_cluster, true);
+ /*
+ * Assuming OIDs are only used in system tables, there is no need to
+ * restore the OID counter because we have not transferred any OIDs from
+ * the old system, but we do it anyway just in case. We do it late here
+ * because there is no need to have the schema load use new oids.
+ */
+ set_next_oid_to_new_cluster(old_cluster.controldata.chkpnt_nxtoid);
+
/* Reindex hash indexes for old < 10.0 */
if (GET_MAJOR_VERSION(old_cluster.major_version) <= 906)
old_9_6_invalidate_hash_indexes(&new_cluster, false);
diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
index 96bfb67167..f406306139 100644
--- a/src/bin/pg_upgrade/pg_upgrade.c
+++ b/src/bin/pg_upgrade/pg_upgrade.c
@@ -175,17 +175,6 @@ main(int argc, char **argv)
transfer_all_new_tablespaces(&old_cluster.dbarr, &new_cluster.dbarr,
old_cluster.pgdata, new_cluster.pgdata);
- /*
- * Assuming OIDs are only used in system tables, there is no need to
- * restore the OID counter because we have not transferred any OIDs from
- * the old system, but we do it anyway just in case. We do it late here
- * because there is no need to have the schema load use new oids.
- */
- prep_status("Setting next OID for new cluster");
- exec_prog(UTILITY_LOG_FILE, NULL, true, true,
- "\"%s/pg_resetwal\" -o %u \"%s\"",
- new_cluster.bindir, old_cluster.controldata.chkpnt_nxtoid,
- new_cluster.pgdata);
check_ok();
if (user_opts.do_sync)
@@ -201,7 +190,7 @@ main(int argc, char **argv)
create_script_for_old_cluster_deletion(&deletion_script_file_name);
- issue_warnings_and_set_wal_level();
+ issue_warnings_and_set_next_oid();
pg_log(PG_REPORT,
"\n"
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 842f3b6cd3..cb924c7be7 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -345,7 +345,7 @@ void output_check_banner(bool live_check);
void check_and_dump_old_cluster(bool live_check);
void check_new_cluster(void);
void report_clusters_compatible(void);
-void issue_warnings_and_set_wal_level(void);
+void issue_warnings_and_set_next_oid(void);
void output_completion_banner(char *deletion_script_file_name);
void check_cluster_versions(void);
void check_cluster_compatibility(bool live_check);
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index f5af6d3055..33d329ccce 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -294,6 +294,7 @@ extern void AdvanceOldestClogXid(TransactionId oldest_datfrozenxid);
extern bool ForceTransactionIdLimitUpdate(void);
extern Oid GetNewObjectId(void);
extern void StopGeneratingPinnedObjectIds(void);
+extern void SetNextObjectId(Oid nextOid);
#ifdef USE_ASSERT_CHECKING
extern void AssertTransactionIdInAllowableRange(TransactionId xid);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c92d0631a0..7b82c3467d 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -11379,6 +11379,10 @@
proname => 'binary_upgrade_set_next_pg_tablespace_oid', provolatile => 'v',
proparallel => 'u', prorettype => 'void', proargtypes => 'oid',
prosrc => 'binary_upgrade_set_next_pg_tablespace_oid' },
+{ oid => '8046', descr => 'for use by pg_upgrade',
+ proname => 'binary_upgrade_set_next_oid', provolatile => 'v',
+ proparallel => 'u', prorettype => 'void', proargtypes => 'oid',
+ prosrc => 'binary_upgrade_set_next_oid' },
# conversion functions
{ oid => '4302',
--
2.27.0
Dear Vignesh,
Thank you for reviewing! A new patch is available in [1].
We can update the commit message with those details; it will help readers
understand that this is intentional.
Both the comments and the commit message were updated accordingly.
There are a couple of typos in the new patch:
1) "uprade logical replication slot" should be "upgrade logical
replication slot":
Previously, the OID counter is restored by invoking pg_resetwal with the -o
option, at the end of upgrade. This is not problematic for now, but WAL removals
are not happy if we want to uprade logical replication slot. Therefore, a new
upgrade function is introduced to reset next OID.
Fixed.
2) "becasue the value" should be "because the value":
Note that we only update the on-memory data to avoid concurrent update of
control with the chekpointer. It is harmless becasue the value would be set at
shutdown checkpoint.
Fixed.
[1]: /messages/by-id/TYAPR01MB5866DAFE000F8677C49ACD66F5D8A@TYAPR01MB5866.jpnprd01.prod.outlook.com
Best Regards,
Hayato Kuroda
FUJITSU LIMITED